Search results for Apple Silicon

Décio Luiz Gazzoni Filho, Guilherme Brandão, Julio López
Published 2024-04-09
Efficient polynomial multiplication routines are critical to the performance of lattice-based post-quantum cryptography (PQC). As PQC standards have only recently started to emerge, CPUs still lack specialized instructions to accelerate such routines. Meanwhile, deep learning has grown immeasurably in importance. Its workloads call for teraflops-level processing power for linear algebra operations, mainly matrix multiplication. Computer architects have responded by introducing ISA extensions, coprocessors and special-purpose cores to accelerate such operations. In particular, Apple ships an undocumented matrix-multiplication coprocessor, AMX, in hundreds of millions of mobile phones, tablets and personal computers. Our work repurposes AMX to implement polynomial multiplication and applies it to the NTRU cryptosystem, setting new speed records on the Apple M1 and M3 systems-on-chip (SoCs): polynomial multiplication, key generation, encapsulation and decapsulation are sped up by $1.54$–$3.07\times$, $1.08$–$1.33\times$, $1.11$–$1.50\times$ and $1.20$–$1.98\times$, respectively, over the previous state of the art.
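
To illustrate why hardware built for matrix operations can help with polynomial multiplication, the following minimal Python sketch computes a product in the ring $\mathbb{Z}_q[x]/(x^n - 1)$ by first forming the outer product of the two coefficient vectors and then summing its entries along wrapped anti-diagonals. This is not the paper's implementation: the function names, the tiny parameters, and the assumption that an outer-product formulation is the relevant bridge to a matrix coprocessor are illustrative choices made here, and nothing in the sketch touches AMX itself.

```python
def cyclic_polymul_outer(a, b, q):
    """Multiply a and b in Z_q[x]/(x^n - 1) via an outer product.

    a and b are coefficient lists of equal length n, lowest degree first.
    Hypothetical helper for illustration only.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length")
    # Outer product: outer[i][j] = a[i] * b[j]. Forming this table is the
    # kind of step a matrix unit could perform in bulk.
    outer = [[ai * bj for bj in b] for ai in a]
    # Coefficient k of the product collects every entry with (i + j) mod n == k,
    # because x^n wraps around to 1 in this ring.
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = (i + j) % n
            c[k] = (c[k] + outer[i][j]) % q
    return c


def cyclic_polymul_schoolbook(a, b, q):
    """Reference: multiply normally, then reduce modulo x^n - 1 and q."""
    n = len(a)
    full = [0] * (2 * n - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            full[i + j] += ai * bj
    # Fold the high-degree terms back: x^(n + t) is equivalent to x^t.
    return [(full[k] + (full[k + n] if k + n < len(full) else 0)) % q
            for k in range(n)]


if __name__ == "__main__":
    # Toy parameters (assumed); real NTRU parameter sets are far larger.
    a, b, q = [1, 2, 3], [4, 5, 6], 2048
    print(cyclic_polymul_outer(a, b, q))
```

Both functions should agree on any input; the outer-product version only reorganizes the same multiply-and-accumulate work so that all the products are generated in one regular, table-shaped step.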