Search results

4 results sorted by publication date

A Comprehensive Survey on Post-Quantum TLS

Nouri Alnahawi, Johannes Müller, Jan Oupický, Alexander Wiesmaier

Published 2024-07-08 Show abstract

PDF

Transport Layer Security (TLS) is the backbone security protocol of the Internet. As this fundamental protocol is at risk from future quantum attackers, many proposals have been made to protect TLS against this threat by implementing post-quantum cryptography (PQC). The widespread interest in post-quantum TLS has given rise to a large number of solutions over the last decade. These proposals differ in many aspects, including the security properties they seek to protect, the efficiency and trustworthiness of their post-quantum building blocks, and the application scenarios they consider, to name a few.

Based on an extensive literature review, we classify existing solutions according to their general approaches, analyze their individual contributions, and present the results of our extensive performance experiments. Based on these insights, we identify the most reasonable candidates for post-quantum TLS, which research problems in this area have already been solved, and which are still open. Overall, our work provides a well-founded reference point for researching post-quantum TLS and preparing TLS in practice for the quantum age.

Fast polynomial multiplication using matrix multiplication accelerators with applications to NTRU on Apple M1/M3 SoCs

Décio Luiz Gazzoni Filho, Guilherme Brandão, Julio López

Published 2024-04-09 Show abstract

PDF

Efficient polynomial multiplication routines are critical to the performance of lattice-based post-quantum cryptography (PQC). As PQC standards only recently started to emerge, CPUs still lack specialized instructions to accelerate such routines. Meanwhile, deep learning has grown immeasurably in importance. Its workloads call for teraflops-level of processing power for linear algebra operations, mainly matrix multiplication. Computer architects have responded by introducing ISA extensions, coprocessors and special-purpose cores to accelerate such operations. In particular, Apple ships an undocumented matrix-multiplication coprocessor, AMX, in hundreds of millions of mobile phones, tablets and personal computers. Our work repurposes AMX to implement polynomial multiplication and applies it to the NTRU cryptosystem, setting new speed records on the Apple M1 and M3 systems-on-chip (SoCs): polynomial multiplication, key generation, encapsulation and decapsulation are sped up by $1.54$–$3.07\times$, $1.08$–$1.33\times$, $1.11$–$1.50\times$ and $1.20$–$1.98\times$, respectively, over the previous state-of-the-art.

New Attacks on LowMC Using Partial Sets in the Single-Data Setting

Subhadeep Banik, Andrea Caforio, Serge Vaudenay

Published 2024-04-09 Show abstract

PDF

The LowMC family of block ciphers was proposed by Albrecht et al. in Eurocrypt 2015, specifically targeting adoption in FHE and MPC applications due to its low multiplicative complexity. The construction operates a 3-bit quadratic S-box as the sole non-linear transformation in the algorithm. In contrast, both the linear layer and round key generation are achieved through multiplications of full rank matrices over GF(2). The cipher is instantiable using a diverse set of default configurations, some of which have partial non-linear layers i.e., in which the S-boxes are not applied over the entire internal state of the cipher.

The significance of cryptanalysing LowMC was elevated by its inclusion into the NIST PQC digital signature scheme PICNIC in which a successful key recovery using a single plaintext/ciphertext pair is akin to retrieving the secret signing key. The current state-of-the-art attack in this setting is due to Dinur at Eurocrypt 2021, in which a novel way of enumerating roots of a Boolean system of equation is morphed into a key-recovery procedure that undercuts an ordinary exhaustive search in terms of time complexity for the variants of the cipher up to five rounds.

In this work, we demonstrate that this technique can efficiently be enriched with a specific linearization strategy that reduces the algebraic degree of the non-linear layer as put forward by Banik et al. at IACR ToSC 2020(4). This amalgamation yields new attacks on certain instances of LowMC up to seven rounds.

A provably masked implementation of BIKE Key Encapsulation Mechanism

Loïc Demange, Mélissa Rossi

Published 2024-04-09 Show abstract

PDF

BIKE is a post-quantum key encapsulation mechanism (KEM) selected for the 4th round of the NIST's standardization campaign. It relies on the hardness of the syndrome decoding problem for quasi-cyclic codes and on the indistinguishability of the public key from a random element, and provides the most competitive performance among round 4 candidates, which makes it relevant for future real-world use cases. Analyzing its side-channel resistance has been highly encouraged by the community and several works have already outlined various side-channel weaknesses and proposed ad-hoc countermeasures. However, in contrast to the well-documented research line on masking lattice-based algorithms, the possibility of generically protecting code-based algorithms by masking has only been marginally studied in a 2016 paper by Chen et al. in SAC 2015. At this stage of the standardization campaign, it is important to assess the possibility of fully masking BIKE scheme and the resulting cost in terms of performances.

In this work, we provide the first high-order masked implementation of a code-based algorithm. We had to tackle many issues such as finding proper ways to handle large sparse polynomials, masking the key-generation algorithm or keeping the benefit of the bitslicing. In this paper, we present all the gadgets necessary to provide a fully masked implementation of BIKE, we discuss our different implementation choices and we propose a full proof of masking in the Ishai Sahai and Wagner (Crypto 2003) model.

More practically, we also provide an open C-code masked implementation of the key-generation, encapsulation and decapsulation algorithms with extensive benchmarks. While the obtained performance is slower than existing masked lattice-based algorithms, we show that masking at order 1, 2, 3, 4 and 5 implies a performance penalty of x5.8, x14.2, x24.4, x38 and x55.6 compared to order 0 (unmasked and unoptimized BIKE). This scaling is encouraging and no Boolean to Arithmetic conversion has been used.