5 results sorted by publication date
The Module Learning With Errors (MLWE)-based Key Encapsulation Mechanism (KEM) Kyber is NIST's new standard scheme for post-quantum encryption. As a building block, Kyber uses a Chosen Plaintext Attack (CPA)-secure Public Key Encryption (PKE) scheme, referred to as Kyber.CPAPKE. In this paper we study the robustness of Kyber.CPAPKE against key mismatch attacks.
We demonstrate that Kyber's security levels can be compromised if having access to a few mismatch queries of Kyber.CPAPKE, by striking a balance between the parallelization level and the cost of lattice reduction for post-processing. This highlights the imperative need to strictly prohibit key reuse in Kyber.CPAPKE.
We further propose an adaptive method to enhance parallel mismatch attacks, initially proposed by Shao et al. at AsiaCCS 2024, thereby significantly reducing query complexity. This method combines the adaptive attack with post-processing via lattice reduction to retrieve the final secret key entries. Our method proves its efficacy by reducing query complexity by 14.6 % for Kyber512 and 7.5 % for Kyber768/Kyber1024.
Furthermore, this approach has the potential to improve multi-value Plaintext-Checking (PC) oracle-based side-channel attacks and fault-injection attacks against Kyber itself.
Most of the previous attacks on Dilithium exploit side-channel information which is leaked during the computation of the polynomial multiplication cs1, where s1 is a small-norm secret and c is a verifier's challenge. In this paper, we present a new attack utilizing leakage during secret key unpacking in the signing algorithm. The unpacking is also used in other post-quantum cryptographic algorithms, including Kyber, because inputs and outputs of their API functions are byte arrays. Exploiting leakage during unpacking is more challenging than exploiting leakage during the computation of cs1 since c varies for each signing, while the unpacked secret key remains constant. Therefore, post-processing is required in the latter case to recover a full secret key. We present two variants of post-processing. In the first one, a half of the coefficients of the secret s1 and the error s2 is recovered by profiled deep learning-assisted power analysis and the rest is derived by solving linear equations based on t = As1 + s2, where A and t are parts of the public key. This case assumes knowledge of the least significant bits of t, t0. The second variant uses lattice reduction to derive s1 without the knowledge of t0. However, it needs a larger portion of s1 to be recovered by power analysis. We evaluate both variants on an ARM Cortex-M4 implementation of Dilithium-2. The experiments show that the attack assuming the knowledge of t0 can recover s1 from a single trace captured from a different from profiling device with a non-negligible probability.
The security of lattice-based crytography (LWE, NTRU, and FHE) depends on the hardness of the shortest-vector problem (SVP). Sieving algorithms give the lowest asymptotic runtime to solve SVP, but depend on exponential memory. Memory access costs much more in reality than in the RAM model, so we consider a computational model where processors, memory, and meters of wire are in constant proportions to each other. While this adds substantial costs to route data during lattice sieving, we modify existing algorithms to amortize these costs and find that, asymptotically, a classical computer can achieve the previous RAM model cost of $2^{0.2925d+o(d)}$ to sieve a $d$-dimensional lattice for a computer existing in 3 or more spatial dimensions, and can reach $2^{0.3113d+o(d)}$ in 2 spatial dimensions, where “spatial dimensions” are the dimensions of the physical geometry in which the computer exists.
Since this result implies an increased cost in 2 spatial dimensions, we make several assumptions about the constant terms of memory access and lattice attacks so that we can give bit security estimates for Kyber and Dilithium. These estimates support but do not increase the security categories claimed in the Kyber and Dilithium specifications, at least with respect to known attacks.
We survey various mathematical tools used in software works multiplying polynomials in \[ \frac{\mathbb{Z}_q[x]}{\left\langle {x^n - \alpha x - \beta} \right\rangle}. \] In particular, we survey implementation works targeting polynomial multiplications in lattice-based cryptosystems Dilithium, Kyber, NTRU, NTRU Prime, and Saber with instruction set architectures/extensions Armv7-M, Armv7E-M, Armv8-A, and AVX2.
There are three emphases in this paper: (i) modular arithmetic, (ii) homomorphisms, and (iii) vectorization. For modular arithmetic, we survey Montgomery, Barrett, and Plantard multiplications. For homomorphisms, we survey (a) various homomorphisms such as Cooley–Tukey FFT, Good–Thomas FFT, Bruun's FFT, Rader's FFT, Karatsuba, and Toom–Cook; (b) various algebraic techniques for adjoining nice properties to the coefficient rings, including localization, Schönhage's FFT, Nussbaumer's FFT, and coefficient ring switching; and (c) various algebraic techniques related to the polynomial moduli, including twisting, composed multiplication, evaluation at $\infty$, truncation, incomplete transformation, striding, and Toeplitz matrix-vector product. For vectorization, we survey the relations between homomorphisms and vector arithmetic.
We then go through several case studies: We compare the implementations of modular multiplications used in Dilithium and Kyber, explain how the matrix-to-vector structure was exploited in Saber, and review the design choices of transformations for NTRU and NTRU Prime with vectorization. Finally, we outline several interesting implementation projects.
We introduce InspectorGadget, an Open-Source Python-based software for assessing and comparing the complexity of masking gadgets. By providing a limited set of characteristics of a hardware platform, our tool allows to estimate the cost of a masking gadget in terms of cycle count equivalent and memory footprint. InspectorGadget is highly flexible. It enables the user to define her own estimation functions, as well as to expand the set of gadgets and predefined microcontrollers. As a case-study, we produce a fair comparison of several masked versions of Kyber compression function from the literature, together with novel alternatives automatically generated by our tool. Our results confirm that an interesting middle ground exists between theoretical performance measures (asymptotic complexity or operations count) and real implementations benchmarks (clock cycle accurate evaluations). InspectorGadget offers both simplicity and genericity while capturing the main performance-related parameters of a hardware platform.