Communications in Cryptology IACR CiC


Dates are inconsistent
13 results sorted by publication date
Editors in chief
Call for papers: IACR Communications in Cryptology Submit a paper Communications in Cryptology is a journal for original research papers which welcomes submissions on any topic in cryptology. This covers all research topics in cryptography and cryptanalysis, including but not limited to foundational theory and mathematics the design, proposal, and analysis of cryptographic primitives a...
Sebastian Kolby, Divya Ravi, Sophia Yakoubov
Published 2024-10-07 PDFPDF

YOSO MPC (Gentry et al., Crypto 2021) is a new MPC framework where each participant can speak at most once. This models an adaptive adversary’s ability to watch the network and corrupt or destroy parties it deems significant based on their communication. By using private channels to anonymous receivers (e.g. by encrypting to a public key whose owner is unknown), the communication complexity of YOSO MPC can scale sublinearly with the total number N of available parties, even when the adversary’s corruption threshold is linear in N (e.g. just under N/2). It was previously an open problem whether YOSO MPC can achieve guaranteed output delivery in a constant number of rounds without relying on trusted setup. In this work, we show that this can indeed be accomplished. We demonstrate three different approaches: the first two (which we call YaOSO and YOSO-GLS) use two and three rounds of communication, respectively. Our third approach (which we call YOSO-LHSS) uses O(d) rounds, where d is the multiplicative depth of the circuit being evaluated; however, it can be used to bootstrap any constant-round YOSO protocol that requires setup, by generating that setup within YOSO-LHSS. Though YOSO-LHSS requires more rounds than our first two approaches, it may be more practical, since the zero knowledge proofs it employs are more efficient to instantiate. As a contribution of independent interest, we introduce a verifiable state propagation UC functionality, which allows parties to send private message which are verifiably derived in the “correct” way (according to the protocol in question) to anonymous receivers. This is a natural functionality to build YOSO protocols on top of.

Qian Guo, Erik Mårtensson, Adrian Åström
Published 2024-10-07 PDFPDF

The Module Learning With Errors (MLWE)-based Key Encapsulation Mechanism (KEM) Kyber is NIST's new standard scheme for post-quantum encryption. As a building block, Kyber uses a Chosen Plaintext Attack (CPA)-secure Public Key Encryption (PKE) scheme, referred to as Kyber.CPAPKE. In this paper we study the robustness of Kyber.CPAPKE against key mismatch attacks.

We demonstrate that Kyber's security levels can be compromised if having access to a few mismatch queries of Kyber.CPAPKE, by striking a balance between the parallelization level and the cost of lattice reduction for post-processing. This highlights the imperative need to strictly prohibit key reuse in Kyber.CPAPKE.

We further propose an adaptive method to enhance parallel mismatch attacks, initially proposed by Shao et al. at AsiaCCS 2024, thereby significantly reducing query complexity. This method combines the adaptive attack with post-processing via lattice reduction to retrieve the final secret key entries. Our method proves its efficacy by reducing query complexity by 14.6 % for Kyber512 and 7.5 % for Kyber768/Kyber1024.

Furthermore, this approach has the potential to improve multi-value Plaintext-Checking (PC) oracle-based side-channel attacks and fault-injection attacks against Kyber itself.

Aein Rezaei Shahmirzadi, Michael Hutter
Published 2024-10-07 PDFPDF

Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating efficient Boolean masked adders. In this paper, we present first- and second-order secure hardware implementations to perform B2A mask conversion efficiently without using masked adder structures. We first introduce a first-order secure low-latency gadget that executes a B2A2k in a single cycle. Furthermore, we propose a second-order secure B2A2k gadget that has a latency of only 4 clock cycles. Both gadgets are independent of the input word size k. We then show how these new primitives lead to improved B2Aq hardware implementations that perform a B2A mask conversion of integers modulo an arbitrary number. Our results show that our new gadgets outperform comparable solutions by more than a magnitude in terms of resource requirements and are at least 3 times faster in terms of latency and throughput. All gadgets have been formally verified and proven secure in the glitch-robust PINI security model. We additionally confirm the security of our gadgets on an FPGA platform using practical TVLA tests.

Nima Mahdion, Elisabeth Oswald
Published 2024-10-07 PDFPDF

Software implementations of cryptographic algorithms often use masking schemes as a countermeasure against side channel attacks. A number of recent results show clearly the challenge of implementing masking schemes in such a way, that (unforeseen) micro-architectural effects do not cause masking flaws that undermine the intended security goal of an implementation. So far, utilising a higher-order version of the non-specific (fixed-vs-random) input test of the Test Vector Leakage Assessment (TVLA) framework has been the best option to identify such flaws. The drawbacks of this method are both its significant computation cost, as well as its inability to pinpoint which interaction of masking shares leads to the flaw. In this paper we propose a novel version, the fixed-vs-random shares test, to tackle both drawbacks. We explain our method and show its application to three case studies, where each time it outperforms its conventional TVLA counterpart. The drawback of our method is that it requires control over the shares, which, we argue, is practically feasible in the context of in-house evaluation and testing for software implementations.

Lichao Wu, Sébastien Tiran, Guilherme Perin, Stjepan Picek
Published 2024-10-07 PDFPDF

Side-channel Collision Attacks (SCCA) is a classical method that exploits information dependency leaked during cryptographic operations. Unlike collision attacks that seek instances where two different inputs to a cryptographic algorithm yield identical outputs, SCCAs specifically target the internal state, where identical outputs are more likely. Although SCCA does not rely on the pre-assumption of the leakage model, it explicitly operates on precise trace segments reflecting the target operation, which is challenging to perform when the leakage measurements are noisy. Besides, its attack performance may vary dramatically, as it relies on selecting a reference byte (and its corresponding leakages) to “collide” other bytes. A poor selection would lead to many bytes unrecoverable. These two facts make its real-world application problematic.

This paper addresses these challenges by introducing a novel plaintext-based SCCA. We leverage the bijective relationship between plaintext and secret data, using plaintext as labels to train profiling models to depict leakages from varying operations. By comparing the leakage representations produced by the profiling model instead of the leakage segmentation itself, all secret key differences can be revealed simultaneously without processing leakage traces. Furthermore, we propose a novel error correction scheme to rectify false predictions further. Experimental results show that our approach significantly surpasses the state-of-the-art SCCA in both attack performance and computational complexity (e.g., training time reduced from approximately three hours to five minutes). These findings underscore our method's effectiveness and practicality in real-world attack scenarios.

Ruize Wang, Kalle Ngo, Joel Gärtner, Elena Dubrova
Published 2024-10-07 PDFPDF

Most of the previous attacks on Dilithium exploit side-channel information which is leaked during the computation of the polynomial multiplication cs1, where s1 is a small-norm secret and c is a verifier's challenge. In this paper, we present a new attack utilizing leakage during secret key unpacking in the signing algorithm. The unpacking is also used in other post-quantum cryptographic algorithms, including Kyber, because inputs and outputs of their API functions are byte arrays. Exploiting leakage during unpacking is more challenging than exploiting leakage during the computation of cs1 since c varies for each signing, while the unpacked secret key remains constant. Therefore, post-processing is required in the latter case to recover a full secret key. We present two variants of post-processing. In the first one, a half of the coefficients of the secret s1 and the error s2 is recovered by profiled deep learning-assisted power analysis and the rest is derived by solving linear equations based on t = As1 + s2, where A and t are parts of the public key. This case assumes knowledge of the least significant bits of t, t0. The second variant uses lattice reduction to derive s1 without the knowledge of t0. However, it needs a larger portion of s1 to be recovered by power analysis. We evaluate both variants on an ARM Cortex-M4 implementation of Dilithium-2. The experiments show that the attack assuming the knowledge of t0 can recover s1 from a single trace captured from a different from profiling device with a non-negligible probability.

Soichiro Kobayashi, Rei Ueno, Yosuke Todo, Naofumi Homma
Published 2024-10-07 PDFPDF

This paper presents a new side-channel attack (SCA) on unrolled implementations of stream ciphers, with a particular focus on Trivium. Most conventional SCAs predominantly concentrate on leakage of some first rounds prior to the sufficient diffusion of the secret key and initial vector (IV). However, recently, unrolled hardware implementation has become common and practical, which achieves higher throughput and energy efficiency compared to a round-based hardware. The applicability of conventional SCAs to such unrolled hardware is unclear because the leakage of the first rounds from unrolled hardware is hardly observed. In this paper, focusing on Trivium, we propose a novel SCA on unrolled stream cipher hardware, which can exploit leakage of rounds latter than 80, while existing SCAs exploited intermediate values earlier than 80 rounds. We first analyze the algebraic equations representing the intermediate values of these rounds and present the recursive restricted linear decomposition (RRLD) strategy. This approach uses correlation power analysis (CPA) to estimate the intermediate values of latter rounds. Furthermore, we present a chosen-IV strategy for a successful key recovery through linearization. We experimentally demonstrate that the proposed SCA achieves the key recovery of a 288-round unrolled Trivium hardware implementation using 360,000 traces. Finally, we evaluate the performance of unrolled Trivium hardware implementations to clarify the trade-off between performance and SCA (in)security. The proposed SCA requires 34.5 M traces for a key recovery of 384-round unrolled Trivium implementation and is not applicable to 576-round unrolled hardware.

Dinal Kamel, François-Xavier Standaert, Olivier Bronchain
Published 2024-10-07 PDFPDF

Raccoon is a lattice-based scheme submitted to the NIST 2022 call for additional post-quantum signatures. One of its main selling points is that its design is intrinsically easy to mask against side-channel attacks. So far, Raccoon's physical security guarantees were only stated in the abstract probing model. In this paper, we discuss how these probing security results translate into guarantees in more realistic leakage models. We also highlight that this translation differs from what is usually observed (e.g., in symmetric cryptography), due to the algebraic structure of Raccoon's operations. For this purpose, we perform an in-depth information theoretic evaluation of Raccoon's most innovative part, namely the AddRepNoise function which allows generating its arithmetic shares on-the-fly. Our results are twofold. First, we show that the resulting shares do not enforce a statistical security order (i.e., the need for the side-channel adversary to estimate higher-order moments of the leakage distribution), as usually expected when masking. Second, we observe that the first-order leakage on the (large) random coefficients manipulated by Raccoon cannot be efficiently turned into leakage on the (smaller) coefficients of its long-term secret. Concretely, our information theoretic evaluations for relevant leakage functions also suggest that Raccoon's masked implementations can ensure high security with less shares than suggested by a conservative analysis in the probing model.

Lichao Wu, Azade Rezaeezade, Amir Ali-pour, Guilherme Perin, Stjepan Picek
Published 2024-10-07 PDFPDF

Profiling side-channel analysis has gained widespread acceptance in both academic and industrial realms due to its robust capacity to unveil protected secrets, even in the presence of countermeasures. To harness this capability, an adversary must access a clone of the target device to acquire profiling measurements, labeling them with leakage models. The challenge of finding an effective leakage model, especially for a protected dataset with a low signal-to-noise ratio or weak correlation between actual leakages and labels, often necessitates an intuitive engineering approach, as otherwise, the attack will not perform well.

In this paper, we introduce a deep learning approach with a flexible leakage model, referred to as the multi-bit model. Instead of trying to learn a pre-determined representation of the target intermediate data, we utilize the concept of the stochastic model to decompose the label into bits. Then, the deep learning model is used to classify each bit independently. This versatile multi-bit model can adjust to existing leakage models like the Hamming weight and Most Significant Bit while also possessing the flexibility to adapt to complex leakage scenarios. To further improve the attack efficiency, we extend the multi-bit model to profile all 16 subkey bytes simultaneously, which requires negligible computational effort. The experimental results show that the proposed methods can efficiently break all key bytes across four considered datasets while the conventional leakage models fail. Our work signifies a significant step forward in deep learning-based side-channel attacks, showcasing a high degree of flexibility and efficiency with the proposed leakage model.

Gaëtan Cassiers, Loïc Masure, Charles Momin, Thorben Moos, Amir Moradi, François-Xavier Standaert
Published 2024-07-08 PDFPDF

Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed. The common denominator among all of them is the implicit demand for robust and high entropy randomness. Simply assuming that uniformly distributed random bits are available, without taking the cost of their generation into account, leads to a poor understanding of the efficiency vs. security tradeoff of masked implementations. This is especially relevant in case of hardware masking schemes which are known to consume large amounts of random bits per cycle due to parallelism. Currently, there seems to be no consensus on how to most efficiently derive many pseudo-random bits per clock cycle from an initial seed and with properties suitable for masked hardware implementations. In this work, we evaluate a number of building blocks for this purpose and find that hardware-oriented stream ciphers like Trivium and its reduced-security variant Bivium B outperform most competitors when implemented in an unrolled fashion. Unrolled implementations of these primitives enable the flexible generation of many bits per cycle, which is crucial for satisfying the large randomness demands of state-of-the-art masking schemes. According to our analysis, only Linear Feedback Shift Registers (LFSRs), when also unrolled, are capable of producing long non-repetitive sequences of random-looking bits at a higher rate per cycle for the same or lower cost as Trivium and Bivium B. Yet, these instances do not provide black-box security as they generate only linear outputs. We experimentally demonstrate that using multiple output bits from an LFSR in the same masked implementation can violate probing security and even lead to harmful randomness cancellations. Circumventing these problems, and enabling an independent analysis of randomness generation and masking, requires the use of cryptographically stronger primitives like stream ciphers. As a result of our studies, we provide an evidence-based estimate for the cost of securely generating $n$ fresh random bits per cycle. Depending on the desired level of black-box security and operating frequency, this cost can be as low as $20n$ to $30n$ ASIC gate equivalents (GE) or $3n$ to $4n$ FPGA look-up tables (LUTs), where $n$ is the number of random bits required. Our results demonstrate that the cost per bit is (sometimes significantly) lower than estimated in previous works, incentivizing parallelism whenever exploitable. This provides further motivation to potentially move low randomness usage from a primary to a secondary design goal in hardware masking research.

Gabrielle De Micheli, Nadia Heninger
Published 2024-04-09 PDFPDF

Side-channel attacks targeting cryptography may leak only partial or indirect information about the secret keys. There are a variety of techniques in the literature for recovering secret keys from partial information. In this work, we survey several of the main families of partial key recovery algorithms for RSA, (EC)DSA, and (elliptic curve) Diffie-Hellman, the classical public-key cryptosystems in common use today. We categorize the known techniques by the structure of the information that is learned by the attacker, and give simplified examples for each technique to illustrate the underlying ideas.

Loïc Demange, Mélissa Rossi
Published 2024-04-09 PDFPDF

BIKE is a post-quantum key encapsulation mechanism (KEM) selected for the 4th round of the NIST's standardization campaign. It relies on the hardness of the syndrome decoding problem for quasi-cyclic codes and on the indistinguishability of the public key from a random element, and provides the most competitive performance among round 4 candidates, which makes it relevant for future real-world use cases. Analyzing its side-channel resistance has been highly encouraged by the community and several works have already outlined various side-channel weaknesses and proposed ad-hoc countermeasures. However, in contrast to the well-documented research line on masking lattice-based algorithms, the possibility of generically protecting code-based algorithms by masking has only been marginally studied in a 2016 paper by Chen et al. in SAC 2015. At this stage of the standardization campaign, it is important to assess the possibility of fully masking BIKE scheme and the resulting cost in terms of performances.

In this work, we provide the first high-order masked implementation of a code-based algorithm. We had to tackle many issues such as finding proper ways to handle large sparse polynomials, masking the key-generation algorithm or keeping the benefit of the bitslicing. In this paper, we present all the gadgets necessary to provide a fully masked implementation of BIKE, we discuss our different implementation choices and we propose a full proof of masking in the Ishai Sahai and Wagner (Crypto 2003) model.

More practically, we also provide an open C-code masked implementation of the key-generation, encapsulation and decapsulation algorithms with extensive benchmarks. While the obtained performance is slower than existing masked lattice-based algorithms, we show that masking at order 1, 2, 3, 4 and 5 implies a performance penalty of x5.8, x14.2, x24.4, x38 and x55.6 compared to order 0 (unmasked and unoptimized BIKE). This scaling is encouraging and no Boolean to Arithmetic conversion has been used.