Communications in Cryptology IACR CiC


Dates are inconsistent
35 results sorted by publication date
Editors in chief
Call for papers: IACR Communications in Cryptology Submit a paper Communications in Cryptology is a journal for original research papers which welcomes submissions on any topic in cryptology. This covers all research topics in cryptography and cryptanalysis, including but not limited to foundational theory and mathematics the design, proposal, and analysis of cryptographic primitives a...
Editors in chief
Policy on publication ethics Communications in Cryptology (CiC) is committed to ensuring ethics and quality in research. We therefore expect everyone involved in the journal to follow our principles (see below) and ethics. See the related IACR docs here and here. Duties for Authors Confidentiality You may not ask Editorial Board members for information about your submission befor...
Editors in chief
Frequently asked questions The International Association for Cryptologic Research (IACR) Communications in Cryptology (CiC) was approved by the Membership in the IACR 2022 election and targets publications that advance the field, but with a broader range of contributions than the ones accepted by the IACR flagship or area conferences. What are the main principles of CiC? Low-cost open ...
Carsten Baum, Jens Berlips, Walther Chen, Ivan B. Damgård, Kevin M. Esvelt, Leonard Foner, Dana Gretton, Martin Kysel, Ronald L. Rivest, Lawrence Roy, Francesca Sage-Ling, Adi Shamir, Vinod Vaikuntanathan, Lynn Van Hauwe, Theia Vogel, Benjamin Weinstein-Raun, Daniel Wichs, Stephen Wooster, Andrew C. Yao, Yu Yu
Published 2024-10-07 PDFPDF

Oblivious Pseudorandom Functions (OPRFs) allow a client to evaluate a pseudorandom function (PRF) on her secret input based on a key that is held by a server. In the process, the client only learns the PRF output but not the key, while the server neither learns the input nor the output of the client. The arguably most popular OPRF is due to Naor, Pinkas and Reingold (Eurocrypt 2009). It is based on an Oblivious Exponentiation by the server, with passive security under the Decisional Diffie-Hellman assumption. In this work, we strengthen the security guarantees of the NPR OPRF by protecting it against active attacks of the server. We have implemented our solution and report on the performance. Our main result is a new batch OPRF protocol which is secure against maliciously corrupted servers, but is essentially as efficient as the semi-honest solution. More precisely, the computation (and communication) overhead is a multiplicative factor $o(1)$ as the batch size increases. The obvious solution using zero-knowledge proofs would have a constant factor overhead at best, which can be too expensive for certain deployments. Our protocol relies on a novel version of the DDH problem, which we call the Oblivious Exponentiation Problem (OEP), and we give evidence for its hardness in the Generic Group model. We also present a variant of our maliciously secure protocol that does not rely on the OEP but nevertheless only has overhead $o(1)$ over the known semi-honest protocol. Moreover, we show that our techniques can also be used to efficiently protect threshold blind BLS signing and threshold ElGamal decryption against malicious attackers.

Aein Rezaei Shahmirzadi, Michael Hutter
Published 2024-10-07 PDFPDF

Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating efficient Boolean masked adders. In this paper, we present first- and second-order secure hardware implementations to perform B2A mask conversion efficiently without using masked adder structures. We first introduce a first-order secure low-latency gadget that executes a B2A2k in a single cycle. Furthermore, we propose a second-order secure B2A2k gadget that has a latency of only 4 clock cycles. Both gadgets are independent of the input word size k. We then show how these new primitives lead to improved B2Aq hardware implementations that perform a B2A mask conversion of integers modulo an arbitrary number. Our results show that our new gadgets outperform comparable solutions by more than a magnitude in terms of resource requirements and are at least 3 times faster in terms of latency and throughput. All gadgets have been formally verified and proven secure in the glitch-robust PINI security model. We additionally confirm the security of our gadgets on an FPGA platform using practical TVLA tests.

Rebecca Hay, Elisabeth Oswald
Published 2024-10-07 PDFPDF

New proposals for scalable key rank estimation methods have appeared recently, in particular the sampling based approach MCRank. The idea is that one can consistently estimate the key rank by sampling only a small portion of the key space as a “proxy”, leading to both an accurate and scalable approach, at least in comparison with another approach based on histograms. We show that the (earlier) GEEA algorithm is in fact a sampling based algorithm, and provide an in-depth comparison between GEEA (when adapted to produce rank estimates rather than guessing entropy estimates), GM bounds, MCRank and the currently most performant counting based rank estimation as implemented in the Labynkyr library. We find that although MCRank does live up to the promised accuracy and scalability for probability-based distinguishers, it fails to handle cases with unusual distinguisher distributions.

Furthermore, we put forward a novel proposal for a highly scalable key rank estimation method by introducing the notion of an “attacker budget”. Our proposal is based on the idea that, in particular for very long keys, the exact key rank is less important than the knowledge whether a key is within a certain bound. Thus our “budget approach” is based on efficiently checking if the result of an attack is such that the attacker's budget suffices for successful enumeration. Our budget approach scales linearly with the key size and thus enables security estimations even for post-quantum key lengths.

Nima Mahdion, Elisabeth Oswald
Published 2024-10-07 PDFPDF

Software implementations of cryptographic algorithms often use masking schemes as a countermeasure against side channel attacks. A number of recent results show clearly the challenge of implementing masking schemes in such a way, that (unforeseen) micro-architectural effects do not cause masking flaws that undermine the intended security goal of an implementation. So far, utilising a higher-order version of the non-specific (fixed-vs-random) input test of the Test Vector Leakage Assessment (TVLA) framework has been the best option to identify such flaws. The drawbacks of this method are both its significant computation cost, as well as its inability to pinpoint which interaction of masking shares leads to the flaw. In this paper we propose a novel version, the fixed-vs-random shares test, to tackle both drawbacks. We explain our method and show its application to three case studies, where each time it outperforms its conventional TVLA counterpart. The drawback of our method is that it requires control over the shares, which, we argue, is practically feasible in the context of in-house evaluation and testing for software implementations.

Ruize Wang, Kalle Ngo, Joel Gärtner, Elena Dubrova
Published 2024-10-07 PDFPDF

Most of the previous attacks on Dilithium exploit side-channel information which is leaked during the computation of the polynomial multiplication cs1, where s1 is a small-norm secret and c is a verifier's challenge. In this paper, we present a new attack utilizing leakage during secret key unpacking in the signing algorithm. The unpacking is also used in other post-quantum cryptographic algorithms, including Kyber, because inputs and outputs of their API functions are byte arrays. Exploiting leakage during unpacking is more challenging than exploiting leakage during the computation of cs1 since c varies for each signing, while the unpacked secret key remains constant. Therefore, post-processing is required in the latter case to recover a full secret key. We present two variants of post-processing. In the first one, a half of the coefficients of the secret s1 and the error s2 is recovered by profiled deep learning-assisted power analysis and the rest is derived by solving linear equations based on t = As1 + s2, where A and t are parts of the public key. This case assumes knowledge of the least significant bits of t, t0. The second variant uses lattice reduction to derive s1 without the knowledge of t0. However, it needs a larger portion of s1 to be recovered by power analysis. We evaluate both variants on an ARM Cortex-M4 implementation of Dilithium-2. The experiments show that the attack assuming the knowledge of t0 can recover s1 from a single trace captured from a different from profiling device with a non-negligible probability.

Ida Bruhns, Sebastian Berndt, Jonas Sander, Thomas Eisenbarth
Published 2024-10-07 PDFPDF

Machine learning applications gain more and more access to highly sensitive information while simultaneously requiring more and more computation resources. Hence, the need for outsourcing these computational expensive tasks while still ensuring security and confidentiality of the data is imminent. In their seminal work, Tramer and Boneh presented the Slalom protocol for privacy-preserving inference by splitting the computation into a data-independent preprocessing phase and a very efficient online phase. In this work, we present a new method to significantly speed up the preprocessing phase by introducing the Carnival protocol. Carnival leverages the pseudo-randomness of the Subset sum problem to also enable efficient outsourcing during the preprocessing phase. In addition to a security proof we also include an empirical study analyzing the landscape of the uniformity of the output of the Subset sum function for smaller parameters. Our findings show that Carnival is a great candidate for real-world implementations.

Ward Beullens, Pierre Briaud, Morten Øygarden
Published 2024-10-07 PDFPDF

Restricted syndrome decoding problems (R-SDP and R-SDP($G$)) provide an interesting basis for post-quantum cryptography. Indeed, they feature in CROSS, a submission in the ongoing process for standardizing post-quantum signatures.

This work improves our understanding of the security of both problems. Firstly, we propose and implement a novel collision attack on R-SDP($G$) that provides the best attack under realistic restrictions on memory. Secondly, we derive precise complexity estimates for algebraic attacks on R-SDP that are shown to be accurate by our experiments. We note that neither of these improvements threatens the updated parameters of CROSS.

Soichiro Kobayashi, Rei Ueno, Yosuke Todo, Naofumi Homma
Published 2024-10-07 PDFPDF

This paper presents a new side-channel attack (SCA) on unrolled implementations of stream ciphers, with a particular focus on Trivium. Most conventional SCAs predominantly concentrate on leakage of some first rounds prior to the sufficient diffusion of the secret key and initial vector (IV). However, recently, unrolled hardware implementation has become common and practical, which achieves higher throughput and energy efficiency compared to a round-based hardware. The applicability of conventional SCAs to such unrolled hardware is unclear because the leakage of the first rounds from unrolled hardware is hardly observed. In this paper, focusing on Trivium, we propose a novel SCA on unrolled stream cipher hardware, which can exploit leakage of rounds latter than 80, while existing SCAs exploited intermediate values earlier than 80 rounds. We first analyze the algebraic equations representing the intermediate values of these rounds and present the recursive restricted linear decomposition (RRLD) strategy. This approach uses correlation power analysis (CPA) to estimate the intermediate values of latter rounds. Furthermore, we present a chosen-IV strategy for a successful key recovery through linearization. We experimentally demonstrate that the proposed SCA achieves the key recovery of a 288-round unrolled Trivium hardware implementation using 360,000 traces. Finally, we evaluate the performance of unrolled Trivium hardware implementations to clarify the trade-off between performance and SCA (in)security. The proposed SCA requires 34.5 M traces for a key recovery of 384-round unrolled Trivium implementation and is not applicable to 576-round unrolled hardware.

Damien Vidal, Claire Delaplace, Sorina Ionica
Published 2024-10-07 PDFPDF

The Crossbred algorithm is currently the state-of-the-art method for solving overdetermined multivariate polynomial systems over $\mathbb{F}_2$. Since its publication in 2017, several record breaking implementations have been proposed and demonstrate the power of this hybrid approach. Despite these practical results, the complexity of this algorithm and the choice of optimal parameters for it are difficult open questions. In this paper, we prove a bivariate generating series for potentially admissible parameters of the Crossbred algorithm.

Laurent-Stéphane Didier, Nadia El Mrabet, Léa Glandus, Jean-Marc Robert
Published 2024-10-07 PDFPDF

This paper presents software implementations of batch computations, dealing with multi-precision integer operations. In this work, we use the Single Instruction Multiple Data (SIMD) AVX512 instruction set of the x86-64 processors, in particular the vectorized fused multiplier-adder VPMADD52. We focus on batch multiplications, squarings, modular multiplications, modular squarings and constant time modular exponentiations of 8 values using a word-slicing storage. We explore the use of Schoolbook and Karatsuba approaches with operands up to 4108 and 4154 bits respectively. We also introduce a truncated multiplication that speeds up the computation of the Montgomery modular reduction in the context of software implementation. Our Truncated Montgomery modular multiplication improvement offers speed gains of almost 20 % over the conventional non-truncated versions. Compared to the state-of-the-art GMP and OpenSSL libraries, our speedup modular operations are more than 4 times faster. Compared to OpenSSL BN_mod_exp_mont_consttimex2 using AVX512 and madd52* (madd52hi or madd52lo) in 256-bit registers, in fixed-window exponentiations of sizes $1024$ and $2048$, our 512-bit implementation provides speedups of respectively 1.75 and 1.38, while the 256-bit version speedups are 1.51 and 1.05 for $1024$ and $2048$-bit sizes (batch of 4 values in this case).

Dinal Kamel, François-Xavier Standaert, Olivier Bronchain
Published 2024-10-07 PDFPDF

Raccoon is a lattice-based scheme submitted to the NIST 2022 call for additional post-quantum signatures. One of its main selling points is that its design is intrinsically easy to mask against side-channel attacks. So far, Raccoon's physical security guarantees were only stated in the abstract probing model. In this paper, we discuss how these probing security results translate into guarantees in more realistic leakage models. We also highlight that this translation differs from what is usually observed (e.g., in symmetric cryptography), due to the algebraic structure of Raccoon's operations. For this purpose, we perform an in-depth information theoretic evaluation of Raccoon's most innovative part, namely the AddRepNoise function which allows generating its arithmetic shares on-the-fly. Our results are twofold. First, we show that the resulting shares do not enforce a statistical security order (i.e., the need for the side-channel adversary to estimate higher-order moments of the leakage distribution), as usually expected when masking. Second, we observe that the first-order leakage on the (large) random coefficients manipulated by Raccoon cannot be efficiently turned into leakage on the (smaller) coefficients of its long-term secret. Concretely, our information theoretic evaluations for relevant leakage functions also suggest that Raccoon's masked implementations can ensure high security with less shares than suggested by a conservative analysis in the probing model.

Robin Geelen
Published 2024-10-07 PDFPDF

Numerous applications in homomorphic encryption require an operation that moves the slots of a ciphertext to the coefficients of a different ciphertext. For the BGV and BFV schemes, the only efficient algorithms to implement this slot-to-coefficient transformation were proposed in the setting of non-power-of-two cyclotomic rings. In this paper, we devise an FFT-like method to decompose the slot-to-coefficient transformation (and its inverse) for power-of-two cyclotomic rings. The proposed method can handle both fully and sparsely packed slots. Our algorithm brings down the computational complexity of the slot-to-coefficient transformation from a linear to a logarithmic number of FHE operations, which is shown via a detailed complexity analysis.

The new procedures are implemented in Microsoft SEAL for BFV. The experiments report a speedup of up to 44 times when packing 2^12 elements from GF(8191^8). We also study a fully packed bootstrapping operation that refreshes 2^15 elements from GF(65537) and obtain an amortized speedup of 12 times.

Diego F. Aranha, Georgios Fotiadis, Aurore Guillevic
Published 2024-10-07 PDFPDF

For more than two decades, pairings have been a fundamental tool for designing elegant cryptosystems, varying from digital signature schemes to more complex privacy-preserving constructions. However, the advancement of quantum computing threatens to undermine public-key cryptography. Concretely, it is widely accepted that a future large-scale quantum computer would be capable to break any public-key cryptosystem used today, rendering today's public-key cryptography obsolete and mandating the transition to quantum-safe cryptographic solutions. This necessity is enforced by numerous recognized government bodies around the world, including NIST which initiated the first open competition in standardizing post-quantum (PQ) cryptographic schemes, focusing primarily on digital signatures and key encapsulation/public-key encryption schemes. Despite the current efforts in standardizing PQ primitives, the landscape of complex, privacy-preserving cryptographic protocols, e.g., zkSNARKs/zkSTARKs, is at an early stage. Existing solutions suffer from various disadvantages in terms of efficiency and compactness and in addition, they need to undergo the required scrutiny to gain the necessary trust in the academic and industrial domains. Therefore, it is believed that the migration to purely quantum-safe cryptography would require an intermediate step where current classically secure protocols and quantum-safe solutions will co-exist. This is enforced by the report of the Commercial National Security Algorithm Suite version 2.0, mandating transition to quantum-safe cryptographic algorithms by 2033 and suggesting to incorporate ECC at 192-bit security in the meantime. To this end, the present paper aims at providing a comprehensive study on pairings at 192-bit security level. We start with an exhaustive review in the literature to search for all possible recommendations of such pairing constructions, from which we extract the most promising candidates in terms of efficiency and security, with respect to the advanced Special TNFS attacks. Our analysis is focused, not only on the pairing computation itself, but on additional operations that are relevant in pairing-based applications, such as hashing to pairing groups, cofactor clearing and subgroup membership testing. We implement all functionalities of the most promising candidates within the RELIC cryptographic toolkit in order to identify the most efficient pairing implementation at 192-bit security and provide extensive experimental results.

Gaëtan Cassiers, Loïc Masure, Charles Momin, Thorben Moos, Amir Moradi, François-Xavier Standaert
Published 2024-07-08 PDFPDF

Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed. The common denominator among all of them is the implicit demand for robust and high entropy randomness. Simply assuming that uniformly distributed random bits are available, without taking the cost of their generation into account, leads to a poor understanding of the efficiency vs. security tradeoff of masked implementations. This is especially relevant in case of hardware masking schemes which are known to consume large amounts of random bits per cycle due to parallelism. Currently, there seems to be no consensus on how to most efficiently derive many pseudo-random bits per clock cycle from an initial seed and with properties suitable for masked hardware implementations. In this work, we evaluate a number of building blocks for this purpose and find that hardware-oriented stream ciphers like Trivium and its reduced-security variant Bivium B outperform most competitors when implemented in an unrolled fashion. Unrolled implementations of these primitives enable the flexible generation of many bits per cycle, which is crucial for satisfying the large randomness demands of state-of-the-art masking schemes. According to our analysis, only Linear Feedback Shift Registers (LFSRs), when also unrolled, are capable of producing long non-repetitive sequences of random-looking bits at a higher rate per cycle for the same or lower cost as Trivium and Bivium B. Yet, these instances do not provide black-box security as they generate only linear outputs. We experimentally demonstrate that using multiple output bits from an LFSR in the same masked implementation can violate probing security and even lead to harmful randomness cancellations. Circumventing these problems, and enabling an independent analysis of randomness generation and masking, requires the use of cryptographically stronger primitives like stream ciphers. As a result of our studies, we provide an evidence-based estimate for the cost of securely generating $n$ fresh random bits per cycle. Depending on the desired level of black-box security and operating frequency, this cost can be as low as $20n$ to $30n$ ASIC gate equivalents (GE) or $3n$ to $4n$ FPGA look-up tables (LUTs), where $n$ is the number of random bits required. Our results demonstrate that the cost per bit is (sometimes significantly) lower than estimated in previous works, incentivizing parallelism whenever exploitable. This provides further motivation to potentially move low randomness usage from a primary to a secondary design goal in hardware masking research.

Guilhèm Assael, Philippe Elbaz-Vincent
Published 2024-07-08 PDFPDF

Several cryptographic schemes, including lattice-based cryptography and the SHA-2 family of hash functions, involve both integer arithmetic and Boolean logic. Each of these classes of operations, considered separately, can be efficiently implemented under the masking countermeasure when resistance against vertical attacks is required. However, protecting interleaved arithmetic and logic operations is much more expensive, requiring either additional masking conversions to switch between masking schemes, or implementing arithmetic functions as nonlinear operations over a Boolean masking. Both solutions can be achieved by providing masked arithmetic addition over Boolean shares, which is an operation with relatively long latency and usually high area utilization in hardware. A further complication arises when the arithmetic performed by the scheme is over a prime modulus, which is common in lattice-based cryptography. In this work, we propose a first-order masked implementation of arithmetic addition over Boolean shares occupying a very small area, while still having reasonable latency. Our proposal is specifically tuned for efficient addition and subtraction modulo an arbitrary integer, but it can also be configured at runtime for power-of-two arithmetic. To the best of our knowledge, we propose the first such construction whose security is formally proven in the glitch+transition-robust probing model.

Qinyi Li, Xavier Boyen
Published 2024-07-08 PDFPDF

Public-key searchable encryption allows keyword-associated tokens to be used to test if a ciphertext contains specific keywords. Due to the low entropies of keywords, the token holder can create ciphertexts from candidate keywords and test them using the token in hand to recover the keywords, known as inside keyword guessing attacks (IKGA). Public-key authenticated encryption with keyword search is a searchable encryption proposed to defend against such attacks. It ensures the sender's private key protects the ciphertexts from the IKGA. PAEKS schemes with reasonable security and practical efficiency remain elusive despite many proposals. This work provides a simple generic PAEKS scheme from non-interactive key exchange (NIKE) and symmetric-key equality-predicate encryption with three new constructions for the latter, respectively from pseudorandom functions (PRFs), the decision bilinear Diffie-Hellman assumption, and the learning-with-errors assumption. Instantiating our generic scheme, we derive several PAEKS schemes from the most well-known assumptions, with some of them achieving full cipher-keyword indistinguishability and full token indistinguishability in the standard model, for the first time. Our instantiated schemes allow practical implementations and outperform the existing PAEKS schemes under the same assumptions.

Jianhua Wang, Tao Huang, Shuang Wu, Zilong Liu
Published 2024-07-08 PDFPDF

In this paper, we aim to explore the design of low-latency authenticated encryption schemes particularly for memory encryption, with a focus on the temporal uniqueness property. To achieve this, we present the low-latency Pseudo-Random Function (PRF) called Twinkle with an output up to 1152 bits. Leveraging only one block of Twinkle, we developed Twinkle-AE, a specialized authenticated encryption scheme with six variants covering different cache line sizes and security requirements. We also propose Twinkle-PA, a pointer authentication algorithm, which takes a 64-bit pointer and 64-bit context as input and outputs a tag of 1 to 32 bits.

We conducted thorough security evaluations of both the PRFs and these schemes, examining their robustness against various common attacks. The results of our cryptanalysis indicate that these designs successfully achieve their targeted security objectives.

Hardware implementations using the FreePDK45nm library show that Twinkle-AE achieves an encryption and authentication latency of 3.83 ns for a cache line. In comparison, AES-CTR with WC-MAC scheme and Ascon-128a achieve latencies of 9.78 ns and 27.30 ns, respectively. Moreover, Twinkle-AE is also most area-effective for the 1024-bit cache line. For the pointer authentication scheme Twinkle-PA, the latency is 2.04 ns, while QARMA-64-sigma0 has a latency of 5.57 ns.

Vincent Hwang
Published 2024-07-08 PDFPDF

We survey various mathematical tools used in software works multiplying polynomials in \[ \frac{\mathbb{Z}_q[x]}{\left\langle {x^n - \alpha x - \beta} \right\rangle}. \] In particular, we survey implementation works targeting polynomial multiplications in lattice-based cryptosystems Dilithium, Kyber, NTRU, NTRU Prime, and Saber with instruction set architectures/extensions Armv7-M, Armv7E-M, Armv8-A, and AVX2.

There are three emphases in this paper: (i) modular arithmetic, (ii) homomorphisms, and (iii) vectorization. For modular arithmetic, we survey Montgomery, Barrett, and Plantard multiplications. For homomorphisms, we survey (a) various homomorphisms such as Cooley–Tukey FFT, Good–Thomas FFT, Bruun's FFT, Rader's FFT, Karatsuba, and Toom–Cook; (b) various algebraic techniques for adjoining nice properties to the coefficient rings, including localization, Schönhage's FFT, Nussbaumer's FFT, and coefficient ring switching; and (c) various algebraic techniques related to the polynomial moduli, including twisting, composed multiplication, evaluation at $\infty$, truncation, incomplete transformation, striding, and Toeplitz matrix-vector product. For vectorization, we survey the relations between homomorphisms and vector arithmetic.

We then go through several case studies: We compare the implementations of modular multiplications used in Dilithium and Kyber, explain how the matrix-to-vector structure was exploited in Saber, and review the design choices of transformations for NTRU and NTRU Prime with vectorization. Finally, we outline several interesting implementation projects.

Décio Luiz Gazzoni Filho, Tomás S. R. Silva, Julio López
Published 2024-07-08 PDFPDF

We present a solution to the open problem of designing a linear-time, unbiased and timing attack-resistant shuffling algorithm for fixed-weight sampling. Although it can be implemented without timing leakages of secret data in any architecture, we illustrate with ARMv7-M and ARMv8-A implementations; for the latter, we take advantage of architectural features such as NEON and conditional instructions, which are representative of features available on architectures targeting similar systems, such as Intel. Our proposed algorithm improves asymptotically upon the current approach based on constant-time sorting networks ($O(n)$ versus $O(n \log^2 n)$), and an implementation of the new algorithm applied to NTRU is also faster in practice, by a factor of up to $6.91\ (591\%)$ on ARMv8-A cores and $12.89\ (1189\%)$ on the Cortex-M4; it also requires fewer uniform random bits. This translates into performance improvements for NTRU encapsulation, compared to state-of-the-art implementations, of up to 50% on ARMv8-A cores and 72% on the Cortex-M4, and small improvements to key generation (up to 2.7% on ARMv8-A cores and 6.1% on the Cortex-M4), with negligible impact on code size and a slight improvement in RAM usage for the Cortex-M4.

Nouri Alnahawi, Johannes Müller, Jan Oupický, Alexander Wiesmaier
Published 2024-07-08 PDFPDF

Transport Layer Security (TLS) is the backbone security protocol of the Internet. As this fundamental protocol is at risk from future quantum attackers, many proposals have been made to protect TLS against this threat by implementing post-quantum cryptography (PQC). The widespread interest in post-quantum TLS has given rise to a large number of solutions over the last decade. These proposals differ in many aspects, including the security properties they seek to protect, the efficiency and trustworthiness of their post-quantum building blocks, and the application scenarios they consider, to name a few.

Based on an extensive literature review, we classify existing solutions according to their general approaches, analyze their individual contributions, and present the results of our extensive performance experiments. Based on these insights, we identify the most reasonable candidates for post-quantum TLS, which research problems in this area have already been solved, and which are still open. Overall, our work provides a well-founded reference point for researching post-quantum TLS and preparing TLS in practice for the quantum age.

Yi-Hsiu Chen, Yehuda Lindell
Published 2024-07-08 PDFPDF

Fischlin's transform (CRYPTO 2005) is an alternative to the Fiat-Shamir transform that enables straight-line extraction when proving knowledge. In this work we focus on the problem of using the Fischlin transform to construct UC-secure zero-knowledge from Sigma protocols, since UC security – that guarantees security under general concurrent composition – requires straight-line (non-rewinding) simulators. We provide a slightly simplified transform that is much easier to understand, and present algorithmic and implementation optimizations that significantly improve the running time. It appears that the main obstacles to the use of Fischlin in practice is its computational cost and implementation complexity (with multiple parameters that need to be chosen). We provide clear guidelines and a simple methodology for choosing parameters, and show that with our optimizations the running-time is far lower than expected. For just one example, on a 2023 MacBook, the cost of proving the knowledge of discrete log with Fischlin is only 0.41ms (on a single core). This is 15 times slower than plain Fiat-Shamir on the same machine, which is a significant multiple but objectively not significant in many applications. We also extend the transform so that it can be applied to batch proofs, and show how this can be much more efficient than individually proving each statement. We hope that this paper will both encourage and help practitioners implement the Fischlin transform where relevant.

Camille Mutschler, Laurent Imbert, Thomas Roche
Published 2024-07-08 PDFPDF

We introduce InspectorGadget, an Open-Source Python-based software for assessing and comparing the complexity of masking gadgets. By providing a limited set of characteristics of a hardware platform, our tool allows to estimate the cost of a masking gadget in terms of cycle count equivalent and memory footprint. InspectorGadget is highly flexible. It enables the user to define her own estimation functions, as well as to expand the set of gadgets and predefined microcontrollers. As a case-study, we produce a fair comparison of several masked versions of Kyber compression function from the literature, together with novel alternatives automatically generated by our tool. Our results confirm that an interesting middle ground exists between theoretical performance measures (asymptotic complexity or operations count) and real implementations benchmarks (clock cycle accurate evaluations). InspectorGadget offers both simplicity and genericity while capturing the main performance-related parameters of a hardware platform.

Sean Murphy, Rachel Player
Published 2024-07-08 PDFPDF

This paper develops Central Limit arguments for analysing the noise in ciphertexts in two homomorphic encryption schemes that are based on Ring-LWE. The first main contribution of this paper is to present and evaluate an average-case noise analysis for the BGV scheme. Our approach relies on the recent work of Costache et al.(SAC 2023) that gives the approximation of a polynomial product as a multivariate Normal distribution. We show how this result can be applied in the BGV context and evaluate its efficacy. We find this average-case approach can much more closely model the noise growth in BGV implementations than prior approaches, but in some cases it can also underestimate the practical noise growth. Our second main contribution is to develop a Central Limit framework to analyse the noise growth in the homomorphic Ring-LWE cryptosystem of Lyubashevsky, Peikert and Regev (Eurocrypt 2013, full version). Our approach is very general: apart from finite variance, no assumption on the distribution of the noise is required (in particular, the noise need not be subgaussian). We show that our approach leads to tighter bounds for the probability of decryption failure than those of prior work.

Kemal Bicakci, Kemal Ulker, Yusuf Uzunay, Halis Taha Şahin, Muhammed Said Gündoğan
Published 2024-07-08 PDFPDF

The adversary model of white-box cryptography includes an extreme case where the adversary, sitting at the endpoint, has full access to a cryptographic scheme. Motivating by the fact that most existing white-box implementations focus on symmetric encryption, we present implementations for hash-based signatures so that the security against white-box attackers (who have read-only access to data with a size bounded by a space-hardness parameter M) depends on the availability of a white-box secure cipher (in addition to a general one-way function). We also introduce parameters and key-generation complexity results for white-box secure instantiation of stateless hash-based signature scheme SPHINCS+, one of the NIST selections for quantum-resistant digital signature algorithms, and its older version SPHINCS. We also present a hash tree-based solution for one-time passwords secure in a white-box attacker context. We implement the proposed solutions and share our performance results.

Décio Luiz Gazzoni Filho, Guilherme Brandão, Julio López
Published 2024-04-09 PDFPDF

Efficient polynomial multiplication routines are critical to the performance of lattice-based post-quantum cryptography (PQC). As PQC standards only recently started to emerge, CPUs still lack specialized instructions to accelerate such routines. Meanwhile, deep learning has grown immeasurably in importance. Its workloads call for teraflops-level of processing power for linear algebra operations, mainly matrix multiplication. Computer architects have responded by introducing ISA extensions, coprocessors and special-purpose cores to accelerate such operations. In particular, Apple ships an undocumented matrix-multiplication coprocessor, AMX, in hundreds of millions of mobile phones, tablets and personal computers. Our work repurposes AMX to implement polynomial multiplication and applies it to the NTRU cryptosystem, setting new speed records on the Apple M1 and M3 systems-on-chip (SoCs): polynomial multiplication, key generation, encapsulation and decapsulation are sped up by $1.54$–$3.07\times$, $1.08$–$1.33\times$, $1.11$–$1.50\times$ and $1.20$–$1.98\times$, respectively, over the previous state-of-the-art.

Pierrick Méaux, Jeongeun Park, Hilder V. L. Pereira
Published 2024-04-09 PDFPDF

Fully Homomorphic Encryption (FHE) is a powerful tool to achieve non-interactive privacy preserving protocols with optimal computation/communication complexity. However, the main disadvantage is that the actual communication cost (bandwidth) is high due to the large size of FHE ciphertexts. As a solution, a technique called transciphering (also known as Hybrid Homomorphic Encryption) was introduced to achieve almost optimal bandwidth for such protocols. However, all existing works require clients to fix a precision for the messages or a mathematical structure for the message space beforehand. It results in unwanted constraints on the plaintext size or underlying structure of FHE based applications.

In this article, we introduce a new approach for transciphering which does not require fixed message precision decided by the client, for the first time. In more detail, a client uses any kind of FHE-friendly symmetric cipher for $\{0,1\}$ to send its input data encrypted bit-by-bit, then the server can choose a precision $p$ depending on the application and homomorphically transforms the encrypted bits into FHE ciphertexts encrypting integers in $\mathbb{Z}_p$. To illustrate our new technique, we evaluate a transciphering using FiLIP cipher and adapt the most practical homomorphic evaluation technique [CCS'22] to keep the practical latency. As a result, our proof-of-concept implementation for $p$ from $2^2$ to $2^8$ takes only from $13$ ms to $137$ ms.

Jingwen Chen, Qun Liu, Yanhong Fan, Lixuan Wu, Boyun Li, Meiqin Wang
Published 2024-04-09 PDFPDF

In recent years, quantum technology has been rapidly developed. As security analyses for symmetric ciphers continue to emerge, many require an evaluation of the resources needed for the quantum circuit implementation of the encryption algorithm. In this regard, we propose the quantum circuit decision problem, which requires us to determine whether there exists a quantum circuit for a given permutation f using M ancilla qubits and no more than K quantum gates within the circuit depth D. Firstly, we investigate heuristic algorithms and classical SAT-based models in previous works, revealing their limitations in solving the problem. Hence, we innovatively propose an improved SAT-based model incorporating three metrics of quantum circuits. The model enables us to find the optimal quantum circuit of an arbitrary 3 or 4-bit S-box under a given optimization goal based on SAT solvers, which has proved the optimality of circuits constructed by the tool, LIGHTER-R. Then, by combining different criteria in the model, we find more compact quantum circuit implementations of S-boxes such as RECTANGLE and GIFT. For GIFT S-box, our model provides the optimal quantum circuit that only requires 8 gates with a depth of 31. Furthermore, our model can be generalized to linear layers and improve the previous SAT-based model proposed by Huang et al. in ASIACRYPT 2022 by adding the criteria on the number of qubits and the circuit depth.

Loïc Demange, Mélissa Rossi
Published 2024-04-09 PDFPDF

BIKE is a post-quantum key encapsulation mechanism (KEM) selected for the 4th round of the NIST's standardization campaign. It relies on the hardness of the syndrome decoding problem for quasi-cyclic codes and on the indistinguishability of the public key from a random element, and provides the most competitive performance among round 4 candidates, which makes it relevant for future real-world use cases. Analyzing its side-channel resistance has been highly encouraged by the community and several works have already outlined various side-channel weaknesses and proposed ad-hoc countermeasures. However, in contrast to the well-documented research line on masking lattice-based algorithms, the possibility of generically protecting code-based algorithms by masking has only been marginally studied in a 2016 paper by Chen et al. in SAC 2015. At this stage of the standardization campaign, it is important to assess the possibility of fully masking BIKE scheme and the resulting cost in terms of performances.

In this work, we provide the first high-order masked implementation of a code-based algorithm. We had to tackle many issues such as finding proper ways to handle large sparse polynomials, masking the key-generation algorithm or keeping the benefit of the bitslicing. In this paper, we present all the gadgets necessary to provide a fully masked implementation of BIKE, we discuss our different implementation choices and we propose a full proof of masking in the Ishai Sahai and Wagner (Crypto 2003) model.

More practically, we also provide an open C-code masked implementation of the key-generation, encapsulation and decapsulation algorithms with extensive benchmarks. While the obtained performance is slower than existing masked lattice-based algorithms, we show that masking at order 1, 2, 3, 4 and 5 implies a performance penalty of x5.8, x14.2, x24.4, x38 and x55.6 compared to order 0 (unmasked and unoptimized BIKE). This scaling is encouraging and no Boolean to Arithmetic conversion has been used.

Akira Takahashi, Greg Zaverucha
Published 2024-04-09 PDFPDF

Verifiable encryption (VE) is a protocol where one can provide assurance that an encrypted plaintext satisfies certain properties, or relations. It is an important building block in cryptography with many useful applications, such as key escrow, group signatures, optimistic fair exchange, and others. However, the majority of previous VE schemes are restricted to instantiation with specific public-key encryption schemes or relations. In this work, we propose a novel framework that realizes VE protocols using zero-knowledge proof systems based on the MPC-in-the-head paradigm (Ishai et al. STOC 2007). Our generic compiler can turn a large class of zero-knowledge proofs into secure VE protocols for any secure public-key encryption scheme with the undeniability property, a notion that essentially guarantees binding of encryption when used as a commitment scheme. Our framework is versatile: because the circuit proven by the MPC-in-the-head prover is decoupled from a complex encryption function, the work of the prover is focused on proving the encrypted data satisfies the relation, not the proof of plaintext knowledge. Hence, our approach allows for instantiation with various combinations of properties about the encrypted data and encryption functions. We then consider concrete applications, to demonstrate the efficiency of our framework, by first giving a new approach and implementation to verifiably encrypt discrete logarithms in any prime order group more efficiently than was previously known. Then we give the first practical verifiable encryption scheme for AES keys with post-quantum security, along with an implementation and benchmarks.

Shahla Atapoor, Karim Baghery, Hilder V. L. Pereira, Jannik Spiessens
Published 2024-04-09 PDFPDF

Fully Homomorphic Encryption (FHE) is a prevalent cryptographic primitive that allows for computation on encrypted data. In various cryptographic protocols, this enables outsourcing computation to a third party while retaining the privacy of the inputs to the computation. However, these schemes make an honest-but-curious assumption about the adversary. Previous work has tried to remove this assumption by combining FHE with Verifiable Computation (VC). Recent work has increased the flexibility of this approach by introducing integrity checks for homomorphic computations over rings. However, efficient FHE for circuits of large multiplicative depth also requires non-ring computations called maintenance operations, i.e. modswitching and keyswitching, which cannot be efficiently verified by existing constructions. We propose the first efficiently verifiable FHE scheme that allows for arbitrary depth homomorphic circuits by utilizing the double-CRT representation in which FHE schemes are typically computed, and using lattice-based SNARKs to prove components of this computation separately, including the maintenance operations. Therefore, our construction can theoretically handle bootstrapping operations. We also present the first implementation of a verifiable computation on encrypted data for a computation that contains multiple ciphertext-ciphertext multiplications. Concretely, we verify the homomorphic computation of an approximate neural network containing three layers and >100 ciphertexts in less than 1 second while maintaining reasonable prover costs.

Dan Boneh, Benedikt Bünz, Ben Fisch
Published 2024-04-09 PDFPDF

A verifiable delay function (VDF) is an important tool used for adding delay in decentralized applications. This paper surveys and compares two beautiful verifiable delay functions, one due to Pietrzak, and the other due to Wesolowski, In addition, we provide a new computational proof of security for one of them, present an attack on an incorrect implementation of the other, and compare the complexity assumptions needed for both schemes.

Fabio Campos, Jorge Chávez-Saab, Jesús-Javier Chi-Domínguez, Michael Meyer, Krijn Reijnders, Francisco Rodríguez-Henríquez, Peter Schwabe, Thom Wiggers
Published 2024-04-09 PDFPDF

In this work, we assess the real-world practicality of CSIDH, an isogeny-based non-interactive key exchange. We provide the first thorough assessment of the practicality of CSIDH in higher parameter sizes for conservative estimates of quantum security, and with protection against physical attacks.

This requires a three-fold analysis of CSIDH. First, we describe two approaches to efficient high-security CSIDH implementations, based on SQALE and CTIDH. Second, we optimize such high-security implementations, on a high level by improving several subroutines, and on a low level by improving the finite field arithmetic. Third, we benchmark the performance of high-security CSIDH. As a stand-alone primitive, our implementations outperform previous results by a factor up to 2.53×.

As a real-world use case considering network protocols, we use CSIDH in TLS variants that allow early authentication through a NIKE. Although our instantiations of CSIDH have smaller communication requirements than post-quantum KEM and signature schemes, even our highly-optimized implementations result in too-large handshake latency (tens of seconds), showing that CSIDH is only practical in niche cases.