Communications in Cryptology IACR CiC


Dates are inconsistent
3 results sorted by publication date
Gaëtan Cassiers, Loïc Masure, Charles Momin, Thorben Moos, Amir Moradi, François-Xavier Standaert
Published 2024-07-08 PDFPDF

Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed. The common denominator among all of them is the implicit demand for robust and high entropy randomness. Simply assuming that uniformly distributed random bits are available, without taking the cost of their generation into account, leads to a poor understanding of the efficiency vs. security tradeoff of masked implementations. This is especially relevant in case of hardware masking schemes which are known to consume large amounts of random bits per cycle due to parallelism. Currently, there seems to be no consensus on how to most efficiently derive many pseudo-random bits per clock cycle from an initial seed and with properties suitable for masked hardware implementations. In this work, we evaluate a number of building blocks for this purpose and find that hardware-oriented stream ciphers like Trivium and its reduced-security variant Bivium B outperform most competitors when implemented in an unrolled fashion. Unrolled implementations of these primitives enable the flexible generation of many bits per cycle, which is crucial for satisfying the large randomness demands of state-of-the-art masking schemes. According to our analysis, only Linear Feedback Shift Registers (LFSRs), when also unrolled, are capable of producing long non-repetitive sequences of random-looking bits at a higher rate per cycle for the same or lower cost as Trivium and Bivium B. Yet, these instances do not provide black-box security as they generate only linear outputs. We experimentally demonstrate that using multiple output bits from an LFSR in the same masked implementation can violate probing security and even lead to harmful randomness cancellations. Circumventing these problems, and enabling an independent analysis of randomness generation and masking, requires the use of cryptographically stronger primitives like stream ciphers. As a result of our studies, we provide an evidence-based estimate for the cost of securely generating $n$ fresh random bits per cycle. Depending on the desired level of black-box security and operating frequency, this cost can be as low as $20n$ to $30n$ ASIC gate equivalents (GE) or $3n$ to $4n$ FPGA look-up tables (LUTs), where $n$ is the number of random bits required. Our results demonstrate that the cost per bit is (sometimes significantly) lower than estimated in previous works, incentivizing parallelism whenever exploitable. This provides further motivation to potentially move low randomness usage from a primary to a secondary design goal in hardware masking research.

Guilhèm Assael, Philippe Elbaz-Vincent
Published 2024-07-08 PDFPDF

Several cryptographic schemes, including lattice-based cryptography and the SHA-2 family of hash functions, involve both integer arithmetic and Boolean logic. Each of these classes of operations, considered separately, can be efficiently implemented under the masking countermeasure when resistance against vertical attacks is required. However, protecting interleaved arithmetic and logic operations is much more expensive, requiring either additional masking conversions to switch between masking schemes, or implementing arithmetic functions as nonlinear operations over a Boolean masking. Both solutions can be achieved by providing masked arithmetic addition over Boolean shares, which is an operation with relatively long latency and usually high area utilization in hardware. A further complication arises when the arithmetic performed by the scheme is over a prime modulus, which is common in lattice-based cryptography. In this work, we propose a first-order masked implementation of arithmetic addition over Boolean shares occupying a very small area, while still having reasonable latency. Our proposal is specifically tuned for efficient addition and subtraction modulo an arbitrary integer, but it can also be configured at runtime for power-of-two arithmetic. To the best of our knowledge, we propose the first such construction whose security is formally proven in the glitch+transition-robust probing model.

Camille Mutschler, Laurent Imbert, Thomas Roche
Published 2024-07-08 PDFPDF

We introduce InspectorGadget, an Open-Source Python-based software for assessing and comparing the complexity of masking gadgets. By providing a limited set of characteristics of a hardware platform, our tool allows to estimate the cost of a masking gadget in terms of cycle count equivalent and memory footprint. InspectorGadget is highly flexible. It enables the user to define her own estimation functions, as well as to expand the set of gadgets and predefined microcontrollers. As a case-study, we produce a fair comparison of several masked versions of Kyber compression function from the literature, together with novel alternatives automatically generated by our tool. Our results confirm that an interesting middle ground exists between theoretical performance measures (asymptotic complexity or operations count) and real implementations benchmarks (clock cycle accurate evaluations). InspectorGadget offers both simplicity and genericity while capturing the main performance-related parameters of a hardware platform.