A provably masked implementation of BIKE Key Encapsulation Mechanism

BIKE is a post-quantum key encapsulation mechanism (KEM) selected for the 4th round of NIST's standardization campaign. It relies on the hardness of the syndrome decoding problem for quasi-cyclic codes and on the indistinguishability of the public key from a random element, and it provides the most competitive performance among round 4 candidates, which makes it relevant for future real-world use cases. Analyzing its side-channel resistance has been highly encouraged by the community, and several works have already outlined various side-channel weaknesses and proposed ad-hoc countermeasures. However, in contrast to the well-documented line of research on masking lattice-based algorithms, the possibility of generically protecting code-based algorithms by masking has only been marginally studied in a 2016 paper by Cong Chen et al. At this stage of the standardization campaign, it is important to assess the possibility of fully masking the BIKE scheme and the resulting cost in terms of performance. In this work, we provide the first high-order masked implementation of a code-based algorithm. We had to tackle many issues such as finding proper ways to handle large sparse polynomials, masking the key-generation algorithm, and keeping the benefits of bitslicing. In this paper, we present all the gadgets necessary to provide a fully masked implementation of BIKE, we discuss our different implementation choices, and we propose a full proof of masking in the Ishai, Sahai and Wagner (Crypto 2003) model. More practically, we also provide an open C-code masked implementation of the key-generation, encapsulation and decapsulation algorithms with extensive benchmarks. While the obtained performance is slower than that of existing masked lattice-based algorithms, the scaling in the masking order is still encouraging and no Boolean-to-arithmetic conversion has been used. We hope that this work can be a starting point for future analysis and optimization.


Introduction
In response to the potential quantum threat, NIST initiated a standardization campaign in 2017 to define new post-quantum algorithms. Different families of mathematical problems have received a lot of attention. In particular, lattices and error-correcting codes stood out as interesting building blocks for post-quantum schemes. Six years after the first round of the campaign, three lattice-based schemes have been selected as future NIST post-quantum standards. In parallel, the standardization campaign continues for other key-encapsulation schemes such as code-based ones, because it is important to have diverse schemes based on other structures. BIKE [ABB+22], a round 4 candidate in the NIST standardization process, is still under analysis by the research community. It relies on the hardness of the syndrome decoding problem for quasi-cyclic codes and on the indistinguishability of the public key from a random element. It is the most efficient and offers the lowest key sizes of all round 4 candidates. BIKE belongs to a line of research on Quasi-Cyclic Moderate Density Parity-Check (QC-MDPC) code-based schemes started by [MTSB13]. The advantage of QC-MDPC-based algorithms is the sparse structure of the underlying variables. Such codes allow for the use of iterative bit-flipping decoding algorithms (detailed later in the paper) as part of the decapsulation. The first implementations of QC-MDPC codes [vMG14, MOG15, vMHG16] were not constant time and were vulnerable to timing attacks. In 2016, Chou proposed a portable constant-time C implementation [Cho16]. Some implementation proposals have been made over the years [DG19, GAB19, BOG20], and another constant-time C implementation was introduced in 2020 [DGK20], especially improving the decoding part, and claimed to be protected against timing and cache attacks. Cortex-M4-optimized implementations of BIKE were later introduced in [CCK21]. Subsequently, optimizations were proposed in [CGKT22].
Existing side-channel attacks Timing vulnerabilities have been handled with priority in the previously cited implementations. However, the authors of [GHJ+22] have highlighted the possibility of using timing information from the constant-weight word sampler in the decapsulation in order to apply [GJS16]'s reaction attack. Such a vulnerability has been thwarted by redesigning the word sampler in [Sen21].
On the power-consumption side, several works have outlined various side-channel weaknesses and proposed ad-hoc countermeasures. Indeed, while BIKE's sparse and structured private keys are essential for providing good performance and compactness, this exact structure and redundancy can be exploited by side-channel attacks in order to lower the difficulty of the underlying decoding problem. For instance, Chou's implementation has been targeted by a differential power analysis attack on the syndrome computation in [RHHM17]. Later, an improvement of the previous attack and a single-trace analysis exploiting leakage in the syndrome computation were provided in [SKC+19]. Very recently, [CARG23] introduced a new single-trace attack on the most recent implementation of BIKE. The authors use unsupervised clustering techniques on the trace during the cyclic-shift computation to recover some bits of the positions of the ones in the private key. Next, they combine such knowledge with classical information set decoding techniques to recover the full key.

Existing generic side-channel protections
The current implementations of code-based schemes are claimed to be protected against timing and cache attacks, but they are never fully masked, i.e. masked from key generation to decapsulation. Masking is known as the most deployed countermeasure against physical attackers and is widely applied in embedded systems. Masking consists in randomizing any secret-dependent intermediate variable. Each of these secret-dependent intermediate variables, say x, is split into d + 1 variables (x_i)_{0≤i≤d} called "shares". The integer d is referred to as the masking order. In this paper, the only necessary type of masking is Boolean masking. In other words, a sensitive variable x is shared into (x_i)_{0≤i≤d} such that x = x_0 ⊕ x_1 ⊕ · · · ⊕ x_d. (1) While F_2-linear operations can straightforwardly be applied share-wise, non-linear operations are more complex and require additional randomness, as shown in [ISW03].
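As an illustration, Boolean masking per Eq. (1) and a share-wise F_2-linear operation can be sketched in a few lines of C. This is a toy sketch, not the paper's API: the names D, mask, unmask and masked_xor are illustrative, and rand() merely stands in for a cryptographic RNG.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define D 3 /* masking order: a secret is split into D + 1 shares */

static uint64_t rand64(void) { /* placeholder RNG, NOT cryptographic */
    return ((uint64_t)rand() << 32) ^ (uint64_t)rand();
}

/* Split x into D + 1 shares with x = x0 ^ x1 ^ ... ^ xD, as in Eq. (1). */
static void mask(uint64_t x, uint64_t sh[D + 1]) {
    sh[0] = x;
    for (int i = 1; i <= D; i++) {
        sh[i] = rand64();
        sh[0] ^= sh[i]; /* keep the XOR of all shares equal to x */
    }
}

/* Recombine shares (only ever done on non-sensitive outputs). */
static uint64_t unmask(const uint64_t sh[D + 1]) {
    uint64_t x = 0;
    for (int i = 0; i <= D; i++) x ^= sh[i];
    return x;
}

/* F_2-linear operations apply share-wise, without fresh randomness. */
static void masked_xor(const uint64_t x[D + 1], const uint64_t y[D + 1],
                       uint64_t z[D + 1]) {
    for (int i = 0; i <= D; i++) z[i] = x[i] ^ y[i];
}
```

Non-linear gates such as AND cannot be applied share-wise and require the randomized construction of [ISW03].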
Proving the security of a masked design consists in showing that the joint distribution of any set of at most d intermediate variables is independent of the secrets. But the bigger the algorithm, the more dependencies have to be considered in the proof. Fortunately, several works have defined intermediate security properties that simplify the security proofs [RP10, CPRR14, BBD+16]: one can focus on proving the properties on small parts of the algorithms, denoted gadgets, and it is possible to securely compose the pieces together.
Much effort has been devoted to provably masking lattice-based primitives in the past five years, and many challenges have been overcome. For example, [BBE+18] introduced a new security notion to justify unmasking certain intermediate steps, and in [GR20], the authors proposed a masked implementation of the qTesla signature scheme [BAA+ ]. The picture is less abundant when it comes to code-based schemes. One explanation could come from the large sparse polynomials, leading to potentially prohibitive performance, or from the complex counter-based decoder. The authors of [KLRBG22] propose a first-order masked inversion in multiplicative masking. Another recent work [KLRBG23] presents a way to mask BIKE's key generation with a fixed-weight polynomial sampling technique and arithmetic-to-Boolean conversions.

Our contribution
In this paper, we provide the first provable high-order masked implementation of a code-based algorithm. We detail every masked gadget that is necessary for masking BIKE's key generation and decapsulation. The proofs are given in the d-probing model. Let us detail some aspects of our design.
• No mask conversion. Mask-conversion gadgets consist in modifying the underlying masking operation, e.g. going from ⊕ to an addition in Z_q. Even if the unmasked functionality is the identity function, these gadgets are known to be heavy in terms of computation time. Despite efficiency improvements since their introduction, e.g. in [CGV14, Cor17, CGTV15], current secure mask-conversion algorithms run in time at least O(d^2). Contrary to lattice-based schemes, BIKE fundamentally relies on binary operations. While the authors of [KLRBG23] have included mask conversion in their design, we believe that keeping only Boolean masking is more natural and efficient. In this work, we give the first evidence that it is possible to completely mask BIKE without any mask conversion.
• Sparse versus dense representation. BIKE's intermediate variables are sparse polynomials with coefficients in F_2. An important question arose rapidly when designing a masked BIKE: should we represent the masked polynomials in dense form, or keep the sparse structure and mask the indices of the non-zero coefficients instead? With the dense form, the number of non-zero coefficients is protected, but the multiplication requires a masked Karatsuba-based multiplication algorithm. With the sparse form, the number of non-zero coefficients is accessible by timing attacks, but a lighter multiplication algorithm based on cyclic shifts is possible. The sparse representation intuitively seems lighter, but some parts necessarily require the dense form for security. For completeness, we decided to analyze both of the following approaches: 1. a fully-dense implementation where the polynomials are masked in dense form; 2. a hybrid sparse-dense implementation where the polynomials are represented in sparse form whenever the number of non-zero coefficients is independent from any secret data.
Interestingly, our experiments showed that the fully-dense approach seems more relevant, especially at high orders. While approaches (1) and (2) seem equivalent for one or two shares, (1) indeed looks more relevant for higher orders. This difference might shrink with more optimizations of the cyclic shift, as discussed in the future work section.
• Many new gadgets. A lot of new gadgets needed to be introduced for masking BIKE. Although BIKE's bitslice addition technique turned out to operate well with Boolean masking, some other parts of the key generation were more challenging to mask. For example, the Fisher-Yates sampling technique and the polynomial inversion require many loops and subroutines. More generally, we provide in this paper all the elementary gadgets that are necessary to mask BIKE, even those whose design did not pose any particular issue. We believe that they can be of independent interest for masking future code-based schemes.
We provide an open C-code implementation of the key-generation, encapsulation and decapsulation algorithms with detailed benchmarks. Although theoretically quadratic [ISW03], several post-quantum masked designs exhibit an experimental scaling in the masking order that tends to be exponential [BBE+18, Table 1]. The scaling we have obtained is very encouraging, as our experiments seem to indicate a quadratic scaling. We believe that it is even possible to further improve and optimize our code and maybe reach quasi-linearity in the masking order. We hope that this work can be a first building block towards masked code-based cryptography and can lead to future analysis and new optimizations.

Organization of the paper
In Section 2, we introduce all the necessary background on masking, QC-MDPC codes and BIKE. In Section 3, we present our general masked construction along with its composition security proof. In Section 4, we detail the gadgets. For brevity, we only detail a few main gadgets and refer to Appendix A for the description of the remaining ones. Finally, in Section 5, we present our implementation and its benchmarks. We conclude with future work in Section 6.

Masking
A shared variable (x_i)_{0≤i≤d} satisfying Eq. (1) will be denoted by x for readability. Note that for a masking order d, there are d + 1 shares.
Definition 1 (d-probing security or ISW security [ISW03]). An algorithm is d-probing secure iff the joint distribution of any set of at most d internal intermediate values is independent of the secrets.
Even if d-probing security seems far from realistic side-channel protection, it is actually backed up by theoretical model reductions that relate d-probing security to side-channel security up to a certain level of noise [DDF14]. Moreover, [CJRR99] showed that the number of measurements required to mount a successful side-channel attack usually increases exponentially with the masking order.
In addition to Definition 1, other intermediate security properties were introduced to ease the security proofs [RP10, CPRR14, BBD+16]. The focus can be placed on proving these properties on small parts of the algorithms, denoted gadgets.

Definition 2 (Gadget). In our constructions, a gadget is typically refreshed on its output or on its inputs:
• its return expression is b and its last instruction is of the form b ← refresh(b);
• its sequence of parameters is a = {a_0, ..., a_n}, its i-th instruction is b ← refresh(a_i) for 1 ≤ i ≤ n, and a_i is not used anywhere else in the algorithm.

We are going to rely on these definitions and this proposition to create our gadgets.

Codes
In this paper, we will only introduce the relevant information for masking BIKE. Not many aspects of coding theory are needed for understanding our work.
Definition 4 (Binary linear code). A binary linear code C of length n and dimension r is an r-dimensional vector subspace of F_2^n. It is possible to represent it in two equivalent ways:
• either using a generator matrix G ∈ F_2^{r×n},
• or using a parity-check matrix H ∈ F_2^{(n−r)×n}.

Definition 5 (Circulant matrix). An r × r matrix A is circulant if each row is a cyclic shift of the previous row, the first row being (a_0, ..., a_{r−1}). We say that A is generated by the vector (a_0, ..., a_{r−1}).
Remark 1. It is possible to define an isomorphism between the ring of polynomials F_2[X]/(X^r − 1) and the set of circulant matrices of order r. To a vector (a_0, ..., a_{r−1}) generating a circulant matrix, one can associate the polynomial Σ_{i=0}^{r−1} a_i X^i. Multiplication and inversion can then be performed either with matrix multiplication or with polynomial multiplication.
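To make Remark 1 concrete, multiplication in F_2[X]/(X^r − 1) can be sketched as a schoolbook product with indices reduced modulo r, which is exactly multiplication by the circulant matrix generated by (a_0, ..., a_{r−1}). This is a toy sketch: the block length R = 7 and the name polymul_mod are illustrative, and real implementations pack coefficients into machine words.

```c
#include <assert.h>

#define R 7 /* toy block length; BIKE uses a large prime r */

/* Multiply a(X) * b(X) mod (X^R - 1) over F_2, one byte per coefficient:
   the schoolbook product with indices wrapped modulo R, i.e. applying
   the circulant matrix generated by (a_0, ..., a_{R-1}). */
static void polymul_mod(const unsigned char a[R], const unsigned char b[R],
                        unsigned char out[R]) {
    for (int k = 0; k < R; k++) out[k] = 0;
    for (int i = 0; i < R; i++)
        for (int j = 0; j < R; j++)
            out[(i + j) % R] ^= a[i] & b[j];
}
```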
Definition 6 (Quasi-circulant matrix).A matrix is quasi-circulant if it is composed of circulant square blocks of size greater than 2.
For example, let A and B be two circulant matrices. The matrix C = [A|B] defined as the concatenation of A and B is a quasi-circulant matrix.
Remark 2. Similarly to Remark 1, it is possible to represent quasi-circulant matrices as sets of polynomials.
Definition 7 (Quasi-cyclic code).A binary code C is quasi-cyclic iff it admits a quasicirculant generating matrix.
Definition 8 (QC-MDPC code). Let n, r, w be integer parameters for length, dimension and minimum code weight. An [n, r, w] QC-MDPC code C is a quasi-cyclic code that admits a parity-check matrix H such that H has a constant row weight w ≈ √n.

More precisely, let r and w be integer parameters. BIKE relies on [2r, r, w] QC-MDPC codes. Its private key corresponds to the parity-check matrix. The security of the scheme reduces to quasi-cyclic variants of hard problems from coding theory [Ale03, BMvT78]. We refer to [ABB+22] for more information about the security and design rationale. BIKE's first building block is a public-key encryption scheme (PKE) based on a variant of the Niederreiter framework [Nie86]. The plaintext is represented by the sparse vector (e_0, e_1), and the ciphertext by its syndrome. The decryption is performed with a decoding procedure that will be presented below in Section 2.4. Next, this PKE is converted into an IND-CCA KEM with the application of the Fujisaki-Okamoto transformation [FO99, HHK17]. For the scheme to be truly IND-CCA, there must be conditions on the decoding failure rate (also called DFR), which is the case here with the chosen decoder.

BIKE scheme
Let us detail the key generation (KeyGen), encapsulation (Encaps) and decapsulation (Decaps) algorithms in more detail. In addition to the parameters r and w, let us define t and ℓ as integer parameters. We denote by R = F_2[X]/(X^r − 1) the underlying cyclic polynomial ring, and we define respectively the private key space, the error space, the message space and the shared key space. In the above, we denote by |h| the Hamming weight of the polynomial h, i.e. the number of non-zero coefficients of h. The Fujisaki-Okamoto transformation requires several hash functions, denoted H, K and L. In the following, we denote a ←$ B when a is sampled uniformly at random from B, and ← is an assignment of a value.

Algorithm 1 Keygen
1: sample (h_0, h_1) with |h_0| = |h_1| = w/2
2: σ ←$ {0, 1}^ℓ
3: h ← h_1 · h_0^{−1}
4: return sk = (h_0, h_1, σ), pk = h

Algorithm 2 Encaps
1: m ←$ {0, 1}^ℓ
2: (e_0, e_1) ← H(m)
3: c ← (e_0 + e_1 h, m ⊕ L(e_0, e_1))
4: K ← K(m, c)
5: return (K, c)

Parameter setting As defined in the specifications, the parameters should satisfy several constraints. The block length r should be a prime number, and 2 should be primitive modulo r. The parameter w should be such that w = 2d ≈ √n with d being odd. In addition, the error weight should be such that t ≈ √n. We present the instantiated parameters in Table 1.
In the following, we will not discuss hash functions any further (see Section 3.3 for more details on the implementation). We will use the notation H to represent the code's parity-check matrix, where h_0 and h_1 are the polynomials describing its two circulant blocks.

Decoding QC-MDPC codes
The choice of the decoder has a crucial impact on the security and the performance of the scheme. As QC-MDPC codes have sparse parity-check matrices, decoding techniques usually rely on bit-flipping techniques, originally introduced in [Gal62] for low-density parity-check matrices.
Technically, the bit-flipping algorithm is presented in Algorithm 4 and works as follows: over several iterations, we compute the syndrome cH^T, where c is the ciphertext and H^T is the transposed parity-check matrix of the code. Next, we count the number of unsatisfied parity-check equations for each position. If the counter for a position exceeds a threshold T (computed on the fly according to the weight of the syndrome), the position is flipped and the syndrome is recomputed. Let syndrome be the syndrome computation, counter the counter computation, and threshold the threshold computation function. We refer to [ABB+22] for details. BIKE instantiates this framework with the Black-Gray-Flip (BGF) decoder. This is a bit-flipping algorithm that introduces two classification zones, with two different thresholds: the black zone and the gray zone. Two additional iterations are performed to verify the choices made during the classification. The BGF decoding algorithm is presented in Algorithm 5. This decoder also has a fixed number of iterations (set at 5) to avoid timing attacks.
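For intuition, one unmasked bit-flipping iteration can be sketched as follows. This is a toy sketch: the 4 × 8 parity-check matrix, the fixed threshold T and the names bf_iteration, RR, NN are illustrative, whereas real MDPC decoders use large sparse matrices and on-the-fly thresholds.

```c
#include <assert.h>

#define RR 4 /* toy number of parity checks */
#define NN 8 /* toy code length             */

/* One bit-flipping iteration: recompute the syndrome s = H * c^T over
   F_2, count for each position the number of unsatisfied parity checks
   it is involved in, and flip every position whose counter exceeds T. */
static void bf_iteration(const unsigned char H[RR][NN],
                         unsigned char c[NN], int T) {
    unsigned char s[RR];
    for (int i = 0; i < RR; i++) {              /* syndrome computation */
        s[i] = 0;
        for (int j = 0; j < NN; j++) s[i] ^= H[i][j] & c[j];
    }
    for (int j = 0; j < NN; j++) {              /* counters and flips   */
        int upc = 0;                            /* unsatisfied checks   */
        for (int i = 0; i < RR; i++) upc += H[i][j] & s[i];
        if (upc > T) c[j] ^= 1;
    }
}
```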

Masked BIKE
We present here the core contribution of this paper: a fully masked encapsulation, decapsulation and key generation for BIKE. While the encapsulation uses mostly public data, most of it had to be masked anyway as part of the decapsulation process due to the IND-CCA transform. Thus, for a perfectly complete masked design, the masked encapsulation is also included in our code. The masked decapsulation is obviously the most important part, as it is the primary target of side-channel attacks. A masked key generation can also be relevant to prevent single-trace key-recovery attacks when the private key is generated. A masked encapsulation might be relevant in advanced attack models to prevent single-trace message-recovery attacks.
In this section, we present the salient ideas of our masking design. Details on some selected underlying gadgets will be presented later in Section 4. Some gadgets were already introduced in the literature, but many new gadgets have been introduced to achieve our design. The complete list of gadgets is summed up in Tables 2 and 3.

Sparse and dense notation
BIKE's private key H is a sparse polynomial (see Remark 2). For masking such polynomials, both approaches are valid: either we represent it in its dense form, or we keep the sparse structure and mask the indices of the non-zero coefficients instead. Since the number of non-zero coefficients is a public parameter, the two approaches are potentially valid. The sparse representation intuitively seems lighter, but some parts (such as error generation) will require the dense form for security reasons. For completeness, we analyze both approaches: (1) an implementation where H is masked in dense form and (2) a hybrid sparse-dense implementation where both dense and sparse forms of H are stored.

The masked private key will then be denoted by h_0•, h_1• when it is masked in sparse form (i.e. the indices of the non-zero coefficients are masked), and by h_0, h_1 when the full polynomial is masked. The same convention applies to other intermediate variables that can be masked in dense or sparse form. For simplicity of reading, we define masked elements by omitting the masking order. Thus, when we define h_0 ∈ F_2^r, h_0 actually has dimension (F_2^r)^{d+1}.
Let sparse_to_dense be an algorithm that converts the sparse notation into a dense notation by multiplying the sparse polynomial by a dense polynomial equal to 1.This procedure is straightforwardly d-NI.

Key generation
The masked key generation is introduced below in Algorithm 6. We use a masked version of the Fisher-Yates algorithm. It consists in drawing a vector of n random elements, where each position i contains a value between 0 and n − i. Since it is important to avoid any duplicates, we go through the array backwards and we replace the value by the index i in case of duplicates. Despite a bias in the distribution, this does not affect the security of the scheme, as proved in [Sen21]. This will allow us to generate our private keys h_0 and h_1, and then to compute the public key h. Provided that all the gadgets enjoy the d-NI property, their sequential combination leads to a d-NI algorithm. Thus we have the following result.
Theorem 1. The masked key generation algorithm is d-NI.
Remark 3. In practice, we tend to make this algorithm d-SNI by adding a refresh on h_0 and h_1 before returning them, allowing us to use the freshly created keys without having to renew the randomness. Also, we can see that the public key is returned in masked form. We have chosen to leave it masked because it allows a simple syndrome computation; in a practical implementation, we would unmask it directly after computing it in order to transmit less data. Note that the public parameters (public keys etc.) can always be used for simulating the probes and that in BIKE's case, the joint distribution of the public key and any set of at most d internal shares is never correlated to the secret key.

Error generation
The error generation algorithm is necessary for both encapsulation and decapsulation. Its masked version is introduced below in Algorithm 7. It consists in generating a masked error vector e_0, e_1. It uses two d-NI sub-gadgets:
• sec+ corresponds to the logical addition of two integers [Cor14]. We introduce sec+partlymasked, which is almost identical to sec+ but where the first operation (sec& between the two masked parameters) has been modified to take an unmasked element (& between all shares of the masked value and the public one).
• sec_if represents a conditional branch: it outputs either the first input or the second one depending on the Boolean value of the last input. It is detailed in Algorithm 30 in Appendix A.6.
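For intuition, a masked AND in the style of [ISW03] and the conditional select it enables can be sketched in C. This is a functional sketch only: the names sec_and_toy, sec_if_toy, share and recombine are illustrative (this is not the paper's Algorithm 30), rand() stands in for a cryptographic RNG, and a hardened implementation must also respect the prescribed instruction ordering.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NS 4 /* number of shares = masking order + 1 */

static uint64_t rnd64(void) { /* placeholder RNG, NOT cryptographic */
    return ((uint64_t)rand() << 32) ^ (uint64_t)rand();
}

static void share(uint64_t x, uint64_t sh[NS]) {
    sh[0] = x;
    for (int i = 1; i < NS; i++) { sh[i] = rnd64(); sh[0] ^= sh[i]; }
}

static uint64_t recombine(const uint64_t sh[NS]) {
    uint64_t x = 0;
    for (int i = 0; i < NS; i++) x ^= sh[i];
    return x;
}

/* ISW-style multiplication over F_2: z = x & y on shares, using one
   fresh random mask r per pair of share indices (i, j). */
static void sec_and_toy(const uint64_t x[NS], const uint64_t y[NS],
                        uint64_t z[NS]) {
    for (int i = 0; i < NS; i++) z[i] = x[i] & y[i];
    for (int i = 0; i < NS; i++)
        for (int j = i + 1; j < NS; j++) {
            uint64_t r = rnd64();
            z[i] ^= r;
            z[j] ^= (r ^ (x[i] & y[j])) ^ (x[j] & y[i]);
        }
}

/* Masked select: out = a if cond (a sharing of the all-ones word),
   else b, computed branch-free as out = b ^ (cond & (a ^ b)). */
static void sec_if_toy(const uint64_t a[NS], const uint64_t b[NS],
                       const uint64_t cond[NS], uint64_t out[NS]) {
    uint64_t t[NS], u[NS];
    for (int i = 0; i < NS; i++) t[i] = a[i] ^ b[i];
    sec_and_toy(cond, t, u);
    for (int i = 0; i < NS; i++) out[i] = b[i] ^ u[i];
}
```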
The error cannot be represented in sparse representation, as the weights of e 0 , e 1 are not constant.This would leak sensitive information.
In this algorithm, the intermediate values are used only once within d-NI gadgets, the only exception being e_i•, which is refreshed (d-SNI) before its new use. We can therefore conclude with the following theorem.

Theorem 2. The error generation algorithm is d-NI.
Remark 4. In the context of error generation, we use the seed s to generate our vector e using the SHAKE256 hash function (see Section 3.3), which is then processed in the same way as in Fisher-Yates. Since we have defined Fisher-Yates (Algorithm 17) with random generation within it, this would require us to redefine it to take a random vector as input, which would complicate its understanding. This does not change the nature of the algorithm, so to avoid making it unnecessarily complicated, we call Fisher-Yates directly.

Encapsulation algorithm
All the functions used are d-NI. Since the only variables that are reused are the seed m and the generated error e, we have to refresh them. We can conclude that the algorithm is itself d-NI.

Decapsulation
Decapsulation consists of first decoding the ciphertext and then checking that the decoding is correct. While the most challenging masking work concerned the decoding algorithm, we propose below a fully masked version of the decapsulation, for completeness, in Algorithm 10.

BGF decoder
We now describe the most important part of the decapsulation: the masked BGF decoder. The unmasked version of the BGF decoder was presented in Section 2.4. The masked version of Algorithm 5 is detailed in Algorithm 9. Recall that all the sub-gadgets are detailed in Section 4 and Appendix A (see Tables 2 and 3).
While SecThreshold and SecSyndrome are fairly simple gadgets (based on additions, multiplications or shifts), SecCounter was quite challenging to mask as it relies on bitslicing. The SecGreyZone optimization allows for a much more performant decoder, but it also adds a layer of complexity to the decoding. This complexity carries over to the masking, as several sensitive values are used inside the computations.
We denote by vector_zero_masking a subroutine that initializes a d-sharing of an r-dimensional zero vector. Let C be a pair of matrices represented as a table of dimension 2 × r × (⌊w/2⌋ + 1), which can be decomposed into two matrices C_0 and C_1.

Algorithm 9 BGF decoder
Ensure: e_0 ∈ F_2^r, e_1 ∈ F_2^r such that (c_0 + e_0) · h_0 = 0
1: e_0 ← vector_zero_masking()
2: e_1 ← vector_zero_masking()
3: s ← SecKaratsuba(c_0, h_0) ▷ Algorithm 13
4: h_0 ← refresh(h_0)
5: for i ← 0 to Nbr_Iter − 1 do
6: s1 ← SecSyndrome(h_0, h_1, e_0, e_1, s) ▷ Algorithm 16
…
s ← refresh(s)

Theorem 4. The masked BGF decoder (Algorithm 9) is d-NI.

Proof. We represent the whole decoding algorithm in Figs. 1 and 2. To avoid overly complex graphs, the content of an iteration for i ≠ 0 can be proved separately (if i ≠ 0, there is no application of the SecGreyZone algorithm in Lines 14 to 16). Let us first look at one iteration with i ≠ 0, and assume that it is a gadget with inputs e_0, e_1, sk and s, whose output is a modified version of e_0, e_1. Assume that an attacker has access to δ ≤ d observations on this sub-gadget. We want to prove that all these δ observations can be perfectly simulated with at most δ shares of sk, s, e_0 and e_1. To fix notations, let us consider the following distribution of the attacker's δ observations:
• δ_6 on Lines 12 and 13,
• δ_5 during the SecCounter computation,
• δ_4 during the SecThreshold computation,
• δ_3 during the SecSyndrome computation,
• δ_2 when s1 is refreshed,
• δ_1 when e_0 and e_1 are refreshed.

By definition of the d-probing model, we have δ = Σ_{j=1}^{6} δ_j. The sub-gadgets are d-NI; the proofs will be provided later in the paper. Finally, all the observations performed during this iteration can be perfectly simulated with at most Σ_{j=1}^{6} δ_j shares of e_0, the same amount for e_1, δ_6 + δ_5 shares of h_0•, Σ_{j=2}^{6} δ_j shares of h_0 and finally the same for h_1. In the end, we have proved that all the probes can be perfectly simulated with at most δ ≤ d shares of sk, s, e_0 and e_1.

Now let us analyze the complete construction in Fig. 2. The same reasoning applies. Assume that an attacker has access to δ ≤ d observations on this algorithm. We consider the following distribution of the attacker's δ observations:
• δ_iter,i on each i-th iteration,
• δ_SecGreyZone on the SecGreyZone computation,
• δ_ref,i on the i-th refresh of the secret key and the syndrome,
• δ_vector_zero_masking on the vector_zero_masking computation,
• δ_SecKaratsuba on the computation of the syndrome,
• δ_ref on the very first refresh.

By definition, δ is the sum of all these quantities. All the gadgets are proved d-NI and the refresh gadgets are d-SNI. All the probes performed after the first iteration (including the grey zone, the key refresh and the following iterations) can be perfectly simulated with at most Σ_{i=0}^{Nbr_Iter−1} (δ_iter,i + δ_ref,i) + δ_SecGreyZone shares of sk, s, e_0 and e_1. Next, we use the probing security of the refresh, SecKaratsuba and vector_zero_masking: all the probes performed during the full decoding algorithm can be perfectly simulated with at most Σ_{i=0}^{Nbr_Iter−1} (δ_iter,i + δ_ref,i) + δ_SecGreyZone + δ_SecKaratsuba + δ_ref shares of c_0, the same for h_0, and Σ_{i=0}^{Nbr_Iter−1} (δ_iter,i + δ_ref,i) + δ_SecGreyZone for the rest of the secret key. All these quantities are at most δ ≤ d, which concludes the proof.

Decapsulation algorithm
For the needs of the decapsulation algorithm, we introduce the subvector function, an algorithm which returns the subvector delimited by the bounds given as parameters.

Theorem 5. The decapsulation algorithm is d-NI.

For reasons similar to encapsulation, we may want to make this algorithm d-SNI if we wish to reuse the private key several times. However, this becomes less relevant if BIKE is used with an ephemeral key.

Details on selected gadgets
In this section, we provide some details about selected gadgets.

Bitslicing
Bitslicing was introduced in [Cho16] for the QcBits implementation, which shares many similarities with BIKE. These techniques allow computations to be performed very efficiently and in constant time by focusing on the binary representation. In Algorithms 11 and 12, we present two versions of this BitSlice procedure depending on the type of the input. Both versions will be used in our implementation.
In Algorithms 11 and 12, we denote by SecHalf_Adder the procedure that computes the addition in Z of the inputs while outputting the carry as a second output. SecAdder performs the same operation but is given an extra input carry. These simple gadgets are detailed and proved d-NI in Appendix A.1 for completeness. We also denote by zero_masking an initialization of a d-sharing of zero.
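The half-adder idea behind the bitsliced counters can be illustrated in plain, unmasked C. This is an illustrative sketch (bitsliced_increment is not one of the paper's gadgets): the masked versions additionally operate on Boolean shares of every word.

```c
#include <assert.h>
#include <stdint.h>

/* Bitsliced counters: cnt[k] holds bit k of 64 independent counters,
   one per lane of the word. Adding the 0/1 vector inc to all 64 lanes
   at once is a ripple of half adders: sum = a XOR b, carry = a AND b. */
static void bitsliced_increment(uint64_t cnt[], int nbits, uint64_t inc) {
    uint64_t carry = inc;
    for (int k = 0; k < nbits; k++) { /* fixed trip count: constant time */
        uint64_t sum = cnt[k] ^ carry; /* half adder: sum bit   */
        carry = cnt[k] & carry;        /* half adder: carry out */
        cnt[k] = sum;
    }
}
```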

Multiplications
Several multiplication algorithms are necessary for masking BIKE. Indeed, as opposed to many other masked designs, the multiplication often takes two masked inputs instead of only one. In addition, the underlying F_2 structure makes NTT-based multiplications irrelevant in BIKE's context. Thus, one valid solution is to fully mask the classical Karatsuba algorithm, as presented below. We denote by SecPolymul the naive schoolbook polynomial multiplication (detailed in Algorithm 28 in Appendix A.3 for completeness). Let B be a parameter denoting the recursion depth. It is fixed experimentally to allow performance optimization; in our experiments, we have fixed B = 64. We also set a parameter s ∈ N, a power of two corresponding to the size of the inputs. Let split be a subroutine that splits the s/2 high-order and s/2 low-order bits into two variables.
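The unmasked skeleton of one Karatsuba level over F_2[X] looks as follows: split both operands in halves and compute three half-size products instead of four. This is a toy 64 × 64 → 128-bit carry-less multiplication for illustration (clmul_karatsuba and clmul_schoolbook are illustrative names); the masked SecKaratsuba additionally recurses down to SecPolymul at depth B and operates on shares.

```c
#include <assert.h>
#include <stdint.h>

/* Schoolbook carry-less multiplication over F_2[X], 64x64 -> 128 bits. */
static void clmul_schoolbook(uint64_t a, uint64_t b, uint64_t out[2]) {
    out[0] = out[1] = 0;
    for (int i = 0; i < 64; i++)
        if ((b >> i) & 1) {
            out[0] ^= a << i;
            if (i) out[1] ^= a >> (64 - i);
        }
}

/* Half-size schoolbook product, 32x32 -> 64 bits. */
static uint64_t clmul32(uint32_t a, uint32_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1) r ^= (uint64_t)a << i;
    return r;
}

/* One Karatsuba level: three 32x32 products instead of four. */
static void clmul_karatsuba(uint64_t a, uint64_t b, uint64_t out[2]) {
    uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
    uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);
    uint64_t z0 = clmul32(a0, b0);                      /* low  part   */
    uint64_t z2 = clmul32(a1, b1);                      /* high part   */
    uint64_t z1 = clmul32(a0 ^ a1, b0 ^ b1) ^ z0 ^ z2;  /* middle term */
    out[0] = z0 ^ (z1 << 32);
    out[1] = z2 ^ (z1 >> 32);
}
```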
In parallel, we also introduce a multiplication algorithm that takes only one masked input, the other input being a public value, as this algorithm is also necessary for our design. We denote it SecMult_partlymasked and its design is detailed in Appendix A.3. It is directly inspired by the Montgomery ladder technique.

Leveraging sparse polynomials
In BIKE, it is often possible to leverage the fact that some masked polynomials are stored in sparse notation. We thus introduce an extra multiplication gadget that takes one masked dense input and one masked sparse input. The multiplication technique uses a cyclic shift, denoted sec≫: the idea is to cyclically shift a masked dense polynomial by a masked shift amount. It is described and proved in Appendix A.5.
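Unmasked, the sparse-times-dense product reduces to XORing cyclic shifts of the dense operand, one per non-zero index of the sparse operand. This is a toy sketch (the block length RP = 13 and the name mul_sparse_dense are illustrative); the masked gadget performs the shifts with sec≫ on masked shift amounts.

```c
#include <assert.h>

#define RP 13 /* toy block length */

/* Product modulo X^RP - 1 of a dense polynomial (one byte per
   coefficient) with a sparse polynomial given by the positions of its
   non-zero coefficients: XOR of cyclic shifts of the dense operand. */
static void mul_sparse_dense(const unsigned char dense[RP],
                             const unsigned pos[], unsigned w,
                             unsigned char out[RP]) {
    for (unsigned k = 0; k < RP; k++) out[k] = 0;
    for (unsigned k = 0; k < w; k++)            /* one shift per index */
        for (unsigned j = 0; j < RP; j++)
            out[(j + pos[k]) % RP] ^= dense[j]; /* cyclic shift + XOR  */
}
```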

Hamming weight
We introduce a masked Hamming weight computation. It has been optimized and involves the masked bitslice algorithm presented in Algorithm 12. Similarly to Karatsuba, we denote by right and left the cut in length of the matrix: for example, if T ∈ F_2^{l×k}, then right(T) and left(T) ∈ F_2^{l×k/2}. T is a matrix that starts with one row and gains one more row per loop iteration (call to bitslice). We therefore initialize T_0 as a vector, and at each iteration T gains a row. These are among the main gadgets necessary for the realization of masked BIKE.
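The folding idea can be sketched, unmasked, as follows: the bits of x are treated as independent 1-bit counters, and the left half of the columns is repeatedly added onto the right half with ripple-carry adders, so each fold halves the column count and adds one counter row. All bounds and names are illustrative; the masked version replaces each AND/XOR by the corresponding gadget.

```c
#include <stdint.h>

/* Hamming weight of an array of 0/1 bytes, n a power of two (n <= 64). */
static unsigned bitsliced_hw(const uint8_t *x, int n)
{
    uint8_t cnt[16][64];           /* cnt[j][i] = bit j of counter i */
    int rows = 1, cols = n;
    for (int i = 0; i < n; i++) cnt[0][i] = x[i];
    while (cols > 1) {             /* fold left half onto right half */
        cols /= 2;
        uint8_t carry_row[64];
        for (int i = 0; i < cols; i++) {
            uint8_t carry = 0;
            for (int j = 0; j < rows; j++) {     /* ripple-carry add */
                uint8_t a = cnt[j][i], b = cnt[j][cols + i];
                cnt[j][i] = a ^ b ^ carry;
                carry = (a & b) | (carry & (a ^ b));
            }
            carry_row[i] = carry;
        }
        for (int i = 0; i < cols; i++) cnt[rows][i] = carry_row[i];
        rows++;                    /* T gains one row per fold */
    }
    unsigned y = 0;                /* recompose the binary counter */
    for (int j = 0; j < rows; j++) y |= (unsigned)cnt[j][0] << j;
    return y;
}
```

The final recomposition mirrors the y ← y ⊕ (T_{0,i} ≪ i) loop of the masked algorithm.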

Generation of random polynomials
The generation of sparse polynomials is performed using the Fisher-Yates technique, already introduced in Section 3.2. This procedure can be masked as presented in Algorithm 17. It uses sec rand, presented in Appendix A.4.
Algorithm 17 Fisher-Yates (SecFisherYates)
Require: s ∈ N, n ∈ N
Ensure: r ∈ Z_n^s, a randomly generated vector without repeated values
1: for i ← s − 1 down to 0 do
2:   Initialize i as a Boolean sharing of i
⋮
12: end for
13: return r

Theorem 11. The Fisher-Yates algorithm is d-NI.
Proof. The Fisher-Yates algorithm involves many dependency loops. Indeed, each random r_i is compared to all the previously derived ones. However, each value is refreshed before being used. Thus, the loop in lines 6 to 10 can itself be seen as a d-SNI gadget outputting r_i. Besides, the operations in lines 2 to 4 are d-NI. Hence, the outer loop can be seen as a sequential combination of d-NI gadgets and a d-SNI gadget for lines 6 to 10. Consequently, the algorithm is d-NI.
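Unmasked, the constant-time Fisher-Yates variant being masked here can be sketched as follows: draw l uniformly in [i, n), keep it unless it collides with an already chosen index, in which case take i itself, which cannot have been chosen yet since all later draws exceed i. The `rand_range` helper is a toy stand-in for sec rand.

```c
#include <stdint.h>
#include <stdlib.h>

static uint32_t rand_range(uint32_t lo, uint32_t n) /* uniform in [lo, n) */
{
    return lo + (uint32_t)(rand() % (n - lo));      /* toy RNG only */
}

/* Sample s distinct indices in [0, n), stored in r[0..s-1]. */
static void fisher_yates(uint32_t *r, int s, uint32_t n)
{
    for (int i = s - 1; i >= 0; i--) {
        uint32_t l = rand_range((uint32_t)i, n);
        int collide = 0;
        for (int j = i + 1; j < s; j++)   /* compare with previous draws */
            collide |= (r[j] == l);
        r[i] = collide ? (uint32_t)i : l; /* i itself is still free */
    }
}
```

The collision scan over all previous draws is exactly the dependency loop that the proof breaks with refresh gadgets.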

Masked polynomial inversion
A masked polynomial inversion is needed for inverting h_0 inside the key generation. It is presented in Algorithm 18.
We denote by sec pow a d-NI gadget that raises a polynomial to a given (known) power. Since we only raise to powers of 2, this boils down to permutations, as the underlying ring is built over F_2.
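Concretely, in F_2[x]/(x^r − 1) squaring is the Frobenius map f(x) → f(x^2), so on coefficients it is just the index permutation i → 2i mod r, a true permutation because r is odd in BIKE. This is why sec pow with a power-of-two exponent needs no masked multiplications: each share is permuted independently. A minimal unmasked sketch:

```c
#include <stdint.h>

/* out = f^2 in F2[x]/(x^r - 1), r odd; one coefficient per byte. */
static void sqr_mod(uint8_t *out, const uint8_t *f, int r)
{
    for (int i = 0; i < r; i++)
        out[(2 * i) % r] = f[i];   /* pure coefficient permutation */
}
```

For example, (1 + x + x^3)^2 = 1 + x^2 + x^6 in F_2[x]/(x^7 − 1), since all cross terms vanish in characteristic 2.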

Algorithm 18 SecInversion
⋮
  if the (i + 1)-th bit of n − 2 is 1 then
  ⋮
  end if
12: end for
13: y ← sec pow(y, 2)
14: return y

Theorem 12. The masked inversion algorithm is d-NI.
Proof. The first iteration of the algorithm is presented in Fig. 3. One can graphically conclude that each iteration is d-NI, as all the observations can be simulated with at most d shares of (f, y). Thus, the full loop is d-NI. In addition, the final operation is d-NI. And, since both f and y are initialized with the same input x, one of them must be refreshed to end up with a fully d-NI gadget.

Threshold and counters
The decoder needs the computation of a threshold and of counters, as presented in Algorithms 19 and 20. The threshold is an integer value that needs to be recomputed several times during decoding. Originally, the threshold is computed with floats, which is a concern for masking. We have therefore reduced it to simple operations on integers: the threshold equals max(⌊(T0 · S + T1)/2^{T2}⌋, T3). The procedure to mask it involves gadgets previously introduced, apart from sec max, a gadget that, given two masked values, computes the greatest. The sec max gadget is detailed in Algorithm 31 in Appendix A.7.
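A minimal sketch of this float-free formula as we read it: only an integer multiply, an add, a shift by T2 (flooring division by a power of two) and a max remain, all of which are easy to mask. The constants in the test are illustrative, not BIKE's real parameters.

```c
#include <stdint.h>

/* threshold = max(floor((T0*S + T1) / 2^T2), T3), integers only. */
static int64_t threshold(int64_t S, int64_t T0, int64_t T1,
                         int64_t T2, int64_t T3)
{
    int64_t t = (T0 * S + T1) >> T2;   /* shift = division by 2^T2 */
    return t > T3 ? t : T3;            /* sec max in the masked gadget */
}
```

Pre-scaling T0 and T1 by 2^{T2} is the standard way to fold a fixed-point slope and intercept into this shape.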

Algorithm 19 SecThreshold
Require: s ∈ F_2^r
Ensure: T ∈ Z, the threshold calculated from the syndrome
1: S ← sec hw(s)  ▷ Algorithm 15
2: T ← sec T(S)  ▷ Algorithm 20
3: return T

Algorithm 20 T computing (sec T)
Require: S ∈ Z, T0, T1, T2, T3 fixed parameters of the scheme
⋮

Ensure: the matrix containing the binary representations of the counters of each coefficient. We manipulate this matrix as two two-dimensional matrices, C0 and C1. Let matrix_zero_masking be the initialization of a d-sharing of a two-dimensional zero matrix. This algorithm uses a gadget that consists in filling a matrix with a value. This technical gadget does not present any difficulty and is detailed in Algorithm 32 in Appendix A.8.

Theorem 14. The counter computing algorithm is d-NI.
Proof. The procedure in lines 7 to 9 of Algorithm 21 is depicted in Fig. 4. One can see that all the loops are broken by a d-SNI refresh gadget. Thus, lines 7 to 9 can be seen as a d-NI gadget.

Grey Zone
The grey zone is an additional iteration of the decoder that is only performed during its first loop. It carries out the same operations as the classic decoder, but with an additional step using another threshold, in order to detect more false positions and to be able to catch the possible errors of some ambiguous positions.
It takes as input the black zone T_0, which is the matrix containing the counters minus the threshold. With it, we can calculate the grey zone T_1, which contains the counters minus the threshold plus τ.
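Unmasked, the black/grey classification described above can be sketched as follows: a position is "black" when its counter reaches the threshold and "grey" when it comes within τ of it, so grey positions get a second look in the extra iteration. τ = 3 as stated below; the flat layout and names are illustrative.

```c
#include <stdint.h>

/* Classify each of the r positions from its counter, threshold T, slack tau. */
static void black_grey(uint8_t *black, uint8_t *grey,
                       const int *counters, int r, int T, int tau)
{
    for (int i = 0; i < r; i++) {
        black[i] = counters[i] >= T;                       /* flip now   */
        grey[i]  = !black[i] && counters[i] >= T - tau;    /* re-examine */
    }
}
```

In the masked algorithm both comparisons are computed on shared counters and the results stay shared until the error vector is recombined.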
Theorem 15.The grey zone algorithm is d − NI.
Proof. We define a particular gadget called "block" for lines 11 to 17.
Ensure: e_0 ∈ F_2^r, e_1 ∈ F_2^r
1: Initialize τ as a Boolean sharing of 3  ▷ 3 is a fixed parameter
2: V ← sec fill(τ)  ▷ Algorithm 32
⋮  ▷ Coefficient-wise XOR
9: for l ← 0 to 1 do
10: sk ← refresh(sk)
⋮
e_0, e_1 ← refresh(e_0, e_1)
16: e_0 ← e_0 ⊕ v_0  ▷ Coefficient-wise XOR

The overall details of the dependencies are presented in Figs. 5 and 6. As illustrated in Fig. 5, the block gadget is d-NI. Indeed, the only dependency loop is broken by a d-SNI refresh algorithm. Let us now consider the full algorithm, and assume that an attacker has access to δ ≤ d observations on this gadget. We want to prove that all these δ observations can be perfectly simulated with at most δ shares of sk, T_0, e_0, e_1, s and V (the last one can be omitted as it is derived from a public parameter). To fix notations, let us consider the following distribution of the attacker's δ observations:
• δ_1 during the bitslice of Line 3,
• δ_2 during the refreshing of V,
• δ_3 during the bitslice of Line 5,
• δ_4 during the refresh of T_0 (split in two sub-gadgets in the figure),
• δ_5 during the ⊕ in Line 8,
• δ_6 during the ⊕ in Line 7,
• δ_7 during the refreshing of sk and s,
• δ_8 in the block with ℓ = 0,
• δ_9 in the block with ℓ = 1.
By definition of the d-probing model, we have Σ_{j=1}^{9} δ_j ≤ δ ≤ d. All the gadgets are proved d-NI and the refresh gadgets are d-SNI. We skip the progressive part of the proof and directly claim that all the observations made during the execution of the gadget can be perfectly simulated with
• δ_8 + δ_9 shares of e_0 and e_1 (each).
This can be verified with the help of the figure. All these numbers of shares are smaller than δ ≤ d, which concludes the proof.

Implementation
All the gadgets introduced in this paper have been implemented in a complete C code. Side-channel attacks are highly dependent on the chip on which the algorithm is executed, and it is true that assembly code is always the best practical solution. However, C code seems the best option to provide a multi-platform proof of concept. This code could be reused for future analyses and optimizations. The full code will be publicly available for code-checking and reproducibility; it can be found in the supplementary archive.

Sparse vs dense representation
Since most of the computations are polynomial operations performed on sparse objects, let us recall that we had two available options: the fully-dense implementation and the hybrid sparse-dense one. In the first case, we see the polynomials as dense (with a conversion of the keys during SecKeyGen) and use Karatsuba for the majority of the computations. In the other case, since a number of polynomials can be kept in sparse representation, we use SecMult sparsedense as much as possible. As presented in Fig. 7, our benchmarks show that while both approaches seem equivalent for one or two shares, a fully dense approach is indeed more relevant for higher orders.
One can conclude from our work that, for the moment (barring potential upcoming optimizations), the dense representation is more relevant. We therefore keep the dense representation for the rest of the benchmarks, as it scales better once the order reaches 2.

Detailed benchmarks
The code was benchmarked on an i7-4710MQ running at 2.5 GHz with 8 GB of RAM, compiled with gcc 12.2.0 and the -O3 flag. The given performances are obtained for NIST security level 1 (r = 12323). Identical experiments can provide data for the other security levels; according to our tests, the scaling is the same. Multiple benchmarks were performed and the results are listed in Table 4. We can notice that the performance of the gadgets depends on the performance of the multiplicative gadgets.

Bottlenecks The sparse-dense multiplication seems to be the biggest bottleneck of our implementation. An optimization of this gadget could lead to significant improvements in the complete scheme's performance. There is also room for optimization in Karatsuba, which, although it scales well, is called a large number of times in most parts of BIKE. One idea to improve these gadgets could be to optimize the last recursion level of Karatsuba. In the unmasked implementation, specific instructions are used, while in our masked implementation, only a naive multiplication is applied. The problem is that most known optimized techniques require arithmetic operations; thus, a masked form would require a mask conversion. Given the complexity of such conversions, this approach may end up being equivalent to our original naive technique. In the end, future work is still necessary to find new optimizations for this operation.
Similarly, the cyclic shift is performed here directly, while the reference implementation stores the polynomials in duplicate (contiguously) and only has to move its "window" to perform the shift. We could not see any way to keep this advantage in a masked form. This also explains the performance gap between the reference implementation and this implementation when the order equals 0.
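The reference trick that is lost under masking can be sketched as follows: storing the r coefficients twice back to back makes a cyclic shift by k a mere read from offset k, with no data movement at all. The fixed r = 16 and all names are illustrative.

```c
#include <stdint.h>

#define R 16                         /* illustrative small ring size */

typedef struct { uint8_t buf[2 * R]; } dup_poly;

/* Store the coefficients twice, contiguously. */
static void dup_init(dup_poly *p, const uint8_t *coeffs)
{
    for (int i = 0; i < R; i++)
        p->buf[i] = p->buf[R + i] = coeffs[i];
}

/* Coefficient i of the polynomial cyclically shifted by `shift`:
 * just a window move, valid while shift, i < R. */
static uint8_t dup_coeff(const dup_poly *p, int shift, int i)
{
    return p->buf[shift + i];
}
```

Under masking, each share would need its own duplicated buffer and, worse, the shift amount is itself shared, so the window offset cannot be used as a plain array index without leaking it.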

General performances for masked BIKE (fully-dense)
The performances and scaling of the scheme are detailed in Table 5 and Fig. 8.

Remark 7. RNG off refers to returning 0 instead of drawing a random integer. This allows measuring the cost of the calls to the RNG relative to the overall performance of the implementation.
We can see that the performance of masked BIKE as a function of the order is slightly above quadratic. This unoptimized implementation is still encouraging, as it leaves the door open for many possible scaling improvements. In fact, there are still a lot of possible optimizations, in particular on the cyclic shift and on the naive polynomial multiplication. Once optimized, the scaling will probably improve, especially since there is no Boolean-to-arithmetic conversion within the masked scheme.

Future Work

TVLA The next step would be to use TVLA verification techniques on our code to check that there are no apparent leaks.
More optimizations It is possible to significantly improve the performance of our implementation by optimizing two important basic gadgets: the naive multiplication (in the last level of the Karatsuba recursion) and the cyclic shift. As outlined above, these gadgets are the bottleneck of our implementation; thus, the impact on performance would be very high. The relevance of avoiding mask conversions may also be questioned if such conversions help gain orders of magnitude in performance, even though we do not currently believe that conversions would significantly help here. In addition, we think that further optimization could affect the gap between the sparse version and the dense version.
High-order attacks Attacking unprotected implementations with side-channel measurements is often not the best way to evaluate practical security. But, until now, no masked implementation of BIKE or other code-based schemes was available. This masked implementation is openly accessible and can serve as a target for elaborate high-order side-channel attacks.
A.4 Modular random number

This gadget draws a random integer between 0 and n − 1 with the same distribution as a modular reduction, without performing any division other than by a power of 2. We only need gadgets already introduced (masked multiplication, see Algorithm 26) and the shift, which is a linear operation.
Remark 8. It is assumed that the bits can be drawn safely, since the p bits can be drawn on each of the shares of the shared value. In the context of an implementation, the choice of algorithm for effectively drawing these bits is up to the developer.
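Unmasked, the division-free reduction reads as a single multiply-and-shift: drawing p uniform bits x, the value ⌊x · n / 2^p⌋ lies in [0, n) and has the same (slightly biased) multiset of probabilities as x mod n. The toy `rand()` source is illustrative only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Random value in [0, n) using only a multiply and a shift by p.
 * Requires 2^p >= n; both operations mask share-wise. */
static uint32_t rand_mod(uint32_t n, uint32_t p)
{
    uint64_t x = (uint64_t)rand() & ((1ull << p) - 1); /* p random bits */
    return (uint32_t)((x * n) >> p);                   /* no division   */
}
```

Even the extreme draw x = 2^p − 1 maps below n, since ((2^p − 1) · n) >> p = n − ⌈n/2^p⌉ < n.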

A.5 Cyclic shift
This is a masked version of the barrel shifter algorithm. We define SecCyclic_Shift as the function that shifts a masked polynomial by a public value. As it is only a linear operation, it is safe and not a concern.

⋮
v ← refresh(v)
6: end for
7: return X

Since we just initialize X with the binary representation of v, we only refresh v to avoid getting the same mask on two different lines. We then get the following theorem.
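The barrel-shifter pattern can be sketched unmasked on a single machine word standing in for the polynomial: rotate by each power of two and select the result with a mask derived from the corresponding bit of the shift amount, so the memory access pattern never depends on the (possibly secret) shift.

```c
#include <stdint.h>

/* Constant-time rotate-left of x by s (0 <= s < 64), barrel-shifter style. */
static uint64_t barrel_rotl(uint64_t x, uint32_t s)
{
    for (int j = 0; j < 6; j++) {               /* 2^6 = 64 stages */
        uint64_t bit = (s >> j) & 1;
        uint64_t m   = (uint64_t)0 - bit;       /* all-ones iff bit set */
        uint32_t k   = 1u << j;
        uint64_t rot = (x << k) | (x >> (64 - k));
        x = (rot & m) | (x & ~m);               /* constant-time select */
    }
    return x;
}
```

In the masked gadget each selection bit is itself a Boolean sharing, which is why the shifted polynomial must be refreshed between stages.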

A.8 Filling a matrix in masked form
Theorem 25. The fill algorithm is d-NI.

Σ_{j=1}^{6} δ_j ≤ δ ≤ d. Since Lines 12 and 13 are F_2-linear operations performed share by share, this computation verifies the d-NI property. In addition, all the gadgets are either d-NI or d-SNI, as specified in Table 2.

Ensure: the result of the bitsliced addition between X and y
1: for i := 0 to ℓ − 1 do
2:   r := y_i
3:   for j := 0 to k − 1 do
4:     (X_ij, r) ← SecHalf_Adder(X_ij, r)
⋮

Since both SecHalf_Adder and SecAdder are d-NI and all loop iterations use different or updated variables, their sequential combination leads to a d-NI algorithm. Hence the following theorem.

Theorem 6. The SecHalf_Bitslice and SecBitslice algorithms are d-NI.
12: t1 ← left1 ⊕ right1  ▷ Coefficient-wise XOR
13: t2 ← left2 ⊕ right2  ▷ Coefficient-wise XOR
14: z3 ← SecKaratsuba(t1, t2)
15: return z ← z1 ⊕ (z2 ≪ s/4) ⊕ (z3 ≪ s/2)  ▷ Coefficient-wise

Theorem 7. The Karatsuba algorithm is d-NI for any power of two s and any bound B ≤ s.

Proof. Let us prove this theorem by induction on the parameter s. If s ≤ B, the d-NI property is directly inherited from the d-NI property of SecPolymul (Theorem 21 in Appendix A.3). Let us assume that the Karatsuba algorithm is d-NI for some s > B and sketch a proof that it is d-NI for the next power of two, 2s. The algorithm first computes z1, z2 with d-NI gadgets. Then, the dependencies are broken by the d-SNI refresh before computing z3. Finally, the recombination of z1, z2 and z3 uses only coefficient-wise F_2-linear operations. Thus, the Karatsuba algorithm is d-NI for 2s, which concludes the proof.
▷ Coefficient-wise XOR
5: end for
6: return z

Theorem 8. The SecMult sparsedense algorithm is d-NI.

Proof. The gadgets sec ≫ and ⊕ are d-NI, and even though x is reused in each loop iteration, it is refreshed with a d-SNI gadget.

Algorithm 15 Hamming weight (sec hw)
Require: x ∈ F_2^n
Ensure: y, the Hamming weight of x
1: T_0 ← x  ▷ We initialize the first row of the matrix T with the vector x
2: j ← 1
3: for i ← n to 1 step i/2 do
4:   T ← SecBitslice(left(T), right(T))  ▷ Cut in length
5:   j ← j + 1
6: end for
7: y ← zero_masking()
8: for i ← 0 to j − 1 do
9:   y ← y ⊕ (T_{0,i} ≪ i)
10: end for
11: return y

Theorem 9. The Hamming weight algorithm is d-NI.

Proof. Since SecBitslice has been proved d-NI in Theorem 6 and all loops use updated variables, their composition leads to a d-NI algorithm.
▷ Coefficient-wise XOR
4: return s1

Since different variables are used in each of the function calls (all d-NI), we get the following theorem.

Theorem 10. The syndrome computing algorithm is d-NI.

Figure 3: Sub-structure of the polynomial inversion algorithm

5: return T

Since we perform a sequence of operations that are d-NI themselves, we can establish the following theorem.

Theorem 13. The computation of the threshold is d-NI.

During decoding, it is necessary to compute the number of unsatisfied parity-check equations. We present in Algorithm 21 a masked version of this routine. Let us denote by C ∈ F_2^{2×r×(|w/2|+1)}

Figure 4 :
Figure 4: Sub-structure of the counter algorithm

The rest of the algorithm is a sequence of d-NI gadgets (SecMult partlymasked, sec fill, SecBitslice), thus the full algorithm is d-NI.

Figure 5 :
Figure 5: Structure of one block

Figure 6 :
Figure 6: Structure of the grey zone gadget

Figure 8 :
Figure 8: The scaling of masked BIKE (with RNG on)

Algorithm 27
Modular random number (sec rand)
Require: n ∈ N*, p ∈ N*, 2^p ≥ n
Ensure: r, an integer between 0 and n − 1
1: r ← rand(2^p)  ▷ Draw p bits on each share
2: r ← SecMult partlymasked(r, n)
3: r ← r ≫ p  ▷ Shift on each share
4: return r

Theorem 20. The modular random number Algorithm 27 is d-NI.

Proof. Since p and n are public values, we do not need to mask them. The shift operation is d-NI since it operates on each share individually. SecMult partlymasked is d-NI, by the previous proof. Finally, the random draw is also d-NI since it operates on each share. Hence, the algorithm is d-NI.

Algorithm 28 SecPolymul: Naive polynomial multiplication (parameterized by B, the size of its inputs)
Require: x ∈ F_2^B, y ∈ F_2^B
Ensure: z = x · y ∈ F_2^{2B}
1: for i ← 0 to B − 1 do
2:   for j ← 0 to B − 1 do
3:     u ← sec &(x_i, y_j)
4:     z_{i+j} ← z_{i+j} ⊕ u

Since we only use an SNI gadget and, on the other hand, update the z vector, the algorithm is d-NI.

Theorem 21. The polynomial multiplication SecPolymul parameterized with B is d-NI.

Theorem 22 .
The secure cyclic shift algorithm is d − NI.

Algorithm 32
Fill matrix (sec fill)
Require: v ∈ Z^n
Ensure: X ∈ F_2^{k×(|n|+1)}, a matrix filled with the binary representation of v
1: for i ← 0 to k − 1 do
2:   for j ← 0 to |n| do
3:     X_{i,j} ← v[j]

A gadget is a probabilistic algorithm that takes shared and unshared input values and returns shared and unshared values. These security properties open the door for securely composing gadgets. An algorithm is d-NI provided all its gadgets are d-NI and all variables are used at most once as the argument of a gadget call other than refresh. Moreover, the algorithm is d-SNI if it is d-NI and one of the following holds:

Table 2 :
Security properties of the known gadgets.

Table 3 :
Security properties of the introduced gadgets.
IND-CCA masked implementation

The IND-CCA security of the scheme is achieved thanks to the Fujisaki-Okamoto transformation. This transformation consists in XORing the seed used to generate the secret with the hashed secret. This allows, during decryption, recovering the seed and thus checking whether the secret has been honestly generated. This transformation prevents active chosen-ciphertext attacks. In BIKE [ABB+22], the K, L and H hash functions (see Algorithm 3) are instantiated with SHAKE256 and SHA384. These functions have already been protected in the masked implementation of Saber (see [DKR+20] for more information about Saber) in [KDVB+22]. This framework is easily adaptable to BIKE without major modification. Masking is done in a similar way, keeping the same masking order.