Understanding binary-Goppa decoding

This paper reviews, from bottom to top, a polynomial-time algorithm to correct t errors in classical binary Goppa codes defined by squarefree degree-t polynomials. The proof is factored through a proof of a simple Reed–Solomon decoder, and the algorithm is simpler than Patterson's algorithm. All algorithm layers are expressed as Sage scripts. The paper also covers the use of decoding inside the Classic McEliece cryptosystem, including reliable recognition of valid inputs.


Introduction
This paper is aimed at a reader who
• is interested in how ciphertexts are decrypted in the McEliece cryptosystem,
• has arrived at a mysterious-sounding "Goppa decoding" subroutine, and
• wants to understand how this works without taking a coding-theory course.
A busy reader can jump straight to Algorithm 6.2 and Theorem 6.4 for a concise answer, highlighting the main mathematical objects inside the decoding process.
In more detail: The cryptosystem uses a large family of subspaces of the vector space F_2^n, namely "classical binary Goppa codes" defined by squarefree degree-t polynomials. This paper reviews a simple polynomial-time "t-error-correction" algorithm for these codes: an algorithm that recovers a vector c in a specified subspace given a vector that agrees with c on at least n − t positions. Components of the algorithm are introduced in a bottom-up order: Sections 3, 4, 5, and 6 present, respectively, "interpolation", finding "approximants", interpolation with errors ("Reed–Solomon decoding"), and Goppa decoding.
1.1. Hasn't this been done already? Goppa codes are more than 50 years old. There are many descriptions of Goppa decoders in the literature. Self-contained descriptions appear in, e.g., van Tilborg's coding-theory textbook [77, Section 4.5, "A decoding algorithm"], a Preneel–Bosselaers–Govaerts–Vandewalle paper on a software implementation of the McEliece cryptosystem [67, Section 5.3], a Ghosh–Verbauwhede paper on a constant-time hardware implementation of the cryptosystem [42, Algorithm 3], and the Overbeck–Sendrier survey of code-based cryptography [61, pages 139-140].

This work was funded by the Intel Crypto Frontiers Research Center; by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) as part of the Excellence Strategy of the German Federal and State Governments, EXC 2092 CASA, 390781972 "Cyber Security in the Age of Large-Scale Adversaries"; by the U.S. National Science Foundation under grant 2037867; and by the Cisco University Research Program. "Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation" (or other funding agencies). Permanent ID of this document: 8561e73dab75d01a6dd6bf542594ddac03cdbe6e. Date: 2022.08.16.
All of these sources, and many more, describe an algorithm introduced by Patterson [64, Section V] to correct t errors for binary Goppa codes defined by squarefree degree-t polynomials. McEliece's paper introducing the McEliece cryptosystem [56] had also pointed to Patterson's algorithm.
However, Patterson's algorithm isn't the simplest fast binary-Goppa decoder. A side issue here is that there are tradeoffs between simplicity and the number of errors corrected (which in turn influences the required McEliece key size), as variations of the decoder illustrate; the decoder presented here also generalizes from F_2 to F_q. For a broader audience, one can reduce to the previous sentence by saying "Take the following course on coding theory". But it's more efficient for the audience to take a minicourse focusing on this type of decoder, and I haven't found any such minicourse in the literature.
To summarize, this paper is a general-audience introduction to a simple t-error decoder for binary Goppa codes defined by squarefree degree-t polynomials, with the proof factored through a proof of a t-error Reed-Solomon decoder.
Each algorithm layer is presented as a script in the Sage [76] mathematics system rather than as pseudocode. The scripts use Sage's built-in support for fields, matrices, and polynomials. The scripts do not use Sage's functions for interpolation (lagrange_polynomial), the Berlekamp–Massey algorithm, etc.
As context, Section 8 explains how the Classic McEliece cryptosystem uses a Goppa decoder. In this context, it is important to reliably recognize invalid ciphertexts. Most descriptions of decoders in the literature simply assume that the input vector has at most t errors, but for cryptography one has to verify the input vector. This paper includes various efficient characterizations of vectors having at most t errors (Theorems 5.5, 6.5, and 7.4), and an analysis of safe options for recognizing valid ciphertexts (Sections 8.3 and 8.4).
Finally, this paper includes extensive pointers to the literature, primarily to give appropriate credit but also to point the reader to further material explaining how to turn this algorithm into today's state-of-the-art software.
1.3. Acknowledgments. Thanks to Hovav Shacham for pointing out an error in the first version of this paper; see Section 7.3. Thanks to Tanja Lange and Alex Pellegrini for their comments.

Polynomials
This section reviews the definition of the polynomial ring k[x] over a field k and the necessary properties of polynomials.
Normally r • s is abbreviated rs, and r + (−s) is abbreviated r − s.

Multiples.
Let R be a commutative ring. The notation uR, for u ∈ R, means the set {uq : q ∈ R}. The notation uR + vR, for u, v ∈ R, means the set {uq + vr : q, r ∈ R}.
2.4. Units. The notation R^* means {u ∈ R : 1 ∈ uR}; i.e., u ∈ R^* exactly when some v ∈ R satisfies uv = 1. The elements of R^* are called the units of R.
In other words, an element of a field is a unit if and only if it is nonzero.
For example, the set {0, 1} with −, +, • defined as arithmetic modulo 2 is a field, denoted F_2. As another example, the set Q of rational numbers with its usual 0, 1, −, +, • is a field.
2.6. Vector spaces. Let k be a field. A k-vector space is a set V with an element 0, a unary operation −, a binary operation +, and, for each λ ∈ k, a scalar-multiplication operation v ↦ λv, satisfying the usual axioms.

2.7. The standard n-dimensional vector space. Let n be a nonnegative integer. The set k^n = {(v_1, ..., v_n) : v_1, ..., v_n ∈ k} is a k-vector space under coordinatewise operations.

2.8. Linear maps. Let k be a field, and let V, W be k-vector spaces. A k-linear map from V to W is a function from V to W preserving 0, −, +, •: i.e., a function ϕ satisfying ϕ(0) = 0, ϕ(−v) = −ϕ(v), ϕ(v + w) = ϕ(v) + ϕ(w), and ϕ(λv) = λϕ(v). This is the universal-algebra definition of a k-linear map as a k-vector-space morphism. This is equivalent to a shorter definition that omits some of the conditions. If n, m ∈ Z with n > m ≥ 0 then any k-linear map from k^n to k^m must map some nonzero input to zero.

2.9. Polynomials. Let k be a field. By definition k[x] is the set of vectors (f_0, f_1, ...) with all nonnegative integers as indices, f_i ∈ k for each nonnegative integer i, and {i : f_i ≠ 0} finite. If one drops the requirement that {i : f_i ≠ 0} is finite then one obtains the power-series ring k[[x]], but the reader can safely focus on k[x] for this paper.
One can also write f as the infinite sum Σ_{i≥0} f_i x^i; only finitely many terms here are nonzero.

2.15. Coefficients. If f = (f_0, f_1, ...) ∈ k[x] and i ∈ Z then the coefficient of x^i in f means the entry f_i for i ≥ 0, or 0 for i < 0. (The case i < 0 arises in the proof of Theorem 4.1 if t > deg A.) One conventionally hides the formal definition of a polynomial as a vector: rather than constructing a polynomial f as (f_0, f_1, ...) and referring to f_i as the entry at position i in f, one constructs f as Σ_i f_i x^i and refers to f_i as the coefficient of x^i in f.
One has (fg)' = f'g + fg', where f', g', (fg)' are the derivatives of f, g, fg respectively; this is the product rule.

Shifts
If α ∈ k then the map f ↦ f(x + α) is a ring morphism from k[x] to itself. This map preserves degrees: deg f(x + α) = deg f. This map also preserves derivatives: the derivative of f(x + α) is f'(x + α), where f' is the derivative of f. One can unify evaluation and shifts into a more general evaluation operation, but this generality is not necessary for this paper.
2.23. Quotients and remainders. If f, g ∈ k[x] and g ≠ 0 then there are unique q, r ∈ k[x] such that f = gq + r and deg r < deg g. If r = 0 then the notation f/g means q.
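As a concrete illustration of quotients and remainders, here is a minimal Python sketch of division with remainder in k[x], taking k = Q via exact fractions. The list-of-coefficients representation and the name poly_divmod are illustrative, not from the paper (the paper's scripts use Sage's built-in polynomial arithmetic).

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Quotient and remainder in k[x], here with k = Q via exact fractions.
    Polynomials are coefficient lists: index i holds the coefficient of x^i."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while len(g) > 1 and g[-1] == 0:
        g.pop()
    assert any(g), "division by the zero polynomial"
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    while True:
        while len(r) > 1 and r[-1] == 0:
            r.pop()
        if len(r) < len(g) or not any(r):
            break
        shift = len(r) - len(g)
        coef = r[-1] / g[-1]          # cancel the leading coefficient
        q[shift] = coef
        for i, gc in enumerate(g):
            r[i + shift] -= coef * gc
    return q, r

# x^3 + 2x + 1 = (x - 1)(x^2 + x + 3) + 4
q, r = poly_divmod([1, 2, 0, 1], [-1, 1])
```

Since deg r < deg g at the end, the pair (q, r) is exactly the unique quotient and remainder described above.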
2.24. Unique factorization. The ring k[x] is a unique-factorization domain. In particular, if f ∈ k[x] has roots α_1, ..., α_n ∈ k, and α_1, ..., α_n are distinct, then (x − α_1)···(x − α_n) divides f. For f, g ∈ k[x], not both zero, there is a unique monic d ∈ k[x] with dk[x] = fk[x] + gk[x]; this d is called the greatest common divisor of f and g, written gcd{f, g}. One has f, g ∈ dk[x]. A polynomial f ∈ k[x] is squarefree when it is not divisible by the square of any nonconstant polynomial in k[x]. Equivalently, f is not divisible by the square of any irreducible element of k[x]. Equivalently, gcd{f, f'} = 1 where f' is the derivative of f.

Interpolation
This section explains how to recover a polynomial f ∈ k[x] with deg f < n, given (f(α_1), ..., f(α_n)). Here α_1, ..., α_n are distinct elements of k. See Section 5 for a generalization that handles as many as t errors in the input vector, at the expense of requiring deg f < n − 2t.
The formula for ϕ in Theorem 3.1 is usually called the "Lagrange interpolation formula".However, Waring [78] published the same formula earlier.
Theorem 3.1 (direct interpolation). Let n be a nonnegative integer. Let k be a field. Let α_1, ..., α_n be distinct elements of k. Let r_1, ..., r_n be elements of k. Define ϕ = Σ_i r_i Π_{j≠i} (x − α_j)/(α_i − α_j). Then ϕ is the unique element of k[x] with deg ϕ < n such that ϕ(α_i) = r_i for each i.

Proof. By construction ϕ is a sum of n terms, each term having degree at most n − 1 (more precisely, degree n − 1 if r_i ≠ 0, otherwise degree −∞), and hence has degree at most n − 1. The term for i vanishes at each α_j with j ≠ i and has value r_i at α_i, so ϕ(α_i) = r_i. For uniqueness: if ψ ∈ k[x] also satisfies deg ψ < n and ψ(α_i) = r_i for each i, then ϕ − ψ, if nonzero, has degree at most n − 1 and hence at most n − 1 roots, but it visibly has the distinct roots α_1, ..., α_n, contradiction.

An alternative is to interpolate recursively: one finds g with g(α_i) = (r_i − r_1)/(α_i − α_1) for i ≥ 2, and then takes f = r_1 + (x − α_1)g. This method is more complicated than direct interpolation to express as a concise formula but also costs Θ(n^2).
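To make Theorem 3.1 concrete, here is a small Python sketch of direct interpolation over the prime field F_13 (chosen only for illustration; the paper works over arbitrary fields, and over F_{2^m} later). The helper names are hypothetical; the paper's own scripts use Sage.

```python
P = 13  # a small prime field F_13, purely for illustration

def pmul(f, g):
    """Multiply two polynomials over F_P (coefficient lists, index i = x^i)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def peval(f, x):
    """Evaluate f at x via Horner's rule."""
    r = 0
    for c in reversed(f):
        r = (r * x + c) % P
    return r

def interpolate(alphas, rs):
    """Direct interpolation (Theorem 3.1): the unique f with deg f < n and
    f(alpha_i) = r_i, built as sum_i r_i prod_{j!=i} (x-alpha_j)/(alpha_i-alpha_j)."""
    n = len(alphas)
    f = [0] * n
    for i in range(n):
        term, denom = [1], 1
        for j in range(n):
            if j != i:
                term = pmul(term, [(-alphas[j]) % P, 1])   # times (x - alpha_j)
                denom = denom * (alphas[i] - alphas[j]) % P
        coef = rs[i] * pow(denom, -1, P) % P               # modular inverse
        for d, c in enumerate(term):
            f[d] = (f[d] + coef * c) % P
    return f

# Recover 5 + x^2 (coefficients [5, 0, 1, 0]) from its values at 0, 1, 2, 3.
f = interpolate([0, 1, 2, 3], [5, 6, 9, 1])
```

This is the Θ(n^2) direct method; the n^{1+o(1)} algorithms mentioned below replace the double loop with fast arithmetic.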
There is an extensive literature on algorithms using n^{1+o(1)} operations in k, not just for interpolation but also for multiplication (multiplying Σ_{0≤i<n} f_i x^i by Σ_{0≤i<n} g_i x^i), division, and other basic operations. See generally [6].
One particularly fast case is interpolating f from its values at every point in a finite field k, using various types of "fast Fourier transforms". A difficulty here is that each of these transforms uses a standard order of points, while α_1, ..., α_n are in a secret order inside the McEliece cryptosystem. There are algorithms to apply a secret permutation without using secret array indices; see generally [10]. It's useful to vary the weights put on the two vector components: let t be a nonnegative integer, and consider the lattice k[x](A, 0) + k[x](B, x^{deg A − 2t − 1}) ⊆ k[x]^2.

Approximants
The point of this section is to find, inside this lattice, a minimum-length nonzero vector (aB − bA, a x^{deg A − 2t − 1}).
(If 2t ≥ deg A then there's a denominator here. One can manually track weights of polynomials to avoid ever having to consider denominators; this is how the theorems below are phrased. One can instead allow denominators, dropping the requirement of staying inside k[x]^2. Alternatively, one can clear denominators by rescaling the lattice. Or one can simply prohibit this case; such large values of t aren't of interest for the application to decoding.) Theorem 4.1 says that one can arrange for both aB − bA and a x^{deg A − 2t − 1} to have degree at most deg A − t − 1. This also forces b to have degree below t. (Otherwise, since deg B < deg A, one has deg(bA) ≥ deg A + t > deg(aB), so deg(aB − bA) = deg(bA) ≥ deg A + t, contradiction.) One can also take a, b to be coprime; then, by Theorem 4.2, any lattice vector of degree at most deg A − t − 1 must be a multiple of this particular vector.
Why take a minus sign on b? Why multiply a by B and b by A, rather than a by A and b by B? Answer: small aB − bA means that the rational function b/a is close to B/A. This rational function b/a has small height, meaning that its numerator and denominator are small. The perspective of small-height rational approximations has played an important role in the development of fast algorithms in this area.

Theorem 4.1 (existence of approximants). Let t be a nonnegative integer. Let k be a field. Let A, B ∈ k[x] with deg B < deg A. Then there exist a, b ∈ k[x] with a ≠ 0, gcd{a, b} = 1, deg a ≤ t, and deg(aB − bA) ≤ deg A − t − 1.

Proof. Define n = deg A. Consider the following k-linear map from k^{2t+1} to k^{2t}: the input is a vector (a_0, a_1, ..., a_{t−1}, a_t, b_0, b_1, ..., b_{t−1}); the output entries are the coefficients of x^{n−t}, x^{n−t+1}, ..., x^{n+t−1} in aB − bA, where a = Σ_i a_i x^i and b = Σ_i b_i x^i. The input dimension 2t + 1 exceeds the output dimension 2t, so there is a nonzero input that maps to zero. Note that a ≠ 0 for such an input: otherwise bA would have degree at most n − t − 1, while deg(bA) ≥ n for b ≠ 0, forcing b = 0 too. Dividing (a, b) by gcd{a, b} preserves all of the stated conditions.

Theorem 4.2 (the best-approximation property of approximants). Let t be a nonnegative integer. Let k be a field. Let A, B, a, b, c, d be elements of k[x] with deg B < deg A, a ≠ 0, gcd{a, b} = 1, deg a ≤ t, deg(aB − bA) ≤ deg A − t − 1, deg c ≤ t, and deg(cB − dA) ≤ deg A − t − 1. Then (c, d) ∈ k[x](a, b).

More sophisticated extended-gcd algorithms use t^{1+o(1)} operations. See [6, Section 21]. Applying a sequence of 2t "divsteps", taking n = 2t in [16, Theorems A.1 and A.2], uses t^{1+o(1)} operations with the "jump" algorithms in [16] while avoiding the timing variability of polynomial division.

4.6. Approximants as ratios. With the following definition, the conclusion of Theorem 4.1 is that there is an approximant to B/A at degree t. This definition would also slightly compress the statement of Theorem 4.2 and the statements of some theorems later in this paper. For the benefit of a reader looking at just one theorem, this paper avoids using this definition in theorem statements, but readers exploring the literature may find this definition useful. Analogous comments apply to, e.g., Definition 5.8 below. For simplicity the theorems in this section were stated specifically for A, B ∈ k[x], but the concepts and proofs do not require this. This paper does not define k((x^{−1})), but instead notes that k((x^{−1})) contains the field k(x) of rational functions in x, and that k(x) in turn contains the polynomial ring k[x], so readers not familiar with k((x^{−1})) can substitute k[x] for k((x^{−1})) in the definition.

The conditions on (a, b) depend on (A, B) essentially only through the ratio B/A; this is why it is safe to describe the input as B/A rather than (A, B). As for the output, knowing the ratio b/a and knowing gcd{a, b} = 1 does not exactly determine the pair (a, b), but the only ambiguity is that one can replace (a, b) by (λa, λb) for λ ∈ k^*; this replacement does not affect the conditions on deg a, deg b, and deg(aB − bA).
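The extended-Euclid computation of an approximant can be sketched in plain Python over the prime field F_13 (chosen only for illustration; the paper's scripts use Sage). The stopping rule, first remainder of degree below deg A − t, matches the degree bounds of Theorem 4.1. All names and the example polynomials are illustrative.

```python
P = 13  # small prime field for illustration

def deg(f):
    nz = [i for i, c in enumerate(f) if c % P]
    return max(nz) if nz else float('-inf')

def trim(f):
    d = deg(f)
    return [0] if d == float('-inf') else [c % P for c in f[:d + 1]]

def pmul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def psub(f, g):
    out = [0] * max(len(f), len(g))
    for i, c in enumerate(f): out[i] = c % P
    for i, c in enumerate(g): out[i] = (out[i] - c) % P
    return out

def pdivmod(f, g):
    f, g = trim(f), trim(g)
    q = [0] * max(len(f) - len(g) + 1, 1)
    inv = pow(g[-1], -1, P)
    while deg(f) >= deg(g):
        s = deg(f) - deg(g)
        c = f[deg(f)] * inv % P
        q[s] = c
        for i, b in enumerate(g):
            f[i + s] = (f[i + s] - c * b) % P
    return trim(q), trim(f)

def approximant(A, B, t):
    """Extended Euclid on (A, B), stopped at the first remainder of degree
    < deg A - t; returns (a, b) with deg a <= t, gcd{a, b} = 1, and
    a*B - b*A equal to that remainder (the degree bounds of Theorem 4.1)."""
    r0, r1 = trim(A), trim(B)
    u0, u1 = [0], [1]   # invariant: r_j = v_j*A + u_j*B
    v0, v1 = [1], [0]
    while deg(r1) >= deg(A) - t:
        qq, rr = pdivmod(r0, r1)
        r0, r1 = r1, rr
        u0, u1 = u1, psub(u0, pmul(qq, u1))
        v0, v1 = v1, psub(v0, pmul(qq, v1))
    return u1, [(-c) % P for c in v1]   # a = u, b = -v

A = [1, 0, 0, 0, 0, 0, 1]   # x^6 + 1, an arbitrary example
B = [3, 1, 4, 1, 5, 9]      # arbitrary with deg B < deg A
a, b = approximant(A, B, 2)
```

The cofactor bookkeeping (u, v) is exactly the "extended" part of extended gcd; stopping partway through the remainder sequence is what turns gcd computation into approximant computation.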
4.8. History. Euclid's subtractive algorithm [36, Book VII, Propositions 1-2; translation: "the less of the numbers AB, CD being continually subtracted from the greater"] recognizes coprime integers, and, more generally, computes the gcd of two integers.
What is typically called Euclid's algorithm (see [50, Section 4.5.2, text before Algorithm E] for an argument that this must be what Euclid had in mind) is a variant that iterates (A, B) ↦ (B, A mod B). This is much faster than the original algorithm when A/B is large. This version also has a polynomial analogue: Stevin [71, page 241 of original, page 123 of cited PDF] computed polynomial gcd by iterating (A, B) ↦ (B, A mod B).
According to [22, page 3], an extended-gcd algorithm computing solutions to aB − bA = 1, for coprime integers A, B, is due to Aryabhata around the 6th century, and the forward recurrence relation for coefficients in the extended algorithm-in other words, numerators and denominators of convergents to a continued fraction-is due to Bhascara in the 12th century.
Lagrange [53] used convergents to continued fractions of rational functions as small-height approximations to power series.Kronecker [51, pages 118-119 of cited PDF] gave both the continued-fraction construction and ("in directer Weise") the linear-algebra construction.Consequently, it seems reasonable to credit Theorem 4.1 to Lagrange, but the short proof to Kronecker.Small-height approximations to power series are often miscredited to [62] under the name "Padé approximants".
An earlier paper of Lagrange [52, pages 723-728 of cited URL] had described, in the integer case, an algorithm for basis reduction for rank-2 lattices, in the context of simplifying quadratic forms rather than as a perspective on extended-gcd computations. Lagrange reduction is often miscredited to [41] under the name "Gauss reduction".
In coding theory, finding an approximant is called "solving the key equation"; the "key equation" is, by definition, the congruence d−aB ∈ Ak[x] where deg a ≤ t and deg d < deg A − t.Decoding algorithms are typically factored through this concept, and often the proofs are factored through continued-fraction facts; when the continued-fraction machinery is stripped away, those facts boil down to Theorem 4.2.For the more complicated setting of list-decoding algorithms, short vectors in arbitrary-rank lattices often appear as an abstraction layer; see, e.g., [21], [7], [32], [8], and [9].

Interpolation with errors
This section explains how to recover a polynomial f ∈ k[x] with deg f < n − 2t, given a vector that matches (f(α_1), ..., f(α_n)) on at least n − t positions. Here α_1, ..., α_n are distinct elements of k. The special case t = 0 of this problem was handled in Section 3, and is used as a subroutine for handling the general case.

5.2. An interpolation-with-errors algorithm. Algorithm 5.3 recovers f given (n, t, k, α, r), where r is a vector with wt(r_1 − f(α_1), ..., r_n − f(α_n)) ≤ t. The algorithm has three steps:
• Interpolate B ∈ k[x] with deg B < n from B(α_i) = r_i as in Section 3, and define A = Π_i (x − α_i).
• Compute an approximant b/a to B/A at degree t as in Theorem 4.1.
• Compute f = B − bA/a.
Beware that Sage's degree function is not the same as the conventional degree function for polynomials: on input 0, it returns −1 rather than −∞. This is why Algorithm 5.3 includes a separate test for aB − bA = 0.

Theorem 5.4 (interpolation with errors). Let n, t be nonnegative integers. Let k be a field. Let α_1, ..., α_n be distinct elements of k.
The conditions of Theorem 4.2 then pin down (a, b) up to scaling. To see that e_i ≠ 0 exactly when a(α_i) = 0: one has A(α_i) = 0, so if a(α_i) = 0 then, by Bernoulli's rule, (A/a)(α_i) = A'(α_i)/a'(α_i) ≠ 0, where a', A' are the derivatives of a, A respectively; also b(α_i) ≠ 0 since gcd{a, b} = 1, so e_i ≠ 0.

Theorem 5.5 (checking interpolation with errors). Let n, t be nonnegative integers. Let k be a field. Let α_1, ..., α_n be distinct elements of k.
The condition deg(aB − bA) < n − 2t + deg a here cannot be weakened to deg(aB − bA) < n − t. Consider, e.g., n = 3; t = 1; any field k with #k ≥ 3; distinct α_1, α_2, α_3; and a received vector with three pairwise distinct entries at positions 1, 2, 3 respectively: then there is no polynomial f with deg f < 1 that matches more than one of those values.
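The three steps of the algorithm can be sketched end to end in Python over F_13, with n = 8 and t = 2, decoding two injected errors. This is a hedged illustration with hypothetical helper names (the paper's own scripts are in Sage); the division in the last step is exact precisely in the at-most-t-errors case.

```python
P = 13  # small prime field; n = 8 evaluation points, t = 2 errors

def deg(f):
    nz = [i for i, c in enumerate(f) if c % P]
    return max(nz) if nz else float('-inf')

def trim(f):
    d = deg(f)
    return [0] if d == float('-inf') else [c % P for c in f[:d + 1]]

def pmul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def psub(f, g):
    out = [0] * max(len(f), len(g))
    for i, c in enumerate(f): out[i] = c % P
    for i, c in enumerate(g): out[i] = (out[i] - c) % P
    return out

def peval(f, x):
    r = 0
    for c in reversed(f):
        r = (r * x + c) % P
    return r

def pdivmod(f, g):
    f, g = trim(f), trim(g)
    q = [0] * max(len(f) - len(g) + 1, 1)
    inv = pow(g[-1], -1, P)
    while deg(f) >= deg(g):
        s = deg(f) - deg(g)
        c = f[deg(f)] * inv % P
        q[s] = c
        for i, b in enumerate(g):
            f[i + s] = (f[i + s] - c * b) % P
    return trim(q), trim(f)

def interpolate(alphas, rs):
    n = len(alphas)
    f = [0] * n
    for i in range(n):
        term, denom = [1], 1
        for j in range(n):
            if j != i:
                term = pmul(term, [(-alphas[j]) % P, 1])
                denom = denom * (alphas[i] - alphas[j]) % P
        coef = rs[i] * pow(denom, -1, P) % P
        for d, c in enumerate(term):
            f[d] = (f[d] + coef * c) % P
    return f

def approximant(A, B, t):
    r0, r1 = trim(A), trim(B)
    u0, u1 = [0], [1]   # invariant: r_j = v_j*A + u_j*B
    v0, v1 = [1], [0]
    while deg(r1) >= deg(A) - t:
        qq, rr = pdivmod(r0, r1)
        r0, r1 = r1, rr
        u0, u1 = u1, psub(u0, pmul(qq, u1))
        v0, v1 = v1, psub(v0, pmul(qq, v1))
    return u1, [(-c) % P for c in v1]

# Step 0: a codeword with two injected errors.
alphas = list(range(8))
f_true = [2, 3, 0, 1]                       # deg f < n - 2t = 4
r = [peval(f_true, x) for x in alphas]
r[1] = (r[1] + 5) % P                       # error at position 1
r[6] = (r[6] + 1) % P                       # error at position 6

# Step 1: interpolate B with B(alpha_i) = r_i; define A = prod (x - alpha_i).
A = [1]
for x in alphas:
    A = pmul(A, [(-x) % P, 1])
B = interpolate(alphas, r)

# Step 2: approximant b/a to B/A at degree t = 2.
a, b = approximant(A, B, 2)

# Step 3: f = B - b*A/a = (a*B - b*A)/a, exact division (Theorem 5.4).
f_dec, rem = pdivmod(psub(pmul(a, B), pmul(b, A)), a)
```

The roots of a among the α_i mark the error positions, matching the {i : e_i ≠ 0} = {i : a(α_i) = 0} characterization discussed below.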

5.6. More algorithms: varying the pair (A, B). One can vary the choice of (A, B) while preserving the ratio B/A: e.g., one can take A = 1 and B = Σ_{s≥0} x^{−s−1} Σ_i r_i α_i^s / Π_{j≠i} (α_j − α_i). Formally, this requires defining k((x^{−1})); but the terms in B/A after x^{−2t} do not matter for decoding, so one can truncate the series accordingly. If deg f < n − 2t and wt e ≤ t with e = (r_1 − f(α_1), ..., r_n − f(α_n)) then {i : e_i ≠ 0} = {i : a(α_i) = 0} for any such (a, b). If one assumes that F_2 ⊆ k and that e ∈ F_2^n then knowing {i : e_i ≠ 0} is enough information to reconstruct e and thereby f. To instead handle arbitrary e ∈ k^n, one can use any of these variants of (A, B) to compute (a, b), and then return to the original (A, B) to apply the formula f = B − bA/a in Theorem 5.4.

5.7. Reed–Solomon codes. The set of vectors (f(α_1), ..., f(α_n)) is called a Reed–Solomon code; see Definition 5.8. This is a subspace of the k-vector space k^n. Each vector in the code is called a codeword. With this terminology, Theorem 5.4 recovers a Reed–Solomon codeword from a vector that matches the codeword on at least n − t positions. One can similarly define Goppa codes in Section 6.

Definition 5.8. Let n, t be nonnegative integers. Let k be a field. Let α_1, ..., α_n be distinct elements of k. Then C = {(f(α_1), ..., f(α_n)) : f ∈ k[x], deg f < n − 2t} is a Reed–Solomon code of length n over k.

5.9. History. Reed–Solomon [68] suggested encoding a polynomial f ∈ k[x] with deg f < n − 2t as (f(α_1), ..., f(α_n)) for distinct α_1, ..., α_n, so as to be able to recover f even if t vector entries are corrupted. The point is that the code determines f even in the presence of t errors: two distinct polynomials of degree below n − 2t agree on fewer than n − 2t points, so the corresponding codewords differ on more than 2t positions. This raises the question of how efficiently one can decode ≤ t errors in C, i.e., recover (e, f) from e + (f(α_1), ..., f(α_n)).
Assume n > 2t.Prange's "information-set decoding" [66] interpolates f from n − 2t values at selected positions in the input vector, checks the remaining values of f to deduce e, and, if e has the wrong weight, tries another selection of n − 2t positions.This takes polynomial time if t is close enough to 0 or n/2, but is much slower in general.Reed and Solomon did not have the idea of checking the weight of e: they had instead suggested trying many selections of n − 2t positions to find the most popular choice of f , and relying on an upper bound for how often any particular incorrect choice could appear.
Forney [38, Chapter 4] (see also [39]) introduced a polynomial-time decoding algorithm for Reed–Solomon codes. Forney's algorithm simplified and extended an algorithm by Gorenstein and Zierler [44], which handled the special case {α_1, ..., α_n} = k^*. The latter algorithm extended an algorithm by Peterson [65], which handled the following special case: F_2 ⊆ k, each f(α_j) is in F_2, and e ∈ F_2^n. The Peterson–Gorenstein–Zierler–Forney algorithm is bottlenecked by matrix operations that, when carried out in a simple way, use n^{3+o(1)} operations in k, assuming n ∈ t^{1+o(1)}. The exponent for generic matrix operations was later reduced below 3 (starting with exponent log_2 7 for matrix multiplication by Strassen [72], along with the same exponent for solving linear equations under various nonsingularity constraints), but it turns out that one can obtain much better decoding speeds using the structure of these particular matrices.
Berlekamp [5] introduced a decoding algorithm using just n^{2+o(1)} operations instead of n^{3+o(1)} operations; the main work inside the algorithm is polynomial arithmetic rather than matrix arithmetic. Massey [55] streamlined Berlekamp's algorithm and factored the algorithm into two layers, where the top layer is a decoder and the bottom layer is a subroutine for "shift register synthesis". The subroutine is called the Berlekamp–Massey algorithm.
Sugiyama–Kasahara–Hirasawa–Namekawa [75] built an n^{2+o(1)} algorithm for Reed–Solomon decoding on top of an extended-gcd computation. Algorithms using just n^{1+o(1)} operations were already known for gcd (see [6, Section 21.6] for history) and for all other necessary subroutines; these algorithms were applied to Reed–Solomon decoding by Justesen [48] and independently Sarwate [69], reducing the costs of decoding to n^{1+o(1)}.
It turned out that Berlekamp decoders and Sugiyama–Kasahara–Hirasawa–Namekawa decoders are equivalent: Mills [57] pointed out that "shift register synthesis" is the same as the problem of finding approximants, the problem of finding (a, b) in Theorem 4.1. See also [79] for how the result after each polynomial division inside an extended-gcd computation appears inside the Berlekamp–Massey algorithm; [34] for an extended-gcd explanation of all further quantities inside the Berlekamp–Massey algorithm; and [16, Appendix C] for a reformulation in terms of "divsteps". In a nutshell, the polynomials in the Berlekamp–Massey algorithm are polynomials in an extended-gcd computation but with coefficients in reverse order.
This does not mean that all Reed–Solomon decoders are the same. See, for example, Section 5.6 regarding different choices of (A, B); the choice of (A, B) in Theorem 5.4 was published by Shiozaki [70, Section III] and later Gao [40]. For the problem of computing (a, b) in Theorem 4.1, algorithms in the literature have costs ranging from n^{3+o(1)} down through n^{1+o(1)}. A "systematic" Reed–Solomon code represents a polynomial f of degree below n − 2t as the values (f(α_1), ..., f(α_{n−2t})) ∈ k^{n−2t} rather than as the coefficients of f; one needs to look closely at algorithms to see which representation allows faster decoding, although obviously the gap cannot be larger than the cost of converting between representations, i.e., the cost of evaluation and (error-free) interpolation. Finally, there are list-decoding algorithms that can handle more than t errors.

Binary-Goppa decoding
The title problem of this paper and of this section, binary-Goppa decoding, is to recover e, c ∈ F_2^n from e + c, assuming wt e ≤ t and Σ_i c_i A/(x − α_i) ∈ gk[x]. Here α_1, ..., α_n are distinct elements of a finite field k containing F_2; A means Π_i (x − α_i); and g is a squarefree degree-t element of k[x] with gcd{g, A} = 1, i.e., with g(α_1), ..., g(α_n) all nonzero. This section presents an algorithm to solve this problem.

6.1. An algorithm to decode binary Goppa codes. Algorithm 6.2 allows any r ∈ k^n as an input vector, and returns the unique e ∈ F_2^n with wt e ≤ t such that Σ_i (r_i − e_i)A/(x − α_i) ∈ gk[x], or None if no such e exists. The algorithm has three steps:
• Interpolate B ∈ k[x] with deg B < n from B(α_i) = r_i A'(α_i)/g(α_i)^2, where A' is the derivative of A.
• Compute an approximant b/a to B/A at degree t as in Theorem 4.1.
• Compute {i : e_i = 1} as {i : a(α_i) = 0}.
Theorem 6.4 says that this works.
The algorithm recognizes the None case using tests stated in Theorem 6.5: e exists exactly when A ∈ ak[x], g^2 b − a' ∈ ak[x], and deg(aB − bA) < n − 2t + deg a, where a' is the derivative of a.

Theorem 6.4 (Goppa decoding). Let n, t be nonnegative integers. Let k be a finite field with F_2 ⊆ k. Let α_1, ..., α_n be distinct elements of k. Define A = Π_i (x − α_i). Let g be a squarefree element of k[x] such that deg g = t and gcd{g, A} = 1. Let B, a, b be elements of k[x] with gcd{a, b} = 1, deg a ≤ t, and deg(aB − bA) < n − t. Let A', a' be the derivatives of A, a respectively. Let e be an element of F_2^n with wt e ≤ t. Finally, say a(α_i) = 0. Then a'(α_i) ≠ 0 and, by Bernoulli's rule, (A/a)(α_i) = A'(α_i)/a'(α_i).
One can replace g^2 in this theorem by any polynomial of degree 2t with no roots among α_1, ..., α_n, but the extra generality is not useful for this paper.
To see wt e = deg a: Since A splits into linear factors of the form x − α_i, the same is true for a, so #{i : e_i = 1} = #{i : a(α_i) = 0} = deg a. The main point of the proof of Theorem 6.4 is that the vectors c ∈ k^n satisfying Σ_i c_i A/(x − α_i) ∈ gk[x] are exactly the vectors (β_1 f(α_1), ..., β_n f(α_n)) with f ∈ k[x], deg f < n − 2t, where β_j = g(α_j)^2/A'(α_j). Any Reed–Solomon decoder can thus be used as a Goppa decoder. Algorithm 6.2 starts from this approach but streamlines the computation of e, taking advantage of the assumption e ∈ F_2^n. The critical information coming from the Reed–Solomon decoder is the "error-locator polynomial" a, which is a nonzero constant multiple of Π_{i : e_i ≠ 0} (x − α_i). Knowing the positions of nonzero entries in e immediately reveals e, since each entry of e is either 0 or 1.
Without the assumption e ∈ F_2^n, one can compute each e_i in the Reed–Solomon context as (bA/a)(α_i), which is b(α_i)A'(α_i)/a'(α_i) when a(α_i) = 0. In the binary-Goppa context one multiplies by g(α_i)^2/A'(α_i). Streamlining this to e_i = 1 might not seem helpful in Algorithm 6.2, since the algorithm checks g^2 b − a' ∈ ak[x] anyway, and the obvious way to do this is to check g(α_i)^2 b(α_i) = a'(α_i) at each root α_i of a; but Section 7 shows that this check can simply be skipped when the input vector is in F_2^n.
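The three steps above can be sketched end to end in plain Python over GF(16). This is an illustration of the construction, not the paper's Sage script: all names and parameters (the modulus z^4 + z + 1, n = 12, t = 2, the particular g) are chosen for the example, the input is the zero codeword plus two errors, and the validity checks of Theorem 6.5 are skipped, so the sketch assumes at most t errors.

```python
# GF(16) = F_2[z]/(z^4 + z + 1); elements are ints 0..15, addition is XOR.
MOD = 0b10011

def gmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 16:
            a ^= MOD
        b >>= 1
    return r

def ginv(a):
    r = 1
    for _ in range(14):      # a^14 = a^(-1), since a^15 = 1 for a != 0
        r = gmul(r, a)
    return r

def pdeg(f):
    nz = [i for i, c in enumerate(f) if c]
    return max(nz) if nz else float('-inf')

def ptrim(f):
    d = pdeg(f)
    return [0] if d == float('-inf') else f[:d + 1]

def padd(f, g):              # char 2: addition = subtraction = XOR
    out = [0] * max(len(f), len(g))
    for i, c in enumerate(f): out[i] ^= c
    for i, c in enumerate(g): out[i] ^= c
    return out

def pmul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        if a:
            for j, b in enumerate(g):
                out[i + j] ^= gmul(a, b)
    return out

def peval(f, x):
    r = 0
    for c in reversed(f):
        r = gmul(r, x) ^ c
    return r

def pdivmod(f, g):
    f, g = f[:], ptrim(g)
    q = [0] * max(len(f) - len(g) + 1, 1)
    inv = ginv(g[-1])
    while pdeg(f) >= pdeg(g):
        s = pdeg(f) - pdeg(g)
        c = gmul(f[pdeg(f)], inv)
        q[s] = c
        for i, b in enumerate(g):
            f[i + s] ^= gmul(c, b)
    return ptrim(q), ptrim(f)

def interpolate(alphas, vals):
    n = len(alphas)
    f = [0] * n
    for i in range(n):
        term, denom = [1], 1
        for j in range(n):
            if j != i:
                term = pmul(term, [alphas[j], 1])        # x - alpha_j = x + alpha_j
                denom = gmul(denom, alphas[i] ^ alphas[j])
        c = gmul(vals[i], ginv(denom))
        for d, co in enumerate(term):
            f[d] ^= gmul(c, co)
    return f

def approximant(A, B, t):
    r0, r1 = ptrim(A), ptrim(B)
    u0, u1 = [0], [1]        # invariant: r_j = v_j*A + u_j*B; char 2: b = v
    v0, v1 = [1], [0]
    while pdeg(r1) >= pdeg(A) - t:
        qq, rr = pdivmod(r0, r1)
        r0, r1 = r1, rr
        u0, u1 = u1, padd(u0, pmul(qq, u1))
        v0, v1 = v1, padd(v0, pmul(qq, v1))
    return u1, v1

def goppa_decode(r, alphas, g, t):
    """Sketch of the three steps: interpolate B, find an approximant b/a,
    read off {i : e_i = 1} = {i : a(alpha_i) = 0}.  Assumes <= t errors."""
    n = len(alphas)
    A = [1]
    for al in alphas:
        A = pmul(A, [al, 1])
    Aprime = [c if i % 2 == 0 else 0 for i, c in enumerate(A[1:])]  # A', char 2
    vals = [gmul(r[i], gmul(peval(Aprime, alphas[i]),
                            ginv(gmul(peval(g, alphas[i]), peval(g, alphas[i])))))
            for i in range(n)]       # B(alpha_i) = r_i * A'(alpha_i) / g(alpha_i)^2
    B = interpolate(alphas, vals)
    a, _ = approximant(A, B, t)
    return [1 if peval(a, alphas[i]) == 0 else 0 for i in range(n)]

n, t = 12, 2
alphas = list(range(1, n + 1))       # 12 distinct elements of GF(16)
for c0 in range(1, 16):              # x^2 + x + c0 is squarefree (derivative 1);
    g = [c0, 1, 1]                   # pick c0 with g(alpha_i) != 0 for all i
    if all(peval(g, al) for al in alphas):
        break

e = [0] * n
e[3] = 1
e[10] = 1
decoded = goppa_decode(e, alphas, g, t)   # r = 0 + e: zero codeword plus 2 errors
```

A fuller test would also exercise nonzero codewords and the Theorem 6.5 checks for invalid inputs; the sketch only demonstrates the error-locator mechanism.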

A closer look at binary Goppa codes
The main point of this section is that if the input vector is assumed to be in F_2^n, not merely in k^n, then the test g^2 b − a' ∈ ak[x] can be removed from Algorithm 6.2.
7.1. Proof overview. Theorem 7.2 rewrites Σ_i c_i A/(x − α_i) ∈ gk[x] as the following system of linear equations: Σ_i c_i/g(α_i) = 0, Σ_i c_i α_i/g(α_i) = 0, and so on through Σ_i c_i α_i^{t−1}/g(α_i) = 0. This theorem is from Goppa [43, Section 3], and is used inside the standard method of computing McEliece keys.
An analogous calculation is at the heart of Theorem 7.4, the main theorem of this section. I haven't done the work to factor the analogy into a shared lemma. I wouldn't be surprised if Theorem 7.4, or at least part of the proof beyond Theorem 7.2, is already in the literature, but I don't know a reference.

Theorem 7.2 (Goppa parity checks). Let n, t be nonnegative integers. Let k be a field. Let α_1, ..., α_n be distinct elements of k.
If the formal structure of this paper allowed k((x^{−1})) then one could replace the last paragraph of the proof with the following: deg B < n − t if and only if deg(B/A) < −t, i.e., if and only if Σ_i c_i α_i^s/g(α_i) = 0 for all nonnegative integers s < t, since B/A = Σ_s x^{−s−1} Σ_i c_i α_i^s/g(α_i) as in Section 5.6. The proof given above replaces B/A with the approximation (x^t B − Q)/x^t A so as to work entirely with polynomials.

7.3. An erratum. Hovav Shacham pointed out a gap in the proof of "Theorem 7.3" in the first version of this paper. Further checks then showed that "Theorem 7.3" was incorrect. The proof of Theorem 7.4 in that version of the paper relied on "Theorem 7.3", so it also had a gap.

I had put effort into tests, in part as a reaction to the possibility of proof errors; see Appendix A. I had spent considerable computer time on searches for counterexamples to Theorem 7.4. But I hadn't done similar searches for "Theorem 7.3". On the bright side, Section 8.4 had asked what happens "if there's a mistake in the extra logic leading to Theorem 7.4" and had recommended a decoding approach that avoids relying on that theorem. "Theorem 7.3" was not used except via Theorem 7.4. This version of the paper includes a replacement proof for Theorem 7.4. The statement of Theorem 7.4 is now stronger than the statement in the first version of this paper, which required g to be squarefree.

Theorem 7.4 (checking Goppa decoding for received words in F_2^n). Let n, t be nonnegative integers. Let k be a finite field with F_2 ⊆ k. Compared to Theorem 6.5, this adds the condition that g(α_i)^2 B(α_i)/A'(α_i) ∈ F_2, but removes the condition that g^2 b − a' ∈ ak[x].
Write a' for the derivative of a. It suffices to show that α_i is a root of g^2 b − a' for each i with a(α_i) = 0. Indeed, since a has distinct roots, this implies g^2 b − a' ∈ ak[x]; by Theorem 6.5, wt e = deg a and e is as claimed. So fix j with a(α_j) = 0; this implies t ≥ 1 since t ≥ deg a ≥ 1. Write q = a/(x − α_j); then q(α_j) = a'(α_j) by Bernoulli's rule. The rest of the proof will show that (g^2 b)(α_j) = q(α_j), so α_j is a root of g^2 b − a' as desired.
Define r_i = (g^2 B)(α_i)/A'(α_i) for each i. By hypothesis r_i ∈ F_2; i.e., r_i^2 = r_i. For any ρ ∈ k[x], abbreviate ρ(x + α_j) as ρ̄. Define D = Ā, and define δ_i = α_i − α_j. Then δ_1, ..., δ_n are distinct elements of k. Consider any ϕ ∈ k[x] with deg ϕ < 2t, and write ϕ_e for the coefficient of x^e in ϕ, so ϕ = Σ_{0≤e<2t} ϕ_e x^e. More specifically, consider any h ∈ k[x] with deg h ≤ t, define ϕ = h̄q̄, and define H accordingly. Now rewrite H as a sum of three terms. The second term has degree at most 2t − 2, so the coefficient of x^{2t−1} in that term is 0. The third term is in x^{2t}k[x], so the coefficient of x^{2t−1} in that term is also 0. The coefficient of x^{2t−1} in H is thus the coefficient of x^{2t−1} in x^{2t−1}h̄b̄; i.e., the coefficient of x^0 in h̄b̄; i.e., (h̄b̄)(0); i.e., (hb)(α_j).

McEliece decryption
The reader is presumed to be interested specifically in Classic McEliece [12], although without much work one can also cover other versions of the McEliece cryptosystem.
8.1. Ciphertexts. In this cryptosystem, a secret vector e ∈ F_2^n with wt e = t is encoded as a shorter ciphertext H(e) ∈ F_2^{mt}. This function H : F_2^n → F_2^{mt} has three critical properties:
• Linear: The function is F_2-linear. This allows the function to be concisely communicated as a matrix, the public key.
• Goppa: H(c) = 0 exactly when Σ_i c_i A/(x − α_i) ∈ gk[x]. Here k is a field with #k = 2^m, and α_1, ..., α_n, g are as in Section 6, as usual with A = Π_i (x − α_i).
• Systematic: H ∘ ι is the identity map, where ι is the injection F_2^{mt} → F_2^n that simply appends n − mt zeros to the input. In other words, the first mt × mt block of the matrix is an identity matrix. Obviously the identity matrix can then be omitted from the public key, saving some space; less obviously, this reduces the cost of optimized decoding from n^{2+o(1)} to n^{1+o(1)}.
For each k, α_1, ..., α_n, g there is at most one H satisfying these properties. One can construct this H, if it exists, by converting Σ_i c_i A/(x − α_i) ∈ gk[x] into a system of F_2-linear equations (a "parity-check matrix") using Theorem 7.2, and then row-reducing the equations to obtain systematic form. Conjecturally, this succeeds about 30% of the time. In case of failure, the traditional response is to try again with a new (α_1, ..., α_n, g); Chou's "semi-systematic form" options (see [13]) instead apply a limited permutation to (α_1, ..., α_n); [3] had instead applied an arbitrary permutation to (α_1, ..., α_n). See [13] for step-by-step algorithms.

8.2. Decryption. The decryption problem is to recover e from the ciphertext σ = H(e), where e ranges over {v ∈ F_2^n : wt v = t}. One way to handle this is as follows:
• Feed σ through any decoding algorithm that works for valid inputs. More precisely, apply some function D : F_2^{mt} → F_2^n with the following property: all e ∈ F_2^n with wt e = t have D(H(e)) = e.
• In all cases, whatever the output e ∈ F_2^n is, check that wt e = t. If this fails, the input vector is invalid.
• "Reencrypt" to double-check validity of σ: compute H(e) and check whether H(e) = σ. If this fails, the input vector is invalid.
Handling the matrix for H in the last step incurs costs similar to encryption. Consider, e.g., [63] saying that this "necessitates the inclusion of the public key as part of the private key and increases the running time of decapsulation", although to save space one could instead take time to "regenerate the public key from the private key when needed". A more efficient approach, already noted in [12, Section 2.5] and used in the software accompanying [12], checks whether H(e) = σ "without using quadratic space", and in particular without storing or recomputing the matrix for H. The point is that the following properties are equivalent:
• H(e) = σ, i.e., H(e) = H(ι(σ)), by the systematic-form property of H;
• H(c) = 0 for c = ι(σ) − e, by linearity;
• Σ_i c_i A/(x − α_i) ∈ gk[x] for c = ι(σ) − e, by the Goppa property of H.
This last condition, checking that c = ι(σ) − e is a codeword, no longer involves H: it is simply some extra polynomial arithmetic, the same type of arithmetic that is being carried out anyway.
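The codeword check can be sketched numerically. The following toy example (our own sketch, not the paper's algorithm) works over the assumed prime field GF(101) rather than F_{2^m}, with made-up parameters alphas and g chosen so that gcd{g, A} = 1: it computes Σ_i c_i A/(x − α_i) mod g, which vanishes exactly when the sum lies in g·k[x]:

```python
P = 101                                         # toy prime field GF(101)

def deg(f):
    return max((i for i, c in enumerate(f) if c % P), default=-1)

def polymul(f, g):
    out = [0] * (len(f) + len(g))
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def polymod(f, g):                              # f mod g over GF(P)
    f, inv = f[:], pow(g[deg(g)], P - 2, P)
    while deg(f) >= deg(g):
        d, c = deg(f) - deg(g), f[deg(f)] * inv % P
        for i, gc in enumerate(g):
            f[i + d] = (f[i + d] - c * gc) % P
    return f

alphas = [2, 3, 5, 7, 11]                       # distinct elements, none a root of g
g = [1, 0, 1]                                   # g = 1 + x^2; gcd{g, A} = 1 here

def syndrome(c):                                # sum_i c_i * A/(x - alpha_i) mod g
    total = [0] * (2 * len(alphas) + 2)
    for i, ci in enumerate(c):
        Ai = [1]                                # A/(x - alpha_i) = prod_{j != i} (x - alpha_j)
        for j, a in enumerate(alphas):
            if j != i:
                Ai = polymul(Ai, [(-a) % P, 1])
        for k, x in enumerate(Ai):
            total[k] = (total[k] + ci * x) % P
    return polymod(total, g)

assert deg(syndrome([0] * 5)) == -1             # zero vector: a codeword
assert deg(syndrome([0, 1, 0, 0, 0])) >= 0      # single error: nonzero syndrome
```

The matrix for H never appears: rejecting an invalid c is a polynomial divisibility test.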
A third approach is to inspect the details of decoding, relying not just on Theorem 6.4 to decode valid inputs but also on Theorem 6.5 to identify invalid inputs. Specifically, after interpolating B with B(α_i)g(α_i)²/A′(α_i) = ι(σ)_i and finding an approximant b/a to B/A at degree t, one checks the conditions stated in Theorem 6.5. See generally the discussion of fast "syndrome" computation in [14].
A fourth approach is to interpolate, find an approximant b/a, check that deg a = t, and check that A ∈ ak[x], skipping the check that bg² − a′ ∈ ak[x]. This relies on Theorem 7.4 and the fact that ι(σ) ∈ F_2^n.

8.4. Robust system design. There are several reasons to recommend the second approach, the approach taken in Classic McEliece, even if it is not quite as efficient as the fourth approach.
What happens if there's a mistake in the extra logic leading to Theorem 7.4, or in the handling of invalid inputs in the software implementing a decoding algorithm? Software is normally tested on many valid inputs; this doesn't provide any assurance that invalid inputs are correctly recognized.
A separate reencryption step, whether expressed as testing H(e) = σ or more efficiently as testing that c = ι(σ) − e is a codeword, splits the decryption task into two simpler tasks. The task of decoding is to correctly handle valid inputs. The task of reencryption is to reject invalid inputs. Reencryption is redundant if the decoder also rejects invalid inputs, but having the separate reencryption step means that the requirements on the decoder are reduced.
As an illustration of the value of reencryption, consider the efficient chosen-ciphertext attack from Chou [31] breaking both specified versions (namely [3] and [4]) of "NTS-KEM", a McEliece variant that skipped reencryption.
Recall that Berlekamp–Massey polynomials are extended-gcd polynomials but with coefficients in reverse order. Reversing polynomials loses information if one does not attach extra information (a "formal degree") to each polynomial: for example, both 3 + x + 4x² and 3x + x² + 4x³ have the same reversal, namely 4 + x + 3x². The NTS-KEM decoding algorithms are shown in [31] to sometimes find a polynomial ax of degree t when they should instead find a polynomial a of degree t − 1. This often leaks information if the attacker modifies a ciphertext H(e) in a way that corresponds to flipping one bit of e.
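The information loss can be checked directly on the example above. A small sketch (plain Python, coefficient lists from low to high degree; the helper names reverse and strip are ours):

```python
def reverse(coeffs, formal_degree):
    # reversal relative to an attached formal degree
    padded = coeffs + [0] * (formal_degree + 1 - len(coeffs))
    return padded[::-1]

def strip(coeffs):
    # drop leading (high-degree) zeros
    return coeffs[:max(i for i, c in enumerate(coeffs) if c) + 1] if any(coeffs) else [0]

p = [3, 1, 4]      # 3 + x + 4x^2
q = [0, 3, 1, 4]   # 3x + x^2 + 4x^3 = x * p

# Without a formal degree, both collapse to the same reversal, 4 + x + 3x^2:
assert strip(p[::-1]) == strip(q[::-1]) == [4, 1, 3]

# With distinct formal degrees attached, the two reversals stay distinct:
assert reverse(p, 2) == [4, 1, 3]
assert reverse(q, 3) == [4, 1, 3, 0]
```

Keeping the formal degree is exactly the bookkeeping whose omission [31] exploits: a and ax become indistinguishable after reversal.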
As further illustrations of how the decoding details matter, [31] identifies bugs (deviations from the specification) in the decoding algorithms in each of the four official NTS-KEM implementations (ref, opt, sse2, avx2); these bugs stop the attack from working against one implementation (ref), although the attack works against the other three implementations.
Reencrypting the incorrect weight-t error vector obtained from ax would have detected the mismatch with σ and would have stopped this attack. A different way to stop this attack would be to require computer verification of proofs that
• decoding algorithms decode correctly, including cases of weight below t, and
• decoding software correctly implements those algorithms.
Reencryption has the advantage of being easier. Verification has the advantage of also ensuring that valid ciphertexts are handled correctly.

8.5. History. McEliece's original cryptosystem [56] had a different ciphertext shape: the secret message being sent was encoded as some c with H(c) = 0 (i.e., some Goppa codeword), and then transmitted as e + c for a secret e with wt e = t. Niederreiter [58] introduced the idea of sending just H(e) as a ciphertext, with e as the message. In both [56] and [58], the decoder handled matrices of similar size to the public key.
McEliece started with a generator matrix for the Goppa code, meaning a matrix with row space {c ∈ F_2^n : Σ_i c_i A/(x − α_i) ∈ gk[x]}. McEliece said that this matrix "could be in canonical, for example row-reduced echelon, form". Row-reduced echelon form is easily compressed into less space than a random matrix, especially if one requires row-reduced echelon form specifically with no skipped columns, i.e., systematic form.
But McEliece didn't use this canonical matrix as the public key: McEliece used a random generator matrix.McEliece also randomly permuted the output positions; this is equivalent to randomly permuting (α 1 , . . ., α n ).
Eventually it was understood that, after permuting (α_1, …, α_n), one can safely use a canonical generator matrix (or, equivalently, a canonical parity-check matrix), such as a systematic matrix. Canteaut and Chabaud [25, page 4, note 1] said that "most of the bits of the plain-text would be revealed" by a systematic generator matrix but that using a random generator matrix "has no cryptographic function". Canteaut and Sendrier [26, pages 188–189] said that the Niederreiter variant "allows a public key in systematic form at no cost for security whereas this would reveal a part of the plaintext in McEliece system". As noted by Overbeck and Sendrier [61, page 98], the partial-plaintext problem is eliminated by various McEliece variants designed for security against chosen-ciphertext attacks: in these variants, the plaintext looks completely random, and the attacker is faced with the problem of finding all of the bits of the plaintext.
The fact that one can decrypt using n^{1+o(1)} time and space, including an optimized version of a reencryption step to check H(e) = σ, appeared in [12]. This relies on systematic form
• to reduce decryption of σ to decoding of ι(σ); and, symmetrically,
• to reduce testing H(e) = σ to testing that ι(σ) − e is a codeword.
The first reduction had already appeared in the McEliece context in [14, Section 6], which in turn says that the choice of ι(σ) as a decoder input was recommended to the authors by Sendrier.

A.7. How the tests catch various bugs. The bug in the Goppa decoder from [3] and [4] is triggered when the correct error vector e has weight t − 1 and has e_z = 0 where α_z = 0. Figure A.4 is intended to catch this: the tests generate uniform random sequences (α_1, …, α_n) of distinct field elements, and often use weight t − 1 for the error vector e; often α_z will be 0 for some z, and often e_z will also be 0.
I tried modifying Algorithm 6.2 to imitate what [31] described; Figure A.4 immediately caught the bug. One could directly test the algorithms from [3] and [4] by translating the algorithms from pseudocode to real code. One could directly test the software accompanying [3] and [4] by extracting the Goppa-decoding portions of that software and providing a shim layer to support the goppa_errors interface.
The extended-gcd bug in Microsoft's cryptography library was that a modular-inversion algorithm continued to loop until finding gcd 1, which would always happen for inputs with modular inverses, but the attacker could provide a noninvertible input, triggering an infinite loop. In the decoding context, an extended-gcd computation is the normal way to compute approximants, and one can imagine someone
• starting with an extended-gcd algorithm that computes all remainders,
• augmenting the algorithm to record (a, b) for the first remainder aB − bA of degree below deg A − t, and
• not optimizing away the pointless computation of subsequent remainders,
so there could still be an infinite-loop bug. In these tests, because k is small, some input positions will often be 0, forcing gcd{A, B} ≠ 1, so if there is an infinite loop for that case then the tests will trigger it.
Another easy bug to imagine in Reed–Solomon decoders and Goppa decoders is testing deg(aB − bA) against n − t rather than n − 2t + deg a, although this does not matter in an application that requires deg a = t.

2. Readers familiar with integer-coefficient lattices should note that this is something different, a k[x]-lattice. The elements of this lattice have the form a(B, 1) − b(A, 0) = (aB − bA, a) for polynomials a, b ∈ k[x]. The vector (aB − bA, a) is a short vector when both aB − bA and a have low degree.

Theorem 4.1 (approximants). Let t be a nonnegative integer. Let k be a field. Let A, B be elements of k[x] with deg A > deg B. Then there exist a, b ∈ k[x] such that gcd{a, b} = 1, deg a ≤ t, deg b < t, and deg(aB − bA) < deg A − t.
If c, d ∈ k[x] with c ≠ 0 satisfy deg(B/A − d/c) < −t − deg c, then d/c must equal b/a. To approximate B/A more closely than the fraction b/a constructed in Theorem 4.1, one must take larger-degree denominators. One way to describe the proof is as follows: if the lattice mentioned above has two independent vectors (aB − bA, ax^{deg A−2t−1}), (cB − dA, cx^{deg A−2t−1}) of degree at most deg A − t − 1, then the lattice determinant has degree at most 2 deg A − 2t − 2; but, by inspection, the lattice determinant is Ax^{deg A−2t−1}, of degree 2 deg A − 2t − 1. Combining linear dependence with gcd{a, b} = 1 forces (c, d) = (λa, λb).

Proof. c(aB − bA) − a(cB − dA) = (ad − cb)A. The left side has degree smaller than deg A, so ad − cb = 0. In particular, cb ∈ ak[x]; but gcd{a, b} = 1, so c ∈ ak[x], and similarly d ∈ bk[x]. Write λ for c/a if a ≠ 0, or for d/b if b ≠ 0; in both cases (c, d) = (aλ, bλ) as claimed.

4.3. An approximant algorithm. Algorithm 4.4 computes a, b from t, k, A, B. This algorithm works in the same way as the proof of Theorem 4.1, constructing coefficients of a, b as solutions to an explicit system of 2t equations in 2t + 1 variables. Straightforward matrix algorithms use O(t³) operations in k, typically Θ(t³) operations.
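Algorithm 4.4 proceeds by linear algebra; the extended-gcd route (the usual way to compute approximants in practice, as discussed later in the paper) can be sketched as follows. This is a toy sketch over an assumed prime field GF(101) rather than the binary fields used for Goppa decoding, with our own helper names; it stops at the first remainder of degree below deg A − t and assumes 0 ≤ t < deg A:

```python
P = 101                                     # toy prime field GF(101)

def deg(f):
    return max((i for i, c in enumerate(f) if c % P), default=-1)

def trim(f):
    return f[:deg(f) + 1] or [0]

def add(f, g):
    n = max(len(f), len(g))
    return trim([((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % P
                 for i in range(n)])

def scale(f, c, shift=0):                   # c * x^shift * f
    return trim([0] * shift + [c * x % P for x in f])

def mul(f, g):
    out = [0] * (len(f) + len(g))
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return trim(out)

def polydivmod(f, g):
    f = trim(f[:])
    q = [0] * max(deg(f) - deg(g) + 1, 1)
    inv = pow(g[deg(g)], P - 2, P)
    while deg(f) >= deg(g):
        d = deg(f) - deg(g)
        c = f[deg(f)] * inv % P
        q[d] = c
        f = add(f, scale(g, (P - c) % P, d))   # f -= c * x^d * g
    return trim(q), f

def approximant(A, B, t):
    # Invariant: r = s*A + tt*B.  Stopping at the first remainder of
    # degree < deg A - t gives a = tt, b = -s with a*B - b*A = r
    # and deg a <= t.
    r0, r1, s0, s1, t0, t1 = trim(A), trim(B), [1], [0], [0], [1]
    while deg(r1) >= deg(A) - t:
        q, r = polydivmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, scale(mul(q, s1), P - 1))
        t0, t1 = t1, add(t0, scale(mul(q, t1), P - 1))
    return t1, scale(s1, P - 1)             # (a, b)

A = [7, 0, 3, 1, 9, 2, 1]                   # deg A = 6 (monic)
B = [5, 8, 1, 4, 6, 2]                      # deg B = 5 < deg A
a, b = approximant(A, B, 2)
r = add(mul(a, B), scale(mul(b, A), P - 1)) # a*B - b*A
assert deg(a) <= 2 and deg(r) < deg(A) - 2
```

The returned pair satisfies deg a ≤ t and deg(aB − bA) < deg A − t as in Theorem 4.1; note the loop stops early by the degree condition rather than running until gcd 1, which is exactly the distinction discussed in Section A.7.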

Definition 4.7. Let k be a field. Let A, B be elements of k((x^{−1})) with A ≠ 0. Let t be a nonnegative integer. If (a, b) ∈ k[x] × k[x] satisfy gcd{a, b} = 1, deg a ≤ t, and deg(aB − bA) < deg A − t, then b/a is an approximant to B/A at degree t.

Theorem 6.5 (checking Goppa decoding). Let n, t be nonnegative integers. Let k be a finite field with F_2 ⊆ k. Let α_1, …, α_n be distinct elements of k. Define A = ∏_i (x − α_i). Let g be an element of k[x] such that deg g = t and gcd{g, A} = 1. Let B, a, b be elements of k[x] with gcd{a, b} = 1, deg a ≤ t, A ∈ ak[x], deg(aB − bA) < n − 2t + deg a, and g²b − a′ ∈ ak[x], where a′ is the derivative of a. Define e ∈ F_2^n by e_i = [a(α_i) = 0]. Then wt e = deg a and …
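The construction of e from a can be illustrated numerically. A tiny sketch (our own toy example over an assumed prime field GF(101), with a locator a whose roots all lie among the α_i, so that A ∈ a·k[x]):

```python
P = 101
alphas = [2, 3, 5, 7, 11]                      # distinct field elements
a = lambda x: ((x - 3) * (x - 7)) % P          # a = (x - 3)(x - 7): deg a = 2, A in a*k[x]

e = [1 if a(al) == 0 else 0 for al in alphas]  # e_i = [a(alpha_i) = 0]
assert e == [0, 1, 0, 1, 0]
assert sum(e) == 2                             # wt e = deg a
```

Because every root of a is some α_i, the number of positions where a vanishes equals deg a, which is the weight claim in the theorem.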

6.6. Goppa decoders via Reed–Solomon decoders. Fix β_1, …, β_n ∈ k*, and consider the problem of recovering f ∈ k[x] with deg f < n − 2t given a vector that agrees with (β_1 f(α_1), …, β_n f(α_n)) on at least n − t positions. Dividing β_j out of the jth position immediately reduces this to the problem considered in Section 5.
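This reduction is just coordinate-wise division. A minimal sketch over an assumed toy prime field GF(101), with made-up α_j, β_j, f:

```python
P = 101
alphas = [2, 3, 5, 7, 11]
betas = [9, 4, 6, 8, 10]                        # nonzero scalars beta_j
f = lambda x: (3 + 5 * x) % P                   # some low-degree f

received = [b * f(al) % P for al, b in zip(alphas, betas)]
# dividing beta_j out of position j recovers the plain Reed-Solomon vector:
unscaled = [v * pow(b, P - 2, P) % P for v, b in zip(received, betas)]
assert unscaled == [f(al) for al in alphas]
```

Positions corrupted by errors stay corrupted after the division, so the error pattern, and hence the n − t agreement guarantee, is unchanged.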

8.2. Decryption. Decryption of a ciphertext H(e) works as follows. Define c = ι(H(e)) − e ∈ F_2^n. One has H(ι(H(e))) = H(e) by the systematic-form property of H, so H(c) = 0 by linearity. One then has Σ_i c_i A/(x − α_i) ∈ gk[x] by the Goppa property of H. Recovering e from H(e) is thus a simple matter of appending n − mt zeros to obtain ι(H(e)) = e + c, and then recovering e, c ∈ F_2^n from e + c as explained in Section 6. This recovery uses α_1, …, α_n, g, which are secrets known to the party that generated the public key.

8.3. Rigidity. The cryptosystem includes defenses against chosen-ciphertext attacks. These defenses require, among other things, recognizing invalid input vectors. An input vector σ ∈ F_2^{mt} is by definition valid exactly when it is in {H(e) : e ∈ F_2^n, wt e = t}.

• that deg a = t (this also forces deg(aB − bA) < n − 2t + deg a, since an approximant by definition has deg(aB − bA) < n − t);
• that A ∈ ak[x] (i.e., that a has exactly t roots among α_1, …, α_n); and
• that bg² − a′ ∈ ak[x] (i.e., that bg² − a′ vanishes on each of the roots of a).
If all of these checks succeed then wt e = t and H(e) = σ, where e_i = [a(α_i) = 0].

Figure A.4 has an analogous split between testing decodable inputs and testing non-decodable inputs for Goppa decoding. There is no similar split in Figures A.1 and A.2, since those algorithms handle all inputs successfully. For Figures A.1, A.3, and A.4, n is chosen randomly between 0 and q; for Figure A.2, deg A is chosen randomly between 0 and 99. Similarly, t is chosen randomly in Figures A.2, A.3, and A.4; in each case, the range of t covered by the tests is slightly beyond the range of t useful for applications.
I checked that eight runs of Figure A.3 consistently caught this bug; each run already caught the bug with #k = 2. I also checked that eight runs of Figure A.4 consistently caught this bug; here the eight runs caught the bug with #k = 8, #k = 16, #k = 4, #k = 8, #k = 8, #k = 4, #k = 32, #k = 4 respectively. The variation in #k here suggests running more repetitions of the tests for reliability, or adding tests specifically for this case.