Verifiable FHE via Lattice-based SNARKs

Abstract. Fully Homomorphic Encryption (FHE) is a prevalent cryptographic primitive that allows for computation on encrypted data. In various cryptographic protocols, this enables outsourcing computation to a third party while retaining the privacy of the inputs to the computation. However, these schemes make an honest-but-curious assumption about the adversary. Previous work has tried to remove this assumption by combining FHE with Verifiable Computation (VC). Recent work has increased the flexibility of this approach by introducing integrity checks for homomorphic computations over rings. However, efficient FHE for circuits of large multiplicative depth also requires non-ring computations called maintenance operations, i.e. modswitching and keyswitching, which cannot be efficiently verified by existing constructions. We propose the first efficiently verifiable FHE scheme that allows for arbitrary depth homomorphic circuits by utilizing the double-CRT representation in which FHE schemes are typically computed, and using lattice-based SNARKs to prove components of this computation separately, including the maintenance operations. Therefore, our construction can theoretically handle bootstrapping operations. We also present the first implementation of a verifiable computation on encrypted data for a computation that contains multiple ciphertext-ciphertext multiplications. Concretely, we verify the homomorphic computation of an approximate neural network containing three layers and > 100 ciphertexts in less than 1 second while maintaining reasonable prover costs.


Introduction
Fully Homomorphic Encryption (FHE) schemes can be used to add privacy-preserving properties to cloud applications by encrypting the client's inputs such that the server can still (homomorphically) perform computations on them, resulting in encrypted outputs that are sent back to the client. Examples of such applications are oblivious RAM (ORAM) [CDNP23], privacy-preserving machine learning [BPTG15,BGGJ19] and, more recently, confidential smart contracts in general-purpose blockchains [ZAM23]. Normally, FHE schemes can only be used in settings where the server is assumed to be honest-but-curious, meaning the server is trusted to perform the homomorphic computations correctly but not trusted with access to the confidential plaintext values on which the computation is performed. This trust assumption can be removed by adding verifiability to existing FHE schemes and thereby constructing verifiable FHE (vFHE) [VKH23]. By adding integrity to the FHE primitive, vFHE could be used to maintain confidentiality of FHE against active adversaries performing e.g. key-recovery attacks [CT15,CGG16]. More generally, vFHE enables verifiable computation on encrypted data, aka Private Verifiable Computation (PVC) [FGP14], in which a client can outsource computation to a server in a verifiable way while preserving the privacy of its inputs and outputs.
A common approach for constructing vFHE is to combine an FHE scheme with a Verifiable Computation (VC) scheme which is used to prove the correctness of the homomorphic computations. However, combining these two primitives in an efficient way turns out to be a highly non-trivial task. Namely, VC can usually prove arithmetic circuits whose gates are additions or multiplications over some field F_p, while the homomorphic computation that the server wants to prove is performed over polynomial rings. Simply representing the computation as circuits over F_p introduces significant overheads on the size of the proofs or on the size of the CRS (common reference string), and also on the running time of the prover and of the verifier.
To overcome this, recent works [GNS23,BCFK21] have studied how to modify the VC protocols to work over rings, in an attempt to have proofs that match the type of computation done by the server and do not require representing operations over rings with gates over F_p. We propose a fundamentally different approach, namely, to exploit the well-known decomposition of the polynomial rings used in FHE as a direct product of fields, which motivates the use of a lattice-based SNARK over fields to generate the proofs.

Our Contributions
We propose the first verifiable fully homomorphic encryption scheme combining FHE and SNARKs (succinct non-interactive arguments of knowledge) in a non-trivial way. Our approach is modular, meaning that it is possible to construct blocks of verifiable homomorphic circuits that can be assembled together to build larger circuits. Our construction is the first one to handle real homomorphic computation, including the fundamental maintenance operations known as modulus switching and key switching (aka relinearization). In addition, this implies that we can handle bootstrapping and achieve fully homomorphic encryption. Moreover, we provide a public C++ implementation of our construction and run experiments that can serve as a baseline for future works when it comes to practical results.

Exploiting double-CRT to make FHE more VC-friendly
Most FHE schemes work over cyclotomic rings R = Z[X]/⟨X^N + 1⟩, where N is a power of two. In particular, ciphertexts are composed of elements of R_Q := R/QR, where Q is a large integer. Thus, when a server performs computations on encrypted data, it operates on elements from R_Q, i.e., polynomials modulo X^N + 1 and Q. At first glance, this type of computation is not easily represented by circuits over F_p, which is the setting VC typically handles.
However, FHE schemes are commonly implemented using a double-CRT representation, which works by choosing Q as a product of a few small primes q_0, ..., q_L, then using the isomorphism R_Q ≅ R_{q_0} × ... × R_{q_L} to represent operations on R_Q as independent operations on each R_{q_i}. Since for each prime q_i it holds that Z_{q_i} is a field, this gives us a hint that it could be possible to instantiate different VC instances, defined over fields F_{q_0}, ..., F_{q_L}, and then have L + 1 proofs to prove the actual computation over R_Q.

Figure 1: Homomorphic computation of c_1 · c_2 · c_3 modulo Q = q_0 · q_1. Every gate represents a homomorphic addition or multiplication, each composed of many operations over R_Q. When we represent the circuit as two circuits with low-level operations defined modulo q_0 and q_1, and inputs c_{i,j} = c_j mod q_i, the output of the first multiplication gate is used as input to the following gates in all the circuits.

However, between the homomorphic operations, one needs to execute two "maintenance operations" that are not defined in terms of additions and multiplications on R_Q, and thus do not respect the above isomorphism. These operations are key switching, which is used to guarantee that ciphertexts have a valid format during the whole computation, and modulus switching, which controls the noise growth. Concretely, as discussed in more detail in Section 2.6, both operations require non-arithmetic modular reductions. Moreover, since the isomorphism does not hold, the computations modulo q_i become dependent on values modulo q_j for j ≠ i. This means that instead of having L + 1 independent circuits defined modulo different primes q_i, which could be proved independently, we actually have L + 1 circuits that are interconnected, with intermediary wires being shared among them. This is illustrated in Figure 1 and discussed in more detail in Sections 2.4 and 2.5.
Figure 2: Proofs for the circuits defined modulo q_0 and q_1. We divide the circuits into two layers: the first is composed of the proofs π_{0,0} and π_{0,1}, and the second layer corresponds to the proofs π_{1,0} and π_{1,1}. Proof π_{1,0} takes as input the value c_{0,3} and the two outputs of the first layer, and similarly for π_{1,1} and c_{1,3}.
Because of the non-arithmetic operations and the wires shared among the circuits, we cannot simply have one proof for each prime q_i. Thus, we subdivide each circuit into layers, such that the input wires can come from any circuit, and the output wires can be connected to any other circuit, but the internal wires are connected only to gates with respect to the same q_i. As such, we have "boxes" defined entirely modulo one single prime, and we can finally have a proof for each of them, which gives us a proof for the original homomorphic computation modulo Q as a concatenation of the proofs of these small subcircuits. This is illustrated in Figure 2. Notice that simply breaking a proof with respect to Q = ∏_{i=0}^{L} q_i into proofs with respect to the q_i does not increase the proof size, since each proof is now smaller (containing elements modulo q_i). However, adding k layers multiplies the proof size by O(k). On the other hand, the prover's running time is basically the same, and the CRS can even become smaller. Moreover, if a non-interactive VC protocol is used, then our solution remains non-interactive, since the prover can generate all the π_{i,j} and only then send them to the client for verification.

Optimizations and efficiency
By looking at the homomorphic operations more closely, we see that there are different ways of grouping them or changing the order they are executed in, such that we add no or very little overhead to the prover and reduce the proof size and the verifier's running time. First of all, all the operations between plaintexts and ciphertexts, and also the homomorphic additions, can be grouped in single proofs, since they require neither key switching nor modulus switching, and these are the only two operations that mix the wires. Also, a homomorphic multiplication is usually composed of a tensor product, then a key switching, then a modulus switching. Thus, at first glance, a block of operations finishing with a ciphertext-ciphertext multiplication would require 3 layers of proofs. However, switching the modulus from Q to Q' := Q/q_L means that the following computation is executed modulo Q', thus there is no subsequent computation modulo q_L. As a result, we can actually finish the proof with respect to q_L, then pass its output as input to the other proofs and save one layer. This is shown in Figure 3. Also, each layer has one less column than the previous layer, which almost halves the proof size.
Figure 3: Proof for the homomorphic computation of the composition g ∘ f, where f ends with a multiplication. On the right, we show how we can merge two layers by ending the proof corresponding to the last prime.
One problem that hinders the practical efficiency of our construction is that, to achieve soundness, state-of-the-art efficient VC schemes [Gro16,GWC19] need to prove computations over fields F_q where q is around 256 bits, while the q_i typically used in FHE schemes are around 30 bits. One could emulate the smaller moduli in the bigger field F_q, but this inevitably blows up the number of gates in the arithmetic circuit that the VC scheme verifies. We observe that recently proposed lattice-based approaches [GMNO18,ISW21] can achieve similar security while being flexible in terms of field choice. Note that the use of lattice-based constructions also comes with the added benefit of maintaining plausible post-quantum security.
In Section 3.4, we study the efficiency of our scheme when instantiated to verify building blocks such as ciphertext additions, plaintext-ciphertext multiplication, matrix-vector multiplication, and higher depth computations such as ciphertext-ciphertext multiplication, slot rotations and more general high-depth functions composed of these building blocks.

Implementation and practical results
We present the first implementation of a verifiable FHE construction that can be efficiently instantiated for fully homomorphic circuits, i.e., with a multiplicative depth possibly greater than one. We instantiate it for a homomorphic circuit representing a 3-layered neural network and implement it in C++ to show the practicality of our scheme. The only other vFHE implementation known to us [VKH23] proves the correct computation of a single ciphertext-ciphertext multiplication (without the required maintenance operations) in 443s, while our implementation needs only 167s to prove a homomorphic computation on > 100 ciphertexts that includes the maintenance operations required to compute higher depth computations. Verification times vary from 0.6s to 0.9s depending on the size of the input layer.

Notations
We denote the security parameter as λ. The notation y ← A(x) signifies the execution of a probabilistic polynomial-time (PPT) algorithm A which outputs y given the input x. The symbol F is used to denote a finite field, while R denotes a ring. We denote by negl(λ) an arbitrary negligible function in λ. Within the paper, square brackets are used to indicate a range [n] = {1, ..., n}, and also to represent the central remainder modulo q as [n]_q. Bold lowercase letters are used to denote vectors and bold uppercase letters for matrices.

Rank-1 Constraint System (R1CS)
An R1CS instance is a collection of constraints on a vector of values c ∈ F^{N_w} called the wire values, where N_w is the number of wire values and F is a finite field. The first n values of this wire vector are called the statement x ∈ F^n. The last N_w − n values are called the witness w ∈ F^{N_w − n}. There are N_g constraints, which are also referred to as gates. An R1CS instance CS can be defined as a tuple of matrices A, B, C ∈ F^{N_g × N_w}, which induces a function CS : F^n × F^{N_w − n} → {0, 1}. This function has the property that for some statement x ∈ F^n and witness w ∈ F^{N_w − n}, CS(x, w) = 1 if and only if (A·c) ∘ (B·c) = C·c, where c = (x‖w) and ∘ denotes the entrywise (Hadamard) product. We call the constraint system satisfiable for some statement x iff there exists some w such that CS(x, w) = 1. The set of all satisfiable wire values (x, w) is a relation called R_CS. We can define the corresponding language as L_CS = {x | ∃w s.t. CS(x, w) = 1}.

Succinct Non-interactive ARguments of Knowledge (SNARKs)
We define Succinct Non-interactive ARgument of Knowledge (SNARK) schemes in the preprocessing model with a designated verifier. A SNARK consists of the following three probabilistic polynomial-time (PPT) algorithms:
- Setup(1^λ, CS) → (crs, st): given the security parameter λ and the constraint system CS, it generates a common reference string crs and a verification state st.
-Prover(crs, x, w) → π : given a common reference string crs, a statement x and a witness w, it generates a proof π.
-Verifier(st, x, π) → b : given a verification state st, a statement x and a proof π, it generates a verification bit b ∈ {0, 1}.
These algorithms must satisfy the completeness and knowledge soundness properties.
Completeness. A SNARK scheme is complete iff for any security parameter λ, R1CS instance CS, and statement-witness pair (x, w) ∈ R_CS,

Pr[Verifier(st, x, π) = 1 : (crs, st) ← Setup(1^λ, CS); π ← Prover(crs, x, w)] = 1.

Knowledge Soundness. A SNARK scheme satisfies knowledge soundness iff for any PPT algorithm Prover*, there exists a PPT extractor Extr such that for any security parameter λ, R1CS instance CS and state z,

Pr[Verifier(st, x, π) = 1 ∧ (x, w) ∉ R_CS : (crs, st) ← Setup(1^λ, CS); (x, π) ← Prover*(crs, z); w ← Extr(crs, z)] ≤ negl(λ).

SNARK schemes are also required to be succinct. Concretely, this requires that the proof size can be expressed as poly(λ + log |CS|) and that the Verifier algorithm runs in time poly(λ + |x| + log |CS|).

Lattice-based SNARKs.
Traditionally, SNARKs defined over F_q rely on the large size of this field to guarantee security and soundness, which means that q typically has around 256 bits. Lattice-based SNARKs, on the other hand, base their security on hardness assumptions such as the learning with errors (LWE) problem, which allows them to be instantiated with smaller fields, i.e., F_q with small q. Thus, they are a perfect tool for our construction, because we can use small values of q to match the small primes used in FHE schemes. As an additional benefit, lattice-based SNARKs are plausibly post-quantum secure, just like the FHE schemes; combining them with FHE thus gives us constructions that remain post-quantum secure.
In our analysis, we assume that the verification runs in time O(λ + |x|) and that each proof is composed of a constant number of ring elements, which together have size O(λ). These assumptions hold for current constructions of lattice-based SNARKs [GMNO18,ISW21]. We refer to Appendix B for more details on the construction of lattice-based SNARKs.

Fully Homomorphic Encryption (FHE)
FHE schemes are encryption schemes with the property that arbitrary circuits C can be evaluated homomorphically in the ciphertext space. To make the presentation more concrete, we consider in this section the BGV scheme [BGV12], but notice that other schemes, like FV [FV12] and CKKS [CKKS17], are very similar. We give a high-level description of the construction, which suffices for our purposes. Moreover, to simplify the presentation, we present only a symmetric-key version of BGV. Transforming it into an asymmetric scheme is done via simple standard techniques.
Let R = Z[X]/⟨X^N + 1⟩, where N = 2^k for some k ∈ N. Define a modulus Q = ∏_{i=0}^{L} q_i where each q_i is a different prime, and R_Q = Z_Q[X]/⟨X^N + 1⟩. The ciphertexts are defined as vectors over R_Q, but the homomorphic operations are easier to understand if the ciphertexts are considered as polynomials in a formal variable Y with coefficients in R_Q, i.e., c(Y) = c_0 + c_1·Y + ... + c_d·Y^d. Fix a plaintext modulus t and an error distribution χ_err over R that samples coefficients according to a discrete Gaussian distribution with standard deviation σ_err. Then, we say that c(Y) decrypts to a message m ∈ R_t if, when we evaluate c(Y) on the secret key, we get the message plus some small noise term, i.e., c(sk) mod Q = t·e + m for some small e ∈ R. Notice that in this case, (c(sk) mod Q) mod t = m.

Generic construction.
A homomorphic encryption scheme HE requires the following functions for parameter generation, secret key generation, encryption and decryption.For a basic (symmetric-key) version of the BGV scheme, they can be constructed as follows.
- HE.ParamGen(1^λ, L): given the security parameter λ and a multiplicative depth L, choose N, Q = ∏_{i=0}^{L} q_i and σ ∈ R such that the (N, Q, σ)-RLWE problem achieves λ bits of security and the FHE scheme based on it can accommodate homomorphic circuits of depth L. Let R := Z[X]/⟨X^N + 1⟩, R_Q := R/QR and R_t := R/tR for some plaintext modulus t. The message and ciphertext spaces are R_t and R_Q[Y], respectively. Set params := (N, Q, σ, t), which is a default input to the following algorithms.
- HE.KeyGen(1^λ): given the security parameter λ and params, output a secret key s and the relinearization key rlk with respect to s (see Section 2.6).
- HE.Enc_sk(m): Consider m ∈ R_t. Sample a uniformly at random from R_Q and e ← χ_err, and output c(Y) = (t·e + m − a·sk) + a·Y, so that c(sk) = t·e + m mod Q. Decryption remains correct as long as all coefficients of e remain smaller than ⌊Q/2t⌋, i.e., the noise term's ℓ_∞-norm remains below a certain bound.
- HE.Dec_sk(c(Y)): output (c(sk) mod Q) mod t.
A homomorphic encryption scheme also requires a function HE.Eval that evaluates an arithmetic circuit C on some input ciphertexts c_i(Y) = HE.Enc(m_i) and outputs a ciphertext c'(Y) such that HE.Dec(c') = C({m_i}). We present the functions called by HE.Eval to homomorphically compute basic operations on ciphertexts.
- HE.Add(c_0(Y), c_1(Y)): output c(Y) = c_0(Y) + c_1(Y). Thus, c(Y) is an encryption of the sum of the messages, as desired. Notice that the noise terms are also added together, and therefore homomorphic addition increases the noise additively.
- HE.Mult(c_0(Y), c_1(Y)): output c_mult(Y) = c_0(Y) · c_1(Y), which is an encryption of m_0 · m_1, as desired. Notice that ciphertext-ciphertext multiplication leads to quadratic noise growth, since the noise terms are multiplied.
Furthermore, the degree of c_mult in Y is larger than the degree of both input ciphertexts, and thus more elements of R_Q are required to store it. In other words, this operation increases the size of the ciphertexts.
- HE.MultPtxt(c_0(Y), m_0): output c_multPtxt(Y) = m_0 · c_0(Y). Notice that c_multPtxt has the same degree in Y as c_0, so the size of the output ciphertext remains constant. Also, the noise term of c_0 is only multiplied by m_0, and therefore the noise growth is small compared to ciphertext-ciphertext multiplication (at least when t is relatively small).
Notice that both the noise and the ciphertext degree grow exponentially with the multiplicative depth of the homomorphic circuit being evaluated. In levelled FHE schemes, this is typically solved by performing maintenance operations after every ciphertext-ciphertext multiplication. More concretely, the degree-2 ciphertext c(Y) is first brought back to degree 1 by the relinearization operation, using the relinearization key rlk. This is followed by the modswitching operation, which aims to remove the noise added by the ciphertext-ciphertext multiplication (and the relinearization). As the name implies, this is achieved by switching to a smaller modulus, which essentially divides the noise by q_i. See Section 2.6 for a more detailed description.

Basics of RNS
All the homomorphic operations are composed of some operations over R Q , which boil down to adding and multiplying polynomials of degree less than N , then reducing them modulo X N + 1, and reducing each coefficient modulo Q.
Because Q is typically large (say, with more than 1000 bits), working directly with polynomials mod Q requires libraries that implement multi-precision integers, which is inefficient. To overcome this, the residue number system (RNS) is typically used. It exploits the decomposition Q = ∏_{i=1}^{ℓ} q_i to work with several polynomials modulo each q_i, which fit in the 32- or 64-bit native integer types of current processors.
In more detail, because Q = ∏_{i=1}^{ℓ} q_i, by using the Chinese remainder theorem coefficient-wise, we have the isomorphism R_Q ≅ R_{q_1} × ... × R_{q_ℓ}. Thus, working with an element of R_Q is equivalent to working with a vector of size ℓ in ∏_{i=1}^{ℓ} R_{q_i}. Therefore, we could in principle verify the homomorphic computation over R_Q with ℓ independent proofs over each Z_{q_i}. We will discuss the limitations of this method soon.
Since multiplying polynomials efficiently requires first performing a fast Fourier transform, or number-theoretic transform (NTT), it is common to go one step further and represent elements of R_Q in the "NTT form". Given a ∈ R_Q, instead of simply storing its list of coefficients, we precompute the NTTs of a with respect to each q_i. For this, we choose each q_i as a prime congruent to 1 modulo 2N. This guarantees that there is a primitive 2N-th root of unity ω_i ∈ Z_{q_i} and that the map NTT_i : R_{q_i} → Z_{q_i}^N, a(X) ↦ (a(ω_i), a(ω_i^3), ..., a(ω_i^{2N−1})), is an isomorphism. Putting it all together, we start with a(X) ∈ R_Q, then we obtain a list of polynomials (a_1(X), ..., a_ℓ(X)) ∈ R_{q_1} × ... × R_{q_ℓ}, then we map each of them to a vector using the NTTs, so that, at the end, a(X) is stored as an ℓ × N matrix whose i-th row is NTT_i(a_i). By using a special type of NTT transform, called a negative-wrapped convolution (on which we will not elaborate here), we can avoid the polynomial reduction after multiplication [LMPR08,Zuc18]. Therefore, we can implement each addition and multiplication over R_Q with pointwise operations on the corresponding matrices; for example, a product a·b over R_Q is computed as the entrywise product of the matrices representing a and b. So computations over R_Q can instead be performed as vector computations over ℓ different finite fields. We will refer to this as the double-CRT (dCRT) representation. In the following section, it will become clear that the FHE scheme presented in Section 2.4 is not practical when computed entirely in dCRT representation: maintenance operations require inverting the NTT transform and then sharing elements between different rows in the matrix representation.

Maintenance operations
In Section 2.4, we explained that ciphertext maintenance enables the scheme to manage arbitrary-depth homomorphic circuits. Typically, one relinearizes after multiplication, followed by a modswitch to decrease the noise. We define the modulus at level i as Q^(i) = ∏_{j=0}^{i} q_j. Therefore, ciphertexts encrypted over R_{Q^(L)} can manage homomorphic circuits of multiplicative depth L. It will become clear from the more detailed description below that the maintenance operations are not composed of additions and multiplications over R_{Q^(i)}. This implies that they cannot be performed in dCRT representation. Therefore, one should first invert the NTT transform. However, one can avoid inverting the RNS decomposition by performing a fast base extension FastBaseExt, which extends the RNS decomposition of an element to an additional modulus. The result of this extension is only correct up to a small multiple of the original modulus, but it can be shown that using fast base extension in maintenance operations only adds negligible noise to the resulting ciphertexts.

Relinearization.
As discussed in Section 2.4, multiplying ciphertexts results in a degree-2 ciphertext c(Y) = c_0 + c_1·Y + c_2·Y^2. Relinearization transforms it into a degree-1 ciphertext c'(Y) = c'_0 + c'_1·Y that decrypts to the same plaintext. One approach would be to encrypt sk^2 as rlk(Y) = rlk_0 + rlk_1·Y and then compute c'_j = c_j + c_2·rlk_j for j ∈ {0, 1}. Notice, however, that this would add a large noise term c_2·e, where e is the noise term of rlk(Y) and c_2 is an element modulo Q^(i). Therefore, we instead use a decomposition of c_2 into its RNS base and define the relinearization key as a vector of ciphertexts rlk_0, rlk_1 ∈ R_{Q^(i)}^{i+1}, such that we can compute the relinearization as c'_j = c_j + ⟨D_{Q^(i)}(c_2), rlk_j⟩ for j ∈ {0, 1}. This ensures that the noise terms of rlk are only multiplied with smaller elements modulo q_j, since the elements of D_{Q^(i)}(c_2) were base extended from the base q_j to the base Q^(i). Using a similar technique, one can construct from a ciphertext c(Y) encrypted under a secret key sk a ciphertext c'(Y) encrypted under a different secret key sk' such that c(sk) ≡ c'(sk') mod t, i.e., they decrypt to the same plaintext using a different secret key. This operation is referred to as key-switching.

Modulus Switching.
To decrease the noise in a ciphertext c(Y) defined modulo Q^(i), modulus switching computes a ciphertext c'(Y) defined modulo Q^(i−1) = Q^(i)/q_i such that c(sk) and c'(sk) are equivalent modulo t, i.e., they decrypt to the same plaintext. Given that the noise of c(Y) = c_0 + c_1·Y satisfies a certain bound, the coefficients of c' can be calculated as c'_l = (c_l + δ_l)/q_i, where δ_l = t·(−c_l/t mod q_i) for l ∈ {0, 1}. This operation can be performed in RNS decomposition by base extending δ_l ∈ R_{q_i} to the base {q_0, ..., q_{i−1}}. For a more detailed description of these maintenance operations, as well as alternative methods, we refer to [Zuc18,KPZ21]. Importantly, notice that all computations required by the maintenance operations (and also the double-CRT vector operations) are easily representable by R1CS constraints.

Verifiable FHE (vFHE)
We present a definition for vFHE schemes adjusted from Viand et al. [VKH23]. This definition simply extends the definition of an FHE scheme by introducing a Verify algorithm that verifies the ciphertext c_y and proof π output by the Eval algorithm for a certain input ciphertext c_x and homomorphic circuit f. More concretely, a vFHE scheme consists of the following algorithms:
- params ← ParamGen(1^λ, f): given a security parameter λ and a homomorphic circuit f, it computes the parameters params, which are a default input to all other algorithms.
- c_x ← Enc(x, pk): given plaintext input(s) x and a public key pk, it computes encryption(s) c_x.
- (c_y, π_y) ← Eval(c_x, pk): given some input ciphertexts c_x and the public key pk, it computes the output ciphertexts c_y and a proof π_y.
- {accept, reject} ← Verify(c_x, c_y, π_y, sk): given some input ciphertexts c_x, some output ciphertexts c_y and a proof π_y, output either accept or reject.
- y ← Dec(c_y, sk): given some ciphertext(s) c_y and the secret key sk, it computes the decryption(s) y.
Next, we define the properties that a vFHE scheme should satisfy.
Correctness. [VKH23] defines a correct vFHE scheme as a scheme that always decrypts to the correct plaintext, i.e., decryption works with probability one. However, most FHE schemes have a small failure probability; thus, we change the definition by replacing "one" with "overwhelming". Formally, for a certain security parameter λ, any function f and plaintext inputs x, using the parameters params ← ParamGen(1^λ, f), it holds that

Pr[Dec(c_y, sk) = f(x) : c_x ← Enc(x, pk); (c_y, π_y) ← Eval(c_x, pk)] ≥ 1 − negl(λ).

Completeness. A complete vFHE scheme always verifies output ciphertexts and proofs generated honestly for the corresponding input ciphertexts. More formally, for a certain security parameter λ, any function f and plaintext inputs x, using the parameters params ← ParamGen(1^λ, f), it holds that

Pr[Verify(c_x, c_y, π_y, sk) = accept : c_x ← Enc(x, pk); (c_y, π_y) ← Eval(c_x, pk)] ≥ 1 − negl(λ).

Soundness. A sound vFHE scheme only allows a negligible probability that some input and output ciphertexts verify if their corresponding plaintexts are not valid. More formally, for a certain security parameter λ, any function f, plaintext inputs x and adversary A, using the parameters params ← ParamGen(1^λ, f), it holds that

Pr[Verify(c_x, c_y, π_y, sk) = accept ∧ Dec(c_y, sk) ≠ f(x) : c_x ← Enc(x, pk); (c_y, π_y) ← A(c_x, pk)] ≤ negl(λ).

Security. The security of a vFHE scheme is defined basically in the same way as the security of a regular FHE scheme: for a certain security parameter λ, any function f, plaintext inputs x and PPT adversary A, using the parameters params ← ParamGen(1^λ, f), we say that the vFHE scheme is CPA-secure if A cannot distinguish encryptions of two plaintexts of its choice with probability non-negligibly better than 1/2. Notice that [VKH23] defines CCA1 security, and it is known that by combining FHE and SNARKs we can achieve CCA1, but this introduces some technical and theoretical complications that would push us away from our goal of constructing a practical vFHE scheme. Thus, we prefer to stick to CPA security.
Remark about approximate FHE. Notice that equality on the plaintext space of the FHE scheme is exact for BGV and FV, but only approximate for the CKKS scheme; that is, for any plaintext m it holds that ∥HE.Dec_sk(HE.Enc_sk(m)) − m∥_∞ ≤ 2^{−p} for some integer p, which is a precision parameter chosen during parameter generation. Therefore, that is how one has to interpret the condition Dec(c_y, sk) = f(x) in the correctness definition when applying our construction to CKKS. Analogously, the condition Dec(c_y, sk) ≠ f(x) in the soundness definition must be interpreted as ∥HE.Dec_sk(c_y) − f(x)∥_∞ being larger than the chosen bound.

vFHE Schemes from Lattice-based SNARKs
In this section, we construct our new verifiable FHE (vFHE) scheme, as defined in Section 2.7, by combining a second-generation FHE scheme, such as BGV or CKKS, with a lattice-based SNARK. A vFHE scheme allows a client to outsource the computation of f on input x to a service provider, while keeping x private and also verifying the correctness of the final result y = f(x). Without loss of generality, we assume that the homomorphic computation corresponding to the outsourced function is represented as a layered circuit, as explained in Section 3.1. The homomorphic computation is then performed as usual, and to allow the verification of the computation, the SNARK is used to generate proofs for each layer of the circuit. This is explained in detail in Section 3.2.

Layered circuits for homomorphic computation
In this section, we assume that the homomorphic computation can be represented as a circuit where each gate takes as input elements of the ring R_Q (polynomials modulo X^N + 1 with coefficients in Z_Q) and performs a homomorphic addition, a plaintext-ciphertext multiplication, or a ciphertext-ciphertext multiplication, which is divided into a tensor product and the maintenance routines, i.e., the key switching and the modulus switching. Moreover, it is always possible (and common in the FHE literature) to use layered circuits for the homomorphic computation, where each layer finishes with ciphertext-ciphertext multiplication gates, and the number of layers is then the multiplicative depth of the circuit. Thus, we consider the following structure for the circuits. First of all, define Q^(k) = ∏_{i=0}^{k} q_i, i.e., the product of k + 1 small primes. For a circuit with multiplicative depth L, fresh ciphertexts are defined over R_{Q^(L)} = R_Q. Then we represent the circuit as the composition of subcircuits C_0(M_1(C_1(... M_L(C_L(·)) ...))), where each subcircuit C_k takes as input input(k) elements of R_{Q^(k)}, outputs output(k) elements of R_{Q^(k)}, and only its output gates can be tensor products. The subcircuit M_k then corresponds to the key- and modulus-switchings. Hence, it takes as input input(k − 1) ≤ output(k) elements of R_{Q^(k)} and outputs input(k − 1) elements of R_{Q^(k−1)}.
- KeyGen(1^λ): Generate the secret, public, and relinearization keys of the FHE scheme, i.e., run (HE.sk, HE.pk, HE.rlk) ← HE.KeyGen(1^λ). Then, run the setup algorithm of the SNARK scheme for each multiplicative layer i, over each field that is used in the dCRT representation of R_{Q^(i)}. In more detail:
1. For each prime q_j with j = 0, ..., L, use the SNARK setup algorithm to generate (crs_{L,j}, vrk_{L,j}) ← Π.Setup(1^λ, C_L) in F_{q_j}.
- Enc(x, pk): Encrypt the plaintext inputs as usual, i.e., compute c ← HE.Enc(x, HE.pk).
- Eval(c, pk): For a layered circuit C_0(...), run the evaluation algorithm of the FHE scheme as usual, but store the intermediate values as the wire values that are used to generate proofs in the SNARK scheme. In more detail:
1. For each prime q_j with j = 0, ..., L:
(a) Evaluate c_{L,j} ← HE.Eval(c_j, C_L, HE.rlk), where c_j ∈ (R_{q_j}^2)^{input(L)} and c_{L,j} ∈ (R_{q_j}^3)^{output(L)}, since C_L has input(L) inputs and output(L) outputs. Additionally, store the results of all intermediate computations as w_{L,j} ∈ F_{q_j}^*.
(b) Calculate the proof π_{L,j} ← Π.Prover(crs_{L,j}, c_j‖c_{L,j}, w_{L,j}).

2. For each multiplicative layer i = L − 1 to i = 0: (a) Generate the intermediate outputs used for modulus switching, c_{i,i+1} ← HE.Eval(c_{i+1,i+1}, M_{i+1}, HE.rlk), where c_{i+1,i+1} ∈ (R_{q_{i+1}}^3)^{input(i)} and c_{i,i+1} ∈ (R_{q_{i+1}}^2)^{input(i)}, since a maintenance circuit has as many ciphertext outputs as inputs. Also, for 0 ≤ j ≤ i, generate the intermediate outputs c_{i,j} ← HE.Eval(c_{i+1,j} || [c_{i+1,i+1}]_{q_j}, C_i ∘ M_{i+1}, HE.rlk), where c_{i+1,j} ∈ (R_{q_j}^3)^{input(i)} and c_{i,j} ∈ (R_{q_j}^2)^{output(i)}, since C_i has input(i) inputs and output(i) outputs. Again, store the results of all intermediate computations as w_{i,j} ∈ F_{q_j}^* for 0 ≤ j ≤ i + 1. (b) For 0 ≤ j ≤ i, using additionally in the statement HE.rlk, the modswitching outputs c_{i+1,i+1} and the decompositions [c_i]_{q_j} ∈ (R_{q_j}^{i+1})^{input(i)} used for relinearization, calculate the proofs π_{i,j} ← Π.Prover(crs_{i,j}, x_{i,j}, w_{i,j}), where x_{i,j} is this statement. Also calculate the proofs π_{i,i+1} ← Π.Prover(crs_{i,i+1}, x_{i,i+1}, w_{i,i+1}) for the maintenance circuit M_{i+1}, where x_{i,i+1} consists of the relevant input and output ciphertexts. 3. Return all ciphertexts c_{i,j} output by HE.Eval and all proofs π_{i,j} output by Π.Prover.
-Verify(c, {c_{i,j}}, {π_{i,j}}, sk): Given the input ciphertexts c ∈ (R_{Q^(L)}^2)^{input(L)}, all intermediate outputs c_{i,j} ∈ (R_{q_j}^3)^{output(i)}, the proofs π_{i,j} and the verification keys vrk_{i,j}, run Π.Verifier for every partial circuit in every field F_{q_j} and output reject if any SNARK verifier rejects. Otherwise output accept. In more detail: 1. For each prime q_j with j = 0, ..., L: verify the circuit C_L by running b_{L,j} ← Π.Verifier(vrk_{L,j}, c_j || c_{L,j}, π_{L,j}) and output reject if b_{L,j} = 0.
2. For each multiplicative layer i = L − 1 to i = 0: (a) For 0 ≤ j ≤ i, verify the circuit C_i ∘ M_{i+1} in F_{q_j} by using the relevant input and output ciphertexts, modswitching outputs and inputs for relinearization to run b_{i,j} ← Π.Verifier(vrk_{i,j}, x_{i,j}, π_{i,j}), and output reject if b_{i,j} = 0. (b) Verify the circuit M_{i+1} by using the relevant input and output ciphertexts and inputs for relinearization to run b_{i,i+1} ← Π.Verifier(vrk_{i,i+1}, x_{i,i+1}, π_{i,i+1}), and output reject if b_{i,i+1} = 0. (c) Output accept, since all subcircuits verified correctly.
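The control flow of Verify can be mirrored by a small driver loop. This is a structural sketch only: `verifier` stands in for Π.Verifier, and the verification keys, statements, and proofs are assumed to be keyed by (layer, prime) pairs.

```python
def verify_all(verifier, vrk, stmts, proofs, L):
    # Step 1: verify C_L in every field F_{q_j}, j = 0..L.
    for j in range(L + 1):
        if not verifier(vrk[(L, j)], stmts[(L, j)], proofs[(L, j)]):
            return False  # reject as soon as one SNARK verifier rejects
    # Step 2: for each layer i, verify C_i ∘ M_{i+1} (j <= i)
    # and the maintenance circuit M_{i+1} (j = i + 1).
    for i in range(L - 1, -1, -1):
        for j in range(i + 2):
            if not verifier(vrk[(i, j)], stmts[(i, j)], proofs[(i, j)]):
                return False
    return True  # accept: all subcircuits verified
```

Note the asymmetry: layer i involves i + 2 proofs because the maintenance circuit is proven over the prime q_{i+1} that is being dropped.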

Subcircuit blueprinting.
In our basic construction, the vFHE public key contains O(L^2) different SNARK crs instances, which are used in the Eval algorithm to generate the vFHE proof. This could easily be decreased to only O(L) instances by generating a "blueprint" crs for each prime q_j. These crs's encode a blueprint circuit C_B, which is able to compute any of the subcircuits C_0(M_1(·)), ..., C_{L−1}(M_L(·)), C_L by setting certain input wires to zero. Since these subcircuits are very similar, mostly differing in the number of inputs and outputs, the added number of gates would be minimal, which means that the added cost of proof generation would also be minimal. Note that one can also choose to make this tradeoff for certain similar layers but not for others. This blueprinting technique also affects the number of SNARK vrk instances in the secret key, but their size is negligible compared to that of the crs's.
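To see the O(L^2) → O(L) saving concretely, here is a hypothetical count of crs instances derived from the indexing in the Eval algorithm; the exact constants are illustrative, not taken from the paper.

```python
def num_crs(L, blueprint=False):
    if blueprint:
        return L + 1  # one blueprint crs per prime q_0, ..., q_L
    # Basic construction: C_L is set up over L + 1 primes, plus, for
    # each layer i = 0..L-1, one crs per prime q_0..q_{i+1} (i + 2 of them).
    return (L + 1) + sum(i + 2 for i in range(L))
```

For L = 10 this gives 76 instances without blueprinting versus 11 with it, i.e., quadratic versus linear growth in the depth.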

Security analysis
The FHE and SNARK schemes are used independently in our construction; hence, the security of our scheme is trivially inherited from them. In this section, we briefly discuss the security requirements that were defined in Section 2.7.

Correctness & Completeness.
These properties follow from the correctness and completeness of the FHE and SNARK schemes, respectively. Our construction simply divides the FHE computation into an exhaustive set of subcircuits. The Eval algorithm evaluates every subcircuit, propagating outputs as intended by the FHE scheme. The Verify algorithm accepts when all the SNARKs that prove these subcircuits verify.

IND-CPA Security.
Suppose there is an algorithm A that breaks the CPA-security of our scheme. Then we can construct an algorithm B that breaks the CPA-security of the underlying FHE scheme by simply letting B generate the public parameters of the SNARK scheme (as they are all independent of the secret values of the FHE scheme, B is able to do so) and providing them to A. The remainder of the IND-CPA security game is the same for the vFHE adversary, so B can forward the messages between A and its challenger. Therefore, our construction remains CPA-secure if the base FHE scheme is CPA-secure.

Soundness.
The knowledge-soundness of the SNARK scheme implies that the probability ε that a PPT algorithm can produce a verifying but non-valid assignment is negligible in the security parameter λ. If we denote this probability for the subcircuit in layer i with the modulus q_j as ε_{i,j}, call this event V_{i,j} (such that Pr[V_{i,j}] = ε_{i,j}), and denote the soundness probability from Section 2.7 as ε_vFHE, we can use the union bound to state that ε_vFHE ≤ Pr[∪_{i,j} V_{i,j}] ≤ Σ_{i,j} ε_{i,j} = O(d^2 · ε), where d is the multiplicative depth of the homomorphic circuit. Honest homomorphic evaluation implies vFHE soundness by the correctness of the FHE scheme.

About the compactness and client's efficiency
To avoid trivial constructions of FHE where the server does not actually compute anything, but just attaches the circuit to the ciphertext and lets the client decrypt and evaluate the circuit on their own, one usually requires that FHE schemes be compact [vGHV10], i.e., that the running time of decrypting a ciphertext be independent of the circuit that was evaluated. We notice that, as all existing leveled FHE schemes, our construction does not satisfy this compactness property, but it is not equivalent to the trivial construction either. Indeed, there are functions for which it is more expensive for the client to compute them locally than to outsource their computation and run the decryption and the verification. Moreover, we notice that there are other reasons for a client to outsource the computation, for example, if the client has already outsourced the storage, or if the computation to be performed depends on inputs from the server which are unknown to the client (as is common when the server offers machine learning as a service, since in this case the server trains a large model and the client cannot download the model to run it locally). Furthermore, the privacy of the server's inputs can be preserved by performing noise flooding in the FHE scheme to achieve circuit privacy, similar to [BCFK21]. Finally, we would like to stress that in practice, our construction is still much more efficient than existing constructions and allows the computation of complicated and deep circuits that could not be handled previously.
All that said, we now present an analysis of the verification cost and then provide examples of some families of functions that are "outsourceable".

Verification cost
As stated in Section 2.3.1, the time complexity of verifying a proof π that a circuit with input size ℓ_in and output size ℓ_out was correctly computed is O(ℓ_in + ℓ_out + λ). Notice that we instantiate our SNARKs over F_{q_j} for 0 ≤ j ≤ L; thus, each gate of the circuit that operates on polynomials is actually a point-wise addition or multiplication of N-dimensional vectors corresponding to the NTTs of the polynomials (see Section 2.5). Thus, because the circuit is defined in terms of polynomials, its verification cost is O(N · (ℓ_in + ℓ_out) + λ) operations modulo small primes q_j.
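Concretely, in the NTT/double-CRT domain a single ring multiplication in R_{q_j} becomes N independent operations modulo the small prime; a minimal sketch:

```python
def pointwise_mul(a_ntt, b_ntt, q):
    # One multiplication gate on polynomials becomes N point-wise
    # multiplications of the NTT vectors, all modulo the small prime q.
    return [(x * y) % q for x, y in zip(a_ntt, b_ntt)]
```

This is why each polynomial in a statement contributes N field elements, and hence the factor N in the verification cost.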

Proof. One just has to note that C_L is verified in each of the L + 1 fields F_{q_0}, ..., F_{q_L}; thus, the verification for each prime q_j can be done in time O(N · (input(L) + output(L)) + λ) basic operations, i.e., operations modulo small primes q_j.
Proof. Since M_{k+1} runs the relinearization and the modulus switching, it uses the algorithm FastBaseExt, which takes as input the outputs of C_{k+1} used as inputs for C_k, for each of the primes used to compute C_{k+1}. Thus, the statement in each field F_{q_j} contains O((k + 1) · N · input(k)) elements. Then, for each prime q_0, ..., q_k, we can verify the subcircuit with O((k + 1) · N · input(k) + N · output(k) + λ) operations modulo small primes q_j.
Proof. Since the verification is done by verifying C_L and then verifying the compositions C_k ∘ M_{k+1}, we just have to compute t_0 + t_1, where t_0 is the cost from Lemma 1 and t_1 is the sum of the costs from Lemma 2 for each k. We stress that, thanks to the slot structure of the plaintext space of the FHE schemes we are considering, the client can encrypt s := Θ(N) messages per ciphertext, and the homomorphic computation actually evaluates the circuit on N different inputs in parallel. Thus, the expression in Lemma 3 can be divided by N when comparing to the cost of evaluating the function locally. Generally speaking, a circuit is outsourceable if it is wide and the number of "inner gates" is much larger than the number of inputs and outputs. For example, supposing that input(k) and output(k) are constants for all k, the cost in Lemma 3 becomes O(N · L^3 + λ · L^2) = O(N · L^3), since N = Ω(λ) for security reasons. Hence, since the cost of encryption/decryption is negligible with respect to the cost of verification, a circuit with S gates and multiplicative depth L is outsourceable if S = Ω(L^3).
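The total cost t_0 + t_1 can be tabulated numerically. The per-layer terms below are a sketch reconstructed from the costs stated in Lemmas 1 and 2, so the constants are illustrative; `inputs` and `outputs` are functions giving input(k) and output(k).

```python
def verification_cost(N, lam, inputs, outputs, L):
    # t_0: verifying C_L in each of the L + 1 fields F_{q_j}
    t0 = (L + 1) * (N * (inputs(L) + outputs(L)) + lam)
    # t_1: verifying C_k ∘ M_{k+1} for k = L-1..0, in k + 1 fields each,
    # with the (k + 1)-fold base decomposition inflating the statement
    t1 = sum((k + 1) * ((k + 1) * N * inputs(k) + N * outputs(k) + lam)
             for k in range(L))
    return t0 + t1
```

For constant input/output sizes the dominant term grows like N · L^3, matching the S = Ω(L^3) outsourceability threshold.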

Outsourceability of matrix-vector multiplication.
Let f(v) = M · v for some matrix M ∈ Z^{m×n} and some vector v ∈ Z^n. Also, let s ∈ Θ(N) be the number of plaintext slots. Then, computing f locally on s inputs costs at least Ω(N · m · n) basic operations. However, since we can compute f with a circuit of depth one, we can set L = 1, and the verification cost in Lemma 3 becomes O(N · (n + m)). Notice that homomorphically calculating these linear functions does not require ciphertext-ciphertext multiplications.

Now we extend the comparison to non-linear functions. Let f(u, v) = M · (u ⊙ v), where ⊙ represents the Hadamard product. Here, homomorphically calculating f requires one layer of ciphertext-ciphertext multiplications, so again we set L = 1. Similar to the previous comparison, we can show that the verification cost is O(N · (n + m)). Also, for s ∈ Θ(N) plaintext slots, the cost of local computation is at least Ω(N · m · n), so the function remains outsourceable. Notice again that this comparison can be extended to other functions of this form.
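A plaintext-side sketch of this depth-one computation, showing why only one layer of ciphertext-ciphertext multiplications is needed (the linear map afterwards is plaintext-ciphertext):

```python
def hadamard_then_linear(M, u, v):
    # Depth-one circuit: one layer of slot-wise (ciphertext-ciphertext)
    # multiplications u ⊙ v, followed by a plaintext linear map M.
    w = [ui * vi for ui, vi in zip(u, v)]
    return [sum(row[j] * w[j] for j in range(len(w))) for row in M]
```

Under slot packing, the same circuit is evaluated on Θ(N) such (u, v) pairs in parallel.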

Outsourceability of higher depth circuits.
To homomorphically calculate circuits of higher depth, we divide them into consecutive depth-one circuits as described in Section 3.1. Consider, for example, homomorphically calculating a function f that approximately represents the feedforward computation of a neural network. For a d-layered network, f can be defined as f(v) = σ(M_d · σ(· · · σ(M_1 · v + b_1) · · ·) + b_d) for a sequence of compatible weight matrices M_i, bias vectors b_i and an activation function σ. In FHE, the function σ is typically approximated by a low-degree polynomial, e.g., σ(v) = v ⊙ v; thus each neural network layer is a depth-one circuit and we can set L = O(d). To ease the comparison, let us say the number of inputs or outputs in each layer (i.e., neurons) is upper bounded by w. Then, the cost of locally evaluating f on s ∈ Θ(N) different inputs becomes Ω(N · d · w^2). From Lemma 3, we can conclude that the verification cost becomes O(N · w · d^3). Therefore, f would be (asymptotically) outsourceable in terms of basic operations when w ∈ Ω(d^2), meaning the neural network is sufficiently wide. In Section 5, we instantiate our construction for this specific example.
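The plaintext computation being approximated can be sketched as follows, with the squaring activation σ(x) = x² standing in for e.g. ReLU; each loop iteration is one depth-one layer.

```python
def nn_forward_squared(weights, biases, v):
    # Feedforward with the squaring activation sigma(x) = x * x, the
    # low-degree polynomial typically used in place of ReLU under FHE.
    for M, b in zip(weights, biases):
        v = [sum(Mi[j] * v[j] for j in range(len(v))) + bi
             for Mi, bi in zip(M, b)]     # linear layer: M * v + b
        v = [x * x for x in v]            # one ciphertext-ciphertext mult layer
    return v
```

Each iteration consumes exactly one multiplicative level, so a d-layer network fits a modulus chain of depth O(d).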

Related work and comparisons
There exists limited previous research on privacy-preserving verifiable computation. Early examples include a scheme by Gennaro et al. [GGP10] based on garbled circuits and a scheme by Goldwasser et al. [GKP+13] based on functional encryption. Fiore et al. [FGP14] proposed the approach of combining a VC and an FHE scheme, later extended in [FNP20] and improved by Bois et al. [BCFK21], with which we compare.
Other vCOED approach. Our construction follows the approach of [FGP14], which uses VC to verify FHE computations and thereby achieves verifiable Computation On Encrypted Data (vCOED). Recently, Garg et al. [GGW23] have proposed a different approach to vCOED in which one uses FHE to homomorphically compute a VC scheme, but have provided neither a concrete description of their scheme nor a cost analysis. Another recent eprint [ACGSV23] follows this approach and explicitly describes and implements the homomorphic computation of FRI (an important building block of many SNARKs). Although it would be interesting to make a comparison with our approach, the cost of their vCOED scheme is not described in the non-interactive setting. Furthermore, they have not provided an implementation of the vCOED scheme, nor have they parameterized it for a specific IOP and FHE scheme.

Comparison with [BCFK21]
In [BCFK21], two homomorphic hash functions over Galois rings are proposed and used together with a variant of the GKR protocol [GKR08] to obtain verifiable computation over encrypted data. On the positive side, their solution is publicly verifiable. However, the types of computation they can verify are rather limited.
In more detail, instead of verifying computation over R_Q, they actually verify circuits over Z_Q[X], meaning that no reduction modulo X^N + 1 is performed, which means that the degrees of the polynomials involved in the homomorphic computation are no longer bounded by N, but at depth d, they have degree O(2^d · N). Then, they use the homomorphic hash functions to compress these polynomials and reduce the proof size. But because the maintenance operations do not respect the homomorphic properties of the hashes, i.e., they are not composed of additions and multiplications on Z_Q[X], they cannot prove the relinearization and the key switching. Without relinearization, the number of polynomials in each ciphertext is no longer constant: at depth d, we have Θ(2^d) of them. Thus, by exponentially increasing the number of polynomials and the degree of each polynomial, we end up with ciphertexts of size Θ(4^d · N). Of course, operating with larger ciphertexts is also more costly timewise. In other words, there is a huge time and memory overhead for the server depending on the depth.
In our case, the server has essentially no overhead, as the homomorphic computation is basically unchanged.
Furthermore, no modulus switching means that the noise in the ciphertexts grows exponentially fast. Essentially, at depth d, the noise is Θ(σ^{2^d}), where σ is a constant bounding the initial noise of fresh ciphertexts. If the noise is too big, then the correctness of decryption stops holding. In more detail, we need the final noise to be bounded by Q, so σ^{2^d} < Q, i.e., d = O(log log Q). Therefore, on top of the limitation related to the size of the ciphertexts (efficiency), there is another limitation related to correctness, which implies that, in the best possible scenario, the depth of the circuits that [BCFK21] can verify is only O(log log Q). In our case, because we support modulus switching, the FHE scheme itself supports much deeper computations.
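The O(log log Q) cap is easy to tabulate; a sketch under the simplifying noise model noise(d) ≈ σ^(2^d), working with bit lengths (the example parameters below are illustrative).

```python
def max_depth(log2_Q, log2_sigma):
    # Noise after depth d is roughly sigma^(2^d); decryption stays
    # correct only while sigma^(2^d) < Q, i.e. 2^d < log(Q) / log(sigma).
    d = 0
    while log2_sigma * (2 ** (d + 1)) < log2_Q:
        d += 1
    return d
```

For instance, with a 90-bit Q and σ = 4 (2 bits), the depth without modulus switching is capped at 5, consistent with d = O(log log Q).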
From Theorem 7 of [BCFK21], the time complexity of their verification of a circuit of depth L, having S gates over R_Q, and input size input(L) is Õ((input(L) + L^2) · N + λ · L · log S) operations over Z_Q. Since additions and multiplications modulo Q cost at least log Q basic operations, and supposing Q polynomial in N, as is usual in FHE, their verification cost becomes Õ(((input(L) + L^2) · N + λ · L · log S) · log N) in terms of basic integer operations. By setting L as a constant, since that is basically the multiplicative depth their construction can handle, their verification cost becomes Õ(input(L) · N · log N + λ · log S · log N), while in our case, from Equation 1, we obtain verification cost O(λ + N · input(L)).

Comparison with Rinocchio
The authors of [GNS23] propose another approach for verifying computations over encrypted data. They circumvent the incompatibility between finite field-based zkSNARK schemes and ring-based FHE schemes by adjusting a state-of-the-art zkSNARK proof system to work over rings. They claim that this proof system enables privacy-preserving verifiable computation by instantiating it over the ring R_Q and combining it with RLWE-based FHE schemes. However, similar to [BCFK21], their approach does not natively support the maintenance operations that are crucial for the efficiency of modern FHE schemes such as [BGV12] and [FV12]. As discussed above, not performing the maintenance operations means that both the modulus Q and the ciphertext degree depend exponentially on the multiplicative depth d. This imposes an overhead on the prover which makes the scheme impractical for applications with non-trivial multiplicative depth. We refer to Appendix C for a more detailed comparison.

Implementation and Performance
In this section, we discuss an instantiation of our scheme that uses the construction by Ishai et al. [ISW21] as the lattice-based SNARK to verify the computation of a BGV homomorphic circuit (Equation (2)) for two layers l = 0 and l = 1. Here, a and b are elements of the plaintext space R_t, and ciphertexts c ∈ R_Q[Y]. Notice that this computation approximates the feedforward evaluation of a basic neural network layer. Since each layer has a multiplicative depth of one, we can evaluate one neural network layer before performing relinearization and modswitch on the ciphertexts c_{1,i}. We refer to Section 3.4 for a more detailed description of this instantiation.
We have selected standard BGV parameters that provide 128-bit security, which we confirmed using the lattice estimator by Albrecht et al. [APS15] for lattice dimension N = 2^12 and Gaussian width s = 4. It is trivial to show that for these BGV parameters, it suffices to select a BGV modulus Q = Q^(1) = q_0 · q_1 · q_2, the product of three 30-bit primes, and a post-modswitch modulus Q^(0) = q_0 · q_1. (Concretely, the output ciphertexts c_{2,i} satisfy the noise bound for decryptability, and the relinearized ciphertexts c′_{1,i} satisfy the noise bound required before modswitching.) Therefore, this computation is verifiable using 6 proofs in 2 layers and 3 finite fields, as in Figure 3. To simplify parameter selection for this specific instantiation, the moduli Q^(0) and Q^(1) contain an extra prime. Remark that our construction also allows for other similar adjustments, e.g., removing multiple primes per modswitch for increased noise reduction.
As shown in Section 3.3, we can select the security parameters for the FHE and SNARK schemes independently. We now select the parameters of the lattice-based SNARK scheme such that 128-bit security is achieved (for instantiations over all prime fields F_{q_i}). This scheme consists of a linear PCP and a linear-only vector encryption. To ensure efficiency of the former, we select the primes q_i such that the prime fields F_{q_i} contain N_g-th roots of unity for N_g = 2^20, which determines the maximum number of gates in one R1CS instance. This allows for an O(n log n) construction of the QAP that the linear PCP is based on. Notice that the existence of the N_g-th roots of unity ensures the existence of the N-th roots of unity required for the NTT transformation in BGV (since N = 2^12). Also, a soundness amplification parameter ρ is determined in order to achieve sufficient knowledge soundness for the PCP. That is, to obtain soundness bounded by 2^{−s}, one should select ρ such that (2N_g/(q − N_g))^ρ < 2^{−s} (we refer to Section B.1 for more detail). In our implementation, targeting s = 128 and using 30-bit primes, we chose ρ = 15. We note that for 60-bit primes we would only need ρ = 4, which would significantly reduce the CRS size. It is also possible to use a smaller s, different from the security level.
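The choice of ρ follows directly from the soundness condition (2N_g/(q − N_g))^ρ < 2^{−s}; a quick sketch, modeling the prime q as ~2^{q_bits}, which is accurate enough for this estimate:

```python
def min_rho(q_bits, Ng=2**20, s=128):
    # Smallest rho with (2*Ng / (q - Ng))^rho < 2^(-s).
    q = 2 ** q_bits
    p = (2 * Ng) / (q - Ng)   # per-repetition soundness error
    rho = 1
    while p ** rho >= 2 ** (-s):
        rho += 1
    return rho
```

Evaluating this reproduces the parameters above: 15 repetitions for 30-bit primes and 4 for 60-bit primes.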
As for the linear-only vector encryption, the parameters were selected similarly to the method in Section 4.2 of [ISW21]. Recall that the plaintext space F_{p_i} of this scheme needs to match the finite field of the PCP. In this context, the q_i are the ciphertext space moduli. In Table 1, we summarize our selection of the necessary parameters. Lastly, we slightly adjusted both the linear PCP and the vector encryption to remove the components that provide the zero-knowledge property of the resulting SNARK.
We have implemented this instantiation of our vFHE scheme for the homomorphic circuit described by Equation (2). Using the libsnark library, we implemented the R1CS constraint systems. This includes the elementary BGV computations in double-CRT representation, as well as the circuits required for the relinearization and the modswitch.

Table 1: Parameter selection for the lattice-based SNARK. p is the plaintext modulus, while q is the ciphertext modulus. n and s are the lattice dimension and the Gaussian width of the lattice-based vector encryption. ρ is the soundness amplification parameter. τ is the ciphertext sparsification parameter, which ensures the linear-only property, and q′ is the post-modswitch modulus, which decreases the proof size. For a more thorough explanation, we refer to Ishai et al. [ISW21].

We compare our results with [VKH23], who implemented the Rinocchio scheme, and also with the naive approach of using field-incompatible proof schemes to verify the dCRT computations. Both approaches are unable to efficiently verify homomorphic circuits with depth greater than one; therefore, they only verify a circuit that performs one ciphertext-ciphertext multiplication (followed by a modswitch). The comparison is not entirely fair, since it is unclear how their constructions would scale for higher-depth circuits. However, we note that our implementation still achieves a 4-6x improvement in prover time while verification times remain practical, even for circuits with a high number of public inputs.

A More building blocks for vFHE A.1 Rotations
For any odd integer k ∈ [N], homomorphically rotating the slots by k positions is done in two steps: First, we apply the automorphism τ_k : X → X^k to both components c_i ∈ R_Q of a ciphertext c(Y); then, we apply a key switching from s(X^k) to s(X).
Since the automorphism is implemented by simply permuting the entries of the matrices corresponding to the double-CRT representation of c(Y), it does not add any cost to the proof and just implies that the wires of the proof of key switching have to be renamed to match the permutation. The key-switching procedure itself is basically the same as the relinearization, which was already implemented for the homomorphic multiplication.
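On coefficient vectors, the automorphism τ_k : X → X^k acts as a signed permutation, using X^N = −1 in R_Q; a minimal sketch on integer coefficients:

```python
def apply_automorphism(coeffs, k):
    # Map a(X) -> a(X^k) mod X^N + 1, for odd k: coefficient i moves to
    # position i*k mod 2N, picking up a sign flip when it wraps past N.
    N = len(coeffs)
    out = [0] * N
    for i, c in enumerate(coeffs):
        j = (i * k) % (2 * N)
        if j < N:
            out[j] += c
        else:
            out[j - N] -= c   # X^(N + r) = -X^r
    return out
```

In the double-CRT domain this signed permutation of coefficients becomes a plain permutation of the NTT entries, which is why it is free for the prover.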

A.2 Adding all the slots via rotations
Let AddSlots_k : R_Q^2 → R_Q^2 be a procedure that takes one ciphertext encrypting (µ_0, ..., µ_{S−1}) and outputs an encryption of a vector u such that u[k] = Σ_{i=0}^{S−1} µ_i and u[i] = 0 for i ≠ k, i.e., the sum is located in the k-th slot and all the other slots are zero.

Algorithm 1: Standard way of adding all the slots homomorphically
This procedure is usually implemented as shown in Algorithm 1, which rotates and adds in Θ(log S) doubling steps and then applies a mask to zero all slots except the k-th, since it requires only Θ(log S) rotations. However, since each rotation requires a key switching, this algorithm would require Θ(log S) layers of proofs in our construction. Instead, we propose Algorithm 2, which computes the same function but only applies the key switchings at the end and in parallel, thus requiring just 2 layers of proofs: one for the main block, and another one for the key switchings. The main idea is that we can loop rotating the slots and adding, as in the original algorithm, but then we produce a ciphertext that depends not only on the original secret key, but also on "rotated keys" obtained after the automorphisms. Namely, we are adding terms like r_0^{(i)} · ψ_{2^i}(sk) to the ciphertext. Thus, if we store the values r_0^{(i)}, we can use them to key switch at the end, producing encryptions of −r_0^{(i)} · ψ_{2^i}(sk), which can then be added to the final ciphertext so that we only keep the term that depends on sk itself.
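On plaintext slot vectors, the rotate-and-add loop shared by both algorithms can be sketched as follows (rotations act directly on the slot index here; the key switchings are omitted):

```python
def add_all_slots(slots):
    # Θ(log S) doubling steps: after step t, every slot holds the sum of
    # 2^(t+1) consecutive slots; at the end, all slots hold the total.
    S = len(slots)            # assumed to be a power of two
    acc = list(slots)
    shift = 1
    while shift < S:
        rotated = [acc[(i + shift) % S] for i in range(S)]  # rotate by shift
        acc = [a + r for a, r in zip(acc, rotated)]
        shift *= 2
    return acc
```

Masking the result then keeps only the k-th slot, as in the description of AddSlots_k above.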
The number of additions on R_Q is basically the same in both algorithms. The number of key switchings also does not change. The only overhead is that the prover now has to store all the O(log S) ring elements r_0^{(i)} ∈ R_Q produced in the first loop. That is, we reduce the number of proof layers from Θ(log S) to 2 basically for free.

B The lattice-based SNARK construction

Let CS be an R1CS instance over a finite field F. The construction relies on the following building blocks: -Let Π_LPCP = (Q_LPCP, P_LPCP, V_LPCP) be a k-query LPCP for CS. Let m be the query length of Π_LPCP.
-Let Π_Enc = (Setup_Enc, Encrypt_Enc, Decrypt_Enc, Add_Enc) be a linear-only additively-homomorphic vector encryption scheme for F^k.
The single-theorem, designated-verifier SNARK Π_SNARK = (Setup, Prover, Verifier) for CS is defined as follows: -Setup(1^λ, CS) → (crs, st): On input the security parameter λ, run (st_LPCP, Q) ← Q_LPCP(CS), where Q ∈ F^{m×k}. For each i ∈ [m], let q_i^⊤ ∈ F^k denote the i-th row of Q. Then sample (pk, sk) ← Setup_Enc(1^λ, 1^k) and compute ct_i ← Encrypt_Enc(sk, q_i^⊤) for each i ∈ [m].

C Comparison with Rinocchio
In [GNS23], the authors propose another approach for verifying computations over encrypted data. Their work defines a zkSNARK for computations over rings by extending the classical Quadratic Arithmetic Programs (QAPs) to Quadratic Ring Programs (QRPs) and defining compatible ring-based encoding schemes. Since QRPs can represent arithmetic circuits over rings, they can be used to represent the homomorphic computations of modern FHE schemes. However, the maintenance operations still pose a problem, since they cannot be expressed in QRPs in an efficient way. As we discussed in Section 4.1, not performing the maintenance operations means that both the modulus Q and the ciphertext degree depend exponentially on the multiplicative depth d. This puts a significant overhead on the prover in practical applications.
As the authors briefly remark, it is possible to simulate the non-arithmetic operations in the QRP and in that way still incorporate maintenance operations in the proof. Again, this causes additional overhead in contrast to our construction, which we analyze here. Relinearization, for example, requires the modular reduction [c]_{q_j} of an element c ∈ R_{Q^(i)} for j = 0, ..., i. To prove this modular reduction, one needs to prove the modular reduction of each of its N coefficients. Proving the reduction [a]_{q_j} for a ∈ Z_Q requires O(log Q) constraints, since it is only possible by first bitwise decomposing the coefficient a. Therefore, proving relinearizations alone would add O(d · L · N · log Q) constraints to the QRP instance for each ciphertext. In our construction, we avoid having to express these reductions using R1CS constraints and therefore avoid the increased CRS size and the increased cost of proof generation.
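The bit-decomposition argument can be made concrete: the witness for one coefficient reduction [a]_{q_j} inside an arithmetic circuit consists of a quotient plus O(log q_j) bits. `reduction_witness` below is an illustrative helper, not Rinocchio's actual gadget.

```python
def reduction_witness(a, q):
    # To prove r = a mod q with only +/* gates, the prover supplies the
    # quotient m and the bits of r; the circuit then checks
    # a == m*q + r, r == sum(b_i * 2^i), and b_i * (b_i - 1) == 0 per bit.
    m, r = divmod(a, q)
    bits = [(r >> i) & 1 for i in range(q.bit_length())]
    return m, r, bits
```

This is O(log q) constraints per coefficient, which is the source of the O(N log Q) blow-up per reduced ring element.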
Modulus switching also significantly increases the number of constraints in the QRP, since it requires non-arithmetic modular reduction. Moreover, ciphertext coefficients are defined modulo Q^(i−1) after this operation. The authors of Rinocchio suggest emulating this modulus switch by multiplying by the constant (1, 1, ..., 1, 0, ..., 0) in RNS decomposition, with i − 1 non-zero elements. Since this only emulates the reduction in the RNS decomposition of the ring, one eventually has to properly reduce (requiring bitwise decomposition) before the next round of maintenance operations. The authors do remark that modulus switching can be avoided by using a scale-invariant FHE scheme [FV12]. However, in that case the homomorphic multiplication of ciphertexts would require non-arithmetic computations even without relinearization. Both approaches, in contrast to our construction, also imply that the size of the ciphertexts, and therefore the number of constraints, does not decrease linearly w.r.t. the current multiplicative depth.
One advantage of Rinocchio is that, while our proof size depends on the multiplicative depth of the circuit, their proofs consist of a constant number of encoding elements. However, [GNS23] describe this encoding as a Regev-style encoding for plaintext space R_Q, which is impractical since Q typically has hundreds of bits. To be fair, we mention an improvement of the encoding in Rinocchio introduced by Viand et al. [VKH23], where each of the RNS digits of a ring element is encoded separately, such that the plaintext modulus of the encoding scheme corresponds to a modulus q_i instead of Q. In this case, the encoding size per ring element would be similar to our construction, but we would have to encode fewer elements, because of the reasons mentioned above as well as the blueprinting technique mentioned in Section 3.2.
To summarize: our construction is generally more efficient than Rinocchio, except in proof size when the multiplicative depth is large; but we have shown that in this case their CRS size and proof generation are impractically costly.

Figure 1 :
Figure 1: Homomorphic computation of (c_0 + c_1) · c_2 · c_3 modulo Q = q_0 · q_1. Every gate represents a homomorphic addition or multiplication, which are composed of many operations over R_Q. When we represent the circuit as two circuits with low-level operations defined modulo q_0 and q_1, and inputs c_{i,j} = c_j mod q_i, the output of the first multiplication gate is used as input of the following gates in all the circuits.

Table 2: Performance results for the SNARK methods of our construction on the computation of Equation (2) for different k_0, with k_1 = 3 and k_2 = 1.

The relinearization and modswitch circuits include sub-circuits for the NTT transformations, which are the most expensive in terms of the number of R1CS gates required. We used the lattice-zksnark library by Ishai et al. [ISW21] to implement the Setup, Prover and Verifier SNARK methods that are used in our scheme. The timings of these algorithms, aggregated over all 6 R1CS instances, are the most relevant for the performance of our scheme and are shown in Table 2. The total crs size over all 6 proofs is 11.6 GB, while the proof size is about 187 kB. There are 3,133,440 gates in all R1CS instances combined. Note that the crs size and setup time could be greatly reduced (possibly 6x smaller) when using the blueprinting technique discussed in Section 3.2.