Broadcast Encryption using Sum-Product decomposition of Boolean functions

The problem of Broadcast Encryption (BE) consists in broadcasting an encrypted message to a large number of users or receiving devices in such a way that the emitter of the message can control which of the users can or cannot decrypt it. Since the early 1990s, the design of BE schemes has received significant interest and many different concepts were proposed. A major breakthrough was achieved by Naor, Naor and Lotspiech (CRYPTO 2001) by cleverly partitioning the set of authorized users and associating a symmetric key to each subset. Since then, while there have been many advances in public-key based BE schemes, mostly based on bilinear maps, little progress has been made on symmetric cryptography. In this paper, we design a new symmetric-based BE scheme, named ΣΠBE, that relies on logic optimization and consensual security assumptions. It is competitive with the work of Naor et al. and provides a different tradeoff: the bandwidth requirement is significantly lowered at the cost of an increase in the key storage.


Introduction

Broadcast Encryption
The problem of Broadcast Encryption consists in broadcasting an encrypted message to a large number of users or receiving devices in such a way that the emitter of the message can control which of the users can or cannot decrypt it. A typical setting is the case of access control to a service such as pay-per-view TV, satellite Internet service, etc. In this case, one or several emitting facilities can distribute access keys through a Broadcast Encryption Scheme (BES or BE scheme) so that only the legitimate users (e.g. the ones who are still paying for the service) are able to recover keys while the revoked users (e.g. those who did not renew their subscription) are not. The advantage of a BE scheme is that the status of a user (i.e. revoked or authorized) can be modified at each new broadcast, which means that a user who stopped paying for the service can easily re-subscribe to it later on.
In the literature, Broadcast Encryption is often associated with Traitor Tracing, which is a slightly different problem motivated by copyright issues in the context of digital media such as CDs, DVDs and Blu-rays. In this context, no one can prevent a legitimate user (e.g. the purchaser of a DVD) from disseminating the content of the media, but a Traitor Tracing Scheme ensures that such a user cannot mask their identity and will therefore have to face the subsequent legal procedures. The present paper focuses on Broadcast Encryption and does not further explore the context of Traitor Tracing.
Since the early 1990s, the design of BE schemes has received significant interest and many different concepts were proposed. We give an overview of the literature on the subject in the next paragraph but let us first give a few criteria that are of importance in order to compare the relative performance of BE schemes. An obvious criterion is security: robust BE schemes offer full collusion resilience, meaning that no matter how many revoked users gather their credentials, they remain unable to decrypt the broadcast message. On the other hand, some BE schemes only resist collusions of up to k revoked users and are thus called k-collusion resistant, while some are probabilistic, meaning that there is a slim chance that a receiver or a collusion might still be able to decrypt the broadcast message. Another important distinction is between static schemes, where credentials are initially generated once and for all by the emitter (meaning that beyond a certain number of users the emitter will have to re-run a potentially costly setup procedure in order to accommodate new users), and dynamic schemes, for which new users' credentials can be generated on the fly. Although it was not a concern at the time most of the BE schemes were invented, it is now particularly relevant to distinguish them based on their resistance to a quantum attacker, in which case we write that a BE scheme is postquantum. Finally, the other criteria are related to the performance of the BE schemes and are often subject to a trade-off:
• Bandwidth consumption (and in particular its dependence on the number of revoked users)
• Key storage both at emitter and receiver level
• Amount of computations performed both at emitter and receiver level (the latter usually being the most critical)

State of the art on Broadcast Encryption
A naive approach to BE consists in giving each user a symmetric key also stored by the emitter and encrypting the broadcast message with the key of each authorized user. This has unrealistic bandwidth requirements when there are too many users. The opposite extreme is to assign a symmetric key to each subset of the set of users and use the corresponding key to broadcast a message to any set of authorized users. While this is bandwidth-optimal, the key storage is exponential in the total number of users and becomes impractical even for 100 users. The seminal approach to BE [FN93] was based on combinatorics and offers only probabilistic revocation at the price of heavy bandwidth requirements. This approach was later improved using code-based constructions [KRS99] but the resulting scheme still performs poorly and does not have full collusion resilience.
A major breakthrough was achieved by Naor, Naor and Lotspiech [NNL01], introducing the Subset-Cover paradigm. The main idea is to associate symmetric keys to specific subsets of users so that any set S of authorized users can be expressed as a partition involving such subsets. This framework is highly versatile as it offers a trade-off: increasing the number of special subsets will increase the storage requirement but on average reduce the bandwidth (more sets means that fewer sets are needed to partition S). While such a construction is not very practical on pure combinatorial grounds, [NNL01] cleverly organizes the users as leaves of a complete binary tree in order to turn this concept into an efficient BE scheme. Two instantiations are proposed in [NNL01]: the well-named Complete Subtree (CS), where the special sets correspond to complete subtrees of the initial tree, and the Subset Difference (SD), where the special sets are differences of subtrees (i.e. a complete subtree T minus a complete subtree of T). Both constructions are fully collusion resilient and rely on simple and consensual security assumptions (resilience of the underlying block cipher and PRF). The main difference between CS and SD is that the first option requires less storage at the price of a heavier bandwidth consumption. These two schemes were customized and refined into other BE schemes in order to address a wider range of trade-offs between storage and bandwidth, such as the Layered Subset Difference (LSD) [HS02]. Note that the LSD scheme follows the opposite purpose to our work as it increases the bandwidth consumption in order to reduce the storage requirements at user level. The consequence is that LSD never outperforms SD in terms of bandwidth, and this is why we compare our work to SD rather than LSD. Another customizable variant of the SD was proposed by [BS16]. In the extreme customization, it becomes equivalent to a naive solution where each possible subset of users is assigned a key. Our analysis shows that it needs a large key storage per user (more than $10^6$ keys for 1000 users) to outperform our scheme in terms of bandwidth, although it will scale better when the number of users becomes very large ($n \geq 10^6$).
While we only mentioned BE schemes relying on symmetric cryptography, many public-key BE schemes were also proposed in the past decades, such as [TT01, DF03, AWY20]. Most of these schemes mimic classical public-key constructions such as Diffie-Hellman or RSA. Among them, a very interesting construction relies on bilinear maps [BGW05], which can be instantiated using pairings [PPSS13], yielding a very efficient BE scheme. Indeed, although it may be a bit demanding on storage or on the computations to be performed by receiving devices, it is the only practical construction offering a constant overhead, meaning that the broadcast message sent by the emitter contains the necessary information (i.e. the list of revoked or authorized users and the message) plus an overhead whose size is constant (in practice it consists of two points on elliptic curves). A practical implementation is described in [DGB12]. The main weakness of such schemes is that they rely on number-theoretic problems that can be solved in polynomial time using quantum computers. Finding postquantum public-key BE schemes is an active topic of research but the currently available options [Wee22, SDSP23] are not competitive compared to the CS and SD schemes. Indeed, Table 2 of [SDSP23] shows that, for n the total number of users, the size of the ciphertext is proportional to $n^9$ while the bandwidth consumption of the SD is linear in the number of revoked users. It is harder to assess the bandwidth requirements of [Wee22] since no cryptographic parameters are provided in the paper. However, although the growth of the ciphertext is shown to be polynomial in $\log n$, the constants hidden in the $O()$ are likely to make the scheme competitive only for very large n. By extrapolating the parameters recommended in FrodoKEM, a KEM relying on LWE, we assess that the ciphertext would always be greater than a megabit, regardless of the number of (denied) users. In our experiments, our scheme has a ciphertext smaller by one to two orders of magnitude.

Summary of our contribution
In this paper, we present a new BE scheme that offers various advantages: just like the ones from [NNL01], it relies on simple and consensual security assumptions (resilience of a block cipher and PRF) with lower bandwidth requirements, at the price of a significant increase in key storage and potentially heavy computations at the emitter facility. While our BE scheme can be seen as an adaptive CS where the binary tree structure is regenerated at every broadcast in order to optimize bandwidth, we emphasize that this BE scheme does not follow the philosophy of the Subset-Cover because we do not compute a partition of the set S. Indeed, we rather see S as a Boolean function and compute a ΣΠ-factorization yielding subgroups $S_i$ that cover S, but we do not preclude one user from belonging to two distinct subgroups $S_i$ as this does not affect the security of our scheme.
Considering only postquantum BE schemes, the most bandwidth-efficient to our knowledge is the SD introduced in [NNL01], as shown in Table 1. We report implementation results showing that our BE scheme can easily accommodate thousands of users even when the emitter is a single laptop, and that it quickly outperforms the SD in terms of bandwidth when the number of revoked users grows: in our test scenarios the turning point is between 1 and 2% of revoked users, and for more than 5% of revoked users it already provides a 25% decrease in bandwidth compared to the SD. This makes our BE scheme particularly suitable to the setting where bandwidth is expensive or limited, such as space transmissions or communications with several layers of protection. We also emphasize that the constraints on the receiving devices are more than reasonable even for hand-held devices such as smartphones or radios: they only need to support a block cipher and PRF, and the key-storage requirements are kept below 1 GB even for $2^{26}$ (i.e. more than 67 million) users when using 128-bit symmetric keys.

Broadcast encryption
A typical symmetric BE scheme needs four procedures:
• $\mathrm{Setup}(n, \lambda, \$) \to k_{master}$: using a security parameter $\lambda$, some source of randomness $\$$ and the number of users n (or an upper bound), the emitter derives some key material $k_{master}$.
• $\mathrm{Join}(u, k_{master}) \to k_u$: the emitter derives and sends the key material $k_u$ of user u ($0 \leq u < n$).
• $\mathrm{Encrypt}(m, A, k_{master}) \to (h, c)$: using its key material $k_{master}$, the emitter encrypts the message m so that only users in the set of authorized users A are able to decrypt. It outputs the ciphertext c and some header h containing decryption instructions. It can be as simple as $h = A$.
• $\mathrm{Decrypt}(h, c, k_u) \to m$: the user u uses the header h and its key $k_u$ to decrypt c and obtain m. In the event that the decryption fails because user u is not authorized, the output of the procedure is null.
Symmetric-based BE schemes generally rely on one or two block cipher modes of operation (that provide confidentiality and optionally authenticity), that we denote by $(E, D)$ and $(E_{payload}, D_{payload})$. The output of the Encrypt procedure can often be seen as follows:
$$(h, c) \quad \text{with} \quad h = [i_1, \dots, i_l], \qquad c = [\underbrace{E(k_1, k_e), \dots, E(k_l, k_e)}_{\text{overhead}},\ E_{payload}(k_e, m)]$$
where h encodes which keys are to be used (it can be as simple as the list of revoked users or be a more evolved construction) and $k_e$ is an ephemeral key used to encrypt m. The key $k_e$ is then encrypted several times with the keys $k_j$ ($1 \leq j \leq l$) that are possessed or that can be derived by authorized users. Note that this paradigm is slightly different from the traditional one of Naor et al. [NNL01]:
• the intermediate encryptions of the ephemeral key are part of the ciphertext c instead of the header h. We prefer to merge in c the elements that must be indistinguishable from random.
• the information $i_j$ ($1 \leq j \leq l$) are not necessarily indexes that refer to disjoint subsets. They may contain more general information that helps authorized users to decrypt. In particular, an authorized user may have several ways of decrypting the ciphertext.
• we add the notion of overhead for performance comparison. It contains all of c except the encrypted message.
Most BE schemes focus on generating as few encryptions of $k_e$ as possible, thus reducing the overhead.

Boolean functions
Definition 1 (Boolean Function - "don't care" values). A Boolean function in l variables is of the form $f : \mathbb{F}_2^l \to \mathbb{F}_2$. Optionally, f may have "don't care" values, denoted by a star $*$ (for example $f(0, 1, 1) = *$). This fictive third value represents the case where we are not interested in some evaluations of f. With the notion of "don't care" values, we need to redefine the concept of support and introduce the one of strict support.
Definition 2 (Support - Strict support). Let f be a Boolean function in l variables. The support of f is the set of inputs $x \in \mathbb{F}_2^l$ such that $f(x) = 1$ or $f(x) = *$. The strict support of f is the set of inputs $x \in \mathbb{F}_2^l$ such that $f(x) = 1$. "Don't care" values do not belong to the strict support.
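Definitions 1 and 2 can be illustrated with a minimal sketch. The dictionary encoding of f and the `DONT_CARE` marker are assumptions made for this illustration only; they are not taken from the paper.

```python
from itertools import product

DONT_CARE = "*"  # hypothetical marker for "don't care" evaluations

def support(f, l):
    """Inputs x in F_2^l with f(x) = 1 or f(x) = *."""
    return {x for x in product((0, 1), repeat=l) if f[x] in (1, DONT_CARE)}

def strict_support(f, l):
    """Inputs x in F_2^l with f(x) = 1 only."""
    return {x for x in product((0, 1), repeat=l) if f[x] == 1}

# Toy function in l = 2 variables: f(0,0)=0, f(0,1)=1, f(1,0)=*, f(1,1)=1.
f = {(0, 0): 0, (0, 1): 1, (1, 0): DONT_CARE, (1, 1): 1}
print(sorted(support(f, 2)))
print(sorted(strict_support(f, 2)))
```

The input (1, 0) belongs to the support but not to the strict support, which is exactly the distinction the minimization later exploits.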
The cornerstone of our BE schemes is the representation of a Boolean function as a sum of products.
Definition 3 (Sum of products - Product term - ΣΠ-form). Let f be a Boolean function in l variables $x_1$ to $x_l$. A product term is a conjunction of variables or negation of variables (e.g. $\bar{x}_1 x_2 x_3$). A sum of products is a disjunction of product terms (e.g. $\bar{x}_1 x_2 x_3 \vee x_1 \bar{x}_3$). A ΣΠ-form of a function f refers to a sum of products representing f.
Any function f has a ΣΠ-form. A naive solution would be to define a product term for each element of the support of f (for example if $f(0, 1, 1) = 1$, then define $\bar{x}_1 x_2 x_3$). The function f can clearly be expressed as the sum of these terms. For instance, taking the example of Section 4 where $f(u) = 0$ for $u \in \{1, 3, 5\}$ and 1 anywhere else for u a 3-bit number, one can naively express (with $x_1$ the most significant bit of u):
$$f(x) = \bar{x}_1 \bar{x}_2 \bar{x}_3 \vee \bar{x}_1 x_2 \bar{x}_3 \vee x_1 \bar{x}_2 \bar{x}_3 \vee x_1 x_2 \bar{x}_3 \vee x_1 x_2 x_3$$
A ΣΠ-form of f is generally not unique and we are interested in ΣΠ-forms having a small number of products in order to minimize the overhead. The Quine-McCluskey algorithm solves the problem of finding the smallest sum of products, which is an NP-hard problem. Any alternative algorithm would fit our needs. In particular, the ESPRESSO algorithm should be considered when dealing with a large number of variables, at the cost of finding a good but possibly non-optimal solution.
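The naive ΣΠ-form can be sketched as follows. The function name and the textual output format (`~` for negation, `|` for disjunction) are assumptions chosen for this illustration.

```python
def naive_sum_of_products(authorized, l):
    """One full product term (minterm) per element of the support."""
    terms = []
    for u in sorted(authorized):
        bits = format(u, f"0{l}b")  # most significant bit first
        # Variable x_{i+1} appears plain if the bit is 1, negated (~) if 0.
        terms.append(" ".join(("x" if b == "1" else "~x") + str(i + 1)
                              for i, b in enumerate(bits)))
    return " | ".join(f"({t})" for t in terms)

revoked = {1, 3, 5}
authorized = set(range(8)) - revoked
print(naive_sum_of_products(authorized, 3))
```

With five authorized users this yields five product terms, whereas a minimized form of the same function needs far fewer, which is what motivates the minimization step.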
In Section 3, we describe (a close variant of) the Quine-McCluskey algorithm. A reader only interested in the BE schemes may go directly to Section 4 and treat this algorithm as a black box.

Pseudorandom functions
One of our constructions requires the use of pseudorandom functions.
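Definition 4 (Pseudorandom Function [KL14]). Let $F : \mathcal{K} \times \mathcal{X} \to \mathcal{Y}$ be an efficient keyed function. F is a pseudorandom function (PRF) if for every polynomial-time distinguisher D, there is a negligible function negl such that:
$$\left| \Pr\left[D^{F(k, \cdot)}(1^\lambda) = 1\right] - \Pr\left[D^{f(\cdot)}(1^\lambda) = 1\right] \right| \leq \mathrm{negl}(\lambda)$$
for some security parameter $\lambda$, where the first probability is taken over the uniform choice of $k \in \mathcal{K}$ and the second probability is taken over a random function $f : \mathcal{X} \to \mathcal{Y}$.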
Informally, F is a pseudorandom function if any adversary having access to an oracle $F(k, \cdot)$ cannot distinguish F from a truly random function f.
In the rest of the paper, we actually require a weaker notion of pseudorandom function: instead of being given access to the oracle $F(k, \cdot)$, the adversary is only given a few known (but not chosen) evaluations. Then, the adversary must not be able to evaluate $F(k, \cdot)$ on other points. This security notion is implied by the indistinguishability described above.
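As an illustration, a keyed function with this interface can be sketched with HMAC-SHA256 from the Python standard library. Using HMAC-SHA256 as F is an assumption of this sketch; the text above does not prescribe a concrete instantiation.

```python
import hashlib
import hmac
import os

def F(k: bytes, x: bytes) -> bytes:
    """Keyed function F(k, x), instantiated here with HMAC-SHA256 (an assumption)."""
    return hmac.new(k, x, hashlib.sha256).digest()

k = os.urandom(32)
# A few known (but not chosen) evaluations, as in the weaker notion above.
known = {x: F(k, x) for x in (b"point-1", b"point-2")}
# Without k, evaluating F(k, b"point-3") from `known` should be infeasible.
print(F(k, b"point-1") == known[b"point-1"])  # deterministic under a fixed key
```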

The Quine-McCluskey algorithm
In this section, we detail the Quine-McCluskey algorithm, which solves the problem of finding the smallest sum of products, i.e. among all the ΣΠ-forms defined in Definition 3, finding one which minimizes the number of operands of $\vee$. We actually describe a close variant, which yields a degraded solution when the number of variables becomes too large for the problem to be solved in reasonable time.

Description
This approach tolerates Boolean functions having "don't care" values, introduced in Definition 1. The Quine-McCluskey algorithm then solves the problem without being constrained by these "don't care" values.
First, the notions of implicant and prime implicant need to be introduced.
Definition 5 (Implicant - Size-$2^\alpha$ implicant). Let f be a Boolean function in l variables and p be a product term. We say that p is an implicant of f if and only if p implies f (i.e. for every vector $x \in \mathbb{F}_2^l$, $p(x) = 1 \implies f(x) = 1$). The implicant p is a size-$2^\alpha$ implicant if its support contains $2^\alpha$ elements.
Property 1. Remark that since an implicant is a product term, the number of elements of its support is necessarily a power of two. Moreover, an implicant is a size-$2^\alpha$ implicant if and only if it is a product of $l - \alpha$ variables (or negation of variables).
Definition 6 (Prime implicant). Let p be a product term. We say that p is a prime implicant of the Boolean function f if:
• it is an implicant of f,
• there exists no other implicant $p' \neq p$ of f such that p is an implicant of $p'$.
Note that if $p'$ exists, then it is a "more general" implicant of f, and consists in a strict subset of the variables (or negation of variables) of p.
Let f be a Boolean function in l variables. The Quine-McCluskey algorithm is a 2-step process which consists in:
1. generating all prime implicants of the Boolean function f.
2. finding the smallest subset of prime implicants that describes f. This part is similar to the set cover problem. The sum of the obtained implicants is the smallest ΣΠ-form of f.
The algorithm starts from the naive solution stated earlier: for each element of the support of f (i.e. $f(x) = 1$ or $f(x) = *$), create the size-1 implicant that is true only for that element. An example is given in Figure 1. Using Property 1, such implicants all have l variables.
Then, iteratively, the algorithm builds size-$2^{\alpha+1}$ implicants by combining two size-$2^\alpha$ implicants as follows: let $p_1$ and $p_2$ be two size-$2^\alpha$ implicants such that they share the same variables and differ in a single negation. They can be written as $p_1 = p_3 x_i$ and $p_2 = p_3 \bar{x}_i$ for some product term $p_3$. Therefore, $p_3$ is a size-$2^{\alpha+1}$ implicant of f. This can be easily proven using Property 1 and the fact that $p_3 = p_1 \oplus p_2$. Such combinations are searched exhaustively. There may exist several combinations ending in the same size-$2^{\alpha+1}$ implicant.
When implicants cannot be combined any further, the remaining implicants are prime implicants. Although it is not a minimal decomposition yet, f can already be written as the sum (i.e. disjunction) of these prime implicants.
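The first step can be sketched as follows. The string encoding of implicants (one character per variable, `1` for a variable, `0` for its negation, `-` for an absent variable) and the function names are assumptions of this sketch, and the exhaustive pairwise search ignores the partitioning optimization discussed under practical considerations.

```python
from itertools import combinations

def combine(p1, p2):
    """Merge two implicants that share the same variables and differ in a
    single negation; return None when they cannot be combined."""
    diff = [i for i, (a, b) in enumerate(zip(p1, p2)) if a != b]
    if len(diff) == 1 and "-" not in (p1[diff[0]], p2[diff[0]]):
        i = diff[0]
        return p1[:i] + "-" + p1[i + 1:]
    return None

def prime_implicants(support, l):
    """Step 1: iteratively combine implicants; those never combined are prime."""
    current = {format(x, f"0{l}b") for x in support}  # size-1 implicants
    primes = set()
    while current:
        merged, used = set(), set()
        for p1, p2 in combinations(sorted(current), 2):
            p3 = combine(p1, p2)
            if p3 is not None:
                merged.add(p3)
                used.update((p1, p2))
        primes |= current - used   # implicants that could not be combined
        current = merged           # next round works on the larger implicants
    return primes

# Support of the example of Section 4 (users 1, 3 and 5 revoked, l = 3).
print(sorted(prime_implicants({0, 2, 4, 6, 7}, 3)))
```

On this example the search ends with two prime implicants, `--0` and `11-`, standing for the negation of the last variable and the product of the first two.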
In the second step of the algorithm, we aim at extracting the smallest subset of prime implicants. We start by constructing the prime implicant chart, as shown in Figure 2. Each row is mapped to a prime implicant generated in Step 1 and each column corresponds to an element x of the strict support of f (i.e. $f(x) = 1$). A cell then indicates whether a prime implicant covers an element x.
From this chart, one can easily compute the corresponding Petrick's function, also shown in Figure 2. For each prime implicant $p_i$, a binary variable $p'_i$ is generated. It takes the value 1 if $p_i$ is kept in the smallest subset, 0 otherwise. Petrick's function consists in a conjunction of disjunctions of such binary variables. A disjunction indicates which implicants are able to cover a specific element of the strict support of f. Therefore, Petrick's function is true if at least one implicant in each disjunction is kept in the smallest ΣΠ-form of f.
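The chart and the disjunctions of Petrick's function can be sketched as below, reusing the hypothetical string encoding of implicants (`1`, `0`, `-`). The example is the one of Section 4 rather than the one of Figure 2.

```python
def covers(p, x, l):
    """True if implicant p (string over '0', '1', '-') is true on input x."""
    bits = format(x, f"0{l}b")
    return all(a in ("-", b) for a, b in zip(p, bits))

def petrick_clauses(primes, strict_support, l):
    """One disjunction per element x: indices of prime implicants covering x."""
    return {x: [i for i, p in enumerate(primes) if covers(p, x, l)]
            for x in sorted(strict_support)}

primes = ["--0", "11-"]  # p_0 and p_1 for the example of Section 4
clauses = petrick_clauses(primes, {0, 2, 4, 6, 7}, 3)
for x, clause in clauses.items():
    print(x, " v ".join(f"p'_{i}" for i in clause))
```

Each printed line is one disjunction; Petrick's function is their conjunction. Element 6 is covered by both prime implicants, while elements 0, 2, 4 and 7 each force a single one.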
In order to find the smallest solution of Petrick's function, the so-called Petrick's method introduced in [Pet56] is generally considered. However, it has an exponential complexity that makes it unsuitable for a large number of variables or, more importantly, when the size of the support of f is big, which is true in our scenario. We do not detail this method further. Instead, we follow the idea of [CFN61] and use Integer Linear Programming (ILP) to solve this problem. ILP is unlikely to have a lower complexity than other approaches to solve the set cover problem, but it has the advantage of having meaningful intermediate results that are hopefully close to optimality. Petrick's function directly leads to a linear model:
• for each disjunction $\bigvee_{i \in S} p'_i$ (for some subset S), the linear constraint $\sum_{i \in S} p'_i \geq 1$ is added to the model,
• since we search for the smallest subset of implicants, the objective function to minimize is the sum of all $p'_i$.
An example is given in Figure 3.
Finally, the smallest ΣΠ-form of f is defined by the sum of the implicants $p_i$ for which $p'_i = 1$ in the minimal solution given by the ILP solver. Remark that some disjunctions may have a single variable (for example $p'_0$ and $p'_3$ in Figures 2 and 3). The corresponding implicants ($p_0$ and $p_3$) are necessarily in the smallest ΣΠ-form. Therefore, Petrick's function can be simplified by setting these variables to 1 ($p'_0 = p'_3 = 1$), thus reducing the number of variables and disjunctions. Such implicants are referred to as "essential prime implicants" in the original Quine-McCluskey algorithm.
Optionally, once an optimal or suboptimal solution is found, "don't care" values of the Boolean function f (if any) can finally be assigned 0 or 1, depending on the solution.
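The second step can be illustrated on small instances by an exhaustive search, used here as a stand-in for the ILP solver (it solves the same model, minimizing the sum of the $p'_i$ under one constraint per disjunction, but does not scale). The clause list below is a hypothetical chart, not the one of Figures 2 and 3.

```python
from itertools import combinations

def smallest_cover(clauses, n_primes):
    """Smallest set of prime-implicant indices hitting every clause.

    Exhaustive search by increasing size; a stand-in for the ILP model
    (minimise sum p'_i subject to sum_{i in S} p'_i >= 1 per clause)."""
    # Essential prime implicants: clauses with a single variable.
    essential = {c[0] for c in clauses if len(c) == 1}
    remaining = [c for c in clauses if not essential & set(c)]
    others = [i for i in range(n_primes) if i not in essential]
    for size in range(len(others) + 1):
        for choice in combinations(others, size):
            if all(set(choice) & set(c) for c in remaining):
                return sorted(essential | set(choice))
    return None  # some clause is empty and cannot be satisfied

# Clauses of a small hypothetical chart with four prime implicants.
clauses = [[0], [0, 1], [1, 2], [2, 3], [3]]
print(smallest_cover(clauses, 4))
```

Fixing the essential prime implicants first mirrors the simplification described above: only the clause `[1, 2]` remains to be covered once implicants 0 and 3 are kept.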

Practical considerations
Computing the prime implicants may require exponential time, as there may exist $O(3^l / \sqrt{l})$ prime implicants, see [CM78]. However, when implemented properly, this part is not an issue compared to the second step for the considered parameters. In particular, the choice of the data structure has a considerable impact on the efficiency: for building size-$2^{\alpha+1}$ implicants, we start by partitioning size-$2^\alpha$ implicants by the variables they possess and by the number of negations they have. As already mentioned, an implicant can only be combined with an implicant having the same variables but differing in a single negation. Thus, the partition we propose makes the search of appropriate implicants more efficient. Also remark that there may exist several combinations ending in the same higher-size implicant. Following these observations, our Python code easily reaches l = 13.
For larger values, a C implementation is preferable and parallelisation should be deployed. We recommend parallelising on the subsets of the partition described above: each thread is given a subset of implicants and tries to combine them with subsets having the same variables and an extra negation.
The second step is equivalent to the set cover problem, which is itself an NP-hard problem. The complexity of our technique, based on Integer Linear Programming (including simplex and branch-and-bound/cut), is unclear. In our experiments, it behaves exponentially. We use CPLEX as ILP solver. Up to l = 9, an optimal solution is found and proved in less than a second. With l ≥ 10, unless f has some specific structure, the ILP solver fails to output a proven optimal solution within a few hours and with our 16 GB of memory. However, even for l = 12 (the maximal value for our experiments), CPLEX outputs a high-quality solution within a few seconds.

The non-collusion-resistant BE scheme
In this section, we introduce ΣΠBE-ncr, a novel symmetric broadcast encryption scheme based on a completely new paradigm. For n users among which r are unauthorized, it requires $\log_2(n)$ keys per user and $O(r)$ messages. In practice, the number of messages is lower than r, except for very small values of r. However, the solution is not secure against (almost) any collusion of at least two users, but is used as a first step towards a fully secure scheme.

Description
We now describe our solution. For the sake of simplicity, let us consider that the number of users n is a power of two and let $l = \log_2(n)$. To illustrate the different procedures, let the system have n = 8 users, among which users labelled 1, 3 and 5 are revoked. We also use two encryption schemes $E$ and $E_{payload}$ that may or may not be the same. In particular, $E_{payload}$ may be an authenticated encryption scheme. The scheme also needs a one-way function F.

Setup
The key material $k_{master}$ of the emitter consists in 2l keys. For $0 \leq i < l$ and $0 \leq j \leq 1$, the key $k_i^j$ is randomly generated and stored. In our example, the system contains six keys: $k_0^0$, $k_0^1$, $k_1^0$, $k_1^1$, $k_2^0$ and $k_2^1$.

Join
Let $u_0 \Vert u_1 \Vert \dots \Vert u_{l-1}$ be the binary decomposition of u. When user u needs to join the broadcast protocol, the emitter sends it $k_i^{u_i}$ for all $0 \leq i < l$. In the above example, user 4 = 0b100 receives $k_0^1$, $k_1^0$ and $k_2^0$.

Encrypt
In the encryption procedure, the emitter starts by generating an ephemeral key $k_e$ and by encrypting the message: $E_{payload}(k_e, m)$, which is part of c. (Note that this step can and should be skipped when there is no revoked user: the message is then directly encrypted using a common pre-shared key.) Let A ⊂ [[0; n − 1]] be the set of authorized users for this session. Let f be the Boolean function in l variables defined by $f(u) = 1$ if $u \in A$, $f(u) = 0$ otherwise. The emitter then computes the minimal ΣΠ-form of f using our variant of the Quine-McCluskey algorithm from Section 3. It can be replaced by other algorithms, but depending on the parameters of the system, we highly suggest choosing an approach that finds a good suboptimal solution when an optimal solution cannot be found using reasonable time or resources.
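In our scenario, the minimal ΣΠ-form of f is:
$$f(u) = \bar{u}_2 \vee u_0 u_1$$
The header h consists in an encoding of f (see Section 5.3). The ciphertext c contains several encryptions of the ephemeral key $k_e$, which needs to be encrypted for every product term of the ΣΠ-form. Let $p_1 = \prod_{i \in S_1} u_i \prod_{i \in S_2} \bar{u}_i$ be the first product term (w.l.o.g.) of the ΣΠ-form for some ordered subsets $S_1$ and $S_2$ of [[0; l − 1]]. For each such product term, the emitter derives a key $k_1$ by applying F to the keys $k_i^1$ for $i \in S_1$ and $k_i^0$ for $i \in S_2$, then encrypts $k_e$ under $k_1$ using E. In our example, the ciphertext is composed of two such ciphertexts, one for the term $\bar{u}_2$ (key derived from $k_2^0$) and one for the term $u_0 u_1$ (key derived from $k_0^1$ and $k_1^1$), in addition to $E_{payload}(k_e, m)$. The header h and ciphertext c are broadcast.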
For large values of l, the computation of the ΣΠ-form of f soon becomes the bottleneck of this procedure. However, it only needs to be computed when A is modified. If the set A does not evolve too often (i.e. less than once every minute), this part can be performed offline, as it does not depend on the message.

Decrypt
While decrypting, user u first checks with the header h that $f(u) = 1$. In the opposite case, u is not authorized and aborts. The user then searches for a product term $p_i$ of f such that $p_i(u) = 1$. There might be several such terms, in which case the user arbitrarily chooses one of them (in practice the first suitable term encountered). As in the encryption procedure, it derives $k_i$ with the possessed key materials and F, decrypts $k_e$ using the corresponding part of the ciphertext and lastly decrypts m.
Here are a few examples:
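The running example (n = 8, users 1, 3 and 5 revoked) can be replayed with the following minimal sketch. Several choices are assumptions made for illustration only: F is SHA-256 over the concatenated keys, E and E_payload are the same toy XOR cipher (not secure), product terms are encoded as strings over `1`, `0`, `-`, and the header is simply the list of product terms.

```python
import hashlib
import os

def F(keys):
    """One-way derivation of a term key; SHA-256 over the concatenation (assumption)."""
    return hashlib.sha256(b"".join(keys)).digest()[:16]

def E(k, x):
    """Toy XOR cipher for messages of at most 32 bytes; NOT secure, illustration only."""
    pad = hashlib.sha256(b"pad" + k).digest()
    return bytes(a ^ b for a, b in zip(x, pad))

D = E  # XOR is its own inverse

def encrypt(m, terms, k_master):
    k_e = os.urandom(16)
    wrapped = []
    for p in terms:
        # Term key from k_i^1 (plain variable) and k_i^0 (negated variable).
        k_p = F([k_master[i][int(b)] for i, b in enumerate(p) if b != "-"])
        wrapped.append(E(k_p, k_e))
    return terms, (wrapped, E(k_e, m))  # header h, ciphertext c

def decrypt(h, c, u, k_u):
    bits = format(u, f"0{len(h[0])}b")
    wrapped, payload = c
    for p, w in zip(h, wrapped):
        if all(a in ("-", b) for a, b in zip(p, bits)):  # p(u) = 1
            k_p = F([k_u[i] for i, a in enumerate(p) if a != "-"])
            return D(D(k_p, w), payload)
    return None  # f(u) = 0: user u is revoked

l = 3
k_master = [[os.urandom(16) for j in (0, 1)] for i in range(l)]

def user_keys(u):
    """Join: user u holds k_i^{u_i}, with u_0 the most significant bit."""
    return {i: k_master[i][(u >> (l - 1 - i)) & 1] for i in range(l)}

h, c = encrypt(b"hello", ["--0", "11-"], k_master)
for u in (4, 7, 1):
    print(u, decrypt(h, c, u, user_keys(u)))
```

User 4 decrypts through the term `--0`, user 7 through `11-`, and the revoked user 1 satisfies no term and obtains nothing.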

Analysis
The scheme is correct: an authorized user is able to decrypt the plaintext.
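Correctness. By construction of f, an authorized user u satisfies $f(u) = 1$. The Quine-McCluskey algorithm (or a suboptimal variant) expresses f as a sum of products $f = \bigvee_i p_i$. If $f(u) = 1$, then there exists at least one product $p_i$ such that $p_i(u) = 1$.
Let $p_1 = \prod_{i \in S_1} u_i \prod_{i \in S_2} \bar{u}_i$ be this product. The corresponding ephemeral key is encrypted with the key $k_1$ derived, using F, from $k_i^1$ for $i \in S_1$ and $k_i^0$ for $i \in S_2$. Since $p_1(u) = 1$, we have $u_i = 1$ for $i \in S_1$ and $u_i = 0$ for $i \in S_2$, so these are keys $k_i^{u_i}$ that user u received in the join procedure. As a consequence, user u has the necessary key materials to reconstruct $k_1$, then to decrypt the ephemeral key $k_e$ and finally m.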
The scheme is secure as long as E, E payload and F are secure: revoked users are unable to decrypt.
Security. By construction of f, an unauthorized user $v = v_0 \Vert v_1 \Vert \dots \Vert v_{l-1}$ satisfies $f(v) = 0$. Let us represent f as a sum of products $f = \bigvee_i p_i$. If $f(v) = 0$, then all $p_i$ are such that $p_i(v) = 0$.
Let $p_1(u) = \prod_{i \in S_1} u_i \prod_{i \in S_2} \bar{u}_i$ be one of the products of f. Since $p_1(v) = 0$, there exists at least one index j such that either $v_j = 0$ and $j \in S_1$, or $v_j = 1$ and $j \in S_2$. By construction $k_1$ is computed from $k_i^1$ for $i \in S_1$ and from $k_i^0$ for $i \in S_2$. Then, user v misses at least $k_j^{\bar{v}_j}$, and thus cannot reconstruct $k_1$ if F is secure. The same argument holds for every product $p_i$ of the ΣΠ-form of f.
If $E$ and $E_{payload}$ are secure, the unauthorized user is unable to retrieve either $k_e$ or m, which ends the proof.
However, any collusion of users u and v that differ in at least two positions, $u_i \neq v_i$ and $u_j \neq v_j$ for distinct indexes i and j, is able to decrypt transmissions that neither u nor v is able to decrypt alone. In particular, let u and v be two users such that $u_i \neq v_i$ for all $0 \leq i < l$. Following the join procedure, one can easily see that they possess the whole key materials of the system and thus are able to decrypt any broadcast.
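The collusion can be checked on the running example with a short sketch that only tracks which key indices each user holds. The helper names and the string encoding of the product term are assumptions of this illustration.

```python
def held_keys(u, l):
    """Indices (i, u_i) of the keys k_i^{u_i} given to user u by Join."""
    return {(i, (u >> (l - 1 - i)) & 1) for i in range(l)}

def needed_keys(p):
    """Indices (i, b) of the keys required by product term p."""
    return {(i, int(b)) for i, b in enumerate(p) if b != "-"}

l, term = 3, "11-"  # product term u_0 u_1 of the running example
alone = [needed_keys(term) <= held_keys(v, l) for v in (3, 5)]
together = needed_keys(term) <= held_keys(3, l) | held_keys(5, l)
print(alone, together)
```

Revoked users 3 = 0b011 and 5 = 0b101 each miss one of the two keys required by the term, but jointly hold both, so together they can derive the term key and decrypt.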

The collusion-resistant BE scheme
In this section, we modify the ΣΠBE-ncr scheme to prevent any collusion of unauthorized users from decrypting, at the cost of n keys being stored by each user.The encryption size remains unchanged.
The one-way function F is now replaced by a pseudorandom function, as defined in Section 2.3, still referred to as F .
We now describe the four new procedures, taking as example n = 8.

Setup
The key material of the emitter now consists in a single key $k_{PRF}$, which is randomly generated and stored. As in the previous scheme, for $0 \leq i < l$ and $0 \leq j \leq 1$, values $k_i^j$ are generated and stored. However, unlike the ΣΠBE-ncr scheme, they need not be random or secret, as long as it is guaranteed that they are all distinct and that any concatenation is unambiguous. It can be as simple as $k_i^j = i \Vert j$ encoded on a fixed number of bits. In order to be consistent with the previous scheme, we keep the notation $k_i^j$, but it should be considered as a label and not as a key.
In our example, the system contains a key $k_{PRF}$ and six labels: $k_0^0$, $k_0^1$, $k_1^0$, $k_1^1$, $k_2^0$ and $k_2^1$.

Join
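When user $u = u_0 \Vert u_1 \Vert \dots \Vert u_{l-1}$ needs to join the broadcast protocol, it suffices that the emitter sends it the keys
$$F\big(k_{PRF},\ \Vert_{i \in S}\, k_i^{u_i}\big) \quad \text{for every subset } S \text{ of } [[0; l − 1]].$$
For simplicity, we suggest sorting the elements of S in increasing order. There are $2^l = n$ of them. Note that these keys have a one-to-one correspondence with the product terms p such that $p(u) = 1$.
In the above example, user 4 = 0b100 receives the eight keys obtained for S = ∅, {0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2} and {0, 1, 2}, i.e. the outputs of F under $k_{PRF}$ on the corresponding concatenations of the labels $k_0^1$, $k_1^0$ and $k_2^0$.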
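The Join procedure of the collusion-resistant scheme can be sketched as follows. HMAC-SHA256 as the PRF, the two-byte label encoding and the 16-byte truncation are assumptions of this sketch.

```python
import hashlib
import hmac
import os
from itertools import combinations

def label(i, j):
    """Public label k_i^j = i || j on a fixed number of bytes."""
    return bytes([i, j])

def term_key(k_prf, assignment):
    """PRF (HMAC-SHA256, an assumption) of the labels sorted by increasing index."""
    data = b"".join(label(i, j) for i, j in sorted(assignment))
    return hmac.new(k_prf, data, hashlib.sha256).digest()[:16]

def join(u, l, k_prf):
    """All 2^l keys of user u, one per subset S of variable indices."""
    bits = [(u >> (l - 1 - i)) & 1 for i in range(l)]
    keys = {}
    for size in range(l + 1):
        for S in combinations(range(l), size):
            assignment = tuple((i, bits[i]) for i in S)
            keys[assignment] = term_key(k_prf, assignment)
    return keys

k_prf = os.urandom(32)
k_4 = join(4, 3, k_prf)
print(len(k_4))  # 2^3 = 8 keys for user 4 = 0b100
```

Each dictionary key records which (index, bit) pairs a stored key corresponds to, which is the one-to-one correspondence with the product terms satisfied by the user.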

Encrypt
Let A ⊂ [[0; n − 1]] be the set of authorized users for this session. Let f be the Boolean function in l variables defined by $f(u) = 1$ if $u \in A$, $f(u) = 0$ otherwise. The beginning of the encryption procedure is identical to the ΣΠBE-ncr scheme: the emitter generates an ephemeral key $k_e$ and encrypts the message: $E_{payload}(k_e, m)$. The emitter computes the minimal ΣΠ-form of f using the Quine-McCluskey algorithm of Section 3.
In our scenario, suppose the minimal ΣΠ-form of f is:
$$f(u) = \bar{u}_2 \vee u_0 u_1$$
The header h consists in an encoding of f, as in the ΣΠBE-ncr scheme. The ciphertext c contains several encryptions of the ephemeral key $k_e$, which needs to be encrypted for every product term of the ΣΠ-form. Let $p_1 = \prod_{i \in S_1} u_i \prod_{i \in S_2} \bar{u}_i$ be the first product term (w.l.o.g.) of the ΣΠ-form for some subsets $S_1$ and $S_2$ of [[0; l − 1]]. For each such product term, the emitter computes the key $k_1$ by applying F, under $k_{PRF}$, to the concatenation of the labels $k_i^1$ for $i \in S_1$ and $k_i^0$ for $i \in S_2$, then encrypts $k_e$ under $k_1$ using E. The set $S_1 \cup S_2$ must be ordered to fit the ordering of the join procedure. As aforementioned, we recommend sorting the elements of $S_1 \cup S_2$ in increasing order. In our example, the header encodes f and the ciphertext contains two encryptions of $k_e$, one for the term $\bar{u}_2$ and one for the term $u_0 u_1$. The header h and ciphertext c are broadcast. A method for encoding f is detailed in Section 5.3. An alternative encoding is possible as long as it allows to easily build a mapping between the product terms $p_i$ of f and the encrypted keys $k_i$.
As in the ΣΠBE-ncr scheme, the computation of the ΣΠ-form of f can be made offline, as it only depends on the set of authorized users.

Decrypt
While decrypting, user u first checks that f(u) = 1. Otherwise, u is not authorized and aborts. The user then searches for a product term $p_i$ of f such that $p_i(u) = 1$. There might be several such terms, in which case the user arbitrarily chooses one of them. As mentioned in the Join procedure, such a product term corresponds to one key $k_i$ received by the user and defined as:
$$k_i = F\Big(k_{\mathrm{PRF}},\ \big\|_{j \in S_1 \cup S_2}\, k_j^{u_j}\Big),$$
where $S_1$ and $S_2$ are the sets of non-negated and negated variables of $p_i$. Note that the computation of $k_i$ is done by the emitter and the receiver is only provisioned with the list of keys described in the join procedure, contrary to the ΣΠBE-ncr scheme where the user could freely derive $k_i$. This more complicated design is precisely what makes our scheme collusion resistant: even knowing the labels $k_i^j$ and the value of u, two colluding users are not able to compute more keys than they already have, because they do not know the master key $k_{\mathrm{PRF}}$. Finally, thanks to $k_i$, the user decrypts $k_e$ using the corresponding part $c_i$ of the ciphertext and subsequently decrypts m.
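The key agreement between the emitter and an authorized receiver can be sketched as follows. The example is hypothetical: the ΣΠ-form $\bar{u}_0 u_1 + u_0 \bar{u}_2$ over l = 3 variables is chosen only for illustration, HMAC-SHA256 stands in for F, labels are encoded as two bytes, $u_0$ is the most significant bit, and the encryption scheme E is left out so that only the key derivation and lookup are shown. All function names are assumptions.

```python
import hashlib
import hmac
from itertools import combinations


def label(i, j):
    """Public label k_i^j, encoded as the two bytes i || j (assumed encoding)."""
    return bytes([i, j])


def prf(key, data):
    """Stand-in for the PRF F: HMAC-SHA256 (an assumption)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def user_bits(u, l):
    """Bits u_0, ..., u_{l-1} of user u, with u_0 the most significant bit."""
    return [(u >> (l - 1 - i)) & 1 for i in range(l)]


def join(k_prf, bits):
    """Keys stored by a user: one PRF output per increasing subset of indices."""
    l = len(bits)
    return {s: prf(k_prf, b"".join(label(i, bits[i]) for i in s))
            for r in range(l + 1) for s in combinations(range(l), r)}


def emitter_term_key(k_prf, term):
    """Emitter side. `term` maps index i to 1 (u_i) or 0 (negated u_i)."""
    return prf(k_prf, b"".join(label(i, term[i]) for i in sorted(term)))


def select_key(stored, bits, terms):
    """Receiver side: (index, key) of the first satisfied term, or None."""
    for index, term in enumerate(terms):
        if all(bits[i] == b for i, b in term.items()):
            return index, stored[tuple(sorted(term))]
    return None


k_prf = bytes(32)                       # placeholder master key
terms = [{0: 0, 1: 1}, {0: 1, 2: 0}]    # hypothetical f = ~u0.u1 + u0.~u2
term_keys = [emitter_term_key(k_prf, t) for t in terms]

bits4 = user_bits(4, 3)                 # user 4 = 0b100 satisfies the second term
print(select_key(join(k_prf, bits4), bits4, terms) == (1, term_keys[1]))

bits5 = user_bits(5, 3)                 # user 5 = 0b101 satisfies no term
print(select_key(join(k_prf, bits5), bits5, terms))
```

In this sketch the revoked user 5 stores eight keys, none of which coincides with a key used by the emitter for this session.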
For example, a user for which no product term evaluates to 1 cannot decrypt.

Analysis
The scheme is correct: an authorized user is able to decrypt the plaintext.
Correctness. Correctness reduces to that of the ΣΠBE-ncr scheme. Instead of being derived by the user, all possible derivations from the key/label material $k_i^{u_i}$ (0 ≤ i < l) are now computed by the emitter and then sent to user u.
The scheme is collusion-resistant: any collusion of revoked users is unable to decrypt.

Collusion-resistance. Consider a set of authorized users A and a collusion R of revoked users, so that R ∩ A = ∅.
As usual, f is expressed in a ΣΠ-form: $f = \sum_i p_i$. Assuming that $E_{\mathrm{payload}}$ and $E$ are secure encryption schemes, the collusion R is able to decrypt if and only if it manages to obtain a key $k_i$ corresponding to a product term $p_i$. Recall that these keys are of the form:
$$k_i = F\Big(k_{\mathrm{PRF}},\ \big\|_{j \in S_1 \cup S_2}\, k_j^{b_j}\Big),$$
where $b_j = 1$ if $u_j$ appears in $p_i$ and $b_j = 0$ if $\bar{u}_j$ appears in $p_i$. Although the colluding users possibly know all labels $k_j^0$ and $k_j^1$, the key $k_{\mathrm{PRF}}$ is known only to the emitter. If the collusion is able to decrypt, then:
• either $k_i$ is known by at least one user v ∈ R. Following the join and encrypt procedures, this implies that $p_i(v) = 1$ and thus f(v) = 1, meaning that v ∈ A. This contradicts the definition of R.
• or the collusion is able to compute $k_i$ from the other evaluations of the PRF F it possesses from the join procedure. If so, the collusion can be used to build an adversary that distinguishes F from a random function (see Section 2.3).
The security of the scheme against a single revoked user is implied by the collusion resistance.

The case of "don't care" users
The encryption procedures of the ΣΠBE-ncr and ΣΠBE schemes first define a function f such that f(u) = 1 if and only if user u is authorized. So far, they do not exploit the possibility of setting "don't care" values in the Boolean function f. We see several applications of these values.
If the number of users n of the system is not a power of two, let n′ > n be the smallest power of two greater than n. We recommend instantiating the schemes with n′ users, and we call "don't care" users all u such that n ≤ u < n′. In the encryption procedures, we suggest defining f(u) = * for all n ≤ u < n′. The Quine-McCluskey algorithm then assigns either 0 or 1 to these "don't care" users, such that the ΣΠ-form of f is minimal.
As another application, in a context where a message need not be delivered to a user u but also need not be protected from u (for instance a pay-TV subscriber who already has his credentials), u can be defined as a "don't care" user. The set of authorized users A of the encryption procedures must then be split into two sets:
• $A_M$ for mandatory users. Let f(u) = 1 for all u ∈ $A_M$.
• $A_{DC}$ for "don't care" users. Let f(u) = * for all u ∈ $A_{DC}$.
For an unauthorized user u, which is not in $A_M \cup A_{DC}$, nothing changes: f(u) = 0. As a consequence, a "don't care" user u might be able to decrypt, which does not affect the security against unauthorized users.
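The effect of "don't care" values on the size of the ΣΠ-form can be illustrated on a toy instance with n = 6 users padded to n′ = 8. The sketch below is a brute-force minimizer written for illustration only; it is not the Quine-McCluskey and ILP pipeline of Section 3 and only scales to very small l. The names `term_accepts` and `minimal_sop`, the tuple representation of product terms, and the convention that $u_0$ is the most significant bit are assumptions.

```python
from itertools import combinations, product


def term_accepts(term, u):
    """term[i] is 1 for u_i, 0 for negated u_i, None if u_i is absent.

    u_0 is taken to be the most significant bit of u.
    """
    l = len(term)
    return all(t is None or ((u >> (l - 1 - i)) & 1) == t
               for i, t in enumerate(term))


def minimal_sop(l, on, dc=()):
    """Smallest list of product terms covering `on`, possibly covering `dc`.

    Exhaustive search over all 3^l product terms: exact, but exponential.
    """
    on = set(on)
    allowed = on | set(dc)
    if not on:
        return []
    candidates = []
    for term in product((0, 1, None), repeat=l):
        members = {u for u in range(2 ** l) if term_accepts(term, u)}
        # Keep terms that accept no revoked user and at least one mandatory user.
        if members <= allowed and members & on:
            candidates.append((term, members & on))
    for size in range(1, len(on) + 1):
        for combo in combinations(candidates, size):
            if set().union(*(m for _, m in combo)) == on:
                return [t for t, _ in combo]


# Users 0..5 exist, users 6 and 7 are padding; users 1, 3 and 5 are authorized.
print(len(minimal_sop(3, {1, 3, 5})))       # padding treated as revoked
print(minimal_sop(3, {1, 3, 5}, dc={6, 7})) # padding treated as "don't care"
```

On this instance, treating the padding users as revoked requires two product terms, whereas marking them as "don't care" lets a single term (the last bit alone) cover all authorized users.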
Similarly, "don't care" users may also be used as spare slots for future needs: new users can be dynamically added using these spare slots.However, in that case, forward secrecy is not guaranteed: a newly added user u might be able to decrypt ciphertexts generated before u joined the system.

An encoding for sums of products
In this section, we propose a method for encoding the ΣΠ-form of a Boolean function f in a compact way. As usual, let n be the number of users and $2^l$ the smallest power of two greater than or equal to n.
The encoding of f requires an encoding of each product term. A product term contains at most l variables, numbered from 0 to l − 1, each of which can be negated. For a given product term p, we denote by $S_0$ the set of negated variables and by $S_1$ the set of non-negated variables. $S_0$ and $S_1$ are (disjoint) subsets of [[0; l − 1]] of size at most l. The product term is written as:
$$p = \prod_{i \in S_1} u_i \prod_{i \in S_0} \bar{u}_i.$$
We propose to encode $S_0$ (resp. $S_1$) as an l-bit string where the i-th bit is set to 1 if and only if $i \in S_0$ (resp. $i \in S_1$). The term p is then encoded as the 2l-bit string defined by the concatenation of the encodings of $S_0$ and $S_1$. For example with l = 4:

Since f is a disjunction of product terms, it now suffices to concatenate the number of product terms and the encoding of each one of them. A (very loose) upper bound on this number is n, the number of users. Indeed, as mentioned in Section 3, a naive ΣΠ-form of f involves all elements of the support of f. Therefore the number of product terms is represented by an l-bit integer. As an example, taking n = 16 and l = 4:

If f = 1 (i.e. all users are authorized), then there is no product term and f is encoded as an l-bit zero.
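The encoding described above can be sketched as follows. The bit order inside each l-bit string (bit i is the i-th character from the left), the function names, and the sample term $u_0 \bar{u}_1 u_3$ are assumptions made for illustration; they are not taken from the paper's own examples.

```python
def encode_term(l, s0, s1):
    """2l-bit string: the encoding of S_0 followed by the encoding of S_1."""
    assert not set(s0) & set(s1), "a variable cannot be both negated and non-negated"
    enc0 = "".join("1" if i in s0 else "0" for i in range(l))
    enc1 = "".join("1" if i in s1 else "0" for i in range(l))
    return enc0 + enc1


def encode_function(l, terms):
    """l-bit count of product terms, then the 2l-bit encoding of each term."""
    assert len(terms) < 2 ** l, "the count must fit in l bits"
    count = format(len(terms), "0{}b".format(l))
    return count + "".join(encode_term(l, s0, s1) for s0, s1 in terms)


def decode_function(l, bits):
    """Inverse of encode_function: list of (S_0, S_1) pairs."""
    count = int(bits[:l], 2)
    terms = []
    for t in range(count):
        chunk = bits[l + 2 * l * t: l + 2 * l * (t + 1)]
        s0 = {i for i in range(l) if chunk[i] == "1"}
        s1 = {i for i in range(l) if chunk[l + i] == "1"}
        terms.append((s0, s1))
    return terms


# Hypothetical term u_0 * ~u_1 * u_3 with l = 4: S_0 = {1}, S_1 = {0, 3}.
print(encode_term(4, {1}, {0, 3}))           # 01001001
print(encode_function(4, [({1}, {0, 3})]))   # 000101001001
```

With this layout a receiver can locate the i-th product term at a fixed offset, which gives the mapping between product terms and encrypted keys required by the encryption procedure.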
An even more compact way is to represent each product term of f by an l-digit number in base 3, whose j-th digit selects one of three possibilities: $u_j$ does not appear, $u_j$ appears, or $\bar{u}_j$ appears in the term. Such a number is represented on at most $l \log_2(3) + 1$ bits, which is a gain of about 20% compared to using 2l bits as above.
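A minimal sketch of this base-3 packing is given below. The digit convention (0 for an absent variable, 1 for $u_j$, 2 for $\bar{u}_j$), the choice of giving digit j the weight $3^j$, and the function names are assumptions; any fixed convention shared by the emitter and the receivers would do.

```python
def term_to_ternary(l, s0, s1):
    """Pack a product term into one integer whose j-th base-3 digit describes u_j.

    Assumed digits: 0 = u_j absent, 1 = u_j appears, 2 = negated u_j appears.
    """
    value = 0
    for j in reversed(range(l)):
        digit = 1 if j in s1 else 2 if j in s0 else 0
        value = 3 * value + digit
    return value


def ternary_to_term(l, value):
    """Inverse of term_to_ternary: returns the pair (S_0, S_1)."""
    s0, s1 = set(), set()
    for j in range(l):
        value, digit = divmod(value, 3)
        if digit == 1:
            s1.add(j)
        elif digit == 2:
            s0.add(j)
    return s0, s1


def ternary_bits(l):
    """Number of bits needed to store any l-digit base-3 number."""
    return (3 ** l - 1).bit_length()


for l in (4, 11, 16):
    # Columns: l, bits with the base-3 packing, bits with the 2l-bit encoding.
    print(l, ternary_bits(l), 2 * l)
```

For l = 11 (n = 2048), a term fits in 18 bits instead of 22, in line with the gain of about 20% stated above.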

Comparison with other schemes

6.1 Comparison with Complete Subtree
In this section, we show that our BE scheme systematically performs at least as well as the Complete Subtree (CS) scheme in terms of bandwidth, i.e. no matter the status of the users, the number of encryptions needed by our BE scheme is no greater than the number needed for CS. To do so, we recall the principle of the Complete Subtree: the n users are numbered from 0 to n − 1 and arranged as the leaves of a complete binary tree T of height h. Any set S of authorized users is then represented as a disjoint union of sets $S_i$, each of them being the set of leaves of a subtree of T.
In the case where all users are authorized, the CS uses the key corresponding to the root and our BE scheme uses a key common to all users, so they both require one encryption. Let us now assume that at least one user is revoked, meaning that the sets $S_i$ involved in the CS come from strict subtrees of T. We now explain how every $S_i$ can be viewed as a product term of f. For instance, if $S_i$ corresponds to the leaves descending from the left (resp. right) child of the root, then the set $S_i$ is exactly the set of leaves whose labels are < n/2 (resp. ≥ n/2); it is therefore described by the product term $\bar{u}_h$ (resp. $u_h$). Inductively, if the set $S_i$ corresponds to the leaves of a subtree rooted at a node of depth d, then it is described by a product term involving either $u_j$ or $\bar{u}_j$ for each $h - d < j \le h$, the choice between the two options being determined by the path from the root of the subtree to the root of T.
This means that every subset cover used in the CS can be represented as a ΣΠ-form. Since our BE scheme uses a ΣΠ-form whose number of terms is minimal, it necessarily involves no more encryptions than the CS. Based on the results that were proven for the CS in [NNL01], we get the following theorem as a corollary.
Theorem 1. For a total of n users among which r users have been revoked, the bandwidth required by the ΣΠBE scheme is in O(r log(n/r)).
In the next section, we present experimental results showing that the performance of our BE scheme is on average much better than this bound.We also remark that there is a worst case in which this bound is actually reached.
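The correspondence between Complete Subtree covers and product terms used in the argument above can be sketched as follows. This is an illustration under assumptions: subtrees are identified by a hypothetical pair (depth, prefix), product terms are tuples with entries 1, 0 or None, and, unlike the indexing used in this section, the first entry of the tuple corresponds to the most significant bit of the user number.

```python
def cs_cover(h, authorized):
    """Complete Subtree cover as (depth, prefix) pairs of maximal full subtrees."""
    authorized = set(authorized)

    def rec(depth, prefix):
        width = 1 << (h - depth)
        leaves = range(prefix * width, (prefix + 1) * width)
        if all(u in authorized for u in leaves):
            return [(depth, prefix)]       # every leaf below is authorized
        if depth == h:
            return []                      # a single revoked leaf
        return rec(depth + 1, 2 * prefix) + rec(depth + 1, 2 * prefix + 1)

    return rec(0, 0)


def subtree_to_term(h, depth, prefix):
    """Product term fixing the `depth` most significant bits to those of `prefix`."""
    fixed = [(prefix >> (depth - 1 - i)) & 1 for i in range(depth)]
    return tuple(fixed + [None] * (h - depth))


def term_accepts(term, u):
    """True if user u satisfies the term (entry 1: bit set, 0: bit clear, None: free)."""
    h = len(term)
    return all(t is None or ((u >> (h - 1 - i)) & 1) == t
               for i, t in enumerate(term))


authorized = {0, 1, 2, 3, 6}               # hypothetical session with n = 8
cover = cs_cover(3, authorized)
terms = [subtree_to_term(3, d, p) for d, p in cover]
print(cover)   # [(1, 0), (3, 6)]
print(terms)   # [(0, None, None), (1, 1, 0)]
```

Each subtree of the cover yields one product term, so the resulting ΣΠ-form has exactly as many terms as the CS cover has subsets; a minimal ΣΠ-form can only be smaller or equal.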

A practical comparison with Subset Difference
We compare our ΣΠBE scheme to the Subset Difference (SD) method proposed by Naor et al. [NNL01]. Let us consider a system of n users and r ≤ n revoked users. The SD scheme requires each user to store $\frac{1}{2} \log_2(n)^2$ keys. In comparison, our solution is heavier, since n keys are stored per user.
On the other hand, the encryption cost and bandwidth consumption, expressed in number of encryptions of the ephemeral key, behave better in our scheme. It has been proved in [BS13] that SD needs 1.25r encryptions in the average case and 2r in the worst case. However, this study is asymptotic and the SD scheme actually performs better in practice (e.g. ≈ 1.15r for n = 2048). For our scheme, we cannot give an average case, as it relies on the average solution size of the NP-hard set cover problem, and even so, our approach described in Section 3 outputs a suboptimal solution for large n. Our comparison is therefore empirical: we choose n = 2048 (resp. n = 4096) and r ∈ [[0; 230]] (resp. r ∈ [[0; 520]]). For each value of r, 20 tests are run with a set of r revoked users drawn pseudo-randomly. Since the Quine-McCluskey algorithm generally does not terminate in reasonable time for such parameters, we limit the ILP part to 30 seconds of computation on our laptop (i5-1135G7 with 16GB RAM). Depending on the context, it may be acceptable to run it for longer, resulting in a better solution and saving even more bandwidth. We observe that our solution outperforms the SD scheme except for very small values of r, which are unlikely to be the bottleneck anyway. For a fixed r, both schemes show a low variance in the encryption cost.
Concerning the computation time of the encryption process, our ΣΠBE scheme is obviously much heavier. This is exclusively due to the Quine-McCluskey algorithm. If the set of revoked users does not evolve too often (i.e. less often than once per minute), we stress that this part can be done offline, as it does not depend on the message. We believe most practical scenarios fit this description, and recall that in the setting of BE, the power requirements are less stringent for the emitter than for the receivers. We also observed that very good solutions are found within the first few seconds of the ILP part. Our experiments suggest that running the ILP solver for 30 seconds (resp. 2 minutes and 20 minutes) improves the solution found after 5 seconds by less than 0.5% (resp. 1.7% and 4.2%) on average. Therefore, only 5 seconds of computation would have given roughly the same results as shown in Figure 4.
Apart from the Quine-McCluskey part, our approach behaves slightly better, since no key derivation is necessary, contrary to the SD scheme. Similarly, the decryption process of ΣΠBE requires slightly fewer computations thanks to the absence of key derivation.

Further comments on the comparison with SD
Contrary to the result proven for CS in Section 6.1, it is not true that our BE scheme systematically outperforms the Subset Difference. A typical example is the case where a single user is denied. With the SD framework, only one encryption is needed, using the key associated to the whole tree minus the leaf corresponding to the revoked user. For our BE scheme, $\lambda = \log_2(n)$ encryptions are required. For instance, if the revoked user is identified by the number 0, we have $f(u) = \sum_{i=1}^{\lambda} u_i$ (u is authorized iff it has a 1 in its binary decomposition). This decomposition is already optimal because it is impossible to ensure that u ≠ 0 without looking at all its bits.
On the other hand, however, there exist cases in which our BE scheme performs significantly better than the SD, in particular when a large number of users have been revoked. For instance, if one revokes all the users whose corresponding number is even, then the SD will need exactly n/2 encryptions using, for every authorized leaf, the key corresponding to the difference of the subtree rooted at its parent minus the subtree rooted at its sibling. Using our BE scheme, such a denial is straightforward because the corresponding f(u) is simply the value of the LSB of u, i.e. only one encryption is needed.
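The optimality claim for the single-revocation example can be made precise by a short counting argument. The following is a sketch in our own notation, where $e_i$ (a symbol not used elsewhere in the paper) denotes the user whose only non-zero bit is $u_i$:

```latex
% Sketch: revoking only user 0 requires at least \lambda product terms.
% e_i is the user with u_i = 1 and u_j = 0 for all j \neq i (our notation).
Let $f(u) = u_1 \vee u_2 \vee \dots \vee u_\lambda$, so that $f(0) = 0$ and
$f(e_i) = 1$ for $1 \le i \le \lambda$. Let $p$ be a product term of a
$\Sigma\Pi$-form of $f$ such that $p(e_i) = 1$.
\begin{itemize}
  \item Since $p(0) \le f(0) = 0$, the term $p$ contains at least one
        non-negated variable.
  \item Since $p(e_i) = 1$, every non-negated variable of $p$ takes the value
        $1$ on $e_i$, so the only non-negated variable of $p$ is $u_i$.
\end{itemize}
Hence a term accepting $e_i$ cannot accept $e_j$ for $j \neq i$. The
$\lambda$ users $e_1, \dots, e_\lambda$ therefore require $\lambda$ distinct
product terms, and the form $u_1 + u_2 + \dots + u_\lambda$ is minimal.
```

The same reasoning shows that the bound does not depend on which single user is revoked, up to negating the corresponding variables.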
Following this reasoning, one gets the intuition that SD will perform well when denied users are clustered and poorly when they are interspersed among the authorized users, which should be a relatively frequent situation when users are randomly denied as in our experiments. Our BE scheme, on the other hand, is better suited to this situation, as it can easily cluster the denied users using a "comb" which groups users based on their remainders modulo a power of 2. We believe that this is the explanation behind our experimental results, but we acknowledge that, as mentioned before, it is extremely challenging to provide a quantitative analysis of our BE scheme's behaviour.
An important point that may come to mind is that, in practice, users are not revoked at random but based on features that can be used to arrange the tree used within the SD. For instance, if users are put in the tree based on their subscription date, their corresponding leaves will be close to one another and they will be revoked at roughly the same time, when the subscription expires. While this is a perfectly valid argument, we claim that it applies identically to our BE scheme ("close" users have consecutive numbers, so their most significant bits are identical) and that our BE scheme should actually benefit even more from this fact.
Indeed, in the case where several meaningful features need to be encoded, one could simply assign the first half of the identifier to one feature (e.g. geographic location, type of subscription) while the other half would just be a customer number related to the time of subscription. This would make it extremely convenient for our BE scheme to perform effective revocations based on these features or a combination thereof (for instance, if there are various options for subscription duration, users will not be revoked based on their date of subscription but on a combination of both subscription date and subscription duration).

Future works
As briefly mentioned in Section 2.2, the ESPRESSO algorithm [BHMS84] is an interesting alternative to the Quine-McCluskey algorithm when dealing with a large number of users.
Like the Quine-McCluskey algorithm, it aims at representing a Boolean function as a small sum of products. However, the prime implicants are built following some customizable heuristic, and the algorithm is likely to output a suboptimal solution. We did not feel confident about using such a complex tool in a black-box fashion without having some intuition about how it works and why it would be close to or far from optimality. This question would need further investigation, but we point out that neither the security nor the correctness of our BE scheme depends on the method used for minimizing the number of terms of the ΣΠ-form of f, leaving room for numerous options including the pre-computation and storage of some ΣΠ-forms in advance when revocations can be anticipated (e.g. expiration of subscriptions).

Table 1: Performance comparisons for n users of which r are revoked