The Elegance of the Feistel Cipher

Site: Saylor Academy
Course: CS260: Introduction to Cryptography and Network Security
Book: The Elegance of the Feistel Cipher
Printed by: Guest user
Date: Monday, 28 April 2025, 5:55 AM

Description

Pseudorandom Functions & Block Ciphers

Pseudorandom Functions & Block Ciphers Imagine if Alice & Bob had an infinite amount of shared randomness - not just a short key. They could split it up into λ-bit chunks and use each one as a one-time pad whenever they want to send an encrypted message of length λ. 

Alice could encrypt by saying, "hey Bob, this message is encrypted with one-time pad using chunk #674696273 as key". Bob could decrypt by looking up location #674696273 in his copy of the shared randomness. As long as Alice doesn't repeat a key/chunk, an eavesdropper (who doesn't have the shared randomness) would learn nothing about the encrypted messages. Although Alice announces (publicly) which location/chunk was used as each one-time pad key, that information doesn't help the attacker know the value at that location.

Pseudorandom Functions & Block Ciphers

It is silly to imagine an infinite amount of shared randomness. However, an expo- nential amount of something is often just as good as an infinite amount. A shared table containing "only" 2λ one-time pad keys would be quite useful for encrypting as many messages as you could ever need. 

A pseudorandom function (PRF) is a tool that allows Alice & Bob to achieve the effect of such an exponentially large table of shared randomness in practice. In this chapter we will explore PRFs and their properties. In a later chapter, after introducing new security definitions for encryption, we will see that PRFs can be used to securely encrypt many messages under the same key, following the main idea illustrated above.


Source: Mike Rosulek, https://joyofcryptography.com/pdf/chap6.pdf
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.

Definition

Continuing our example, imagine a huge table of shared data stored as an array T, so the i th item is referenced as T[i]. Instead of thinking of i as an integer, we can also think of i as a binary string. If the array has 2^{i n} items, then i will be an in-bit string. If the array contains strings of length "out", then the notation T[i] is like a function that takes an input from \{0,1\}^{\text {in }} and gives an output from \{0,1\}^{\text {out }}.

A pseudorandom function emulates the functionality of a huge array. It is a function F that takes an input from \{0,1\}^{i n} and gives an output from \{0,1\}^{\text {out }}. However, F also takes an additional argument called the seed, which acts as a kind of secret key.

The goal of a pseudorandom function is to "look like" a uniformly chosen array / lookup table. Such an array can be accessed through the LOOKUP subroutine of the following library:

The goal of a pseudorandom function is to "look like" a uniformly chosen array lookup table. Such an array can be accessed

As you can see, this library initially fills up the array T with uniformly random data, and then allows the calling program to access any position in the array.

A pseudorandom function should produce indistinguishable behavior, when it is used with a uniformly chosen seed. More formally, the following library should be indistinguishable from the one above:

A pseudorandom function should produce indistinguishable behavior, when it is used with a uniformly chosen seed. More formall

Note that the first library samples out \cdot 2^{\text {in }} bits uniformly at random (out bits for each of 2^{i n} entries in the table), while the second library samples only \lambda bits (the same k is used for all invocations of F ). Still, we are asking for the two libraries to be indistinguishable.

This is basically the definition of a PRF, with one technical caveat. We want to allow situations like in \geqslant \lambda, but in those cases the first library runs in exponential time. It is generally convenient to build our security definitions with libraries that run in polynomial time. We fix this by taking advantage of the fact that, no matter how big the table T is meant to be, a polynomial-time calling program will only access a polynomial amount of it. In some sense it is "overkill" to actually populate the entire table T upfront. Instead, we can populate T in a lazy / on-demand way. T initially starts uninitialized, and its values are only assigned as the calling program requests them. This changes when each T[x] is sampled (if at all), but does not change how it is sampled (i.e., uniformly & independently). This also changes T from being a typical array to being an associative array ("hash table" or "dictionary" data structure), since it only maps a subset of \{0,1\}^{\text {in }} to values in \{0,1\}^{\text {out }}.


Definition 6.1 (PRF security)

Let F:\{0,1\}^{\lambda} \times\{0,1\}^{i n} \rightarrow\{0,1\}^{\text {out }} be a deterministic function. We say that F is a secure pseudorandom function (PRF) if \mathcal{L}_{\mathrm{prf}-\mathrm{real}}^{F} \approx \mathcal{L}_{\mathrm{prf} \text {-rand }}^{F}, where:

Definition 6.1 (PRF security)


Discussion, Pitfalls

The name "pseudorandom function" comes from the perspective of viewing T not as an (associative) array, but as a function T:\{0,1\}^{\text {in }} \rightarrow\{0,1\}^{\text {out }}. There are 2^{\text {out } \cdot 2^{\text {in }} \text { possible }} functions for T (an incredibly large number), and \mathcal{L}_{\text {prf-rand }} chooses a "random function" by uniformly sampling its truth table as needed.

For each possible seed k, the residual function F(k, \cdot) is also a function from \{0,1\}^{i n} \rightarrow \{0,1\}^{\text {out }}. There are "only" 2^{\lambda} possible functions of this kind (one for each choice of k ), and \mathcal{L}_{\text {prf-real }} chooses one of these functions randomly. In both cases, the libraries give the calling program input/output access to the function that was chosen. You can think of this in terms of the picture from Section 5.1, but instead of strings, the objects are functions.

Note that even in the case of a "random function" \left(\mathcal{L}_{\text {prf-rand }}\right), the function T itself is still deterministic! To be precise, this library chooses a deterministic function, uniformly, from the set of all possible deterministic functions. But once it makes this choice, the input/output behavior of T is fixed. If the calling program calls LOOKUP twice with the same x, it receives the same result. The same is true in \mathcal{L}_{\text {prf-real }}, since F is a deterministic function and k is fixed throughout the entire execution. To avoid this very natural confusion, it is perhaps better to think in terms of "randomly initialized lookup tables" rather than "random functions".


How NOT to Build a PRF

We can appreciate the challenges involved in building a PRF by looking at a natural approach that doesn't quite work.

Example

Suppose we have a length-doubling P R G G:\{0,1\}^{\lambda} \rightarrow\{0,1\}^{2 \lambda} and try to use it to construct a PRF F as follows:

Suppose we have a length-doubling PRGG:{0,1}λ→{0,1}2λ and try to use it to construct a PRF F as follows:

You might notice that all we have done is rename the encryption algorithm of "pseudo-OTP" (Construction 5.2). We have previously argued that this algorithm is a secure method for onetime encryption, and that the resulting ciphertexts are pseudorandom. Is this enough for a secure PRF? No, we can attack the security of this PRF.

Attacking F means designing distinguisher that behaves as differently as possible in the presence of the two \mathcal{L}_{\mathrm{prf}-\star}^{F} libraries. We want to show that F is insecure even if G is an excellent PRG. We should not try to base our attack on distinguishing outputs of G from random. Instead, we must try to break the inappropriate way that G is used to construct a PRF.

The distinguisher must use the interface of the \mathcal{L}_{\mathrm{prf}-\star} libraries -\mathrm{i}.e., make some calls to the LOOKUP subroutine and output 0 or 1 based on the answers it gets. The LOOKUP subroutine takes an argument, so the distinguisher has to choose which arguments to use.

One observation we can make is that if a calling program sees only one value of the form G(k) \oplus x, it will look pseudorandom. This is essentially what we showed in Section 5.3. So we should be looking for a calling program that makes more than one call to LOOKUP.

If we make two calls to LOOKUP - say, on inputs x_{1} and x_{2}- the responses from \mathcal{L}_{\text {prf-real }} will be G(k) \oplus x_{1} and G(k) \oplus x_{2}. To be a secure PRF, these responses must look independent and uniform. Do they? They actually have a pattern that the calling program can notice: their XOR is always x_{1} \oplus x_{2}, a value that is already known to the calling program.

We can condense all of our observations into the following distinguisher:

We can condense all of our observations into the following distinguisher:

Let's compute its advantage in distinguishing \mathcal{L}_{\mathrm{prf}-\mathrm{real}}^{F} from \mathcal{L}_{\mathrm{prf}-\mathrm{rand}}^{F} by considering \mathcal{A} 's behavior when linked to these two libraries:

Let’s compute its advantage in distinguishing LFprf−real from LFprf−rand by considering A ’s behavior when linked to these tw

When \mathcal{A} is linked to \mathcal{L}_{\mathrm{prf}-\mathrm{real}}^{F}, the library will choose a key k. Then z_{1} is set to G(k) \oplus x_{1} and z_{2} is set to G(k) \oplus x_{2}. So z_{1} \oplus z_{2} is always equal to x_{1} \oplus x_{2}, and \mathcal{A} always outputs 1. That is,

\operatorname{Pr}\left[\mathcal{A} \diamond \mathcal{L}_{\text {prf-real }}^{F} \Rightarrow 1\right]=1

Pr[A⋄LFprf-real ⇒1]=1

When \mathcal{A} is linked to \mathcal{L}_{\text {prf-rand }}^{F}, the responses of the two calls to LOOKUP will be chosen uniformly and independently because LOOKUP is being called on distinct inputs. Consider the moment in time when the second call to LOOKUP is about to happen. At that point, x_{1}, x_{2}, and z_{1} have all been determined, while z_{2} is about to be chosen uniformly by the library. Using the properties of XOR, we see that \mathcal{A} will output 1 if and only if z_{2} is chosen to be exactly the value x_{1} \oplus x_{2} \oplus z_{1}. This happens only with probability 1 / 2^{2 \lambda}. That is,

\operatorname{Pr}\left[\mathcal{A} \diamond \mathcal{L}_{\text {prf-rand }}^{F} \Rightarrow 1\right]=1 / 2^{2 \lambda}

The advantage of \mathcal{A} is therefore 1-1 / 2^{2 \lambda} which is certainly non-negligible since it doesn't even approach 0. This shows that F is not a secure PRF.

At a more philosophical level, we wanted to identify exactly how G is being used in an inappropriate way. The PRG security libraries guarantee security when G 's seed is chosen freshly for each call to G. This construction of F violates that rule and allows the same seed to be used twice in different calls to G, where the results are supposed to look independent.

This example shows the challenge of building a PRF. Even though we know how to make any individual output pseudorandom, it is difficult to make all outputs collectively appear independent, when in reality they are derived from a single short seed.

PRFs vs PRGs; Variable-Hybrid Proofs

In this section we show that a PRG can be used to construct a PRF, and vice-versa. The construction of a PRG from PRF is practical, and is one of the more common ways to obtain a PRG in practice. The construction of a PRF from PRG is more of theoretical interest and does not reflect how PRFs are designed in practice.


Constructing a PRG from a PRF

As promised, a PRF can be used to construct a PRG. The construction is quite natural. For simplicity, suppose we have a PRF F:\{0,1\}^{\lambda} \times\{0,1\}^{\lambda} \rightarrow\{0,1\}^{\lambda}( i.e., in = out =\lambda). We can build a length-doubling PRG in the following way:


Construction 6.2 (Counter PRG)

Construction 6.2 (Counter PRG)

There is nothing particularly special about the inputs 0 \cdots 00 and 0 \cdots 01 to F. All that matters is that they are distinct. The construction can be extended to easily give more than 2 blocks of output, by treating the input to F as a simple counter (hence the name of this construction).

The guarantee of a PRF is that when its seed is chosen uniformly and it is invoked on distinct inputs, its outputs look independently uniform. In particular, its output on inputs 0 \cdots 00 and 0 \cdots 01 are indistinguishable from uniform. Hence, concatenating them gives a string which is indistinguishable from a uniform 2 \lambda-bit string.

That really is all there is to the security of this construction, but unfortunately there is a slight technical issue which makes the security proof more complicated than you might guess. We will have to introduce a new technique of variable hybrids to cope with it.


Claim 6.3

If F is a secure PRF, then the counter PRG construction G above is a secure PRG.

Proof

In order to prove that G is a secure PRG, we must prove that the following libraries are indistinguishable:

Proof

During the proof, we are allowed to use the fact that F is a secure PRF. That is, we can use the fact that the following two libraries are indistinguishable:

During the proof, we are allowed to use the fact that F is a secure PRF. That is, we can use the fact that the following

The inconvenience in the proof stems from a mismatch of the s variable in \mathcal{L}_{\text {prg-real and }} the k variable in \mathcal{L}_{\text {prf-real. }} In \mathcal{L}_{\text {prg-real, }} s is local to the QUERY subroutine. Over the course of an execution, s will take on many values. Since s is used as the PRF seed, we must write the calls to F in terms of the LOOKUP subroutine of \mathcal{L}_{\text {prf-real. }}. But in \mathcal{L}_{\text {prf-real }} the PRF seed is fixed for the entire execution. In other words, we can only use \mathcal{L}_{\text {prf-real }} to deal with a single PRF seed at a time, but \mathcal{L}_{\text {prg-real }} deals with many PRG seeds at a time.

To address this, we will have to apply the security of F (i.e., replace \mathcal{L}_{\text {prf-real }} with \mathcal{L}_{\text {prf-rand }} ) many times during the proof - in fact, once for every call to QUERY made by the calling program. Previous security proofs had a fixed number of hybrid steps (e.g., the proof of Claim 5.5 used 7 hybrid libraries to show \mathcal{L}_{\text {prg-real }} \approx \mathcal{L}_{\text {hyb-1 }} \approx \cdots \approx \mathcal{L}_{\text {hyb-7 }} \approx \mathcal{L}_{\text {prg-rand }} ). This proof will have a variable number of hybrids that depends on the calling program. Specifically, we will prove

\mathcal{L}_{\text {prg-real }}^{G} \approx \mathcal{L}_{\text {hyb-1 }} \approx \ldots \approx \mathcal{L}_{\text {hyb- } q} \approx \mathcal{L}_{\text {prg-rand }}^{G}

where q is the number of times the calling program calls QUERY.

Don't be overwhelmed by all these hybrids. They all follow a simple pattern. In fact, the ith hybrid looks like this:

Don't be overwhelmed by all these hybrids. They all follow a simple pattern. In fact, the i th hybrid looks like this:

In other words, the hybrid libraries all differ in the value i that is inserted into the code above. If you're familiar with C compilers, think of this as adding "#define i 427 " to the top of the code above, to obtain \mathcal{L}_{\text {hyb-427 }}

First note what happens for extreme choices of i :

  • In \mathcal{L}_{\text {hyb- } 0}, the if-branch is never taken (count \leqslant 0 is never true). This library behaves exactly like \mathcal{L}_{\mathrm{prg} \text {-real }}^{G} by giving PRG outputs on every call to QUERY.
  • If q is the total number of times that the calling program calls QUERY, then in \mathcal{L}_{\text {hyb- } q \text {, }}, the if-branch is always taken (count \leqslant q is always true). This library behaves exactly like \mathcal{L}_{\text {prg-rand }}^{G} by giving truly uniform output on every call to QUERY.

In general, \mathcal{L}_{\mathrm{hyb}-i} will respond to the first i calls to QUERY by giving truly random output. It will respond to all further calls by giving outputs of our PRG construction.

We have argued that \mathcal{L}_{\text {prg-real }}^{G} \equiv \mathcal{L}_{\text {hyb-0 }} and \mathcal{L}_{\text {prg-rand }}^{G} \equiv \mathcal{L}_{\text {hyb-q }}. To complete the proof, we must show that \mathcal{L}_{\mathrm{hyb}-(i-1)} \approx \mathcal{L}_{\mathrm{hyb}-i} for all i. The main reason for going to all this trouble of defining so many hybrid libraries is that \mathcal{L}_{\text {hyb- }(i-1)} and \mathcal{L}_{\text {hyb- } i} are completely identical except in how they respond to the i th call to QUERY. This difference involves a single call to the PRG (and hence a single PRF seed), which allows us to apply the security of the PRF.

In more detail, let i be arbitrary, and consider the following sequence of steps starting with \mathcal{L}_{\text {hyb-( }(i-1)}:

We have taken \mathcal{L}_{\text {hyb- }(i-1)} and simply expanded the else-branch (count \geqslant i ) into two subcases ( count =i and count >i ). However, both cases lead to the same block of code (apart from a change to a local variable's name), so the change has no effect on the calling program.

In more detail, let i be arbitrary, and consider the following sequence of steps starting with Lhyb-( (i−1):

We have factored out the calls to F that use seed s^{*} (corresponding to the count =i case) in terms of \mathcal{L}_{\text {prf-real. This }} change no effect on the calling program.

We have factored out the calls to F that use seed s∗ (corresponding to the count =i case) in terms of Lprf-real. This  change

From the fact that F is a secure PFR, we can replace \mathcal{L}^{F}_{\text {prf-real}} with \mathcal{L}^{F}_{\text {prf-rand}}, and the overall change is indistinguishable.

From the fact that F is a secure PFR, we can replace LFprf-real with LFprf-rand, and the overall change is indistinguishable.

Since count =i happens only once, only two calls to LOOKUP will be made across the entire lifetime of the library, and they are on distinct inputs. Therefore, the if-branch in LOOKUP will always be taken, and T is never needed (it is only needed to "remember" values and give the same answer when the same x is used twice as argument to LOOKUP). Simplifying the library therefore has no effect on the calling program:

Since count =i happens only once, only two calls to LOOKUP will be made across the entire lifetime of the library, and they a

Inlining the subroutine has no effect on the calling program. The resulting library responds with uniformly random output to the first i calls to QUERY, and responds with outputs of our PRG G to the others. Hence, this library has identical behavior to \mathcal{L}_{\text {hyb-i }}

Inlining the subroutine has no effect on the calling program. The resulting library responds with uniformly random output to

We showed that \mathcal{L}_{\mathrm{hyb}-(i-1)} \approx \mathcal{L}_{\mathrm{hyb}-i}, and therefore: \mathcal{L}_{\text {prg-real }}^{G} \equiv \mathcal{L}_{\text {hyb-0 }} \approx \mathcal{L}_{\text {hyb-1 }} \approx \ldots \approx \mathcal{L}_{\text {hyb-q }} \equiv \mathcal{L}_{\text {prg-rand }}^{G} This shows that \mathcal{L}_{\mathrm{prg}-\mathrm{real}}^{G} \approx \mathcal{L}_{\mathrm{prg} \text {-rand }}^{G}, so G is a secure \mathrm{PRG}


\star A Theoretical Construction of a PRF from a PRG

We have already seen that it is possible to feed the output of a PRG back into the PRG again, to extend its stretch (Claim 5.7). This is done by making a long chain (like a linked list) of PRGs. The trick to constructing a PRF from a PRG is to chain PRGs together in a binary tree. The leaves of the tree correspond to final outputs of the PRF. If we want a PRF with an exponentially large domain (e.g., in =\lambda), the binary tree itself is exponentially large! However, it is still possible to compute any individual leaf efficiently by simply traversing the tree from root to leaf. This tree traversal itself is the PRF algorithm. This construction of a PRF is due to Goldreich, Goldwasser, and Micali, in the paper that defined the concept of a PRF.

A Theoretical Construction of a PRF from a PRG

Imagine a complete binary tree of height in (in will be the input length of the PRF). Every node in this tree has a position which can be written as a binary string. Think of a node's position as the directions to get there starting at the root, where a 0 means "go left" and 1 means "go right." For example, the root has position \epsilon (the empty string), the right child of the root has position 1 , etc.

The PRF construction works by assigning a label to every node in the tree, using the a length-doubling PRG G:\{0,1\}^{\lambda} \rightarrow\{0,1\}^{2 \lambda}. For convenience, we will write G_{L}(k) and G_{R}(k) to denote the first \lambda bits and last \lambda bits of G(k), respectively. Labels in the tree are \lambda-bit strings, computed according to the following two rules:

  1. The root node's label is the PRF seed.
  2. If the node at position p has label v, then its left child (at position p \| \theta ) gets label G_{L}(v), and its right child (at position p \| 1 ) gets label G_{R}(v).

In the picture above, a node's label is the string being sent on its incoming edge. The tree has 2^{\text {in }} leaves, whose positions are the strings \{0,1\}^{i n}. We define F(k, x) to be the label of node/leaf x. To compute this label, we can traverse the tree from root to leaf, taking left and right turns at each node according to the bits of x and computing the labels along that path according to the labeling rule. In the picture above, the highlighted path corresponds to the computation of F(k, 1001 \cdots).

It is important to remember that the binary tree is a useful conceptual tool, but it is exponentially large in general. Running the PRF on some input does not involve computing labels for the entire tree, only along a single path from root to leaf.


Construction 6.4
Construction 6.4

Claim 6.5

If G is a secure PRG, then Construction 6.4 is a secure PRF.

Proof

We prove the claim using a sequence of hybrids. The number of hybrids in this case depends on the input-length parameter in. The hybrids are defined as follows:

Proof

The hybrids differ only in their hard-coded value of d. We will show that

\mathcal{L}_{\text {prf-real }}^{F} \equiv \mathcal{L}_{\text {hyb-0 }} \approx \mathcal{L}_{\text {hyb-1 }} \approx \cdots \approx \mathcal{L}_{\text {hyb-in }} \equiv \mathcal{L}_{\text {prf-rand }}^{F}

We first start by understanding the behavior of \mathcal{L}_{\text {hyb- } d} for extreme choices of d. Simplifications to the code are shown on the right.

In \mathcal{L}_{\text {hyb-0 }}, we always have p=\epsilon, so the only entry of T that is accessed is T[\epsilon]. Then renaming T[\epsilon] to k, we see that \mathcal{L}_{\text {hyb- } 0} \equiv \mathcal{L}_{\text {prf-real }}^{F}. The only difference is when the PRF seed k(T[\epsilon]) is sampled: eagerly at initialization time in \mathcal{L}_{\text {prf-real }}^{F} vs. at the last possible minute in \mathcal{L}_{\text {hyb- } 0}.

In Lhyb-0 , we always have p=ϵ, so the only entry of T that is accessed is T[ϵ]. Then renaming T[ϵ] to k, we see that Lhyb- 0

In \mathcal{L}_{\text {hyb-in, we always have } p=} x and the body of the forloop is always unreachable. In that case, it is easy to see that \mathcal{L}_{\text {hyb-in }} has identical behavior to \mathcal{L}_{\text {prf-rand }}^{F}

In Lhyb-in, we always have p= x and the body of the forloop is always unreachable. In that case, it is easy to see that Lhyb-

The general pattern is that \mathcal{L}_{\text {hyb- } d} "chops off" the top d levels of the conceptual binary tree. When computing the output for some string x, we don't start traversing the tree from the root but rather d levels down the tree, at the node whose position is the d-bit prefix of x (called p in the library). We initialize the label of this node as a uniform value (unless it has already been defined), and then continue the traversal to the leaf x.

To finish the proof, we show that \mathcal{L}_{\text {hyb- }(d-1)} and \mathcal{L}_{\text {hyb- } d} are indistinguishable:

The library that is shown here is different from \mathcal{L}_{\text {hyb- }(d-1)} in the highlighted parts. However, these dif ferences have no effect on the calling program. The library here advances d-1 levels down the tree (to the node at location p ), initializes that node's label as a uniform value, then computes the labels for both its children, and finally continues computing labels toward the leaf. The only significant difference from \left.\mathcal{L}_{\mathrm{hyb}-(} d-1\right) is that it computes the labels of both of p 's children, even though only one is on the path to x. Since it computes the label correctly, though, it makes no difference when (or if) this extra label is computed.

The library that is shown here is different from Lhyb- (d−1) in the highlighted parts. However, these dif ferences have no ef

We have factored out the body of the if-statement in terms of \mathcal{L}_{\mathrm{prg}-\mathrm{real}}^{G} since it involves an call to G on uniform input. Importantly, the seed to G (called T[p] in the previous hybrid) was not used anywhere else - it was a string of length d-1 while the library only reads T\left[p^{\prime}\right] for p^{\prime} of length d. The change has no effect on the calling program.

We have factored out the body of the if-statement in terms of LGprg−real since it involves an call to G on uniform input. Imp

We have applied the security of G and replaced \mathcal{L}_{\text {prg-real with }} \mathcal{L}_{\text {prg-rand}}. The change is indistinguishable. Hence, F is a secure PRF.

We have applied the security of G and replaced Lprg-real with  Lprg-rand. The change is indistinguishable. Hence, F is a secu

We have inlined \mathcal{L}_{\text {prg-rand}} and split the sampling of 2\lambda bits into two separate statements sampling \lambda bits each. In this library, we advance d levels down the tree, assign a uniform label to a node (and its sibling), and then proceed to the leaf applying G as usual. The only difference between this library and \mathcal{L}_{\text {hyb-d}} is that we sample the label of a node that is not on our direct path. But since we sample it uniformly, it doesn't matter when (or if) that extra value is sampled. Hence, this library has identical behavior to \mathcal{L}_{\text {hyb-d}}.

We have inlined Lprg-rand and split the sampling of 2λ bits into two separate statements sampling λ bits each. In this librar

We showed that \mathcal{L}_{\text {hyb-}(d-1)}\approx\mathcal{L}_{\text {hyb-}d}. Putting everything together, we have:

\mathcal{L}_{\text {prf-real }}^{F} \equiv \mathcal{L}_{\text {hyb-0 }} \approx \mathcal{L}_{\text {hyb-1 }} \approx \cdots \approx \mathcal{L}_{\text {hyb-in }} \equiv \mathcal{L}_{\text {prf-rand }}^{F}

Hence, F is a secure PRF.

Block Ciphers (Pseudorandom Permutations)

After fixing the seed of a PRF, it computes a function from \{0,1\}^{\text {in }} to \{0,1\}^{\text {out }}. Let's consider the case where in = out. Some functions from \{0,1\}^{\text {in }} to \{0,1\}^{\text {out }} are invertible, which leads to the question of whether a PRF might realize such a function and be invertible (with knowledge of the seed). In other words, what if it were possible to determine x when given k and F(k, x) ? While this would be a convenient property, it is not guaranteed by the PRF security definition, even in the case of in = out. A function from \{0,1\}^{\text {in }} to \{0,1\}^{\text {out }} chosen at random is unlikely to have an inverse, therefore a PRF instantiated with a random key is unlikely to have an inverse.

A pseudorandom permutation (PRP) - also called a block cipher - is essentially a PRF that is guaranteed to be invertible for every choice of seed. We use both terms (PRP and block cipher) interchangeably. The term "permutation" refers to the fact that, for every k, the function F(k, \cdot) should be a permutation of \{0,1\}^{i n}. Instead of requiring a PRP to be indistinguishable from a randomly chosen function, we require it to be indistinguishable from a randomly chosen invertible function. { }^{2} This means we must modify one of the libraries from the PRF definition. Instead of populating the associative array T with uniformly random values, it chooses uniformly random but distinct values. As long as T gives distinct outputs on distinct inputs, it is consistent with some invertible function. The library guarantees distinctness by only sampling values that it has not previously assigned. Thinking of an associative array T as a key-value store, we use the notation T.values to denote the set of values stored in T.


Definition 6.6 (PRP syntax)

Let F:\{0,1\}^{\lambda} \times\{0,1\}^{\text {blen }} \rightarrow\{0,1\}^{\text {blen }} be a deterministic function. We refer to blen as the blocklength of F and any element of \{0,1\}^{\text {blen }} as a block.

We call F a secure pseudorandom permutation (PRP) (block cipher) if the following two conditions hold:

1. (Invertible given k ) There is a function F^{-1}:\{0,1\}^{\lambda} \times\{0,1\}^{\text {blen }} \rightarrow\{0,1\}^{\text {blen }} satisfying F^{-1}(k, F(k, x))=x, for all k \in\{0,1\}^{\lambda} and all x \in\{0,1\}^{\text {blen }}

2. (Security) \mathcal{L}_{\text {prp-real }}^{F} \approx \mathcal{L}_{\text {prp-rand }}^{F}, where:

(Security) LFprp-real ≈LFprp-rand , where:

"T. values" refers to the set \left \{ v\,\mid \,\exists x:T\left [ x \right ] =v \right \}.

The changes from the PRF definition are highlighted in yellow. In particular, the \mathcal{L}_{\text {prp-real and }} \mathcal{L}_{\text {prf-real }} libraries are identical.


Discussion, Pitfalls

In the definition, both the functions F and F^{-1} take the seed k as input. Therefore, only someone with k can invert the block cipher. Think back to the definition of a PRF without the seed k, it is hard to compute F(k, x). A block cipher has a forward and reverse direction, and computing either of them is hard without k!