The Elegance of the Feistel Cipher
Site: | Saylor Academy |
Course: | CS260: Introduction to Cryptography and Network Security |
Book: | The Elegance of the Feistel Cipher |
Printed by: | Guest user |
Date: | Monday, 28 April 2025, 5:55 AM |
Description

Pseudorandom Functions & Block Ciphers
Read this section to see how the Feistel cipher works with pseudorandom functions. One important thing to note is that no matter what trick we use inside a round, we can always undo it. And when you are trying to decode a message, the steps taken are like flipping a mirror image of what was done to encode it. This makes it easy to build simple hardware systems to do this coding and decoding.
Pseudorandom Functions & Block Ciphers Imagine if Alice & Bob had an infinite amount of shared randomness - not just a short key. They could split it up into λ-bit chunks and use each one as a one-time pad whenever they want to send an encrypted message of length λ.
Alice could encrypt by saying, "hey Bob, this message is encrypted with one-time pad using chunk #674696273 as key". Bob could decrypt by looking up location #674696273 in his copy of the shared randomness. As long as Alice doesn't repeat a key/chunk, an eavesdropper (who doesn't have the shared randomness) would learn nothing about the encrypted messages. Although Alice announces (publicly) which location/chunk was used as each one-time pad key, that information doesn't help the attacker know the value at that location.
It is silly to imagine an infinite amount of shared randomness. However, an expo- nential amount of something is often just as good as an infinite amount. A shared table containing "only" 2λ one-time pad keys would be quite useful for encrypting as many messages as you could ever need.
A pseudorandom function (PRF) is a tool that allows Alice & Bob to achieve the effect of such an exponentially large table of shared randomness in practice. In this chapter we will explore PRFs and their properties. In a later chapter, after introducing new security definitions for encryption, we will see that PRFs can be used to securely encrypt many messages under the same key, following the main idea illustrated above.
Source: Mike Rosulek, https://joyofcryptography.com/pdf/chap6.pdf This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.
Definition
Continuing our example, imagine a huge table of shared data stored as an array , so the
th item is referenced as
. Instead of thinking of
as an integer, we can also think of
as a binary string. If the array has
items, then
will be an in-bit string. If the array contains strings of length "out", then the notation
is like a function that takes an input from
and gives an output from
.
A pseudorandom function emulates the functionality of a huge array. It is a function that takes an input from
and gives an output from
. However,
also takes an additional argument called the seed, which acts as a kind of secret key.
The goal of a pseudorandom function is to "look like" a uniformly chosen array / lookup table. Such an array can be accessed through the LOOKUP subroutine of the following library:
As you can see, this library initially fills up the array with uniformly random data, and then allows the calling program to access any position in the array.
A pseudorandom function should produce indistinguishable behavior, when it is used with a uniformly chosen seed. More formally, the following library should be indistinguishable from the one above:
Note that the first library samples out bits uniformly at random (out bits for each of
entries in the table), while the second library samples only
bits (the same
is used for all invocations of
). Still, we are asking for the two libraries to be indistinguishable.
This is basically the definition of a PRF, with one technical caveat. We want to allow situations like in , but in those cases the first library runs in exponential time. It is generally convenient to build our security definitions with libraries that run in polynomial time. We fix this by taking advantage of the fact that, no matter how big the table
is meant to be, a polynomial-time calling program will only access a polynomial amount of it. In some sense it is "overkill" to actually populate the entire table
upfront. Instead, we can populate
in a lazy / on-demand way.
initially starts uninitialized, and its values are only assigned as the calling program requests them. This changes when each
is sampled (if at all), but does not change how it is sampled (i.e., uniformly & independently). This also changes
from being a typical array to being an associative array ("hash table" or "dictionary" data structure), since it only maps a subset of
to values in
.
Definition 6.1 (PRF security)
Let be a deterministic function. We say that
is a secure pseudorandom function (PRF) if
, where:
Discussion, Pitfalls
The name "pseudorandom function" comes from the perspective of viewing not as an (associative) array, but as a function
. There are
functions for
(an incredibly large number), and
chooses a "random function" by uniformly sampling its truth table as needed.
For each possible seed , the residual function
is also a function from
. There are "only"
possible functions of this kind (one for each choice of
), and
chooses one of these functions randomly. In both cases, the libraries give the calling program input/output access to the function that was chosen. You can think of this in terms of the picture from Section 5.1, but instead of strings, the objects are functions.
How NOT to Build a PRF
Example
Suppose we have a length-doubling and try to use it to construct a PRF F as follows:
You might notice that all we have done is rename the encryption algorithm of "pseudo-OTP" (Construction 5.2). We have previously argued that this algorithm is a secure method for onetime encryption, and that the resulting ciphertexts are pseudorandom. Is this enough for a secure PRF? No, we can attack the security of this PRF.
Attacking means designing distinguisher that behaves as differently as possible in the presence of the two
libraries. We want to show that
is insecure even if
is an excellent PRG. We should not try to base our attack on distinguishing outputs of
from random. Instead, we must try to break the inappropriate way that
is used to construct a PRF.
The distinguisher must use the interface of the libraries
.e., make some calls to the LOOKUP subroutine and output 0 or 1 based on the answers it gets. The LOOKUP subroutine takes an argument, so the distinguisher has to choose which arguments to use.
One observation we can make is that if a calling program sees only one value of the form , it will look pseudorandom. This is essentially what we showed in Section 5.3. So we should be looking for a calling program that makes more than one call to LOOKUP.
If we make two calls to LOOKUP - say, on inputs and
the responses from
will be
and
. To be a secure PRF, these responses must look independent and uniform. Do they? They actually have a pattern that the calling program can notice: their XOR is always
, a value that is already known to the calling program.
We can condense all of our observations into the following distinguisher:
Let's compute its advantage in distinguishing from
by considering
's behavior when linked to these two libraries:
When is linked to
, the library will choose a key
. Then
is set to
and
is set to
. So
is always equal to
, and
always outputs 1. That is,
When is linked to
, the responses of the two calls to LOOKUP will be chosen uniformly and independently because LOOKUP is being called on distinct inputs. Consider the moment in time when the second call to LOOKUP is about to happen. At that point,
, and
have all been determined, while
is about to be chosen uniformly by the library. Using the properties of XOR, we see that
will output 1 if and only if
is chosen to be exactly the value
. This happens only with probability
. That is,
The advantage of is therefore
which is certainly non-negligible since it doesn't even approach 0. This shows that
is not a secure PRF.
At a more philosophical level, we wanted to identify exactly how is being used in an inappropriate way. The PRG security libraries guarantee security when
's seed is chosen freshly for each call to
. This construction of
violates that rule and allows the same seed to be used twice in different calls to
, where the results are supposed to look independent.
PRFs vs PRGs; Variable-Hybrid Proofs
In this section we show that a PRG can be used to construct a PRF, and vice-versa. The construction of a PRG from PRF is practical, and is one of the more common ways to obtain a PRG in practice. The construction of a PRF from PRG is more of theoretical interest and does not reflect how PRFs are designed in practice.
Constructing a PRG from a PRF
As promised, a PRF can be used to construct a PRG. The construction is quite natural. For simplicity, suppose we have a PRF i.e., in
out
. We can build a length-doubling PRG in the following way:
Construction 6.2 (Counter PRG)
There is nothing particularly special about the inputs and
to
. All that matters is that they are distinct. The construction can be extended to easily give more than 2 blocks of output, by treating the input to
as a simple counter (hence the name of this construction).
The guarantee of a PRF is that when its seed is chosen uniformly and it is invoked on distinct inputs, its outputs look independently uniform. In particular, its output on inputs and
are indistinguishable from uniform. Hence, concatenating them gives a string which is indistinguishable from a uniform
-bit string.
That really is all there is to the security of this construction, but unfortunately there is a slight technical issue which makes the security proof more complicated than you might guess. We will have to introduce a new technique of variable hybrids to cope with it.
Claim 6.3
If is a secure PRF, then the counter PRG construction
above is a secure PRG.
Proof
In order to prove that is a secure PRG, we must prove that the following libraries are indistinguishable:
During the proof, we are allowed to use the fact that is a secure PRF. That is, we can use the fact that the following two libraries are indistinguishable:
The inconvenience in the proof stems from a mismatch of the variable in
the
variable in
In
is local to the QUERY subroutine. Over the course of an execution,
will take on many values. Since
is used as the PRF seed, we must write the calls to
in terms of the LOOKUP subroutine of
. But in
the PRF seed is fixed for the entire execution. In other words, we can only use
to deal with a single PRF seed at a time, but
deals with many PRG seeds at a time.
To address this, we will have to apply the security of (i.e., replace
with
) many times during the proof - in fact, once for every call to QUERY made by the calling program. Previous security proofs had a fixed number of hybrid steps (e.g., the proof of Claim
used 7 hybrid libraries to show
). This proof will have a variable number of hybrids that depends on the calling program. Specifically, we will prove
where is the number of times the calling program calls QUERY.
Don't be overwhelmed by all these hybrids. They all follow a simple pattern. In fact, the th hybrid looks like this:
In other words, the hybrid libraries all differ in the value that is inserted into the code above. If you're familiar with C compilers, think of this as adding "#define i 427 " to the top of the code above, to obtain
First note what happens for extreme choices of :
- In
, the if-branch is never taken (count
is never true). This library behaves exactly like
by giving PRG outputs on every call to QUERY.
- If
is the total number of times that the calling program calls QUERY, then in
, the if-branch is always taken (count
is always true). This library behaves exactly like
by giving truly uniform output on every call to QUERY.
In general, will respond to the first
calls to QUERY by giving truly random output. It will respond to all further calls by giving outputs of our PRG construction.
We have argued that and
. To complete the proof, we must show that
for all
. The main reason for going to all this trouble of defining so many hybrid libraries is that
and
are completely identical except in how they respond to the
th call to QUERY. This difference involves a single call to the PRG (and hence a single PRF seed), which allows us to apply the security of the PRF.
In more detail, let be arbitrary, and consider the following sequence of steps starting with
:
We have taken
and simply expanded the else-branch
(count
) into two subcases
count
and count
). However, both cases lead to the same block of code (apart
from a change to a local variable's name), so the change has no effect
on the calling program.
We have factored out the calls
to that use seed
(corresponding to the count
case) in terms of
change no
effect on the calling program.
From the fact that is a
secure PFR, we can replace
with
, and the overall change is
indistinguishable.
Since count happens only once, only two calls to LOOKUP will be made across the entire lifetime of the library, and they are on distinct inputs. Therefore, the if-branch in LOOKUP will always be taken, and
is never needed (it is only needed to "remember" values and give the same answer when the same
is used twice as argument to LOOKUP). Simplifying the library therefore has no effect on the calling program:
Inlining the subroutine has no effect on the calling program. The resulting library responds with uniformly random output to the first calls to QUERY, and responds with outputs of our PRG
to the others. Hence, this library has identical behavior to
We showed that , and therefore:
This shows that
, so
is a secure
A Theoretical Construction of a PRF from a PRG
We have already seen that it is possible to feed the output of a PRG back into the PRG again, to extend its stretch (Claim 5.7). This is done by making a long chain (like a linked list) of PRGs. The trick to constructing a PRF from a PRG is to chain PRGs together in a binary tree. The leaves of the tree correspond to final outputs of the PRF. If we want a PRF with an exponentially large domain (e.g., in , the binary tree itself is exponentially large! However, it is still possible to compute any individual leaf efficiently by simply traversing the tree from root to leaf. This tree traversal itself is the PRF algorithm. This construction of a PRF is due to Goldreich, Goldwasser, and Micali, in the paper that defined the concept of a PRF.
Imagine a complete binary tree of height in (in will be the input length of the PRF). Every node in this tree has a position which can be written as a binary string. Think of a node's position as the directions to get there starting at the root, where a 0 means "go left" and 1 means "go right." For example, the root has position (the empty string), the right child of the root has position 1 , etc.
The PRF construction works by assigning a label to every node in the tree, using the a length-doubling PRG . For convenience, we will write
and
to denote the first
bits and last
bits of
, respectively. Labels in the tree are
-bit strings, computed according to the following two rules:
- The root node's label is the PRF seed.
- If the node at position
has label
, then its left child (at position
) gets label
, and its right child (at position
) gets label
.
In the picture above, a node's label is the string being sent on its incoming edge. The tree has leaves, whose positions are the strings
. We define
to be the label of node/leaf
. To compute this label, we can traverse the tree from root to leaf, taking left and right turns at each node according to the bits of
and computing the labels along that path according to the labeling rule. In the picture above, the highlighted path corresponds to the computation of
.
It is important to remember that the binary tree is a useful conceptual tool, but it is exponentially large in general. Running the PRF on some input does not involve computing labels for the entire tree, only along a single path from root to leaf.
Construction 6.4

Claim 6.5
If is a secure PRG, then Construction
is a secure PRF.
Proof
We prove the claim using a sequence of hybrids. The number of hybrids in this case depends on the input-length parameter in. The hybrids are defined as follows:
The hybrids differ only in their hard-coded value of . We will show that
We first start by understanding the behavior of for extreme choices of
. Simplifications to the code are shown on the right.
In , we always have
, so the only entry of
that is accessed is
. Then renaming
to
, we see that
. The only difference is when the PRF seed
is sampled: eagerly at initialization time in
vs. at the last possible minute in
.
In
and the body of the forloop is always unreachable. In that case, it is easy to see that
has identical behavior to
The general pattern is that "chops off" the top
levels of the conceptual binary tree. When computing the output for some string
, we don't start traversing the tree from the root but rather
levels down the tree, at the node whose position is the
-bit prefix of
(called
in the library). We initialize the label of this node as a uniform value (unless it has already been defined), and then continue the traversal to the leaf
.
To finish the proof, we show that and
are indistinguishable:
The library that is shown here is different from in the highlighted parts. However, these dif ferences have no effect on the calling program. The library here advances
levels down the tree (to the node at location
), initializes that node's label as a uniform value, then computes the labels for both its children, and finally continues computing labels toward the leaf. The only significant difference from
is that it computes the labels of both of
's children, even though only one is on the path to
. Since it computes the label correctly, though, it makes no difference when (or if) this extra label is computed.
We have factored out the body of the if-statement in terms of since it involves an call to
on uniform input. Importantly, the seed to
(called
in the previous hybrid) was not used anywhere else - it was a string of length
while the library only reads
for
of length
. The change has no effect on the calling program.
We have applied the security of and replaced
. The change is indistinguishable. Hence,
is a secure PRF.
We have inlined and split the sampling of
bits into two separate statements sampling
bits each. In this library, we advance
levels down the tree, assign a uniform label to a node (and its sibling), and then proceed to the leaf applying
as usual. The only difference between this library and
is that we sample the label of a node that is not on our direct path. But since we sample it uniformly, it doesn't matter when (or if) that extra value is sampled. Hence, this library has identical behavior to
.
Block Ciphers (Pseudorandom Permutations)
After fixing the seed of a PRF, it computes a function from to
. Let's consider the case where in
out. Some functions from
to
are invertible, which leads to the question of whether a PRF might realize such a function and be invertible (with knowledge of the seed). In other words, what if it were possible to determine
when given
and
? While this would be a convenient property, it is not guaranteed by the PRF security definition, even in the case of in
out. A function from
to
chosen at random is unlikely to have an inverse, therefore a PRF instantiated with a random key is unlikely to have an inverse.
A pseudorandom permutation (PRP) - also called a block cipher - is essentially a PRF that is guaranteed to be invertible for every choice of seed. We use both terms (PRP and block cipher) interchangeably. The term "permutation" refers to the fact that, for every , the function
should be a permutation of
. Instead of requiring a PRP to be indistinguishable from a randomly chosen function, we require it to be indistinguishable from a randomly chosen invertible function.
This means we must modify one of the libraries from the PRF definition. Instead of populating the associative array
with uniformly random values, it chooses uniformly random but distinct values. As long as
gives distinct outputs on distinct inputs, it is consistent with some invertible function. The library guarantees distinctness by only sampling values that it has not previously assigned. Thinking of an associative array
as a key-value store, we use the notation
.values to denote the set of values stored in
.
Definition 6.6 (PRP syntax)
Let be a deterministic function. We refer to blen as the blocklength of
and any element of
as a block.
We call a secure pseudorandom permutation (PRP) (block cipher) if the following two conditions hold:
1. (Invertible given ) There is a function
satisfying
for all
and all
". values" refers to the set
.