Use this section to extend your understanding of hash function design. This section specifically addresses how to take a useful compression function and iteratively apply it in a way that can result in a collision-resistant hash.
Building a hash function, especially one that accepts inputs of arbitrary length, seems like a challenging task. In this section, we'll see one approach for constructing hash functions, called the Merkle-Damgård construction.
Instead of a full-fledged hash function, imagine that we had a collision-resistant function whose inputs were of a single fixed length, but longer than its outputs. In other words,
, where
. We call such an h a compression function. This is
not compression in the usual sense of the word - we are not concerned about recovering the input from the output. We call it a compression function because it "compresses" its
input by t bits (analogous to how a pseudorandom generator "stretches" its input by some
amount).
The following construction is one way to build a full-fledged hash function (supporting inputs of arbitrary length) out of such a compression function:
Construction 11.2 (Merkle-Damgård)
Let be a compression function. Then the Merkle-Damgård transformation of
is
, where:
The idea of the Merkle-Damgård construction is to split the input x into blocks of size
. The end of the string is filled out with 0s if necessary. A final block called the "padding
block" is added, which encodes the (original) length of
in binary.
Example
Suppose we have a compression function , so that
. We build
a Merkle-Damgård hash function out of this compression function and wish to compute the
hash of the following 5-byte (40-bit) string:
x = 01100011 11001101 01000011 10010111 01010000
We must first pad x appropriately (MDPAD(X)):
- Since x is not a multiple of
bits, we need to add 8 bits to make it so.
- Since
, we need to add an extra 16-bit block that encodes the number 40 in binary (101000).
After this padding, and splitting the result into blocks of length 16, we have the following:
The final hash of x is computed as follows:
We are presenting a simplified version, in which accepts inputs whose maxi-
mum length is
bits (the length of the input must fit into t bits). By using multiple
padding blocks (when necessary) and a suitable encoding of the original string length, the
construction can be made to accommodate inputs of arbitrary length (see the exercises).
The value y0 is called the initialization vector (IV), and it is a hard-coded part of the algorithm.
As discussed above, we will not be making provable security claims using the library- style definitions. However, we can justify the Merkle-Damgård construction with the following claim:
Claim 11.3
Suppose is a compression function and
is the Merkle-Damgård construction applied
to
. Given a collision
in
, it is easy to find a collision in
. In other words, if it is
hard to find a collision in
, then it must also be hard to find a collision in
.
Proof
Suppose that are a collision under
. Define the values
and
as in the computation of
. Similarly, define
and
as in the computation of
. Note that, in general,
may not equal
.
Recall that:
Since we are assuming , we have
. We consider two cases:
Case 1: If , then the padding blocks
and
which encode |x| and |x'| are
not equal. Hence we have
, so
and
are a collision
under h and we are done.
Case 2: If , then x and x' are broken into the same number of blocks, so
.
Let us work backwards from the final step in the computations of
and
.
We know that:
=
If and
are not equal, then they are a collision under h and we are done.
Otherwise, we can apply the same logic again to
and
, which are equal by our as-
sumption.
More generally, if , then either
and
are a collision under h (and
we say we are "lucky"), or else
(and we say we are "unlucky"). We start with the premise that
. Can we ever get "unlucky" every time, and not encounter a collision
when propagating this logic back through the computations of
and
? The
answer is no, because encountering the unlucky case every time would imply that
for all
. That is,
. But this contradicts our original assumption that
. Hence
we must encounter some "lucky" case and therefore a collision in
.
Source: Mike Rosulek, https://joyofcryptography.com/pdf/chap11.pdf This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.