CS260 Study Guide
Unit 2: Hash Functions
2a. Define the meaning of a one-way function
- What is a one-way function?
- How are one-way functions useful for constructing hash functions?
- What is an inverse image?
- What is preimage resistance?
A function over some domain takes input values where, for every input value x, there is one and only one output value y. The inverse image of a value
is the set of values x that can be recovered from y. Therefore, every output value y of a function always has an inverse image associated with it. A one-way function is a function where it is relatively easy to compute
but computationally infeasible to compute the inverse image. Hash functions are deterministic one-way functions that convert variable length messages to fixed length codes. Such constructions are useful for designing hash functions because the goal is to create a fixed-length code from a variable-length message where the code:
- is easy to compute and
- cannot be used to determine the original message.
The one-way property of hash functions is often also referred to as the preimage resistance property.
To review, see:
2b. Explain collision resistance
- What is collision resistance?
- How does the pigeonhole principle apply to hash functions?
- What is the difference between collision resistance and second preimage resistance?
- What is the avalanche effect?
The goal of a hash function is to take variable-length messages and map them into fixed-length codes. In practical applications, most messages will be much longer than the fixed-length code. The pigeonhole principle automatically implies that two different messages will inevitably be mapped into the same code (that is, a "collision" will occur). Collision resistance is where a hash function must be resistant to finding such collisions.
Second preimage resistance is defined as follows: Given a preimage x and its hash value , it should be computationally infeasible to find a second preimage y such that
where yx. Second preimage resistance is a weaker form of collision resistance. A stronger statement is to say that it is computationally infeasible to find any pair (x,y) where
, which is the definition of collision resistance. The avalanche effect, also an important concept for hash functions, is where a small change in the message results in a large change in the hash value (this property is sometimes referred to as "near-collision resistance").
To review, see:
2c. Explain how hash functions are applied in practice
- What is a blockchain?
- How are hash functions applied in practice?
- What is "proof of work"?
A blockchain is a digital ledger that keeps track of transactions by successively adding consecutive records in the form of a "chain". Hash functions are central to the construction of a blockchain because a hash of a previous transaction record (a "block") is integrated into the next transaction record. Using this construction, if any block in the blockchain is altered, all subsequent blocks would have to be altered for subsequent hashes to encompass the alteration. However, since many copies of the blockchain exist, this implies that at least 51 percent of all blockchain copies would also have to be altered to give the impression that the modified transaction was valid. The security of a blockchain lies in the computational infeasibility of modifying the hashes and then applying those modifications to 51 percent of all blockchains. The first generation of blockchains applied proof of work to determine the hash of the current transaction combined with the hash of the previous block. As should be clear from the material in this unit, the Secure Hash Algorithm (SHA) is a standard family of hash functions applied in many applications. For example, in the Bitcoin blockchain, a miner is rewarded bitcoins for being the first to find the SHA-256 hash containing a nonce at the beginning of the hash. Finding a valid hash containing a nonce is known as proof of work. Make sure you understand the blockchain concept and how hash functions are applied in this context.
Another important application of hash functions you must recall is user authentication, where a user's information is used to authenticate the user (for example, a password). It is safer for a system to store the hash of a password rather than the password itself. Given the computational difficulty of inverting a hash, recovering a password from the stored hash should be difficult. When a user attempts a login, the hash of the input password can be compared against the stored hash for authentication.
To review, see:
2d. Describe the Merkle-Damgard construction
- How can good hash functions be constructed?
- What is a compression function?
- What are some features of the Merkle-Damgard construction?
Hash functions are one-way functions, so they must be easily computed. At the same time, they must be collision-resistant. One approach to achieving these goals is by iterating a simple compression function a specified number of times. A compression function is a function that outputs a smaller fixed-length hash from a larger input sequence.
The Merkle-Damgard construction is one where a compression function is applied iteratively by creating a hash value from the concatenation of an input block with the hash value of the previous internal unit. Since the original input message is broken up into blocks of fixed length, the final input block is not guaranteed to have the correct length. Therefore, the input message may need to be padded with zeros to achieve the specified length. You should make sure to understand the Merkle-Damgard construction and how the compression converts larger sequences into shorter fixed-length sequences.
To review, see:
2e. Compose programs to implement Secure Hash Algorithm (SHA)
- How can Python be used to implement hash functions?
- What hash functions are included in hashlib?
- What is the Python syntax for computing SHA hashes?
Python modules exist that are useful for cryptographic applications. The hashlib
module is of particular interest as it can be used to implement a variety of Secure Hash Algorithms (SHA), such as SHA256, SHA384, and SHA512. The syntax for setting up hash computation such as SHA256 looks like:
from hashlib import sha256
It is important to understand how to use the method calls within hashlib to compute a given hash. Additionally, it is important to understand how to write code to exhibit the properties of hash functions. For instance, you should be able to write code that demonstrates the avalanche effect or write code that would test for a collision. In this way, you can use the theory of how hash functions work to create practical implementations.
To review, see:
Unit 2 Vocabulary
This vocabulary list includes terms you will need to know to successfully complete the final exam.
- avalanche effect
- blockchain
- collision resistance
- compression function
- hash function
- inverse image
- Merkle-Damgard construction
- one-way function
- preimage resistance
- proof of work
- second preimage resistance
- Secure Hash Algorithm (SHA)
- user authentication