CS260: Approaches to ZKPs | Saylor Academy

This article begins with a simple description and then gradually works through ZK applications. This reading will guide you through various approaches to ZKPs.

Abstract

Zero-Knowledge proofs are a cryptographic technique to reveal knowledge of information without revealing the information itself, thus enabling systems optimally to mix privacy and transparency, and, where needed, regulatability. Application domains include health and other enterprise data, financial systems such as central-bank digital currencies, and performance enhancement in blockchain systems. The challenge of zero-knowledge proofs is that, although they are computationally easy to verify, they are computationally hard to produce. This paper examines the scalability limits of leading zero-knowledge algorithms and addresses the use of parallel architectures to meet performance demands of applications.

Introduction

The cryptograhic theory of zero-knowledge(ZK) arose in the 1980s and led to a 2012 Turing Award for Goldwasser and Micali. ZK-proofs employ the mathematics of cryptography to enable one party (P, the prover) to prove to another party (V, the verifier) the P knows some particular secret (a fact, data item, etc.) without P revealing that secret to V. Practical application of this concept grew rapidly with the rise of blockchain technology, initially for privacy-focused blockchains such as Zcash, and several others that followed. A more recent application is blockchain speedup using ZK techniques to aggregate a large number of transactions into one, a process called a ZK-rollup. A third application domain is to apply ZK and blockchain technology to large reports and databases so that applications can commit cryptographically to data that are kept secret while proving that those data have certain properties, such as compliance with regulatory constraints or being internally consistent with some other set of data to which a cryptographic commitment exists.

That third application domain opens a powerful new concept of information management in which privacy, transparency, and compliance can be achieved with minimal compromise. We elaborate on this in Section 2. These applications generate a need to produce ZK-proofs pertaining to large-scale executions over large data volumes, which presents a challenge since although ZK-proofs are relatively easy to verify, they are computationally hard to generate. This demand for scalability has been addressed in part by new algorithms, use of specialized hardware, and use of commodity parallel architectures, but significant limitations remain.

After exploring the application domain in Section 2, and providing a fuller introduction to ZK in Section 3, we overview ZK proving systems in Section 4 followed by our benchmarking results and analysis in Section 5. Section 6 discusses the use of commodity parallel architectures to accelerate ZK-proofs so that they can scale up to the level of emerging applications.

Application Domain

Traditional databases serve as an official record of enterprise information, but the degree to which they are truly "official" is based on trust of the owner/administrator of the database. Some commercial database vendors have incorporated hash-protected, immutable data in append-only tables. Here, immutability and privacy still rests in part on some degree of centralized trust. Such trust is often reasonable in enterprise settings (thus the frequent use of permissioned blockchains), but becomes problematic for public data stores (such as public blockchains), multi-enterprise data (where a trust/contract-based business relationship exists, but participants fear that future circumstances like a product recall might stress that trust to the breaking point), and global digital-currency applications where users may not trust the currency provider with details of their transaction data.

For these reasons, even blockchain-enhanced versions of traditional database platforms may require too much trust. Privacy is often of paramount concern. In business relationships, certain data may be considered proprietary (e.g., pricing in a supply chain) and must be disclosed selectively. Public-facing systems face stricter privacy challenges that go beyond user preferences to include mandates by governments. Privacy in a public setting is often viewed as a fundamental right with individuals reasonably valuing a right to privacy even if they "have nothing to hide".

Applications of ZK technology can be divided into three classes:

The first, and oldest, is private transactions within a public blockchain. Whereas all transactions on a public blockchain like Ethereum or Bitcoin are public, chains such as Zcash enable the option of privacy. In public blockchains, if person A knows person B's wallet address, then A knows B's entire financial history on-chain. ZK-based blockchains allow for secret transactions, thus enhancing privacy. Such chains have raised concerns among government regulators, but it is possible to design such private chains with ZK-based compliance tools to alleviate those concerns.
The second relates to blockchain speedup, where ZK techniques can be used to aggregate a large number of transactions into one, a process called a ZK-rollup. Aside from the single node performing aggregation, all other nodes then need only validate a single proof of that aggregation. This results in a significant performance enhancement since that one validation is less costly than re-executing full set of transactions. There are other ZK applications in blockchain infrastructure including state-proofs for light clients and cross-chain bridges, and proof-of-storage in decentralized storage services.
The third is the ability to provide proofs of general facts about secret information. Those proofs can then be verified at low computational cost by any node. Here, information can be kept secret, but desired metadata can be securely published. This is a powerful tool that can be applied in identity proofs, accounting systems, health records, regulatory compliance in financial applications, and global-scale central-bank digital currencies.

The range of real-world uses for this third class of application is virtually limitless. We give three examples here:

A simple application (to introduce the concept) is proof-ofage for bars and other age-controlled venues without a need to reveal exact date-of-birth, drivers' license number, etc.
A more complex application allows proof that corporate tax returns, government-mandated regulatory reports, etc., all come from the same set of accounting records. Thus, only the base ledger need be audited in the traditional manner, while all reports arising from that ledger can be audited via ZK and related technologies, including Merkle proofs. A similar scenario exists around health records.
Central-bank digital currencies (CBDCs) are gaining interest globally. China has begun deployment of its centrallycontrolled digital e-CNY. Open societies such as the U.S. would most certainly seek a much less centralized framework, but full decentralization would raise risks regarding regulation and control of monetary policy. Nearly all of the G20 nations are taking some action in this regard. Prototypes are being developed in industry and government-led partnerships like Project Hamilton. For digital currencies, ZK-proofs make it possible to have transactions with the same privacy as cash, while also proving compliance with regulations such as sanctions. They also serve as a means for a stablecoin issuer to provide proof of reserves and compliance with other regulatory requirements, while preserving privacy regarding the details of those reserves.

The second and third examples result in very large ZK-proofs and large volumes of medium-size ZK-proofs, respectively. The feasibility of these applications depends, respectively, upon cost-effective frameworks to generate large proofs and resource-efficient frameworks that allow high parallelism in proof generation.

With those needs in mind, we focus not on special-purpose hardware as in, but rather on high-performance commodity CPUs and GPUs. Our application-based motivation also guides our choice of algorithms, a matter we discuss in more detail in Section 4.

A Brief Introduction to Zero Knowledge

As we noted in the introduction, ZK-proofs employ the mathematics of cryptography to enable one party (P, prover) to prove to another party (V, verifier) that P knows some particular secret (a fact, data item, etc.) without P revealing that secret to V. At that point, we relied on the intuitive idea of proving "some particular secret," but indeed ZK can be used to show P knows the input (a witness) such that a given program generates a particular output. That given program must be compiled into a circuit representing its execution. A circuit is a computational structure of gates with inputs and outputs and wires connecting the output of a gate to the input of others. These circuits are then transformed into polynomials that form the basis for the proving system. The most intuitive view of a ZK proving system is one in which the verifier interacts with the prover, presenting a series of challenges that, when met by the prover, provides the verifier confidence in the prover that increases with each iteration. Interactive proofs, however, are not only slow, but also fail to provide a means for public verification. Removing the interaction adds complexity both in terms of the proof-generation and verification time (which depends on the proof size). To keep verification time constant, both the prover and verifier must maintain secrets that are not part of the published proof. Those secrets depend on an initial trusted setup of the proof system. The widely used, but older, Groth16 system requires a new setup for each circuit. The newer PlonK system requires just one setup (universal trusted setup) that can then be reused across any set of circuits. Others in that category include Marlin and Sonic.

The setup procedure is an elaborate process involving multiple independent parties selecting a secret random value that they contribute to the "ceremony" and then destroy. The setup is designed such that as long as one trusts that at least one member in the setup has securely destroyed their contributed value, then the setup can be provably trusted.

ZK-Proving Systems

There are a wide variety of ZK proving systems. ZK-proof systems vary in:

Proof size: constant for the Groth16 algorithm and the PlonK variants we are studying, but other algorithms have proof sizes that depend on the circuit (program execution code) over which the proof is being computed.
Prover-secret size: Linear in circuit size for Groth16 algorithm and the PlonK variants, but constant for some others.
Verification time: Constant for the Groth16 algorithm and the PlonK variants, but usually logarithmic in circuit size for others (though some are linear).
Degree of setup required: While some algorithms require no setup, PlonK requires a single initial trusted setup, and Groth16 requires a separate setup for each circuit.
Operation set: Groth16 operates over executions expressed as circuits with addition and multiplication operations, as does PlonK. TurboPLONK reduces the number of gates in a circuit by allowing more operation types than just addition and multiplication. The added operations could be simple as exemplified by an exclusive OR operation(XOR), or highly complex, as exemplified by operations that perform arithmetic over elliptic curves (which are used in many cryptographic hash functions). Reducing the circuit size promises faster proofs since the prover must read the entire circuit.
Supplemental data structures: UltraPLONK, like TurboPLONK, allows operations beyond addition and multiplication. Beyond just computational operations, UltraPLONK includes a table-lookup operation that operates in the style of a keyvalue store. These tables are called plookup tables.

The tradeoffs among ZK proving systems are not as clear as the above list might suggest. Trading added memory requirements for improving performance may make sense given present-day memory capacity. But for sizable proofs, memory consumption is large enough not only to exceed the total available in the system, but also to inhibit effective parallelization of proof generation. An important first step in the creation of highly scalable proof-generation algorithms is to understand these tradeoffs.

In earlier work, our research group focused on the Groth16 algorithm and its execution on a modern GPU. While GPUs offer a high degree of parallelism, memory became a barrier at only 2²⁰constraints¹. The word "only" here might seem to be a strange characterization of 2²⁰ until one realizes that each gate represents a simple arithmetic operation and modern applications easily reach higher numbers of constraints that go well beyond 2³⁰. The applications we envisioned in Section 2 will result in further complexity growth.

Groth's constant verification time is attractive, but the per-circuit setup overhead makes it less desirable than the PlonK variants. For the scale of our applications, the one initial setup for PlonK is quite acceptable. The strongest factor in our focus on PlonK is the variants that use powerful techniques to reduce the circuit size and thus offer the promise of better overall performance. But since that promise comes at the price of additional memory consumption, the intuition that PlonK should perform best is not an obvious conclusion.

Source: Tal Derei, https://dl.acm.org/doi/pdf/10.1145/3595647.3595648
This work is licensed under a Creative Commons Attribution 4.0 License.

COURSE INTRODUCTION

Course Syllabus

Unit 1: Introduction to Cryptography

Unit 1 Learning Objectives

1.1: Introduction to Encryption and Computational Security

Key Concepts of Cryptography

How Cryptography Works

Key Terms for Network Security

1.2: Block Ciphers vs. Stream Ciphers

Overview

More on Stream Ciphers

1.3: Substitution Cipher

Basic Techniques

Python Exercise

1.4: Transposition Cipher

Basic Techniques

1.5: Random Numbers

The Basics of PRNGs

Python Example

1.6: Perfect Secrecy and the One-Time Pad

Perfect Secrecy a La Shannon

The Main Idea behind the One-Time Pad

More on the One-Time Pad

Unit 1 Assessment

Unit 1 Assessment

Unit 2: Hash Functions

Unit 2 Learning Objectives

2.1: What Is a Hash Function?

Overview

One Way Functions

Hash Functions

Python Exercise

2.2: Collision Resistance

The Importance of Collision Resistance

The Vulnerabilities of MD5

2.3: Merkle-Damgard Construction

Overview

Recipes for Hash Functions

2.4: SHA (Secure Hash Algorithm)

Overview

The Security of SHA

Python Exercise

2.5: Application: Blockchain

Introduction to Blockchain

Proof of Work

Python Exercise

Unit 2 Assessment

Unit 2 Assessment

Unit 3: Symmetric Encryption

Unit 3 Learning Objectives

3.1: Introduction to Symmetric Encryption

Key Concepts

3.2: Feistel Cipher

Combining Transposition and Substitution

The Elegance of the Feistel Cipher

3.3: Data Encryption Standard (DES)

Overview

Applying the Feistel Cipher

3.4: Advanced Encryption Standard (AES)

Beyond DES

Python Implementations

3.5: Triple DES (3DES)

Introduction to 3DES

Contrasting 3DES with AES

A Performance Comparison of DES, AES, and DES

Unit 3 Assessment

Unit 3 Assessment

Unit 4: Asymmetric Encryption

Unit 4 Learning Objectives

4.1: Asymmetric Cryptosystems

Introduction to Public Key Cryptography

4.2: The RSA Cryptosystem

The RSA Cryptosystem

4.3: Integer Factorization

Prime Factorization

Factoring Techniques

Python Exercise

4.4: Congruences Modulo n

Modular Arithmetic

More Immersion into Congruences modulo n