Data Encoding

Bitcoin and the interfaces built around it use some encoding methods that are unique to Bitcoin, as well as some fairly esoteric general-purpose ones. Unless you've worked with cryptography before, you are probably not familiar with ASN.1 and DER encoding. However, because you are likely to run into these formats when working with Bitcoin, we'll briefly cover them here, along with PEM.

Once you get into the nuts and bolts of moving cryptographic data around, you're likely to run across many unfamiliar data formats.

In Bitcoin, every byte in a transaction incurs a cost, because transaction fees scale with transaction size. As a result, its data encoding methods were chosen for utility rather than ease of use.

We won't go into depth on these formats here; instead, we'll give you some background and reference resources for future use. Below are three formats that are common in the cryptocurrency world but that most developers have likely not seen before. PEM, DER, and ASN.1 all commonly appear wherever cryptographic keys and signatures are stored or transmitted.

PEM

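The PEM, or Privacy-Enhanced Mail (https://en.wikipedia.org/wiki/Privacy-Enhanced_Mail), format originated in an early standard for securing email, but today it is widely used as a text format for storing public and private keys (as well as certificates) in files. Here is an example of a public key PEM file: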

-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxzYuc22QSst/dS7geYYK
5l5kLxU0tayNdixkEQ17ix+CUcUbKIsnyftZxaCYT46rQtXgCaYRdJcbB3hmyrOa
vkhTpX79xJZnQmfuamMbZBqitvscxW9zRR9tBUL6vdi/0rpoUwPMEh8+Bw7CgYR0
FK0DhWYBNDfe9HKcyZEv3max8Cdq18htxjEsdYO0iwzhtKRXomBWTdhD5ykd/fAC
VTr4+KEY+IeLvubHVmLUhbE5NgWXxrRpGasDqzKhCTmsa2Ysf712rl57SlH0Wz/M
r3F7aM9YpErzeYLrl0GhQr9BVJxOvXcVd4kmY+XkiCcrkyS1cnghnllh+LCwQu1s
YwIDAQAB
-----END PUBLIC KEY-----

You can see that this file contains a header and a footer, but did you also notice that the key itself is Base64-encoded? Decoding that Base64 text yields the key's binary DER encoding, a format covered below.
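The header, footer, and Base64 body can be separated with nothing but Python's standard library. The sketch below is a minimal illustration, not a full PEM parser: the function name `pem_to_der` is hypothetical, and it assumes a single BEGIN/END block with no extra header lines inside it (Python 3.9+).

```python
import base64

# The public key PEM file shown above, copied verbatim.
PEM_TEXT = """-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxzYuc22QSst/dS7geYYK
5l5kLxU0tayNdixkEQ17ix+CUcUbKIsnyftZxaCYT46rQtXgCaYRdJcbB3hmyrOa
vkhTpX79xJZnQmfuamMbZBqitvscxW9zRR9tBUL6vdi/0rpoUwPMEh8+Bw7CgYR0
FK0DhWYBNDfe9HKcyZEv3max8Cdq18htxjEsdYO0iwzhtKRXomBWTdhD5ykd/fAC
VTr4+KEY+IeLvubHVmLUhbE5NgWXxrRpGasDqzKhCTmsa2Ysf712rl57SlH0Wz/M
r3F7aM9YpErzeYLrl0GhQr9BVJxOvXcVd4kmY+XkiCcrkyS1cnghnllh+LCwQu1s
YwIDAQAB
-----END PUBLIC KEY-----"""


def pem_to_der(pem_text: str) -> tuple[str, bytes]:
    """Return (label, der_bytes) for a single PEM block (hypothetical helper)."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    begin, end = lines[0], lines[-1]
    # Header looks like "-----BEGIN <label>-----".
    if not (begin.startswith("-----BEGIN ") and begin.endswith("-----")):
        raise ValueError("missing PEM header")
    label = begin[len("-----BEGIN "):-len("-----")]
    # Footer must repeat the same label.
    if end != f"-----END {label}-----":
        raise ValueError("PEM footer does not match header")
    # Everything in between is Base64; join the wrapped lines and decode.
    body = "".join(lines[1:-1])
    return label, base64.b64decode(body, validate=True)


label, der = pem_to_der(PEM_TEXT)
print(label)             # PUBLIC KEY
print(len(der))          # 294
print(der[:4].hex(" "))  # 30 82 01 22
```

The decoded bytes start with `30`, the DER tag for a SEQUENCE, which is why PEM is often described as "Base64-wrapped DER".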

ASN.1

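ASN.1, or Abstract Syntax Notation One (https://en.wikipedia.org/wiki/ASN.1), is a standard language for defining data structures that can be serialized and deserialized in a cross-platform way. Because it is independent of any particular computer or programming language, it is widely used for exchanging data between systems. The notation itself is human-readable, while the encodings produced from it (such as DER, below) are machine-readable.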

ASN.1 is a joint standard of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) and ISO/IEC. It was originally defined in 1984 as part of CCITT X.409 and moved to its own standard, X.208, in 1988.

The standard provides a number of predefined basic types and makes it possible to build constructed types (such as SEQUENCE, an ordered collection of other values) from them. The basic types include:

  • integers
  • booleans 
  • character strings
  • bit strings

One of the most common data encoding formats used in cryptocurrencies is the Distinguished Encoding Rules, or DER, one of the sets of encoding rules defined for ASN.1.

DER

DER, or Distinguished Encoding Rules (https://en.wikipedia.org/wiki/X.690#DER_encoding), is a binary format. It is a restricted form of BER (Basic Encoding Rules) that allows exactly one encoding for any given value, so it is intended for applications in which a unique octet (8-bit byte) string encoding is needed, as is the case with cryptographic signatures.

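There is a lot more to DER, but in short: when data must be specified exactly, down to the individual byte, as it must be when generating and validating digital signatures, you need a strict format such as DER. If the same value could be encoded in more than one way, two byte strings carrying identical data would hash differently, which is a problem anywhere data is hashed or signed.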
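To make the uniqueness point concrete, here is a small sketch comparing two encodings of the integer 2. BER (Basic Encoding Rules), the looser rule set DER is derived from, lets a sender use a "long form" length byte even for short values; DER requires the shortest form. The helper `decode_integer_tlv` is hypothetical and deliberately lenient so that it accepts both forms (Python 3.9+).

```python
import hashlib


def decode_integer_tlv(data: bytes) -> int:
    """Decode one INTEGER item, accepting short- or long-form lengths (lenient on purpose)."""
    if data[0] != 0x02:
        raise ValueError("not an INTEGER")
    length, pos = data[1], 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes that follow.
        num_bytes = length & 0x7F
        length = int.from_bytes(data[pos:pos + num_bytes], "big")
        pos += num_bytes
    return int.from_bytes(data[pos:pos + length], "big", signed=True)


der_form = bytes.fromhex("020102")    # short-form length: the only encoding DER permits
ber_form = bytes.fromhex("02810102")  # long-form length: legal in BER, rejected by DER

print(decode_integer_tlv(der_form), decode_integer_tlv(ber_form))  # 2 2
# Same value, different bytes, therefore different hashes.
print(hashlib.sha256(der_form).digest() == hashlib.sha256(ber_form).digest())  # False
```

A strict DER parser would reject `ber_form` outright, which is what guarantees that one value maps to one byte string.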

Note: An important formatting concern to keep in mind is endianness: big-endian (BE) vs. little-endian (LE) byte order. While we won't go into depth on it here, endianness refers to the order in which the bytes of a multi-byte value are stored. From Wikipedia (https://en.wikipedia.org/wiki/Endianness): "A big-endian system stores the most significant byte of a word at the smallest memory address and the least significant byte at the largest. A little-endian system, in contrast, stores the least-significant byte at the smallest address".
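Python's built-in `int.to_bytes` and `int.from_bytes` make the difference easy to see. As general background (not from the original text): DER writes multi-byte INTEGER values big-endian, while many fixed-size fields in Bitcoin's own transaction serialization, such as the version number and output amounts, are written little-endian.

```python
value = 0x12345678

big = value.to_bytes(4, "big")        # most significant byte first
little = value.to_bytes(4, "little")  # least significant byte first

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12

# Reading bytes back with the wrong byte order silently yields a different number.
print(hex(int.from_bytes(little, "big")))  # 0x78563412

# For example, 100,000,000 satoshis (1 BTC) as an 8-byte little-endian amount.
amount = (100_000_000).to_bytes(8, "little")
print(amount.hex(" "))  # 00 e1 f5 05 00 00 00 00
```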

DER Tags

In DER encoding, data is stored in a Tag-Length-Value (TLV) format: a tag identifying the type, then the length of the value in bytes, then the value itself.

Example: Let's encode the integer 2.

02 01 02

The first byte, 02, is the tag for INTEGER. The second byte, 01, is the length of the value part. The value section contains just one byte, with the value 02.
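The same Tag-Length-Value steps can be sketched in a few lines of Python. The helper names (`encode_length`, `encode_tlv`, `encode_integer`) are hypothetical, and the sketch handles only non-negative integers. Two DER details beyond the example above are included: lengths of 128 or more use a "long form" (a first byte of 0x80 plus the number of length bytes, followed by those bytes), and INTEGER values use minimal big-endian two's complement, so a leading 00 byte is needed whenever the top bit of the first value byte would otherwise be set (Python 3.9+).

```python
def encode_length(n: int) -> bytes:
    """DER length bytes: short form below 128, otherwise the minimal long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_tlv(tag: int, value: bytes) -> bytes:
    """Concatenate tag, length, and value."""
    return bytes([tag]) + encode_length(len(value)) + value


def encode_integer(value: int) -> bytes:
    """DER INTEGER (tag 02) for a non-negative int, in minimal two's-complement form."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # bit_length() // 8 + 1 bytes leaves room for a 0 sign bit at the top.
    content = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return encode_tlv(0x02, content)


print(encode_integer(2).hex(" "))    # 02 01 02
print(encode_integer(128).hex(" "))  # 02 02 00 80
```

The second call shows why the value length can differ from what you might expect: 128 has its top bit set, so DER prepends 00 to keep it positive.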

Here are a few more common tags:

30 - SEQUENCE (an ordered collection of other encoded items)

06 - OBJECT IDENTIFIER

03 - BIT STRING

You can read more about DER tags and encoding rules here: https://en.wikipedia.org/wiki/X.690.
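Reading DER is the same process in reverse: read a tag, read a length, slice out the value, and, for a SEQUENCE (30), repeat on the contents. As background not stated in the original text, an ECDSA signature in Bitcoin is DER-encoded as a SEQUENCE containing two INTEGERs, r and s; the toy input below has that shape but uses tiny values, whereas real r and s are about 32 bytes each. The function names are hypothetical, and the sketch assumes single-byte tags, which holds for the universal tags listed above (Python 3.9+).

```python
def read_tlv(data: bytes, offset: int = 0) -> tuple[int, bytes, int]:
    """Read one TLV item at `offset`; return (tag, value, offset_after_item)."""
    tag = data[offset]
    length = data[offset + 1]
    pos = offset + 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes that follow.
        num_bytes = length & 0x7F
        if num_bytes == 0:
            raise ValueError("indefinite length is not allowed in DER")
        length = int.from_bytes(data[pos:pos + num_bytes], "big")
        pos += num_bytes
    value = data[pos:pos + length]
    if len(value) != length:
        raise ValueError("truncated DER item")
    return tag, value, pos + length


def read_children(value: bytes) -> list[tuple[int, bytes]]:
    """Split the contents of a constructed item (such as a SEQUENCE) into child items."""
    children = []
    pos = 0
    while pos < len(value):
        tag, child, pos = read_tlv(value, pos)
        children.append((tag, child))
    return children


# Toy example: a SEQUENCE (30) of length 6 holding two INTEGERs (02) with values 2 and 3.
encoded = bytes.fromhex("3006020102020103")
tag, seq_value, end = read_tlv(encoded)
print(hex(tag), end)  # 0x30 8

numbers = [int.from_bytes(v, "big", signed=True)
           for t, v in read_children(seq_value) if t == 0x02]
print(numbers)  # [2, 3]
```

A production parser would also enforce DER's strictness rules (minimal lengths, no redundant leading 00 bytes in INTEGERs), which this sketch skips.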

Source: Saylor Academy
This work is licensed under a Creative Commons Attribution 4.0 License.

Last modified: Tuesday, October 5, 2021, 4:15 PM