
Block Parsers: How to Read the Bitcoin Block Chain

Last Updated March 4, 2021 4:42 PM
Alex Gorale

A block parser reads the Bitcoin block chain. The data stored in the block chain is not encrypted. Bitcoin is a pseudonymous system, meaning that ECDSA key pairs stand in for the identities of users, but the binary data in the block chain can be read by anyone.

The block chain is a transaction database. Every full node participating in the Bitcoin network has the same copy. The Bitcoin protocol dictates its structure and is the means through which each node maintains a duplicate copy. Overall, the block chain is just a data structure for storing blocks in a series, beginning with the genesis block.


A Simple Block Parser

This example is a minimal approach. In all, 138 lines of Python code are used to build this block parser. Some of the encoding may look unfamiliar or backwards: integers are stored little-endian, and hashes are stored in the reverse byte order from how they are usually displayed. Despite these minor formatting issues, below is a beginner approach to a Bitcoin block parser.

The project began with building the tools required to parse the binary data. The protocol dictates the tools that will be necessary.

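import struct

# All multi-byte integers in the block chain are little-endian, so every
# format string uses '<' rather than the machine's native byte order.

def uint1(stream):
        return ord(stream.read(1))

def uint2(stream):
        return struct.unpack('<H', stream.read(2))[0]

def uint4(stream):
        return struct.unpack('<I', stream.read(4))[0]

def uint8(stream):
        return struct.unpack('<Q', stream.read(8))[0]

def hash32(stream):
        # Hashes are stored byte-reversed relative to how they are displayed.
        return stream.read(32)[::-1]

def time(stream):
        # Block timestamps are 4-byte Unix times.
        return uint4(stream)

def varint(stream):
        # The first byte is either the value itself or a marker for the width.
        size = uint1(stream)

        if size < 0xfd:
                return size
        if size == 0xfd:
                return uint2(stream)
        if size == 0xfe:
                return uint4(stream)
        return uint8(stream)  # size == 0xff

def hashStr(bytebuffer):
        # '%02x' keeps the leading zero of each byte; a bare '%x' prints 0x04 as '4'.
        return ''.join('%02x' % b for b in bytearray(bytebuffer))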

These functions read unsigned integers, hashes, and variable-length integers from the block chain. They are the tools used to build the classes that represent blocks and transactions. Each function consumes the next few bytes of the block file and returns the parsed value.

A unit test built on these helpers reads the first block and its first transaction from a block file. When used with blk00000.dat it gives the following output:

Magic Number: d9b4bef9
Blocksize: 285
Version: 1
Previous Hash 00000000000000000000000000000000
Merkle Root 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
Time 1231006505
Difficulty 1d00ffff
Nonce 2083236893
Tx Count 1
Version Number 1
Inputs 1
Previous Tx 00000000000000000000000000000000
Prev Index 4294967295
Script Length 77
ScriptSig 4ffff01d14455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73
ScriptSig \x04\xff\xff\x00\x1d\x01\x04EThe Times 03/Jan/2009 Chancellor on brink of second bailout for banks
Seq Num ffffffff
Outputs 1
Value 50.0
Script Length 67
Script Pub Key 414678afdb0fe5548271967f1a67130b7105cd6a828e0399a67962e0ea1f61deb649f6bc3f4cef38c4f3554e51ec112de5c384df7bab8d578a4c702b6bf11d5fac
Lock Time 0
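
The hex strings in the output above are shorter than the bytes they represent: the previous hash shows 32 characters where a 32-byte hash needs 64, and the ScriptSig starts with 4ffff01d rather than 04ffff001d. That pattern is what you get when each byte is formatted without zero padding. The Python 3 sketch below compares the two formattings on the first five ScriptSig bytes of the genesis block. The function names are illustrative and are not part of the original parser.

```python
def hex_unpadded(data):
    # '%x' prints any byte below 0x10 as a single digit, so leading zeros are lost.
    return ''.join('%x' % b for b in data)

def hex_padded(data):
    # bytes.hex() always emits exactly two digits per byte.
    return bytes(data).hex()

# First five bytes of the genesis block ScriptSig.
prefix = bytes([0x04, 0xff, 0xff, 0x00, 0x1d])

print(hex_unpadded(prefix))  # 4ffff01d
print(hex_padded(prefix))    # 04ffff001d
```

With padding, a 32-byte hash always prints as 64 characters, which makes it possible to compare the output against a block explorer.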

Parsing a Block Parser

The protocol guides the development of the classes.

Block Parser Structure
The block data structure matches the protocol description.

class Block:
        def __init__(self, blockchain):
                self.magicNum = uint4(blockchain)
                self.blocksize = uint4(blockchain)
                self.setHeader(blockchain)
                self.txCount = varint(blockchain)
                self.Txs = []

                for i in range(0, self.txCount):
                        tx = Tx(blockchain)
                        self.Txs.append(tx)

        def setHeader(self, blockchain):
                self.blockHeader = BlockHeader(blockchain)

        def toString(self):
                # Despite its name, this method prints the block rather than returning a string.
                print("")
                print("Magic No: \t%8x" % self.magicNum)
                print("Blocksize: \t%d" % self.blocksize)
                print("")
                print("#"*10 + " Block Header " + "#"*10)
                self.blockHeader.toString()
                print("")
                print("##### Tx Count: %d" % self.txCount)
                for t in self.Txs:
                        t.toString()

The block parser begins by reading the magic number, which is the first four bytes of every block. On disk the bytes are f9 be b4 d9; read as a little-endian integer they become d9b4bef9. The following four bytes are the block size, which is the number of bytes to the end of the block. The next 80 bytes are the block header.
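
As a rough illustration of this layout, the Python 3 sketch below walks a stream of blocks using only the magic number and size fields, skipping over each block body without parsing it. The name iter_raw_blocks and the fake payloads are made up for this example. Treating any other magic value as the end of the data is an assumption of the sketch, not something the parser in this article does.

```python
import io
import struct

MAINNET_MAGIC = 0xd9b4bef9  # bytes f9 be b4 d9 read as a little-endian uint32

def iter_raw_blocks(stream):
    """Yield (magic, payload) pairs; payload is the raw header plus transactions."""
    while True:
        preamble = stream.read(8)
        if len(preamble) < 8:
            return  # end of file
        magic, size = struct.unpack('<II', preamble)
        if magic != MAINNET_MAGIC:
            return  # assumption: anything else ends the block data
        payload = stream.read(size)
        if len(payload) < size:
            return  # the last block is truncated
        yield magic, payload

# Two fake "blocks" with made-up payloads, just to exercise the framing.
fake_file = io.BytesIO(
    b'\xf9\xbe\xb4\xd9' + struct.pack('<I', 3) + b'abc' +
    b'\xf9\xbe\xb4\xd9' + struct.pack('<I', 2) + b'de'
)

blocks = list(iter_raw_blocks(fake_file))
for magic, payload in blocks:
    print('%08x' % magic, len(payload))
```

Because the size field gives the distance to the next block, a tool that only needs to count or index blocks can skip the transaction parsing entirely.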

Block Parser Header

class BlockHeader:
        def __init__(self, blockchain):
                self.version = uint4(blockchain)
                self.previousHash = hash32(blockchain)
                self.merkleHash = hash32(blockchain)
                self.time = uint4(blockchain)
                self.bits = uint4(blockchain)
                self.nonce = uint4(blockchain)
        def toString(self):
                print("Version:\t %d" % self.version)
                print("Previous Hash\t %s" % hashStr(self.previousHash))
                print("Merkle Root\t %s" % hashStr(self.merkleHash))
                print("Time\t\t %s" % str(self.time))
                # 'bits' is the compact encoding of the difficulty target, shown as hex.
                print("Difficulty\t %8x" % self.bits)
                print("Nonce\t\t %s" % self.nonce)

Notice that the only hashes stored in the block header are the previous block's hash and the Merkle root. A block's own hash is not stored anywhere; it is a value computed from the header.
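
To make the idea of a computed block hash concrete, here is a minimal Python 3 sketch that rebuilds the 80-byte genesis header from the field values printed earlier in this article and hashes it with double SHA-256, which is how Bitcoin derives a block hash. The function name block_hash and its parameter names are chosen for this example only.

```python
import hashlib
import struct

def block_hash(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    """Serialize the six header fields and return the block hash as display hex."""
    header = (
        struct.pack('<I', version)
        + bytes.fromhex(prev_hash_hex)[::-1]    # hashes are stored byte-reversed
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack('<III', timestamp, bits, nonce)
    )
    assert len(header) == 80
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()                   # reverse again for display

genesis_hash = block_hash(
    version=1,
    prev_hash_hex='00' * 32,
    merkle_root_hex='4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b',
    timestamp=1231006505,
    bits=0x1d00ffff,
    nonce=2083236893,
)
print(genesis_hash)
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```

The next block in the chain stores this value in its own previous-hash field, which is what links the blocks together.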

After the block header comes a transaction counter. The counter is a variable-length integer (varint): the number of bytes it occupies depends on how large the transaction count is. The transactions that follow are stored in a list.
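
A short Python 3 sketch of the variable-length integer encoding may help here. It shows how a count of 1 fits in one byte while a count of 419 needs three. The names encode_varint and read_varint are illustrative; the decoder follows the same rules as the varint helper shown earlier.

```python
import io
import struct

def encode_varint(n):
    """Encode n using the shortest Bitcoin variable-length integer form."""
    if n < 0xfd:
        return struct.pack('<B', n)
    if n <= 0xffff:
        return b'\xfd' + struct.pack('<H', n)
    if n <= 0xffffffff:
        return b'\xfe' + struct.pack('<I', n)
    return b'\xff' + struct.pack('<Q', n)

def read_varint(stream):
    """Decode one variable-length integer from a binary stream."""
    first = stream.read(1)[0]
    if first < 0xfd:
        return first
    width, fmt = {0xfd: (2, '<H'), 0xfe: (4, '<I'), 0xff: (8, '<Q')}[first]
    return struct.unpack(fmt, stream.read(width))[0]

for count in (1, 419, 70000):
    encoded = encode_varint(count)
    decoded = read_varint(io.BytesIO(encoded))
    print(count, encoded.hex(), len(encoded), decoded)
```

Most blocks have fewer than 65,536 transactions, so in practice the counter takes one or three bytes.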

Block Parser Tx

class Tx:
        def __init__(self, blockchain):
                self.version = uint4(blockchain)
                self.inCount = varint(blockchain)
                self.inputs = []
                for i in range(0, self.inCount):
                        input = txInput(blockchain)
                        self.inputs.append(input)
                self.outCount = varint(blockchain)
                self.outputs = []
                if self.outCount > 0:
                        for i in range(0, self.outCount):
                                output = txOutput(blockchain)
                                self.outputs.append(output)
                self.lockTime = uint4(blockchain)

        def toString(self):
                print("")
                print("="*10 + " New Transaction " + "="*10)
                print("Tx Version:\t %d" % self.version)
                print("Inputs:\t\t %d" % self.inCount)
                for i in self.inputs:
                        i.toString()

                print("Outputs:\t %d" % self.outCount)
                for o in self.outputs:
                        o.toString()
                print("Lock Time:\t %d" % self.lockTime)

For each transaction, there is a list of inputs and outputs.

Block Parser TxInput

class txInput:
        def __init__(self, blockchain):
                self.prevhash = hash32(blockchain)
                self.txOutId = uint4(blockchain)
                self.scriptLen = varint(blockchain)
                self.scriptSig = blockchain.read(self.scriptLen)
                self.seqNo = uint4(blockchain)

        def toString(self):
                print("Previous Hash:\t %s" % hashStr(self.prevhash))
                print("Tx Out Index:\t %d" % self.txOutId)
                print("Script Length:\t %d" % self.scriptLen)
                print("Script Sig:\t %s" % hashStr(self.scriptSig))
                print("Sequence:\t %8x" % self.seqNo)

An input is a reference to an output in a previous transaction. The previous hash identifies that transaction, and txOutId is the index of the output within it. The ScriptSig proves control of the private key that corresponds to the output.
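
One special case is visible in the genesis block output earlier in this article: its only input has an all-zero previous hash and a previous index of 4294967295 (0xffffffff). That combination marks a coinbase input, which creates new coins instead of spending an earlier output. A small Python 3 sketch of the check follows; the function and constant names are hypothetical and do not appear in the original parser.

```python
COINBASE_PREV_HASH = bytes(32)   # 32 zero bytes
COINBASE_OUT_INDEX = 0xffffffff  # 4294967295

def is_coinbase_input(prev_hash, out_index):
    """Return True when an input references no real previous output."""
    return prev_hash == COINBASE_PREV_HASH and out_index == COINBASE_OUT_INDEX

print(is_coinbase_input(bytes(32), 4294967295))     # True
print(is_coinbase_input(b'\x01' + bytes(31), 0))    # False
```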

Block Parser TxOutput

class txOutput:
        def __init__(self, blockchain):
                self.value = uint8(blockchain)
                self.scriptLen = varint(blockchain)
                self.pubkey = blockchain.read(self.scriptLen)

        def toString(self):
                print("Value:\t\t %d" % self.value)
                print("Script Len:\t %d" % self.scriptLen)
                print("Pubkey:\t\t %s" % hashStr(self.pubkey))

Outputs are instructions for sending bitcoins. The value is the amount in satoshis, where one bitcoin is 100,000,000 satoshis. The ScriptPubKey is the locking half of the script: a future input must supply a matching ScriptSig to spend the coins.
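
The two sample outputs in this article show the same genesis reward as 50.0 in one place and 5000000000 in another; the difference is only the unit. A minimal Python 3 sketch of the conversion is below. It uses Decimal to avoid floating-point rounding, and the helper name satoshis_to_btc is an assumption of this example.

```python
from decimal import Decimal

SATOSHIS_PER_BTC = 100000000  # 1 bitcoin = 100,000,000 satoshis

def satoshis_to_btc(value):
    """Convert an integer satoshi amount to bitcoins as an exact Decimal."""
    return Decimal(value) / SATOSHIS_PER_BTC

print(satoshis_to_btc(5000000000))  # 50
```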

Putting the Block Parser Together

Here is my sloppy block parser code.

import sys
from blocktools import *
from block import Block, BlockHeader

def parse(blockchain):
        print('Parsing Block Chain')
        counter = 0
        while True:
                # Stop cleanly once no further magic number and size can be read.
                position = blockchain.tell()
                if len(blockchain.read(8)) < 8:
                        break
                blockchain.seek(position)

                print(counter)
                block = Block(blockchain)
                block.toString()
                counter += 1

def main():
        if len(sys.argv) < 2:
                print('Usage: blockparser.py filename')
        else:
                with open(sys.argv[1], 'rb') as blockchain:
                        parse(blockchain)

if __name__ == '__main__':
        main()

This script will run until the end of the file. Output will look similar to this:

Magic No:       d9b4bef9
Blocksize:      285

########## Block Header ##########
Version:         1
Previous Hash    00000000000000000000000000000000
Merkle Root      4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
Time             1231006505
Difficulty       1d00ffff
Nonce            2083236893

##### Tx Count: 1

========== New Transaction ==========
Tx Version:      1
Inputs:          1
Previous Hash:   00000000000000000000000000000000
Tx Out Index:    0
Script Length:   77
Script Sig:      4ffff01d14455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73
Sequence:        ffffffff
Outputs:         1
Value:           5000000000
Script Len:      67
Pubkey:          414678afdb0fe5548271967f1a67130b7105cd6a828e0399a67962e0ea1f61deb649f6bc3f4cef38c4f3554e51ec112de5c384df7bab8d578a4c702b6bf11d5fac
Lock Time:       0

To test the loops, I tried the block parser on the first five megabytes of block file 65.

Magic No:       d9b4bef9
Blocksize:      234622

########## Block Header ##########
Version:         2
Previous Hash    0000000b1bda851ed2a5a543062a2789d7e82d7b33c838352bfba
Merkle Root      b22375d89ab682da9262ea8f4e784f68e5dd9eedde5a62866e3fadfa64c32f9
Time             1370602521
Difficulty       1a011337
Nonce            522491547

##### Tx Count: 419

========== New Transaction ==========
Tx Version:      1
Inputs:          1
Previous Hash:   00000000000000000000000000000000
Tx Out Index:    0
Script Length:   37
Script Sig:      35caa3400bb89124d696e656420627920425443204775696c64800427e0014ec
Sequence:        ffffffff
Outputs:         1
Value:           2525344340
Script Len:      25
Pubkey:          76a91427a1f12771de5cc3b73941664b2537c15316be4388ac
Lock Time:       0

========== New Transaction ==========
Tx Version:      1
Inputs:          1
Previous Hash:   a52c458c3a4e39b63d4a7bdcfab917444ddbfae9991245db39a85d98e9bbdb9
Tx Out Index:    0
Script Length:   106
Script Sig:      473044220447d5ae4624357f6b1361daac5d3aaeae5e197551fdf067f42aec5c7a5e51f2204117b06f77809295dd385da9b96567d3dc568e87d622ee37a758c836bb136e1212e0ac817fd21a44b43c6468d71a472e198521fcb66e36663b5a8173986d7609f
Sequence:        ffffffff
Outputs:         2
Value:           30000000000
Script Len:      25
Pubkey:          76a914f3fc2c5c7f8e3970bd824fbce8fce1ed4c1a988ac
Value:           19206322991
Script Len:      25
Pubkey:          76a9143bf18e9cc4c287764e29759b689fe51e33f757d88ac
Lock Time:       0

All clear. Full source is available on GitHub.


Images from World of Computing, Bitcoin Wiki, and Shutterstock.