In the digital era, ensuring large-scale data integrity and verification has become a core challenge for distributed systems. Merkle Tree, also known as hash tree, is not simply a mathematical data structure but also the cryptographic "backbone" that allows decentralized networks to operate efficiently and securely. Invented by Ralph Merkle in 1979, this structure represents a breakthrough in optimizing the data verification process, allowing entities participating in the network to verify information without having to hold the entire giant database.
This report from the team of experts at Tan Phat Digital will analyze in detail the technical nature, advanced variations, strategic roles in blockchain protocols such as Bitcoin and Ethereum, as well as expanded applications in the modern Web3 ecosystem.
Historical Origins and Evolution of Hash Trees
The concept of Merkle Tree was first introduced in the scientific article titled "A Digital Signature Based on a Conventional Encryption Function" by Dr. Ralph Merkle. At that time, Merkle's goal was to find an efficient method to handle digital signatures and securely distribute public keys. Before the advent of hash trees, verifying the integrity of a set of data often required linear methods where every piece of data had to be hashed and compared sequentially, leading to a waste of computational resources and bandwidth as the data scale increased.
Merkle Trees completely changed the game by organizing hash functions in a hierarchical structure. This innovation allows converting a list of data of size n into a tree structure with a depth of just log2(n), turning the verification problem from a linear burden into an extremely fast logarithmic operation. Over more than four decades, from its initial applications in secure file systems and peer-to-peer (P2P) networks, Merkle Trees became an integral element of Bitcoin's design in 2008 and continued to evolve into more complex structures such as Merkle Patricia Tries or Verkle Trees in modern networks like Ethereum.
See more: How does Blockchain work?
Technical Mechanism and Structure of Merkle Tree
Technically, Merkle Tree is a type of binary tree in which each leaf node is labeled with a hash function of a block of data, and each non-leaf node is labeled with a cryptographic hash of the labels of its child nodes.
Core Components of a Merkle Tree
To understand how this structure works on mobile devices, Tan Phat Digital summarizes the three main layers of the tree as follows:
Leaf Nodes: This is the lowest level of the tree, where each node contains the hash of a unit of raw data (e.g., a blockchain transaction). If the data is L1,L2, then the corresponding leaf nodes will be H1=Hash(L1),H2=Hash(L2).
Intermediate Nodes: These nodes are created by pairing leaf nodes and hashing them with each other. For example, nodeH12 is the result of Hash(H1+H2). This process repeats in cascade until the top is reached.
Merkle Root: This is the single node at the top of the tree, representing the entire data set as a hash string of fixed length. It acts as a unique "digital fingerprint"; If just one bit of data at the leaf level changes, the Merkle Root will completely change due to a chain effect.
Crypto-meaning hierarchy:
Peak (Level 0) - Merkle Root: Commitment to the entire data set, measured in Hash(Hleft+Hright) of the layer bottom.
Intermediate - Internal Nodes: Structural link leading to the Root, calculated by Hash(Pair of child nuˊt).
Bottom - Leaf Nodes: Represents each individual element, calculated from Hash(Data thoˆ).
Role of Cryptographic Hash Function
The hash function is the engine that drives the security of Merkle Tree. Popular hash functions such as SHA-256 possess important properties: determinism, preimage resistance, and collision resistance. In a Merkle Tree, collision resistance ensures that no one can replace a valid transaction with a fake transaction while preserving the Merkle Root.
Handling Odd Data in Binary Structures
Since Merkle Trees typically require the number of leaf nodes to be a power of 2, in cases where the number of data is odd, most implementations will double the last element to form a complete pair. For example, if there are transactions A, B, C, the tree will hash A with B to get HAB, and hash C with C itself to get HCC.
Importance for Blockchain Integrity and Efficiency
Merkle Tree solves the scalability problem through two key concepts that Tan Phat Digital analyzes below here:
Merkle Proof: A logarithmic verification mechanism
A Merkle Proof allows verifying whether a piece of data is in a committed set without loading the entire set. The user simply provides "sibling hashes" along the path from that transaction up to the Root. For example, with over 1 million transactions, users only need 20 intermediate hashes to validate their transactions.
Optimized for Light Clients
Light nodes only download very small block headers (80 bytes in Bitcoin). When a payment needs to be verified, the light node requests Merkle Proof from a Full Node. By matching the final hash result with the saved Merkle Root, the light node can have complete confidence in the validity of the transaction.
Linear vs Merkle Tree verification comparison:
Complexity: Linear is O(n), while Merkle Tree is O(logn).
Data transport: Linear needs the entire list, Merkle Tree only needs sister nodes on the path.
Parallelizability: Linear is very low, Merkle Tree is very high.
Impact of change: Linear needs full rehash, The Merkle Tree simply updates an affected branch.
Merkle Tree in Bitcoin and Ethereum
Bitcoin: A Simple Binary Structure
Bitcoin uses a simple binary Merkle Tree. The Merkle root is placed directly into the block header, creating an unbreakable bond. However, this structure is static; once a block has been mined, the transaction list never changes.
Ethereum: Merkle Patricia Trie (MPT)
Ethereum uses MPT to store not only transactions but also the entire network state (account balances, contract codes). MPT allows efficient updating and compressed storage of common prefix paths. An Ethereum block contains four types of Merkle Roots:
State Root: Contains global information about all accounts, constantly changing with each block.
Transactions Root: Contains the transactions in the current block, which are local.
Receipts Root: Stores transaction receipts, execution results, and log.
Storage Root: Separate for each smart contract, stores the internal data of that contract.
See more: What is Chain in Blockchain?
Advanced Variations: Sparse Merkle Tree and MMR
Sparse Merkle Tree (SMT)
SMT is designed for huge address space (2256 cards), but most of the cards are empty. It allows for fast calculations and supports "Proof of Non-Existence", proving a transaction never happened.
Merkle Mountain Range (MMR)
MMR is a tree structure that allows adding new elements without changing any existing nodes. This is great for append-only data and optimizing data writes to SSDs.
Verkle Tree: A Revolution in Statelessness
Verkle Tree is a more efficient alternative to traditional Merkle Trees, using Polynomial Commitment instead of simple hashing.
Outstanding features of Verkle Tree compared to MPT:
Structure: MPT is a binary hash tree, Verkle is a polynomial commitment tree.
Width: MPT is 2 or 16 wide, Verkle is up to 256 or 1024 wide.
Proof size: MPT is about 150 KB, Verkle is only 1-2 KB (50-100 times smaller).
Storage: Verkle allows "stateless" operation, helping the node not need to store the entire huge state data.
Calculation: MPT is simple to calculate, Verkle requires complex elliptic curve calculations more.
Cryptographic Security and Vulnerabilities
While powerful, Merkle Tree still has weaknesses such as the transaction duplication attack (CVE-2012-2459) discovered in Bitcoin, or the second preimage attack. As a precaution, Tan Phat Digital recommends that developers use the double leaf hash method or Domain Separation as proposed by OpenZeppelin.
Extended Web3 Application
Proof of Reserves (PoR): Exchanges like Binance combine Merkle Tree with zk-SNARKs to prove they hold enough resources customer assets while still protecting privacy.
Airdrops Distribution: The project only needs to announce the Merkle Origin, users themselves submit claim requests with evidence, saving 99% on gas fees.
File System (IPFS/BitTorrent): Use Merkle Tree to identify content and ensure pieces of data downloaded from many different sources are not corrupted. damaged.
Celestia: Uses Namespaced Merkle Tree (NMT) to arrange data according to each application, helping Rollups only need to download the relevant data.
10 Typical Case Studies of Merkle Tree Application
To demonstrate practical applicability, Tan Phat Digital synthesizes 10 typical cases most pictured:
Bitcoin (Simplified Payment Verification): Enables lightweight mobile wallets (like Electrum) to validate transactions without downloading 500GB+ of chain data, just an 80-byte block header and about 12-20 proof hashes.
Ethereum (State Management): Uses Merkle Patricia Trie to store millions of assets account and balance. When a balance changes, the system simply updates the corresponding branch without having to rebuild the entire database.
Binance (Proof of Reserves): After the FTX event, Binance deployed Merkle Tree combining zk-SNARKs to prove the 1:1 asset reserve ratio, helping users verify their own money without revealing their identities.
Celestia (Modular Blockchain): Use Namespaced Merkle Trees to classify data according to each project (Rollup). This helps applications only have to download "their" data, reducing the load on system resources.
Git (Version Management System): Every "commit" in Git is actually a Merkle Tree (Tree Object). It helps Git compare millions of lines of code and detect changes or duplicate files in a snap.
BitTorrent (Peer-to-Peer File Sharing): When downloading a movie, Merkle Tree helps check each piece of data (chunk) downloaded from different sources. If a fragment is corrupted, the system reloads only that fragment instead of the entire file.
IPFS (Distributed File System): Uses Merkle DAG to identify content. Two files with the same content will have the same address (CID), helping the system automatically eliminate duplicate data across the network.
Apache Cassandra (Database Replication): Use Merkle Tree to detect inconsistencies between replica servers (replicas). Instead of sending the entire data for comparison, the servers only exchange Merkle Root to find the skewed data area.
Starknet (ZK-Rollup Airdrops): Uses Merkle Tree and Cairo language to distribute tokens to millions of users. The project only needs to store a single 32-byte Root Hash on-chain to perform the entire reward event.
Certificate Transparency: A public storage system for SSL/TLS certificates that uses Merkle Tree to prevent certificate authorities (CAs) from issuing fake certificates to websites without detection.
Frequently Asked Questions (FAQ)
Here are the 10 most popular questions about Merkle Tree compiled by Tan Phat Digital:
Why does Blockchain use Merkle Tree instead of a simple hash list?
Merkle Tree allows verifying a specific transaction with O(log n) complexity instead of O(n). This means that for 1 million transactions, you only need about 20 verification steps instead of 1 million steps, which saves significant bandwidth and resources.
What happens if the number of transactions in a block is an odd number?
To maintain a balanced binary tree structure, the last transaction in the list is duplicated and hashed with itself to form a complete pair.
The difference What really is the difference between Merkle Root and Merkle Proof?
Merkle Root is a unique hash located at the top of the tree, representing the entire data set. Merkle Proof is the set of intermediate hashes (paths) needed for a user to manually recalculate the Merkle Root from a particular data.
Can an attacker change the data without changing the Merkle Root?
Theoretically no, due to the collision-resistant nature of the cryptographic hash function. Any small change in the leaf node will create a chain effect that completely changes the hash code at the root.
Why is Ethereum moving to Verkle Tree?
The biggest problem of Merkle Tree in Ethereum is that the proof size (witness) is too large (about 150 KB). Verkle Tree helps reduce this size to 1-2 KB, allowing network nodes to operate in "stateless" mode more effectively.
How to prove that an account does NOT exist in the system?
This is done through Sparse Merkle Tree (SMT). Since SMT has a fixed address space, you can provide a Merkle Proof leading to a leaf node with a value of 0 to prove the absence of that data.
Does Merkle Tree help preserve privacy?
Yes. Merkle Proof allows you to prove a transaction exists in a block without revealing any other transactions in that block. When combined with ZK-SNARKs, it creates extremely high security systems like Proof of Reserves of exchanges.
Why doesn't Bitcoin use a complex structure like Ethereum's Merkle Patricia Trie?
Because Bitcoin is a static ledger (UTXO model). Once the block is mined, transactions never change. Ethereum needs MPT because it is a dynamic state machine, where account balances and contract codes change continuously after each transaction.
What is a "Second Pre-image" attack in Merkle Tree?
This is a vulnerability where an attacker tricks the system by injecting an internal node into the position of a leaf node. To prevent this, libraries like OpenZeppelin often double hash the leaves ($H(H(\text{data}))$) or add a prefix to differentiate the layers.
Where else is Merkle Tree used besides Blockchain?
It is widely used in Git for source code version management, in ZFS/Btrfs to prevent hard drive data corruption, and in P2P networks such as BitTorrent and IPFS to verify the integrity of file fragments downloaded from many sources.
According to Tan Phat Digital, Merkle Tree is a testament to the power of simplicity in mathematics. It not only protects data from tampering but is also a tool to expand network access for weak devices. The evolution to Verkle Tree shows that blockchain technology is making continuous efforts to solve a global-scale problem. The Merkle Root will continue to be the "fulcrum of trust" in a decentralized world, where users trust the laws of mathematics instead of humans.
Share








