What is an archive node and who really needs to run a full node

January 28, 2026 · #Blockchain

Archive nodes are considered the "eternal memory" of the blockchain. Tan Phat Digital analyzes why this type of node is important for Web3 developers and financial institutions in 2026.

As the decentralized finance (DeFi) revolution and Web3 applications reach maturity, the underlying infrastructure of blockchain — specifically nodes — has become a key research topic. According to analysis from the team of experts at Tan Phat Digital, a blockchain network cannot exist without these entities that store and authenticate data. However, the differentiation between types of network nodes has created significant technical and economic barriers. In particular, the concept of "archive node" emerges as a component that stores the "eternal memory" of the system, but requires a huge investment of resources. This report will analyze in detail the nature of archive nodes, the architectural differences compared to full nodes, and determine exactly who needs to operate this type of infrastructure in the 2025-2026 technology era.

Structure and classification of network node systems in the decentralized space

At its core, a blockchain network node is a specialized computer running client software to participate in a peer-to-peer (P2P) system. It serves at once as an on-chain database, a rule enforcement engine, and a network router. Depending on how much historical data a node maintains, we can classify nodes into three main groups: full nodes, archive nodes and light nodes.

The nature of full nodes and data pruning mechanisms

Full nodes are considered the backbone of the network, responsible for downloading and validating every block and every transaction since the genesis block. A full node ensures security and decentralization by checking the validity of data itself without trusting any third party. However, to save disk space and increase performance, most full nodes today use a "pruning" mechanism.

This mechanism allows the node to retain the entire history of blocks and transaction receipts, but it stores only the current state and a short window of previous states, usually the most recent 128 blocks on the Ethereum network. This means that although the full node knows everything that has happened (through the block history), it cannot readily report the balance of a wallet at a particular past block unless it replays transactions from scratch, an extremely time-consuming and inefficient process.
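The retention window described above can be sketched as a simple check. This is a minimal illustration, assuming the 128-block figure quoted in the text; the function name and the exact boundary handling are hypothetical, and real clients make the window configurable.

```python
# Sketch: would a pruned full node still hold the state for a given block?
# RECENT_STATE_WINDOW uses the 128-block figure quoted in the text; treat it
# as an assumption, since actual clients let operators change it.
RECENT_STATE_WINDOW = 128


def state_available_on_full_node(head_block: int, query_block: int,
                                 window: int = RECENT_STATE_WINDOW) -> bool:
    """Return True if query_block falls inside the retained state window."""
    if query_block < 0 or query_block > head_block:
        raise ValueError("query_block must be between 0 and head_block")
    # The head counts as the first block of the window.
    return head_block - query_block < window


head = 19_000_000  # hypothetical chain head
for depth in (0, 127, 128, 1_000_000):
    ok = state_available_on_full_node(head, head - depth)
    print(f"{depth} blocks behind head -> state on full node: {ok}")
```

Anything the check rejects has to be served by an archive node or rebuilt by replaying transactions.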

Archive Node: A comprehensive historical state repository

An archive node is essentially a full node with the pruning feature disabled. It inherits all the capabilities of a full node and additionally builds a repository of the blockchain's historical state at every point in time. While a full node only retains a "snapshot" of the current state, an archive node retains every intermediate state change since the genesis block.

This allows the archive node to answer deep history queries on the fly, such as retrieving the code of a smart contract that has been destroyed, or checking the stored value of a contract variable at a block from three years ago. This is why archive nodes are an indispensable tool for research, auditing and in-depth analysis.

Technical characteristics of network node types (Updated 2026)

Below is a detailed analysis of technical characteristics compiled by Tan Phat Digital for users to easily compare:

  • Light Node:

    • Data stored: Block headers only.

    • Network role: Used for IoT/mobile devices.

    • Data validation: Relies on full nodes (SPV).

    • Capacity (ETH): Less than 1 GB.

    • Old query latency: Not available.

  • Full Node:

    • Data stored: Entire block history and recent state (typically 128 blocks).

    • Network role: Validates transactions and maintains consensus.

    • Data validation: 100% independent self-validation.

    • Capacity (ETH): 1.2 TB - 2 TB (requires NVMe SSD).

  • Archive Node:

    • Data stored: Entire block history and every historical state.

    • Network role: Deep historical queries and analysis.

    • Data validation: 100% independent self-validation.

    • Capacity (ETH): 3 TB - 20 TB (depending on client software).

    • Old query latency: Extremely low (data already available in local storage).

Software architecture and evolution of clients

Effective operation of an archive node depends greatly on the architecture of the client software used. In the Ethereum network, client diversity not only enhances network security but also provides different optimization options for historical data storage.

Geth: The gold standard and storage challenge

Geth (Go Ethereum) is the most popular client, capturing a large market share thanks to its stability and strong community support. However, Geth uses a hierarchical storage model (Merkle Patricia Trie), leading to significant data bloat when running in archive mode. A Geth archive node took up more than 13.5 TB of disk space by 2023 and is expected to exceed 18-20 TB by 2026. This requires extremely expensive, high-performance storage.

Erigon and Reth: Revolutionizing flat data structures

Erigon emerges as a superior alternative for archive nodes thanks to its complete redesign of the storage layer into a flat key-value storage model. By reorganizing data, Erigon significantly reduces the amount of duplicate index data, making it possible to run an Ethereum archive node with only about 2 TB to 3.5 TB, roughly a 75% reduction compared to Geth.
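The 75% figure can be checked against the sizes quoted in this article. The short calculation below is only an arithmetic sketch: it uses the 13.5 TB Geth figure from 2023 and the 2 TB to 3.5 TB Erigon range, and the function name is illustrative.

```python
def reduction_percent(baseline_tb: float, new_tb: float) -> float:
    """Percentage of disk space saved relative to the baseline size."""
    return (1 - new_tb / baseline_tb) * 100


GETH_ARCHIVE_TB = 13.5           # 2023 Geth archive size quoted in the article
ERIGON_ARCHIVE_TB = (2.0, 3.5)   # Erigon archive range quoted in the article

for size in ERIGON_ARCHIVE_TB:
    saved = reduction_percent(GETH_ARCHIVE_TB, size)
    print(f"{size} TB vs {GETH_ARCHIVE_TB} TB: about {saved:.0f}% smaller")
```

With these inputs the saving works out to about 85% at the low end of the Erigon range and about 74% at the high end, so 75% is the conservative end of the estimate.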

The spiritual successor to Erigon, Reth (Rust Ethereum) is developed with a focus on extreme performance and modularity. Reth not only optimizes storage capacity but also provides significantly faster processing of RPC requests, reaching thousands of requests per second (RPS) even under heavy load. For developers building real-time analytics tools, Reth is becoming the preferred choice in 2026.

Besu and Nethermind: The Enterprise Choice

Hyperledger Besu, written in Java, takes a different approach with Bonsai Tries, a data structure that allows accessing historical state by "rewinding" the changes made by each block instead of storing every state separately. Although access to distant history may be slower than on traditional archive nodes, Besu is extremely efficient in terms of maintenance and does not require manual pruning. Nethermind, written in C#, focuses on performance and high compatibility with monitoring systems, making it a great fit for enterprise infrastructures that require high observability.
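The "rewinding" idea can be illustrated with a toy model that keeps one current state plus a reverse diff per block. This is not Besu's implementation; the class, its method names and the flat dictionary state are all assumptions made purely to show why distant history costs more to reach than recent history.

```python
from typing import Dict, List, Optional

State = Dict[str, int]
# A reverse diff maps each key a block changed to its value BEFORE the block
# (None means the key did not exist before the block).
ReverseDiff = Dict[str, Optional[int]]


class RewindableState:
    """Toy model: a single current state plus one reverse diff per block."""

    def __init__(self) -> None:
        self.current: State = {}
        self.reverse_diffs: List[ReverseDiff] = []  # entry i undoes block i + 1

    @property
    def head(self) -> int:
        return len(self.reverse_diffs)

    def apply_block(self, changes: State) -> None:
        """Apply a block's writes and record how to undo them."""
        undo: ReverseDiff = {key: self.current.get(key) for key in changes}
        self.current.update(changes)
        self.reverse_diffs.append(undo)

    def state_at(self, block: int) -> State:
        """Rebuild the state as of `block` by rewinding from the head."""
        if not 0 <= block <= self.head:
            raise ValueError("block out of range")
        state = dict(self.current)
        # Undo the newest block first; work grows with distance from the head.
        for undo in reversed(self.reverse_diffs[block:]):
            for key, old in undo.items():
                if old is None:
                    state.pop(key, None)
                else:
                    state[key] = old
        return state


chain = RewindableState()
chain.apply_block({"alice": 10})            # block 1
chain.apply_block({"alice": 7, "bob": 3})   # block 2
chain.apply_block({"bob": 5})               # block 3
print(chain.state_at(1))  # {'alice': 10}
print(chain.state_at(3))  # {'alice': 7, 'bob': 5}
```

In this model only one full state is stored, which is the storage saving, while a query far behind the head has to walk back through many diffs, which is the latency cost.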

Execution client performance for archive nodes

Below is a summary of the comparison of clients from Tan Phat Digital's research:

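  • Geth (Go):

    • Archive capacity: More than 13.5 TB (2023), expected to exceed 18-20 TB by 2026.

    • Advantages: Most popular client, stable, with strong community support.

  • Erigon (Go):

    • Archive capacity: 2.5 TB - 4 TB.

    • RPC speed: About 3,999 RPS.

    • Advantages: Maximum disk savings and fast sync speed.

  • Reth (Rust):

    • Archive capacity: About 3 TB.

    • RPC speed: Thousands of RPS, even under heavy load.

    • Advantages: Extreme performance and modularity.

  • Nethermind (C#):

    • Archive capacity: 10 TB - 14 TB.

    • RPC speed: High.

    • Advantages: Optimized for corporate environments, supports automatic pruning.

  • Besu (Java):

    • Archive capacity: Customizable (highly efficient).

    • RPC speed: Average.

    • Advantages: Uses Bonsai Tries, so almost no manual disk maintenance is needed.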

Demand analysis: Who really needs to run an archive node?

The question "who really needs an archive node" often leads to confusion between usage needs and direct operational needs. In fact, very few entities have the ability and need to operate this node themselves due to the high management costs.

  • Block explorers: Platforms such as Etherscan or Solscan depend entirely on archive nodes to accurately display balances and transaction effects at any point in the past.

  • Financial analysis and legal investigation: Companies like Chainalysis use archive nodes to extract raw data, look for unusual behavior patterns, or investigate hacks.

  • Auditing and research: Entities like Quantstamp need archive nodes for "backtesting", that is, testing contracts against past states to find vulnerabilities.

  • dApp developers: Need archive nodes to calculate voting rights (governance) based on snapshots, or to analyze user reputation over time.
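The governance use case in the list above usually comes down to reading a token balance at a fixed snapshot block. The sketch below builds the JSON-RPC `eth_call` request for an ERC-20 `balanceOf(address)` read at a past block. The helper name and both addresses are hypothetical placeholders, and the request would still need to be sent to an archive-capable endpoint.

```python
import json

# First 4 bytes of keccak256("balanceOf(address)"), the standard ERC-20 selector.
BALANCE_OF_SELECTOR = "0x70a08231"


def balance_of_call(token: str, holder: str, snapshot_block: int,
                    request_id: int = 1) -> dict:
    """Build an eth_call request for balanceOf(holder) at snapshot_block."""
    holder_hex = holder.lower()
    if holder_hex.startswith("0x"):
        holder_hex = holder_hex[2:]
    if len(holder_hex) != 40:
        raise ValueError("holder must be a 20-byte hex address")
    # ABI encoding: selector followed by the address left-padded to 32 bytes.
    data = BALANCE_OF_SELECTOR + holder_hex.rjust(64, "0")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_call",
        "params": [{"to": token, "data": data}, hex(snapshot_block)],
    }


TOKEN = "0x" + "11" * 20    # placeholder governance token address
VOTER = "0x" + "ab" * 20    # placeholder voter address
print(json.dumps(balance_of_call(TOKEN, VOTER, 15_000_000), indent=2))
```

Because the block parameter points far behind the chain head, a pruned full node would generally be unable to answer this call, which is why snapshot-based voting tends to depend on archive data.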

Technical infrastructure requirements and operating costs in 2026

Building an archive node in 2026 requires enterprise-class equipment to ensure performance.

Detailed hardware requirements analysis

  • Ethereum (Based on Erigon/Reth):

    • CPU: 8-12 Cores / 16-24 Threads.

    • RAM: 64 GB ECC.

    • Hard Drive: 4 TB - 8 TB NVMe SSD.

    • Bandwidth: 500 Mbps - 1 Gbps.

    • 128 GB ECC.

    • $8,000.

  • Solana:

    • Storage: Up to 400 TB (ledger grows by more than 4 TB per month).

    • Bandwidth: 1 Gbps dedicated.

    • Hardware cost: Over $45,000.

Cost of operation and risk (TCO)

Operators face high power consumption (200W-500W), cooling requirements and the need for unmetered internet bandwidth. The biggest risk is service disruption: if the node loses sync, resynchronization can take weeks. For validators, going offline also incurs inactivity penalties, while "slashing" is reserved for provable misbehavior such as double-signing.
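To put the 200W-500W range in perspective, the draw can be converted into monthly energy use. This is a rough sketch that assumes a 30-day month and continuous load; the electricity price is left as a caller-supplied parameter because it varies by region, and both function names are illustrative.

```python
HOURS_PER_MONTH = 24 * 30  # simplifying assumption: 30-day month, always on


def monthly_kwh(watts: float) -> float:
    """Energy used in one month at a constant power draw."""
    return watts * HOURS_PER_MONTH / 1000


def monthly_power_cost(watts: float, price_per_kwh: float) -> float:
    """Electricity cost for one month; price_per_kwh is supplied by the caller."""
    return monthly_kwh(watts) * price_per_kwh


for watts in (200, 500):
    print(f"{watts} W continuous = {monthly_kwh(watts):.0f} kWh per month")
```

The range works out to 144 kWh to 360 kWh per month before cooling overhead, which has to be added on top.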

RPC methods that require archive data

Most regular blockchain requests can be handled by a full node, but the following methods require an archive node when querying data older than 128 blocks:

  1. eth_getBalance: Retrieves an account balance at a specific block in the past.

  2. eth_getStorageAt: Reads the value of a state variable (e.g. an NFT owner) at a historical block, which serves DeFi backtesting.
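The two methods listed above take the target block as their last parameter, encoded as a hex quantity. The sketch below only builds the request bodies. The helper name and both addresses are hypothetical placeholders, and sending the requests to an endpoint is left out.

```python
import json


def historical_request(method: str, params: list, block: int,
                       request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request whose last parameter is a block number."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": [*params, hex(block)],
    })


ACCOUNT = "0x" + "00" * 19 + "01"    # placeholder account address
CONTRACT = "0x" + "00" * 19 + "02"   # placeholder contract address
OLD_BLOCK = 14_000_000               # far more than 128 blocks behind the head

# eth_getBalance takes [address, block].
print(historical_request("eth_getBalance", [ACCOUNT], OLD_BLOCK))
# eth_getStorageAt takes [address, storage slot, block]; slot 0x0 is just an example.
print(historical_request("eth_getStorageAt", [CONTRACT, "0x0"], OLD_BLOCK, request_id=2))
```

Sent to a pruned full node, requests like these typically come back as an error rather than a value, while an archive node answers them directly.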

Run your own network node or use a provider's service?

In Tan Phat Digital's solution-consulting experience, most of the Web3 ecosystem has switched to infrastructure service providers such as Alchemy or QuickNode because of the following benefits:

  • Time-to-market: Having an endpoint in a few minutes instead of waiting weeks for synchronization.

  • Reliability: 99.99% uptime commitment and automatic failover mechanism.

  • Scalability: Automatically adjust resources as the dApp grows in users.

  • Cost savings: Renting a dedicated node is often cheaper than running the infrastructure and a technical team yourself.

However, you should run the node yourself if you need:

  • Absolute privacy (no IP/query tracking by the provider).

  • Ultra-high frequency local interactions (like MEV bots that need minimal latency).

  • Direct contribution to network decentralization.

Future outlook: State Bloat and scaling solutions

The problem of data bloat (state bloat) is being addressed through Ethereum's roadmap, in particular "The Verge" and "The Purge":

  • Statelessness: Enables block validation without storing the entire state thanks to Verkle Trees.

  • EIP-4444: Limit the required historical data that a node must store (e.g. only keep the most recent year).

  • Layer 2 & AppChains: Move transaction load off Layer 1 while still maintaining the need for dedicated archive nodes for analysis and traceability.
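For the one-year retention example in the list above, the cutoff can be estimated in blocks. This is a back-of-the-envelope sketch assuming Ethereum's 12-second slot time and no missed slots, so the result is an upper bound; the function names are illustrative and the actual proposal defines its cutoff in its own terms.

```python
SECONDS_PER_SLOT = 12                 # Ethereum proof-of-stake slot time
SECONDS_PER_YEAR = 365 * 24 * 3600


def blocks_per_year(seconds_per_block: int = SECONDS_PER_SLOT) -> int:
    """Upper bound on blocks produced in a year (assumes no missed slots)."""
    return SECONDS_PER_YEAR // seconds_per_block


def oldest_retained_block(head_block: int) -> int:
    """Approximate oldest block a node would still keep under one-year retention."""
    return max(0, head_block - blocks_per_year())


print(blocks_per_year())                  # 2628000
print(oldest_retained_block(20_000_000))  # 17372000, for a hypothetical head
```

Everything older than that cutoff would have to come from archive nodes or a dedicated history network.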

Frequently Asked Questions (FAQ) about Archive Nodes

  1. What is the biggest difference between Full Node and Archive Node? Full node only stores the current state and a short window of recent data to save space (pruning). The Archive node stores all historical states since the first block, allowing data to be queried at any point in the past without recomputing.

  2. Can I upgrade from Full Node to Archive Node without resynchronizing? Only if archive mode was enabled from the start. In practice, with most software such as Geth, once pruning has been enabled you have to synchronize from scratch (from genesis) to rebuild the deleted historical states.

  3. Does staking (running a validator) require an Archive Node? No. Most validators simply run a Full Node to validate new blocks and maintain consensus. Running an Archive Node for staking purposes is unnecessary and a huge waste of disk resources.

  4. Why is Solana's Archive Node capacity much larger than Ethereum's? Solana has extremely fast block production and high transaction throughput, resulting in more than 4 TB of ledger data generated monthly. As of 2025, a Solana archive node requires up to 400 TB of storage.

  5. Why is the "128 block" limit important for an Ethereum Full Node? This is the default threshold up to which most Ethereum clients (like Geth) retain state on disk. If you query data deeper than 128 blocks without an archive node, the node will have to replay thousands of transactions, causing huge delays or request errors.

  6. How will EIP-4444 affect the need to run an Archive Node? EIP-4444 allows network nodes to delete historical data older than one year. This reduces the burden on Full Nodes but makes the role of Archive Nodes and decentralized storage networks (such as the Portal Network) more important for preserving the chain's permanent history.

  7. Should dApp programmers run the node themselves or use an RPC provider? For most small teams, using a provider (like Alchemy or QuickNode) is optimal because the cost of running an archive node can reach thousands of USD per month. Running a node yourself should only be considered when absolute privacy or extremely high-frequency interaction (MEV) is needed.

  8. What is the best client software to run Archive Node today? Erigon and Reth are currently the top two choices thanks to their flat data architecture that helps reduce archive storage capacity to about 3 TB instead of 15-20 TB like traditional Geth.

  9. How to access archive data without paying? Some infrastructure service providers like Alchemy offer free plans that support accessing Archive data (with limited RPS). Additionally, you can use community analytics tools like Dune Analytics.

  10. Who really shouldn't run an Archive Node themselves? Individual users, retail miners, or startups should not run an Archive Node themselves due to high technical risks, expensive NVMe hardware costs, and 24/7 maintenance requirements.

Archive nodes serve as an indispensable historical "source of truth" in the blockchain ecosystem. Tan Phat Digital believes that understanding what an archive node is will not only help organizations make the right decisions about infrastructure, but also open up new possibilities for extracting value from blockchain's huge data treasure.

In 2026, the optimal strategy for the vast majority of businesses is to leverage the power of RPC providers to focus on application logic, while new generation clients such as Erigon and Reth continue to push the limits of historical storage performance.