In the pursuit of achieving maximum scalability, various ecosystems adopt different strategies. For instance, in the Ethereum ecosystem, there’s a proliferation of Layer 2 (L2) networks. While the Ethereum team has abandoned sharding, some projects have effectively implemented it. Ouroboros Leios is focused on scaling the blockchain protocol to its physical limits, which is distinct from the approach taken by sharding or L2s. Each concept has its strengths and weaknesses. Let’s explore the distinctions between these concepts and the challenges faced by development teams.
Introduction
All strategies—whether sharding, layered architecture (plenty of L2 networks), or Ouroboros Leios—share a common goal: to introduce parallelism into transaction processing and smart contract execution. The aim is to utilize available computing power and bandwidth efficiently, enabling simultaneous handling of multiple transactions.
However, the primary challenge lies in maintaining the core benefits of blockchain technology, particularly decentralization. In a distributed network, nodes must participate in settling transactions. Yet, not all nodes need to process every transaction immediately upon submission. Instead, they collectively maintain a consistent global state. Note that maintaining a uniform global state doesn’t necessarily require all nodes to store all data; they simply need awareness of the global state.
The overarching objective of these strategies is to temporarily reduce reliance on the global state and on the traditional (slow) L1 consensus, which must respect the linear ordering of blocks. This trade-off allows for higher transaction throughput and faster finality, albeit with varying security guarantees. The key lies in enabling users and assets to temporarily escape the global state and L1 consensus, only to return later.
For instance, Ethereum and Bitcoin enable users to move assets to Layer 2 networks, each with its own rules, transaction structure, state model, consensus mechanisms, and execution environment. Users can seamlessly return to L1 when needed.
Sharding, on the other hand, involves partitioning the ledger into smaller, manageable pieces called shards. Nodes distribute the workload, and L1 consensus operates within this sharded framework. All shards adhere to the same rules, transaction structure, state model, and execution environment.
In contrast, Ouroboros Leios aims for maximum parallelization without ledger partitioning. Like sharding, it distributes the workload across nodes, maintaining consistency through shared rules, transaction structures, state models, and execution environments.
L2s
Ethereum is the most successful example of an ecosystem that went in the direction of layered architecture. The Ethereum team has shifted its focus away from implementing sharding and instead aims to achieve scalability through a diverse set of L2 networks. Vitalik Buterin, one of Ethereum’s co-founders, even suggests that L2s are a comparable alternative to sharding. However, I contend that L2s and shards are distinct concepts.
L2s primarily operate as centralized networks running on one or more servers. Their key advantage lies in their high throughput and rapid transaction settlement. Each L2 can specialize in specific tasks, and development teams have the flexibility to define their own consensus rules, transaction structures, state models, and execution environments. Coordinating interactions between L1 (Ethereum) and L2s is essential. Each L2 periodically writes its global state (in the form of proofs) back to L1, albeit in varying formats and intervals.
In the picture, you can see a ZK Rollup, which has its own transaction structure, consensus, storage, execution environment, and state model. Proofs are written to Ethereum at regular intervals, validated in Ethereum’s execution environment, and subsequently recorded in the ledger. Orange and green rectangles indicate incompatibility.
User transactions are not included in the new state root or the ZK proof. Instead, they are stored separately in the Data Availability Layer, an off-chain storage. This data is crucial for reconstructing the state of the Rollup. Storing it off-chain enhances scalability.
Layer 1 acts as the verifier and Layer 2 as the prover.
The state root is a cryptographic commitment to the new state of the ZK Rollup after the batch of transactions has been processed. It signifies the net effect of all the transactions in the batch.
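To make the notion of a state root more concrete, here is a minimal sketch that commits to a toy account state with a Merkle root. The account-to-balance state, the "account:balance" leaf encoding, the transfer format, and the use of SHA-256 are all assumptions chosen for illustration; real rollups use their own state trees and hash functions.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def state_root(balances: dict[str, int]) -> str:
    """Return a Merkle root (hex) over the sorted account balances."""
    # The leaf encoding "account:balance" is a hypothetical choice for this sketch.
    level = [_h(f"{acct}:{bal}".encode()) for acct, bal in sorted(balances.items())]
    if not level:
        return _h(b"").hex()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


def apply_batch(balances, batch):
    """Apply (sender, receiver, amount) transfers and return the new state."""
    new = dict(balances)
    for sender, receiver, amount in batch:
        if new.get(sender, 0) < amount:
            raise ValueError(f"insufficient funds for {sender}")
        new[sender] -= amount
        new[receiver] = new.get(receiver, 0) + amount
    return new


old_state = {"alice": 10, "bob": 5}
new_state = apply_batch(old_state, [("alice", "bob", 3), ("bob", "carol", 1)])
old_root, new_root = state_root(old_state), state_root(new_state)
```

However many transfers the batch contains, the commitment stays a single 32-byte value, which is why it is cheap to publish on L1.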
The ZK proof is a cryptographic proof that attests to the validity of the state transition represented by the batch. It verifies that the batch of transactions was processed correctly and resulted in the new global state. The proof by itself is insufficient to verify that there was no fraud in the L2. Ethereum can only validate the state root and the ZK proof.
Both the ZK proof and the new L2 global state enable a secure and efficient way to represent the batch of transactions on the main chain, without the need to process each transaction on-chain.
The Ethereum smart contract (verifier) does not require a batch of transactions for verification. It only needs the ZK proof and the new state root.
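This division of labour can be sketched as a toy verifier that stores only the latest state root and never sees the batch. The class name `ToyVerifier` and the pluggable `check_proof` callable are hypothetical. A real verifier contract would run the verification algorithm of a succinct proof system, which this sketch does not implement; `dummy_check` is only a placeholder.

```python
from typing import Callable

# check_proof(old_root, new_root, proof) -> bool stands in for a real
# ZK verifier. It is an assumption of this sketch, not a real API.
ProofCheck = Callable[[str, str, bytes], bool]


class ToyVerifier:
    """Keeps only the latest L2 state root, as an L1 contract would."""

    def __init__(self, genesis_root: str, check_proof: ProofCheck) -> None:
        self.state_root = genesis_root
        self._check_proof = check_proof

    def submit(self, old_root: str, new_root: str, proof: bytes) -> bool:
        """Accept a transition if it extends the stored root and the proof checks."""
        if old_root != self.state_root:
            return False  # the batch does not build on the current state
        if not self._check_proof(old_root, new_root, proof):
            return False  # invalid proof: the stored root is left unchanged
        self.state_root = new_root
        return True


def dummy_check(old_root: str, new_root: str, proof: bytes) -> bool:
    # Placeholder only: accepts a fixed marker instead of verifying anything.
    return proof == b"valid"


verifier = ToyVerifier("root0", dummy_check)
accepted = verifier.submit("root0", "root1", b"valid")
rejected = verifier.submit("root0", "root2", b"valid")  # stale old_root
```

Note that `submit` takes no transactions at all: the verifier learns that a valid transition happened, not what the transition contained.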
If Ethereum, or an auditor, wanted to verify transactions from the L2 network, it would be necessary to use the proofs from the Ethereum ledger, have the L2 transaction data available, understand the transaction format, and implement the L2’s execution environment. In addition, it would be necessary to understand how proofs are created and how they are validated in the execution environment of Ethereum.
Something like this needs to be implemented if reliable and secure cross-L2 communication is to be built. Such a solution must support all major L2s. That’s what bridges are trying to do. They will be described later.
Processing transactions elsewhere and using the blockchain as the main settlement layer is the goal for both layered architecture and Ouroboros Leios strategies. In the case of sharding, this is not necessarily the case.
From a global state perspective, L2s present a challenge. Their proofs lack coherence and context, making it impossible to validate them against each other directly.
The proliferation of diverse L2s poses significant challenges related to cross-L2 communication, security, and maintaining a unified global state. Distributed ledger technology fundamentally relies on state transitions. Consensus mechanisms must ensure that the majority of nodes (stake) agree on the correctness of state transitions. The core requirement is to transition from one valid state to another in a deterministic and verifiable manner.
Introducing incompatible execution environments, state models, and transaction structures complicates matters. Transaction validation across different L2s becomes intricate, and Ethereum’s main chain cannot verify L2 activities.
The fragmentation caused by various L2s hinders cross-chain interactions. Verifying state and execution logic across disparate environments becomes non-trivial. Consider Ethereum’s execution environment (EVM), transaction structure (A), and state model (A). Contrast this with an L2 that employs execution environment B, transaction structure B, and state model B, or another L2 using execution environment C, transaction structure C, and state model C.
When validating transactions, Ethereum cannot reprocess L2 events because it lacks knowledge of foreign execution environments, transaction structures, and state models. A does not understand B and C. B and C also do not understand each other.
In the picture, you can see three L2 networks and Ethereum. The L2 networks store proofs in the Ethereum ledger through Ethereum transactions. The only thing Ethereum can validate is the proofs. There is no cross-chain communication between L2s.
Moreover, reprocessing what has already occurred in another layer would be inefficient. Therefore, L2s only publish proofs to L1. Ethereum would become a bottleneck if it had to revalidate all transactions.
Each distinct L2 may require a different method of proof verification. While Ethereum can validate that proofs are correct, those from different L2s represent entirely separate state models. Unfortunately, Ethereum lacks knowledge of the execution environment in L2s, making it unable to understand and validate transactions fully.
An L2 can be more similar to Ethereum, for example by using the EVM and the same transaction structure. This would be closer to the concept of sharding.
The global state of the Ethereum ecosystem is a mixture of several mutually incompatible states that come from different networks.
As the L2 ecosystem diversifies, maintaining a clear, auditable trail of state transitions becomes increasingly challenging. The lack of uniformity across L2s complicates cross-chain communication and global state tracking. Several state transitions are constantly occurring asynchronously in mutually incompatible networks.
Some argue that L1 should not validate transactions processed in L2s. Instead, L1 serves primarily as a data availability (DA) layer. They are right. However, this approach limits L1’s role in facilitating cross-chain communication.
Despite Ethereum’s decentralization and security, L2s inherit little from it. Most user activity occurs within L2s, which rely on centralized sequencers and private keys held by the teams. The rules for preventing transaction censorship and front-running attacks, for contract execution, and for state transitions (from state A to state A’) are defined within L2s.
L2 teams claim they want to decentralize sequencers. However, this complicates the implementation of cross-chain bridges even more. Currently, bridges have to interconnect servers. In the future, they will have to interconnect distributed networks, i.e. correctly interpret consensus rules and deal with data diffusion delays in the network.
For users seeking Ethereum’s decentralization and security, the final settlement must occur in L1. Users transfer assets to L1, but this reliance can create bottlenecks if frequent settlements are desired.
Although there may be many highly scalable L2s in the ecosystem, the poorly scalable Ethereum L1 may remain the bottleneck of the ecosystem. So the scalability problem will not be completely resolved.
L2s offer advantages such as high throughput and fast transaction finality due to centralization. Each L2 can be perceived as a shard, i.e. a unit of parallelization. High parallelization can be achieved through L2s.
If the system could do without cross-chain communication, the limitations would be negligible. But that is not the case. High scalability comes at the cost of maintaining a unified global state. L2s operate autonomously, often with their own tokens and incentives. Building cross-chain communication is a challenge. Ethereum, in this context, isn’t a truly decentralized global computer; it’s fragmented across mutually incompatible L2s.
It can be said that a highly parallelizable system is being built, but it is difficult to synchronize the state. The limit to scalability is primarily Ethereum’s ability to write data to the blockchain. Secondly, it is the ability to create cross-chain communication where it is needed (where users will require it).
The fragmentation of the scalability solution has an impact on other aspects. Fragmentation can be observed at the level of users and capital. Navigating between L2s is a nightmare for users.
The Ethereum team must collaborate with L2 projects to address cross-chain communication challenges. Building bridges between L2s (e.g., enabling communication between Arbitrum and Optimism) is essential. Mutual incompatibility introduces risks like bugs and security vulnerabilities. The problem is not only the incompatibility. L2s are competing networks fighting for users and liquidity. Teams may not be sufficiently incentivized to implement cross-chain bridges.
Building a bridge between two L2s is a challenge, as the team must understand both worlds to securely connect them.
Now imagine having to do this for all L2s.
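As a rough illustration of the scale, suppose every pair of L2s needed its own dedicated bridge. The number of bridges then grows quadratically: n networks need n(n-1)/2 of them. The pairwise-bridge model is an assumption made for this example; hub-style designs would need fewer connections.

```python
from math import comb


def pairwise_bridges(num_l2s: int) -> int:
    """Number of distinct L2 pairs, i.e. bridges if each pair needs its own."""
    return comb(num_l2s, 2)


for n in (2, 5, 10, 50):
    print(n, pairwise_bridges(n))
# prints:
# 2 1
# 5 10
# 10 45
# 50 1225
```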
Interestingly, Ethereum abandoned sharding to avoid cross-shard communication complexities, management of shards, distribution of nodes for shards, partitioning of the ledger, et cetera. Yet, dealing with similar issues in incompatible L2s raises questions. Perhaps sharding would have been a more straightforward choice.
Let’s delve into sharding to better understand its potential benefits. You will see that sharding is somewhat similar to L2s, but in many ways more efficient and simpler.
Sharding
Sharding is a method used to partition a blockchain ledger into smaller, more manageable pieces known as shards. Each shard processes transactions semi-autonomously, enabling parallel processing and improving efficiency.
Sharding divides a blockchain into multiple sub-blockchains. This often necessitates the partitioning of nodes, the ledger, the global state, and the execution environment.
You can see in the picture that the same amount of resources can be used differently: either to maintain a single ledger or to maintain multiple shards. A blockchain processes transactions sequentially in blocks, while a sharded blockchain takes advantage of parallelism.
No single project among the top ten has fully implemented sharding, and no dominant project exists with this technology yet. Therefore, only general concepts will be described here.
Sharding may involve dividing nodes into groups, with each group controlling a specific shard. Proper shard management must be implemented within the network. When a new node joins, it needs instructions on which shards to join. If a significant number of nodes leave a shard simultaneously, it becomes necessary to reallocate some nodes to maintain balance.
For instance, in a network with 1,000 nodes and 10 shards, each shard would have 100 nodes. Some nodes within a shard can communicate with nodes in other shards. The process of cross-shard communication will be explained further in the text.
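The 1,000-node example can be sketched as follows. The shuffle-and-deal allocation, the fixed seed, and the "join the smallest shard" rule are assumptions for illustration; a real protocol would rely on verifiable randomness and would typically take stake into account.

```python
import random


def assign_nodes(node_ids, num_shards, seed=0):
    """Shuffle nodes and deal them round-robin into shards of near-equal size."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    shuffled = list(node_ids)
    rng.shuffle(shuffled)
    shards = {s: [] for s in range(num_shards)}
    for i, node in enumerate(shuffled):
        shards[i % num_shards].append(node)
    return shards


def join(shards, node_id):
    """Place a newly joined node into the currently smallest shard."""
    target = min(shards, key=lambda s: len(shards[s]))
    shards[target].append(node_id)
    return target


shards = assign_nodes(range(1000), 10)
sizes = [len(members) for members in shards.values()]  # ten shards of 100 nodes
```

Placing newcomers into the smallest shard is one simple way to keep shards balanced when nodes join or leave.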
The advantage of shards is that they are decentralized, which is not the case with centralized sequencers in the Ethereum ecosystem.
There must be clear rules for sorting transactions into shards. This allows transactions to be validated by smaller groups of nodes. Often, ranges managed by specific groups of nodes are used for partitioning. Transaction hashes, addresses, account hashes, and similar criteria can be used to assign transactions to the appropriate shards.
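A range-based sorting rule might look like the sketch below. The function name `shard_for`, the use of SHA-256, and the choice of the first 64 bits of the hash are assumptions; the point is only that every node can compute the same shard for the same key without coordination.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key (address or transaction ID) to a shard by hash range."""
    digest = hashlib.sha256(key.encode()).digest()
    value = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
    # Split the 64-bit space into num_shards equal contiguous ranges.
    return (value * num_shards) >> 64


addresses = ["addr_alice", "addr_bob", "addr_carol"]  # hypothetical addresses
placement = {a: shard_for(a, 10) for a in addresses}
```

Because the mapping is deterministic, any node can tell which group of validators is responsible for a given address or transaction.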
Partitioning the ledger typically results in the partitioning of the execution environment and global state as well. Each shard operates somewhat independently, processing its transactions, executing smart contracts, and maintaining its portion of the global state.
The global state of the entire network is composed of these individual shard states, each maintained by the nodes dedicated to those shards. Parallelization is achieved by dividing network resources across multiple shards and assigning transaction processing to these individual shards.
Let’s highlight the main difference between layered architecture and sharding.
Layered architecture, or Layer 2 (L2) solutions, preserves the original blockchain’s consensus, ledger, and global state. Parallelization is achieved by building around the existing blockchain using L2s. L2 teams have the flexibility to choose their technologies and determine the level of decentralization for their networks. The primary interaction between the blockchain (such as Ethereum) and L2s is the ability to transfer assets between them and occasionally write the state of the L2 back to the blockchain.
In contrast, sharding integrates parallelization directly into the protocol’s consensus. The network is designed from the ground up to process transactions in parallel, resulting in a structure that is not a single blockchain but a collection of interconnected shards. Each shard operates with the same transaction structure, state model, and execution environment.
Despite the fragmentation into shards, the components of parallelization remain compatible with each other. This unified approach ensures a consistent global state and facilitates efficient cross-shard communication.
Similar to Layer 2 (L2) solutions, sharding introduces challenges related to composability and state fragmentation. It is not always possible to process a transaction solely with the information available within a single shard.
Sometimes, processing a transaction requires information from another shard, necessitating cross-shard communication, which can be complex and inefficient. Fragmentation can cause pieces of the state, such as account balances or UTXOs, to be scattered across different shards.
However, communication between shards is less complex compared to L2s due to their inherent compatibility. Shards can understand each other because they share the same execution environment, transaction structure, and state model. Although consensus can occur asynchronously within individual shards, achieving an atomic update of states is relatively straightforward.
Unlike L2s, building bridges for communication between shards is unnecessary. Cross-shard communication is a natural part of the protocol rules, reducing the potential for bugs and security vulnerabilities. Transferring information between shards is relatively easy and integrated into the system.
It is important to note that cross-shard communication, as well as cross-L2 communication, can carry a high overhead. A relatively large amount of data may need to be transferred, which can be costly in terms of network and computing resources. It is necessary to achieve a certain degree of synchronicity between two (or more) networks. This is not an easy task either way. In the case of sharding, this task is significantly easier and safer, but still challenging.
As the number of shards increases, the amount of cross-shard communication tends to rise. When a user initiates a transaction that spans multiple shards, information must flow between those shards. This communication includes validating the transaction, updating the global state, and ensuring consistency. As the number of shards grows, the complexity of managing cross-shard communication increases. Coordinating interactions between numerous shards becomes more intricate.
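A back-of-the-envelope model shows why. Assume, purely for illustration, that each transaction touches two accounts and that accounts are placed on shards uniformly and independently at random. Both accounts then land on the same shard with probability 1/k for k shards, so the share of cross-shard transactions is 1 - 1/k. Real workloads are not uniform, so this is a sketch of the trend, not a prediction.

```python
import random


def cross_shard_fraction(num_shards: int) -> float:
    """Expected share of two-account transactions that span two shards."""
    return 1 - 1 / num_shards


def simulate(num_shards: int, num_txs: int = 100_000, seed: int = 1) -> float:
    """Estimate the same share by random placement of sender and receiver."""
    rng = random.Random(seed)
    crossing = sum(
        rng.randrange(num_shards) != rng.randrange(num_shards)
        for _ in range(num_txs)
    )
    return crossing / num_txs
```

Under these assumptions, half of all transactions are cross-shard with two shards and nine out of ten with ten shards, which is why placement rules that keep related accounts together matter.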
Teams must carefully consider how many shards are ideal. In addition to high storage requirements, cross-shard communication is one of the biggest constraints on the scalability of a sharded blockchain. However, even with cross-shard communication overhead, it is possible to achieve much higher scalability than what a first-generation blockchain with a linear consensus protocol is capable of.
In both layered architecture and the described version of sharding, teams face the challenge of effectively transferring data and states between networks while maintaining system integrity. Essentially, they need to preserve a coherent and unified global state within a fragmented system. This must be accomplished without sacrificing efficiency, decentralization, or security. In the case of sharding, the inherent compatibility among shards provides a significant advantage. For now, decentralization is sacrificed in the case of L2s.
There is no single ideal version of sharding. Some concepts will not be so demanding on cross-shard communication. We’ll see if these concepts catch on in the future.
Sharding and L2s are not mutually exclusive strategies. A sharded blockchain, like a blockchain running Ouroboros Leios, can use L2s. A blockchain without sharding or another technology enabling parallelization depends solely on L2s. In such a case, the blockchain network may still be a bottleneck.
Ouroboros Leios
Similar to sharding, Ouroboros Leios achieves parallelization at the underlying protocol consensus level. The goal is to achieve near-optimal throughput to accommodate an exceptionally high volume of transactions, approaching the network’s uppermost capacity within existing constraints.
Sharding could potentially enhance transaction throughput beyond that capacity, because nodes maintain only their own shards and therefore avoid both replicating the complete ledger and processing all incoming transactions.
Both sharding and layered architecture strategies can be fundamental facilitators of scaling, but they address a different problem compared to Ouroboros Leios. Leios addresses scaling a blockchain to its absolute physical limits. However, this does not prevent us from comparing the strategies.
Additionally, as noted above, Cardano’s scaling is not solely dependent on Ouroboros Leios. It can be extended by L2s and partner chains.
Ouroboros Leios separates transaction processing from final settlement. Transaction processing, including conflict resolution, takes place in parallel in a unified execution environment, while final settlement (ordering of transactions) takes place in a linearly maintained blockchain.
There is no fragmentation of the global state. When validating all transactions, it is possible to reference a uniform global state regardless of which group of nodes processes the transaction. The goal is to make maximum use of the network throughput and processing power for the preparation of transactions and the execution of scripts, as these resources are significantly underutilized in the first generation of most blockchains.
Decentralized computation refers to a protocol’s ability to allow different nodes to perform computations, sharing the results reliably across the network. This means nodes do not need to repeat the same computations, enabling parallel transaction processing.
The decentralized computation uses a method called stake-based endorsing. This approach involves a randomly selected group of nodes processing and verifying information, then endorsing it by providing a signature. The signatures from all validating nodes can be compiled into a concise certificate, which is then attached to blocks.
Individual nodes won’t fully validate every transaction in the network. Instead, they’ll verify the proof that a sufficient amount of stake (associated with nodes) supports the validation. Once the required number of endorsements (approvals) is reached, other nodes can trust that the processing and validation were done correctly. This conserves network resources and increases transaction processing capacity.
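The endorsing idea can be sketched in a few lines. Everything here is a simplification under stated assumptions: the pool names and stake values are made up, `random.Random(seed)` stands in for the protocol’s verifiable randomness, and signatures are omitted entirely, so the "certificate" is just a list of endorsers rather than an aggregated signature.

```python
import random

stake = {"pool_a": 40, "pool_b": 30, "pool_c": 20, "pool_d": 10}  # hypothetical


def select_committee(stake, size, seed):
    """Sample `size` committee seats with probability proportional to stake."""
    rng = random.Random(seed)  # stand-in for verifiable randomness
    pools = sorted(stake)
    return rng.choices(pools, weights=[stake[p] for p in pools], k=size)


def make_certificate(block_id, endorsers, stake, threshold):
    """Return a certificate if the endorsing stake reaches the threshold share."""
    unique = sorted(set(endorsers))
    endorsed = sum(stake[p] for p in unique)
    if endorsed / sum(stake.values()) < threshold:
        return None
    return {"block": block_id, "endorsers": unique, "stake": endorsed}


def verify_certificate(cert, stake, threshold):
    """Cheap check by other nodes: recount endorsing stake, no re-execution."""
    endorsed = sum(stake[p] for p in set(cert["endorsers"]))
    return endorsed / sum(stake.values()) >= threshold


committee = select_committee(stake, size=5, seed=42)
cert = make_certificate("block-1", ["pool_a", "pool_b"], stake, threshold=0.6)
too_small = make_certificate("block-1", ["pool_c", "pool_d"], stake, threshold=0.6)
```

The important property is in `verify_certificate`: a node that did not validate the block itself only recounts the endorsing stake, which is far cheaper than re-executing the transactions.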
The blockchain organizes Ranking Blocks (RBs) without being burdened by the size of Input Blocks (IBs) and Endorser Blocks (EBs) or the associated computations. It only handles references and certificates, which serve as verification that the necessary amount of stake validated the blocks.
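The reference structure can be illustrated with a simplified data model. The field names, the hash-based identifiers, and the byte counts below are assumptions for this sketch and do not reflect the actual Leios block formats.

```python
from dataclasses import dataclass
import hashlib


def block_id(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class IB:
    transactions: tuple  # raw transaction payloads (bytes)

    @property
    def id(self) -> str:
        return block_id(b"".join(self.transactions))


@dataclass(frozen=True)
class EB:
    input_block_ids: tuple  # references to IBs, not their contents

    @property
    def id(self) -> str:
        return block_id("".join(self.input_block_ids).encode())


@dataclass(frozen=True)
class RB:
    prev_id: str        # link to the previous RB: the linear chain
    eb_id: str          # reference to the endorsed EB
    certificate: bytes  # stand-in for the endorsement certificate


ibs = [IB((f"tx{i}".encode() * 500,)) for i in range(3)]
eb = EB(tuple(ib.id for ib in ibs))
rb = RB(prev_id="genesis", eb_id=eb.id, certificate=b"cert")

payload_bytes = sum(len(tx) for ib in ibs for tx in ib.transactions)
rb_bytes = len(rb.prev_id) + len(rb.eb_id) + len(rb.certificate)
```

In this toy model the RB stays the same size no matter how many transactions the referenced IBs carry, because it holds only a reference and a certificate.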
Unlike layered architecture and sharding, Ouroboros Leios does not fragment the global state. Anyone can quickly verify the consistency of the global state stored on the blockchain. In layered architecture, asynchronous partial state transitions occur in Ethereum and many L2s, making it difficult to obtain a unified global state due to L2 incompatibility. Similarly, sharding involves asynchronous state transitions within shards. However, the compatibility of shards makes it easier to piece together the global state. In both cases, fragments of states must be assembled to achieve a complete global state.
Ouroboros Leios maintains one global state at all times, represented by the blockchain. References allow easy access to each individual transaction. Similar to sharding, all nodes in Ouroboros Leios use the same execution environment, transaction structure, state model, and consensus rules, minimizing the space for bugs and security vulnerabilities. There is no need to build bridges between nodes.
All nodes participate in the network’s decentralization and maintain the blockchain, with the workload randomly distributed among nodes for transaction processing. With Ouroboros Leios, Cardano will remain a consistent blockchain that efficiently uses resources for higher throughput.
Scalability limits are primarily determined by the network’s physical capabilities. While all nodes maintain the blockchain, this imposes a relatively low burden. The remaining resources can be used for parallel processing. Depending on parameter settings, a certain number of virtual threads in the network will process transactions in parallel. However, there will not be many of them, as the random allocation of nodes is based on stake. If the endorsement threshold is high, parallelization will be relatively low, as a significant portion of nodes will validate the same set of new transactions.
With sharding, it is theoretically possible to achieve a higher level of parallelization. This may or may not mean fundamentally higher throughput.
Conclusion
Measured by transactions per second (TPS), the layered architecture would be the winner. However, its high degree of centralization, inability to verify the global state, and unresolved cross-chain communication issues make it less appealing. The blockchain essentially becomes isolated from processing and serves only as a storage layer for L2 states.
Sharding would come second, as it allows the ledger to be fragmented into many relatively autonomous and mutually compatible sub-blockchains. While cross-shard communication can be a significant challenge, only a growing number of users will determine if a sharded blockchain can remain effective.
Ouroboros Leios might not win in a TPS contest and could perform similarly to sharding at best. However, blockchain is not solely about TPS; it also involves decentralization, security, integrity, auditability, and more. Holistically, Ouroboros Leios represents a conservative approach to building a blockchain.
One instance of Ouroboros Leios can theoretically function as a single shard, suggesting that Cardano could implement sharding where each shard maximizes the subnet’s throughput. Cross-shard communication could be relatively easy due to the UTxO model, but it would still be a significant burden, similar to regular sharding. A combination of Ouroboros Leios and L2s might be a better direction for Cardano. The UTxO model and determinism could play a crucial role in building ZK Rollups, allowing for different technologies than those used in Ethereum.
L2s are sometimes compared to sharding, although the differences between the two are significant. Ethereum can be seen as a main shard for surrounding child shards. However, different execution environments complicate everything needed to obtain a global state. For that reason, I think that Ethereum is not a global computer as it is sometimes marketed. Sharded blockchains and Cardano with Ouroboros Leios are much closer to achieving this goal.