3.1 Scalable Architecture Design
The DGT Platform is Distributed Ledger Technology (DLT) class software designed to address distributed data storage, processing, and presentation to end users of TCP/IP networks. Although the taxonomy of DLT solutions is still evolving, the figure below presents one of the well-known attempts to systematize the concepts of this area (see Hyperledger Sawtooth, Security Assessment Technical Report. The Linux Foundation, 2017 | Available on foundation site). The simplest way to picture a DLT network is as a distributed database (though only within certain limits, since the technology is far more versatile than a system for organizing and storing data) with N participants (agents, nodes), each holding its own copy of the data.
The network as a whole is organized to ensure the integrity and consistency of the data even though many participants have access to the records. The main properties of such systems include:
Distribution – the physical separation of the network: nodes are (or may be) dispersed across locations and interact over the network.
Decentralization – means that different entities act as owners (managers) of nodes and that their interests and motivations may not coincide.
Encryption – the network, its operation, and data integrity are supported by various technical mechanisms, the quintessential one being cryptography: hashing (one-way transformation), digital signatures, encryption, and similar functions.
Immutability – data entered into a distributed database is stored unchanged. No record can be altered without also altering every record that follows it. As a result, historical records are the same for all participants, and substituting records unnoticed is impossible or extremely unlikely.
Tokenization – the ability to represent the value of work in the form of elementary components and distribute the value and costs of maintaining the network among all participants. Tokenization makes it possible to give economic meaning to distributed networks and complements the technological components of DLT systems.
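The Immutability property can be illustrated with a minimal hash chain. This is a generic sketch, not DGT's storage format: the `Record` type, the all-zero genesis reference, and the JSON encoding are assumptions chosen for brevity. Each record stores the SHA-256 hash of its predecessor, so editing an earlier record breaks the link to every record after it.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical placeholder: the genesis record has no predecessor to reference.
GENESIS_PREV = "0" * 64


@dataclass
class Record:
    data: str
    prev_hash: str

    def digest(self) -> str:
        # Hash a canonical JSON encoding of the record's fields.
        payload = json.dumps({"data": self.data, "prev": self.prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(items: list[str]) -> list[Record]:
    """Link records so that each one stores the hash of the record before it."""
    chain: list[Record] = []
    prev = GENESIS_PREV
    for item in items:
        rec = Record(item, prev)
        chain.append(rec)
        prev = rec.digest()
    return chain


def verify_chain(chain: list[Record]) -> bool:
    """Recompute every hash and check that each link still matches."""
    prev = GENESIS_PREV
    for rec in chain:
        if rec.prev_hash != prev:
            return False
        prev = rec.digest()
    return True


chain = build_chain(["alice->bob:10", "bob->carol:4", "carol->dave:1"])
print(verify_chain(chain))          # True
chain[0].data = "alice->bob:1000"   # tamper with an early record
print(verify_chain(chain))          # False: record 1 no longer matches record 0's hash
```

To hide such an edit, an attacker would have to recompute every subsequent hash, and the copies held by the other participants would still disagree with the forged one.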
None of these properties is binary; each spans a whole range of possible solutions. For example, the degree of distribution may vary from nodes located in a single virtual space to nodes separated geographically. If all nodes play the same role and are identical in functionality and processes, the network is a peer-to-peer (P2P) network. Other types of networks exist, however; DGT, for instance, uses a hierarchical network with different node roles (see 3.3.8). Decentralization can also vary in degree, depending on the rules for connecting nodes to the network:
Public networks. All nodes are free to join the existing network.
Private networks. All nodes are controlled by one or a few organizations that have delegated network administration to one party. Outside nodes cannot join such a network without an explicit approval/authorization procedure.
Consortium-based networks. Participation in such networks may be relatively free, but subject to certain conditions. This resembles private banking: any business may want to open its own bank, but it must obtain a license and accumulate enough capital to guarantee the bank's operation.
Hybrid networks. These networks have segments with different access policies: some are open to anyone (public), while others are closed (private or consortium-based).
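The admission rules above can be summarized as a small policy check. The sketch is purely illustrative: `Access`, `may_join`, and the segment names are hypothetical and do not describe DGT's actual configuration. A hybrid network is modeled as a set of segments, each carrying its own policy.

```python
from enum import Enum


class Access(Enum):
    """Hypothetical admission policies mirroring the network types above."""
    PUBLIC = "public"          # anyone may join
    PRIVATE = "private"        # explicit approval by the administering party
    CONSORTIUM = "consortium"  # open to anyone who meets the stated conditions


def may_join(policy: Access, approved: bool, meets_conditions: bool) -> bool:
    if policy is Access.PUBLIC:
        return True
    if policy is Access.PRIVATE:
        return approved
    return meets_conditions  # Access.CONSORTIUM


# A hybrid network: each segment has its own access policy (names are hypothetical).
hybrid = {
    "open-segment": Access.PUBLIC,
    "bank-segment": Access.CONSORTIUM,
    "ops-segment": Access.PRIVATE,
}


def joinable_segments(net: dict[str, Access], approved: bool, meets_conditions: bool) -> list[str]:
    """List the segments a candidate node is allowed to join."""
    return [name for name, policy in net.items() if may_join(policy, approved, meets_conditions)]


# A licensed but not explicitly approved node:
print(joinable_segments(hybrid, approved=False, meets_conditions=True))
# ['open-segment', 'bank-segment']
```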
One of the most important components of decentralized systems is the consensus rules: a mechanism for reaching agreement on which records (transactions) will ultimately be entered into the distributed database, the ledger (see 3.7.1.2). Decentralized distributed systems have no built-in mechanism for synchronizing data at the moment it is created. In the asynchronous network model, there is no upper bound on the time a message takes to arrive at a given point in the network, so a delayed message cannot be distinguished from a lost one. Back in 1985, Fischer, Lynch, and Paterson published a theorem on the impossibility of distributed consensus (the FLP impossibility result). According to this theorem, a deterministic asynchronous system can have at most two of the following three properties:
Safety – the guarantee that all results are correct and identical on all nodes.
Liveness – the guarantee that every transaction is eventually completed, that is, non-faulty nodes always eventually produce a result.
Fault Tolerance – a system can survive the simultaneous failure of one or more nodes.
The F-BFT algorithm that forms the foundation of the DGT platform is designed to prefer Fault Tolerance and Safety over Liveness. This choice leads to a system architecture designed to support asynchronous transaction processing. The limitations of the security model resulting from this architecture are discussed in section 3.4.
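F-BFT's actual voting rules are not given in this section, so the sketch below uses generic Byzantine fault tolerant quorum arithmetic as an assumption (at most f faulty nodes out of n, with n ≥ 3f + 1) to show what preferring safety to liveness means in practice. A block is committed only when a quorum agrees; any two quorums overlap in at least f + 1 nodes, so they always share an honest node. If a partition leaves fewer nodes reachable, the node waits instead of committing.

```python
def max_faulty(n: int) -> int:
    """Largest number of Byzantine nodes f such that n >= 3f + 1."""
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Smallest vote count q with 2q - n >= f + 1, so any two quorums share an honest node."""
    f = max_faulty(n)
    return (n + f) // 2 + 1


def decide(n: int, votes: int) -> str:
    """Commit only with a quorum; otherwise keep waiting (safety over liveness)."""
    return "commit" if votes >= quorum(n) else "wait"


print(quorum(4), quorum(7))  # 3 5
print(decide(7, 4))          # wait: a partition left only 4 of 7 nodes reachable
print(decide(7, 5))          # commit
```

The "wait" branch is the price of safety: the system stalls rather than letting two disconnected minorities commit conflicting records.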
Today, the name BLOCKCHAIN has become an umbrella term for a wide range of distributed and decentralized solutions. The name comes from a specific way of organizing data: literally, a chain of blocks. Classical blockchain systems pack transactions into blocks (using the so-called Merkle Tree), and the blocks are linked so that each subsequent block contains the hash of the previous block's header. This data organization makes data integrity easy to check and makes it impossible to replace previously saved transactions unnoticed. Features of data organization are discussed in paragraph 3.6. Here we restrict ourselves to general remarks on approaches to data storage:
Transactions are the main object of storage. A transaction consists of (1) a header containing important meta-information (sometimes also called the envelope) and (2) a body containing the actual content of the transaction.
The very first records (the first whole block when storing data in the blockchain) are called genesis records (genesis block). Unlike all subsequent records, these initial ones do not contain references to any previous ones and are formed relatively arbitrarily.
The set of records, combined with cryptographic functions, forms the ledger, which reflects the shared state (State) of the network's nodes. In this sense, blockchain-like systems are state machines that move from one state to the next as transactions are applied.
Data storage generally relies on the Merkle Tree, a binary hash tree. Its leaves contain hashes of data blocks (transactions), while each inner node contains the hash of the concatenation of its children's hashes, up to a single root hash. This approach gives DGT a compact fingerprint of all transactions in a block and makes it possible to verify efficiently that a given transaction belongs to the block, using only a logarithmic number of hashes.
Other technical solutions can be used in addition to (or even in combination with) the Merkle Tree. These include Patricia Trees (radix tries), which, unlike Merkle trees, store data in the leaves and address it by key: each inner node corresponds to a key prefix, so the path from the root spells out the key that identifies the data, much like a hash-table lookup. Other solutions improve on the Merkle Tree, such as the prefix Merkle tree used by Ethereum (the Merkle Patricia Trie, in which nodes are addressed by key prefixes), or HashFusion (a Hewlett Packard Labs solution that allows hash values to be computed in stages). Each block header may contain several Merkle Trees or their equivalents, for example for transactions, receipts (results of accepted transactions), and state.
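The Merkle Tree construction and the efficient membership check described above can be sketched as follows. This is a generic illustration, not DGT's exact encoding: SHA-256, duplicating the last node on odd-sized levels (the Bitcoin convention), and the helper names are assumptions. Production trees often also prefix leaf and inner-node hashes differently (as in RFC 6962) to prevent second-preimage attacks; the sketch omits this.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Build all tree levels bottom-up; level 0 holds the leaf hashes."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # assumption: pair the last node with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(leaves: list[bytes]) -> bytes:
    return merkle_levels(leaves)[-1][0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Return (sibling_hash, sibling_is_left) pairs on the path from leaf to root."""
    proof = []
    for level in merkle_levels(leaves)[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd level: the node was paired with itself
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its sibling hashes."""
    acc = h(leaf)
    for sibling, is_left in proof:
        acc = h(sibling + acc) if is_left else h(acc + sibling)
    return acc == root


txs = [b"tx1", b"tx2", b"tx3"]
root = merkle_root(txs)
proof = merkle_proof(txs, 2)
print(verify_proof(b"tx3", proof, root))         # True
print(verify_proof(b"tx3-forged", proof, root))  # False
print(len(proof))                                # 2 sibling hashes for 3 leaves
```

The verifier needs only the root from the block header and one hash per tree level, which is what makes the check logarithmic in the number of transactions.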
Storing data in a chain of blocks (blockchain) is not the only option. Common approaches include:
Blockchain (the classic option; see, for example, Hyperledger Sawtooth, Security Assessment Technical Report. The Linux Foundation, 2017 | Available on foundation site)
Directed acyclic graph (DAG) – used by the DGT Platform (see Lima, C. “Developing Open and Interoperable DLT/Blockchain Standards [Standards]”, Computer, vol. 51, no. 11, 2018, pp. 106–111 | Available at IEEE Site)
BlockMatrix (see Gavin Wood, Ethereum Yellow Paper: A Secure Decentralised Generalised Transaction Ledger, 2022 | Available at github)
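Unlike a chain, where each record has exactly one predecessor, a DAG record may reference several parents, so independent transactions can be appended concurrently. The sketch below is a generic illustration rather than DGT's actual DAG layout (described in paragraph 3.6): the `parents` mapping is a hypothetical representation. It checks that every referenced parent exists, rejects cycles, and derives a valid processing order with Kahn's topological sort.

```python
from collections import deque


def topological_order(parents: dict[str, list[str]]) -> list[str]:
    """Order transactions so every parent precedes its children (Kahn's algorithm).

    `parents` maps a transaction id to the ids it references (hypothetical layout).
    Raises ValueError on a missing parent or a cycle.
    """
    children: dict[str, list[str]] = {tx: [] for tx in parents}
    indegree = {tx: len(ps) for tx, ps in parents.items()}
    for tx, ps in parents.items():
        for p in ps:
            if p not in parents:
                raise ValueError(f"{tx} references unknown parent {p}")
            children[p].append(tx)

    # Start from records with no parents (e.g. the genesis record).
    queue = deque(sorted(tx for tx, d in indegree.items() if d == 0))
    order = []
    while queue:
        tx = queue.popleft()
        order.append(tx)
        for child in children[tx]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if len(order) != len(parents):
        raise ValueError("cycle detected: not a DAG")
    return order


# t1 and t2 are independent; t3 merges both branches.
dag = {"genesis": [], "t1": ["genesis"], "t2": ["genesis"], "t3": ["t1", "t2"]}
print(topological_order(dag))  # ['genesis', 't1', 't2', 't3']
```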
From an architectural standpoint, centralized and decentralized systems are often compared, since both are used to solve similar problems. The main attributes of each type of system are presented in the table below.
Approach Comparison (centralized vs. decentralized systems)
Organization
Centralized: Organized as a “client-server” system, in which the client is the active part of the system close to the end user, while the server is the passive part responsible for responding to client requests. A three-tier (or multi-tier) organization is often used, distinguishing a data layer, a business logic layer, and a presentation layer. The server is assumed to be the center of the system, so any update or change to it is immediately propagated across the entire system. Real systems may nevertheless use approaches such as microservices and federated data organization, which involve significant distribution and even a departure from rigid centralization.
Decentralized: Organized as several or many servers (nodes), each of which can serve any number of clients. Each node acts as both a client and a server with respect to the other nodes. Nodes may not even interact directly, connecting instead through special bridges.
Base attributes
Performance
Centralized: Performance is easily increased vertically by adding power to the central server, but it is practically limited by the capabilities of the network and hardware.
Decentralized: Per-node performance is generally modest, but it scales horizontally: taken together, the nodes can process significantly more information than a centralized system. At the same time, decentralized systems are sensitive to network overhead.
Availability
Centralized: Availability is ensured by backup procedures and load balancing. However, the classical solution has a single point of failure, which reduces system resilience.
Decentralized: Less sensitive to network fragmentation and to the failure of individual nodes. However, until the network reaches a certain maturity (number of running nodes), it remains vulnerable to failures of critical nodes and to its infrastructure shrinking below a critical size.
Scalability
Centralized: Limited by hardware.
Decentralized: Limited by the protocol. In theory, distributed and decentralized systems have unlimited scalability.
Security
Centralized: Determined by perimeter protection.
Decentralized: Determined by the strength of the protocol. In theory, decentralized systems are designed to be highly resilient to attacks of all kinds.
Flexibility
Centralized: Flexible in terms of adding new services to a single centralized system.
Decentralized: Flexible if the protocol allows functionality to be extended. For instance, Ethereum makes its network flexible by extending functionality through smart contracts.
Maintenance
Centralized: Support falls entirely on one party and grows with the number of users and features.
Decentralized: Each node is maintained by its own administrators, which distributes the cost of maintaining the network. However, balancing the value gained against these costs requires effective tokenization.
Interoperability
Centralized: Integrating a central system is relatively simple and is done by connecting adapters and/or agents.
Decentralized: Integration is achieved by adding nodes of a special type and building inter-network bridges. Such solutions must be accounted for in the protocol and scale worse than their centralized counterparts.
Operational view
TCO (Total Cost of Ownership)
Centralized: Operational overheads are significant, while startup costs are relatively low. Costs rise as users and features increase.
Decentralized: Most costs are incurred when launching and popularizing the solution. Later, costs are distributed among a significant number of participants.
Use Cases
Centralized: Applicable within a homogeneous organizational environment; the classic solution for conventional business models. Main applications: portals, ERP, CRM.
Decentralized: Attractive for building ecosystems, integrating disparate actors, and operating without a trusted environment. Applications: cryptocurrency, asset tokenization, logistics, quality control, and more.
Advantages
Centralized: Ease of use; technological maturity.
Decentralized: Scalability and resilience through node autonomy.
Limitations
Centralized: Limited development potential due to scaling issues; no mechanism for effectively balancing participants' interests.
Decentralized: Not suitable for building small systems due to the cost/benefit ratio.
The architectural features listed above lead to the conclusion that there is no panacea, no single solution that fits every objective, and that the business model must be taken into account when designing an effective solution. This approach yields general principles for designing decentralized systems, which DGT follows:
Decentralized distributed systems based on the DGT platform depend heavily on business requirements. An effective solution requires building a dedicated business model.
Comparing DLT solutions is difficult. The most important decision is to support open standards and to focus on the community that supports solutions of this class.
Tokenization is mandatory for open and consortium-based solutions, as it allows you to control the distribution of the costs.
Blockchain solutions do not provide automatic security for data and require additional protection of end applications and mechanisms for protecting private data.
System scaling is directly related to its protocol (consensus) and must be considered in balance with security.
The classical principles of building software systems are also true for decentralized solutions. Complex solutions require detailed design and justification. The modularity of solutions is a mandatory requirement for their architecture.
The architecture of private and consortium-based solutions should include configuration management and support systems, or rely on community adoption of the technology.