Solana’s Plans to Enhance Resiliency

May 12, 2022

Birth Pangs Or The Sound Of The Death Knell?

From the Ethereum DAO hack, to the 2016 Shanghai attacks, to the Parity multisig wallet incidents of 2017 that locked funds raised for Polkadot, layer 1 blockchains have a history of facing threats to their networks and overcoming them. Similar issues also occur on Solana; however, these are the birth pangs of a relatively young network that has not yet reached technological maturity, not the death knells its critics deride them as. After all, Solana’s mainnet beta network has only been live since March 17th, 2020. With that in mind, we turn our attention to the April 30th to May 1st outage.

What Happened And What Were The Causes?

On Saturday, April 30th, 2022, an outage occurred on the Solana network, lasting into May 1st for a total of approximately seven hours. The cause was a swarm of NFT minting bots attempting to snap up unminted NFTs through Metaplex’s Candy Machine program, which allows NFTs to be minted into collections. This produced a deluge of transactions, reportedly between 4 million and 6 million per second, accompanied by more than 100 Gbps of traffic. The spam congested the network because every failed mint still produced an invalid transaction that had to be processed by Solana’s validator nodes, which are foundational elements of the network’s consensus. Metaplex subsequently instituted a 0.01 SOL penalty fee on invalid mint transactions to deter gaming of the NFT system.

A major consideration to bear in mind is that Solana’s effective TPS ceiling depends on the type of transaction being processed. Transactions that consume heavy computational resources congest the network far faster than lightweight ones, lowering the upper bound on throughput. This is why the outage occurred at a much lower transaction rate than the oft-cited 50,000 TPS figure.
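To make that concrete, here is a rough back-of-the-envelope sketch in TypeScript. The per-block compute budget and slot time below are illustrative assumptions rather than official protocol constants; the point is only that compute-heavy transactions shrink the achievable TPS.

```typescript
// Illustrative back-of-the-envelope only; the figures below are assumptions
// for this sketch, not official Solana limits.
const BLOCK_COMPUTE_LIMIT_CU = 48_000_000; // assumed per-block compute budget
const SLOT_TIME_SECONDS = 0.4;             // assumed average slot time

// Upper bound on TPS if every transaction costs `cuPerTx` compute units.
function maxTps(cuPerTx: number): number {
  const txPerBlock = Math.floor(BLOCK_COMPUTE_LIMIT_CU / cuPerTx);
  return txPerBlock / SLOT_TIME_SECONDS;
}

console.log(maxTps(2_000));   // cheap transfers -> tens of thousands of TPS
console.log(maxTps(200_000)); // heavy NFT mints -> only a few hundred TPS
```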

The Solana Labs post-mortem on the outage states that “There is no evidence of a denial of service attack”. Nevertheless, the minting bots spammed millions of transactions per second, which acted as the functional equivalent of a DDoS by incapacitating validators. This caused multiple problems: some validator nodes exceeded their RAM capacity as a barrage of competing Solana forks formed, and crashed. Solana’s validator node requirements call for a minimum of 128GB of RAM and a “Motherboard with 256GB capacity”, which illustrates how computationally intensive the minting spam was. Other nodes, besieged with over 100 Gbps of traffic, were cut off when their data centers detected the influx and classified it as a DDoS. The combination of these events caused consensus to fail, requiring validators to restart the network.

How Did The Validators Restore Consensus?

Validators, in a display of camaraderie typical of the Solana ecosystem, collaborated, cooperated, and coordinated to restore the network, with block production resuming at 03:30 UTC on May 1st.

For validators to restart block production, restore consensus, and resuscitate the network, at least 80% of staked SOL must participate in a cluster restart. To this end, 605 validators (including Figment) took part in the coordinated restart.

Validators gathered in the mb-validators Discord channel and followed the restart guide in Solana’s documentation, which was created during the September 2021 network outage.

Validators then had to coordinate on what is called the “highest optimistically confirmed slot”. Restarting from this slot ensures that no confirmed transactions are rolled back, which could otherwise jeopardize SOL users’ funds. According to the Solana mb-validators Discord, the chosen slot was 131973970. A snapshot was created at that slot per the instructions, and validators restarted from it. Shortly afterwards, we announced our restart in Discord while assisting other validators.
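For readers unfamiliar with the term, “optimistically confirmed” corresponds to the “confirmed” commitment level exposed by Solana RPC nodes. The snippet below (using @solana/web3.js) simply queries a node for its latest confirmed slot; the actual restart procedure inspects the local ledger with the validator tooling rather than an RPC call, so treat this only as an illustration of the commitment level involved.

```typescript
import { Connection } from "@solana/web3.js";

// Ask a node for its view of the latest optimistically confirmed slot.
// (During a real cluster restart, operators work from their local ledger
// with the validator tooling instead of a public RPC endpoint.)
async function latestOptimisticallyConfirmedSlot(rpcUrl: string): Promise<number> {
  const connection = new Connection(rpcUrl, "confirmed");
  return connection.getSlot("confirmed");
}

latestOptimisticallyConfirmedSlot("https://api.mainnet-beta.solana.com")
  .then((slot) => console.log(`Latest optimistically confirmed slot: ${slot}`))
  .catch(console.error);
```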

During the transaction flooding and the subsequent chain halt, we at Figment also transitioned our active validator node onto even higher-performing hardware, migrating between providers to enhance overall network performance.

The Trifecta Of Mitigations: Solutions For The Future

In the wake of this network outage, the Solana team is implementing the following three mitigations to greatly reduce the likelihood of future network instability:

QUIC

What is QUIC? QUIC is a transport protocol designed by Google that sits at layer 4 of the OSI model (the transport layer) and can be viewed as an improvement on, and extension of, the User Datagram Protocol (UDP). QUIC bolsters security and reduces the likelihood of successful DoS attacks; while not impervious to attack, it enhances overall security. QUIC also confers a litany of advantages, including TLS 1.3 encryption by default, lower latency, and fewer client-server connections overall, since its design allows session reuse. Solana Labs have stated that “Once adopted, there will be many more options available to adapt and optimize data ingestion.”
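For Solana’s transaction ingestion, the practical difference is that QUIC gives the receiving node an authenticated, per-peer connection it can throttle or drop, whereas raw UDP datagrams arrive with no session to police. The sketch below models that idea with purely hypothetical types (QuicConnection and acceptOrReject are illustrative names, not a real QUIC library API).

```typescript
// Hypothetical types for illustration only; not a real QUIC library API.
interface QuicConnection {
  peerId: string;      // identity established during the TLS 1.3 handshake
  openStreams: number; // concurrent streams multiplexed on one connection
}

// With a connection handle, the receiving node can apply per-peer policy
// before processing any transaction data.
function acceptOrReject(conn: QuicConnection, maxStreamsPerPeer: number): boolean {
  return conn.openStreams <= maxStreamsPerPeer;
}

// With raw UDP there is no equivalent handle: every datagram is accepted
// and parsed before the node can decide whether the sender is abusive.
const spammyPeer: QuicConnection = { peerId: "minting-bot", openStreams: 5_000 };
console.log(acceptOrReject(spammyPeer, 128)); // false -> connection throttled/dropped
```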

Stake-weighted transaction QoS 

Stake-weighted transaction QoS ties the share of packets a node may forward to the leader to that node’s share of staked SOL, preventing low-stake or unstaked nodes from flooding the leader with traffic. According to Solana’s blog, the effect compounds with the first mitigation, as “Stake-weighted QoS will be more robust in conjunction with QUIC.”
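A minimal sketch of the idea, under the assumption that a leader divides a fixed packet budget among peers in proportion to stake (this is an illustration, not Solana’s actual validator implementation):

```typescript
// Minimal sketch of stake-weighted QoS; illustrative only, not Solana's
// actual validator implementation.
type StakeMap = Map<string, number>; // peer identity -> staked lamports

function packetQuotas(stakes: StakeMap, totalPacketBudget: number): Map<string, number> {
  const totalStake = [...stakes.values()].reduce((a, b) => a + b, 0);
  const quotas = new Map<string, number>();
  for (const [peer, stake] of stakes) {
    // Each peer may forward packets in proportion to its share of stake.
    quotas.set(peer, Math.floor((stake / totalStake) * totalPacketBudget));
  }
  return quotas;
}

const stakes: StakeMap = new Map([
  ["validatorA", 5_000_000], // high-stake peer
  ["validatorB", 100_000],   // low-stake peer
]);
console.log(packetQuotas(stakes, 100_000)); // A gets ~98k packets, B ~1.9k
```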

Fee-based execution priority

Fee-based execution priority is a system in which a transaction’s execution priority is determined by an additional fee, itself calculated from the number of compute units the transaction initially requests multiplied by a price per compute unit set by the fee payer. The additional fee is collected only if the transaction is included in a block.
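As a sketch of how a client could attach such a priority fee with @solana/web3.js (the compute-budget instruction helpers below assume a recent version of the library; the unit limit, price, and transfer amount are placeholder values):

```typescript
import {
  ComputeBudgetProgram,
  Connection,
  Keypair,
  LAMPORTS_PER_SOL,
  PublicKey,
  SystemProgram,
  Transaction,
  sendAndConfirmTransaction,
} from "@solana/web3.js";

// Sketch: bid for priority by pairing a compute unit limit with a
// price per compute unit. All values are placeholders.
async function sendWithPriorityFee(
  connection: Connection,
  payer: Keypair,
  recipient: PublicKey
): Promise<string> {
  const tx = new Transaction()
    .add(ComputeBudgetProgram.setComputeUnitLimit({ units: 200_000 }))
    .add(ComputeBudgetProgram.setComputeUnitPrice({ microLamports: 1_000 }))
    .add(
      SystemProgram.transfer({
        fromPubkey: payer.publicKey,
        toPubkey: recipient,
        lamports: 0.01 * LAMPORTS_PER_SOL,
      })
    );
  // Priority fee ≈ 200,000 CU × 1,000 micro-lamports/CU = 200 lamports,
  // charged only if the transaction lands in a block.
  return sendAndConfirmTransaction(connection, tx, [payer]);
}
```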

No Omega For Mainnet Beta

Although the Solana network has seen temporary periods of relative instability, network degradation, or outright outages, it has so far always bounced back thanks to the resilience of the protocol architecture and a concerted effort from the Solana team and ecosystem. With all novel and fast-moving technologies, there is a risk of security becoming an afterthought, but Solana has begun instituting new mitigations to combat these types of issues. Solana is undergoing an evolutionary process, and as the network transitions from its youth to a phase of maturity, downtime and instability will decrease and eventually become a thing of the past.
