A guide to running Libra validators
As a member of the Libra Association,* Bison Trails has gained in-depth experience running the first non-Novi** validator node on the Libra testnet network. In this post we detail our learnings from that exercise and offer our recommendations for other validator node operators on how to optimize node performance.
Getting started running a validator (the short version)
Before we get into the details of some of our lessons learned, we encourage you to download and run the Libra network software. The Libra project team has open source software available on GitHub accompanied by excellent documentation on the Libra project developer’s site. The documentation provides a guided tour of the Libra blockchain, an introduction to the Move programming language, and detailed instructions on how to build and run a validator. We won’t get down in the weeds in this post, but the distilled version of running a node is as simple as checking out the source code and either:
- Running via Docker locally by following the instructions in the “docker” directory of the testnet branch of Libra core
- Running a network on AWS using Terraform, again following the instructions in the “terraform” directory of the testnet branch of Libra core
In either case, you should use the testnet branch of the code, as it is both more stable and recommended by the Libra blockchain developer documentation.
It is relatively straightforward to run a validator using either of the methods above. We recommend you run via Docker locally at first to get a feel for the configuration of a node, see what its logs look like using the docker logs command, and understand how the validators are bootstrapped to discover each other. Once you’re comfortable locally, the Terraform deployment will launch a more realistic network of validators communicating with each other over the internet.
Our suggestions below will make the most sense to those who have already taken a crack at running the software by either of these methods, although we’ve written them to be useful if you’re here before you run your first validator.
Three ways to prepare for Mainnet
Next, we would like to share three suggestions based on our initial experience running a Libra validator and our prior experiences with other blockchain networks.
1. Persist the blockchain
When the Libra network launches, the ledger state will grow over time as new accounts are added and validated transaction executions create new versions of the ledger state. The database that stores ledger state will grow accordingly. It is important that validators and full nodes are able to recover quickly in the event that, for whatever reason, the validator process is restarted. In the worst case, a node can always, in theory, just resync the entire history starting from the genesis block, but this costly and time-consuming sync can be avoided trivially by storing the blockchain on a persistent volume.
By convention, Libra validators are typically configured to store the blockchain data in the directory “/opt/libra/data”; you can store blockchain data elsewhere by changing the storage section of /opt/libra/etc/node.config.toml, but we recommend you stick to the default location.
Figure 1: recommended storage configuration excerpted from node.config.toml
dir = "/opt/libra/data"
Regardless of which system directory your node uses to store the blockchain, you will need to mount a persistent volume at that point in the directory tree. When running via Docker, which we recommend, this is as simple as using the --volume or --mount flag to specify mount details. For example, assuming you’ve mounted a multi-TB persistent volume on the host at /data, and your configuration files are available on a secure volume at /libra-config, you can invoke Docker to use both volumes as follows:
Figure 2: using the volume flag to persist blockchain data
$ docker run -v /data:/opt/libra/data -v /libra-config:/opt/libra/etc libra_e2e
And, in fact, the Terraform templates provided in the Libra blockchain source code use such a configuration to store Libra blockchain data on an EBS volume.
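One way to guard against accidentally starting a node on ephemeral storage is a small pre-flight check in the launch script. The sketch below is illustrative and not part of Libra core: the start_validator function is a hypothetical name, it assumes a Linux host where the mountpoint utility from util-linux is installed, and it reuses the image name and container paths from Figure 2.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: refuse to start the validator unless the host data
# directory is a mounted volume. Assumes Linux with mountpoint(1) available.

# start_validator DATA_DIR CONFIG_DIR (hypothetical helper)
#   DATA_DIR:   host directory expected to be a persistent volume mount
#   CONFIG_DIR: host directory holding the node configuration files
start_validator() {
  local data_dir="$1" config_dir="$2"

  # mountpoint -q exits 0 only when the path is itself a mount point.
  if ! mountpoint -q "$data_dir"; then
    echo "refusing to start: $data_dir is not a mounted volume" >&2
    return 1
  fi

  docker run \
    -v "$data_dir:/opt/libra/data" \
    -v "$config_dir:/opt/libra/etc" \
    libra_e2e
}

# Example invocation, using the host paths assumed in the text:
#   start_validator /data /libra-config
```

The check is deliberately strict: a plain directory on the host's root disk is rejected even if it exists, since data written there would not survive the loss of the instance.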
At Bison Trails, we also have proprietary systems that periodically snapshot the blockchain data so that if we lose a volume or if a particular data center becomes unavailable (not an infrequent occurrence across the hundreds of thousands of blockchain nodes running worldwide), we can quickly start a new node with a new volume or in a different location. That said, one of the first things we did with our own Libra validator, separate and apart from these more advanced systems, was to store the blockchain directory in a persistent location.
2. Metrics and alerts
At Bison Trails, we’re accustomed to adding a monitoring layer alongside the running blockchain software so that we can anticipate and take any scaling actions required through the normal evolution of the network and can react to any unanticipated events.
In the case of the Libra blockchain, the core development team has given all validators a huge head start by shipping software that already publishes extremely useful metrics via Prometheus. Prometheus is an excellent time-series data solution that is becoming the gold standard of metrics and alerting for devops teams. The best way to experience these metrics is to run a validator network via the Terraform method described above in “Getting started running a validator”. As you can see in the screenshot below, it provides an out-of-the-box dashboard with many of the key metrics for individual nodes as well as for the network as a whole.
Figure 3: Libra core ships with working metrics and example dashboards
From running nodes across many networks, we have instituted a fairly broad and rigorous approach to monitoring our nodes. We view metrics in three general categories:
- System metrics, e.g. CPU/memory/disk utilization
- Blockchain node metrics, e.g. process health, node connectivity, data transfer
- Blockchain application metrics, e.g. block rates, transaction rates, and validation statistics
For each metric that we track, we have alerts that can broadly be classified as either critical or non-critical. With the Libra mainnet not yet launched, and the core development clipping along at an extremely aggressive pace, nobody at Bison Trails gets paged if a validator process stalls. However, as launch approaches, we will be tightening down our alert thresholds and severities, and we recommend that key performance metrics be monitored by any Libra Association member who runs a node, with alerts in place where appropriate.
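To make the critical versus non-critical split concrete, here is a minimal sketch of a threshold check over metrics in the Prometheus text exposition format. It is not how Libra core or Bison Trails implements alerting: the function names, the metric name validator_disk_used_percent, the thresholds, and the metrics URL in the usage comment are all hypothetical, and a production setup would more likely express the same rule in Prometheus alerting configuration.

```shell
#!/usr/bin/env bash
# Sketch: read one gauge from Prometheus text-format output and classify it
# against two thresholds. All names and numbers here are illustrative.

# metric_value NAME: read metrics text on stdin and print the value of the
# first unlabeled sample called NAME. Returns non-zero if it is not found.
metric_value() {
  awk -v name="$1" '
    $1 == name { print $2; found = 1; exit }
    END { exit !found }
  '
}

# classify_alert VALUE WARN CRIT: print "critical", "non-critical", or "ok"
# for a metric where higher values are worse (e.g. disk utilization).
classify_alert() {
  awk -v v="$1" -v warn="$2" -v crit="$3" 'BEGIN {
    if (v + 0 >= crit + 0) {
      print "critical"
    } else if (v + 0 >= warn + 0) {
      print "non-critical"
    } else {
      print "ok"
    }
  }'
}

# Example usage (hypothetical endpoint and metric name):
#   value="$(curl -s "$METRICS_URL" | metric_value validator_disk_used_percent)"
#   classify_alert "$value" 80 90
```

Keeping the two thresholds as parameters makes it easy to tighten them as launch approaches without touching the classification logic.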
3. Protect your keys
Our final suggestion relates to key management for your Libra node. First, a caveat: the practices around key management of a validator are evolving, so what we are describing here is not intended to be used for mainnet, but instead intended to get Association members and other node operators thinking about the validator keys. The methods below will certainly change as some of the operational questions around keys, key rotation, HSMs and other security questions are addressed in the coming months.
Libra validators currently run with three key pairs stored in two configuration files:
- A consensus key stored in /opt/libra/etc/consensus_keypair.config.toml
- Network identity and signing keys stored in /opt/libra/etc/network_keypair.config.toml
At Bison Trails, we use a layered approach to securing access to keys. Since a Libra validator needs to read keys from files, the following two practices apply:
- Restrict the key file permissions: whatever user the validator runs as, the validator process is the only process that needs to read these files, and no process needs to write to them, so we recommend the permissions mode be set to “400”, meaning the owning user can read the files, no one (the owner included) can write to them, and other users have no access at all.
- Don’t touch the disk: at a minimum we recommend that you use tmpfs volumes for your Docker container and include bootstrapping code to make the configuration files available on the tmpfs volume.
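The two practices above can be sketched in a few lines of shell. This is an illustration under stated assumptions, not a mainnet-ready procedure: install_key and check_key_mode are hypothetical helper names, the check relies on GNU or BusyBox stat (the -c option), and where the key material comes from (a secret store, an encrypted bundle, and so on) is left to your own bootstrapping code.

```shell
#!/usr/bin/env bash
# Sketch: write key material with owner-read-only permissions and verify it.
# Helper names are illustrative; stat -c assumes GNU or BusyBox stat.

# install_key DEST: read key material from stdin and write it to DEST.
# The restrictive umask means the file is never group- or world-readable,
# even briefly; chmod then drops the owner's write bit to reach mode 400.
install_key() {
  local dest="$1"
  ( umask 077 && cat > "$dest" ) && chmod 400 "$dest"
}

# check_key_mode FILE: succeed only if FILE has exactly mode 400.
check_key_mode() {
  [ "$(stat -c '%a' "$1")" = "400" ]
}

# Example: keep the configuration directory in memory by mounting a tmpfs
# there, then populate it from bootstrapping code inside the container.
# The mount options and size are placeholders to adapt:
#   docker run --tmpfs /opt/libra/etc:rw,noexec,nosuid,size=1m libra_e2e
#   ... | install_key /opt/libra/etc/consensus_keypair.config.toml
#   check_key_mode /opt/libra/etc/consensus_keypair.config.toml
```

Because a tmpfs lives in memory, its contents disappear when the container stops, so the bootstrapping step has to run on every start.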
If you are just experimenting with the validators locally, there is no need to protect the keys, but it is important to note the fundamental differences between development mode and what you will want to do in production so that you are prepared for mainnet launch.
Learning more about the Libra blockchain
In this post we have shared three ways that Libra node operators can prepare their validators and full nodes for mainnet. In addition to the lessons we have learned from our hands-on experience running a validator, we have also found the following resources to be extremely helpful in learning more about the Libra blockchain:
- Life of a Transaction
- Libra Whitepaper
- Libra Blockchain Technical Whitepaper
- Libra Core github repository
About Bison Trails
Bison Trails is pioneering blockchain infrastructure. Our technology platform provides enterprise-grade security, multi-cloud and multi-region distribution, and a 99.99% uptime guarantee. Our aim is to strengthen the entire blockchain ecosystem by providing robust infrastructure for the pioneers of tomorrow.
In this post, we sketched out some of the suggestions we have to improve your Libra validator. If you have questions for us about your Libra validator, or any network software, we would love to hear from you. Contact us at email@example.com.
*On December 1, 2020, the Libra Association was renamed to Diem Association.
**Novi was first announced as Calibra in 2019.