Exploring etcd

August 1, 2014

Summary

After spending some time with CoreOS and etcd on EC2, I’ve come away with a few lessons learned:

Provide stable IP addressing for the etcd nodes. In AWS, this means using a VPC or clustering with public (Elastic) IPs. After stopping and restarting all 3 instances, I was unable to recover the original cluster built from the CloudFormation stack, since all 3 nodes had changed IPs. I feel there should be a way to recover a cluster with the data dir files (conf + log) intact, but I couldn’t make it work. More research is needed.
You can (and perhaps should) bootstrap your cluster without a discovery service. The etcd discovery API went down while I was rebuilding my cluster. This gave me a great opportunity to learn how etcd clustering works, and further increased my suspicion of relying on external services to manage infrastructure.
If you haven’t already, take some time to get familiar with systemd before diving into CoreOS. I hadn’t spent much time with systemd, and ended up detouring to learn the basics so I could troubleshoot simple things like syslogs and init scripts.

For a better understanding of etcd clustering, I recommend reading the etcd docs on clustering and cluster discovery a few times.

Also, I’ve created an etcd reference page based on my explorations.

This blog post records the process I went through while trying to get etcd to start (and restart) in AWS. It’s unlikely you’ll need to perform these steps yourself, since there is little benefit to manually wiping a cluster and starting over as opposed to deploying new instances and building a new cluster from scratch. Also, if you use static IPs on your CoreOS nodes, you will likely not find yourself in a similar position.

Motivation

I wanted to start playing around with etcd, CoreOS, and fleet while on vacation, but my MacBook Air lacked the RAM to do this effectively, so I turned to EC2. Not wanting to commit to running 3 m3.medium instances continuously, I wanted to make sure I could stop and start all instances without any issues or loss of data. With this goal in mind, I turned to the CoreOS docs to get started.

Getting started with CloudFormation

The CoreOS docs provide a guide and CloudFormation stack to get started. Getting up and running was easy. After generating a discovery token via etcd’s discovery service, I added this to the cloud-init section and booted the cluster. Feeling proud of my accomplishment, I shut my instances down and went back to my vacation.

Upon restarting my instances, I noticed that the cluster was unable to start.

$ etcdctl ls
Error: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

Each of the nodes was attempting to connect to the peers found during initial discovery, but all 3 instances had changed IPs in the meantime. And thus began my journey of (re-)discovery.

My first thought was to update the discovery url and start a new cluster. It took some digging to find out where the discovery url was being set. Here’s the general path I took.

$ systemctl status etcd
● etcd.service - etcd
   Loaded: loaded (/usr/lib64/systemd/system/etcd.service; disabled)
  Drop-In: /run/systemd/system/etcd.service.d
           └─10-oem.conf, 20-cloudinit.conf
   Active: active (running) since Thu 2014-07-31 21:47:23 UTC; 14min ago
 Main PID: 517 (etcd)
   CGroup: /system.slice/etcd.service
           └─517 /usr/bin/etcd

After peeking through the named directories, I found that the etcd parameters were being set by cloud-init in /run/systemd/system/etcd.service.d/20-cloudinit.conf.

[Service]
Environment="ETCD_ADDR=10.102.167.165:4001"
Environment="ETCD_DISCOVERY=https://discovery.etcd.io/9916944ce6c95c4273a96eadbc99c226"
Environment="ETCD_NAME=a897577772184497ab575e85c64a0808"
Environment="ETCD_PEER_ADDR=10.102.167.165:7001"
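For poking at these drop-ins from a script, something like the following may help. It is only a sketch in POSIX sh: the function name dropin_value is made up for illustration, and it assumes every setting sits on its own Environment="NAME=value" line, as in the file above.

```shell
# Print the value of one Environment= variable from a systemd drop-in file.
# Usage: dropin_value FILE NAME
# Assumes lines of the form: Environment="ETCD_ADDR=10.102.167.165:4001"
# Prints nothing if NAME is not set in FILE.
dropin_value() {
    sed -n "s/^Environment=\"$2=\(.*\)\"\$/\1/p" "$1"
}
```

For example, `dropin_value /run/systemd/system/etcd.service.d/20-cloudinit.conf ETCD_DISCOVERY` should print the discovery URL the unit will start with.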

Completely forgetting how cloud-init worked, I started out by generating a new discovery token and dropping it straight into 20-cloudinit.conf. After the change, I was prompted to reload systemd and I complied before restarting etcd.

$ sudo systemctl start etcd
Warning: Unit file of etcd.service changed on disk, 'systemctl daemon-reload' recommended.
$ sudo systemctl daemon-reload
$ sudo systemctl start etcd

After doing this on all 3 nodes, the cluster appeared to work. Checking the output of the discovery service, I saw that all 3 of my hosts had registered. Once again satisfied with my progress, I shut down my instances and returned to my vacation. Upon restarting my instances, however, I was once again greeted with a broken cluster.

After some digging, I determined that the discovery URL had reverted to the original token. After seeing the reverted 20-cloudinit.conf, I realized that the cloud-init user data I had provided with the CloudFormation stack was being reapplied in /run/ on every boot (/run is a tmpfs, so nothing written there survives a reboot). To properly update the discovery URL, the user data would need to be updated within CloudFormation.
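A quick check after a reboot could have saved me some digging. The sketch below (POSIX sh; discovery_reverted is a hypothetical helper name, and it assumes the Environment="NAME=value" layout of the drop-in shown earlier) tests whether the drop-in still carries the discovery URL I expected.

```shell
# Succeed (exit 0) if the drop-in FILE does NOT carry the EXPECTED discovery URL,
# i.e. the setting was reverted or is missing.
# Usage: discovery_reverted FILE EXPECTED_URL
discovery_reverted() {
    _current=$(sed -n 's/^Environment="ETCD_DISCOVERY=\(.*\)"$/\1/p' "$1")
    [ "$_current" != "$2" ]
}

# Example (the URL is whatever token you last configured):
#   discovery_reverted /run/systemd/system/etcd.service.d/20-cloudinit.conf "$MY_NEW_URL" \
#       && echo "discovery URL was reset by cloud-init"
```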

Reverting to manual discovery

Around this time, I noticed that https://discovery.etcd.io was no longer responding or handing out new tokens. The docs outline a procedure for generating your own token and using your own discovery cluster, but this is problematic if you’re trying to launch your first etcd cluster. No problem, let’s do it the old-fashioned way.

Since the CloudFormation stack is designed to start via a discovery URL, I decided to delete the stack and build the cluster manually. After launching the 3 instances and logging in, I found that all 3 nodes were leaders of their own clusters. This made sense, since no existing cluster information was supplied, either via logs, discovery service, or explicit peers.

core-01$ curl -L http://127.0.0.1:4001/v2/machines
http://10.179.184.130:4001
core-01$ curl -L http://127.0.0.1:4001/v2/leader
http://10.179.184.130:7001

core-02$ curl -L http://127.0.0.1:4001/v2/machines
http://10.102.162.181:4001
core-02$ curl -L http://127.0.0.1:4001/v2/leader
http://10.102.162.181:7001

core-03$ curl -L http://127.0.0.1:4001/v2/machines
http://10.179.183.39:4001
core-03$ curl -L http://127.0.0.1:4001/v2/leader
http://10.179.183.39:7001
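With more nodes, eyeballing this gets tedious. Here is a rough sketch, again in POSIX sh with a made-up function name (distinct_leaders), that counts how many different leaders a set of nodes report. It assumes you feed it one /v2/leader response per line; anything other than 1 suggests the nodes are not in the same cluster.

```shell
# Read one leader URL per line on stdin and print the number of distinct leaders.
# Blank lines are ignored.
distinct_leaders() {
    sort -u | grep -c .
}

# Example (the host names are placeholders for however you reach each node):
#   for h in core-01 core-02 core-03; do
#       ssh "$h" 'curl -sL http://127.0.0.1:4001/v2/leader; echo'
#   done | distinct_leaders
```

For the three nodes above this would print 3, one leader per single-node cluster.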

Reading through the etcd docs on clustering and cluster discovery, it looked like manually specifying peers was what I wanted. My plan was to stop the etcd.service via systemd and then manually launch the etcd binary with the correct parameters to update the configuration and join the new cluster. Once everything was re-configured, I would restart the etcd service and, hopefully, have a working cluster.

In my first attempts, I chose to leave the logs and config intact, hoping that etcd would fall through to the -peers option and rebuild. This did not appear to work. In the end, I was unable to find a way to rebuild the cluster with the old data in place, and had to wipe the conf/log information before I could get things to work.

Here are the steps and log output I captured from the last successful iteration.

On the first node, stop etcd, wipe the existing conf and log files, and start etcd again. By default, etcd should start a new cluster if it is not able to join an existing one.

core-01$ sudo systemctl stop etcd
core-01$ sudo rm /var/lib/etcd/{conf,log}
core-01$ sudo systemctl start etcd

core-01$ journalctl -u etcd
Aug 01 13:55:19 ip-10-179-166-134.ec2.internal systemd[1]: Started etcd.
Aug 01 13:55:19 ip-10-179-166-134.ec2.internal etcd[610]: [etcd] Aug  1 13:55:19.226 INFO      | The path /var/lib/etcd/log is in btrfs
Aug 01 13:55:19 ip-10-179-166-134.ec2.internal etcd[610]: [etcd] Aug  1 13:55:19.227 INFO      | Set NOCOW to path /var/lib/etcd/log succeeded
Aug 01 13:55:19 ip-10-179-166-134.ec2.internal etcd[610]: [etcd] Aug  1 13:55:19.227 INFO      | 927d1a8b73bd49fa98bec5070460f641 is starting a new cluster
Aug 01 13:55:19 ip-10-179-166-134.ec2.internal etcd[610]: [etcd] Aug  1 13:55:19.235 INFO      | etcd server [name 927d1a8b73bd49fa98bec5070460f641, listen on :4001, advertised url http://10.179.166.134:4001]
Aug 01 13:55:19 ip-10-179-166-134.ec2.internal etcd[610]: [etcd] Aug  1 13:55:19.235 INFO      | peer server [name 927d1a8b73bd49fa98bec5070460f641, listen on :7001, advertised url http://10.179.166.134:7001]
Aug 01 13:55:19 ip-10-179-166-134.ec2.internal etcd[610]: [etcd] Aug  1 13:55:19.236 INFO      | 927d1a8b73bd49fa98bec5070460f641 starting in peer mode
Aug 01 13:55:19 ip-10-179-166-134.ec2.internal etcd[610]: [etcd] Aug  1 13:55:19.236 INFO      | 927d1a8b73bd49fa98bec5070460f641: state changed from 'initialized' to 'follower'.
Aug 01 13:55:19 ip-10-179-166-134.ec2.internal etcd[610]: [etcd] Aug  1 13:55:19.241 INFO      | 927d1a8b73bd49fa98bec5070460f641: state changed from 'follower' to 'leader'.
Aug 01 13:55:19 ip-10-179-166-134.ec2.internal etcd[610]: [etcd] Aug  1 13:55:19.241 INFO      | 927d1a8b73bd49fa98bec5070460f641: leader changed from '' to '927d1a8b73bd49fa98bec5070460f641'.
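The two log lines that matter here are the "is starting a new cluster" message and the final leader change. Below is a small sketch for pulling them out of the journal, assuming the log format shown above (it may differ in other etcd versions). The function names are mine, not part of etcd.

```shell
# Read `journalctl -u etcd` output on stdin; succeed if this node
# reported that it bootstrapped a fresh cluster.
started_new_cluster() {
    grep -q 'is starting a new cluster'
}

# Read the same output on stdin; print the name of the most recent leader,
# taken from lines like: leader changed from '' to '<name>'.
latest_leader() {
    sed -n "s/.*leader changed from '[^']*' to '\([^']*\)'.*/\1/p" | tail -n 1
}

# Example:
#   journalctl -u etcd | latest_leader
```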

To test the clustering, I set a key on the first node and tracked when it showed up on the others.

core-01$ etcdctl set /message "Hello"
Hello
core-01$ etcdctl ls
/coreos.com
/message

On to the second node. You can cat 20-cloudinit.conf to quickly print the configured ETCD_NAME and addresses. Pass the same values when launching etcd by hand, so that the etcd service later starts with parameters matching what was written to the data dir. The manual etcd process runs in the foreground, so interrupt it (Ctrl-C) once it has joined the cluster. I later found that you can start etcd with -f (force) instead of manually wiping the conf and log files in /var/lib/etcd. Note that starting etcd with sudo leaves the conf and log files owned by root; these need to be set back to etcd:etcd before restarting the service.

core-02$ sudo systemctl stop etcd
core-02$ sudo rm /var/lib/etcd/{conf,log}
core-02$ cat /run/systemd/system/etcd.service.d/20-cloudinit.conf
[Service]
Environment="ETCD_ADDR=10.102.162.181:4001"
Environment="ETCD_NAME=e8f5ea2914184819af71fb65a7e90307"
Environment="ETCD_PEER_ADDR=10.102.162.181:7001"
core-02$ sudo etcd -peer-addr 10.102.162.181:7001 -addr 10.102.162.181:4001 -peers 10.179.184.130:7001 -data-dir /var/lib/etcd -name e8f5ea2914184819af71fb65a7e90307
core-02$ sudo chown etcd: /var/lib/etcd/{conf,log}
core-02$ sudo systemctl start etcd
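The manual etcd invocation above can be assembled from the drop-in rather than typed by hand. This is a sketch under a few assumptions: POSIX sh, the Environment="NAME=value" layout shown above, and the same flags I used (-peer-addr, -addr, -peers, -data-dir, -name). Both function names are hypothetical, and the command is only printed, not run.

```shell
# Print the value of one Environment= variable from a systemd drop-in file.
dropin_value() {
    sed -n "s/^Environment=\"$2=\(.*\)\"\$/\1/p" "$1"
}

# Print (do not run) the etcd command that joins an existing cluster,
# reusing the name and addresses that cloud-init wrote to the drop-in.
# Usage: join_command DROPIN_FILE PEER_HOST:PORT
join_command() {
    printf 'etcd -peer-addr %s -addr %s -peers %s -data-dir /var/lib/etcd -name %s\n' \
        "$(dropin_value "$1" ETCD_PEER_ADDR)" \
        "$(dropin_value "$1" ETCD_ADDR)" \
        "$2" \
        "$(dropin_value "$1" ETCD_NAME)"
}

# Example:
#   join_command /run/systemd/system/etcd.service.d/20-cloudinit.conf 10.179.184.130:7001
```

Printing the command first makes it easy to check the addresses before running it with sudo.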

Et voilà:

core-02$ etcdctl ls
/coreos.com
/message

Repeat once more for node 3. On all three nodes, you should then be able to verify that the machines list and leader are the same:

$ curl -L http://127.0.0.1:4001/v2/machines
http://10.179.166.134:4001, http://10.93.160.3:4001, http://10.218.166.153:4001
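The machines list may not come back in the same order on every node, so a plain string comparison can be misleading. Here is a sketch for comparing two responses as sets, assuming the comma-separated format shown above; the function names are made up for illustration.

```shell
# Read the body of /v2/machines on stdin (comma-separated URLs) and print
# one URL per line, trimmed and sorted, so that lists can be compared.
normalize_machines() {
    tr ',' '\n' | sed 's/^ *//; s/ *$//' | grep . | sort
}

# Succeed if two /v2/machines bodies name the same set of members.
# Usage: same_members "BODY_FROM_NODE_A" "BODY_FROM_NODE_B"
same_members() {
    [ "$(printf '%s\n' "$1" | normalize_machines)" = "$(printf '%s\n' "$2" | normalize_machines)" ]
}
```

In practice the two arguments would be the curl output captured on two different nodes.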