A call to action: Installing a cluster of servers shouldn’t be hard

One of the reasons cloud servers (AWS EC2, Google's GCE, Azure, etc.) are popular - despite their eye-watering costs once you get to any sort of nontrivial scale - is that they give you a web interface and API to provision resources, rather than requiring you to set up and manage your own cluster.

This is crazy, as managing a cluster of servers is just a software problem, and not a particularly tricky one, so there really should be a decent open source solution for doing it.

I know there's stuff like OpenStack and Kubernetes which are great once you've gotten them installed, but setting them up is harder than it should be. There's a load of accidental complexity exposed to the administrator.

So without further ado, here's a quick reminder of what the inherent complexity of managing a cluster of servers is, so that everyone knows that any other complexity they encounter is just the tools being lame:

  1. You can turn a computer into the first node in a new cluster by downloading the "First node of a new cluster" install media and booting it from that. It will boot, ask you if you'd like it to get network configuration from DHCP or if you'd like to manually configure the network, probably ask you to pick an initial admin password, then after a while, it'll report it's ready.

  2. It's now a single-node cluster, running OpenStack and/or whatever other cluster management thingies you want, and can be used as such.

  3. You can tell it you want to create a bootable USB stick that will add new nodes to the cluster. It will write an OS install image onto that stick, along with contact details of all the nodes currently in the cluster, the ID of the new node, and a cryptographic key so that the node can authenticate the cluster and the cluster can authenticate the node.
    Optionally, it will either configure the boot media to ask about network configuration, or it can have the network configuration of the new node hard-coded, or it can be hard-coded to just talk DHCP. There is also the option of creating a bootable USB stick that just contains a generic "new node key" rather than details of a new node, so it can be re-used to add multiple nodes; that's convenient if there's a lot to add, but might be a problem if the key is leaked and untrustworthy hardware adds itself to the cluster, so that option should only be used when it's worth the risk. The new-node key can be revoked and regenerated, anyway.

  4. When you boot a computer from that USB stick, it will join the cluster; if it has hard-coded network configuration in the image, then no questions will need to be answered during the process, it'll be entirely hands-off.

  5. Also, any node in the cluster should be easily configured to be a DHCP+TFTP server that offers a pxeboot service to add new nodes to the cluster - either requiring new nodes to have their MAC address registered first, or for trusted LANs, just accepting any pxeboot request from a previously unknown MAC as a new node. IP ranges to issue IPs from via DHCP will need to be configured in.

  6. And in case you want to set up multiple clusters without a manual step as per (1) above, the pxeboot server should have an option to provide a pxeboot image for the first node of a new cluster, with the cluster configuration pre-burnt into the image. This is an edge case I include purely for completeness!

  7. Any additional per-node configuration - assigning roles to the node, perhaps telling it what data to put on what storage devices if the cluster management software isn't smart enough to work that out declaratively, telling the node to wipe all its cryptographic secret stores and then power off because it's being decommissioned, and so on - can be done through the cluster management software, once the node is in the cluster.

...and there you have it. It shouldn't be any harder than that. Things like OpenStack and Kubernetes have done great work in handling the management of a cluster once it exists, but the initial setup and subsequent expansion of clusters seem a bit neglected.
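
To show how little is really involved in steps 3 to 5, here's a minimal Python sketch of the enrollment bookkeeping: issuing a per-node join token for a USB stick, checking a joining node's proof that it holds the key, revoking the reusable new-node key, and deciding whether to admit a pxeboot request. Everything in it is an assumption made for illustration: the `Cluster` class, the `prove` helper, the JSON token layout and the choice of an HMAC challenge-response are mine, not anything OpenStack or Kubernetes actually does, and a real system would more likely use public-key certificates. Only the node-proves-itself direction is shown; the cluster can prove itself to the node the same way with a nonce chosen by the node.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    """Hypothetical in-memory model of a cluster's enrollment state."""
    peers: list                                      # contact details of current nodes
    node_keys: dict = field(default_factory=dict)    # node ID -> per-node secret
    shared_key: Optional[bytes] = None               # reusable "new node key", if enabled
    allowed_macs: set = field(default_factory=set)   # MACs registered for pxeboot
    trusted_lan: bool = False                        # accept unknown MACs if True

    def issue_node_token(self) -> str:
        """Step 3: build the payload written to a single-node USB stick."""
        node_id = secrets.token_hex(8)
        key = secrets.token_bytes(32)
        self.node_keys[node_id] = key
        return json.dumps({"peers": self.peers, "node_id": node_id, "key": key.hex()})

    def rotate_shared_key(self) -> str:
        """Revoke the old reusable new-node key and generate a fresh one."""
        self.shared_key = secrets.token_bytes(32)
        return self.shared_key.hex()

    def verify_join(self, node_id: str, nonce: bytes, proof: bytes) -> bool:
        """Step 4: check that the joining node holds a key the cluster issued."""
        # Unknown node IDs fall back to the shared key, if one is active.
        key = self.node_keys.get(node_id, self.shared_key)
        if key is None:
            return False
        expected = hmac.new(key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, proof)

    def admit_pxe(self, mac: str) -> bool:
        """Step 5: admit registered MACs, or any MAC on a trusted LAN."""
        return self.trusted_lan or mac.lower() in self.allowed_macs


def prove(token_json: str, nonce: bytes):
    """Node side: answer the cluster's challenge using the key from the stick."""
    token = json.loads(token_json)
    key = bytes.fromhex(token["key"])
    return token["node_id"], hmac.new(key, nonce, hashlib.sha256).digest()


if __name__ == "__main__":
    cluster = Cluster(peers=["10.0.0.1"])   # example address, purely illustrative
    token = cluster.issue_node_token()      # goes onto the USB stick
    nonce = secrets.token_bytes(16)         # challenge sent by the cluster
    node_id, proof = prove(token, nonce)    # computed on the new node
    print("join accepted:", cluster.verify_join(node_id, nonce, proof))
```

The point isn't the particular crypto; it's that the state the cluster has to keep for all of this is a handful of keys and a MAC list, which is the sort of thing an installer image could carry around without much fuss.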

So, come on people, what's stopping it being as easy as that? We should be able to (for instance) create some Linux installer images with something like OpenStack bundled inside along with configuration scripting to do the above!


Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales