Summary

We want to enable users to easily perform mass installations of Ubuntu on a pool of machines. We will develop a console tool that intelligently manages dhcpd and syslinux configuration, and provide a GUI frontend for easy point-and-click configuration.

N.B. We use the term 'cluster' to mean 'a pool of machines'. While this could be a pool of cluster compute nodes, it doesn't have to be. The mass-install infrastructure also provides generic netboot management.

Rationale

We already support fully automated installation with d-i using preseeding (and/or kickstart). The Ubuntu LTSP integration already puts in place its own plumbing for netbooting thin clients. Let's put all the parts together so they are easy to use even for less experienced administrators, make them configurable, and provide a reasonable management interface.

Use cases

Design

The management tool we discuss is called 'nmt' (netboot management tool). It can set the next-boot policy for each machine registered with it, or for a named group of such machines.

A boot policy is a simple specification of the file that gets sent to a machine that is requesting a PXE boot. Example policies include:

nmt has the first four policies built in. The LTSP policy will be shipped by the Ubuntu LTSP package. The tool should also support initiating reboots remotely, but this functionality will likely be handled by the ConfigurationInfrastructure spec.
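To make the policy-to-file idea concrete, here is a minimal Python sketch of how nmt might apply a policy to one machine by writing a per-host PXELINUX configuration file. It relies on PXELINUX's lookup order: it first tries 'pxelinux.cfg/01-<mac>' (MAC in lowercase hex, octets separated by dashes) before falling back to IP-based names and 'default'. The POLICY_TEMPLATES table, the 'install' policy name, the installer paths and the preseed URL are hypothetical placeholders, not the built-in policy list.

```python
import os
import re

# HYPOTHETICAL policy table: policy name -> PXELINUX config body.
# Kernel/initrd paths and the preseed URL are placeholders.
POLICY_TEMPLATES = {
    "install": (
        "DEFAULT install\n"
        "LABEL install\n"
        "  KERNEL ubuntu-installer/i386/linux\n"
        "  APPEND initrd=ubuntu-installer/i386/initrd.gz auto=true "
        "priority=critical url=http://install-server.example/preseed.cfg\n"
    ),
}

_OCTET = re.compile(r"[0-9a-f]{2}")


def pxelinux_config_name(mac: str) -> str:
    """Per-host file name PXELINUX tries first: '01-' (ARP type 1, Ethernet)
    followed by the MAC in lowercase hex, octets separated by dashes."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or not all(_OCTET.fullmatch(o) for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    return "01-" + "-".join(octets)


def write_policy(tftp_root: str, mac: str, policy: str) -> str:
    """Point the next PXE boot of `mac` at `policy`; return the file written."""
    body = POLICY_TEMPLATES[policy]  # KeyError for an unknown policy
    cfg_dir = os.path.join(tftp_root, "pxelinux.cfg")
    os.makedirs(cfg_dir, exist_ok=True)
    path = os.path.join(cfg_dir, pxelinux_config_name(mac))
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(body)
    # Atomic rename: a client booting mid-update sees the old or the new file.
    os.replace(tmp, path)
    return path
```

Writing to a temporary file and renaming it means a machine that happens to PXE boot while nmt is changing its policy receives either the old or the new configuration, never a truncated one.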

Built-in groups are:

All computers controlled by nmt are set to PXE boot. We assume that once PXE booting is enabled, it is impractical to turn it off on a per-machine basis (as is certainly the case with compute clusters, for example). We therefore have to provide a way to boot a client from its local disk, even though it is attempting a PXE boot from the server.

The only time a machine that is attempting a PXE boot _should_ boot from its local disk is when it is a thick client (possibly an HPC compute node) that has previously had Ubuntu installed through our automated preseed/kickstart installer. After consulting with ColinWatson, we decided that the automated installer, after finishing the stage1 install, should send the installation server a notification that specifies its root device. The installation server keeps a mapping of MAC addresses to root devices for all automatically installed machines. Upon first receiving such a notification for a machine that was previously in the 'unknown' nmt group, the installation server automatically removes the machine from the 'unknown' group and places it into the built-in 'local boot' group.
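A minimal sketch of the bookkeeping the installation server could do on a stage1 notification, using in-memory dictionaries in place of real storage. How the notification travels to the server is left open here, and the Registry class and its method name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

UNKNOWN = "unknown"        # built-in group for machines nmt has not classified
LOCAL_BOOT = "local boot"  # built-in group whose policy boots the local disk


@dataclass
class Registry:
    """HYPOTHETICAL in-memory stand-in for the installation server's state."""
    root_devices: Dict[str, str] = field(default_factory=dict)  # MAC -> root device
    groups: Dict[str, str] = field(default_factory=dict)        # MAC -> group name

    def handle_stage1_notification(self, mac: str, root_device: str) -> None:
        mac = mac.lower()
        first_report = mac not in self.root_devices
        # Always record the latest root device, so a re-install that lands on
        # a different disk is still chained to correctly.
        self.root_devices[mac] = root_device
        # Only the first report moves a machine, and only out of 'unknown';
        # a group chosen by the administrator (e.g. 'lab') is left alone.
        if first_report and self.groups.get(mac, UNKNOWN) == UNKNOWN:
            self.groups[mac] = LOCAL_BOOT
```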

The 'boot to local disk' policy hence depends on the root device mapping, and allows us to serve (via PXE) a syslinux image which simply chains to the bootloader on the root device specified in the mapping.
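One way to serve such an image is the syslinux 'chain.c32' COM32 module, whose 'hd<disk> [<partition>]' arguments select a BIOS disk and, optionally, a partition to chain-load. The sketch below turns a recorded root device into a PXELINUX config of that kind. It assumes that the kernel's disk letter order (sda, sdb, ... or hda, hdb, ...) matches the BIOS order chain.c32 counts in, which is not guaranteed on real hardware; chain_config is a hypothetical helper name.

```python
import re

_ROOT_DEV = re.compile(r"/dev/[sh]d([a-z])(\d*)")


def chain_config(root_device: str) -> str:
    """PXELINUX config that chain-loads the boot loader on `root_device`.

    ASSUMPTION: kernel disk letters (a, b, ...) follow the BIOS disk order
    that chain.c32 numbers as hd0, hd1, ...
    """
    m = _ROOT_DEV.fullmatch(root_device)
    if m is None:
        raise ValueError(f"unsupported root device: {root_device!r}")
    disk, partition = m.groups()
    target = f"hd{ord(disk) - ord('a')}"
    if partition:  # chain.c32 numbers partitions from 1, as Linux does
        target += f" {partition}"
    return (
        "DEFAULT local\n"
        "LABEL local\n"
        "  COM32 chain.c32\n"
        f"  APPEND {target}\n"
    )
```

Chaining to a partition only works if a boot loader is installed in that partition's boot sector; recording a whole disk (e.g. /dev/sda) chains to its MBR instead.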

The notification sent at the end of the first installer stage is received on the installation server by a tiny daemon called nmtd. nmtd runs as a non-root user, and its sole purposes are to receive stage1 installation notifications and to receive notifications about IP assignments from dhcpd. The latter keep nmt up to date about hosts it has not seen before. We will have to evaluate whether it is feasible to extend dhcpd with a trigger mechanism, or whether we have to parse the dhcpd logs in real time.
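If dhcpd cannot be extended with a trigger, nmtd could follow dhcpd's syslog output instead. A minimal sketch, assuming the usual ISC dhcpd log line for a granted lease ('DHCPACK on <ip> to <mac> ...'); the function names and the polling approach are illustrative, and follow() does not handle log rotation.

```python
import os
import re
import time
from typing import Iterable, Iterator, Optional, Set, Tuple

# ISC dhcpd logs a granted lease as, for example:
#   "... dhcpd: DHCPACK on 10.0.0.42 to 00:16:3e:12:34:56 (node42) via eth0"
DHCPACK_RE = re.compile(
    r"DHCPACK on (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"to (?P<mac>[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})"
)


def parse_dhcpack(line: str) -> Optional[Tuple[str, str]]:
    """Return (mac, ip) for a DHCPACK line, or None for any other line."""
    m = DHCPACK_RE.search(line)
    if m is None:
        return None
    return m.group("mac").lower(), m.group("ip")


def new_hosts(lines: Iterable[str], known: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (mac, ip) for MACs not in `known`, adding each one as it is seen."""
    for line in lines:
        hit = parse_dhcpack(line)
        if hit is not None and hit[0] not in known:
            known.add(hit[0])
            yield hit


def follow(path: str, poll: float = 1.0) -> Iterator[str]:
    """Yield lines appended to `path` from now on (like tail -f, no rotation)."""
    with open(path) as f:
        f.seek(0, os.SEEK_END)
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll)
```

nmtd would then loop over new_hosts(follow(log_path), known), where log_path is wherever syslog writes dhcpd messages on the installation server.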

nmt is a CLI tool; a GUI frontend will be available for less experienced system administrators who want to avoid dropping into the shell to configure things.
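The capabilities described above (setting a next-boot policy per machine or per group, and managing group membership) map naturally onto subcommands. A hypothetical sketch of that CLI surface using Python's argparse; the subcommand and option names are illustrative, not a settled interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """HYPOTHETICAL nmt command line; names are for illustration only."""
    parser = argparse.ArgumentParser(prog="nmt", description="netboot management tool")
    sub = parser.add_subparsers(dest="command", required=True)

    sp = sub.add_parser("set-policy", help="set the next-boot policy of a machine or group")
    sp.add_argument("target", help="MAC address or group name")
    sp.add_argument("policy", help="e.g. 'local boot' or 're-image'")
    sp.add_argument("--snapshot", help="image to use with the 're-image' policy")

    gp = sub.add_parser("add-to-group", help="move a machine into a named group")
    gp.add_argument("mac", help="MAC address of the machine")
    gp.add_argument("group", help="name of the target group")
    return parser
```

With this shape, the GUI frontend could build the same argument lists and run nmt, so both interfaces share one code path.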

The tool supports the following actions:

Addressing the use cases

Here we explain how the tools we will build, nmt and nmtd, address each of our use cases.

He changes the policy for the 'unknown' group to 're-image', and chooses the snapshot that was just sent to the server. He boots the rest of his lab machines, which are re-imaged from the snapshot. He adds all of his lab machines to a 'lab' group in nmt with policy 'local boot', and assigns a root device mapping to the whole group. He sets up a cron script on the installation server, which changes the 'lab' group policy to 're-image' (with the previously created snapshot) every night at 2AM, and reverts it to 'local boot' 30 minutes later. Rich is done. There is world peace and much rejoicing in the streets.
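The nightly schedule in this use case amounts to a half-hour window. As an illustration only, here is that window as a Python function a cron-driven script could consult; the names are hypothetical, and the use case itself describes two plain cron entries instead.

```python
from datetime import datetime, time

# Window from the use case: 're-image' at 02:00, back to 'local boot' at 02:30.
REIMAGE_START = time(2, 0)
REIMAGE_END = time(2, 30)


def lab_policy_at(now: datetime) -> str:
    """Policy the 'lab' group should have at `now` under the nightly schedule."""
    if REIMAGE_START <= now.time() < REIMAGE_END:
        return "re-image"
    return "local boot"
```

If the script runs every few minutes and applies lab_policy_at(datetime.now()), a missed run (for example, the server being down at 2:30) is corrected by the next one.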

* Roger maintains a set of machines that are re-imaged and managed via an automated installer. On one machine, which happens to be far away (perhaps in a remote data center), the install hangs due to some anomaly. Roger has a set of monkeys who can examine the machine, but the issue is beyond their scope. He would like to investigate the anomaly himself, perhaps by SSHing into the machine. Roger would like network-console to be included in the automated installer as an option that can be enabled. He might also like the installer to run a syslog daemon that forwards log data to a log server configured via debconf.

nmt interface design

The nmt GUI will be designed together with a usability expert. It is not specified at this time.

Implementation and code

Both nmt and nmtd will be written in Python and will share state through a SQLite database. The automated stage1 installer will be modified to send a completion notification to the installation server.
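A minimal sketch of that shared state using Python's standard sqlite3 module. The table layout, the rule that a per-machine policy overrides its group's policy, and the 'install' placeholder policy for the 'unknown' group are assumptions for illustration. The timeout argument makes one process wait for the other's write lock rather than fail immediately.

```python
import sqlite3
from typing import Optional

# HYPOTHETICAL schema shared by nmt and nmtd.
SCHEMA = """
CREATE TABLE IF NOT EXISTS groups (
    name   TEXT PRIMARY KEY,
    policy TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS machines (
    mac         TEXT PRIMARY KEY,                    -- lowercase, colon-separated
    group_name  TEXT NOT NULL DEFAULT 'unknown' REFERENCES groups(name),
    policy      TEXT,                                -- per-machine override, NULL = group's
    root_device TEXT,                                -- reported by the stage1 installer
    ip          TEXT                                 -- last address seen from dhcpd
);
"""


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path, timeout=10)  # wait up to 10 s for a lock
    conn.executescript(SCHEMA)
    # Built-in groups; 'install' for 'unknown' is a placeholder policy name.
    conn.executemany(
        "INSERT OR IGNORE INTO groups (name, policy) VALUES (?, ?)",
        [("unknown", "install"), ("local boot", "local boot")],
    )
    conn.commit()
    return conn


def effective_policy(conn: sqlite3.Connection, mac: str) -> Optional[str]:
    """Machine override if set, else the group's policy; None if unregistered."""
    row = conn.execute(
        "SELECT COALESCE(m.policy, g.policy) FROM machines AS m "
        "JOIN groups AS g ON g.name = m.group_name WHERE m.mac = ?",
        (mac.lower(),),
    ).fetchone()
    return row[0] if row else None
```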

Outstanding issues

Check with the dhcpd maintainer whether it is feasible to extend dhcpd with a trigger mechanism that fires when it hands out IP addresses, as described above.

Needs integration with:

Notes

- Since we're already building an interface to dhcpd, nmt could also provide an easy interface for configuring static IPs and other standard dhcpd functionality (outside of netboot management).

- Eventually, we will need to provide integration with NetworkWideUpdates and NetworkAuthentication, both of which will be hooked into the post-installation stage of the automatic installer.

- An administrator can design a custom kickstart/preseed file. We could also generate one at the end of the installation server's own install, reproducing a standard Ubuntu install with the defaults given at install time.

- The default password in the default preseed file will be chosen randomly and presented when nmt is first started.
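For the static-IP idea, nmt would mainly need to emit ISC dhcpd 'host' declarations. A minimal sketch with a hypothetical helper name; it validates its inputs so that nothing malformed reaches dhcpd.conf, and it only accepts simple unquoted host names.

```python
import ipaddress
import re

_MAC = re.compile(r"[0-9a-f]{2}(?::[0-9a-f]{2}){5}")
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def dhcpd_host_stanza(name: str, mac: str, ip: str) -> str:
    """Render an ISC dhcpd host declaration pinning `mac` to `ip`."""
    mac = mac.lower()
    if not _MAC.fullmatch(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    if not _NAME.fullmatch(name):
        raise ValueError(f"unsupported host name: {name!r}")
    addr = ipaddress.IPv4Address(ip)  # raises ValueError if malformed
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {addr};\n"
        "}\n"
    )
```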

ajmitch offers help with implementation. Nicolas Kassis offers a testbed for nmt.



NetbootManagement (last edited 2008-08-06 16:24:21 by localhost)