LiveMigrationSpec

Summary

Create a set of tools that make optimal use of computing resources by automatically migrating virtual machines between physical hosts according to a policy.

Release Note

This version of Ubuntu features ULM (Uncomplicated Live Migration), which monitors your virtual and physical machines and moves virtual machines around to make the best use of your computing resources. You can schedule downtime for a physical machine, and ULM will make sure its virtual machines are migrated to other servers before it shuts down. You can even use this to shut down a large number of your physical machines outside office hours, when load is low, and so save money on electricity.

Rationale

A common selling point for virtualisation is the ability to move virtual machines to different physical hosts as requirements change. Up until now, this has been an entirely manual process. An administrator had to identify the need, determine the new optimum distribution of virtual machines, and finally do the grunt work of actually moving each virtual machine. The advent of live migration makes the final step less tedious if certain criteria are met, but the first two steps are still the same. The sensible alternative is to monitor a set of physical machines for resource availability, relate that to the requirements of the virtual machines running on them, and apply an algorithm to determine the new distribution.

Use Cases

  • John is in charge of a large virtualised installation with hundreds of physical machines hosting thousands of virtual machines in total. Requirements change every day, and John would like to avoid spending all day moving virtual machines around because of this.
  • Peter cares about the environment (and, by extension, his electricity bill). Outside office hours hardly anyone uses the servers at his company, but shutting them down is inconvenient and makes the system unusable on the off chance that someone needs it during the night, so that is not an option.

Assumptions

It is assumed that the virtual machines are using shared storage.

Design

A monitoring component will gather performance information from each node at regular intervals. This information will be stored and made available to the scheduler. The scheduler will take the performance information, apply the user-defined policy to it, decide whether any VMs are ripe for migration and, if so, carry out the migration.
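
The flow described above (gather, store, decide, act) could be sketched roughly as follows. This is only an illustration under assumptions: the names `MetricStore`, `run_cycle`, `probe`, `policy` and `migrate` are hypothetical, CPU usage is the only metric tracked, and "sustained" is taken to mean that every sample in a fixed-size window exceeds the threshold.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple

# (vm name, source node, destination node); a hypothetical shape for a decision.
Migration = Tuple[str, str, str]


class MetricStore:
    """Holds the last `window` CPU samples per node for the scheduler to read."""

    def __init__(self, window: int = 3) -> None:
        self.window = window
        self._history: Dict[str, Deque[float]] = {}

    def record(self, node: str, cpu_percent: float) -> None:
        self._history.setdefault(node, deque(maxlen=self.window)).append(cpu_percent)

    def sustained_above(self, node: str, threshold: float) -> bool:
        """True only if the window is full and every sample exceeds the threshold."""
        history = self._history.get(node, ())
        return len(history) == self.window and all(v > threshold for v in history)


def run_cycle(
    nodes: Iterable[str],
    probe: Callable[[str], float],
    store: MetricStore,
    policy: Callable[[MetricStore], List[Migration]],
    migrate: Callable[[Migration], None],
) -> List[Migration]:
    """One monitoring interval: gather samples, store them, decide, then act."""
    for node in nodes:
        store.record(node, probe(node))
    decisions = policy(store)
    for decision in decisions:
        migrate(decision)
    return decisions
```

Keeping the probe, the policy and the migration step as plain callables means each of them can be swapped out (or faked in a test) without touching the loop itself.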

Implementation

  • There are patches for libvirt that add live migration. We need them in mainline libvirt.
  • Abuse Munin to fetch performance information from the nodes.
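
As a rough idea of what reading values from Munin might involve: a munin-node answers a `fetch <plugin>` command (by default on TCP port 4949) with lines of the form `field.value number`, ending with a line containing a single dot. The sketch below only parses such a reply. The function name `parse_fetch_response` is hypothetical, and treating `U` (Munin's marker for an unknown value) as `None` is a choice made for this example.

```python
from typing import Dict, Optional


def parse_fetch_response(text: str) -> Dict[str, Optional[float]]:
    """Parse the body of a munin-node `fetch` reply into {field: value}."""
    values: Dict[str, Optional[float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if line == ".":  # a lone dot terminates the reply
            break
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, raw = line.partition(" ")
        if key.endswith(".value"):
            name = key[: -len(".value")]
            # "U" means the plugin could not determine the value.
            values[name] = None if raw == "U" else float(raw)
    return values
```

Which plugins and fields the scheduler would actually consume is not settled here; the parser simply returns whatever the node reports.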

We'll have to write a scheduler from scratch. Cool as we are, we're not in the habit of solving NP-hard problems in scalable ways just yet, so actually finding the optimum distribution of virtual machines is slightly out of scope. Besides, there's no particular reason to distribute the load completely equally as long as no nodes are particularly overloaded. So, as a first step, just to get the ball rolling, we'll make a really simple scheduler that looks around for overworked nodes (e.g. sustained > 75% CPU usage) and checks whether the most resource-hungry VM on the node will fit anywhere else; if not, it tries the second most resource-hungry VM, and so on. When a new location for the VM has been found, it starts the migration.
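
A minimal sketch of that first-step scheduler might look like the following. It rests on several assumptions that the text above does not fix: CPU is the only resource considered, a VM "fits" on a node if moving it there keeps that node at or below the same 75% threshold, the least loaded fitting node is preferred, and at most one VM is moved per overloaded node per pass. The names `VM`, `Node`, `find_target` and `plan_migrations` are hypothetical, and the "sustained" check is left to whatever feeds in the load figures.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

CPU_THRESHOLD = 75.0  # percent; the example figure from the text


@dataclass
class VM:
    name: str
    cpu: float  # share of a host's CPU in percent (assumed unit)


@dataclass
class Node:
    name: str
    vms: List[VM] = field(default_factory=list)

    @property
    def load(self) -> float:
        return sum(vm.cpu for vm in self.vms)


def find_target(vm: VM, source: Node, nodes: List[Node]) -> Optional[Node]:
    """Return the least loaded other node that stays at or below the threshold."""
    candidates = [
        n for n in nodes
        if n is not source and n.load + vm.cpu <= CPU_THRESHOLD
    ]
    return min(candidates, key=lambda n: n.load, default=None)


def plan_migrations(nodes: List[Node]) -> List[Tuple[str, str, str]]:
    """Plan at most one move per overloaded node, trying the hungriest VM first."""
    plan = []
    for source in nodes:
        if source.load <= CPU_THRESHOLD:
            continue
        for vm in sorted(source.vms, key=lambda v: v.cpu, reverse=True):
            target = find_target(vm, source, nodes)
            if target is not None:
                # Update the model so later decisions see the new placement.
                source.vms.remove(vm)
                target.vms.append(vm)
                plan.append((vm.name, source.name, target.name))
                break
    return plan
```

The function only returns a plan of (VM, source, destination) tuples; starting the actual migration is left to the caller.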

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release.

This need not be added or completed until the specification is nearing beta.

Outstanding Issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.


CategorySpec

LiveMigrationSpec (last edited 2008-08-06 16:14:27 by localhost)