BootPerformance

Summary

Describes the process for making improvements to the boot performance of Ubuntu, and changes that will most likely help.

Release Note

As part of an ongoing project, the boot performance of Ubuntu 9.04 has improved compared to earlier releases.

Rationale

As time has passed, the boot performance of home computers has steadily worsened; it is no longer unusual for a desktop operating system to take well over a minute to be ready from the time the user first presses the power button.

While this has been somewhat acceptable for desktops, where the user at least expects it to take a long time, increased focus on smaller and more mobile devices means that something needs to be done.

For a netbook or mobile device to be truly useful as a convenient, lightweight computing platform, the boot must take as little time as possible so that the system is immediately useful to the user. Any longer, and they may as well use their desktop or full laptop.

Use Cases

  • It should be possible to power off desktops and workstations overnight to conserve energy. Time lost in the morning switching them back on is undesirable.
  • Laptop users want to use their computers as soon as possible without waiting; even though the hardware is lower-powered and slower, it should boot quickly.
  • Netbooks are marketed as convenient and portable Internet and computing devices. They must not take any significant time to boot; otherwise it becomes more likely the owner will simply use their desktop computer instead.
  • For Ubuntu to be suitable for mobile devices such as cell phones, it must boot quickly to give the "instant on" experience most commonly expected with such commodity devices.

Definition

The very term "boot" is confusing: no two people agree exactly where the boot sequence begins, and where it ends.

From a user's point of view, the time it takes a machine to boot is the time from when they first press the power button to the time that the system is fully loaded and settled down.

A quick study of any Windows user (Windows being an operating system that employs tricks such as bringing up the login screen early and deferring many services until after login) will show that the user doesn't trust the system until the hard drive light is off and things have stopped changing on the screen.

Put more simply, the real boot time is from the moment the user wants their machine for something to the moment they think it's ready to do that.

Unfortunately, from our point of view, the system goes through seven distinct phases:

  1. hardware initialisation, BIOS, etc.
  2. boot loader, including loading kernel and initramfs images
  3. kernel initialisation
  4. initramfs
  5. core system startup ("plumbing")
  6. X startup
  7. desktop startup

The first is completely out of our control, until such time as we are able to work sufficiently closely with hardware vendors that we can remove the BIOS from the equation. It's only from the second stage onwards that we are executing code that we can modify and improve.

The boot loader is tiny and takes an immeasurably small amount of time to load; the time spent here is primarily the delay to allow interaction and the loading phase afterwards. This varies from platform to platform, and is a complex subject in its own right; for the purposes of this specification, we assume the boot loader is also out of our control.

Thus our timer shall begin with kernel initialisation; this is also the easiest and most reliable measurement, since the kernel keeps its own timer from the moment its code execution begins and this serves as a very accurate clock available throughout userspace.

Our timer ends once the desktop has been loaded, all applets in the session have appeared, and the disk activity has ceased.

For the purposes of testing, auto-login is desired.
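As a concrete illustration of using the kernel's timer from userspace, the following minimal sketch (Python, assuming it is launched from a session autostart entry once the desktop is up) reads /proc/uptime and reports how many seconds have passed since kernel initialisation. It is not the project's measurement tooling, and it deliberately ignores the "applets have appeared and disk activity has ceased" part of the definition above.

    #!/usr/bin/env python
    # Minimal sketch: report how long the kernel has been running at the
    # point this script is executed.  The first field of /proc/uptime is
    # the number of seconds since the kernel started its own timer.

    def seconds_since_kernel_start():
        with open("/proc/uptime") as f:
            return float(f.read().split()[0])

    if __name__ == "__main__":
        print("Desktop session reached %.2f seconds after kernel "
              "initialisation" % seconds_since_kernel_start())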

Process

There are two schools of thought as to the best approach to improving the boot time.

The first is that you take what you have, profile and chart it, and identify areas for improvement. You iterate over this process, steadily reducing the time it takes for your system to boot. Usually you have no clear goal other than "faster than before".

The second is that you decide up front what time you intend to boot in, and plan a budget for your boot sequence accordingly. If you intend to reach a full desktop in ten seconds, you may, for example, allocate only two seconds to the kernel. Following this budget, you work on each piece individually until it's fast enough, and then move on to the next piece.
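To illustrate the budgeting approach, here is a minimal sketch with entirely hypothetical numbers; the phase names follow the list in the Definition section, and the allocations are examples only, not a proposed budget.

    # Hypothetical ten-second boot budget, following the second approach:
    # each phase gets a fixed allocation up front, and work on a phase
    # continues until it fits inside its allocation.
    budget = {
        "kernel":    2.0,   # as in the example above
        "initramfs": 0.5,
        "plumbing":  2.5,
        "X startup": 2.0,
        "desktop":   3.0,
    }
    target = 10.0

    assert abs(sum(budget.values()) - target) < 1e-9, "allocations must add up to the target"

    for phase, seconds in budget.items():
        print("%-10s %4.1fs" % (phase, seconds))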

It's hard to deny that this second school of thought gives brilliant results; it was the method followed by Intel's Arjan van de Ven for his "5 second boot" talk. And although his system was rather more stripped down and less flexible than a generic distribution image, his process has clear merit.

On the other hand, it also has risk associated with it. It's easiest to do when starting from scratch and putting a system together piece by piece. When you have a complete distribution to upgrade, and user configuration to maintain without regressions, it's somewhat harder to pull off.

At this point in time, we have a lot of low hanging fruit. Much of our core system could do with updating and generally tidying up. There's also a significant number of boot speed related bugs that we're aware of, and which would not be difficult to fix.

In short, without significant investigation, we know of enough work to fill a release cycle or more that will bring a noticeable reduction in boot time.

Thus we will follow the first school of thought for now. We'll correct the problems we already know about, get everything up to date and cleaned up, and give ourselves a sane base to work from in future.

No doubt within a few releases we'll reach a point where we simply can't make it any faster by iterative reduction. It's at that point we'll set ourselves an aggressive budget, and begin focusing on each component individually to bring them in under budget.

Hardware

It is difficult to compare results of tests made on different pieces of hardware. Since the length of time to boot is ultimately related to the hardware underneath, especially the disk drive, changes must be made on one platform and compared there.

So that different people may work on boot performance and compare their results, it makes sense to use reference hardware platforms. These should be systems with a stable configuration, so that two different examples of the same model give near-identical results.

This does not mean that the effect of changes on other platforms shall be discounted; indeed, it is important to measure the effect generally to ensure regressions are not occurring. It's simply not useful to know that "my machine boots in 32s" unless we know how long it took to boot before that, and what was changed in the meantime.

Since the netbook form factor is one of the driving forces behind boot performance work, it makes sense to use a netbook as one of the reference hardware platforms. The chosen model is the Dell Mini 9, which has an Intel Atom processor and an SSD, making it an excellent benchmark for this form factor.

It's also recommended that, as the ARM architecture gains more prominence, an ARM-based platform be chosen.

Finally, it's suggested that a standard laptop model, with a rotary disk, be selected to serve as the reference for the mass market.

Assumptions

Most of our efforts in improving boot performance are based on the assumption that the slowest piece of hardware in the computer is the disk drive, whether solid state or rotary, and that efficient use of this piece of hardware is the key to a fast boot.

Fundamentally the boot sequence is about loading the executable code of the operating system from disk and into memory where it can be executed; whilst also loading configuration from disk and applying it.

Since the speed of the disk is relatively slow (50MB/s is considered fast) and the operating system code relatively large (up to 500MB), we have an inherent lower bound on the length of the boot sequence (10s).
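That lower bound is simple arithmetic, restated here for clarity; the 500MB and 50MB/s figures are the rough ones quoted above.

    # Reading roughly 500 MB at a sustained 50 MB/s cannot take less than 10 s.
    data_to_read_mb = 500.0
    disk_throughput_mb_per_s = 50.0
    print(data_to_read_mb / disk_throughput_mb_per_s, "seconds at best")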

Reaching that lower bound requires utmost efficiency: if the disk is idle while further data remains to be read, that is an error; and if the disk is being used for purposes that could be avoided, that is also an error.
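One way to spot the first kind of error is to watch how busy the disk actually is. The sketch below samples the "time spent doing I/O" counter from /proc/diskstats over a short interval and reports a utilisation figure; it is only a rough illustration, and the device name "sda" is an assumption that would need adjusting for the machine under test.

    #!/usr/bin/env python
    # Minimal sketch: estimate disk utilisation over a sampling interval by
    # reading the "milliseconds spent doing I/O" field from /proc/diskstats.
    import time

    DEVICE = "sda"  # hypothetical; use the boot disk of the reference machine

    def ms_doing_io(device):
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                if fields[2] == device:
                    return int(fields[12])  # total time spent doing I/O (ms)
        raise ValueError("device %s not found" % device)

    def utilisation(device, interval=1.0):
        before = ms_doing_io(device)
        time.sleep(interval)
        after = ms_doing_io(device)
        return (after - before) / (interval * 1000.0)

    if __name__ == "__main__":
        print("%s was busy %.0f%% of the last second" %
              (DEVICE, utilisation(DEVICE) * 100))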

Breaking through that bound requires ruthless decisions about what we actually need to load during boot, which services need to be started, and which configurations need to be supported.

Implementation

Investigate Behdad's posts on improving GNOME login time:

UI Changes

Code Changes

Migration

Test/Demo Plan

Unresolved issues

BoF agenda and discussion


CategorySpec