Summary

This spec discusses installing packages DURING the download of multiple packages, rather than AFTER the download of all packages has completed.

Release Note

Ubuntu can now install packages faster by doing downloads and installs in parallel.

Rationale

When installing packages, the download is a separate step from the unpack/configure. While downloading, the CPU and disk are mostly idle. While installing, the network is idle. Doing both in parallel is a good way to keep all of these resources busy.
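The potential gain can be estimated with a simple two-stage pipeline model. The sketch below is only an illustration: the function names and the per-package timings are made up, and it assumes packages are installed in download order with no dependency or ordering constraints, which a real apt run would not guarantee.

```python
def sequential_time(packages):
    """Total time when all downloads finish before any install starts.

    packages: list of (download_seconds, install_seconds) tuples.
    """
    return sum(d for d, _ in packages) + sum(i for _, i in packages)


def pipelined_time(packages):
    """Total time when each package is installed as soon as it is
    downloaded and the previous install has finished."""
    download_done = 0.0
    install_done = 0.0
    for download, install in packages:
        download_done += download
        # An install can start only once its own download is complete
        # and the installer is free again.
        install_done = max(install_done, download_done) + install
    return install_done


# Hypothetical timings in seconds: (download, install)
example = [(3, 2), (1, 4), (2, 1)]
print(sequential_time(example), pipelined_time(example))
```

In this model the pipelined time can never exceed the sequential time, and the gain is largest when download and install times are roughly balanced.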

User stories

Joe installs some updates and is happy to see that his system applies the updates faster now.

Assumptions

Design

The first task for this spec is to gather some data to find out what is actually taking the bulk of the time during an install/upgrade. We need to produce a bootchart-like diagram that shows how long each package takes to unpack and to configure. Based on this we can then decide what the optimal strategy for the parallelization is.

There are various ways to do the download/install in parallel. The options include:

  1. Partition the download into self-contained sets. When one set has finished downloading, start installing it and keep downloading the remaining sets in parallel. This requires code that identifies the sets, plus some analysis of how big they are and how many there are in a typical install/upgrade. Anything that uses the apt dpkg::pre-invoke handlers (like debconf, apt-listchanges) is problematic with this approach.
  2. Download packages and, as soon as each download finishes, start unpacking the deb (either to a new directory or under a special filename). A problem with this approach is that the preinst is also run on unpack, so we would need a new --pre-unpack option that skips it (and we need to think about whether that is safe in all cases). dpkg then needs to know about the pre-unpacked files and use them instead of unpacking the deb again.
  3. Download the debs and unzip them as soon as they have finished downloading.
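As a rough illustration of option 1, the self-contained sets could be approximated as the connected components of the dependency graph restricted to the packages in the transaction. This is a minimal sketch, not apt code: the function name and the input format (a plain dict of package name to dependencies) are assumptions, and it ignores Conflicts, Pre-Depends ordering and the pre-invoke handlers mentioned above.

```python
def self_contained_sets(depends):
    """Group packages so that no dependency crosses a group boundary.

    depends: dict mapping a package name to the names of the packages
    (within the same install/upgrade) that it depends on.
    Returns a list of sorted package-name lists.
    """
    # Build an undirected adjacency map: a dependency ties both
    # packages into the same set regardless of direction.
    neighbours = {}
    for pkg, deps in depends.items():
        neighbours.setdefault(pkg, set())
        for dep in deps:
            neighbours[pkg].add(dep)
            neighbours.setdefault(dep, set()).add(pkg)

    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        group = set()
        stack = [start]
        while stack:
            pkg = stack.pop()
            if pkg in seen:
                continue
            seen.add(pkg)
            group.add(pkg)
            stack.extend(neighbours[pkg] - seen)
        groups.append(sorted(group))
    return groups


# Hypothetical transaction
example = {
    "firefox": ["libnss3"],
    "libnss3": [],
    "vim": ["vim-runtime"],
    "vim-runtime": [],
    "htop": [],
}
for group in self_contained_sets(example):
    print(group)
```

Running something like this over typical upgrades would give the set sizes and counts that the analysis above asks for. If most packages end up in one large set, option 1 gains little.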

We also need to make sure that the space requirement calculation gets updated.

Implementation

In the initial phase of this spec we gather data to see how much there is to gain from doing the work in parallel and which parts of the package installation take how much time.

The data gathering will be part of the non-interactive version of the release upgrader. A new option, "NonInteractive/DpkgProgressLog=(yes|no)", is provided that will write out a dpkg performance log as dpkg-progress.%i.log. It will contain the time, pkgname and dpkg action that is being performed (unpack, configure, trigger). Being able to run it non-interactively and unattended will ensure we can easily reproduce the measurements.
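A log like this could be summarised per package and action with a few lines of Python. The exact layout of dpkg-progress.%i.log is not specified here, so the sketch below assumes a hypothetical format of one record per line with three whitespace-separated fields (time in seconds, pkgname, action) and treats the gap until the next record as the duration of an action.

```python
from collections import defaultdict


def time_per_action(log_text):
    """Sum the seconds spent per (pkgname, action).

    Assumed (hypothetical) line format: "<seconds> <pkgname> <action>".
    The last record has no successor, so its duration is unknown and
    it is not counted.
    """
    records = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        timestamp, pkgname, action = parts
        records.append((float(timestamp), pkgname, action))

    totals = defaultdict(float)
    for (start, pkgname, action), (end, _, _) in zip(records, records[1:]):
        totals[(pkgname, action)] += end - start
    return dict(totals)


# Made-up sample data
sample = """\
0.0 libc6 unpack
2.5 vim unpack
3.0 libc6 configure
7.0 vim configure
"""
for key, seconds in sorted(time_per_action(sample).items()):
    print(key, seconds)
```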

In addition, libapt is modified to send status information whenever dpkg is executed (it is run multiple times with --unpack and --configure) so that the overhead of the initial dpkg database reading can be measured. It will be a "pmstatus:dpkg-exec:%percent:Running dpkg\n" style message that can then be easily extracted from the progress log.
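Extracting these messages is straightforward because the fields are colon-separated. The following is a minimal sketch: the helper names are illustrative, and it assumes that the name field itself contains no colon.

```python
def parse_status_line(line):
    """Split a "pmstatus:<name>:<percent>:<description>" line.

    Returns (name, percent, description), or None if the line is not a
    well-formed pmstatus message.
    """
    parts = line.rstrip("\n").split(":", 3)
    if len(parts) != 4 or parts[0] != "pmstatus":
        return None
    _, name, percent, description = parts
    try:
        return name, float(percent), description
    except ValueError:
        return None


def count_dpkg_runs(lines):
    """Count how many times dpkg was executed according to the log."""
    count = 0
    for line in lines:
        parsed = parse_status_line(line)
        if parsed is not None and parsed[0] == "dpkg-exec":
            count += 1
    return count


# Made-up sample lines
sample = [
    "pmstatus:dpkg-exec:0.0:Running dpkg\n",
    "pmstatus:vim:20.0:Unpacking vim\n",
    "pmstatus:dpkg-exec:50.0:Running dpkg\n",
]
print(count_dpkg_runs(sample))
```

Combined with a timestamp for each message, the time between a dpkg-exec message and the first per-package message after it gives an estimate of the start-up overhead of each dpkg run.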

This information is then processed with a tool (that needs to be written) that graphs this data. It may be worthwhile to gather data from /proc/stat and /proc/diskstats as well during the upgrade.
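Sampling /proc/stat could look roughly like the sketch below. It reads the aggregate "cpu" line, whose first fields are user, nice, system, idle and iowait in clock ticks, and reports the idle and iowait share between two samples. The function names are illustrative, and the sample values are made up.

```python
def parse_cpu_line(stat_text):
    """Return the counters from the aggregate "cpu" line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def idle_and_iowait_fraction(before, after):
    """Fraction of CPU time spent idle and in iowait between two samples.

    Only the first eight counters are summed; the guest counters that
    newer kernels append are already included in user and nice.
    """
    deltas = [b - a for a, b in zip(after[:8], before[:8])]
    deltas = [-d for d in deltas]  # after minus before
    total = sum(deltas)
    if total == 0:
        return 0.0, 0.0
    idle = deltas[3]
    iowait = deltas[4] if len(deltas) > 4 else 0
    return idle / total, iowait / total


# Made-up samples; on a live system read /proc/stat twice instead.
before = parse_cpu_line("cpu  100 0 50 800 50 0 0\ncpu0 100 0 50 800 50 0 0\n")
after = parse_cpu_line("cpu  150 0 70 1500 80 0 0\ncpu0 150 0 70 1500 80 0 0\n")
print(idle_and_iowait_fraction(before, after))
```

A high idle share during the download phase and a high iowait share during unpack would support the rationale given earlier.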

Test/Demo Plan

To test, we perform a regular release upgrade once with the feature turned on and once with it turned off, and compare the resulting file systems. They must be identical. We also time both upgrades and check how much time was saved.
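The file system comparison could be scripted along the following lines. This is a sketch under simple assumptions: both resulting file systems are available as directory trees (for example mounted images), only paths, file contents and symlink targets are compared, and ownership, permissions and timestamps are ignored. The function names are illustrative.

```python
import hashlib
import os


def tree_digest(root):
    """Map each path below root to a short description of its content."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                result[rel] = "link:" + os.readlink(path)
            elif os.path.isdir(path):
                result[rel] = "dir"
            elif os.path.isfile(path):
                # Reads the whole file into memory; fine for a sketch.
                with open(path, "rb") as handle:
                    result[rel] = hashlib.sha256(handle.read()).hexdigest()
            else:
                result[rel] = "special"  # device nodes, fifos, sockets
    return result


def tree_differences(root_a, root_b):
    """Return the sorted relative paths that differ between two trees."""
    digest_a = tree_digest(root_a)
    digest_b = tree_digest(root_b)
    all_paths = digest_a.keys() | digest_b.keys()
    return sorted(p for p in all_paths if digest_a.get(p) != digest_b.get(p))
```

An empty result from `tree_differences` for the two upgraded systems would indicate that the parallel code path produced the same files as the regular one.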

BoF agenda and discussion

UDS Session Notes

The problem:

The proposal:

Considerations:

Considering the above, there was discussion about where to pre-unpack. Colin and Steve L. pointed out that dpkg already does this; they could just add a file that recorded where we had stopped. Mark still thought that putting all the caches in a separate directory would be safer, but Colin said there are many reasons to just use the default dpkg name.

Next steps:

notes:mvo

unpack while downloading

currently:

Optimization ideas:

Simple idea:

future idea (tricky):

problems with the idea:


CategorySpec

FoundationsTeam/Specs/KarmicUnpackDuringDownload (last edited 2009-06-17 08:45:21 by mvo)