KarmicUnpackDuringDownload

Differences between revisions 2 and 3
Revision 2 as of 2009-05-26 12:20:19
Size: 4307
Editor: 80
Comment:
Revision 3 as of 2009-06-12 20:36:00
Size: 7911
Editor: 99-156-85-10
Comment: Added Kiko's EXCELLENT notes from the UDS session

Summary

Discussion of installing packages DURING the download of multiple packages, versus AFTER the download of all packages completes.

Release Note

This section should include a paragraph describing the end-user impact of this change. It is meant to be included in the release notes of the first release in which it is implemented. (Not all of these will actually be included in the release notes, at the release manager's discretion; but writing them is a useful exercise.)

It is mandatory.

Rationale

This should cover the _why_: why is this change being proposed, what justifies it, where we see this justified.

User stories

Assumptions

Design

You can have subsections that better describe specific parts of the issue.

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:

UI Changes

Should cover changes required to the UI, or specific UI that is required to implement this

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Migration

Include:

  • data migration, if any
  • redirects from old URLs to new ones, if any
  • how users will be pointed to the new way of doing things, if necessary.

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.

BoF agenda and discussion

UDS Session Notes

The problem:

  • Currently apt operations download every package file; then apt-listchanges is run, then dpkg is run on everything, and dpkg does what it needs to do (unpack, configure, etc.) in an order that depends partially on apt and partially on dpkg.

The proposal:

  • It may save substantial time to start installation as soon as packages have been downloaded and are ready for installation.
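The potential saving can be illustrated with a toy model (made-up numbers and function names, not apt code): assume one download stream and one install worker running concurrently, where each package starts installing as soon as it has been downloaded and the previous install has finished.

```python
# Toy model of "install during download" vs. "install after download".
# Hypothetical sketch: per-package (download_seconds, install_seconds)
# pairs are invented; real numbers would come from profiling.

def serialized_time(packages):
    """Current behaviour: download everything, then install everything."""
    return sum(d for d, _ in packages) + sum(i for _, i in packages)

def pipelined_time(packages):
    """One download stream and one install worker running concurrently.
    Package i can start installing once it is downloaded and the
    previous install has finished."""
    downloaded = 0.0   # when the current package's download completes
    installed = 0.0    # when the previous install completes
    for dl, inst in packages:
        downloaded += dl
        installed = max(downloaded, installed) + inst
    return installed

if __name__ == "__main__":
    # (download_seconds, install_seconds) per package -- made-up numbers
    pkgs = [(4.0, 2.0), (3.0, 5.0), (6.0, 1.0), (2.0, 4.0)]
    print("serialized:", serialized_time(pkgs))   # 27.0
    print("pipelined: ", pipelined_time(pkgs))    # 19.0
```

The pipelined total can never exceed the serialized one in this model, and the gap grows when download and install times are of similar magnitude, which matches the intuition that larger upgrades benefit more.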

Considerations:

  • Mark's original idea was pre-unpacking the files and leaving them in a state which is immediately usable by dpkg.
  • Colin suggested unpacking and not pre-unpacking (as in dpkg --complete); possibly similarly to downloading and installing as self-contained sets of packages become available for installation. This is because we can't leave the system in a half-upgraded situation in the case of an aborted download.
  • Mark points out that the pre-unpacking approach doesn't have to redo the dependency calculations itself; it can take advantage of the ordering apt has already worked out.
  • Steve L. reminded us that the disk space requirements could go up, given that we are unpacking multiple packages at once. Ordering the downloads to match the install order as closely as possible would be the best possible situation.
  • The scenario Mark proposes is a change to dpkg which allows it to use a cache containing a pre-expanded package; if the cache isn't there, dpkg carries on with installation as normal, but if it is there, it is used.
  • We'd have to unpack into the same filesystem as the target directory; dpkg already creates subdirectories anyway. Mark points out that it's dangerous to trust this cache if an installation is aborted, and underlined the need for some approach to ensure that the package is complete, is the right version, etc.

Considering the above, there was discussion around where to pre-unpack. Colin and Steve L. pointed out that dpkg already does this; they could just add a file that pointed out where we had stopped. Mark still thought that putting all the caches in a separate directory would be safer, but Colin says there are many reasons to just use the default dpkg name.
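One way to make the cache safe against aborted installations, as Mark asked for, could be a marker file recording the package version and a checksum, written only after pre-unpacking completes. This is a hypothetical sketch (the `.preunpacked` marker name and JSON contents are invented, not dpkg's real format):

```python
import hashlib
import json
import os

def write_marker(cache_dir, version, deb_path):
    """Record what was pre-unpacked, once unpacking has fully completed.
    Hypothetical '.preunpacked' marker file -- not real dpkg behaviour."""
    with open(deb_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    with open(os.path.join(cache_dir, ".preunpacked"), "w") as f:
        json.dump({"version": version, "sha256": digest}, f)

def cache_is_trustworthy(cache_dir, version, deb_path):
    """Use the cache only if its marker matches the .deb we are installing."""
    marker = os.path.join(cache_dir, ".preunpacked")
    if not os.path.exists(marker):
        return False          # aborted before the marker was written
    with open(marker) as f:
        recorded = json.load(f)
    with open(deb_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return recorded == {"version": version, "sha256": digest}
```

Because the marker is written last, an aborted pre-unpack simply leaves no marker and dpkg falls back to normal installation, which is the failure mode the discussion called for.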

  • Lars points out that the amount of data installed outside of /usr is going to be minimal.
  • There is a UI issue with having apt and debconf (or anything which involves a user prompt) run together; there's a progress bar displayed for downloads, and running debconf would require displaying stuff on the console.
  • Adam pointed out that uncompressing (instead of the full unpacking) during download would definitely be a lot simpler, and possibly a win in itself, given its CPU cost. Lars, however, countered that in general writing to disk will be slower; Colin reminded us that .lzma decompression is significantly slower, so it might benefit more from this approach.
  • Colin discussed vendor hooks being added to dpkg this release.
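Adam's and Colin's points about decompression cost are easy to sanity-check. A minimal sketch using only the Python standard library (made-up repetitive payload, not real .deb data) that times gzip versus lzma decompression of the same input:

```python
import gzip
import lzma
import time

def decompress_time(compressed, decompress):
    """Wall-clock seconds to decompress a payload once."""
    start = time.perf_counter()
    decompress(compressed)
    return time.perf_counter() - start

if __name__ == "__main__":
    # Synthetic, highly compressible payload (~1.5 MB) -- real packages
    # would compress worse and should be profiled instead.
    payload = b"some compressible package data " * 50_000
    gz = gzip.compress(payload)
    xz = lzma.compress(payload)
    print("gzip decompress:", decompress_time(gz, gzip.decompress))
    print("lzma decompress:", decompress_time(xz, lzma.decompress))
```

Running this over actual archive contents from /var/cache/apt/archives would give the profiling numbers the "Next steps" below call for.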

Next steps:

  • Profile the dpkg steps to see what takes the most time: uncompression, untarring, package configuration, or maintainer scripts. Mark reminds us: the larger the download, the larger the benefit.
  • Profile different compression mechanisms.
  • Michael points out that a prototype that just pre-unpacked would not be too complicated to build.
  • Mark suggested doing a timing experiment on a release upgrade:
    1. apt-get --download-only
    2. unpack everything
    3. start watch
    4. install everything based on an unpacked set
    5. stop watch
    and comparing it to:
    1. apt-get --download-only
    2. start watch
    3. install everything
    4. stop watch

notes:mvo

unpack while downloading

currently:

  • download all
  • install all (serialized)

  • use bootchart to profile the package install time

Optimization ideas:

  • run dpkg --unpack to a different place while downloading
    • *but* preinst is also run
    • needs to be on the same filesystem
  • would have to partition the download into self-contained groups
  • big groups of pkgs are good (because dpkg database handling is slow)

Next steps:

  • benchmark to figure out what really takes the time: could be maintainer scripts, could be unpack, could be configure
  • so figure out how much time saving we get

Mark's "cache" idea:

  • download package
  • when a package is downloaded, (pre)unpack it into a "cache" area (that can well be just the regular name/destination that dpkg uses anyway)
  • when all downloads are finished, run dpkg normally
  • if anything fails, blow away the cache
  • problem with this idea: needs a lot of disk space

Cache design:

  • add an index file to /var/lib/dpkg/... that records what dpkg has done, in order to cleanly roll back
  • add new pre-unpacked, half-pre-unpacked, etc. states in the status file

Cache design (2):

  • mark the pre-unpacked files with a different filename (foo.preunpack.dpkg-tmp) or (foo.preunpacked.$sha1.dpkg-tmp) to make cleanup easier

Cache design (3):

  • unpack into a separate dir

Simple idea:

  • what about unpacking data.tar.gz in /var/cache/apt/archives when a download is finished (profile this!)

Future idea (tricky):

  • split the upgrade into partitions
  • when a download set is finished, keep downloading
  • install packages when a set is self-contained (while still downloading)
  • problems: debconf --preconfigure (optional, but results in questions during the install)
  • progress bar when a package is asking questions

Problems with the idea:

  • apt-listchanges
  • error handling
  • selinux labeling handling
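The marker-filename variant in "Cache design (2)" makes cleanup after a failed run a simple glob. A sketch assuming that hypothetical naming scheme (the .dpkg-tmp suffix here mirrors the note above; this is not how dpkg actually cleans up):

```python
import glob
import os

def blow_away_preunpack_cache(archive_dir):
    """Remove leftover pre-unpack temp files after a failed run.
    Assumes the hypothetical foo.preunpacked.<sha1>.dpkg-tmp naming
    from 'Cache design (2)' above -- not dpkg's real behaviour."""
    removed = []
    for path in glob.glob(os.path.join(archive_dir, "*.dpkg-tmp")):
        os.remove(path)                     # only temp files match the glob
        removed.append(os.path.basename(path))
    return sorted(removed)
```

Because downloaded .deb files never carry the temp suffix, "blow away the cache" cannot delete anything apt would have to re-download.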


CategorySpec

FoundationsTeam/Specs/KarmicUnpackDuringDownload (last edited 2009-06-17 08:45:21 by p54A659AF)