DistUpgradeProcessImprovements

Revision 29 as of 2006-11-29 19:26:53

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

The upgrade experience from dapper->edgy was not good for a lot of people. This spec tries to identify what caused the problems and what we can do to fix them.

Rationale

Currently there are situations that can make the dist-upgrade fail. In the worst case, this means that the system becomes unbootable or that X won't start. We need to make sure that even when errors happen during the upgrade the system is still bootable and X will still work.

Use cases

1. Alice heard that Ubuntu is a great distro. She runs a script in the forums that automatically installs multimedia stuff. When she upgrades later, the upgrader detects this and works around the issue.

2. Bob has installed some python modules manually. When he upgrades, a python package postinst fails because of this. The upgrade goes on and only the affected package is reported as problematic; the rest is installed fine.

Scope

There are various ways to attack the problem. One is AutomaticUpgradeTesting to find errors early and automatically. Next we need to make sure that packages/postinst scripts with errors cannot trash the system (to the extent that this is possible). An option to test/roll back an upgrade would be good as well, but this is technically very challenging. We should add an option to automatically (or semi-automatically) send in problem reports when the upgrade fails, using apport for this if feasible.

Design

The following things in the ReleaseUpgrader need to be improved:

  1. Upgrade calculation
  2. Recover from third party packages
  3. Error handling during the upgrade (maintainer scripts)
  4. Deal with a changing environment (themes/libraries) during the upgrade
  5. Better SRU support
  6. Misc improvements

Improve the upgrade calculation

We should test a new algorithm for the ReleaseUpgrade calculation. It should work like this:

  1. Upgrade all essential packages
  2. Upgrade all packages in main and set them to protected
  3. Force problem resolution on them. Because no packages in main depend on packages outside main, this set should be self-contained.
  4. Do the same for unsupported packages and make sure that we do not interfere with main
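
The ordering above can be sketched as follows. This is only an illustration under stated assumptions: the `Package` record and the `plan_upgrade_phases` helper are hypothetical names, the sample data is invented, and the real calculation would have to go through the libapt resolver rather than plain lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    # Hypothetical record; a real implementation would read this from the apt cache.
    name: str
    component: str          # e.g. "main" or "universe"
    essential: bool = False


def plan_upgrade_phases(installed):
    """Split installed packages into the ordered phases from the list above."""
    essential = [p for p in installed if p.essential]
    main = [p for p in installed if p.component == "main" and not p.essential]
    rest = [p for p in installed if p.component != "main" and not p.essential]
    # Everything handled in the first two phases is protected, so resolving
    # the unsupported packages afterwards must not change it.
    protected = {p.name for p in essential + main}
    phases = [("essential", essential), ("main", main), ("unsupported", rest)]
    return phases, protected


if __name__ == "__main__":
    sample = [
        Package("bash", "main", essential=True),
        Package("gedit", "main"),
        Package("compiz", "universe"),
    ]
    phases, protected = plan_upgrade_phases(sample)
    for label, packages in phases:
        print(label, [p.name for p in packages])
    print("protected:", sorted(protected))
```

The point of returning the protected set separately is that the last phase can be resolved against it: any solution that would remove or hold back a protected package is rejected.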

Recover from problems caused by third party packages

Currently we do not offer a fallback if we can't do a dist-upgrade and still keep the {ubuntu,kubuntu,edubuntu,xubuntu}-desktop installed. This can happen when third party packages are installed (e.g. for dapper->edgy when compiz was enabled). Instead of just showing an error message we should offer a mode that will create a high pin on the Ubuntu archive to force downgrades. This should ensure that we get only official packages. Because downgrades are not a good idea in general, we will only do this as a last resort and print a big warning to the user.
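
A minimal sketch of what such a pin could look like, assuming the standard apt preferences stanza format. The helper name `render_archive_pin` and the default values are assumptions made for illustration; apt treats a pin priority of 1000 or more as permission to downgrade, which is why the helper refuses anything lower.

```python
def render_archive_pin(origin="Ubuntu", priority=1001):
    """Return an apt preferences stanza that pins every package to one origin.

    Hypothetical helper: the upgrader would write the result to a temporary
    preferences file only when the user has confirmed the last-resort mode.
    """
    if priority < 1000:
        # Below 1000 apt prefers the pinned version but never downgrades to it.
        raise ValueError("a downgrade needs a pin priority of at least 1000")
    return (
        "Package: *\n"
        f"Pin: release o={origin}\n"
        f"Pin-Priority: {priority}\n"
    )


if __name__ == "__main__":
    print(render_archive_pin())
```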

Fixing the error handling

The error handling for failed maintainer scripts needs to be improved. Currently apt will stop after dpkg reports an error. It should instead report this error to the frontend and keep going with the upgrade until there are only broken packages left. This requires changes in libapt. A new APT::DPkgPM::StopOnErrors option will be used to control the behaviour.
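
The intended control flow can be illustrated with a small loop. This is a sketch, not the libapt change itself: `run_upgrade` and the `configure` callback are hypothetical, and the `stop_on_errors` parameter merely stands in for the proposed APT::DPkgPM::StopOnErrors option.

```python
def run_upgrade(packages, configure, stop_on_errors=False):
    """Configure packages in order and collect failures instead of aborting.

    `configure` is a hypothetical callback that raises RuntimeError when a
    maintainer script fails.
    """
    done, failed = [], []
    for name in packages:
        try:
            configure(name)
        except RuntimeError as err:
            # Report the failure (here: record it) and carry on with the rest.
            failed.append((name, str(err)))
            if stop_on_errors:
                break
        else:
            done.append(name)
    return done, failed


if __name__ == "__main__":
    def fake_configure(name):
        if name == "python-foo":
            raise RuntimeError("postinst exited with status 1")

    done, failed = run_upgrade(["bash", "python-foo", "gedit"], fake_configure)
    print("configured:", done)
    print("failed:", failed)
```

A real implementation would also have to skip packages that depend on a failed one; the sketch leaves that out.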

Fixing the environment changes problem

The problem of the changes in the environment needs to be attacked from two directions. Firstly, we need to make sure that we run with a known working environment as far as possible. This involves switching the theme before the upgrade (and switching back after).

We also need to make sure that we can recover better even if the GUI crashes during the upgrade. This means that we need to keep all upgrade state in a separate process that won't die if the frontend dies. All communication between frontend and backend is done via a socket and a very simple protocol (modeled after the debconf protocol) that can set progress information and the current state. During the upgrade we need to copy the input/output of dpkg so that we can still present all progress in a vte GTK widget (or the equivalent for Qt). We use vte_terminal_set_pty() to interact with the running dpkg. If the GUI goes away, the backend can try to restart it and (if that does not work) fall back to a text-based UI to ensure that the upgrade is actually fully performed.
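
To make the frontend/backend split more concrete, here is a sketch of a line-based protocol over a socket pair. The spec does not define the wire format, so the command names PROGRESS and STATE and the helpers `encode`, `decode` and `demo` are assumptions chosen for illustration.

```python
import socket


def encode(command, *args):
    """Build one protocol line: a command word followed by its arguments."""
    return (" ".join([command, *map(str, args)]) + "\n").encode("utf-8")


def decode(line):
    """Split a received line into (COMMAND, rest-of-line)."""
    command, _, rest = line.decode("utf-8").strip().partition(" ")
    return command.upper(), rest


def demo():
    # socketpair() stands in for the backend/frontend connection.
    backend, frontend = socket.socketpair()
    with backend, frontend:
        backend.sendall(encode("PROGRESS", 42))
        backend.sendall(encode("STATE", "configuring packages"))
        backend.shutdown(socket.SHUT_WR)   # signals end-of-stream to the reader
        with frontend.makefile("rb") as stream:
            return [decode(line) for line in stream]


if __name__ == "__main__":
    for command, payload in demo():
        print(command, payload)
```

Because the backend only writes status lines, a frontend that crashes and is restarted can reconnect and simply wait for the next PROGRESS or STATE line.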

Better SRU support

Similar to the new requirements from StableReleaseUpdates, we need a way to test the ReleaseUpgrader in $dist-proposed. To do this we will add a new switch "--proposed" to update-manager that will make it look for a meta-release-proposed file. Only users who explicitly run update-manager with this switch will get the ReleaseUpgrader from proposed.
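
A sketch of how the switch could select the file, using argparse. The `meta_release_url` helper, the placeholder base URL and the default file name `meta-release` are assumptions; only the `--proposed` switch and the `meta-release-proposed` name come from this spec.

```python
import argparse

# Placeholder location, not the real server.
BASE_URL = "https://example.invalid/"


def meta_release_url(argv, base_url=BASE_URL):
    """Return the meta-release URL to fetch for the given command line."""
    parser = argparse.ArgumentParser(prog="update-manager")
    parser.add_argument(
        "--proposed",
        action="store_true",
        help="look for the release upgrader in $dist-proposed",
    )
    options = parser.parse_args(argv)
    filename = "meta-release-proposed" if options.proposed else "meta-release"
    return base_url + filename


if __name__ == "__main__":
    print(meta_release_url([]))
    print(meta_release_url(["--proposed"]))
```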

CDRom upgrade

If an upgrade is run from the CD and the user chooses to use the network, the upgrader needs to check for an upgrade of itself and download it. This way we can fix potential problems later.

Dealing with derivatives

In order to better support Kubuntu, Xubuntu and Edubuntu we need to support different localizable announcements for them. This will be done based on the installed metapackage.
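
One possible way to pick the announcement, sketched below. The lookup order and the helper name `pick_flavour` are assumptions; the metapackage names are the *-desktop packages mentioned earlier in this spec.

```python
# Derivatives are checked first because their metapackages may be installed
# alongside ubuntu-desktop (an assumption made for this sketch).
METAPACKAGES = [
    ("kubuntu-desktop", "Kubuntu"),
    ("xubuntu-desktop", "Xubuntu"),
    ("edubuntu-desktop", "Edubuntu"),
    ("ubuntu-desktop", "Ubuntu"),
]


def pick_flavour(installed, default="Ubuntu"):
    """Return the flavour whose announcement should be shown."""
    installed = set(installed)
    for metapackage, flavour in METAPACKAGES:
        if metapackage in installed:
            return flavour
    return default


if __name__ == "__main__":
    print(pick_flavour(["bash", "kubuntu-desktop"]))
```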

Implementation

The existing update-manager release-upgrader code needs to be modified to implement the above requirements.

Testing

In addition to the auto-testing we need to cover the following test scenarios:

  1. ubuntu-desktop install with incompatible packages. We should try to model the situation with compiz where xserver-xorg couldn't be upgraded.

Analysing the problems dapper->edgy

The following problems were reported in launchpad for the dapper->edgy upgrade:

  1. upgrade could not be calculated (e.g. with unofficial compiz: #58424)
  2. {pre,post}inst failures (e.g. firestarter, python-$foo: #56779, #59932, #64615, #67913, #67996, #68378, #69019, #69104, #59347, #63450, #66347, #67368, #67559, #67696, #68177, #66702, #68765)
  3. X didn't come up (#67069)
  4. kernel wouldn't boot (#68848), other hardware regressions (#62628)
  5. upgrader crashes because of environment changes (e.g. theme changes: #68027, #69124)
  6. upgrader crashes because of programming errors (#68553)
  7. system behaves differently in a fundamental way after the upgrade (#69145, #69059, #69208, #67803, #64909)
  8. misc problems that make the upgrade difficult (#69051, #68467, #67090, #59946)

Here is a list with all the identified upgrade bugs so far: https://launchpad.net/distros/ubuntu/+bugs?field.tag=edgy-upgrade

Here is a list with all the identified upgrade regressions: https://launchpad.net/distros/ubuntu/+bugs?field.tag=edgy-regression

The data we have so far indicates that most of the problems are caused by failures in maintainer scripts. This means that apt/update-manager need to be better prepared for those and should try to ignore these errors as much as possible.

Another source of problems was the changes in the environment during the upgrade itself. Those are the hardest to protect against. If e.g. the theme engine suddenly becomes broken for a certain amount of time, this can cause crashes in the upgrader during the process.

The current dist-upgrader on the alternate CD is not able to update itself from the net. It should do this when it finds a network connection.

The above mentioned problems should be attacked by the following means:

  1. The upgrader should probably mention that using apt-get dist-upgrade will calculate an upgrade, but that this upgrade will most likely not be a good one because e.g. ubuntu-desktop can't be upgraded. It may also have a fallback mode for experts, which needs to be explicitly enabled, in which it asks whether it should try to force the upgrade by using pinning/downgrades. The way the upgrade is calculated should also be rethought. The current dist-upgrade tries to upgrade all installed packages. We should instead upgrade all installed packages from main first, then lock this selection and try to upgrade the rest.

  2. Find as many postinst problems as possible with automatic testing (see AutoDistUpgradeTesting). If we still have failing postinsts then those should be logged as problems but libapt should try to continue as long as possible (and make sure that the dist-upgrader can use selected backports during the upgrade)

  3. We should spend time to have a recovery mode (vesa/vga) in X (this is covered by the BulletProofX spec)
  4. Get as much real-world testing as possible (asking users to test the live CD maybe?); otherwise there is not a lot we can do about hardware regressions.
  5. We should probably force the upgrader to change itself to a theme that we have tested (Human). We also need to redesign the current architecture of the ReleaseUpgrader so that the actual upgrade process runs in a separate process with its own pty. If it detects that the GUI frontend went away for some reason, it will fall back to the text terminal and present a message that the upgrade is still running, so that the user does not press reset. The pty needs to be sent across to the frontend. vte_terminal_set_pty() is used to interact with the dpkg terminal output.

  6. Testing
  7. Testing
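
The separate-process design from item 5 relies on running dpkg under its own pty and reading its output from the master side. A minimal sketch of that mechanism using Python's standard pty module follows; the helper `run_in_pty` is a hypothetical name, `echo` stands in for dpkg, and handing the pty to a vte widget is not shown.

```python
import os
import pty
import subprocess


def run_in_pty(argv):
    """Run argv attached to a fresh pty and return (exit_code, output_bytes)."""
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(
            argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
        )
    finally:
        # The parent only needs the master side of the pty.
        os.close(slave_fd)
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:
            break          # Linux reports an I/O error once the child side is closed
        if not data:
            break
        chunks.append(data)
    os.close(master_fd)
    return proc.wait(), b"".join(chunks)


if __name__ == "__main__":
    code, output = run_in_pty(["echo", "hello from the child"])
    print(code, output)
```

Because the backend owns the master side, it can keep logging the output or replay it to a text terminal even when no GUI frontend is attached.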

BoF agenda and discussion


CategorySpec