DistUpgradeProcessImprovements

Revision 24 as of 2006-11-11 00:55:23

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

The upgrade experience from dapper->edgy was not good for a lot of people. This spec tries to identify what caused the problems and what we can do to fix them.

Rationale

Currently there are situations that can make the dist-upgrade fail. In the worst case, this means that the system becomes unbootable or that X won't start. We need to make sure that even when errors happen during the upgrade the system is still bootable and X will still work.

Use cases

1. Alice heard that Ubuntu is a great distro. She runs a script from the forums that automatically installs multimedia software. When she upgrades later, a problem caused by that script makes the system unstable, and she is disappointed by Ubuntu and the upgrade process.

2. Bob has installed some Python modules manually. When he upgrades, a Python package's postinst fails because of this and the upgrade is not fully completed. He decides to try a different distro because of that.

Scope

There are various ways to attack the problem. One is AutomaticUpgradeTesting to find errors early and automatically. Next we need to make sure that packages/postinst scripts with errors cannot trash the system (to the extent that this is possible). An option to test/roll back an upgrade would be good as well, but this is technically very challenging. We should add an option to automatically (or semi-automatically) send in problem reports when the upgrade fails, using apport for this if feasible.

Requirements

  • We need a way to stage new versions of the upgrade tool in $dist-proposed so that it can be tested similar to the new requirements from StableReleaseUpdates.

  • The updater needs to be able to update itself when run from the CD
  • We should consider a message in apt-get dist-upgrade to make clear that the preferred method is the ReleaseUpgrader

  • Different announcements for the different Ubuntu flavours (edubuntu/kubuntu/xubuntu), with i18n support

Analysing the problem

The following problems were reported in launchpad for the dapper->edgy upgrade:

  1. upgrade could not be calculated (e.g. with unofficial compiz: #58424)
  2. {pre,post}inst failures (e.g. firestarter, python-$foo: #56779, #59932, #64615, #67913, #67996, #68378, #69019, #69104, #59347, #63450, #66347, #67368, #67559, #67696, #68177, #66702, #68765)
  3. X didn't come up (#67069)
  4. kernel wouldn't boot (#68848), other hardware regressions (#62628)
  5. upgrader crashes because of environment changes (e.g. theme changes: #68027, #69124)
  6. upgrader crashes because of programming errors (#68553)
  7. system behaves differently in a fundamental way after the upgrade (#69145, #69059, #69208, #67803, #64909)
  8. misc problems that make the upgrade difficult (#69051, #68467, #67090, #59946)

Here is a list with all the identified upgrade bugs so far: https://launchpad.net/distros/ubuntu/+bugs?field.tag=edgy-upgrade

Here is a list with all the identified upgrade regressions: https://launchpad.net/distros/ubuntu/+bugs?field.tag=edgy-regression

The data we have so far indicates that most of the problems are caused by failures in maintainer scripts. This means that apt/update-manager needs to be better prepared for those failures and should try to ignore these errors as much as possible.

Another source of problems was changes in the environment during the upgrade itself. Those are the hardest to protect against. If, for example, the theme engine is suddenly broken for a certain amount of time, this can cause crashes in the upgrader during the process.

The current dist-upgrader on the alternate CD is not able to update itself from the net. It should do this when it finds a network connection.

Design

The above mentioned problems should be attacked by the following means:

  1. We should probably mention that apt-get dist-upgrade will calculate an upgrade, but that this upgrade will most likely not be a good one because, for example, ubuntu-desktop can't be upgraded. The upgrader should also have a fallback mode in which it asks whether it should try to force the upgrade by using pinning/downgrades. The way the upgrade is calculated should also be rethought. The current dist-upgrade tries to upgrade all installed packages. We should instead upgrade all installed packages from main first, then lock this selection and try to upgrade the rest.

  2. Find as many postinst problems as possible with automatic testing (see AutoDistUpgradeTesting). If we still have failing postinsts then those should be logged as problems but libapt should try to continue as long as possible (and make sure that the dist-upgrader can use selected backports during the upgrade)

  3. We should spend time to have a recovery mode (vesa/vga) in X (this is covered by the BulletProofX spec)
  4. Get as much real-world testing as possible (asking users to test the live-cd maybe?); otherwise there is not a lot we can do about hardware regressions.
  5. We should probably force the upgrader to switch to a theme that we have tested (human). We also need to redesign the current architecture of the ReleaseUpgrader so that the actual upgrade process runs in a separate process with its own pty. If it detects that the GUI frontend went away for some reason, it will fall back to the text terminal and present a message that the upgrade is still running, so that the user does not press reset. The pty needs to be sent across to the frontend. vte_terminal_set_pty() is used to interact with the dpkg terminal output.

  6. Testing
  7. Testing
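
Item 5 above asks for the upgrade to run in a separate process with its own pty. As a rough sketch of that idea only (not the ReleaseUpgrader code), the following Python runs a command on a freshly allocated pty and copies everything it prints into a log file, so the output survives even if a frontend goes away. The function name run_in_pty and the log file argument are assumptions made for this illustration; in the real design the master side of the pty would be handed to the frontend's terminal widget.

```python
import os
import pty
import subprocess
import sys


def run_in_pty(argv, log_path):
    """Run argv on its own pty, mirror its output to log_path.

    Returns (exit_code, captured_output_bytes). Hypothetical helper,
    not part of update-manager.
    """
    master_fd, slave_fd = pty.openpty()
    # The child gets the slave side as its terminal.
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd,
                            stderr=slave_fd, close_fds=True)
    os.close(slave_fd)  # only the child keeps the slave side open

    chunks = []
    with open(log_path, "wb") as log:
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                # On Linux the read fails once the child has closed the pty.
                break
            if not data:
                break
            log.write(data)
            log.flush()  # keep the log current in case we die
            chunks.append(data)
    os.close(master_fd)
    return proc.wait(), b"".join(chunks)


if __name__ == "__main__":
    code, output = run_in_pty([sys.executable, "-c", "print('configuring')"],
                              os.devnull)
    print(code, output)
```

Because the backend owns the master side of the pty, a text fallback could read from the same descriptor if the GUI cannot be restarted.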

Improve the upgrade calculation

We should test a new algorithm for the ReleaseUpgrade calculation. It should work like this:

  1. Upgrade all essential packages
  2. Upgrade all packages in main and set them to protected
  3. Force problem resolution on them. Because no packages in main depend on packages outside main, this set should be self-contained.
  4. Do the same for unsupported packages and make sure that we do not interfere with main

Recover from 3rd party stuff

Currently we do not offer a fallback if we can't do a dist-upgrade and still keep the {ubuntu,kubuntu,edubuntu,xubuntu}-desktop installed. This can happen when 3rd party packages are installed (e.g. for dapper->edgy when compiz was enabled). Instead of just showing an error message we should offer a mode that will create a high pin on the Ubuntu archive to force downgrades. This should ensure that we get only official packages. Because downgrades are not a good idea in general, we will only do this as a last resort and print a big warning to the user.
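
A high pin of this kind could be generated along the following lines. The sketch only builds the text of an apt preferences stanza; the function name force_official_pin and the default origin "Ubuntu" are assumptions for illustration, and where the upgrader would write the stanza is left open. It relies on the documented apt behaviour that a pin priority above 1000 allows a version to be installed even when that is a downgrade.

```python
def force_official_pin(origin="Ubuntu", priority=1001):
    """Build an apt preferences stanza that prefers one archive origin.

    Hypothetical helper: a priority above 1000 is required, because only
    then will apt consider downgrading to the pinned version.
    """
    if priority <= 1000:
        raise ValueError("priority must be above 1000 to permit downgrades")
    return (
        "Package: *\n"
        "Pin: release o={origin}\n"
        "Pin-Priority: {priority}\n"
    ).format(origin=origin, priority=priority)


if __name__ == "__main__":
    print(force_official_pin())
```

The stanza should only be put in place after the user has confirmed the warning, and removed again once the upgrade has finished.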

Fixing the error handling

The error handling for failed maintainer scripts needs to be improved. Currently apt will stop after dpkg reports an error. It should instead report this error to the frontend and keep going with the upgrade until there are only broken packages left. This requires changes in libapt. A new option, APT::DPkgPM::StopOnErrors, will be used to control the behaviour.
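
The intended behaviour can be modelled with a small simulation. This is a sketch of the control flow only and not libapt code: run_upgrade, its arguments and the stop_on_errors flag (standing in for the proposed APT::DPkgPM::StopOnErrors option) are assumptions, and the package list is assumed to be already in dependency order.

```python
def run_upgrade(packages, depends, configure, stop_on_errors=False):
    """Configure packages in order and keep going after failures.

    packages:  names in dependency order (assumed)
    depends:   dict mapping a name to the names it depends on
    configure: callable(name) -> bool, True on success
    Returns (done, failed, skipped).
    """
    done, failed, skipped = [], [], []
    for name in packages:
        # A package cannot be configured if something it needs is broken.
        if any(dep in failed or dep in skipped
               for dep in depends.get(name, ())):
            skipped.append(name)
            continue
        if configure(name):
            done.append(name)
        else:
            failed.append(name)  # would be reported to the frontend
            if stop_on_errors:
                break            # current behaviour: stop at first error
    return done, failed, skipped


if __name__ == "__main__":
    order = ["python2.4", "python-foo", "python-foo-extras", "firefox"]
    deps = {"python-foo-extras": ["python-foo"]}
    print(run_upgrade(order, deps, lambda name: name != "python-foo"))
```

With the flag off, only the failing package and the packages that depend on it are left unconfigured; everything else is still upgraded.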

Fixing the environment changes problem

The problem of the changes in the environment needs to be attacked from two directions. Firstly, we need to make sure that we run with a known working environment as far as possible. This involves switching the theme before the upgrade (and switching back after).

We also need to make sure that even if the GUI crashes during the upgrade we can recover better from it. This means that we need to keep all state of the upgrade in a separate process that won't die if the frontend dies. All communication between frontend and backend is done via a socket and a very simple protocol (modeled after the debconf protocol) that can set progress information and the current state. During the upgrade we need to copy the input/output of dpkg so that we can still present all progress in a vte gtk widget (or the equivalent for qt). We use vte_terminal_set_pty() to interact with the running dpkg. If the GUI goes away, the backend can try to restart it and (if that does not work) fall back to a text-based UI to ensure that the upgrade is actually fully performed.
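
To make the frontend/backend split more concrete, here is a sketch of what a line-based protocol over a socket might look like. The spec does not define the commands; PROGRESS and STATE, the encode/decode helpers and the demo function are all invented for this illustration, loosely following the one-command-per-line style of debconf.

```python
import socket


def encode(command, *args):
    """Serialise one message as a single text line (hypothetical format)."""
    words = [command.upper()] + [str(arg) for arg in args]
    return (" ".join(words) + "\n").encode("utf-8")


def decode(line):
    """Parse one line into a tuple; raises ValueError on unknown commands."""
    parts = line.decode("utf-8").rstrip("\n").split(" ", 2)
    command = parts[0]
    if command == "PROGRESS" and len(parts) >= 2:
        text = parts[2] if len(parts) > 2 else ""
        return ("PROGRESS", int(parts[1]), text)
    if command == "STATE" and len(parts) >= 2:
        return ("STATE", parts[1])
    raise ValueError("unknown or malformed message: %r" % line)


def demo():
    """Send two messages from a fake backend to a fake frontend."""
    backend, frontend = socket.socketpair()
    with backend, frontend:
        backend.sendall(encode("STATE", "installing"))
        backend.sendall(encode("PROGRESS", 42, "Configuring", "xserver-xorg"))
        backend.shutdown(socket.SHUT_WR)  # signal end of stream
        with frontend.makefile("rb") as reader:
            return [decode(line) for line in reader]


if __name__ == "__main__":
    print(demo())
```

Since the backend only writes short lines and keeps the upgrade state itself, a restarted frontend (or a text fallback) could reconnect and pick up from the next message.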

Implementation

The existing update-manager release-upgrader code needs to be modified to implement the above requirements.

Testing

In addition to the auto-testing we need to cover the following test-scenarios:

  1. ubuntu-desktop install with incompatible packages. We should try to model the situation with compiz where xserver-xorg couldn't be upgraded.

BoF agenda and discussion

Maybe the update process could be simplified (or at least the chances of making a dist-upgrade work could be improved) by first trying to create a default system, upgrading, and then trying to restore all the customizations. Of course, the user has to be informed about what is going on and asked whether he wants each action to be performed. There should also be a recommended action and a warning that the upgrade might fail should anything else be selected. I imagine something like this:

1) Check for modified config-files. If some are found, copy them to a new folder in the user's /home-directory (e.g. /home/user/updatebackup) and restore the defaults. If possible, list the differences.

2) Check for 3rd-party/unsupported apps and repositories. The repositories are already checked. The apps could be checked against the old Ubuntu version's repositories. If they are not supported (this also includes newer versions), make a list of these and then remove them. If they have generated config-files in folders other than the /home-folders, make a backup.

3) Reboot, if necessary.

4) Upgrade (change repositories, download files, install apps).

5) Try to restore the removed apps. Apps should first be checked against the official repositories. Maybe they are included by now. If not, tell and warn the user, and restore the removed repositories. If they include the name of the older version, rename them (e.g. "edgy" to "feisty"). Then try to install the apps again. If nothing works, write a list of the apps concerned, place it in the user's /home-folder and tell the user.

6) Try to restore the config-files. If the apps that are now installed had config-files, restore them (after backing up the defaults). The user was already warned when installing the apps. If other config-files were changed, make a backup of the new default (best named file.ubuntuname.default, e.g. xorg.conf.feisty.default), warn the user, show him the differences and ask him to confirm each restoration.

7) Make a list of all changes. Write a list of all the changes to the default settings (apps and config-files) and place it in the user's /home-folder.

8) If need be, reboot.
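
Step 1 of this proposal could look roughly like the sketch below. It assumes that the checksum of each packaged default is known (dpkg records an MD5 sum for every conffile) and is passed in as a plain dictionary; the function names, the flat naming of the backup copies and the demo are all made up for illustration.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path):
    """Return the hex MD5 digest of a file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def backup_modified(packaged_md5, backup_dir):
    """Copy every config file that differs from its packaged default.

    packaged_md5 maps a file path to the MD5 of the shipped version.
    Returns the list of modified paths (hypothetical helper).
    """
    os.makedirs(backup_dir, exist_ok=True)
    modified = []
    for path, packaged in sorted(packaged_md5.items()):
        if not os.path.isfile(path):
            continue
        if md5_of(path) != packaged:
            # Flatten the path into one file name inside the backup folder.
            flat = path.lstrip(os.sep).replace(os.sep, "_")
            shutil.copy2(path, os.path.join(backup_dir, flat))
            modified.append(path)
    return modified


def demo():
    """Create two config files, edit one, and back up only that one."""
    root = tempfile.mkdtemp()
    untouched = os.path.join(root, "untouched.conf")
    edited = os.path.join(root, "xorg.conf")
    for path in (untouched, edited):
        with open(path, "w") as handle:
            handle.write("default\n")
    packaged = {untouched: md5_of(untouched), edited: md5_of(edited)}
    with open(edited, "a") as handle:
        handle.write("local change\n")
    backup_dir = os.path.join(root, "updatebackup")
    modified = backup_modified(packaged, backup_dir)
    return [os.path.basename(p) for p in modified], sorted(os.listdir(backup_dir))


if __name__ == "__main__":
    print(demo())
```

The returned list of modified paths is also what steps 6 and 7 would need in order to offer each restoration and to write the final summary for the user.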

I consider the upgrade itself the priority, so keeping all the old apps comes second. Like this the upgrade should work while (hopefully) giving the user back his old system. If some apps or config-files could not be restored, the user at least has a list and can decide whether he still needs those. E.g. I didn't need all the modifications to my xorg.conf in dapper to make beryl work in edgy. It's better to give a user a working system he needs to fine-tune than a broken system he probably has to wipe.

--Sokraates


CategorySpec