DistUpgradeProcessImprovements


Summary

The dapper->edgy upgrade went badly for a lot of people. This spec tries to identify what caused the problems and what we can do to fix them.

Rationale

Currently there are situations that can make the dist-upgrade fail. In the worst case, the system becomes unbootable or X won't start. We need to make sure that even when errors happen during the upgrade, the system still boots and X still works.

Use cases

1. Alice heard that Ubuntu is a great distro. She runs a script she found in the forums that automatically installs multimedia software. When she later upgrades, the upgrader detects the changes made by that script and works around the issue.

2. Bob has installed some Python modules manually. When he upgrades, a Python package's postinst fails because of this. The upgrade continues: only the affected package is reported as problematic, and everything else is installed correctly.

Scope

There are various ways to attack the problem. One is AutomaticUpgradeTesting, to find errors early and automatically. Next, we need to make sure that packages or postinst scripts with errors cannot trash the system (to the extent that this is possible). An option to test or roll back an upgrade would be good as well, but this is technically very challenging. We should also add an option to automatically (or semi-automatically) send in problem reports when the upgrade fails, using apport if feasible.

Design

The following things in the ReleaseUpgrader need to be improved:

  1. Upgrade calculation
  2. Recovery from problems caused by third-party packages
  3. Error handling during the upgrade (maintainer scripts)
  4. Deal with a changing environment (themes/libraries) during the upgrade

Improve the upgrade calculation

We should test a new algorithm for the ReleaseUpgrade calculation. It should work like this:

  1. Upgrade all essential packages
  2. Upgrade all packages in main and set them to protected
  3. Force problem resolution on them. Because no packages in main depend on packages outside main, this set should be self-contained.
  4. Do the same for unsupported packages and make sure that we do not interfere with main
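The four steps above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the upgrader's code. The `Package` record, its `pins_old` field (packages that a given package needs kept at their old version) and the policy of removing a non-main package that would otherwise hold main back are all hypothetical. A real implementation would operate on apt's dependency cache and problem resolver instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical, simplified view of an installed package (not python-apt)."""
    name: str
    component: str                      # "main", "universe", "third-party", ...
    essential: bool = False
    has_upgrade: bool = True            # a newer candidate exists in the new release
    pins_old: frozenset = frozenset()   # packages this one needs kept at the old version


def calculate_upgrade(packages):
    """Return (to_upgrade, to_remove) using the staged calculation."""
    upgrade, remove, protected = [], [], set()

    # Steps 1 and 2: essential packages first, then the rest of main.
    # Every package handled here is protected from later changes.
    essential = [p for p in packages if p.essential]
    main = [p for p in packages if p.component == "main" and not p.essential]
    for p in essential + main:
        protected.add(p.name)
        if p.has_upgrade:
            upgrade.append(p.name)

    # Step 3: main is assumed to be self-contained, so no protected package
    # may block the upgrade of another protected package.
    upgraded_protected = set(upgrade)
    for p in essential + main:
        if p.pins_old & upgraded_protected:
            raise RuntimeError(f"main is not self-contained: {p.name}")

    # Step 4: everything else must not interfere with main. In this model a
    # package that needs an old version of an upgraded protected package is
    # removed rather than being allowed to hold main back.
    for p in packages:
        if p.name in protected:
            continue
        if p.pins_old & upgraded_protected:
            remove.append(p.name)
        elif p.has_upgrade:
            upgrade.append(p.name)

    return upgrade, remove


# Hypothetical system resembling the compiz case from dapper->edgy.
pkgs = [
    Package("dpkg", "main", essential=True),
    Package("xserver-xorg", "main"),
    Package("compiz-unofficial", "third-party", has_upgrade=False,
            pins_old=frozenset({"xserver-xorg"})),
    Package("vlc", "universe"),
]
print(calculate_upgrade(pkgs))
# (['dpkg', 'xserver-xorg', 'vlc'], ['compiz-unofficial'])
```

Because main is resolved and locked first, a conflicting third-party package can only ever affect itself, never ubuntu-desktop.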

Recover from problems caused by third party packages

Currently we do not offer a fallback if we can't do a dist-upgrade and still keep {ubuntu,kubuntu,edubuntu,xubuntu}-desktop installed. This can happen when third-party packages are installed (e.g. for dapper->edgy when compiz was enabled). Instead of just showing an error message, we should offer a mode that creates a high-priority pin on the Ubuntu archive to force downgrades. This should ensure that we get only official packages. Because downgrades are generally not a good idea, we will only do this as a last resort and show a big warning to the user.
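APT only lets a pin force a downgrade when the pin priority is greater than 1000. Below is a minimal sketch of the last-resort mode. It assumes the fallback is implemented as an extra APT preferences stanza and is guarded by an explicit user confirmation. The function names and the file location passed to them are hypothetical.

```python
from pathlib import Path


def downgrade_pin(origin="Ubuntu", priority=1001):
    """Render an APT preferences stanza that prefers every package from `origin`.

    APT installs a pinned version even if it is a downgrade only when the
    priority is greater than 1000.
    """
    if priority <= 1000:
        raise ValueError("downgrades require a pin priority above 1000")
    return (
        "Package: *\n"
        f"Pin: release o={origin}\n"
        f"Pin-Priority: {priority}\n"
    )


def enable_fallback(path, confirmed):
    """Write the pin only after the user has accepted the downgrade warning."""
    if not confirmed:
        return False
    Path(path).write_text(downgrade_pin())
    return True


print(downgrade_pin())
```

Such a pin should be removed again once the upgrade has finished. Otherwise it would keep preferring archive versions over anything else on every later upgrade.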

Fixing the error handling

The error handling for failed maintainer scripts needs to be improved. Currently apt stops after dpkg reports an error. It should instead report the error to the frontend and keep going with the upgrade until only broken packages are left. This requires changes in libapt. A new configuration option, APT::DPkgPM::StopOnErrors, will be used to control this behaviour.
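The intended "keep going" behaviour can be simulated in a few lines. This is a sketch, not libapt code. The `stop_on_errors` parameter stands in for APT::DPkgPM::StopOnErrors, and the package names are hypothetical. It assumes packages are processed in dependency order and that a package whose dependency is broken cannot be configured either.

```python
def configure_all(order, depends, failing, stop_on_errors=False):
    """Simulate configuring packages in dependency order.

    order:          package names, dependencies before dependents
    depends:        {package: [packages it depends on]}
    failing:        packages whose postinst exits non-zero (simulated)
    stop_on_errors: mimics APT::DPkgPM::StopOnErrors=true (the old behaviour)
    Returns (configured, broken), where broken maps package -> reason.
    """
    configured, broken = [], {}
    for pkg in order:
        blocked = [dep for dep in depends.get(pkg, []) if dep in broken]
        if blocked:
            # A package whose dependency is broken cannot be configured.
            broken[pkg] = "dependency problem: " + ", ".join(blocked)
            continue
        if pkg in failing:
            broken[pkg] = "maintainer script failed"
            if stop_on_errors:
                break
            continue
        configured.append(pkg)
    return configured, broken


order = ["python2.4", "python-foo", "python-bar", "firefox"]
depends = {"python-bar": ["python-foo"]}
print(configure_all(order, depends, failing={"python-foo"}))
# (['python2.4', 'firefox'], {'python-foo': 'maintainer script failed', 'python-bar': 'dependency problem: python-foo'})
```

This matches Bob's use case. With the error tolerated, only python-foo and the package that depends on it are reported, and firefox is still configured.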

Fixing the environment changes problem

The problem of changes in the environment needs to be attacked from two directions. First, we need to make sure that, as far as possible, the upgrader runs in a known working environment. This involves switching the theme before the upgrade (and switching back afterwards).
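Switching the theme and back fits naturally into a context manager, because the user's theme is restored even if the upgrade raises an error. This is a sketch that assumes hypothetical `get_theme`/`set_theme` callables; in a desktop session these would read and write the real theme setting. The default "Human" is the tested theme named later in this spec.

```python
from contextlib import contextmanager


@contextmanager
def known_theme(get_theme, set_theme, safe_theme="Human"):
    """Run the body with `safe_theme` active, then restore the user's theme."""
    previous = get_theme()
    set_theme(safe_theme)
    try:
        yield previous
    finally:
        # Restore the user's choice even if the upgrade failed.
        set_theme(previous)


settings = {"theme": "SomeThirdPartyTheme"}   # hypothetical settings store
with known_theme(lambda: settings["theme"],
                 lambda name: settings.__setitem__("theme", name)):
    print("upgrading with theme", settings["theme"])  # upgrading with theme Human
print("restored theme", settings["theme"])            # restored theme SomeThirdPartyTheme
```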

We also need to make sure that we can recover better if the GUI crashes during the upgrade. This means keeping all upgrade state in a separate process that won't die if the frontend dies. All communication between frontend and backend is done via a socket and a very simple protocol (modeled after the debconf protocol) that can set progress information and the current state. During the upgrade we need to copy the input/output of dpkg so that we can still present all progress in a VTE GTK widget (or the Qt equivalent); vte_terminal_set_pty() is used to interact with the running dpkg. If the GUI goes away, the backend can try to restart it and, if that does not work, fall back to a text-based UI to ensure that the upgrade is actually performed completely.
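The spec does not define the actual messages. Purely as an illustration, assume a line-based protocol in the spirit of debconf with two hypothetical commands: `PROGRESS <percent> <text>` and `STATE <name>`. The sketch shows the backend writing messages over a socket and the frontend folding them into its view state.

```python
import socket


def send(sock, command, text):
    """Backend side: send one 'COMMAND text' line."""
    if "\n" in text:
        raise ValueError("messages are single lines")
    sock.sendall(f"{command} {text}\n".encode("utf-8"))


def apply_messages(sock, state):
    """Frontend side: read lines until the backend closes, updating `state`."""
    with sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            command, _, text = line.rstrip("\n").partition(" ")
            if command == "PROGRESS":
                percent, _, label = text.partition(" ")
                state["percent"], state["label"] = int(percent), label
            elif command == "STATE":
                state["state"] = text
            else:
                raise ValueError(f"unknown command {command!r}")
    return state


backend, frontend = socket.socketpair()
send(backend, "STATE", "download")
send(backend, "PROGRESS", "40 Fetching packages")
send(backend, "STATE", "install")
backend.close()
state = apply_messages(frontend, {})
frontend.close()
print(state)  # {'state': 'install', 'percent': 40, 'label': 'Fetching packages'}
```

Because the state lives in the backend and the protocol is stateless text, a restarted frontend (or a text UI) can be attached without losing track of the upgrade.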

Better testing

Similar to the new requirements from StableReleaseUpdates, we need a way to test the ReleaseUpgrader in $dist-proposed. To do this, we will add a new "--proposed" switch to update-manager that makes it look for a meta-release-proposed file. Only users who explicitly run update-manager with this switch will get the ReleaseUpgrader from proposed.
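Here is a minimal sketch of the switch, using argparse. The base URL is an assumption about where the meta-release files are published and is not taken from this spec. The only point shown is that the proposed file is used solely when the user opts in.

```python
import argparse

# Assumed location of the meta-release files; not taken from this spec.
META_RELEASE_BASE = "http://changelogs.ubuntu.com/"


def meta_release_uri(argv=None):
    """Pick the meta-release file; --proposed opts in to the staged upgrader."""
    parser = argparse.ArgumentParser(prog="update-manager")
    parser.add_argument(
        "--proposed", action="store_true",
        help="use the release upgrader from $dist-proposed (for testers)")
    args = parser.parse_args(argv)
    name = "meta-release-proposed" if args.proposed else "meta-release"
    return META_RELEASE_BASE + name


print(meta_release_uri([]))              # http://changelogs.ubuntu.com/meta-release
print(meta_release_uri(["--proposed"]))  # http://changelogs.ubuntu.com/meta-release-proposed
```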

CDRom upgrade

If an upgrade is run from the CD and the user chooses to use the network, the upgrader needs to check for a newer version of itself and download it. This way, problems found after the CD has been released can still be fixed.
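The decision can be sketched as follows. The assumptions are that `fetch_latest_version` is a hypothetical callable that queries the network, and that versions are plain tuples. A real implementation would compare Debian version strings using dpkg's rules. Any network failure falls back to the copy on the CD, so the CD alone is always enough to upgrade.

```python
def pick_upgrader(cd_version, network_allowed, fetch_latest_version):
    """Return ("cd", version) or ("network", version).

    fetch_latest_version: hypothetical callable returning the newest upgrader
    version available online (as a tuple), or None if none is published.
    """
    if not network_allowed:
        return ("cd", cd_version)
    try:
        latest = fetch_latest_version()
    except OSError:
        latest = None  # network trouble: use the copy on the CD
    if latest is not None and latest > cd_version:
        return ("network", latest)
    return ("cd", cd_version)


print(pick_upgrader((0, 47), True, lambda: (0, 48)))  # ('network', (0, 48))
```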

Dealing with derivatives

To better support Kubuntu, Xubuntu and Edubuntu, we need to support different localizable announcements for them. The announcement will be chosen based on the installed metapackage.
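Choosing the announcement could look like the sketch below. The metapackage-to-flavour table, the `Announcement.<lang>` file naming and the fallback order are all hypothetical. Derivative metapackages are checked before ubuntu-desktop because a system may have several installed.

```python
# Hypothetical mapping; derivatives come before ubuntu-desktop.
FLAVOURS = [
    ("edubuntu-desktop", "edubuntu"),
    ("kubuntu-desktop", "kubuntu"),
    ("xubuntu-desktop", "xubuntu"),
    ("ubuntu-desktop", "ubuntu"),
]


def announcement_for(installed, lang, available):
    """Choose the announcement file for the installed flavour and language.

    installed: set of installed package names
    lang:      locale such as "pt_BR.UTF-8" or "de"
    available: set of announcement files shipped with the upgrader
    """
    flavour = next((f for meta, f in FLAVOURS if meta in installed), "ubuntu")
    lang = lang.split(".")[0]            # drop any encoding suffix
    base = lang.split("_")[0]            # language without country
    candidates = [
        f"{flavour}/Announcement.{lang}",
        f"{flavour}/Announcement.{base}",
        f"{flavour}/Announcement",       # untranslated flavour text
        "ubuntu/Announcement",           # generic fallback
    ]
    return next((c for c in candidates if c in available), None)


available = {"kubuntu/Announcement.pt", "kubuntu/Announcement", "ubuntu/Announcement"}
print(announcement_for({"kubuntu-desktop", "ubuntu-desktop"}, "pt_BR.UTF-8", available))
# kubuntu/Announcement.pt
```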

Implementation

The existing update-manager release-upgrader code needs to be modified to implement the above requirements.

Testing

In addition to the auto-testing we need to cover the following test scenarios:

  1. ubuntu-desktop install with incompatible packages. We should try to model the situation with compiz where xserver-xorg couldn't be upgraded.

Analysing the problems dapper->edgy

The following problems were reported in Launchpad for the dapper->edgy upgrade:

  1. upgrade could not be calculated (e.g. with unofficial compiz: #58424)
  2. {pre,post}inst failures (e.g. firestarter, python-$foo: #56779, #59932, #64615, #67913, #67996, #68378, #69019, #69104, #59347, #63450, #66347, #67368, #67559, #67696, #68177, #66702, #68765)
  3. X didn't come up (#67069)
  4. kernel wouldn't boot (#68848); other hardware regressions (#62628)
  5. upgrader crashes because of environment changes (e.g. theme changes: #68027, #69124)
  6. upgrader crashes because of programming errors (#68553)
  7. system behaves differently in a fundamental way after the upgrade (#69145, #69059, #69208, #67803, #64909)
  8. misc problems that make the upgrade difficult (#69051, #68467, #67090, #59946)

Here is a list of all the upgrade bugs identified so far: https://launchpad.net/distros/ubuntu/+bugs?field.tag=edgy-upgrade

Here is a list of all the upgrade regressions identified so far: https://launchpad.net/distros/ubuntu/+bugs?field.tag=edgy-regression

The data we have so far indicates that most of the problems are caused by failures in maintainer scripts. This means that apt/update-manager need to be better prepared for those and should try to carry on despite these errors as far as possible.

Another source of problems was the changes in the environment during the upgrade itself. These are the hardest to protect against. If, e.g., the theme engine is broken for a while during the upgrade, this can crash the upgrader.

The current dist-upgrader on the alternate CD is not able to update itself from the net. It should do this when it finds a network connection.

The above mentioned problems should be attacked by the following means:

  1. We should probably point out that apt-get dist-upgrade will calculate an upgrade, but that this upgrade will most likely not be a good one because, e.g., ubuntu-desktop can't be upgraded. The upgrader may also offer a fallback mode for experts, which must be enabled explicitly, in which it asks whether it should try to force the upgrade by using pinning/downgrades. The way the upgrade is calculated should also be rethought. The current dist-upgrade tries to upgrade all installed packages at once. We should instead upgrade all installed packages from main first, then lock this selection and try to upgrade the rest.

  2. Find as many postinst problems as possible with automatic testing (see AutoDistUpgradeTesting). If we still have failing postinsts, those should be logged as problems, but libapt should try to continue as long as possible (and we should make sure that the dist-upgrader can use selected backports during the upgrade).

  3. We should spend time on a recovery mode (vesa/vga) in X (this is covered by the BulletProofX spec).
  4. Get as much real-world testing as possible (maybe by asking users to test the live CD); beyond that, there is not much we can do about hardware regressions.
  5. We should probably force the upgrader to switch itself to a theme that we have tested (Human). We also need to redesign the current architecture of the ReleaseUpgrader so that the actual upgrade runs in a separate process with its own pty. If that process detects that the GUI frontend went away for some reason, it will fall back to the text terminal and show a message that the upgrade is still running, so that the user does not press reset. The pty needs to be sent across to the frontend, which uses vte_terminal_set_pty() to interact with the dpkg terminal output.

  6. Testing
  7. Testing
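One way to send the pty across, as described in item 5 above, is to pass the pty master descriptor over a Unix domain socket (SCM_RIGHTS). The sketch below uses Python 3.9+'s `socket.send_fds`/`socket.recv_fds`. In a real frontend, the received descriptor is what would be handed to the VTE widget through vte_terminal_set_pty(). The one-word `PTY` message is an assumption.

```python
import os
import socket


def send_pty(sock, master_fd):
    """Backend: pass the pty master to the frontend over a Unix socket."""
    socket.send_fds(sock, [b"PTY"], [master_fd])


def receive_pty(sock):
    """Frontend: receive the pty master as a new file descriptor."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    if msg != b"PTY" or len(fds) != 1:
        raise RuntimeError("expected exactly one pty descriptor")
    return fds[0]


backend, frontend = socket.socketpair()   # AF_UNIX on Linux
master, slave = os.openpty()              # dpkg would run attached to `slave`
send_pty(backend, master)
received = receive_pty(frontend)

os.write(slave, b"Setting up foo ...\n")  # output as dpkg would produce it
print(os.read(received, 1024))            # the frontend sees the same output
```

Since the backend keeps its own copy of the master descriptor, it can go on reading dpkg's output (and fall back to the text terminal) if the frontend dies.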

BoF agenda and discussion

Maybe the update process could be simplified (or at least the chances of making a dist-upgrade work could be improved) by first trying to create a default system, upgrading, and then trying to restore all the customizations. Of course, the user has to be informed about what is going on and asked whether he wants each action to be performed. There should also be a recommended action and a warning that the upgrade might fail if anything else is selected. I imagine something like this:

1) Check for modified config-files. If some are found, copy them to a new folder in the user's home directory (e.g. /home/user/updatebackup) and restore the defaults. If possible, list the differences.

2) Check for 3rd-party/unsupported apps and repositories. The repositories are already checked. The apps could be checked against the old Ubuntu version's repositories. If they are not supported (this also includes newer versions), make a list of them and then remove them. If they have generated config-files in folders other than /home, make a backup.

3) Reboot, if necessary.

4) Upgrade (change repositories, download files, install apps)

5) Try to restore the removed apps. Apps should first be checked against the official repositories; maybe they are included by now. If not, tell and warn the user, and restore the removed repositories. If a repository includes the name of the older version, rename it (e.g. "edgy" to "feisty"). Then try to install the apps again. If nothing works, write a list of the apps concerned, place it in the user's home folder and tell the user.

6) Try to restore the config-files. If the now installed apps had config-files, restore them (after backing up the defaults); the user was already warned when installing the apps. If other config-files were changed, make a backup of the new default (best named file.ubuntuname.default, e.g. xorg.conf.feisty.default), warn the user, show him the differences and ask him to confirm each restoration.

7) Make a list of all changes. Write a list of all the changes to the default settings (apps and config-files) and place it in the user's home folder.

8) If need be, reboot.
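For step 1, dpkg already records the md5sum of every packaged configuration file in the Conffiles field of /var/lib/dpkg/status. This means modified files can be found without diffing against the old packages. Below is a minimal sketch of that check. The function names are hypothetical, and it only covers files that dpkg tracks as conffiles, not files generated by maintainer scripts.

```python
import hashlib


def parse_conffiles(stanza):
    """Return {path: md5} from the Conffiles field of one dpkg status stanza."""
    conffiles, in_field = {}, False
    for line in stanza.splitlines():
        if line.startswith("Conffiles:"):
            in_field = True
        elif in_field and line.startswith(" "):
            # Continuation line: " /etc/path md5sum [obsolete]"
            parts = line.split()
            if len(parts) >= 2:
                conffiles[parts[0]] = parts[1]
        else:
            in_field = False
    return conffiles


def modified_conffiles(conffiles):
    """Return the conffiles whose content differs from the packaged version."""
    changed = []
    for path, expected in sorted(conffiles.items()):
        try:
            with open(path, "rb") as f:
                actual = hashlib.md5(f.read()).hexdigest()
        except FileNotFoundError:
            actual = None  # a deleted conffile also counts as a local change
        if actual != expected:
            changed.append(path)
    return changed


stanza = """Package: example
Status: install ok installed
Conffiles:
 /etc/example/example.conf d41d8cd98f00b204e9800998ecf8427e
 /etc/example/old.conf 9e107d9d372bb6826bd81d3542a419d6 obsolete
Description: hypothetical package
"""
print(parse_conffiles(stanza))
```

The paths returned by `modified_conffiles` are the ones that would be copied to the backup folder before the defaults are restored.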

I consider the upgrade itself the priority; keeping all the old apps comes second. This way the upgrade should work while (hopefully) giving the user back his old system. If some apps or config-files could not be restored, the user at least has a list and can decide whether he still needs them. E.g. I didn't need all the modifications to my xorg.conf in dapper to make beryl work in edgy. It's better to give a user a working system he needs to fine-tune than a broken system he probably has to wipe.

--Sokraates



DistUpgradeProcessImprovements (last edited 2008-08-06 16:36:24 by localhost)