DistUpgradeProcessImprovements

Differences between revisions 15 and 35 (spanning 20 versions)
Revision 15 as of 2006-11-02 11:20:56
Size: 4682
Editor: p54A66D4A
Comment:
Revision 35 as of 2008-08-06 16:36:24
Size: 12648
Editor: localhost
Comment: converted to 1.6 markup

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

The upgrade experience from dapper->edgy was not good for a lot of people. This spec tries to identify what caused the problems and what we can do to fix them.

Rationale

Currently there are situations that can make the dist-upgrade fail. In the worst case, the system becomes unbootable or X won't start. We need to make sure that even when errors happen during the upgrade, the system is still bootable and X still works.

Use cases

1. Alice heard that Ubuntu is a great distro. She runs a script from the forums that automatically installs multimedia software. When she upgrades later, the upgrader detects this and works around the issue.

2. Bob has installed some Python modules manually. When he upgrades, the postinst script of a Python package fails because of this. The upgrade continues: only the affected package is reported as problematic, and the rest are installed correctly.

Scope

There are various ways to attack the problem. One is AutomaticUpgradeTesting, to find errors early and automatically. Next, we need to make sure that packages/postinst scripts with errors cannot trash the system (to the extent that this is possible). An option to test/roll back an upgrade would be good as well, but this is technically very challenging. We should add an option to automatically (or semi-automatically) send in problem reports when the upgrade fails, using apport for this if feasible.

Design

The following things in the ReleaseUpgrader need to be improved:

  1. Upgrade calculation
  2. Recover from third party packages
  3. Error handling during the upgrade (maintainer scripts)
  4. Deal with a changing environment (themes/libraries) during the upgrade
  5. Better SRU support
  6. Miscellaneous improvements

Improve the upgrade calculation

We should test a new algorithm for the ReleaseUpgrade calculation. It should work like this:

  1. Upgrade all essential packages
  2. Upgrade all packages in main and set them to protected
  3. Force problem resolution on them. Because no packages in main depend on packages outside main, this set should be self-contained.
  4. Do the same for unsupported packages and make sure that we do not interfere with main
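The four steps above can be sketched in Python. This is a minimal, self-contained model, not update-manager code. The `Package` record, its `breaks` field and `plan_upgrade()` are hypothetical. "Problem resolution" is reduced to checking that main only depends on protected packages. In the real upgrader this step would presumably go through libapt's problem resolver, which python-apt exposes as `apt_pkg.ProblemResolver`. The sample data loosely mirrors the compiz case: a third-party package that would force a change in `xserver-xorg` is held back instead of disturbing main.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Toy package record; these fields are hypothetical, not python-apt's API."""
    name: str
    component: str                             # e.g. "main", "universe", "third-party"
    essential: bool = False
    depends: set = field(default_factory=set)  # packages it needs
    breaks: set = field(default_factory=set)   # packages its new version would force to change


def plan_upgrade(packages):
    """Split an upgrade into the proposed phases.

    Returns (phases, held_back): phases is a list of (label, names) tuples,
    held_back lists packages left alone so that main is not disturbed.
    """
    by_name = {p.name: p for p in packages}
    protected = set()
    phases = []

    # 1. Upgrade all essential packages.
    essential = sorted(p.name for p in packages if p.essential)
    phases.append(("essential", essential))
    protected.update(essential)

    # 2. Upgrade everything else in main and mark it protected.
    main = sorted(p.name for p in packages
                  if p.component == "main" and p.name not in protected)
    phases.append(("main", main))
    protected.update(main)

    # 3. Problem resolution (simplified): main must be self-contained.
    for name in main:
        outside = by_name[name].depends - protected
        if outside:
            raise ValueError(f"{name} needs packages outside main: {sorted(outside)}")

    # 4. Upgrade the rest, holding back anything that would touch main.
    rest, held_back = [], []
    for pkg in sorted(packages, key=lambda item: item.name):
        if pkg.name in protected:
            continue
        (held_back if pkg.breaks & protected else rest).append(pkg.name)
    phases.append(("unsupported", rest))
    return phases, held_back


pkgs = [
    Package("dpkg", "main", essential=True),
    Package("xserver-xorg", "main", depends={"dpkg"}),
    Package("ubuntu-desktop", "main", depends={"xserver-xorg"}),
    Package("compiz", "third-party", depends={"xserver-xorg"}, breaks={"xserver-xorg"}),
    Package("vlc", "universe", depends={"dpkg"}),
]
phases, held_back = plan_upgrade(pkgs)
```

Here `vlc` is upgraded in the last phase, while `compiz` ends up in `held_back` because upgrading it would change a protected package.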

Recover from problems caused by third party packages

Currently we do not offer a fallback if we can't do a dist-upgrade and still keep {ubuntu,kubuntu,edubuntu,xubuntu}-desktop installed. This can happen when third party packages are installed (e.g. for dapper->edgy when compiz was enabled). Instead of just showing an error message, we should offer a mode that creates a high pin on the Ubuntu archive to force downgrades. This should ensure that we get only official packages. Because downgrades are not a good idea in general, we will only do this as a last resort and show a big warning to the user.

Implementation note: A high pin seems rather risky, because the downgrades may break the system even further.
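For reference, a high pin is expressed in apt_preferences(5) format. A priority above 1000 is what allows apt to install a version even when that means a downgrade. The helper below only builds such a stanza as text. Its name, the `o=Ubuntu` origin match and the default priority of 1001 are illustrative assumptions. Writing the stanza into apt's preferences and removing it again afterwards are left out.

```python
def high_pin_stanza(origin="Ubuntu", priority=1001):
    """Return an apt_preferences(5) stanza pinning every package from `origin`.

    A priority above 1000 lets apt install the pinned version even if that is
    a downgrade, which is why this mode is meant only as a last resort.
    """
    if priority <= 1000:
        raise ValueError("priority must exceed 1000 to allow downgrades")
    return (
        "Package: *\n"
        f"Pin: release o={origin}\n"
        f"Pin-Priority: {priority}\n"
    )
```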

Fixing the error handling

The error handling for failed maintainer scripts needs to be improved. Currently apt stops after dpkg reports an error. It should instead report the error to the frontend and keep going with the upgrade until only broken packages are left. This requires changes in libapt. A new option, APT::DPkgPM::StopOnErrors, will control this behaviour.

Implementation note: This is implemented in the current apt in feisty. Because the release-upgrader does not support backports right now (see Implementation for more details), I would like to get this code into edgy-updates. It does not change any behaviour by default and only acts if the option is explicitly set, so the risk is low.

Implementation note: Because we do not have proper backports support this should be done as a patch to apt in edgy-proposed.
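The intended keep-going behaviour can be simulated in a few lines of Python. This is a hypothetical model, not libapt. `configure_all()` and its `configure` callable stand in for dpkg running maintainer scripts. The `stop_on_errors` flag mimics what the spec describes for APT::DPkgPM::StopOnErrors (stopping stays the default), not that option's actual implementation. The sample data follows Bob's use case.

```python
def configure_all(order, depends, configure, stop_on_errors=True):
    """Run maintainer scripts in order, optionally continuing after failures.

    order     -- package names in the order dpkg would configure them
    depends   -- dict: package name -> names that must be configured first
    configure -- callable running the postinst; raises RuntimeError on failure
    Returns (configured, broken) where broken maps a name to its error.
    """
    configured, broken = [], {}
    for name in order:
        failed_deps = [d for d in depends.get(name, ()) if d in broken]
        if failed_deps:
            # Cannot configure on top of a broken dependency: report it, go on.
            broken[name] = "dependency failed: " + ", ".join(failed_deps)
            continue
        try:
            configure(name)
        except RuntimeError as exc:
            broken[name] = str(exc)
            if stop_on_errors:
                break  # current behaviour: abort at the first error
            continue
        configured.append(name)
    return configured, broken


# Bob's case: a manually installed module makes one postinst fail.
def fake_configure(name):
    if name == "python-foo":
        raise RuntimeError("postinst returned error exit status 1")


order = ["python2.4", "python-foo", "python-bar", "gnome-panel"]
deps = {"python-foo": ["python2.4"], "python-bar": ["python-foo"]}
configured, broken = configure_all(order, deps, fake_configure, stop_on_errors=False)
```

With `stop_on_errors=False`, `gnome-panel` is still configured. Only `python-foo` and the package that depends on it are reported as broken.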

Fixing the environment changes problem

The problem of the changes in the environment needs to be attacked from two directions. First, we need to make sure that we run in a known working environment as far as possible. This involves switching to a tested theme before the upgrade (and switching back afterwards).
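Switching to a known theme and reliably switching back fits a Python context manager, because `finally` restores the user's setting even if the upgrade raises. The `settings` dictionary, the `"gtk-theme"` key and the default `"Human"` theme are assumptions standing in for the real desktop configuration.

```python
from contextlib import contextmanager


@contextmanager
def known_theme(settings, safe_theme="Human"):
    """Use a tested theme for the duration of the block, then restore the old one.

    `settings` is a hypothetical stand-in for the desktop configuration store.
    """
    previous = settings.get("gtk-theme")
    settings["gtk-theme"] = safe_theme
    try:
        yield
    finally:
        # Restore the user's choice even if the upgrade failed.
        if previous is None:
            settings.pop("gtk-theme", None)
        else:
            settings["gtk-theme"] = previous
```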

We also need to make sure that even if the GUI crashes during the upgrade, we can recover from it. This means that all state of the upgrade has to live in a separate process that won't die if the frontend dies. All communication between frontend and backend goes over a socket, using a very simple protocol (modeled after the debconf protocol) that can set progress information and the current state. During the upgrade we need to copy the input/output of dpkg so that we can still present all progress in a vte GTK widget (or the Qt equivalent). We use vte_terminal_set_pty() to interact with the running dpkg. If the GUI goes away, the backend can try to restart it and, if that does not work, fall back to a text-based UI to ensure that the upgrade is actually fully performed.
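A debconf-style protocol sends one command word per line, followed by its arguments. The sketch below assumes two hypothetical commands, `STATE <name>` and `PROGRESS <percent> <text>`. The command names and the `UpgradeStatus` fields are illustrative, not the actual wire format used by update-manager.

```python
from dataclasses import dataclass


@dataclass
class UpgradeStatus:
    """What the frontend needs to redraw itself (hypothetical fields)."""
    state: str = "init"
    percent: float = 0.0
    text: str = ""


def encode(command, payload=""):
    """Build one protocol line: an upper-case command word, a space, the payload."""
    if not command.isupper() or " " in command:
        raise ValueError("command must be a single upper-case word")
    if "\n" in payload:
        raise ValueError("payload must fit on one line")
    return f"{command} {payload}".rstrip().encode("utf-8") + b"\n"


def handle_line(status, line):
    """Update the frontend's view of the upgrade from one backend line."""
    command, _, payload = line.decode("utf-8").rstrip("\n").partition(" ")
    if command == "STATE":
        status.state = payload
    elif command == "PROGRESS":
        percent, _, text = payload.partition(" ")
        status.percent, status.text = float(percent), text
    else:
        raise ValueError(f"unknown command: {command}")
    return status


status = UpgradeStatus()
for line in (encode("STATE", "install"),
             encode("PROGRESS", "42.5 Unpacking xserver-xorg")):
    handle_line(status, line)
# status is now UpgradeStatus(state='install', percent=42.5, text='Unpacking xserver-xorg')
```

Because the state lives in the backend, a restarted frontend could presumably be brought up to date by having the backend resend its current STATE and PROGRESS lines.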

Implementation note: Python does not currently support passing file descriptors over a pipe. See Implementation for more details. For the next release, update-manager contains an "fdsend" module that supports passing file descriptors over sockets.
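Handing the dpkg pty from backend to frontend relies on `SCM_RIGHTS` ancillary data over a Unix domain socket. Python 3.3 and later support this directly through `socket.sendmsg()` and `socket.recvmsg()`; the Python of the time lacked both, hence fdsend. Below is a minimal, Unix-only sketch in which `demo()` plays both sides in one process. The function names are illustrative.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor as SCM_RIGHTS ancillary data (Unix only)."""
    fds = array.array("i", [fd])
    # Send one byte of ordinary data along with the ancillary data.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds.tobytes())])


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd(); returns a new fd."""
    fds = array.array("i")
    _data, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, kind, payload in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            usable = len(payload) - len(payload) % fds.itemsize
            fds.frombytes(payload[:usable])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo():
    """Backend opens a pty for dpkg and hands the master side to the frontend."""
    backend, frontend = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    master, slave = os.openpty()
    try:
        send_fd(backend, master)
        received = recv_fd(frontend)  # new fd referring to the same pty master
        try:
            os.write(slave, b"Setting up foo ...\n")  # stands in for dpkg output
            return os.read(received, 1024)
        finally:
            os.close(received)
    finally:
        os.close(master)
        os.close(slave)
        backend.close()
        frontend.close()
```

In the design described above, the frontend would pass the received descriptor to vte_terminal_set_pty() (or konsole's setPtyFd()) instead of reading it itself.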

Better SRU support

Similar to the new requirements from StableReleaseUpdates we need a way to test the ReleaseUpgrader in $dist-proposed. To do this we will add a new switch to update-manager "--proposed" that will make it look for a meta-release-proposed file. Only users who explicitly run update-manager with this switch will get the ReleaseUpgrader from proposed.

Implementation note: Done; the --proposed switch has been added.
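The opt-in logic of `--proposed` amounts to choosing a different meta-release file. Below is a sketch using argparse. The base URL is a placeholder, and update-manager's actual option parsing and download code are not shown.

```python
import argparse

# Placeholder location; the real meta-release URL is configured elsewhere.
META_RELEASE_BASE = "https://example.invalid/"


def meta_release_url(argv=None):
    """Pick the meta-release file: only an explicit --proposed opts in."""
    parser = argparse.ArgumentParser(prog="update-manager")
    parser.add_argument("--proposed", action="store_true",
                        help="offer the release upgrader from $dist-proposed")
    args, _unknown = parser.parse_known_args(argv)
    name = "meta-release-proposed" if args.proposed else "meta-release"
    return META_RELEASE_BASE + name
```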

CDRom upgrade

If an upgrade is run from the CD and the user chooses to use the network, the upgrader needs to check for a newer version of itself and download it. This way, problems found after the CD was released can still be fixed.

Implementation note: Done. When the user chooses to use the network, the upgrader checks for an updated version of itself and downloads it.
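The self-update decision for the CD case can be sketched as follows. The CD copy remains the fallback whenever the network is not used or fails. `fetch_remote_version` is a hypothetical callable. `_simple_key()` compares only the numeric parts of a version string. Real code should use Debian version comparison instead, for example python-apt's `apt_pkg.version_compare`.

```python
import re


def _simple_key(version):
    """Crude ordering key from the numeric runs only (NOT Debian semantics)."""
    return [int(n) for n in re.findall(r"\d+", version)]


def choose_upgrader(cd_version, use_network, fetch_remote_version):
    """Return "network" if a newer upgrader should be downloaded, else "cd".

    fetch_remote_version is a hypothetical callable returning the version
    string published online (or None), and raising OSError when offline.
    """
    if not use_network:
        return "cd"
    try:
        remote = fetch_remote_version()
    except OSError:
        return "cd"  # network trouble: keep using the copy on the CD
    if remote is not None and _simple_key(remote) > _simple_key(cd_version):
        return "network"
    return "cd"
```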

Dealing with derivatives

In order to better support Kubuntu, Xubuntu and Edubuntu, we need to support different localizable announcements for them. This will be done based on the installed metapackage.
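Selecting the announcement by installed metapackage can be a simple ordered lookup. The metapackage-to-file mapping and the `ReleaseAnnouncement` file names below are hypothetical. The language suffix stands in for whatever localization scheme the upgrader actually uses. If several derivative metapackages are installed, the first match in the mapping wins.

```python
# Hypothetical mapping from derivative metapackage to announcement file.
ANNOUNCEMENTS = {
    "kubuntu-desktop": "ReleaseAnnouncement.kubuntu",
    "xubuntu-desktop": "ReleaseAnnouncement.xubuntu",
    "edubuntu-desktop": "ReleaseAnnouncement.edubuntu",
}
DEFAULT_ANNOUNCEMENT = "ReleaseAnnouncement"


def announcement_for(installed, lang=None):
    """Return the announcement file to show for a set of installed package names."""
    name = next((f for meta, f in ANNOUNCEMENTS.items() if meta in installed),
                DEFAULT_ANNOUNCEMENT)
    # Localized variants are assumed to carry a language suffix, e.g. ".de".
    return f"{name}.{lang}" if lang else name
```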

Implementation

The existing update-manager release-upgrader code needs to be modified to implement the above requirements.

There are some limitations that we are currently facing. The current release-upgrader is arch=all.

That makes a separation between frontend and backend impossible, because Python cannot send file descriptors over a pipe without an arch=any module (e.g. fdsend). The https://blueprints.launchpad.net/ubuntu/+spec/dist-upgrader-arch-any spec discusses some ways to make this possible. A working prototype of the frontend/backend separation is at http://bazaar.launchpad.net/~mvo/update-manager/gui-seperation. For full operation, the backend must be able to send the pty attached to dpkg's terminal to the frontend. This pty is then attached to the vte terminal widget or the konsole widget. I recently patched the konsole kpart to support setPtyFd().

The new APT::DPkgPM::StopOnErrors option faces the same problem: to use it, libapt needs to be upgraded, which is currently not possible except in a very hackish way.

Testing

In addition to the auto-testing we need to cover the following test scenarios:

  1. ubuntu-desktop install with incompatible packages. We should try to model the situation with compiz where xserver-xorg couldn't be upgraded.

Analysing the problems dapper->edgy

The following problems were reported in Launchpad for the dapper->edgy upgrade:

  1. upgrade could not be calculated (e.g. with unofficial compiz: #58424)
  2. {pre,post}inst failures (e.g. firestarter, python-$foo: #56779, #59932, #64615, #67913, #67996, #68378, #69019, #69104, #59347, #63450, #66347, #67368, #67559, #67696, #68177, #66702, #68765)
  3. X didn't come up (#67069)
  4. kernel wouldn't boot (#68848); other hardware regressions (#62628)
  5. upgrader crashes because of environment changes (e.g. theme changes: #68027, #69124)
  6. upgrader crashes because of programming errors (#68553)
  7. system behaves differently in a fundamental way after the upgrade (#69145, #69059, #69208, #67803, #64909)
  8. misc problems that make the upgrade difficult (#69051, #68467, #67090, #59946)

Here is a list with all the identified upgrade bugs so far:
https://launchpad.net/distros/ubuntu/+bugs?field.tag=edgy-upgrade

Here is a list with all the identified upgrade regressions:
https://launchpad.net/distros/ubuntu/+bugs?field.tag=edgy-regression

The data we have so far indicates that most of the problems are caused by failures in maintainer scripts. This means that apt/update-manager need to be better prepared for those and should try to ignore these errors as far as possible.

Another source of problems was changes in the environment during the upgrade itself. Those are the hardest to protect against. If, for example, the theme engine is temporarily broken, the upgrader can crash in the middle of the process.

The current dist-upgrader on the alternate CD is not able to update itself from the net. It should do this when it finds a network connection.

All of the above problems are addressed by this spec.

User comments

Maybe the update process could be simplified (or at least the chances of making a dist-upgrade work could be improved) by first trying to create a default system, upgrading, and then trying to restore all the customizations. Of course, the user has to be informed about what is going on and asked whether he wants an action to be performed. There should also be a recommended action and a warning that the upgrade might fail if anything else is selected. I imagine something like this:

1) Check for modified config-files
If some are found, copy them to a new folder in the user's home directory (e.g. /home/user/updatebackup) and restore the defaults. If possible, list the differences.

2) Check for 3rd-party/unsupported apps and repositories
The repositories are already checked. The apps could be checked against the old Ubuntu version's repositories. If they are not supported (this also includes newer versions), make a list of these and then remove them. If they have generated config-files in folders other than /home, make a backup.

3) Reboot, if necessary.

4) Upgrade (change repositories, download files, install apps)

5) Try to restore the removed apps
Apps should first be checked against the official repositories; maybe they are included by now. If not, tell and warn the user, and restore the removed repositories. If they include the name of the older version, rename them (e.g. "edgy" to "feisty"). Then try to install the apps again. If nothing works, write a list of the apps concerned, place it in the user's home folder and tell the user.

6) Try to restore the config-files
If now-installed apps had config-files, restore them (after backing up the defaults). The user was already warned when installing the apps. If other config-files were changed, make a backup of the new default (best named file.ubuntuname.default, e.g. xorg.conf.feisty.default), warn the user, show him the differences and ask him to confirm each restoration.

7) Make a list of all changes
Write a list of all the changes to the default settings (apps and config-files) and place it in the user's home folder.

8) If need be, reboot

I consider the upgrade itself the priority, so keeping all the old apps comes second. This way the upgrade should work while (hopefully) giving the user back his old system. If some apps or config-files could not be restored, the user at least has a list and can decide whether he still needs them. E.g. I didn't need all the modifications to my xorg.conf in dapper to make beryl work in edgy. It's better to give a user a working system he needs to fine-tune than a broken system he probably has to wipe.

--Sokraates
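Step 1 of the comment above could build on data that dpkg already keeps. The `Conffiles:` field in `/var/lib/dpkg/status` lists each configuration file together with the md5sum dpkg recorded when it installed that file. The sketch below reports conffiles that have changed on disk. It does not back them up or restore defaults. The `root` parameter exists only so the function can be tried against a scratch directory. The function name is illustrative.

```python
import hashlib
from pathlib import Path


def modified_conffiles(status_text, root=Path("/")):
    """List conffiles whose current md5sum differs from the one dpkg recorded.

    status_text is the content of /var/lib/dpkg/status; root lets the check
    run against a scratch directory instead of the live system.
    """
    modified = []
    in_conffiles = False
    for line in status_text.splitlines():
        if line.startswith("Conffiles:"):
            in_conffiles = True
            continue
        if in_conffiles and line.startswith(" "):
            path, recorded_md5 = line.split()[:2]  # a third field may say "obsolete"
            target = root / path.lstrip("/")
            if target.is_file() and hashlib.md5(target.read_bytes()).hexdigest() != recorded_md5:
                modified.append(path)
            continue
        in_conffiles = False  # any other line ends the Conffiles field
    return modified


# Typical use on a live system:
# modified_conffiles(Path("/var/lib/dpkg/status").read_text())
```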


CategorySpec

DistUpgradeProcessImprovements (last edited 2008-08-06 16:36:24 by localhost)