FullFilesystemSanityGutsy

Differences between revisions 4 and 5
Revision 4 as of 2007-06-04 17:14:25
Size: 5773
Editor: chiark
Comment: xref
Revision 5 as of 2007-06-04 17:56:35
Size: 6367
Editor: dyn233046
Comment: Comment about daemons

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

The desktop system should handle disk full situations gracefully.

Use Case

Michael is an Ubuntu user, and downloads huge numbers of files and fills his disk. This should only prevent him saving further files to the disk, and may cause some applications that he starts to fail. It should not stop the general desktop from operating normally.

If Michael reboots, the computer should boot up normally and allow him to login. It should warn him that he has no free disk space. The general desktop should still operate normally; however applications may prevent him from saving files, or may not function.

If he uses the file manager to free up disk space, he should be able to immediately resume using his computer normally: saving files from applications, and using those applications that cannot operate without writing to disk. He should not need to reboot, log in again or restart any application.

Rationale

Current disk full behaviour in Ubuntu is very poor. BootLoginWithFullFilesystem (also targeted for gutsy) will fix the worst problems and at least allow the user to recover, but we would like the system not to misbehave and not to

Approach and Scope

This is an ambitious goal. Without a fully automatic functional test of every component in the system, and comprehensive agreement and support from all upstreams, complete success will be unattainable and future regressions will occur and be difficult to detect. However, we expect to be able to find and fix the most important cases and expect them to regress only slowly:

We will concentrate on core software (defined here as software which is installed by default and which runs between power-on and the user's desktop readiness including the file manager). We hope that filesystem full misbehaviour we discover and fix will be regarded as bugs by upstream and regress relatively rarely.

(Here "misbehaviour" and "bug" refer to violations of the intent of the Use Case, specified above.)

We propose to create a test environment which will allow us to monitor the software under test during startup and use. This will identify the specific programs and subsystems which attempted to write to disk but failed to do so; we will then apply ad-hoc techniques (including debugging, ad-hoc testing and source code inspection) to ensure that each such identified case does not constitute or trigger a bug.

Other applications

As time permits, the same tests can be performed on specific other applications of interest. OpenOffice.org is an obvious candidate.

Firefox (and other programs in the Mozilla suite) are not expected to be tractable: the Mozilla profile management system is known to regularly write to disk and not to handle out of disk space conditions well. The profile management system itself is unsuitable for handling disk full in a coherent way, and upstream do not seem to have the resources or priorities which would allow these problems to stay fixed for any length of time.

Assumptions

If we find and fix a disk full bug in the software we really care about, it is unlikely to regress so quickly that the manual work done pursuant to this spec quickly becomes irrelevant.

Design

The current plan is to use an ad-hoc kernel modification which logs all writes which return ENOSPC in a special area of memory which can then be read out later with a suitable tool (probably using /dev/mem directly).

It may turn out that this is impractical (eg, the volume of data is too large), in which case other approaches will be considered.
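To make the idea of recording ENOSPC failures concrete, here is a minimal userspace sketch in Python. It is not the kernel modification described above: it only covers writes that go through a hypothetical wrapper, logged_write, and it keeps its records in a hypothetical in-memory list, ENOSPC_LOG, whereas the planned kernel log would cover every process on the system. The record fields are likewise assumptions chosen for illustration.

```python
import errno
import os
import time

# Hypothetical in-memory log; the spec's kernel log would live in a
# reserved area of memory instead.
ENOSPC_LOG = []


def logged_write(fd, data, write=os.write):
    """Write data to fd; record the attempt if it fails with ENOSPC.

    `write` is injectable so that the failure path can be exercised
    without actually filling a disk.
    """
    try:
        return write(fd, data)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # Illustrative record layout, not taken from the spec.
            ENOSPC_LOG.append({
                "time": time.time(),
                "pid": os.getpid(),
                "fd": fd,
                "bytes": len(data),
            })
        raise  # the caller still sees the failure, exactly as without logging
```

On Linux, writes to /dev/full always fail with ENOSPC, so opening that device and passing its descriptor to logged_write is one way to try the failure path on a machine whose disk is not actually full.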

For each failed write discovered in this way, it will be determined in an ad-hoc manner whether or not the failure has broken the software in question or whether it starts working normally again after the failure. These determinations may involve:

  • Testing the programs' functionality after the disk full condition has ceased
  • Inspection of the source
  • Ad-hoc tests and per-program test harnesses
  • Argument based on the program's function and observed behaviour during the test

as appears to be appropriate in each case. If in fact it turns out that there is a bug in the program, this bug will be fixed if practical.

A record will be kept of each failed write which is discovered, and what the resolution was (ie, why it was concluded that there is no problem, or the bug number(s) of fixed bugs, etc) and these reports will be tabulated in a suitable format on a wiki page or similar.
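One possible shape for such a record is sketched below in Python. The spec does not define a record layout, so the FailedWrite class, its field names and the column headings are all assumptions made for illustration; the output uses MoinMoin's double-bar table markup so that it could be pasted into a wiki page.

```python
from dataclasses import dataclass


@dataclass
class FailedWrite:
    # All field names are illustrative; the spec does not fix a layout.
    program: str
    path: str
    resolution: str          # why there is no problem, or "fixed"
    bug_numbers: tuple = ()  # numbers of fixed bugs, if any


def to_wiki_table(records):
    """Render records as a MoinMoin-style table, one row per failed write."""
    lines = ["||'''Program'''||'''Path'''||'''Resolution'''||'''Bugs'''||"]
    for r in records:
        # Show "-" when no bug was filed for this failed write.
        bugs = ", ".join(f"#{n}" for n in r.bug_numbers) or "-"
        lines.append(f"||{r.program}||{r.path}||{r.resolution}||{bugs}||")
    return "\n".join(lines)
```

Calling to_wiki_table on a list of FailedWrite objects returns a header row followed by one row per record.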

Release Note

  • Ubuntu's core programs were tested in a disk full situation.
    • The following programs' behaviour was improved: details TBD

Test/Demo Plan

This specification mainly consists of testing. By its nature, such complex and ad-hoc tests are difficult to reproduce.

For bugs that are reported as having been fixed (see above), it will be possible to demonstrate that the fix has taken as follows:

  • Fill up the disk (as root)
  • Reboot and log in
  • Run the application in question
  • Make the disk no longer full
  • Start the application again if it failed to start before
  • Observe that the application works normally (if relevant, observe that the specific functionality which was stated to be broken after disk full is now working properly)
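When following these steps it helps to confirm that the disk really is full before the reboot, and really has space again afterwards. The Python sketch below is one way to check this on a POSIX system using os.statvfs; the helper names free_bytes and is_full and the min_free parameter are assumptions, not part of this spec. It reports free space separately for root and for ordinary users because some filesystems (ext3, for example) reserve blocks for root, which is a likely reason the first step says to fill the disk as root.

```python
import os


def free_bytes(path="/"):
    """Return (free_for_root, free_for_users) in bytes for path's filesystem."""
    st = os.statvfs(path)
    # f_bfree counts all free blocks; f_bavail only those available to
    # unprivileged processes. Both are in units of f_frsize.
    return st.f_bfree * st.f_frsize, st.f_bavail * st.f_frsize


def is_full(path="/", for_root=True, min_free=0):
    """True if the free space seen by root (or by users) is <= min_free bytes."""
    root_free, user_free = free_bytes(path)
    free = root_free if for_root else user_free
    return free <= min_free
```

For example, is_full("/", for_root=False) would be expected to return True after the first step and False after the disk has been made no longer full.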

Outstanding Issues

iwj needs some reassurance and pointers from kernel developers.

BoF and Discussion

["Warbo"]: As well as applications I think this should eventually include daemons which may run at bootup too, especially ones which download things (MLDonkey, BOINC, etc. if they don't behave sanely already), since they may fill up any freed space immediately (P2P transfers can easily work until zero bytes are left, then pause until more space becomes available, then eat that up too, for example). I know this spec is just for default applications for now, but it is important to remember that many daemons start up at boot and could cause some headaches. :)


CategorySpec

FullFilesystemSanityGutsy (last edited 2008-08-06 16:23:24 by localhost)