SystemCleanUpTool

Summary

This specification describes a computer housekeeping tool. The tool will offer the user several ways to keep the system from becoming too full, cluttered and confusing to use over time, while requiring as little user intervention as possible. The result should be a running system that stays tidy, easy and enjoyable to use.

Rationale

In due course, a once fresh system can become cluttered with all sorts of residual content, such as too many installed kernels, log files taking up precious system disk space, contents of the Trash folder, package and browser caches, and various large and small user files such as audiovisual content, aging documents, old chat logs and more. These accumulate over time, confusing the user or eventually even leaving the computer unusable, which forces users to put active effort into cleaning it up. There should be a solution that warns users beforehand, and suggests and performs purge operations for common leftover cruft.

Use cases

  • George has been receiving lots of audiovisual content recently from relatives overseas. He has been burning these files to DVDs, and putting each file in the garbage bin after writing it to DVD. After a while, the free space in his home filesystem has dropped to the minimum allowed, but he does not notice. He is also downloading a big ISO image of the edgy desktop CD for testing. The system clean up tool detects that there is not enough free space, and pops up a desktop notification bubble, suggesting that there is a large amount of data in .Trash that can be purged to make more room. George acknowledges; space is freed and the download is saved.
  • High Priority: John is a Dapper user. A kernel upgrade has recently been released to fix a security bug. After the new kernel has been installed, and before he reboots to use it, the system informs him that he has some left over kernel packages that could be removed and asks him if he wants to remove them. If he acknowledges, all of the kernels that are no longer needed are removed, leaving him with a clean /boot containing only the currently running kernel and the newly installed one. After he reboots to use the new kernel, he can always revert to the previous one that was kept, if for some reason the new kernel is faulty.

  • Brian is a Launchpad developer. He has several Zope instances installed for developing Launchpad, and runs the PostgreSQL database server. These applications produce a lot of log file data, especially when used for heavy development and experimentation; in this case they are usually set to maximum verbosity for debugging purposes. After a week of heavy work, the free space on his root filesystem reaches the low minimum. The system clean up tool detects the problem, and before Brian runs into operational problems, it offers to delete some old logs and some files in /tmp that occupy most of the currently used space. He confirms the removal, and resumes his work. The tool takes care to do the removal in the background, hunting for the biggest files first, in order to keep the system operational and free up space in the shortest possible time. The interaction with Brian is done through the desktop notification infrastructure.

Scope

  • Kernel left overs:
    1. Due to security upgrades.
    2. General bug fixes and version upgrades.
    3. Make sure never to touch a kernel package created by the user.
      • -- How can we know if it's such a package?
  • Residual packaging related content:
    1. Conffiles.
    2. Init scripts.
  • Packaging system leftovers:
    1. Contents of /var/cache/apt/archives.
    2. Orphaned files.
  • General left over content:
    1. Web / File Browser caches. (e.g. ~/.thumbnails)
    2. Aged audiovisual content.
    3. Aged and/or large log files.
    4. Large ISO files.
    5. Content of /tmp
    6. Content of /var/log

Design

Aging: The time period that has passed between the last time a file was accessed and the recorded time reference point. The reference point can be the current date and time reported by the system, or an earlier time in order to enable a more accurate aging calculation when a system has not been used over a long period of time.
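The aging definition above can be sketched in a few lines of Python. This is only an illustration: the function name file_age_seconds and its reference_time parameter are hypothetical, and the sketch assumes the file system actually records access times (mounts using noatime or relatime may not update them on every read).

```python
import os
import time


def file_age_seconds(path, reference_time=None):
    """Return the seconds between a file's last access and the reference point.

    reference_time is a Unix timestamp; when omitted, the current system
    time is used. Passing an earlier timestamp gives the "earlier
    reference point" behaviour described in the definition of aging.
    """
    if reference_time is None:
        reference_time = time.time()
    last_access = os.stat(path).st_atime
    # A file accessed after the reference point is treated as having no age.
    return max(0.0, reference_time - last_access)
```

Clamping at zero is a design choice of this sketch: with an earlier reference point, files touched since then would otherwise get a negative age.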

1. Dealing with general left over content:

A weighting algorithm needs to be developed to enable the tool to identify targets of opportunity. The following factors need to be taken into consideration in producing the weight result per file:

  1. A relative time reference point should be used for measuring the aging time of all files. This is in order to overcome the "vacation problem": when a user has not been using his system for a long period of time, using the current time at his first login after the vacation would distort the weighting so that it includes files the user accessed just before he went on vacation. This means that we need to measure the actual usage time. To do so, we will record the last access time of files that are accessed at every login (for example, gdm files) and use the time stamp of this_login-1 as our new reference point.
  2. How much time has passed since the file was last accessed.
  3. MIME / file type.
  4. Size of the file.
  5. Capacity of the holding volume or the user set quota.
  6. In order not to affect system performance too obtrusively, consideration should be given to adding the aging measurement code to the periodic updatedb process. updatedb already affects system performance a great deal when it runs, but is still supported in Ubuntu. Either as a stand alone approach or combined with the previous one, we should hold the calculation and scanning process until the system becomes idle, and build it so that it does its processing in incremental chunks, e.g. progressing a bit further each time the system is idle until all files / folders in the file system designated for clean up are covered. We should also make sure to use the fastest system call available to retrieve the file data we need for aging and opportunity measurement. If that can only be done in C, then we would rather code it in C and provide Python bindings to access it.
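The incremental scanning idea in the last item can be sketched with a Python generator, so that the caller decides when to ask for the next chunk (for instance, each time the system is found idle). The names iter_file_records and scan_in_chunks and the default chunk size are assumptions made for illustration; idle detection itself is not shown.

```python
import os
import stat
from itertools import islice


def iter_file_records(root):
    """Lazily yield (path, size_bytes, atime) for regular files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # the file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                yield path, st.st_size, st.st_atime


def scan_in_chunks(root, chunk_size=500):
    """Yield lists of at most chunk_size records.

    No further files are examined until the caller requests the next
    chunk, so the scan can be paused between chunks.
    """
    records = iter_file_records(root)
    while True:
        chunk = list(islice(records, chunk_size))
        if not chunk:
            return
        yield chunk
```

A caller would pull one chunk with next() whenever the system is idle and stop pulling when it becomes busy; the generator keeps its position in the directory walk between calls.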

2. Package left over house keeping:

  1. Offer to remove orphaned files that no longer belong to any of the packages installed on the system. Certain system configuration files that are created during installation and must not be removed should also be automatically detected and added to the blacklist. We will achieve this by gathering a list of those files, and feeding it to the shipped blacklist.
  2. Offer to remove packages that are rarely used or no longer used, and that consume a substantial amount of disk space.
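The orphaned-file check in the first item above amounts to a set difference filtered by the blacklist. A minimal sketch, assuming the list of packaged files has already been collected (on a Debian-based system it could be gathered from the per-package file lists under /var/lib/dpkg/info); the function name find_orphans and the use of glob-style blacklist patterns are assumptions, not part of the spec.

```python
from fnmatch import fnmatchcase


def find_orphans(files_on_disk, packaged_files, blacklist_patterns):
    """Return files that no package owns and that are not blacklisted.

    files_on_disk      -- iterable of absolute paths found by scanning
    packaged_files     -- iterable of paths owned by installed packages
    blacklist_patterns -- glob patterns for files that must never be offered
    """
    owned = set(packaged_files)
    orphans = []
    for path in files_on_disk:
        if path in owned:
            continue  # still belongs to an installed package
        if any(fnmatchcase(path, pattern) for pattern in blacklist_patterns):
            continue  # protected system file, e.g. created at install time
        orphans.append(path)
    return sorted(orphans)
```

For example, with /etc/fstab and /etc/ssh/* on the blacklist, host keys and the file system table would never be proposed for removal even though no package lists them.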

3. Unused left over kernels:

  1. When a new kernel is being installed by the high level packaging tools (apt, synaptic), the installing packaging tool will call the system clean up tool with a command line that instructs it to deal only with kernel clean up on that specific invocation. We should try to make the callback intelligent enough to detect whether it can use an X GUI or must fall back to a text UI, to cater for people using this tool on systems that do not have X/GNOME installed.
  2. In order not to make the clean up tool mandatory on one's system, any call made to start it should first check whether the executable file to be called exists. Any calling tool should gracefully ignore its absence and continue as usual.
  3. If it is installed, the system clean up tool starts and marks the packages of:
    1. The currently running kernel. This kernel is used as a reference point (it is already in use and running), as we will be marking for removal the older kernels that were installed previously, excluding:
      1. Kernels installed manually (e.g. using dpkg -i ..),
      2. The reference point kernel package (i.e. the currently running kernel package version),
      3. The newly installed kernel package (i.e. the new kernel package just downloaded and installed).
    2. If the currently running kernel was in fact installed manually, then this logic is still valid.
  4. Check whether the user has any kernels installed other than those marked for keeping in the previous items.
  5. If he does not, do nothing.
  6. If he does, gather a list of all those kernel packages.
  7. Pop up a desktop notification to the user: "You have unused kernels installed on the system. Would you like to purge them?".
  8. If the user confirms, present a dialog displaying the list of kernels constructed in the previous steps, in a window containing a columned table in which each row represents a kernel package:
    • Column 1: Kernel Version. (e.g. "2.6.15-25-686")
    • Column 2: Kernel Package Name. (e.g. "linux-image-2.6.15-25-686").
    • Column 3: A check box indicating whether this kernel package is to be removed or left installed. (checked->keep, unchecked->remove)

  9. All items in the list are unchecked by default (meaning that the tool will remove all kernels on the remove-list).
  10. The user can choose to keep any of the kernels on the list by checking the checkbox next to the name/version.
  11. Pressing "Commit" will work out from the check list which kernel packages are to be removed, remove them, and notify the user of success, or of failure if any issues were encountered during removal.
  12. Left over kernel removal should also be proposed when a user is running out of free space on his /boot fs, or on his / when /boot is part of it. (This should be the last priority if /boot and / are on the same fs; we should first check for other, bigger files that can be removed.)
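The selection logic in the steps above (keep the running kernel, the newly installed kernel and any manually installed kernels; offer the rest, then honour the user's checkboxes) can be sketched as two small functions. The function and parameter names are hypothetical, and the sketch assumes the caller already knows which packages are installed and which were installed manually; how that is determined is left open by the spec.

```python
def kernels_to_remove(installed, running, newly_installed, manually_installed=()):
    """Return the kernel packages that may be offered for removal.

    installed          -- all installed kernel package names
    running            -- package of the currently running kernel (reference point)
    newly_installed    -- package of the kernel that was just installed
    manually_installed -- packages installed by hand (e.g. with dpkg -i)
    """
    keep = {running, newly_installed} | set(manually_installed)
    return sorted(pkg for pkg in installed if pkg not in keep)


def apply_checkboxes(candidates, checked):
    """Drop the candidates the user ticked: checked means keep."""
    kept = set(checked)
    return [pkg for pkg in candidates if pkg not in kept]
```

If kernels_to_remove returns an empty list, the tool does nothing; otherwise the list feeds the notification and the dialog, and apply_checkboxes yields what "Commit" actually removes.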

4. Modes of operation:

  1. When executed as an unprivileged user, the tool should touch only the running user's home directory.
  2. When executed as a privileged user (e.g. via sudo), the tool should take care of system wide cleanup (kernels, system folders, etc.).
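The two modes above can be told apart by the effective user ID. The following is a minimal sketch; the function name cleanup_scope and the shape of its return value are assumptions, and the system-wide directories listed are simply those named in the Scope section.

```python
import os


def cleanup_scope(euid=None, home=None):
    """Return the directories the tool may touch for the given user.

    euid and home default to the values of the running process; they
    are parameters only so the decision can be exercised in isolation.
    """
    if euid is None:
        euid = os.geteuid()
    if euid == 0:
        # Privileged run: system wide cleanup.
        return {
            "mode": "system",
            "roots": ["/boot", "/tmp", "/var/log", "/var/cache/apt/archives"],
        }
    if home is None:
        home = os.path.expanduser("~")
    # Unprivileged run: only the running user's home directory.
    return {"mode": "user", "roots": [home]}
```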

5. Catching disk space events:

  1. The tool should use gnome-volume-manager to catch low disk space events. If g-v-m does not offer the flexibility to dispatch notifications according to the user's quota or a custom minimum free space setting, we should implement our own daemon for monitoring the amount of free space on a given file system.
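If a custom monitoring daemon turns out to be needed, its core check could look like the sketch below. It uses shutil.disk_usage from the Python standard library; the function names and both threshold parameters are hypothetical. Note that this only looks at file system capacity: per-user quotas are not visible through this call and would need a separate query.

```python
import shutil


def low_space(total, free, min_free_bytes=0, min_free_fraction=0.0):
    """Return True when free space is below either configured minimum."""
    below_absolute = free < min_free_bytes
    below_fraction = total > 0 and (free / total) < min_free_fraction
    return bool(below_absolute or below_fraction)


def check_path(path, min_free_bytes=0, min_free_fraction=0.0):
    """Apply low_space() to the file system that holds path."""
    usage = shutil.disk_usage(path)
    return low_space(usage.total, usage.free, min_free_bytes, min_free_fraction)
```

A daemon would call check_path periodically for each monitored file system and raise a desktop notification when it returns True.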

6. User Interface:

  1. After gathering all required information and building the opportunity list, the tool will present the targets of opportunity it found, together with the amount of free space that will be reclaimed when they are removed. We should take care to describe those in a way the user can understand, rather than just showing cluttered lists of files. So this dialog could list items like:
    • 31MB of historical log files, last accessed: NEVER.
    • 200MB of audiovisual content, last accessed: 2 years ago
    • 46MB of old unused kernels
    • 700MB of old downloaded package files
    • etc..
  2. Each item shall have a drop-down next to it, each offering the same predefined actions. Those actions will be:
    • "Leave on system" - Will just ignore the item for this run of the tool. It will show up again the next time the tool is run.
    • "Always leave" - Will add the item to the whitelist, so it will never come up as a removal target again.
    • "Remove" - Will remove the files identified under this item, actually freeing space.
  3. For each top level item where an item list is applicable, we should also let the user go through the list item by item and state whether he wants to keep an item, remove it, or never be bothered about it again. We should probably consider using checkboxes spread horizontally for this list, as using the drop-down over a large number of files could be annoying UI wise.
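The grouped presentation and the three drop-down actions described above can be sketched as follows. The category labels are taken from the example dialog; the function names, the (category, size) input shape and the in-memory whitelist set are assumptions made for illustration.

```python
from collections import defaultdict

MB = 1024 * 1024


def summarize(targets):
    """Turn (category, size_bytes) pairs into one readable line per category.

    Lines are ordered by reclaimable space, largest first.
    """
    totals = defaultdict(int)
    for category, size in targets:
        totals[category] += size
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ["%dMB of %s" % (round(total / MB), category) for category, total in ordered]


def apply_actions(choices, whitelist):
    """Process the drop-down choice made for each item.

    choices maps an item to "Leave on system", "Always leave" or "Remove".
    Returns the items to remove; "Always leave" items are added to whitelist.
    """
    to_remove = []
    for item, action in choices.items():
        if action == "Remove":
            to_remove.append(item)
        elif action == "Always leave":
            whitelist.add(item)
        # "Leave on system": nothing to do, the item is offered again next run.
    return to_remove
```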

Implementation

  • The weight of a file for the opportunity calculation will be weight = file size + aging factor.

  • Kernel clean up will use the packaging interface to remove the old kernels; a bash or Python script will be used to record the currently running kernel and the newly installed one.
  • PyGTK and Glade will be used for the UI development.
  • The desktop notification framework will be used to deliver the first interaction with the user prior to launching the clean up application (we will have to replace / patch the current low disk space notification available from gnome-vfs).
  • Removal of conffiles, cron scripts and init scripts should be addressed using the "residual config removal" functionality available through python-apt.
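The formula weight = file size + aging factor from the list above does not state units or scaling. The sketch below assumes, purely for illustration, that size is expressed in megabytes and age in days so that the two terms have comparable magnitude; the function names and both unit parameters are hypothetical and would need tuning in a real implementation.

```python
MB = 1024 * 1024
DAY = 86400  # seconds


def weight(size_bytes, age_seconds, size_unit=MB, age_unit=DAY):
    """weight = file size + aging factor, with both terms scaled to units."""
    return size_bytes / size_unit + age_seconds / age_unit


def rank(records, reference_time):
    """Order paths by descending weight.

    records is an iterable of (path, size_bytes, atime); reference_time
    is the Unix timestamp used as the aging reference point.
    """
    scored = []
    for path, size_bytes, atime in records:
        age = max(0.0, reference_time - atime)
        scored.append((weight(size_bytes, age), path))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [path for _weight, path in scored]
```

With these units, a 100MB file last accessed ten days before the reference point outranks a 50MB file accessed at the reference point, which in turn outranks a 1MB file accessed the day before.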

Kubuntu

  • What about KDE and Kubuntu?
    • I contacted the KleanSweep author a while ago, and we are working together to deliver a unified back end, complemented by two front ends (KDE and GNOME), so that KDE and GNOME users have a consistent GUI for this kind of task. --SivanGreen

Comments

  • Joey Stanford - It would be very nice if this also included, even at a rudimentary level, a home directory dot file cleanup wizard.

    • e.g. A user installs the Holotz Castle game. They decide they don't like it and remove it. The /home/user/.holotz-castle directory still exists and is not removed.
      • This may seem trivial, but I, for example, copy my home directory over during upgrades. I've had the same directory since breezy (now on Edgy) and it's a royal mess of unused dot files.
    • A better way to do this might be to force all packages to include dot files in the postrm script as well as keeping "not installed (residual config)" entry in synaptic around while dot files exist.
    • An alternative way would be to incorporate the code inside kleansweep (a KDE tool) into the system cleanup tool.

  • Pádraig Brady
    • An alternative to kleansweep is fslint (a pygtk tool).

  • PaulKishimoto

    • Removing old backup files with names like "filename.ext~" would also be helpful. I find these files in various places in my /home/ directory after the original files have been removed or moved.
      • Wouldn't that be more of a job for Nautilus? --JeremyVisser

    • Packages identified as orphaned (by deborphan or a simpler method) could be among those suggested for removal.
    • It would be nice if tools like hubackup and sbackup could require this tool be run first to clear cruft out of /home/ that would otherwise end up in backups.
  • Jean-Michel Frouin
  • How about letting the user insert a removable drive (USB or CD-R) to move files on to. --SamTygier

  • We are running some thin client environments where the users have quotas on their home directories. When a home directory fills up to the hard limit, the user is no longer able to log in to clean it up. When the size reaches the soft limit, users currently get no warning. I have built a script that is shown to users on login once their soft limit is reached; the message says that their home directory is full. A nice way to handle this would be the ability to attach a script to a disk full or quota reached event (with soft limit and hard limit handled separately). --MichielEghuizen


CategorySpec

SystemCleanUpTool (last edited 2008-08-06 16:16:56 by localhost)