NoFsckAtBoot

FIXME

  • Discuss this with ext2/ext3 developers?
  • Only implement running fsck in cron?
  • Problem: e2fsck -n on a live, active filesystem results in spurious error messages, making this approach completely useless. LVM snapshots would probably work better, but that's not a generic solution.

Summary

Running fsck on clean filesystems at boot can take a lot of time, and yet rarely finds problems. Running it from cron instead removes the wait at boot and also covers machines that rarely reboot.

Release Note

Ubuntu no longer runs the file system integrity checker at boot on filesystems that have been cleanly unmounted. Instead, the checker is run in read-only mode while the system is running, and any errors are reported to the system administrator via the system log file and e-mail. This change removes the need to wait for filesystem checks at boot, and catches errors on servers that rarely boot.

Rationale

Currently Ubuntu systems are set up by default to run fsck on bootup on clean filesystems about once a month, or after about 20 mounts. Fsck almost never finds any problems, but running it takes from several minutes to over an hour, depending on the size of the filesystem and speed of the disks. This is irritating, especially since it tends to happen at inopportune moments, so that the user can't use their system.

Boot-time fsck also fails to find filesystem corruption on servers that run for months or years without rebooting.

The solution is to not run fsck on clean filesystems at boot, and instead run fsck from cron.

Use Cases

  • LeClerc's filesystem is clean, but has not been checked for a year. When he boots, fsck is not run immediately; cron runs it later, when the system is idle.

  • Fanny's filesystem is corrupted. Her fsck is run at boot.

Design

The fsck for some filesystems has an -n option, which allows it to be run safely in read-only mode while the filesystem is mounted. Not all of them do, and for those, fsck must still be run at boot as it is today. For ext2 and ext3, at least, our most commonly used filesystems, -n does work.

The general design, then, is to disable fsck on cleanly unmounted ext2 and ext3 filesystems at boot time, by having the installer set the max-mount-count setting to 0 and interval-between-checks to 0. Additionally, a cron job is added that runs "e2fsck -n" on all ext2 and ext3 filesystems at suitable intervals. Since the machine may not be up when the cron job would run, it should run fairly frequently, but not actually run the check if the filesystems have been checked recently enough.
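The "run frequently, but only check when due" rule can be sketched roughly as follows. This is only an illustration: the one-week interval and the function name check_is_due are assumptions, and the spec does not say how the time of the last check is recorded. Note that "e2fsck -n" opens the filesystem read-only, so the script would need its own record, for example a stamp file.

```python
# Assumption: one week between checks. The spec only says "suitable intervals".
CHECK_INTERVAL_SECONDS = 7 * 24 * 60 * 60


def check_is_due(last_check, now, interval=CHECK_INTERVAL_SECONDS):
    """Return True if a read-only check should run now.

    last_check is the time of the previous check in seconds since the
    epoch, or None if no check has been recorded yet (hypothetical
    bookkeeping, e.g. the mtime of a stamp file).
    """
    if last_check is None:
        # Never checked: run the check.
        return True
    if last_check > now:
        # The clock went backwards; check now rather than wait indefinitely.
        return True
    return now - last_check >= interval
```

With this shape, the cron entry can fire daily or even hourly, and a machine that was powered off at the scheduled time simply catches up on the next run.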

Reporting is done by the cron job writing all results to syslog, and any errors to stdout/stderr so that cron will automatically e-mail them to root. If an error is written out, instructions for running fsck without -n will be included, or at least a pointer to them.
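A minimal sketch of this reporting split, assuming the script captures the output and exit status of e2fsck. The status bits are the ones documented in the e2fsck(8) manual page; the function names, the "rofsck:" message prefix, and the wording of the pointer are illustrative assumptions.

```python
import sys
import syslog

# Exit status bits as documented in e2fsck(8); the status is their sum.
E2FSCK_STATUS_BITS = {
    1: "filesystem errors corrected",
    2: "filesystem errors corrected, system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "check canceled by user request",
    128: "shared library error",
}


def describe_status(status):
    """Return the descriptions of all bits set in an e2fsck exit status."""
    return [text for bit, text in sorted(E2FSCK_STATUS_BITS.items())
            if status & bit]


def report(device, status, output):
    """Log all results to syslog; write problems to stderr so cron mails them."""
    syslog.syslog(syslog.LOG_INFO,
                  "rofsck: %s exited with status %d" % (device, status))
    for line in output.splitlines():
        syslog.syslog(syslog.LOG_INFO, "rofsck: %s: %s" % (device, line))
    problems = describe_status(status)
    if problems:
        print("rofsck: %s: %s" % (device, "; ".join(problems)),
              file=sys.stderr)
        # Hypothetical pointer text; the spec only asks for instructions
        # or a pointer to them.
        print("rofsck: see e2fsck(8) for running a repairing check "
              "on the unmounted filesystem", file=sys.stderr)
    return problems
```

Because "-n" never repairs anything, a run that finds corruption would be expected to report the "errors left uncorrected" bit.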

If, at boot, cron is found not to be installed or not set to start, the read-only check is started in the background as if by cron. In this case, errors are sent via mail(1) to root.

Implementation

In the installer: run "tune2fs -c 0 -i 0" on any ext2/ext3 filesystems it creates during the installation.

A new program/script will be written to run "e2fsck -n" on all ext2/ext3 filesystems. For want of a better name, the script will be called rofsck.
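One way rofsck might find the filesystems to check is by reading the mount table. The sketch below assumes the text comes from /proc/mounts, which the spec does not specify; the function names are hypothetical.

```python
def ext_filesystems(mounts_text, fstypes=("ext2", "ext3")):
    """Return (device, mount_point) pairs for ext2/ext3 mounts.

    mounts_text is the content of a mount table in /proc/mounts format.
    A device mounted more than once is listed only once.
    """
    seen = set()
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fstype = fields[:3]
        if fstype in fstypes and device not in seen:
            seen.add(device)
            result.append((device, mount_point))
    return result


def e2fsck_readonly_argv(device):
    """Build the argument list for a read-only check of one device."""
    return ["e2fsck", "-n", device]
```

A caller would read /proc/mounts, pass its text to ext_filesystems, and run the command from e2fsck_readonly_argv for each device, capturing output and exit status for reporting.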

/etc/init.d/checkfs.sh: in addition to the current way of running fsck, check for cron being installed and started, and start rofsck if not.

Test/Demo Plan

  • Use qemu-img, mke2fs, and tune2fs to create filesystem images for various test cases. Additionally, find or write a tool to mangle filesystems so that fsck has something to fix.
  • Run rofsck on each filesystem image and verify that it runs fsck only when it should, and that it captures and reports fsck output the right way.
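The image creation step could be scripted along these lines. This is a sketch only: the helper name, the 64M default size, and the raw image format are assumptions chosen for illustration. The helper only builds the command lines and does not run them.

```python
def image_commands(path, size="64M", journal=False,
                   max_mount_count=0, interval="0"):
    """Return the argv lists that create one test filesystem image.

    journal=True adds -j so that mke2fs creates ext3 instead of ext2.
    max_mount_count and interval are passed to tune2fs -c and -i.
    """
    # -F lets mke2fs operate on a regular file; -q keeps it quiet.
    mke2fs = ["mke2fs", "-q", "-F"]
    if journal:
        mke2fs.append("-j")
    mke2fs.append(path)
    return [
        ["qemu-img", "create", "-f", "raw", path, size],
        mke2fs,
        ["tune2fs", "-c", str(max_mount_count), "-i", str(interval), path],
    ]
```

Each returned list can be passed to subprocess.run(argv, check=True) in order. Varying max_mount_count and interval gives images that should and should not trigger a boot-time check.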

Comments

SamTygier: https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/209416 has a couple of methods to cause fixable corruption to ext3 filesystems.


CategorySpec

NoFsckAtBoot (last edited 2008-08-06 16:26:37 by localhost)