ContinuousBackups

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

Ubuntu needs a robust, easy-to-use backup solution that allows users to restore their system even after a catastrophic failure. The backup tool needs to be:

  • Automatic (once enabled). It must not nag the user. We do not want a Clippy that asks the user "Would you like to back up your data?"
  • Near-continuous, so it backs up everything very frequently. We need to be able to restore anything, from almost any point in time.
  • Versioning, so we can restore older versions of the data instead of just the latest version.
  • Transparent, so the user does not even notice it is running.
  • System-wide, so it backs up everything. Ubuntu is a multi-user OS, so we need a backup tool that backs up every user's data. Of course, some users could be excluded from the backup if desired.
  • Flexible, so we can back up to external drives or network shares. Manual archive backups could be done on optical media.
  • Aware of file renames and moves. For example, when a user downloads photos from a digital camera, a continuous backup system will quickly back up those new files. The next day, however, the user may organize the photos by renaming them and moving them into sub-directories. Copying the renamed and moved files again would be wasteful, because the image data itself has not changed.
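One common way to make a backup aware of renames and moves is content addressing: file data is stored under a hash of its contents, so a renamed or moved file hashes to data that is already stored and nothing needs to be copied. The sketch below illustrates that idea under simple assumptions; `ContentStore`, `backup_file` and `content_hash` are hypothetical names, not part of any existing Ubuntu tool, and a real tool would persist the store and track metadata such as timestamps.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


class ContentStore:
    """Hypothetical content-addressed backup store.

    File data is stored once per unique digest; the index only maps paths
    to digests, so a rename or move adds an index entry but copies no data.
    Old paths stay in the index, so earlier layouts remain restorable.
    """

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # digest -> file contents
        self.index: dict[str, str] = {}    # path -> digest

    def backup_file(self, path: Path) -> bool:
        """Record path in the index; return True if new data had to be copied."""
        digest = content_hash(path)
        self.index[str(path)] = digest
        if digest in self.blobs:
            return False  # identical contents already stored: no copy needed
        self.blobs[digest] = path.read_bytes()
        return True
```

Moving a photo into a sub-directory and backing it up again adds a second index entry that points at the same digest, while the photo's bytes are stored only once.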

Rationale

The most important thing on a user's computer is the data. Whether it is a collection of music and video, archived emails that go back years, or an unfinished novel, it is all absolutely vital to the user. Operating systems and applications can be reinstalled and hardware can be replaced, but that data might be irreplaceable.

Use cases

  • Tim is "the IT guy" at a company and needs an easy-to-use backup solution for his users, with a centralized backup server.
  • John is a home user who keeps his digital photos and music on his computer. The computer is shared with his wife Anne, who has her own data in her account. Both users want their data backed up.

Scope

There are already several backup tools available for Linux. We need to pick one and spread some "Ubuntu pixie-dust" on it.

Design

The tool needs to be "configure once, run all the time". That is, the user sets it up once, and after that it just works. It must not nag the user to "back up your data!"; it needs to back up automatically, without prompting. The user can select where to back up to (network share, external drive, etc.) and can specify how much space the backup may use on the target media, as well as which users should be backed up (if the machine is shared by several people, each of them would not have to configure their own backup scheme). The tool could also be used for making archive backups on optical media and similar.
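One possible policy for honoring the space limit, loosely modeled on the way Time Machine deletes its oldest backups when the target fills up, is to prune the oldest snapshots first. This is a sketch under stated assumptions: `Snapshot` and `prune_to_quota` are hypothetical names, and each snapshot's size is assumed to be exactly the space freed by deleting it.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    timestamp: int   # e.g. seconds since the epoch
    size_bytes: int  # assumed: space freed on the target if this snapshot is deleted


def prune_to_quota(
    snapshots: list[Snapshot], quota_bytes: int
) -> tuple[list[Snapshot], list[Snapshot]]:
    """Drop the oldest snapshots until the total fits in quota_bytes.

    The newest snapshot is always kept, even if it alone exceeds the quota.
    Returns (kept, removed), both ordered oldest first. The input is not modified.
    """
    ordered = sorted(snapshots, key=lambda s: s.timestamp)
    total = sum(s.size_bytes for s in ordered)
    removed: list[Snapshot] = []
    while len(ordered) > 1 and total > quota_bytes:
        oldest = ordered.pop(0)
        total -= oldest.size_bytes
        removed.append(oldest)
    return ordered, removed
```

Always keeping the newest snapshot means a too-small quota degrades to "latest version only" rather than to no backup at all.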

The obvious inspiration for this tool is Time Machine, which will be released in Mac OS X "Leopard".

Implementation

* A good tool to look at for this might be rdiff-backup. It's a great backup utility that already handles revisions and time records.

  • Pybackpack is a GUI for rdiff-backup that is well-integrated into Ubuntu and is user friendly.

  • rdiff-backup has been described as being close to a version control tool, which is probably ideal for a continuous backup solution.

    • I always felt that's what 'Time Machine' more closely resembled, rather than an on-demand backup, but that's my $0.02. -KevinKubasik

* A really simple way to get at least some of this functionality would be to mount /home and /etc as versioned (copy-on-write) filesystems.

  • Perhaps the most mature copy-on-write file system is Ext3Cow.

  • Wayback and CopyFS are other choices, but neither of them is really mature. Also, some form of UI would need to be written for this. (Of course, a more awesome solution would be for Gnome Storage not to be dead :( ) - ChrisHalseRogers

* This feature does need to work in KDE as well.
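rdiff-backup keeps a plain mirror of the newest version of each file and stores older versions as reverse increments, which is what makes it resemble a version control tool. The sketch below illustrates the reverse-increment idea on lists of text lines using Python's `difflib.SequenceMatcher`; rdiff-backup itself computes binary deltas with librsync, and `reverse_delta` and `apply_reverse_delta` are hypothetical names used only for illustration.

```python
from difflib import SequenceMatcher


def reverse_delta(new: list[str], old: list[str]) -> list[tuple]:
    """Describe `old` in terms of `new`: ranges copied from new plus literal lines."""
    ops: list[tuple] = []
    matcher = SequenceMatcher(None, new, old, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))         # reuse new[i1:i2] unchanged
        elif j2 > j1:
            ops.append(("literal", old[j1:j2]))  # 'replace'/'insert': lines only in old
        # 'delete' (lines only in new) contributes nothing to old
    return ops


def apply_reverse_delta(new: list[str], ops: list[tuple]) -> list[str]:
    """Rebuild the older version from the current mirror and one increment."""
    out: list[str] = []
    for op in ops:
        if op[0] == "copy":
            out.extend(new[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out
```

Restoring an old version walks backwards from the mirror, applying one increment per step, so the most recent (and most frequently restored) version never needs any patching.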

Code

Data preservation and migration

Unresolved issues

BoF agenda and discussion

Perhaps an open-source clone of this product would be of interest: Linux CDP Server (a commercial product).

Methods for Computing Deltas in Backup Applications

While there are hundreds of different backup applications, all of them use one of several known methods for computing Deltas.

Deltas are simply defined as the data that has changed since the last backup run. Defining them any further depends on how the backup application computes deltas: a delta could be a raw disk block, a variable-length portion of a file, or even a complete file, depending on the method.

Near-Continuous Delta Method (CDP)

The most efficient method for computing Deltas is the near-Continuous or CDP method. R1Soft happens to be the only example of a near-Continuous Deltas method for both Windows and Linux platforms. The near-Continuous method works by using a Disk Volume Device Driver. The device driver is situated between the file system (e.g. NTFS) and the Disk Volume (e.g. Logical Disk Volume 1).

By locating a device driver between the file system and the raw Disk Volume, the application is able to identify changed Disk Blocks in real time with minimal performance impact. It's really quite a simple concept. In Windows this kind of Device Driver is called an Upper Volume Filter Driver. R1Soft's Linux CDP implementation also uses a device driver. Linux does not have an official filter driver API, though the form and function of the Linux driver are very similar to the Windows CDP driver.

"Why spend hours reading data from the Disk just to compute Deltas when you can watch them happen for free?", says David Wartell, R1Soft Founder.

With the near-Continuous method of Delta computation, a fixed-length block size is used, which in practice usually corresponds to the file system block size. This fixed block size is typically 4 KB but can vary between environments and implementations. As writes to the Disk are observed, the number of each changed block is recorded in a specialized in-memory data structure.
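The mechanism described above, passing every write through to the volume while recording which fixed-size blocks it touched, can be sketched in user space. This is not a kernel driver and not R1Soft's code: `BlockDevice` is a hypothetical in-memory stand-in for the Disk Volume, `ChangeTrackingFilter` plays the role of the filter driver, and a Python `set` stands in for the specialized in-memory data structure.

```python
class BlockDevice:
    """Hypothetical stand-in for the raw Disk Volume: a byte array in memory."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def write(self, offset: int, buf: bytes) -> None:
        self.data[offset:offset + len(buf)] = buf

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])


class ChangeTrackingFilter:
    """Sits between the file system and the volume, like a volume filter driver.

    Every write is passed straight through; on the way, the numbers of the
    fixed-size blocks it touches are recorded as Deltas.
    """

    def __init__(self, lower: BlockDevice, block_size: int = 4096) -> None:
        self.lower = lower
        self.block_size = block_size
        self.changed: set[int] = set()

    def write(self, offset: int, buf: bytes) -> None:
        if buf:
            # A write may straddle block boundaries, so mark every block it covers.
            first = offset // self.block_size
            last = (offset + len(buf) - 1) // self.block_size
            self.changed.update(range(first, last + 1))
        self.lower.write(offset, buf)

    def read(self, offset: int, length: int) -> bytes:
        return self.lower.read(offset, length)  # reads are not tracked

    def take_deltas(self) -> list[int]:
        """Return the changed block numbers and reset tracking (e.g. at backup time)."""
        deltas, self.changed = sorted(self.changed), set()
        return deltas
```

At backup time, only the blocks returned by `take_deltas()` need to be read from the volume, which is the point of the David Wartell quote above: the Deltas are observed as they happen instead of being recomputed by scanning the disk.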

R1Soft Linux Agents version 1.0 employ a bitmap for this purpose: a region of memory in which 1 bit describes the state of each disk block. Bitmaps are also commonly used in image file formats. With a 4 KB block size there are 26,214,400 Disk Blocks per 100 GB of Disk Volume size. That corresponds to 26,214,400 bits, or 3,276,800 bytes (3.125 MB), of memory to track all Deltas made to 100 GB of raw Disk capacity.
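A minimal sketch of such a dirty-block bitmap, assuming one bit per 4 KB block and treating GB as 2^30 bytes (which is what the figures above imply). `DirtyBitmap` is a hypothetical name, not R1Soft's implementation; the last lines reproduce the arithmetic from the paragraph.

```python
class DirtyBitmap:
    """One bit per disk block: 1 = changed since the last backup."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # rounded up to whole bytes

    def mark(self, block: int) -> None:
        self.bits[block >> 3] |= 1 << (block & 7)  # byte index, then bit within byte

    def is_dirty(self, block: int) -> bool:
        return bool(self.bits[block >> 3] & (1 << (block & 7)))

    def dirty_blocks(self) -> list[int]:
        return [b for b in range(self.num_blocks) if self.is_dirty(b)]


GiB = 1024 ** 3
blocks_per_100_gib = 100 * GiB // 4096
print(blocks_per_100_gib)                   # 26214400
print(blocks_per_100_gib // 8)              # 3276800 (bytes)
print(blocks_per_100_gib / 8 / 1024 ** 2)   # 3.125 (MiB)
```

The bitmap's size depends only on the volume size, not on how many blocks actually changed, which is the cost the tree-based structure below is meant to reduce.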

R1Soft 2.0 Windows Agents and later use a new proprietary data structure, developed by R1Soft, for tracking deltas. This data structure is based on a tree, so in the average case only 200 to 300 KB of memory is used to track all Deltas per 100 GB of raw Disk capacity. R1Soft is bringing this more efficient data structure to its Linux CDP technology with the release of Continuous Data Protection Server 3.0.
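R1Soft's tree-based structure is proprietary and is not described here. As a hedged illustration of why a sparse structure can be much smaller than a bitmap, the sketch below records changed blocks as runs (extents) of consecutive block numbers, so memory grows with the number of runs rather than with the volume size. A sorted Python list searched with `bisect` stands in for a balanced tree (insertion is O(n) here, versus O(log n) in a real tree); `ExtentSet` is a hypothetical name.

```python
import bisect


class ExtentSet:
    """Changed blocks stored as sorted, non-overlapping, non-touching [start, end) runs."""

    def __init__(self) -> None:
        self.starts: list[int] = []
        self.ends: list[int] = []

    def add(self, start: int, end: int) -> None:
        """Mark blocks start..end-1 as changed (requires start < end), merging runs."""
        # Runs before i end strictly before `start`, so they neither overlap nor touch.
        i = bisect.bisect_left(self.ends, start)
        # Runs from j on begin strictly after `end`, so they neither overlap nor touch.
        j = bisect.bisect_right(self.starts, end)
        if i < j:
            # Runs i..j-1 overlap or touch the new run: fold them into it.
            start = min(start, self.starts[i])
            end = max(end, self.ends[j - 1])
        self.starts[i:j] = [start]
        self.ends[i:j] = [end]

    def __contains__(self, block: int) -> bool:
        i = bisect.bisect_right(self.starts, block) - 1
        return i >= 0 and block < self.ends[i]

    def runs(self) -> list[tuple[int, int]]:
        return list(zip(self.starts, self.ends))
```

Marking 1,000 consecutive blocks one at a time still leaves a single (start, end) run, whereas a bitmap for the whole volume costs the same no matter how few blocks changed.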


CategorySpec

ContinuousBackups (last edited 2009-04-13 03:08:48 by c-76-122-50-68)