SimpleBackupSolution

Revision 16 as of 2005-07-24 02:07:58



Status

Introduction

A Simple Backup Solution for Ubuntu should provide snapshot functionality: the ability to restore a single file, a directory or a directory tree to its last known good state.

Rationale

Users expect their favorite OS to offer a sane and simple way to back up their system.

Scope and Use Cases

  • James Troup's laptop has been stolen; he wants his data back.
  • HELP! My laptop is on fire! I must restore my music files somewhere else!
  • Hey, somebody on #ubuntu told me to run sudo rm / -rf and now nothing works.
  • Matt accidentally deleted his quarterly report and wants to get back as recent a version as possible.
  • Pete deleted a very important piece of text from the product description document on Tuesday; he wants to get a copy of this document from Monday's backup to copy that text into the current version.

Implementation Plan

  • admin can configure a backup solution for the system (a sketch of the default configuration follows the note below)
    • Recommended config or Custom config, with an estimation of the size requirements
    • Includes: directory selection (defaults to /etc, /opt, /usr/local, /var)
    • Excludes: checkboxes for common dirs and file types + a textbox for regexes + a maximum file size (default: exclude all media files (avi, mp3, wav, ogg, mpg), all files > 100 MB, and /var/backup)
    • Format: format selection and compression level selector (default: individual gzip, level 9)
    • Destination: local dir or remote ssh session (default: /var/backup/)
    • Time: frequency (hourly, daily, ...), type (full, incremental), how long to store, time point (default: weekly full backup & daily incremental)

  • admin can see how much disk space the backup will take
  • the package list is backed up
  • if the admin allows this, users' backup preferences override the admin's instructions (user overrides postponed to later versions)
  • admin can write a snapshot of a backup to CDs/DVDs with a GUI frontend that uses dar as the backend
    • there is an option to write a full backup for any date or an increment from a given date

Note: incremental here means 'back up all files that have been changed (mtime) since the last backup'. Diffs are not involved, as that would create a lot of complication for remote backups and a lot of disk usage for local ones.
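
For illustration, the defaults listed above might be captured in a configuration structure roughly like the following Python sketch. The key names and layout are assumptions made for illustration, not a final config format.

    # Sketch of the default configuration described above; key names are illustrative.
    DEFAULT_CONFIG = {
        "includes": ["/etc", "/opt", "/usr/local", "/var"],
        "excludes": {
            "paths": ["/var/backup"],                    # never back up the target itself
            "regexes": [r".*\.(avi|mp3|wav|ogg|mpg)$"],  # common media files
            "maxfilesize": 100 * 1024 * 1024,            # skip files larger than 100 MB
        },
        "format": {"type": "gzip-individual", "level": 9},  # each file gzipped on its own
        "destination": "/var/backup/",                   # local dir or remote ssh target
        "schedule": {"full": "weekly", "incremental": "daily"},
    }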

Timeline

  • June 22 -- final specifications, data structures, Glade GUIs submitted for usability review
  • July 29 -- Initial version of backup worker and command line restore utility
  • August 5 -- Initial version of the config and restore GUIs, with the UI modified according to the results of the usability review
  • August 6-10 -- testing and bugfixing

Functional modules

  • backup backend worker
  • command line restore utility
  • administrator backup configuration capplet
  • user backup configuration override capplet (postponed to later versions)
  • user/admin GUI restore utility
  • admin tool to write a backup snapshot to CDs/DVDs and read it from such media

The backup backend worker was quite complicated and required the creation of a block scheme and the definition of an internal data structure: the backup definition tree.

The command line restore utility was quite trivial - no further detail needed. Note: it should be written so that it is usable as a module by the GUI restore utility.
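
A possible shape for that shared module, just to show the intent; the function names and signatures here are hypothetical, not a decided interface.

    import os

    def list_snapshots(target="/var/backup"):
        """Return the available snapshot directory names, newest first."""
        names = [n for n in os.listdir(target)
                 if n.endswith(".ful") or n.endswith(".inc")]
        return sorted(names, reverse=True)

    def restore(snapshot, paths, destination="/", target="/var/backup"):
        """Restore the given files/directories from 'snapshot' (following its
        'base' symlink chain for incremental snapshots) into 'destination'."""
        raise NotImplementedError("sketch only")

    if __name__ == "__main__":
        # the command-line utility is just a thin wrapper around the same functions
        import sys
        restore(sys.argv[1], sys.argv[2:])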

The administrator and user backup configuration capplets gave me a bit of trouble with the creation of all the necessary UI elements in Glade. A "Backup now" button is here (with a dialog). There is much functional similarity between these capplets, which leads me to believe that a merge is possible here. The Glade mockup for the user capplet is almost done (three dialogs missing).

The GUI restore utility is mostly trivial, except for the need to let the user select multiple files/directories where these directories might or might not exist in the current directory tree. It seems that use of a generic tree structure will be required here.
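
One way such a tree could look, purely as an illustration; the real GUI would likely use a Gtk tree model, and the node layout below is an assumption of this sketch.

    # Illustrative selection tree for the restore dialog: nodes are keyed by path
    # component and record whether they were ticked for restore.  Paths come from
    # the backup's file list, so they need not exist on the current system.
    def add_path(tree, path, selected=True):
        node = {"children": tree}
        for part in path.strip("/").split("/"):
            node = node["children"].setdefault(
                part, {"selected": False, "children": {}})
        node["selected"] = selected

    selection = {}
    add_path(selection, "/home/aigarius/soc/simple_backup_spec.txt")
    add_path(selection, "/etc")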

Format of the backup target directory

/var/backup - base directory

/var/backup/.tree.cache - cache of the file structure of all backups (updated by cron jobs after all regular backups)

/var/backup/20050723.172354.aigarhome.ful/ - a full backup snapshot from 23rd July 2005

/var/backup/20050723.172354.aigarhome.ful/ver - backup directory version information (default - 1). mtime of this file is the definitive start time of the backup.

/var/backup/20050723.172354.aigarhome.ful/packages - dump of 'dpkg --get-selections'

/var/backup/20050723.172354.aigarhome.ful/tree - file structure of this backup (includes name, size, permission, ctime, mtime and atime information)

/var/backup/20050723.172354.aigarhome.ful/excludes - structure describing all excluded files (paths, regexes, maxsize)

/var/backup/20050723.172354.aigarhome.ful/files/home/aigarius/soc/simple_backup_spec.txt.gz - recreation of the filesystem structure for the backed-up files. All files are gzipped individually. Permissions, ctime, mtime and atime are maintained. Symlinks are copied as such. Hardlinks are ignored.

/var/backup/20050723.172354.aigarhome.ful/files.tar - if the target filesystem cannot maintain UNIX file information, a .tar of the files/ subdirectory is to be used. This is a security risk, as all users can get any file from it, so it must be used *only* when the filesystem does not enforce proper permission control anyway.

/var/backup/20050723.172354.aigarhome.ful/stats - statistics

/var/backup/20050724.070502.aigarhome.inc/ - an incremental snapshot from 24th July 2005

/var/backup/20050724.070502.aigarhome.inc/ver - same as above

/var/backup/20050724.070502.aigarhome.inc/base - symlink to the previous (base) directory. Can be *.ful/ or *.inc/

/var/backup/20050724.070502.aigarhome.inc/packages.patch - 'diff -u' of the packages list

/var/backup/20050724.070502.aigarhome.inc/tree.patch - 'diff -u' of the file list. describes added, removed and changed files.

/var/backup/20050724.070502.aigarhome.inc/excludes - same as above

/var/backup/20050724.070502.aigarhome.inc/files/* - same as above

/var/backup/20050724.070502.aigarhome.inc/files.tar - same as above

/var/backup/20050724.070502.aigarhome.inc/stats - same as above
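
As a rough sketch of how a snapshot directory following this naming scheme could be created; the function name and the 'aigarhome' profile name are placeholders, and only the 'ver' and 'packages' files are shown.

    import os
    import subprocess
    import time

    def make_snapshot_dir(base="/var/backup", profile="aigarhome", full=True):
        """Create an empty snapshot directory such as
        /var/backup/20050723.172354.aigarhome.ful/ and write 'ver' and 'packages'."""
        stamp = time.strftime("%Y%m%d.%H%M%S")
        kind = "ful" if full else "inc"
        snapdir = os.path.join(base, "%s.%s.%s" % (stamp, profile, kind))
        os.makedirs(os.path.join(snapdir, "files"))
        # 'ver' holds the directory format version; its mtime marks the backup start
        with open(os.path.join(snapdir, "ver"), "w") as f:
            f.write("1\n")
        # 'packages' is a dump of the current package selections
        with open(os.path.join(snapdir, "packages"), "wb") as f:
            f.write(subprocess.check_output(["dpkg", "--get-selections"]))
        return snapdir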

Sequence of a backup run

  • Check that no other backup process is running - quit if another instance is found
  • Load basic configuration
  • Check writability of the target directory - quit if failed
  • Load the configuration of this backup run
  • Create target directory for this snapshot, create '.../ver' file
  • If incremental, find a base directory and create '.../base' symlink
  • Initialize the backup tree structure - the root element is ["/", 0, 0, {}]. The first element is the path, the second shows whether to back up this branch (1) or not (0) or whether it is an internal node (-1), the third element is the base timestamp for the incremental backup or 0 for a full backup, and the fourth element is the list of excluding regexes
  • Create the backup tree structure from the config - whenever a subdirectory is included or excluded, the parent of this directory is recursively expanded, so that the tree contains all significant nodes. All newly created nodes inherit properties from the parent. The list of paths must be sorted alphabetically before processing so that the inclusion of a parent does not override a previously defined exclusion of a child (see the sketch after this list)
  • Mark the backup target directory as excluded in the tree (if it is a local directory)
  • Write ".../excludes"
  • Backup package selections
  • Test if the target directory can enforce permissions (set $p_ok=1) or not (set $p_ok=0)
  • Foreach leaf of the backup tree structure (leaves have the second field set to 0 or 1)
    • Call do_backup( $target, $path, $increment_timestamp, @excludes, $p_ok ). do_backup will return a list of all backed-up files or a patchlist. Append that to ".../tree" or ".../tree.patch"
  • Write ".../stats"
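
A minimal sketch of the backup definition tree described above, using the [path, flag, timestamp, excludes] node layout. The extra children list and the exact expansion rules here are assumptions of this sketch, not the final worker algorithm.

    # Node layout: [path, flag, base_timestamp, excludes, children]
    # flag: 1 = back up, 0 = skip, -1 = internal node (as described above)
    def make_node(path, flag, timestamp, excludes):
        return [path, flag, timestamp, list(excludes), []]

    def insert_path(root, path, flag):
        """Expand the tree so 'path' gets its own node; new intermediate nodes
        inherit the parent's properties and the parent becomes internal (-1)."""
        node, current = root, ""
        for part in path.strip("/").split("/"):
            current += "/" + part
            for child in node[4]:
                if child[0] == current:
                    node = child
                    break
            else:
                child = make_node(current, node[1], node[2], node[3])
                node[4].append(child)
                node[1] = -1
                node = child
        node[1] = flag

    root = make_node("/", 0, 0, [])
    # sorted alphabetically, so a parent include cannot override a child exclude
    for path, flag in sorted([("/etc", 1), ("/var", 1), ("/var/backup", 0)]):
        insert_path(root, path, flag)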

do_backup()

  • get a list of all files and directories in $path (that have an mtime greater than $increment_timestamp)
  • if $increment_timestamp is set, get a list of all files and directories removed from $path since $increment_timestamp; otherwise make this list empty
  • apply @excludes to remove unneeded files and directories from both lists
  • if !$p_ok then open the .tar file
  • For each directory in the list
    • create the directory in $target (or the .tar)
  • For each file in the list
    • compress the file and write it to the $target directory (or the .tar)
    • check for errors; on error, remove the file from the file list and add it to the errors list
  • if !$p_ok then close .tar file
  • return the list of backed-up files and directories or a patchlist
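
A condensed sketch of do_backup() along these lines. The .tar fallback, removed-file detection and the errors list are left out, and the parameter handling is simplified; treat it as an illustration, not the final worker code.

    import gzip
    import os
    import re
    import shutil

    def do_backup(target, path, increment_timestamp, excludes, p_ok=True):
        """Back up files under 'path' changed after 'increment_timestamp' into
        'target/files/', gzipping each file individually.  Returns the list of
        backed-up files.  Sketch only: assumes p_ok (no .tar fallback)."""
        patterns = [re.compile(rx) for rx in excludes]
        backed_up = []
        for dirpath, dirnames, filenames in os.walk(path):
            for name in filenames:
                src = os.path.join(dirpath, name)
                if any(p.search(src) for p in patterns):
                    continue                      # excluded by regex
                if os.path.getmtime(src) <= increment_timestamp:
                    continue                      # unchanged since the base backup
                dst = os.path.join(target, "files", src.lstrip("/")) + ".gz"
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
                    shutil.copyfileobj(fin, fout)
                shutil.copystat(src, dst)         # keep permissions and times
                backed_up.append(src)
        return backed_up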

Notes:

  • in later versions, when user-initiated backups become possible, permission checking will be needed in do_backup(), but the backup will still have to be run as the root user, so that the target directory can be properly created
  • the whole backup run should use only one connection to remote target servers; the connection should be reset as needed

Data Preservation and Migration

None

Packages Affected

None

User Interface Requirements

Outstanding Issues

  • The config capplet is lacking a GUI for constructing the recurrence rules for the cron jobs
  • The dar command line is complex.
  • During the BOF session we discussed system-level snapshotting as the only way that could allow rollback while keeping the system consistent, meaning that the whole filesystem stays in sync with the contents of /etc. We looked at the dm-snapshot target. If this can be made to work without wasting excessive amounts of space (e.g. too many extra partitions), then this would be the ultimate foo, meaning that somebody could upgrade to Hoary, click 'rollback' and have their system come back exactly as it was 24 hours earlier. This would require a lot of investigation, but it may be worth following up as a unique selling point.

UDU BOF Agenda

UDU Pre-Work