Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

This specification describes a functional enhancement to apt-get that attempts to locate package files on the local network via mDNS before falling back to downloading them from the Internet.

Release Note

apt-get now attempts to download package files from other computers on your network, when available. This occurs with the help of the new apt-share service. This mechanism uses no authentication and only allows read access to package files through apt-share by name, not by path. Those responsible for network security baselines should familiarize themselves with this service and specifically address it in their policies, whether allowing it or mandating its removal.

Rationale

Most networks contain more than one computer. If several computers on the same segment run Ubuntu, they can reduce the load on the Internet uplink and on the mirrors by sharing downloaded packages with each other.

Use Cases

Several use cases exist.

Assumptions

Design

The system should publish information about packages to the local network. The design should address both package list consistency and package file availability.

We can address package list consistency by publishing the latest package list (the Packages file) for a repository. The retrieving system must verify the validity of the Packages file it receives.

Package file availability should come specifically from the package file name. The cache should behave in such a way as to maximize the availability of current packages on the network.

Implementation

The system will use mDNS/Avahi to publish information about packages to the local network.

When a system performs a package list update, it can advertise to the local subnet which list it just updated and when, allowing other systems to sync their package lists automatically. Note that a system should only obtain its package list in this manner when it sees a newer package list on the network; when a system is explicitly asked to update the package list (update-manager, apt-get update), it should still update as normal.

The retrieving system must verify the Packages file against the Release file, and the Release file against Release.gpg. The publishing system must publish all three files.
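The first half of that chain can be sketched as follows. This is a minimal illustration, not apt's own code: it assumes the Release file carries a `SHA256:` section with lines of the form ` <digest> <size> <path>`, and the function names are hypothetical. Checking the Release file against Release.gpg is not shown; a real implementation would have to perform that signature check before trusting any digest read here.

```python
import hashlib


def parse_release_sha256(release_text):
    """Return {relative_path: (sha256_hex, size)} from the SHA256 section.

    Assumption: entries are indented lines of the form
    " <digest> <size> <path>" following a "SHA256:" field.
    """
    entries = {}
    in_section = False
    for line in release_text.splitlines():
        if line.startswith("SHA256:"):
            in_section = True
            continue
        if in_section:
            if not line.startswith(" "):
                break  # the next top-level field ends the section
            parts = line.split()
            if len(parts) != 3:
                continue  # skip malformed entries rather than guess
            digest, size, path = parts
            entries[path] = (digest, int(size))
    return entries


def packages_matches_release(packages_bytes, relative_path, release_text):
    """True if the received list matches the size and digest in Release."""
    entries = parse_release_sha256(release_text)
    if relative_path not in entries:
        return False
    digest, size = entries[relative_path]
    return (len(packages_bytes) == size
            and hashlib.sha256(packages_bytes).hexdigest() == digest)
```

A list received from a peer that fails this check should be discarded, and the system should fall back to a normal update.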

The publishing system must identify the Packages file for a repository by distribution and category (e.g. gutsy main).
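The advertisement and the "only sync when newer" rule described above might look like the sketch below. The record layout, the field names, and both helper functions are hypothetical; the specification does not define a wire format, and no Avahi calls are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListAdvert:
    """Hypothetical advertisement for one repository's package list."""
    host: str
    distribution: str  # e.g. "gutsy"
    category: str      # e.g. "main" (apt calls this a component)
    updated_at: int    # Unix time of the publisher's last list update


def newest_advert(adverts, distribution, category):
    """Return the most recent advertisement for the given list, or None."""
    matching = [a for a in adverts
                if a.distribution == distribution and a.category == category]
    if not matching:
        return None
    return max(matching, key=lambda a: (a.updated_at, a.host))


def should_sync(local_updated_at, advert, explicit_update=False):
    """Sync from a peer only when its list is newer than the local copy.

    An explicit update (update-manager, apt-get update) bypasses peers
    and proceeds as normal, so it never syncs from an advertisement.
    """
    if explicit_update:
        return False
    return advert.updated_at > local_updated_at
```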

Package file availability should be keyed on the package file name alone. Reject outright any file name containing a '/'; do not try to sanitize it or work around it.
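A minimal sketch of that check follows. Only the '/' rule comes from this specification; the function name and the extra rejections (empty names, NUL bytes, "." and "..") are assumptions added for illustration.

```python
def is_acceptable_package_name(name):
    """Return True only for a bare file name, never a path.

    The '/' rule is from the specification. The other rejections are
    assumptions added here as defensive extras.
    """
    if not name:
        return False
    if "/" in name:
        return False  # reject outright; no attempt to normalise the name
    if "\0" in name or name in (".", ".."):
        return False
    return True
```

Rejecting rather than normalising keeps the service from ever mapping a request onto a file outside the cache directory.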

apt-get already maintains a package cache in /var/cache/apt/archives, with various options to prune this directory when it grows too large or contains old files. The cache should exhibit the behavior below to maximize its usefulness in this environment. Note that more useful cache behavior may exist.

When a system needs a set of packages, it should locate, for each package, all nodes on the network that have it and all nodes that need it. These nodes should then negotiate an agreement to avoid multiple nodes downloading the same package at the same time. Overall, each node should have as few connections to manage as possible; by spreading the packages to fetch across the nodes, the availability of packages saturates more quickly (that is, more systems have a scarce package by the time the list of needed packages starts running thin).
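One way such an agreement could be reached without extra round trips is for every node to compute the same deterministic assignment from the same view of who needs what. The sketch below is an assumption, not the negotiation protocol of this specification, and `assign_fetchers` is a hypothetical name.

```python
def assign_fetchers(needs):
    """Pick exactly one fetching node per needed package.

    needs: {node_name: set of package file names that node lacks}.
    Returns {package_file_name: node_name}. The result depends only on
    the input, so every node that sees the same `needs` agrees on it.
    """
    load = {node: 0 for node in needs}
    assignment = {}
    packages = sorted({p for wanted in needs.values() for p in wanted})
    for package in packages:
        candidates = [n for n, wanted in needs.items() if package in wanted]
        # Prefer the node with the fewest assignments so far; break ties
        # by name so the choice is stable across nodes.
        chosen = min(candidates, key=lambda n: (load[n], n))
        assignment[package] = chosen
        load[chosen] += 1
    return assignment
```

Balancing by assignment count spreads distinct packages across nodes, which is what lets scarce packages become available in more places sooner.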

For packages not on the local network, only one node at a time should use the Internet to fetch a given package, and it should fetch the most needed package first. It can relay this package to another node on the fly, while the download is still in progress. This type of propagation allows at least two nodes to have the complete package by the time the first has finished downloading it, avoiding a network shock where all systems start downloading that package from the system that fetched it from the Internet.
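Choosing the "most needed" package can be as simple as counting how many nodes lack it. The sketch below assumes that interpretation; `most_needed` and its arguments are hypothetical names, and the on-the-fly relay is not shown.

```python
from collections import Counter


def most_needed(needs, available_locally):
    """Return the package wanted by the most nodes that no peer holds.

    needs: {node_name: set of package file names that node lacks}.
    available_locally: set of file names some peer already has.
    Returns None when nothing has to come from the Internet.
    """
    counts = Counter(
        package
        for wanted in needs.values()
        for package in wanted
        if package not in available_locally
    )
    if not counts:
        return None
    # Highest demand first; ties broken by name for a stable choice.
    return min(counts, key=lambda p: (-counts[p], p))
```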

For packages on the local network, each system should get a list of the other systems that currently have the package. It should then spread its requests across these systems, downloading from the least-utilized one wherever possible. On switched networks, this keeps as few nodes as possible accessing any given node at once. Switches physically isolate network circuits, so 1:1 relationships between local connections, with nodes mutually uploading and downloading packages, can in the best case multiply total network bandwidth by up to 2n for n hosts.
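Selecting the least-utilized source might be sketched as below. How utilization is measured is not defined by this specification; the sketch assumes each peer reports a count of its active uploads, and `pick_source` is a hypothetical name.

```python
def pick_source(holders, active_uploads):
    """Choose the peer to download from among those holding the package.

    holders: iterable of peer names that have the package.
    active_uploads: {peer_name: number of uploads in progress}
    (an assumed utilization metric; peers missing from it count as idle).
    Returns None when no peer holds the package.
    """
    holders = sorted(holders)
    if not holders:
        return None
    return min(holders, key=lambda peer: (active_uploads.get(peer, 0), peer))
```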

UI Changes

Synaptic needs to expose an option to disable this.

Code Changes

apt needs to use improved cache management to optimize this.

apt needs to gain the peer discovery, package list verification, and local-network retrieval behavior described under Implementation.

The Ubuntu developers need to create a new daemon to publish this service through Avahi and work with the apt cache.

Migration

Ubuntu should enable this feature by default.

Test/Demo Plan

Seeding the apt caches of two computers so that each holds half of the packages for a full update, and then updating the package list on one of them from a local repository, would trigger this mechanism. Removing the connection to the Internet would force apt to fail if the mechanism did not work properly.

Outstanding Issues

BoF agenda and discussion

Comments


CategorySpec

AptAvahi (last edited 2010-07-31 08:44:03 by lpzg-4dbdc79c)