PackageLicenseTracking

Summary

The OEM team, and others, need to know which licenses each package is under, so that they can summarize this for clients. The debian/copyright file in the package contains this information, but it is currently not in a machine-parseable format. Debian is working on changing the format; this spec aims to adopt the proposed new format and to help Debian complete that work. This spec also includes the development of tools to parse the new format.

Release Note

Rationale

  • It is essential for open source distributions to be able to machine-generate well-formatted license information for all packages in an image (or a subset), so that all interested parties can have a degree of confidence that proper license vetting has occurred and that legal liabilities are known.
  • Development of this tool will help Ubuntu lead the adoption of the Debian standard for structured copyright files.
  • It will also greatly assist the OEM team as it continues to develop and release projects.

Use Cases

  • Execute something (the parser) in a running image:
  • Input:
    • None: in which case all installed packages are evaluated
    • List of package names for evaluation
  • Output: two well-formatted outputs for evaluated packages:
    • For packages that conform to the new structured copyright format: the package name, package version and license info (clearly expressed using well-known/enumerated license types, with exception text as needed)
    • A list of packages that don't conform to the structured copyright format
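The use case above can be sketched roughly as follows. This is a minimal illustration, not the spec's tool: the function names are hypothetical, the conformance check is a heuristic that only looks for a Format or Format-Specification field in the first stanza, and the package version lookup is left out. It relies on installed packages keeping their copyright file under /usr/share/doc/<package>/copyright.

```python
import re
from pathlib import Path

# Assumption: a structured file announces itself with one of these header fields.
HEADER_FIELDS = ("format:", "format-specification:")


def looks_structured(text):
    """Heuristic check: inspect only the first stanza (up to the first blank line)."""
    stanza = re.split(r"\n\s*\n", text.strip(), maxsplit=1)[0]
    return any(line.lower().startswith(HEADER_FIELDS) for line in stanza.splitlines())


def licenses_in(text):
    """Collect the short name on each 'License:' line, in order, without duplicates."""
    found = []
    for line in text.splitlines():
        if line.lower().startswith("license:"):
            name = line.split(":", 1)[1].strip()
            if name and name not in found:
                found.append(name)
    return found


def report(packages, doc_root="/usr/share/doc"):
    """Split packages into (conforming: name -> licenses, non-conforming: [names])."""
    conforming, nonconforming = {}, []
    for name in packages:
        path = Path(doc_root) / name / "copyright"
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            # A missing or unreadable file is reported as non-conforming.
            nonconforming.append(name)
            continue
        if looks_structured(text):
            conforming[name] = licenses_in(text)
        else:
            nonconforming.append(name)
    return conforming, nonconforming
```

Calling report(["apt", "bash"]) on a running system would return the two outputs described above; with no package list given, a real tool would first have to enumerate all installed packages, which this sketch does not do.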

Assumptions

Design

Implementation

UI Changes

Code Changes

Migration

Test/Demo Plan

Unresolved issues

BoF agenda and discussion

MichaelVogt: The content of the debian/copyright file is available on changelogs.ubuntu.com (e.g. http://changelogs.ubuntu.com/changelogs/pool/main/a/apt/apt_0.7.19ubuntu1/copyright). It is not machine readable (or only to a certain extent), but at least the full package does not need to be downloaded each time. There is also http://wiki.debian.org/Proposals/CopyrightFormat, a proposal to make debian/copyright machine readable.
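As a small illustration of the point above, the URL of a package's copyright file can be built without downloading the package. The pattern here is inferred from the single example URL and the usual Debian-style pool layout, so treat it as an assumption; the function names are hypothetical, and versions with an epoch are not handled.

```python
def pool_prefix(source):
    """Pool directory prefix: 'lib' packages use four characters, others one."""
    return source[:4] if source.startswith("lib") else source[:1]


def copyright_url(source, version, component="main",
                  base="http://changelogs.ubuntu.com/changelogs/pool"):
    """Build the copyright URL for a source package (layout assumed from one example)."""
    prefix = pool_prefix(source)
    return f"{base}/{component}/{prefix}/{source}/{source}_{version}/copyright"


print(copyright_url("apt", "0.7.19ubuntu1"))
```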

UDS Lucid Notes

  • this has slipped two cycles now, and needs to be escalated
    • Steve will have one week a month to dedicate to the spec
  • Kyle wants a mechanical import of the license data of packages from Debian (I don't think that's quite what he wants -- he wants a check against a deb *package* -- see below)
  • Format-Specification should be included in the examples, with the current revision number (filled in dynamically if possible), so that consumers don't do the wrong thing in their own debian/copyright
    • lintian checks (one to use spec at all; a different, more immediately acceptable one for updating to final spec)
  • Kyle would like the parser to run against a deb package (not against an installed package in an image), with one of two outputs: either the package does not have a structured copyright file, or it does and here is what it contains
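Running against a deb package rather than an installed one could look roughly like the sketch below. It assumes dpkg-deb is available and uses its --fsys-tarfile option to get the package's filesystem as a tar stream; the function names are hypothetical, and the check for a structured file is only a heuristic on the first non-empty line.

```python
import io
import subprocess
import tarfile


def copyright_from_fsys_tar(fileobj):
    """Return the text of usr/share/doc/<pkg>/copyright from a tar stream, or None."""
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            parts = member.name.lstrip("./").split("/")
            if (len(parts) == 5 and parts[:3] == ["usr", "share", "doc"]
                    and parts[4] == "copyright" and member.isfile()):
                data = tar.extractfile(member).read()
                return data.decode("utf-8", errors="replace")
    return None


def has_format_header(text):
    """Assumption: a structured file starts with a Format or Format-Specification field."""
    first = next((line for line in text.splitlines() if line.strip()), "")
    return first.lower().startswith(("format:", "format-specification:"))


def describe_deb(deb_path):
    """Return the structured copyright text of a .deb, or None if it has none."""
    result = subprocess.run(["dpkg-deb", "--fsys-tarfile", deb_path],
                            check=True, capture_output=True)
    text = copyright_from_fsys_tar(io.BytesIO(result.stdout))
    if text is None or not has_format_header(text):
        return None
    return text
```

Reading the whole tar stream into memory keeps the sketch short; a real tool would probably stream it instead, and would need to handle packages whose documentation directory is a symlink to another package.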

Prototype Tool Info

copyrightformat-library.tar.gz
This is a very quick and rough prototype, and does not understand more than a couple of existing copyright files, but it might be useful for playing around. The point of the tool is not to be practical today, but to let interested people play around with it to provide feedback on what kinds of tools are needed, and what the Python library it uses should provide.

To use the tool:

  • ./copyright-tool paths/to/copyright/files

For example:

  • ./copyright-tool copyright.sample
  • ./copyright-tool $(cat files.txt)

(The sample file is included. files.txt lists the files on my machine that sort-of work as input.)

The output of the tool may be wrong. It doesn't even try to implement everything in DEP5 as it is, just enough to make it possible to play with this stuff on a computer rather than in one's head or on mailing lists.

Background: In the Ubuntu Foundations team we're under the impression that the OEM team needs to keep track of what licenses are on the CD (or some other collection of software), and is currently doing that manually with a spreadsheet. DEP5 (http://dep.debian.net/deps/dep5/) aims to make copyright files machine-readable, rather than free-form text, and this tool reads files in that format.
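To give a feel for what reading such a file involves, here is a minimal sketch, not the prototype's actual code. It assumes the file is a series of stanzas separated by blank lines, each made of "Field: value" lines with indented continuation lines, and that the Files field is whitespace-separated; earlier DEP5 drafts may differ. The function names are hypothetical.

```python
import re


def parse_stanzas(text):
    """Split a structured copyright file into a list of {field: value} dicts."""
    stanzas = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields, current = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and current is not None:
                # Indented line: continuation of the previous field.
                fields[current] += "\n" + line.strip()
            elif ":" in line:
                key, value = line.split(":", 1)
                current = key.strip()
                fields[current] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def files_by_license(stanzas):
    """Map each license short name to the file patterns it covers."""
    summary = {}
    for stanza in stanzas:
        if "Files" in stanza and "License" in stanza:
            lines = stanza["License"].splitlines()
            short_name = lines[0] if lines else ""
            summary.setdefault(short_name, []).extend(stanza["Files"].split())
    return summary
```

A summary like the one files_by_license returns is roughly what the spreadsheet tracks by hand today: which licenses appear, and which files each one covers.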

Steve Langasek is in charge of getting DEP5 approved in Debian eventually, but he's been busy with other things this cycle. Some internal Debian politics have also slowed things down.

I assume that the vision for the future is to start converting copyright files in Ubuntu to the DEP5 format. I also assume that this conversion will start with Ubuntu-specific packages, and that it need not be complete to be useful. Presumably we would then expand it to packages in Ubuntu main that are imported from Debian, and feed those changes back to Debian.


CategorySpec

FoundationsTeam/Specs/PackageLicenseTracking (last edited 2009-11-20 16:25:19 by 63)