ImproveDebianImportSpecification

Revision 4 as of 2009-06-16 08:45:30

Summary

We will make the branches we have of Debian source packages have some link with the packaging branch used by the Debian maintainer, where they declare one in a "Vcs-*" header. This will enable better collaboration, and also lead us to solve many of the issues that we will have to confront when we do a similar thing for upstream VCSs.

Release Note

None, as this is not a user-visible change.

Rationale

We now have branches of Debian source packages available on Launchpad, but they are just imports of the packages. The Debian maintainer may use a VCS for their packaging, and if they do, these branches will be unrelated to it as far as bzr is concerned. This limits the ability of Ubuntu and Debian developers to collaborate using the VCS that is most natural to them. The Ubuntu developer could use the Debian maintainer's VCS, but it would be better if they didn't have to treat this as a special case: whichever branch they chose to use, they could collaborate with the Debian maintainer, as well as with any other Ubuntu developer who is using the Ubuntu branch.

Doing this will allow the Ubuntu developer to merge in the Debian maintainer's branch as they like, and also to provide patches that apply directly to tip and that the maintainer can merge, where the combination of VCSs supports that.

User stories

  • It's close to release time and Fabio wishes to merge the changes from the Debian maintainer's VCS into the Ubuntu package to upload before release. With the new system a simple bzr merge does this.

  • Julie is making a change in Ubuntu and goes to forward the changes to Debian. With a few straightforward commands she produces and mails the changes such that they apply directly to the tip of the Debian maintainer's branch, even though the maintainer has packaged a new upstream version there.

Assumptions

  • The Debian Vcs-* headers are not present for every package, and even when they are there they may be incorrect.
  • The contents of the Debian Vcs-* branch may not be the same as the corresponding source package.
  • Not every upload to Debian may be represented in the Debian Vcs-* branch.

Design

The Debian maintainer is free to declare the VCS that they use in the debian/control file, so we can use this to make the links automatically. These fields aren't ubiquitous though, and even when they are there they may be incorrect, so the information will be used on a best-effort basis. We shall continue importing the Debian packages in the same way as before, so we can still rely on there being an up-to-date branch that contains the source that is actually in Debian. Where possible we will improve these branches by adding extra revision parents such that bzr sees a shared history, and will do the right thing when merging across the branches.
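
As an illustration of the best-effort lookup, the following is a minimal sketch of reading the Vcs-* fields from the source paragraph of debian/control. The function name vcs_fields and the example control file are assumptions made for illustration, not part of the importer, and the parsing is deliberately simplified (it ignores continuation lines and comments).

```python
def vcs_fields(control_text):
    """Return {field name: value} for the Vcs-* fields of the source paragraph.

    Hypothetical helper: an empty dict means the maintainer declared nothing,
    in which case the import simply proceeds as before.
    """
    # The source paragraph is the first block, ended by a blank line.
    source_paragraph = control_text.strip().split("\n\n", 1)[0]
    fields = {}
    for line in source_paragraph.splitlines():
        # Skip comments and continuation lines (which start with whitespace).
        if not line or line[0] in " \t#":
            continue
        name, sep, value = line.partition(":")
        if sep and name.lower().startswith("vcs-"):
            fields[name.title()] = value.strip()
    return fields


# Made-up control file used only for this example.
EXAMPLE = """\
Source: hello
Maintainer: Example Maintainer <maint@example.org>
Vcs-Git: git://git.example.org/hello.git
Vcs-Browser: http://git.example.org/hello

Package: hello
Architecture: any
"""

print(vcs_fields(EXAMPLE))
```

Even when a field is found, the URL it names still has to be treated as possibly stale or wrong, so any failure to fetch it should fall back to the plain import.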

The basic idea is that when the importer finds a new Debian upload to import it can add an extra parent which is the relevant revision in the Debian maintainer's Vcs-* branch. The difficulty comes from the uncertainties about which revision that is, and so various heuristics will have to be employed to work this out.
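
To see why one extra parent is enough, here is a toy model of a revision graph in plain Python. It does not use bzrlib, and all revision ids are made up; it only shows that the import branch and the maintainer's branch share no ancestry until the extra parent is recorded.

```python
def ancestors(graph, revision):
    """Return every revision reachable from `revision`, itself included."""
    seen = set()
    todo = [revision]
    while todo:
        rev = todo.pop()
        if rev not in seen:
            seen.add(rev)
            todo.extend(graph.get(rev, ()))
    return seen


def shared_history(graph, a, b):
    """Revisions that are in the ancestry of both `a` and `b`."""
    return ancestors(graph, a) & ancestors(graph, b)


# Maps each revision id to the list of its parents.
graph = {
    "import-1.0-1": [],
    "import-1.0-2": ["import-1.0-1"],
    "maint-a": [],
    "maint-b": ["maint-a"],  # maintainer's revision matching upload 1.0-2
    "maint-c": ["maint-b"],  # later packaging work, not yet uploaded
}

before = shared_history(graph, "import-1.0-2", "maint-c")

# The importer records the matching maintainer revision as an extra parent.
graph["import-1.0-2"].append("maint-b")
after = shared_history(graph, "import-1.0-2", "maint-c")

print(sorted(before))  # no shared revisions
print(sorted(after))   # maint-a and maint-b are now shared
```

With a shared ancestor available, a merge of the maintainer's tip only has to consider the work done after maint-b.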

./debian/-only vs. full-source

The Debian Vcs-* branch may be a ./debian/-only layout, containing just the ./debian/ directory, or a full source branch, the same as you get from dpkg-source -x. (It may in fact be anything, but we will detect these two and not try to act on any other layout that we find). All the Ubuntu-created branches are the latter, and so where the Debian maintainer's branch differs there must be adjustments made.

Consistency across packages is one of the aims for the Ubuntu branches, so that won't change. Instead, we will aim to set up the branches such that they share history in the ./debian/ part.

bzr has a join command, which is designed to do this, so we will employ that logic when that case is detected.

However, there is a difficulty with this: the file ids will differ between the Debian maintainer's branch and the Ubuntu branch, so the join will lead to conflicts. We know that the files are "the same", and so we don't want these conflicts. We can either rewrite the Ubuntu branches to use file ids from the Debian maintainer's branch, or change bzr to handle this better. We don't want to rewrite, as it's not necessarily going to be a one-time thing. You could get into a situation where you needed to rewrite the branches every time a file was added to the debian directory. In addition, the Debian maintainer may change VCS, in which case you would have three sets of file ids to reconcile, which couldn't be done with rewriting.
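
The conflict can be pictured with a deliberately simplified model in which an inventory is just a mapping from file id to path. This is not how bzr stores inventories, and the ids below are invented; the sketch only shows that two independently created ids for the same path clash when the trees are combined.

```python
def join_by_file_id(ours, theirs):
    """Combine two {file_id: path} inventories and report path clashes.

    Toy model: a clash is any path claimed by more than one file id.
    """
    combined = dict(ours)
    combined.update(theirs)
    by_path = {}
    for file_id, path in combined.items():
        by_path.setdefault(path, []).append(file_id)
    conflicts = {path: sorted(ids) for path, ids in by_path.items() if len(ids) > 1}
    return combined, conflicts


# The same two files, given different ids by two independent imports.
ubuntu = {"rules-ubuntu-1": "debian/rules", "control-ubuntu-1": "debian/control"}
debian = {"rules-debian-7": "debian/rules", "control-debian-7": "debian/control"}

_, conflicts = join_by_file_id(ubuntu, debian)
print(conflicts)

# If both sides used the same ids there would be nothing to reconcile.
_, no_conflicts = join_by_file_id(ubuntu, dict(ubuntu))
print(no_conflicts)
```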

Finding the revision to use

Given a source package there needs to be a way to find the revision in the Vcs-* branch that most resembles it. There are various things that can help with that:

  • Tags: many Debian maintainers will tag their uploads, so we can use these to find the revisions. There may not be one scheme used across the board though.
  • Changelog: this should be a very strong indicator so that e.g. reverts of the code don't lead to the wrong revision being chosen.
  • Timestamps: the timestamps on the revisions closer to the timestamp in the changelog should be favoured.
  • Tree content: the closer the content the more likely it is to match.
  • Revision history: revisions already merged shouldn't be considered.

A heuristic based on all of these inputs can try to find the revision of interest. This revision can then simply be added as an extra parent to the revision being committed.
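
One way to combine the inputs listed above is a weighted score per candidate revision. The sketch below is only an assumption about how that might look: the Candidate fields, the weights and the threshold are all invented for illustration and would need tuning against real imports.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    revid: str
    tagged_as_version: bool    # a tag on this revision names the upload version
    changelog_version: str     # top version in debian/changelog at this revision
    timestamp: float           # commit time, seconds since the epoch
    content_similarity: float  # 0.0 to 1.0, compared with the uploaded tree
    already_merged: bool       # already in the import branch's ancestry


def score(candidate, upload_version, upload_timestamp):
    """Return a score for the candidate, or None if it must not be used."""
    if candidate.already_merged:
        return None
    total = 0.0
    if candidate.changelog_version == upload_version:
        total += 4.0  # changelog is treated as the strongest signal
    if candidate.tagged_as_version:
        total += 3.0
    total += 2.0 * candidate.content_similarity
    days_apart = abs(candidate.timestamp - upload_timestamp) / 86400.0
    total += 1.0 / (1.0 + days_apart)  # closer timestamps score higher
    return total


def best_revision(candidates, upload_version, upload_timestamp, threshold=4.0):
    """Return the best-scoring revid, or None if nothing is convincing."""
    scored = []
    for candidate in candidates:
        value = score(candidate, upload_version, upload_timestamp)
        if value is not None:
            scored.append((value, candidate.revid))
    if not scored:
        return None
    best_score, revid = max(scored)
    return revid if best_score >= threshold else None
```

Returning None when no candidate clears the threshold matches the best-effort approach: in that case the upload is imported without an extra parent.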

This will mean that the first time an upload is imported where a corresponding revision is found you will get a merge that introduces a second root, so the log looks a bit odd, and things like annotation will tend to point at the package change, rather than the change in the Debian maintainer's Vcs. However, this will diminish over time, and allows us to bootstrap more easily, and doesn't give bzr wrong information about the history, merely incomplete information.

Timing of imports

When searching for the revision to use it may be that the revision you need isn't public yet. To reduce the likelihood of this we will ensure that we are looking at the most up-to-date revision history we have. Also, the method used for spotting new uploads to Debian does have some latency, increasing the dwell period in which the Debian maintainer can push their changes.

This dwell period won't always suffice however, and so there will be cases where we don't add a revision parent that we could. We don't want to wait indefinitely though, as that would sacrifice the usability of these branches for correctness.

It would be possible to watch for the desired revision for a while after importing, and if it is found add a new revision that doesn't change the tree, but adds the new parent. This would lead to a slightly uglier revision history, but would perhaps be more useful in a few cases. The effort that it would take to do this would be quite large though, and given that the number of times it would make a difference isn't known we won't implement this unless it is found that it would be of significant benefit.

Implementation

./debian/-only vs. full-source

./debian/-only branches will be detected by having one of the following properties:

  • ./debian at the root of the branch with no other entries in the root (perhaps with a whitelist for .gitignore and the like)
  • Common debian files in the root of the branch and no ./debian/ directory, as some tools support versioning the contents of ./debian/ rather than the ./debian/ directory itself. As no file can be guaranteed to exist in a package maintainer's branch (e.g. the VCS build tool could generate the changelog), heuristics based on typical file names will be used (control/control.in, rules, changelog, copyright, etc.).

If either is found then a join will be performed to make it into a full source branch. TODO: research how join works, and specify what we join into.
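
The two detection rules above can be sketched as a function over the names found in the root of the branch. The whitelist contents, the set of common file names and the min_common_files threshold are assumptions for illustration; anything that is not recognised as ./debian/-only is returned as "other" and left for separate handling.

```python
# Assumed values; the real lists would be tuned against actual branches.
COMMON_DEBIAN_FILES = {"control", "control.in", "rules", "changelog", "copyright"}
IGNORABLE_ROOT_ENTRIES = {".gitignore", ".bzrignore", ".cvsignore", ".hgignore"}


def classify_layout(root_entries, min_common_files=2):
    """Classify a branch by the names in its root directory.

    Returns "debian-dir-only" when the root holds just ./debian/,
    "debian-contents-only" when the root looks like the inside of ./debian/,
    and "other" for everything else (full source or unknown layouts).
    """
    entries = set(root_entries) - IGNORABLE_ROOT_ENTRIES
    if entries == {"debian"}:
        return "debian-dir-only"
    common = entries & COMMON_DEBIAN_FILES
    if "debian" not in entries and len(common) >= min_common_files:
        return "debian-contents-only"
    return "other"


print(classify_layout(["debian", ".gitignore"]))
print(classify_layout(["control", "rules", "changelog", "patches"]))
print(classify_layout(["debian", "src", "Makefile"]))
```

Requiring more than one common file name keeps a stray file called rules or copyright in an unrelated tree from triggering a join.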

In order to deal with the "parallel imports" problem of file-ids, we have two options:

  • Take the file ids from the Debian maintainer's branch. This means that file-ids will change from Ubuntu's point of view. This has two problems: it makes the view of history limited, and it makes merging over one of these events harder.
  • Extend bzr to handle "join of file ids". The bzr developers already have a suggestion for doing this, but it will take an unknown amount of effort at this point. As it is changing one of the fundamental things of bzr it could be a very large change, possibly taking months.

As the second solution has much better results we will start by discussing with bzr developers possible solutions, and attempt to determine the amount of effort that would be required to get this feature. Once we have better information we can determine the best course of action.

Finding the revision to use

For this we will implement a class that can compare a set of bzr trees to another, single, bzr tree, and determine which is the best match, if any. It will be based on the heuristics from the Design section.
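
A minimal sketch of such a class follows, covering only the tree-content heuristic. It models a tree as a dict from path to content hash rather than using real bzr tree objects, and the class name TreeMatcher and the min_similarity default are assumptions made for the example.

```python
class TreeMatcher:
    """Pick the candidate tree that most resembles a single target tree.

    Trees are modelled as {path: content_hash} dicts for illustration.
    """

    def __init__(self, target, min_similarity=0.5):
        self.target = target
        self.min_similarity = min_similarity

    def similarity(self, tree):
        """Fraction of all paths that exist in both trees with equal content."""
        paths = set(self.target) | set(tree)
        if not paths:
            return 1.0
        same = sum(
            1
            for path in paths
            if path in self.target
            and path in tree
            and self.target[path] == tree[path]
        )
        return same / len(paths)

    def best_match(self, candidates):
        """Return the id of the closest tree, or None if none is close enough."""
        best_id, best_score = None, 0.0
        for revid, tree in candidates.items():
            value = self.similarity(tree)
            if value > best_score:
                best_id, best_score = revid, value
        return best_id if best_score >= self.min_similarity else None


target = {"debian/rules": "h1", "debian/control": "h2", "debian/changelog": "h3"}
candidates = {
    "rev-1": {"debian/rules": "h1", "debian/control": "old", "debian/changelog": "old"},
    "rev-2": {"debian/rules": "h1", "debian/control": "h2", "debian/changelog": "h3"},
}
print(TreeMatcher(target).best_match(candidates))
```

The "if any" case is handled by the threshold: when every candidate falls below it, best_match returns None.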

We have a large corpus that we can test with, so we should be able to do a good job of tuning the heuristics to give satisfactory results.

Code imports

We will want to have all of the Debian maintainers' branches available to compare with. We could either do this using bzr foreign branch support, or using Launchpad's vcs-imports and mirroring. Using the latter means that they are easily available for everyone, so that you can merge from them as you like without having to install extra bzr plugins, so that would be preferable.

We will therefore have a job that watches for additions to and changes in Vcs-* fields and sets up the vcs-imports/mirrors on Launchpad (probably semi-automatically).

We will need a well-known name for these imports on Launchpad, or at least a consistent mapping from Vcs-* URI to Launchpad name.
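
As one possible shape for a consistent mapping, the sketch below derives a name from the source package and a short hash of the URI. The naming scheme and the function import_branch_name are purely hypothetical; no such convention has been agreed with the Launchpad developers.

```python
import hashlib
import re


def import_branch_name(source_package, vcs_uri):
    """Derive a stable, hypothetical import name from a package and Vcs-* URI."""
    # Normalise only what is safe: surrounding whitespace and trailing slashes.
    normalized = vcs_uri.strip().rstrip("/")
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:8]
    package = re.sub(r"[^a-z0-9+.-]", "-", source_package.lower())
    return "%s-debian-vcs-%s" % (package, digest)


name = import_branch_name("hello", "git://git.example.org/hello.git")
same = import_branch_name("hello", "git://git.example.org/hello.git/")
other = import_branch_name("hello", "svn://svn.example.org/hello/trunk")
print(name == same, name == other)
```

Including a hash of the URI means that a maintainer moving to a new VCS yields a new name rather than silently reusing the old import.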

Also, when we are importing we want to ensure that we have access to the latest revisions on the branch. Therefore we would want a way to request that Launchpad perform a mirror, and preferably get notification when the mirror is complete.

Also, so that parallel imports are not a problem, it is very desirable for Launchpad vcs-imports to be done with a system that makes parallel imports possible. This is currently the case for git but not for SVN, although the Launchpad developers have plans to change that.

Much of this will require co-ordination with the Launchpad developers and discussion with them over the best way to implement it.

UI Changes

None

Code Changes

Most of the changes here will be made to bzr-builddeb to allow it to add the extra parents as it is importing.

There will also have to be changes to the driver scripts used by the importer in order to find the correct branches and tell bzr-builddeb about them.

Migration

These changes shouldn't require any migration plan for existing users. They will just find that the branches become more useful after a bzr pull one day.

Test/Demo Plan

There will be unit-testing and manual testing of the bzr-builddeb changes as they are made.

For integration testing we can run some imports with the new system in parallel for a while to catch any significant problems.

Unresolved issues

The major unresolved issue is how to handle the parallel imports problem.

BoF agenda and discussion

  • File bugs in Debian where information is known to be wrong.
  • Document the way to set up Vcs such that it will work flawlessly, to allow those interested to ensure that they can get the benefit.


CategorySpec