LucidSoftwareCenterRepositoryBasedIndexfiles

Summary

To improve the presentation of packages from unofficial archives, any package archive -- whether it be official, Canonical partner, PPA, or anything else -- should be able to contain a human-friendly index, describing the icons, categories, subcategories, and search keywords for its packages. There should be a way of generating this index automatically (just as Packages.gz is generated), and Soyuz should do this whenever any Launchpad archive package changes.

Release Note

TBD

Rationale

The current way to provide additional meta-data for software-center is the app-install-data package, which ships the desktop files of the applications along with additional meta-data such as which package they belong to.

app-install-data is something we want to get rid of:

  • it's slow and manual to update
  • the time we want to update this package is when the archive is frozen
  • there are exceptions and bugs, so software shows up in Ubuntu Software Center that isn't installable
  • it's difficult to update the data
  • it works only for Main and Universe, not for PPAs or other archives

User stories

  • Maree maintains 16 packages in a PPA. She wants these packages to show up, with proper names, icons, departments etc, in the Ubuntu Software Center for anyone who adds her PPA.
  • Brian has packaged a new version of Adobe Reader and published it to the Canonical partner repository. It has a different icon from the previous version.
  • Ben uses apt-get to install everything. He isn't interested in downloading icons etc for applications he is never going to install.

Design

The current data we have in the app-install-data package should be replaced by multiple indexfiles that are put alongside the Packages.gz file.

For software-center we are interested in the following data:

  • appname
  • packagename
  • Comment: friendly summary for the application (we may have multi apps per pkg) - multi language
  • keywords - multi language
  • popcon / rating
  • iconname
  • Categories
  • mime-type

That will produce an "Applications.gz" indexfile that can contain multiple apps for a given package and may contain multiple identically named apps (with generic names like "Terminal"). To get a unique representation, the tuple (appname, pkgname) is useful.

The icons should probably be published in a separate file:

  • icons
    • uuencode?
    • gtkiconcache ?
    • size?
    • format?

But some research is needed in order to find a good format. We need to compare uuencoded text vs gtkiconcache or simple tarfile.
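
As a starting point for that comparison, here is a minimal sketch that measures two of the candidate encodings for a set of icons. It is only an illustration: the function names are made up for this example, the icon data is a placeholder, and gtkiconcache is not covered.

```python
import binascii
import io
import tarfile


def uuencode(data):
    """Return data as uuencoded text (binascii.b2a_uu takes at most 45 bytes per line)."""
    return b"".join(
        binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)
    )


def tar_gz(icons):
    """Pack a {filename: bytes} mapping into an in-memory gzip-compressed tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(icons.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compare_sizes(icons):
    """Return the byte counts of the raw icons and of each candidate encoding."""
    return {
        "raw": sum(len(data) for data in icons.values()),
        "uuencoded": sum(len(uuencode(data)) for data in icons.values()),
        "tar.gz": len(tar_gz(icons)),
    }


if __name__ == "__main__":
    # Placeholder bytes standing in for a real icon file.
    print(compare_sizes({"baobab.png": bytes(range(90))}))
```

Running this over the real icon set (rather than placeholder bytes) would give actual numbers for the uuencode vs. tarfile question; uuencoded text is always larger than the raw data before any compression is applied.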

We also want to use this mechanism for the command-not-found data. The data we want there is:

  • packagename -> binaries

    • PROBLEM: diverts etc. are a real-world problem during extraction of this data
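
A client would typically need the reverse of this mapping, i.e. "which packages provide this command?". The following sketch shows that inversion under simple assumptions: the function name is made up, the foo-tools/bar-tools entries are hypothetical, and diverts are not handled at all.

```python
from collections import defaultdict


def build_command_index(package_binaries):
    """Invert a packagename -> binaries mapping into binary -> sorted package names."""
    index = defaultdict(set)
    for package, binaries in package_binaries.items():
        for binary in binaries:
            index[binary].add(package)
    return {binary: sorted(packages) for binary, packages in index.items()}


if __name__ == "__main__":
    # Illustrative data only; foo-tools and bar-tools are hypothetical packages.
    data = {
        "gnome-utils": ["baobab"],
        "foo-tools": ["frob"],
        "bar-tools": ["frob", "barctl"],
    }
    print(build_command_index(data).get("frob"))
```

Because several packages can ship a binary with the same name, the lookup returns a list rather than a single package.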

File format

  • Opaque to apt, so doesn't really matter
  • RFC-822
  • Localization of categories and keywords (in the Translations-$lang file?)

Moving the localization for the Categories/Comments into a separate file is tricky for the first version of this specification. This can be done later; we just need to be careful about backwards compatibility if we decide to have the translations inline (i.e. ensure that for Lucid we still produce files with the inline translations). Alternatively we can move the translations into the langpacks. The disadvantage is that the translations will only be available after the langpack has been installed (or updated).

Example for a first version with inline translations

Package: gnome-utils
Popcon: 17939
Section: main
Icon: baobab
Name: Disk Usage Analyzer
Name-de: Festplatten Überprüfer 
Comment: Check folder sizes and available disk space
Comment-de: Überprüfen des verfügbaren Platzes
Exec: baobab
Categories: GTK;GNOME;Utility;

Package: gnome-utils
Popcon: 17939
Section: main
Name: Search for Files...
...
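
To illustrate how a client might consume such a file, here is a minimal sketch that parses stanzas like the ones above, keys them by the (appname, pkgname) tuple, and falls back to the untranslated field when no inline translation exists. The function names are made up for this example and the parser is deliberately simplistic (no continuation lines); it is not the apt implementation.

```python
SAMPLE = """\
Package: gnome-utils
Popcon: 17939
Section: main
Icon: baobab
Name: Disk Usage Analyzer
Name-de: Festplatten Überprüfer
Comment: Check folder sizes and available disk space
Comment-de: Überprüfen des verfügbaren Platzes
Exec: baobab
Categories: GTK;GNOME;Utility;

Package: gnome-utils
Popcon: 17939
Section: main
Name: Search for Files...
"""


def parse_stanzas(text):
    """Yield one dict per blank-line-separated stanza of 'Key: value' lines."""
    stanza = {}
    for line in text.splitlines():
        if not line.strip():
            if stanza:
                yield stanza
                stanza = {}
            continue
        key, sep, value = line.partition(":")
        if sep:  # skip anything that is not a 'Key: value' line
            stanza[key.strip()] = value.strip()
    if stanza:
        yield stanza


def index_apps(text):
    """Map the unique tuple (appname, pkgname) to its stanza."""
    return {
        (stanza["Name"], stanza["Package"]): stanza
        for stanza in parse_stanzas(text)
        if "Name" in stanza and "Package" in stanza
    }


def localized(stanza, field, lang):
    """Return the inline translation 'Field-lang' if present, else the plain field."""
    return stanza.get(f"{field}-{lang}", stanza.get(field))


def categories(stanza):
    """Split the semicolon-separated Categories field into a list."""
    return [c for c in stanza.get("Categories", "").split(";") if c]


if __name__ == "__main__":
    apps = index_apps(SAMPLE)
    app = apps[("Disk Usage Analyzer", "gnome-utils")]
    print(localized(app, "Comment", "de"), categories(app))
```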

Soyuz process

  • Strip the data out of the package when building it, store it somewhere
  • Publish the metadata file along with the Packages.gz files

The current meta-data is extracted by a script that crawls over the entire ubuntu/lucid archive, looks into each deb for desktop files, extracts them, and adds meta-data. There is also a table of fixup actions. A common issue is that the desktop file is not in the package we are interested in, e.g. wesnoth-data vs. wesnoth, or emacs-common containing the icon for emacs22. Some of this can be fixed in the packages by the maintainer by overriding the X-AppInstall-Package= field (or by adding an explicit X-AppInstall-Ignore=true). For some we provide manual overrides. To migrate this to Soyuz, both the desktop file and the icon need to be extracted after the build process, annotated with meta-data, and stored somewhere.
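
The per-file part of that extraction could look roughly like the sketch below. It assumes the desktop file has already been pulled out of the deb, and it only covers the two X-AppInstall fields named above; the function names and the sample desktop file contents are invented for illustration, and the existing fixup table and manual overrides are not modelled.

```python
import configparser


def read_desktop_entry(text):
    """Parse the [Desktop Entry] group of a .desktop file into a dict."""
    parser = configparser.ConfigParser(
        interpolation=None,      # Exec lines may contain '%' field codes
        delimiters=("=",),       # desktop files only use '=' as separator
        comment_prefixes=("#",),
        strict=False,
    )
    parser.optionxform = str     # keys are case-sensitive
    parser.read_string(text)
    return dict(parser["Desktop Entry"])


def app_record(desktop_text, containing_package):
    """Build index fields for one desktop file, or None if it should be skipped."""
    entry = read_desktop_entry(desktop_text)
    if entry.get("X-AppInstall-Ignore", "").lower() == "true":
        return None
    record = {
        # The maintainer may point at a different package than the one
        # that actually ships the desktop file.
        "Package": entry.get("X-AppInstall-Package", containing_package),
    }
    for field in ("Name", "Comment", "Icon", "Exec", "Categories"):
        if field in entry:
            record[field] = entry[field]
    return record


if __name__ == "__main__":
    # Made-up desktop file contents, for illustration only.
    sample = (
        "[Desktop Entry]\n"
        "Name=Battle for Wesnoth\n"
        "Icon=wesnoth-icon\n"
        "Exec=wesnoth\n"
        "Categories=Game;StrategyGame;\n"
        "X-AppInstall-Package=wesnoth\n"
    )
    print(app_record(sample, "wesnoth-data"))
```

With the override present, the record is attributed to wesnoth even though the file was found in wesnoth-data; without it, the containing package is used.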

The translations for a package are currently scattered around in Launchpad. Parts of the translations are in the app-install-data template (e.g. the Comment field from the desktop file); parts are in the ddtp-ubuntu template (the long description). Neither is sorted per package, and neither is presented to the translator alongside the package, but in a separate project. This should be fixed so that these are displayed as separate templates in a source package (so that the translator needs to look in only one place for a full translation of a source package).

apt

The design for libapt is described in https://blueprints.launchpad.net/ubuntu/+spec/foundations-lucid-index-based-downloads-client

Implementation

  • something needs to run once the build is finished to extract the .desktop file and icons, then export that metadata
    • set up another script to run once the build is finished
  • need to create a standard for what kind of metadata can be supplied within the package
    • debian/something.desktop with X-AppInstall tags, or the regular desktop file with the X-AppInstall tags

    • debian/something.command-not-found for command-not-found hints
  • Things this metadata might contain (not all of this needs to be done for lucid):
    • icons
    • not screenshots/movies/sounds (handled elsewhere as not downloaded up-front in software center)
    • mimetype
    • hardware/software requirements (opengl etc)
    • whether it's available in my language (may change after package upload since translation packages are different)
      • similar problem to ratings updates
    • keywords and keyword translations

There are two kinds of information that we want to publish:

  • we have information that can be extracted from an upload (icon)
  • and changes that happen after an upload (rating, translation status).

For the latter we need a different approach, because once the archive is frozen we traditionally do not update the indexfiles anymore.

Launchpad team requirements

Needs doing by beta 1 or not in Lucid at all.

The Launchpad team needs from the Ubuntu Software Center team:

  • File format for the metadata
  • the code to inspect Debian packages and extract the metadata

Open questions

What should be done with popular packages that are not available for a given architecture (e.g. skype on arm)? Currently we show them and say that they are not available for the given hardware.

What to do with app-install-data-partner? Currently we do not enable the partner channel by default, but we do ship desktop files so that it's trivial to enable the partner repository from software-center. If we replace the app-install-data-partner package with a repository-based approach, then either partner apps will not show up anymore or we need to enable the repository by default.

What about the relatively large blacklist for desktop files we have right now? There are, e.g., a bunch of gpe applications that we currently do not show by default in software-center as applications because they are optimized for a different kind of device. We blacklist other desktop files for similar reasons. Should all this be undone (at the risk of cluttering the applications list quite a bit)?

If we decide to keep a blacklist to avoid cluttering the applications directory, we need a process and a team responsible for reviewing the meta-data and blacklisting inappropriate application data.

Should we add signatures for this meta-data to the Release file? The only risk I can see is a man-in-the-middle attack that injects poisoned image data that exploits a (hypothetical) flaw in the image rendering in gtk/webkit. Otherwise the data does not seem very security sensitive. Adding signatures means the Release file gets a lot bigger for everyone (something we want to avoid with the design of multiple IndexFiles).
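
To make the size concern concrete, the sketch below builds one checksum line roughly in the style a Release file uses for its indexfiles (hash, size, path). This is an assumption-laden illustration: the path main/Applications.gz is hypothetical, the exact column layout is approximate, and only SHA256 is shown.

```python
import hashlib


def release_entry(path, data):
    """Format one Release-style checksum line: ' <sha256> <size> <path>'."""
    digest = hashlib.sha256(data).hexdigest()
    return f" {digest} {len(data):16d} {path}"


if __name__ == "__main__":
    # Hypothetical indexfile path and placeholder contents.
    entry = release_entry("main/Applications.gz", b"placeholder contents")
    print(entry)
    print(len(entry), "bytes added to Release for this one file and hash type")
```

Every additional indexfile adds one such line per hash type, per component and architecture, which is where the growth of the Release file would come from.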

Discussion notes

Roadmap

  • Maybe start with non-localized data for PPAs in Lucid, then localized in a future version?
  • Maybe still have Ubuntu Software Center using app-install-data instead for Main and Universe in Lucid
    • iteratively survey the differences between app-install-data and the metadata Soyuz is producing
      • fix bugs in the packages and/or in Soyuz

actions:

  • client: write scripts to extract the needed data to LP
  • client: provide examples what the file

Issues

  • Often a .desktop file is in a separate package from the package you're actually interested in
    • e.g. wesnoth-data vs. wesnoth
    • e.g. emacs-common contains the icon for emacs22
    • maybe this should be fixed in the packages themselves
  • Debian may or may not be interested in this
    • e.g. keeping packages and debtags in sync
  • If bulk of metadata is not in Packages file then we can be nicer to Launchpad
  • Filling the Librarian with icons is necessary but annoying
    • garbage-collect them when done?
  • Current list view description translations should be migrated from app-install-data (currently ca. 4000 strings)
    • [long description translations come from DDTP data in the archive, but list view's short descriptions come from .desktop file's Comment field which is translated in app-install-data's Rosetta template]

Two parts:

  • 1) LP export
  • 2) apt support for downloading this

LP:

  • what to export?
    • per arch, per pocket (main, universe)
    • one pkg -> multiple apps, app names are not unique
    • desktop data parts
      • appname
      • packagename
      • Comment (friendly summary) - multi language
      • popcon / rating
      • keywords
      • iconname
      • Categories
      • mime-type
    • command not found data
      • per arch, per pocket (main, universe)
      • packagename -> binaries
      • PROBLEM diverts etc, real world problem
    • icons ?
      • uuencode?
      • gtkiconcache ?
      • size?
      • format?
    • tagfile/rfc822 just like Packages
  • how many files?
    • command-not-found
    • software-center
    • icons
  • we need to hook into the build process to extract the desktop file, command not found file data, icons etc (problem is diverts)

Initial Implementation

  • Put icons into repository
    • options: big tarball or gtk-icon-cache
  • something needs to run once the build is finished to extract the .desktop file and icons, then export that metadata
    • set up another script to run once the build is finished
  • need to create a standard for what kind of metadata can be supplied within the package
    • debian/control modifications
    • debian/something.desktop with X-AppInstall tags
    • debian/something.something for command-not-found hints
  • Things this metadata might contain:
    • icons
    • package descriptions (also translated)
    • restart required
    • not screenshots/movies/sounds (handled elsewhere as not downloaded up-front in software center)
    • mimetype
    • hardware/software requirements (opengl etc)
    • whether it's available in my language (may change after package upload since translation packages are different)
      • similar problem to ratings updates
    • keywords and keyword translations
  • we have information that can be extracted from an upload (icon)
  • and changes that happen after an upload (rating)

Launchpad team requirements

Needs doing by beta 1, or not in Lucid at all.

The Launchpad team needs from the Ubuntu Software Center team:

  • File format for the metadata
  • TODO - software center team: The code to inspect Debian packages and extract the metadata


CategorySpec

FoundationsTeam/Specs/LucidSoftwareCenterRepositoryBasedIndexfiles (last edited 2009-12-02 09:13:35 by p5B09F81D)