LanguagePackGeneration

Introduction

The purpose of this page is to document the technical details of the language pack generation process.

Ubuntu langpack admins

Language packs are administered by Ubuntu language pack builders.

language-pack-* and language-support-* packages

We provide the following packages for each language:

  • language-pack-$languagecode-base: translations for non-gui or legacy X applications

    • language-pack-$languagecode (Depends on language-pack-$languagecode-base): translation updates, Delta package to language-pack-$languagecode-base; contains only translations which have changed since the last Base export from Launchpad (see below).

  • language-pack-gnome-$languagecode-base (Depends on language-pack-$languagecode): translations for GTK+ and GNOME-based packages

    • language-pack-gnome-$languagecode (Depends on language-pack-gnome-$languagecode-base): Delta package to language-pack-gnome-$languagecode-base.

  • language-pack-kde-$languagecode-base (Depends on language-pack-$languagecode): translations for KDE-based packages

    • language-pack-kde-$languagecode (Depends on language-pack-kde-$languagecode-base): Delta package to language-pack-kde-$languagecode-base.
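The naming scheme can be sketched in a few lines of Python. This is only an illustration, not code from langpack-o-matic: the function name is hypothetical, and it assumes the base packages for GNOME and KDE are named language-pack-gnome-$languagecode-base and language-pack-kde-$languagecode-base, as in the sources-base/ directory layout described further down.

```python
def language_pack_names(languagecode):
    """Return (package, language-pack dependency) pairs for one language code."""
    core_base = f"language-pack-{languagecode}-base"
    core = f"language-pack-{languagecode}"
    # The update (Delta) package depends on its Base package.
    pairs = [(core_base, None), (core, core_base)]
    for desktop in ("gnome", "kde"):
        base = f"language-pack-{desktop}-{languagecode}-base"
        delta = f"language-pack-{desktop}-{languagecode}"
        # Desktop Base packages depend on the core update package.
        pairs.append((base, core))
        pairs.append((delta, base))
    return pairs


for package, depends in language_pack_names("de"):
    print(package, "->", depends or "(no language-pack dependency)")
```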

The following packages are optional and only exist for languages where we have additional software to provide input methods, writing aids or additional font packages for that language:

  • language-support-fonts-$languagecode: Meta package, which depends on additional font packages that are useful for the target language. This generally only applies to non-Latin script based languages.

  • language-support-input-$languagecode: Meta package, which depends on input method packages, which are necessary to type the target language on a keyboard.

  • language-support-writing-$languagecode: Meta package, which depends on writing aids for the target language, such as spell checkers and dictionaries.

    Warning /!\ Since Karmic, Finnish voikko packages, which depend on Mozilla products or Openoffice.org, are handled by language-selector instead.

  • language-support-$languagecode: Meta package, which depends on all other language-support-*-$languagecode packages in order to provide a "complete support" for the target language.

Warning /!\ Since Karmic, the following Meta packages are obsolete and have been removed from the archive.

  • language-support-translations-$languagecode: Meta package to depend on additional translation packages for software where we cannot handle translations in Launchpad (e.g. Thunderbird, Openoffice.org, etc.). Since they depend on their corresponding application package to be installed, these packages are now handled by language-selector instead.

  • language-support-extra-$languagecode: Meta package to depend on additional applications, which are of general interest for the target language environment. This has been dropped, since it was considered to pull unnecessary packages for those languages. If users want additional software, they should install it themselves.

Language codes

In general, we follow the ISO 639 list of languages and group all language variants into a single set of language-pack and language-support packages. Since Karmic, there is one exception to this rule: the Chinese (zh) language-pack and language-support packages are split in two, one for Simplified Chinese (hans) and one for Traditional Chinese (hant). The reason is that translations usually exist twice for Chinese, once for zh_CN and once for zh_TW, so the majority of translations are duplicates written in different scripts and sometimes with differing vocabulary. Putting both Simplified and Traditional Chinese into the same package would waste download capacity and disk space for every Chinese user, since users choose only one variant for their translations, and the two variants, because of their differences in script and vocabulary, cannot use the fall back mechanism of gettext.

This language-pack split has caused us to add certain exceptions to the code which handles translations and language-packs. This is true for langpack-o-matic, po2xpi and language-selector.

Mapping between locales and Simplified vs. Traditional Chinese

Usually, translations use a locale code (language + country) to indicate whether they are Simplified Chinese or Traditional Chinese translations. The most common of those are zh_CN (= Simplified Chinese) and zh_TW (= Traditional Chinese). Sometimes other regions use a different vocabulary than mainstream Simplified and Traditional Chinese. Therefore, we define the following mapping (in the order of fall back):

  • zh-hans (Simplified Chinese) = zh_SG, zh_CN

  • zh-hant (Traditional Chinese) = zh_MO, zh_HK, zh_TW
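As a minimal sketch, the mapping above could be expressed as a lookup table. The names CHINESE_VARIANTS and langpack_code_for_locale are hypothetical and do not come from langpack-o-matic, po2xpi or language-selector; a bare "zh" locale is not covered by the mapping and is simply passed through here.

```python
# Order within each tuple follows the fall back order given above.
CHINESE_VARIANTS = {
    "zh-hans": ("zh_SG", "zh_CN"),
    "zh-hant": ("zh_MO", "zh_HK", "zh_TW"),
}


def langpack_code_for_locale(locale):
    """Map a locale such as 'zh_TW' or 'de_AT' to a language-pack code."""
    for pack_code, locales in CHINESE_VARIANTS.items():
        if locale in locales:
            return pack_code
    # All other languages are grouped by their bare language code.
    return locale.split("_", 1)[0].split("@", 1)[0]


print(langpack_code_for_locale("zh_HK"))
print(langpack_code_for_locale("de_AT"))
```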

Translations Export from Launchpad

Translations are exported into translation tarballs once per week for stable releases and twice per week for development releases of Ubuntu (Exports schedule). These tarballs are stored in launchpadlibrarian.

For each Ubuntu release, there is a language-pack administration page:

The first translation export in a newly started release cycle is always a Base (or "full") export, i.e. the tarball contains all translations for all templates in that release that are marked for export into language packs. It is usually done manually, on request, by the Launchpad Translations team (aka Rosetta team). The setting that controls whether a template's translations are exported into the translation tarballs is on the template admin pages and is accessible to Ubuntu Translations Coordinator team members.

The subsequent translation exports are Delta exports by default, i.e. the tarballs contain only those translation files that have changed since the configured active Base translation tarball.
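The relation between a Base and a Delta export can be illustrated as follows. This is a sketch only: how Launchpad actually decides that a file has changed is not described on this page, so the comparison by content digest and the function name delta_files are assumptions.

```python
def delta_files(base, current):
    """Return the paths in `current` that are new or differ from `base`.

    Both arguments map a file path to a content digest (any comparable value).
    """
    return sorted(
        path for path, digest in current.items() if base.get(path) != digest
    )


base = {"de/LC_MESSAGES/foo.po": "a1", "de/LC_MESSAGES/bar.po": "b1"}
current = {
    "de/LC_MESSAGES/foo.po": "a1",  # unchanged, not part of the Delta
    "de/LC_MESSAGES/bar.po": "b2",  # changed since the Base export
    "fr/LC_MESSAGES/foo.po": "c1",  # new since the Base export
}
print(delta_files(base, current))
```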

The language-pack administration website has the following lists and configuration items:

  • A list of all translation tarballs ever generated for the release, with links to download them. The list includes both Base and Delta tarballs.
  • A combobox to select which of these tarballs is the currently used Base tarball. All further Delta tarballs are deltas to this Base tarball.
  • A combobox to select the current Delta tarball. All further generated tarballs are deltas to this Delta tarball.

    Warning /!\ This setting is not to be used for Ubuntu language-packs, since we want to automate as much as possible in the language-pack generation process and avoid this administration overhead.

  • A combobox to select the current proposed update tarball, which is currently being tested.

    Warning /!\ This setting is not to be used for Ubuntu language-packs, since we want to automate as much as possible in the language-pack generation process and avoid this administration overhead.

  • A checkbox to request that the upcoming translation tarball is again a Base (i.e. "full") export. At certain intervals (e.g. shortly before new CD images are being generated) or when the Delta tarballs get too big, new Base tarballs should be generated to reduce the size of the language-pack update packages the users need to download in addition to the base packages.

Layout of the translation tarballs

./rosetta-$release/
 ├ mapping.txt
 ├ timestamp.txt
 ├ $languagecode/
 │ └ LC_MESSAGES/
 │   └ $translation_domain.po
 └ xpi/
   ├ firefox[-3.5]/
   │ └ $languagecode.po
   └ xulrunner[-1.9.1]/
     └ $languagecode.po
  • mapping.txt: contains a space separated list of source package names and the corresponding translation domains. This information is necessary for the langpack-o-matic script to sort the translations into the correct language-packs (language-pack, language-pack-gnome or language-pack-kde).

  • timestamp.txt: contains the date of the tarball generation on Launchpad. It will be used by langpack-o-matic to determine the package version.

  • xpi/: Mozilla products (Firefox, Xulrunner) use the XPI format for their translations. Since Launchpad can only handle translations using the gettext format, the Mozilla XPIs need to be converted back and forth between XPI and gettext format. At import time of the Mozilla upstream translations into Launchpad, they get converted into gettext format. When exported into the translation tarballs, they use a custom crafted XPIPO format, which looks like a normal gettext .po file, but has additional information in the message comments in order for langpack-o-matic to be able to convert them back into the XPI format.
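Reading mapping.txt might look like the sketch below. It assumes one pair per line with the source package name first and the translation domain second; the exact line format is not spelled out here, and parse_mapping as well as the sample names are hypothetical.

```python
def parse_mapping(text):
    """Parse mapping.txt content into {translation_domain: source_package}."""
    domain_to_package = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        source_package, domain = fields
        domain_to_package[domain] = source_package
    return domain_to_package


sample = "example-source example-domain\nother-source other-domain-2.0\n"
print(parse_mapping(sample))
```

Keying the result by translation domain fits the way the tarball is laid out, where each .po file is named after its domain and the source package has to be looked up from it.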

langpack-o-matic

Langpack-o-matic is a service running on a Canonical internal server, which assembles language-packs out of the translation tarballs from Launchpad.

Directory structure

/srv/language-packs.ubuntu.com/
 ├ home/
 ├ $release[-proposed]/
 │ ├ sources-base/
 │ │ ├ language-pack-$languagecode-base/
 │ │ ├ language-pack-gnome-$languagecode-base/
 │ │ └ language-pack-kde-$languagecode-base/
 │ ├ sources-update/
 │ │ ├ language-pack-$languagecode/
 │ │ ├ language-pack-gnome-$languagecode/
 │ │ └ language-pack-kde-$languagecode/
 │ ├ sources-support/
 │ │ ├ language-support-$languagecode/
 │ │ ├ language-support-fonts-$languagecode/
 │ │ ├ language-support-input-$languagecode/
 │ │ ├ language-support-writing-$languagecode/
 │ │ ├ language-support-translations-$languagecode/ (only for Hardy to Jaunty)
 │ │ └ language-support-extra-$languagecode/ (only for Hardy to Jaunty)
 │ └ zh-transitional/ (only for Karmic and Lucid)
 ├ langpack-o-matic/
 └ logs/
  • home/: contains code for fetching static translations, like gnome-docs from launchpadlibrarian

  • $release[-proposed]/: $release for development releases and $release-proposed for stable releases.

  • langpack-o-matic/: houses the langpack-o-matic branch (see below)

  • logs/: import and upload logs go here

Code directory structure

./langpack-o-matic/
 ├ bin/
 ├ check-supdeps-components
 ├ config/
 ├ copy-packages
 ├ cron.daily
 ├ doc/
 ├ extra-files/
 ├ import
 ├ langpacksize
 ├ lib/
 ├ maps/
 ├ merge-tarballs
 ├ mozilla-upstream-locales/
 ├ operator-guide.txt
 ├ packages
 ├ po2xpi/
 ├ skel-*/
 ├ support-depends/
 │ └ $release/
 │   └ $languagecode
 ├ updated-packages
 ├ update-maps
 ├ update-support
 ├ upgrade-notes/
 └ zh-transitional/
  • bin/: reference to po2xpi code call

  • check-supdeps-components: script to check which packages referenced in the support dependencies are not in main

  • config/: configuration files for langpack-o-matic

  • copy-packages: script to produce shell code for copying language-packs from PPA -> -proposed -> -updates for stable releases.

  • cron.daily: cron script for unattended language-pack updates

  • doc/: documentation for langpack-o-matic

  • extra-files/: extra files for KDE, which do not depend on a particular translation template. They get included in the language-packs as extra.tar.

  • import: the main script which unpacks the translation tarball and sorts the files into language-packs.

  • langpacksize: script to calculate the grand total size of all packages related to the most common languages ('en', 'es', 'xh', 'pt', 'de', 'fr', 'bn', 'hi', 'ar', 'ru', 'zh', 'ja')

  • lib/: libraries

  • maps/: mapping files

  • merge-tarballs: script to merge Base translation tarballs with a Delta tarball (to produce a "fake" Base tarball)

  • mozilla-upstream-locales/: houses the directories and files from the data/ directory in po2xpi (see below)

  • operator-guide.txt: manual on how to use the scripts in this project to generate language-packs

  • packages: script to process all packages in updated-packages: Build source packages and optionally upload them. Generated source tar.gz/dsc/changes files are deleted after successful upload. Entries from updated-packages are removed after successful processing.

  • po2xpi/: houses the po2xpi branch

  • skel-*/: skeleton directories for all supported packages

  • support-depends/$release/$languagecode: text files, one per language code as used in the language-support-* packages, which contain a list of packages that the language-support-* meta packages should depend on. The format of these text files is: $group:$package, where $group is one of the following:

    • fn = fonts -> language-support-fonts-$languagecode

    • in = input method -> language-support-input-$languagecode

    • wa = writing assistance -> language-support-writing-$languagecode

      • Warning /!\ Finnish voikko packages, which depend on Mozilla products or Openoffice.org, are handled by language-selector from Karmic onwards.

    Warning /!\ The following codes were used until Jaunty and have been deprecated since Karmic. Those dependencies are now handled in language-selector.

    • tr = extra translations -> language-support-translations-$languagecode

    • ex = extra software -> language-support-extra-$languagecode

  • updated-packages: temporary text file, which gets generated by the import script and contains the path to the updated packages in order to be uploaded.

  • update-maps: script to update the mapping files in the maps/ directory.

  • update-support: script to (re-)generate language-support-* packages, based on the changes in support-depends/$release/$languagecode. Takes two arguments: $release $languagecode.

  • upgrade-notes/: not used

  • zh-transitional/: contains deb package skeleton to create transitional packages to update from older language-{pack|support}[-*]-zh[-base] packages to the new zh-hans / zh-hant packages (only needed for Karmic and Lucid).
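The $group:$package format of the support-depends/$release/$languagecode files lends itself to a small parser. The following is a sketch under assumptions, not the code used by update-support: the function name and the sample package names are hypothetical, and lines with the deprecated tr and ex groups are simply ignored.

```python
GROUP_TO_PACKAGE = {
    "fn": "language-support-fonts-{code}",
    "in": "language-support-input-{code}",
    "wa": "language-support-writing-{code}",
}


def parse_support_depends(text, languagecode):
    """Group the listed packages by the meta package that should depend on them."""
    depends = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        group, package = line.split(":", 1)
        template = GROUP_TO_PACKAGE.get(group.strip())
        if template is None:
            continue  # e.g. the tr and ex groups, deprecated since Karmic
        meta_package = template.format(code=languagecode)
        depends.setdefault(meta_package, []).append(package.strip())
    return depends


sample = "fn:example-font\nin:example-input-method\nwa:example-dictionary\ntr:example-translation\n"
print(parse_support_depends(sample, "ja"))
```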

po2xpi (Mozilla translation handling)

Background knowledge

  1. XPI layout vs. upstream VCS: The layout of the XPIs (and also how Firefox accepts them) is radically different from what upstream uses in its VCS (Mercurial).
    1. XPI format:

      XPI files are simply ZIP archives. You can uncompress them with unzip $langcode.xpi in a temporary directory. The archive structure looks like this:

        ./
         ├ chrome/
         │ └ $langcode.jar
         ├ chrome.manifest
         └ install.rdf

      The $langcode.jar file is another ZIP archive containing the real translations:

        ./locale/
         └ $component/
           │ ├ .dtd
           │ └ .properties
           └ $subcomponent/
             ├ .dtd
             └ .properties

      The chrome.manifest file states the paths to the translations for the different Firefox components within the .jar file. The install.rdf file is an XML file which contains some metadata:

      <?xml version="1.0"?>
      <!--
      
      -->
      
      <RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:em="http://www.mozilla.org/2004/em-rdf#">
        <Description about="urn:mozilla:install-manifest"
                     em:id="langpack-bs@xulrunner-1.9.ubuntu.com"
                     em:name="Xulrunner (bs)"
                     em:version="1.9"
                     em:type="8"
                     em:creator="http://translations.launchpad.net">
          <em:contributor></em:contributor>
      
          <em:targetApplication>
            <Description>
              <em:id>toolkit@mozilla.org</em:id>
              <em:minVersion>1.9</em:minVersion>
              <em:maxVersion>1.9.0.*</em:maxVersion>
            </Description>
          </em:targetApplication>
        </Description>
      </RDF>

      Warning /!\ Please note the version numbers within the install.rdf file! They decide whether or not the translations are compatible with the current Firefox or Xulrunner version. The above example is compatible with the Xulrunner version we ship in Hardy, Intrepid and Jaunty (i.e. Xulrunner 1.9), and won't work with the version we currently ship in Karmic and Lucid (i.e. Xulrunner 1.9.1).
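To see why the example above matches Xulrunner 1.9 but not 1.9.1, the version range can be read from install.rdf and compared against an application version. This is a simplified sketch: the function names are hypothetical, and the comparison handles only dotted numeric versions with a "*" wildcard, not Mozilla's full version format (a version such as 3.5b4 would raise a ValueError).

```python
import xml.etree.ElementTree as ET

EM = "{http://www.mozilla.org/2004/em-rdf#}"

# Trimmed copy of the install.rdf example above.
SAMPLE = """<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:em="http://www.mozilla.org/2004/em-rdf#">
  <Description about="urn:mozilla:install-manifest" em:version="1.9">
    <em:targetApplication>
      <Description>
        <em:id>toolkit@mozilla.org</em:id>
        <em:minVersion>1.9</em:minVersion>
        <em:maxVersion>1.9.0.*</em:maxVersion>
      </Description>
    </em:targetApplication>
  </Description>
</RDF>"""


def target_version_range(install_rdf_text):
    """Return (minVersion, maxVersion) of the first em:targetApplication."""
    root = ET.fromstring(install_rdf_text)
    target = root.find(".//" + EM + "targetApplication")
    if target is None:
        return None
    return (
        target.findtext(".//" + EM + "minVersion"),
        target.findtext(".//" + EM + "maxVersion"),
    )


def _parts(version):
    # "*" matches any value, so treat it as larger than every number.
    return [float("inf") if p == "*" else int(p) for p in version.split(".")]


def is_compatible(app_version, min_version, max_version):
    """Simplified range check on dotted numeric versions."""
    return _parts(min_version) <= _parts(app_version) <= _parts(max_version)


low, high = target_version_range(SAMPLE)
print(is_compatible("1.9.0.5", low, high))  # True
print(is_compatible("1.9.1", low, high))    # False
```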

    2. Upstream VCS (Mercurial) layout:

  • In the upstream VCS, each language is its own branch. The list of language codes is here: http://hg.mozilla.org/l10n-central/ To check out all of them, do something like this:

    for l in af ar as be bg bn-BD bn-IN ca cs cy da de el en-GB en-ZA eo es-AR es-CL es-ES es-MX et eu fa fi fr fy-NL ga-IE gl gu-IN he hi-IN hr hu hy-AM id is it ja ka kk kn ko ku lt lv mk ml mn mr ms nb-NO ne-NP nl nn-NO nr nso oc or pa-IN pl pt-BR pt-PT rm ro ru rw si sk sl sq sr ss st sv-SE ta ta-LK te th tn tr ts uk ve vi xh zh-CN zh-TW zu; do hg clone http://hg.mozilla.org/l10n-central/$l; done
    To keep them up-to-date do something like this:
    for l in */; do (cd "$l" && hg pull && hg update); done

  • If a .dtd or .properties file in the XPI needs a patch, the following needs to be kept in mind:

    • the locations of the files in the XPI and in the upstream VCS are not the same. If you keep the XPI paths in the patch file, it can't be applied to the files in the upstream VCS. So, don't bother to submit such patches to upstream as bug reports.
    • upstream won't fix XPI files which have already been released, unless they break the application. Failure to import into Launchpad is not a reason for upstream to fix these issues.
    • it is best to locate the buggy file in the VCS tree yourself (use find), patch it and submit the diff to upstream via the bug tracker, or commit it if you have commit rights.

    • the XML parser in Rosetta is more restrictive than the one in the Mozilla applications. In the past I have found the following issues, which won't trigger an error in Firefox itself, but will make the import into Rosetta fail:
      • BOM (U+FEFF) marks at the beginning of a file. These are inserted by Windows text editors and do no harm.
      • missing newline at the end of the file. This also does no harm.
      • missing escape characters or newlines in tags. These are actual errors and can cause the translation to be displayed incorrectly.
      • characters which are escaped as Unicode code points, like \u010d, where the last character of the escape sequence has been omitted. This is also a bug which causes the translation to be rendered incorrectly.

      • newlines at the beginning of a file. These do no harm, but our parser expects an XML file to begin with <! and not with newlines.
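Most of the issues listed above can be spotted with a simple pre-check before uploading. The sketch below is not the Rosetta parser and does not replace it; the function name is hypothetical, it works on the raw bytes of a .dtd or .properties file, and it does not try to detect missing escape characters or newlines in tags.

```python
import re

# A \u escape must be followed by exactly four hexadecimal digits.
TRUNCATED_ESCAPE = re.compile(r"\\u(?![0-9a-fA-F]{4})")


def find_import_problems(data):
    """Return a list of problems found in the raw bytes of a translation file."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):  # UTF-8 encoded U+FEFF
        problems.append("BOM at start of file")
        data = data[3:]
    if data[:1] in (b"\n", b"\r"):
        problems.append("newline at start of file")
    if data and not data.endswith(b"\n"):
        problems.append("missing newline at end of file")
    text = data.decode("utf-8", errors="replace")
    if TRUNCATED_ESCAPE.search(text):
        problems.append("truncated \\u escape")
    return problems


print(find_import_problems(b"\xef\xbb\xbf\nkey=value"))
print(find_import_problems(b"key=\\u010\n"))
```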

  1. Checking upstream Mozilla translations for errors before they get released
    • FIXME: Ask jtv to review this section.
    Yes, we can do that! That is, if you have the Launchpad source code unpacked on your machine. To set up the environment to do such checks, you'll need the following:
    1. The Launchpad source code (https://dev.launchpad.net/Getting)

    2. The following manual steps:
        $ cd ~/launchpad/lp-branches/
        $ bzr branch lp:~jtv/launchpad/validate-translations-file
        $ cd validate-translations-file
        $ ./utilities/link-external-sourcecode ../trunk

    Finished. Now you can use the same XML parser as in Launchpad to check if the translations will pass the import script (in this example, all the upstream translation branches are under ~/build/mozilla/):

     $ cd ~/launchpad/lp-branches/validate-translations-file
     $ find ~/build/mozilla/ -name \*.dtd | xargs ./scripts/rosetta/validate-translations-file.py &> ~/mozilla_dtd_errors.log

    Now, you have the report in ~/mozilla_dtd_errors.log and can fix the upstream translations in their VCS tree. Submit the diff of your changes to the Mozilla bugtracker, one for each language you modified.

    (i) The XML parser Launchpad uses is not the best one out there. If you want to help improve or replace the parser, the source code is in ~/launchpad/lp-branches/trunk/sourcecode/old_xmlplus/.

  2. Mozilla translations from upstream get imported into Launchpad:
    1. History:

      While upstream ships Firefox as a single application, Debian and Ubuntu have split the source into the Xulrunner and Firefox packages. Xulrunner provides the base, and Firefox is a XUL application which gets executed in a XUL environment. This allows other applications which, like the Firefox extensions, are written for XUL to be executed as standalone programs in a XUL environment without depending on the whole Firefox browser.

      (i) Starting from Firefox 3.6, upstream will not use XUL any more for their browser extensions. As a consequence we won't split the Firefox code any more, but only ship one package, like upstream does.

      (i) When Firefox 3.6 gets released, we will update Firefox in all actively supported Ubuntu releases (Hardy to Lucid) to Firefox 3.6.

    2. Templates:

      The en-US.xpi package is to Mozilla applications what the .pot file is to applications using gettext. Normally, when building Mozilla applications from source, the build process extracts the English strings from the source and puts them into an en-US XPI directory structure. This directory structure gets installed together with the compiled program.

      We have patched the Firefox and Xulrunner packages in order to extract that directory structure and convert it into an XPI package.

      This en-US.xpi package gets pushed to Rosetta the same way the stripped .pot and .po files get pushed from other applications. Rosetta treats the en-US.xpi as a template for the corresponding source package.

      (i) Although the en-US.xpi packages generated by the Firefox and Xulrunner packages carry the same name, they only contain the necessary strings for the corresponding component.

      (i) Starting from Firefox 3.6 the Firefox and Xulrunner templates in Launchpad will get merged into one. The corresponding translations will also get merged. This is a manual process and will take some time.

    3. Translations: Upstream Mozilla ships a number of translations as XPI packages. Since upstream does not separate Firefox from Xulrunner, their translation XPI packages contain the translations for both components. Every time upstream releases a new Firefox version, we need to update our language-packs to include their latest XPI translation packages. Occasionally the format changes, so older XPI translation packages are not compatible with the latest Firefox version, even for minor upgrades. For this reason we need to manually pull the upstream translations from Mozilla and import them into Launchpad. This is currently handled by Arne Goetje. Occasionally Launchpad refuses to import some upstream XPIs because of buggy files inside the XPI packages. In those cases the XPIs need to be patched and uploaded again until Launchpad accepts them. When upstream XPIs are uploaded to Launchpad, the import script takes care of splitting the translations into the firefox and xulrunner templates.

Building XPIs

When traversing the ./rosetta-$release/xpi/ directory in the unpacked translation tarball, the .po files will get fed to the po2xpi script.

1. Directory structure in po2xpi/data/:

./data/
 ├ $ubuntu_release_version/
 │ ├ blacklist.txt (obsolete)
 │ ├ $template (firefox[-3.5] and xulrunner[-1.9.1])
 │ │ └ xpi/ -> ../../common_xpi[-3.5]/ 
 │ ├ merge-hints.txt
 │ └ whitelist.txt
 ├ common_xpi/
 │ └ .xpi
 └ common_xpi-3.5/
   └ .xpi
  • $ubuntu_release_version/: 8.04, 8.10, 9.04, 9.10 or 10.04. One directory tree for each release.

  • blacklist.txt: Formerly used to blacklist incomplete languages in Launchpad. Now obsoleted by whitelist.txt

  • $template/: The template name(s) in Launchpad for this release. One directory for firefox, one for xulrunner

  • merge-hints.txt: Since Ubuntu 9.10 (Karmic) Chinese language-packs have been split into Simplified Chinese (zh-hans) and Traditional Chinese (zh-hant). This file maps the language codes of the translations to the corresponding language-packs.

  • whitelist.txt: Lists the language codes for the translations which are complete enough and of decent quality to be shipped in language-packs in addition to the upstream translations.

  • common_xpi[-3.5]/: Contains all upstream Mozilla XPIs, the same ones which have been imported into Launchpad. They are used to build the XPI directory structures from the XPIPO files from Launchpad.

For each language, po2xpi will check if the language has upstream XPI files in data/common_xpi[-3.5]/ or is whitelisted in data/$ubuntu_release_version/whitelist.txt.

Due to the way Mozilla uses translations, it is required that the translations are 100% completed in Launchpad, before they can be used. In contrast to gettext, where the English string is used as an identifier and is used as a fall back in case the translation string is empty, Mozilla uses a variable name as identifier. Mozilla does not have a fall back mechanism. That's why every message identifier must have a value.

The po2xpi script takes care that if a message identifier does not have a value in the .po files, which have been exported from Launchpad, the corresponding value from en-US.xpi gets filled in.
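The fall back described above can be sketched as a merge of two dictionaries keyed by message identifier. This is an illustration only and not taken from po2xpi; the function name and the sample identifiers are hypothetical.

```python
def fill_missing(translations, english):
    """Return a complete mapping of message identifier -> string.

    `english` holds every identifier from en-US.xpi; `translations` holds
    the values exported from Launchpad and may have gaps or empty strings.
    """
    filled = {}
    for identifier, english_value in english.items():
        value = translations.get(identifier)
        # Mozilla has no fall back of its own, so every identifier needs a value.
        filled[identifier] = value if value else english_value
    return filled


english = {"menu.file": "File", "menu.edit": "Edit", "menu.help": "Help"}
german = {"menu.file": "Datei", "menu.edit": ""}
print(fill_missing(german, english))
```

Iterating over the English identifiers rather than the translated ones means that identifiers which exist only in the translation are dropped, so the result has exactly the identifiers of en-US.xpi.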

Warning /!\ If we imported those results into Launchpad again, like we did for the Firefox-3.0 to Firefox-3.5 transition and will do again when Firefox-3.6 comes out, the templates would appear to be 100% translated for those languages, although the missing translations have actually just been filled in from en-US.xpi and therefore should be considered untranslated.

The import script

The import script in the langpack-o-matic source tree assembles the language-pack source packages. If they are not present, it creates package skeletons in ../$release[-proposed]/sources-base/ and ../$release[-proposed]/sources-update/ by copying the skel-base/ and skel-update/ directories respectively.

Then, it traverses the tarball, pipes every .po file through msgfmt to check for errors and copies them into their respective language-pack structures.
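A check of this kind can be reproduced by hand. The sketch below assumes that msgfmt from GNU gettext is installed and uses its --check and -o options; which options the import script itself passes is not stated here, and the function names are hypothetical.

```python
import subprocess


def msgfmt_check_command(po_path):
    """Build a msgfmt call that validates a .po file and discards the output."""
    return ["msgfmt", "--check", "-o", "/dev/null", po_path]


def po_file_is_valid(po_path):
    """Return True if msgfmt accepts the file.

    Raises FileNotFoundError if msgfmt is not installed.
    """
    result = subprocess.run(
        msgfmt_check_command(po_path), capture_output=True, text=True
    )
    return result.returncode == 0


print(" ".join(msgfmt_check_command("de/LC_MESSAGES/example.po")))
```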

If there are files in the xpi/ subdirectory, the import script will call po2xpi, which will build mozilla translation structures and tar them up into a mozilla.tar tarball. This tarball gets copied into the language-packs.

If there are files in the extra-files/ directory, they will get tar'ed up into extra.tar tarballs and also copied into the language-packs.

Since Karmic, the import script will check for static translations (e.g. gnome-docs translations) on launchpadlibrarian and copy them into the language-packs as well.

Upload

  • Development release:
    • dchroot -d /srv/language-packs.ubuntu.com/langpack-o-matic/packages upload
      This will convert all .po files into .mo files using msgfmt and then upload the packages into the archive, where they get built.
  • Stable releases:
    • dchroot -d /srv/language-packs.ubuntu.com/langpack-o-matic/packages upload ppa
      This will upload the packages into the Langpack PPA instead.

Langpack PPA

Recipes

These recipes are intended for common actions carried out by the UbuntuTranslationsCoordinators team.

Updating language packs for the stable release

  • (./) Check that there is no language pack update currently building for the release you are interested in. If a language pack is being built in the language pack PPA, wait until it has finished and use that one, and if necessary contact a language pack builder to stop the automatic build. Testing should focus on -proposed.

  • (./) Ask an archive admin (usually pitti on #ubuntu-devel) to copy the language packs from the PPA to -proposed

    • The ETA for copying the sources once started is half an hour, and perhaps a day for all the binaries to get built.
  • (./) Prepare the Translations/LanguagePackUpdatesQA for testing

  • (./) Once the packages have been built and uploaded to -proposed, announce it on the ubuntu-translators mailing list

  • (./) After the announcement, use Translations/LanguagePackUpdatesQA for coordinating the testing

  • (./) Once testing has finished, point pitti to the language packs tested in the wiki page which should be uploaded to -updates

(i) If you are a language pack admin you should consult the langpack-o-matic operator guide for more detailed technical information on how to perform some of these steps.


CategoryTranslations

Translations/TranslationLifecycle/LanguagePackGeneration (last edited 2010-03-16 08:01:00 by www)