translation-statistics

Summary

OEMs want to know which languages are well supported. As well as software-level support (information being collected elsewhere), this involves translation levels of visible parts of the desktop. Current translation statistics for main represent this poorly, because they include things like gcc error messages, which are largely invisible to most users.

It would be useful to have instrumentation: we could ask users to install a package that records which messages they actually see and which of those are translated. This information can then be sent back and incorporated into the ordering that Launchpad presents to translators.
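
As a rough illustration of how the returned data might feed into an ordering for translators, the sketch below ranks untranslated messages by how often they were seen and computes the fraction of visible messages that are translated. The record format, a (domain, msgid, was_translated) tuple, and both function names are assumptions made for this example; nothing here reflects an actual Launchpad interface.

```python
from collections import Counter


def rank_untranslated(records):
    """Return untranslated (domain, msgid) pairs, most frequently seen first.

    `records` is an iterable of (domain, msgid, was_translated) tuples.
    This record format is hypothetical, not an existing dump format.
    """
    seen = Counter()
    translated = set()
    for domain, msgid, was_translated in records:
        key = (domain, msgid)
        seen[key] += 1
        if was_translated:
            translated.add(key)
    untranslated = [(key, n) for key, n in seen.items() if key not in translated]
    # Highest count first; ties are broken by key so the order is reproducible.
    untranslated.sort(key=lambda item: (-item[1], item[0]))
    return untranslated


def visible_coverage(records):
    """Fraction of distinct messages seen that had a translation."""
    seen = set()
    translated = set()
    for domain, msgid, was_translated in records:
        seen.add((domain, msgid))
        if was_translated:
            translated.add((domain, msgid))
    return len(translated) / len(seen) if seen else 0.0
```

Weighting by how often a message was seen is only one possible choice; counting the number of distinct users who saw a message might be a fairer measure once dumps from several users are combined.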

Release Note

Rationale

Use Cases

Assumptions

Design

Most software (with a few prominent exceptions such as Firefox and OpenOffice.org) uses the gettext library for translated strings, either directly in C or via bindings to languages such as Python. These calls can be intercepted with an LD_PRELOAD wrapper, which records information (e.g. to a database or a raw file) and then calls the underlying gettext functions. A user can then install that wrapper, use a desktop system for a while, and send the dump file to us.
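
The record-then-delegate idea can be sketched at the Python level. This is an analogue of the proposed wrapper, not the LD_PRELOAD wrapper itself: Python's standard gettext module performs its lookups in Python, so a Python program using it would probably need something along these lines anyway. The class name RecordingTranslations and the (domain, msgid, was_translated) record format are assumptions made for this example.

```python
import gettext


class RecordingTranslations:
    """Wrap a gettext translations object and log every lookup.

    Hypothetical helper: `inner` is any object with a gettext(message)
    method, such as gettext.NullTranslations or gettext.GNUTranslations.
    """

    def __init__(self, inner, domain, log):
        self._inner = inner
        self._domain = domain
        self._log = log  # any object with append(), e.g. a list

    def gettext(self, message):
        # Delegate to the real lookup first, then record what happened.
        result = self._inner.gettext(message)
        # Heuristic: gettext returns the msgid unchanged when no
        # translation exists, so an unchanged result counts as untranslated.
        self._log.append((self._domain, message, result != message))
        return result


# Usage: NullTranslations has no catalogue, so every lookup is untranslated.
log = []
translations = RecordingTranslations(gettext.NullTranslations(), "demo", log)
label = translations.gettext("Open")
```

The heuristic miscounts messages whose translation is identical to the original string (for example "OK" in many languages), so a real implementation would probably need to consult the message catalogue directly.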

Implementation

Test/Demo Plan

Outstanding Issues

BoF agenda and discussion

Problem statement

(New languages can be added by filing a request at https://answers.launchpad.net/rosetta with information on the new language.)

Design

Font coverage

Statistics on which Unicode codepoints are covered by which fonts, and which are not covered at all.
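
A minimal sketch of such a coverage report follows. It assumes the set of codepoints each font supports has already been extracted (for example from each font's character map); the font names, the input format, and both function names are made up for this example.

```python
def coverage_report(fonts, codepoints):
    """Map each codepoint to the sorted list of font names that cover it.

    `fonts` maps a font name to the set of integer codepoints it supports.
    """
    return {
        cp: sorted(name for name, supported in fonts.items() if cp in supported)
        for cp in codepoints
    }


def uncovered(report):
    """Return the codepoints that no font in the report covers."""
    return sorted(cp for cp, names in report.items() if not names)


# Usage with made-up data: two fonts, three codepoints of interest.
fonts = {
    "FontA": {0x0041, 0x0627},  # LATIN CAPITAL LETTER A, ARABIC LETTER ALEF
    "FontB": {0x0041},
}
report = coverage_report(fonts, [0x0041, 0x0627, 0x4E2D])
missing = uncovered(report)  # U+4E2D is in neither font
```

A per-codepoint count like this says nothing about the difficulties with particular blocks, such as glyph variants or ligature support, which would need separate checks.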

Some blocks have differing preferences between languages or other difficulties:

  • Han characters (Chinese, Japanese, Korean Hanja)
  • Arabic (e.g. Arabic vs. Persian); some fonts only have basic support
  • Extended Latin (complex composed diacritics)
  • Indic (ligatures; components are in Unicode, but no codepoints for the combined forms)

3 statistics:


translation-statistics (last edited 2008-08-06 16:14:58 by localhost)