translation-statistics
Launchpad Entry: translation-statistics
Summary
OEMs would love to have information on which languages are well supported. As well as software-level support (information being collected elsewhere), this involves translation levels of the visible parts of the desktop. The current translation statistics for main do not represent this well, because they include things like gcc error messages, which most users never see.
It would be useful to have instrumentation: users could be asked to install a package that records which messages they actually see and which of those are translated. This information can then be sent back and incorporated into the ordering presented to translators by Launchpad.
Release Note
Rationale
Use Cases
Assumptions
Design
Most software (with a few prominent exceptions such as Firefox and OpenOffice.org) uses the gettext library for translated strings, either directly in C or via bindings to languages such as Python. Calls to gettext can be intercepted with an LD_PRELOAD wrapper, which records each lookup (e.g. to a database or a raw file) and then calls the underlying gettext functions. A user can install that wrapper, use a desktop system for a while, and then send the dump file to us.
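Once a dump file comes back, it could be summarized roughly as follows. This is a minimal sketch, not a format defined by this specification: the tab-separated `domain<TAB>msgid<TAB>result` layout and all function names are assumptions made for illustration. It relies on gettext returning the msgid unchanged when no translation is found, so a lookup whose result equals its msgid is counted as untranslated.

```python
from collections import Counter


def parse_dump(lines):
    """Yield (domain, msgid, result) from hypothetical tab-separated dump lines.

    Assumes the wrapper escapes tabs and newlines inside messages.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        domain, msgid, result = line.split("\t", 2)
        yield domain, msgid, result


def summarize(lines):
    """Count how often each message was seen and whether it was translated."""
    seen = Counter()   # (domain, msgid) -> number of lookups
    translated = {}    # (domain, msgid) -> True if any lookup was translated
    for domain, msgid, result in parse_dump(lines):
        key = (domain, msgid)
        seen[key] += 1
        # Heuristic: a translation identical to the msgid is misread as missing.
        translated[key] = translated.get(key, False) or result != msgid
    return seen, translated


def visible_coverage(seen, translated):
    """Fraction of distinct seen messages that had a translation."""
    if not seen:
        return 0.0
    return sum(translated[key] for key in seen) / len(seen)


def translator_ordering(seen, translated):
    """Untranslated messages, most frequently seen first."""
    missing = [key for key in seen if not translated[key]]
    return sorted(missing, key=lambda key: (-seen[key], key))


# Made-up sample data, for illustration only.
sample = [
    "nautilus\tOpen\tOuvrir",
    "nautilus\tOpen\tOuvrir",
    "nautilus\tEmpty Trash\tEmpty Trash",
    "gedit\tSave As\tSave As",
    "gedit\tSave As\tSave As",
    "gedit\tSave As\tSave As",
]
seen, translated = summarize(sample)
print(visible_coverage(seen, translated))
print(translator_ordering(seen, translated))
```

Sorting by the number of times a message was seen is only one possible ordering; how Launchpad would weight this against its existing ordering is left open here.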
Implementation
Test/Demo Plan
Outstanding Issues
BoF agenda and discussion
Problem statement
(New languages can be added by means of a request on https://answers.launchpad.net/rosetta with information on the new language.)
Font coverage
Statistics of which Unicode codepoints are covered by which fonts, or not covered at all
Some blocks have multiple preferences or other difficulties:
- Han characters (Chinese, Japanese, Korean Hanja)
- Arabic (e.g. Arabic vs. Persian); some fonts only have basic support
- Extended Latin (complex composed diacritics)
- Indic (ligatures; components are in Unicode, but no codepoints for the combined forms)
Three statistics:
- coverage of the font
- matching a corpus of text with a given font to see which characters are not covered
- matching a font with a particular language (see fc-lang and the fontconfig orthographies)
Various scripts:
http://svn.debian.org/wsvn/pkg-fonts/people/yosch/?rev=0&sc=0
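The three statistics listed above can be sketched as follows. This is an illustration under stated assumptions, not the scripts linked above: each font is represented simply as a set of covered codepoints (how that set is extracted from a font file is out of scope), and the font names, codepoint sets, and the "orthography" set are hypothetical inputs.

```python
def coverage_table(fonts, codepoints):
    """Map each codepoint to the sorted names of the fonts that cover it.

    An empty list means no font covers that codepoint at all.
    """
    return {
        cp: sorted(name for name, covered in fonts.items() if cp in covered)
        for cp in codepoints
    }


def missing_in_corpus(font_codepoints, text):
    """Sorted characters of a corpus that the font does not cover.

    Whitespace is ignored.
    """
    return sorted(
        {ch for ch in text if not ch.isspace() and ord(ch) not in font_codepoints}
    )


def supports_language(font_codepoints, orthography):
    """Check a font against the set of codepoints a language requires.

    Returns (fully_covered, sorted list of missing codepoints).
    """
    missing = sorted(set(orthography) - set(font_codepoints))
    return not missing, missing


# Hypothetical fonts: one with printable Basic Latin only, one that also has
# the basic Arabic letters U+0627..U+064A.
FONT_A = set(range(0x20, 0x7F))
FONT_B = FONT_A | set(range(0x0627, 0x064B))
fonts = {"FontA": FONT_A, "FontB": FONT_B}

print(coverage_table(fonts, [0x41, 0x0627, 0x4E2D]))
print(missing_in_corpus(FONT_A, "na\u00efve caf\u00e9"))
# U+067E (Arabic letter peh) is used in Persian but lies outside FontB's range.
print(supports_language(FONT_B, {0x0627, 0x067E}))
```

Codepoint membership alone says nothing about shaping, so this kind of check would not detect the Indic ligature or composed-diacritic problems mentioned above.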
translation-statistics (last edited 2008-08-06 16:14:58 by localhost)