Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

The goal is to provide a set of Pan-CJKV Unicode fonts, covering all CJKV glyphs currently defined in Unicode. This spec describes the various steps and timeline to reach that goal.

Release Note

This note may change in the future, depending on progress.

8.04 (Hardy)

This release contains updated 'AR PL UMing' and 'AR PL UKai' Chinese Unicode fonts, which now fully support the GB18030, Big5 and HKSCS-2004 standards. It also introduces the new {$yet-to-be-named} sans-serif font, which supports the same standards.

Rationale

Using CJKV languages with Unicode is a bit tricky, as Han characters used in the different CJKV regions (i.e. China, Hong Kong/Macao, Taiwan, Japan, Korea, Vietnam) often use the same Unicode codepoint (i.e. they have been "unified"), but have different shapes in each region. As the national standards in those regions also describe the shape of the characters, it is usually necessary to provide separate font sets for each region.

Traditionally, separate fonts have been provided for Chinese, Japanese and Korean. These sometimes have a similar style but do not really fit together when mixed in a single document.

The aim of this project is to provide a free set of Pan-CJKV fonts, which provide the different shapes required by the different national bodies, but also retain the same font style for a consistent look and feel.

Use Cases

Assumptions

Design

Most of the glyphs have the same shape in all regions. Therefore it would be a waste of disk space and memory to ship 6 fonts that all cover the same 45000 characters, with only a few glyphs differing in each font. Instead, a TrueType Collection (TTC) shall be used.

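TTF files consist of multiple binary tables, each having a distinct purpose in the font. The two most important ones for understanding how a TTC works are the 'glyf' table, which stores the glyph outlines (the actual shapes), and the 'cmap' table, which maps Unicode codepoints to glyphs in the 'glyf' table.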

Therefore we can store multiple different glyph shapes in the 'glyf' table and map only one of them to a Unicode codepoint in the 'cmap' table. For this project, we need 6 different fonts, one for the glyph flavor of each region. The 'glyf' table is identical in every font, meaning that each font contains all possible shapes at exactly the same internal position (glyph ID / glyph name). The 'cmap' table, however, differs in each font, mapping only the desired glyph shape to each Unicode codepoint.
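The shared-'glyf' / per-font-'cmap' idea can be sketched as a toy model in Python. This is only an illustration, not the project's build tooling: the glyph names, the region suffixes and the `build_cmap` helper are all hypothetical, and a real font stores binary tables rather than lists and dictionaries.

```python
# Toy model only: glyph names, region suffixes and IDs are hypothetical.
# Shared 'glyf' table: every font holds the same glyphs at the same glyph IDs.
GLYF = [
    "uni4E00",     # U+4E00, one shape used by every region
    "uni9AA8.cn",  # U+9AA8, variant drawn for mainland China
    "uni9AA8.tw",  # U+9AA8, variant drawn for Taiwan
    "uni9AA8.jp",  # U+9AA8, variant drawn for Japan
]
GLYPH_ID = {name: gid for gid, name in enumerate(GLYF)}


def build_cmap(region):
    """Return a codepoint -> glyph ID mapping for one regional font."""
    cmap = {}
    for name in GLYF:
        base, _, suffix = name.partition(".")
        codepoint = int(base[3:], 16)  # strip the "uni" prefix
        if suffix == "":
            # Region-neutral glyph: use it unless a variant was already chosen.
            cmap.setdefault(codepoint, GLYPH_ID[name])
        elif suffix == region:
            # Region-specific variant wins for this font.
            cmap[codepoint] = GLYPH_ID[name]
    return cmap


# One small 'cmap' per font, all pointing into the same GLYF list.
CMAPS = {region: build_cmap(region) for region in ("cn", "tw", "jp")}
```

In this model `CMAPS["cn"][0x9AA8]` and `CMAPS["jp"][0x9AA8]` resolve to different glyph IDs, while U+4E00 resolves to the same glyph ID in every font.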

As the 'glyf' table is by far the largest in a CJK font (about 25 to 30 MB for this project), the 'cmap' table is only a few kB in size, and the large 'glyf' table is identical in every font, we can use a TTC that stores the 'glyf' table only once, followed by 6 individual 'cmap' tables, one per font. So instead of 6 individual fonts of 30 MB each (180 MB in total), we have a single TTC of only about 35 MB that still provides all 6 fonts to the OS.
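The size comparison can be checked with a few lines of Python. The figures are the rough estimates given in this spec (6 fonts, about 30 MB each, about 35 MB for the TTC), not measurements of a built font.

```python
# Rough estimates taken from the text above, not measured values.
FONTS = 6
SIZE_PER_FONT_MB = 30  # one standalone regional font
TTC_SIZE_MB = 35       # one collection providing all 6 fonts

separate_total_mb = FONTS * SIZE_PER_FONT_MB    # six standalone files
saved_mb = separate_total_mb - TTC_SIZE_MB      # space saved by sharing 'glyf'
saved_fraction = saved_mb / separate_total_mb

print(f"separate fonts: {separate_total_mb} MB, TTC: {TTC_SIZE_MB} MB")
print(f"saved: {saved_mb} MB ({saved_fraction:.0%})")
```

With these estimates, the collection needs roughly a fifth of the space that six standalone fonts would.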

Implementation

Goals for Hardy

Goals for Intrepid

Later

Test/Demo Plan

Outstanding Issues


CategorySpec

CJK-Unifonts (last edited 2008-08-06 16:16:39 by localhost)