CJK-Unifonts

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

  • Launchpad Entry: cjk-unifonts

  • Packages affected: ttf-arphic-ukai, ttf-arphic-uming

Summary

The goal is to provide a set of Pan-CJKV Unicode fonts, covering all CJKV glyphs currently defined in Unicode. This spec describes the various steps and timeline to reach that goal.

Release Note

This release note may change, depending on progress.

8.04 (Hardy)

This release contains updated 'AR PL Uming' and 'AR PL Ukai' Chinese Unicode fonts, which now fully support the GB18030, Big5 and HKSCS-2004 standards. It also introduces the new {$yet-to-be-named} sans-serif font, which supports the same standards.

Rationale

Using CJKV languages with Unicode is a bit tricky, as Han characters used in the different CJKV regions (i.e. China, Hong Kong/Macao, Taiwan, Japan, Korea, Vietnam) often use the same Unicode codepoint (i.e. they have been "unified"), but have different shapes in each region. As the national standards in those regions also describe the shape of the characters, it is usually necessary to provide separate font sets for each region.

Traditionally, separate fonts have been provided for Chinese, Japanese and Korean. These sometimes have a similar font style, but they do not really fit together when mixed in a single document.

The aim of this project is to provide a free set of Pan-CJKV fonts, which provide the different shapes required by the different national bodies, but also retain the same font style for a consistent look and feel.

Use Cases

Assumptions

Design

  • Fonts are in OTF/TTF format
    • Max. # of glyphs possible per font file: 65535
    • # of unified glyphs from Unicode Plane 0 and Plane 2 required to provide a basic set for all CJKV regions: ~45000
    • # of Han characters currently in Unicode: 70000+
    • Total # of glyph variants necessary to encode: TBD
  • need to provide two fonts:
    • basic coverage (ca. 45000 unified glyphs)
    • additional glyphs (all additional Han characters in Plane 2)
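
The need for two font files follows from the numbers above. A minimal Python sketch of that arithmetic, using the approximate counts from the list (70000 is taken as a lower bound for "70000+"; the constant and function names are illustrative only):

```python
# Approximate figures taken from the Design list above.
MAX_GLYPHS_PER_FONT = 65535        # limit per OTF/TTF font file
BASIC_UNIFIED_GLYPHS = 45000       # Plane 0 + Plane 2 basic set (approx.)
HAN_CHARACTERS_IN_UNICODE = 70000  # lower bound for "70000+"


def fits_in_one_font(glyph_count: int) -> bool:
    """True if this many glyphs can live in a single font file."""
    return glyph_count <= MAX_GLYPHS_PER_FONT


def min_font_files(glyph_count: int) -> int:
    """Smallest number of font files able to hold glyph_count glyphs."""
    return -(-glyph_count // MAX_GLYPHS_PER_FONT)  # ceiling division


# Glyph slots left in the basic font for regional variants and other glyphs.
headroom = MAX_GLYPHS_PER_FONT - BASIC_UNIFIED_GLYPHS
```

The basic set fits in one file with room to spare, while the full Han repertoire does not. The number of glyph variants is still TBD and will reduce the headroom of the basic font.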

Most of the glyphs have the same shape in all regions. It would therefore be a waste of disk space and memory to ship 6 fonts that all cover the same 45000 characters, with only a few glyphs differing between them. Instead, a TrueType Collection (TTC) shall be used.

TTF files consist of multiple binary tables, each having a distinct purpose in the font. The two most important ones to understand how TTC works are:

  • 'glyf': contains the actual shapes (outlines) of the glyphs; each glyph is identified by a font-internal glyph ID (glyph names, where present, are stored separately in the 'post' table).
  • 'cmap': maps (in our case: Unicode) codepoints to glyph IDs.

Therefore, we can have multiple different glyph shapes in the 'glyf' table and map only one of them to a given Unicode codepoint in the 'cmap' table. For this project, we need 6 different fonts, one for the glyph flavor of each region. In each font the 'glyf' table is identical, meaning that each font contains all possible shapes at exactly the same internal position (glyph ID / glyph name). The 'cmap' table, however, differs in each font, mapping only the desired glyph shape to each Unicode codepoint.
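
The relationship between one shared 'glyf' table and several 'cmap' tables can be modeled in a few lines of Python. This is only a toy illustration, not font-building code: the glyph IDs, outline strings and region labels are hypothetical, and U+9AA8 is used merely as an example codepoint with no claim about its actual regional shapes.

```python
from dataclasses import dataclass

# Stand-in for the shared 'glyf' table: glyph ID -> outline.
# Real outlines are binary data; placeholder strings are used here.
SHARED_GLYF = {
    0: ".notdef outline",
    1: "U+9AA8 outline, shape variant A",
    2: "U+9AA8 outline, shape variant B",
}


@dataclass
class RegionalFont:
    """One font of the collection: a region label plus its own 'cmap'."""
    region: str
    cmap: dict  # Unicode codepoint -> glyph ID

    def outline_for(self, codepoint: int) -> str:
        # Unmapped codepoints fall back to glyph 0 (.notdef).
        glyph_id = self.cmap.get(codepoint, 0)
        return SHARED_GLYF[glyph_id]


# Two hypothetical regional fonts that share every outline but
# select a different variant for the same codepoint.
font_a = RegionalFont("region A", {0x9AA8: 1})
font_b = RegionalFont("region B", {0x9AA8: 2})
```

Both fonts hold exactly the same outlines; only the small codepoint-to-glyph mapping decides which variant is displayed.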

The 'glyf' table is by far the largest table in a CJK font (about 25 to 30 MB for this project), the 'cmap' table is only a few kB in size, and the large 'glyf' table is the same in each font. We can therefore use a TTC which stores the 'glyf' table only once, together with 6 individual 'cmap' tables, one for each font. So instead of 6 individual fonts of 30 MB each (about 180 MB in total), we have a single TTC of only about 35 MB that still provides all 6 fonts to the OS.
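
To check how many fonts a collection file exposes, one can read the TTC header, which starts with the tag 'ttcf', followed by a version, the number of fonts, and one offset per font. The following Python sketch uses only the standard library; the function name and the synthetic demo header are illustrative assumptions and not part of this project's tooling.

```python
import struct


def read_ttc_header(data: bytes):
    """Return (num_fonts, offsets) parsed from the start of a TTC file.

    Header layout (big-endian): 4-byte tag 'ttcf', uint16 major version,
    uint16 minor version, uint32 numFonts, then numFonts uint32 offsets.
    """
    if len(data) < 12:
        raise ValueError("too short to be a TrueType Collection")
    tag, _major, _minor, num_fonts = struct.unpack(">4sHHI", data[:12])
    if tag != b"ttcf":
        raise ValueError("not a TrueType Collection")
    end = 12 + 4 * num_fonts
    if len(data) < end:
        raise ValueError("truncated offset table")
    offsets = struct.unpack(f">{num_fonts}I", data[12:end])
    return num_fonts, list(offsets)


# Synthetic header for a collection of 6 fonts (offsets are made up).
demo_header = struct.pack(">4sHHI6I", b"ttcf", 1, 0, 6,
                          100, 200, 300, 400, 500, 600)
num_fonts, offsets = read_ttc_header(demo_header)
```

For a real file, the same function could be applied to the first bytes read from disk. Each offset points at the table directory of one font, and several directories may reference the same 'glyf' data.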

Implementation

Goals for Hardy

  • complete all missing Han glyphs in Unicode Plane 0 (China flavor) to comply with the PRC standard GB18030
  • create a new font that visually matches other sans-serif fonts and contains all Han glyphs (China flavor) in Unicode Plane 0. Initially, all glyph strokes will probably have the same width.
  • supersedes: none

Goals for Intrepid

  • identify all Han glyph variants for Unicode Plane 0
  • implement all missing glyphs for Japanese (JIS X0213-2004) in Unicode Plane 2
  • supersedes: ttf-kochi-mincho

Later

  • implement all glyph variants for all CJKV regions
  • beautify the sans-serif glyphs
  • supersedes: ttf-kochi-gothic, ttf-baekmuk, ttf-unfonts-base

Test/Demo Plan

Outstanding Issues


CategorySpec

CJK-Unifonts (last edited 2008-08-06 16:16:39 by localhost)