IBusChinese

Summary

  • Extract the word lists that the different input method modules currently maintain in parallel and unify them into one.
  • Develop a library that gives any Chinese input method module easy access to this unified word list.

Release Note

TBD

Rationale

Smart Chinese input methods contain word lists in order to predict what the user wants to type. Without them, the user would need to select every single character from multiple candidates, which slows down typing. Up to now, every smart Chinese input method has contained and maintained its own word list, thus reinventing the wheel. Unfortunately, these word lists differ in coverage and quality; some exist only for Simplified Chinese, others only for Traditional Chinese. Therefore, it is currently not possible to convert typed strings from Simplified into Traditional Chinese and vice versa.

Design

  • The idea is to unify and merge the different word lists available and develop a library that gives any Chinese input method module easy access to this unified word list.
  • Furthermore, the library should include all available keystroke-to-character tables (.cin tables) for all Chinese languages.
  • The library would need to know which XKB layout the user is using at the time of typing and translate the keystrokes into a format usable for matching character candidates from the .cin files.
  • The library creates a directory in the user's home directory to store a personal phrase database. The user can also drop a custom .cin file there, which the library then compiles and uses at runtime.

Work flow

  1. The input method framework calls the library with some parameters (see below) when input is triggered (e.g. Ctrl-Space).
  2. The library translates the string of keystrokes into either a transliteration system appropriate to the language the user is typing in, or a component-based keyboard layout.
  3. Based on the transliterated string or the components, the library looks up potential characters in the .cin tables and places them into an array.
  4. Based on the list of candidate characters, the library looks up matching words in the word list and returns a list of strings together with the array of candidate characters to the input method module.
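
Steps 3 and 4 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed API: the names cin_entry, ibc_result and ibc_lookup are hypothetical, the tables are tiny hard-coded Simplified Chinese samples rather than real compiled .cin data, and step 2 is skipped by assuming the keystrokes have already been transliterated to Pinyin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IBC_MAX_RESULTS 8

/* Hypothetical stand-in for one row of a compiled .cin table. */
typedef struct {
    const char *keys;       /* transliterated keystrokes, e.g. "ni" */
    const char *character;  /* one UTF-8 encoded candidate character */
} cin_entry;

/* Toy sample data; a real library would load these from files. */
static const cin_entry CIN_TABLE[] = {
    { "ni", "你" }, { "ni", "尼" }, { "hao", "好" },
};

static const char *const WORD_LIST[] = { "你好", "你们", "尼姑", "好人" };

/* Hypothetical result: candidate characters plus matching words. */
typedef struct {
    const char *characters[IBC_MAX_RESULTS];
    size_t n_characters;
    const char *words[IBC_MAX_RESULTS];
    size_t n_words;
} ibc_result;

/* Step 3: collect every character whose key string matches exactly. */
static void lookup_characters(const char *keys, ibc_result *out)
{
    size_t n = sizeof CIN_TABLE / sizeof CIN_TABLE[0];
    for (size_t i = 0; i < n && out->n_characters < IBC_MAX_RESULTS; i++) {
        if (strcmp(CIN_TABLE[i].keys, keys) == 0)
            out->characters[out->n_characters++] = CIN_TABLE[i].character;
    }
}

/* Step 4: collect every word that starts with a candidate character. */
static void lookup_words(ibc_result *out)
{
    size_t n = sizeof WORD_LIST / sizeof WORD_LIST[0];
    for (size_t c = 0; c < out->n_characters; c++) {
        size_t len = strlen(out->characters[c]);  /* length in bytes */
        for (size_t w = 0; w < n && out->n_words < IBC_MAX_RESULTS; w++) {
            if (strncmp(WORD_LIST[w], out->characters[c], len) == 0)
                out->words[out->n_words++] = WORD_LIST[w];
        }
    }
}

/* Entry point the input method module would call in this sketch. */
ibc_result ibc_lookup(const char *keys)
{
    ibc_result result;
    assert(keys != NULL);
    memset(&result, 0, sizeof result);
    lookup_characters(keys, &result);
    lookup_words(&result);
    return result;
}
```

Calling ibc_lookup("ni") in this sketch fills the result with the candidate characters for "ni" and the sample words that begin with them. A real implementation would use hashed lookups instead of linear scans and would not cap results at a fixed size.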

Parameters

  1. language code to look up the correct word list (e.g. cmn = Mandarin, yue = Cantonese, nan = Minnan)
  2. whether the user wants Simplified or Traditional Chinese as output (hans vs. hant)
  3. fuzzy search (on/off) -- include similar pronunciations to what the user has typed, default: off
  4. ID of the input method (to select the proper .cin table)
  5. if using Zhuyin input method, the ID of the keyboard layout (e.g. regular, eten, ...)
  6. what to return:
    • a) only character candidates
    • b) the candidates from a) plus a list of strings matching what the user has typed so far
    • c) everything from b) plus a list of possible strings for string completion
  7. if using Zhuyin input method, attach Bopomofo to the characters (on/off), default: off
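
One possible way to bundle these parameters is a single C struct, sketched below. All type, field and function names (ibc_params, ibc_params_default and so on) are hypothetical and not part of any existing library. The defaults for fuzzy search and Bopomofo follow the list above; the default return mode is an assumption, since the specification does not name one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Parameter 2: output script (hans vs. hant). */
typedef enum { IBC_SCRIPT_HANS, IBC_SCRIPT_HANT } ibc_script;

/* Parameter 6: how much the library should return (options a, b, c). */
typedef enum {
    IBC_RETURN_CANDIDATES,                   /* a) */
    IBC_RETURN_CANDIDATES_AND_WORDS,         /* b) */
    IBC_RETURN_CANDIDATES_WORDS_COMPLETIONS  /* c) */
} ibc_return_mode;

/* Hypothetical parameter block passed by the input method framework. */
typedef struct {
    char language[4];              /* 1: three-letter code, e.g. "cmn" */
    ibc_script script;             /* 2 */
    bool fuzzy;                    /* 3: default off */
    const char *input_method_id;   /* 4: selects the .cin table */
    const char *zhuyin_layout_id;  /* 5: NULL unless Zhuyin is used */
    ibc_return_mode return_mode;   /* 6 */
    bool attach_bopomofo;          /* 7: default off */
} ibc_params;

/* Fill in the documented defaults; the caller overrides the rest. */
ibc_params ibc_params_default(const char *language, ibc_script script,
                              const char *input_method_id)
{
    ibc_params p;
    assert(language != NULL && input_method_id != NULL);
    memset(&p, 0, sizeof p);
    /* Copy at most three bytes; memset already zero-terminated the buffer. */
    strncpy(p.language, language, sizeof p.language - 1);
    p.script = script;
    p.fuzzy = false;
    p.input_method_id = input_method_id;
    p.zhuyin_layout_id = NULL;
    p.return_mode = IBC_RETURN_CANDIDATES;  /* assumed default */
    p.attach_bopomofo = false;
    return p;
}
```

A caller using Zhuyin would start from ibc_params_default("cmn", IBC_SCRIPT_HANT, "zhuyin") and then set zhuyin_layout_id and attach_bopomofo as needed; the identifier strings here are placeholders.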

Implementation

  • written in C
  • use glib for data hashes

Test/Demo Plan

TBD

Unresolved issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.



DesktopTeam/Specs/Lucid/IBusChinese (last edited 2009-12-02 03:09:18 by www)