Package Internationalization and Localization

We are starting work on internationalization and localization for the commands and libraries. In this context, internationalization refers to the mechanism by which messages and documentation are displayed in a locale-specific language other than US-English (well, our rendition of US-English), including collation order and date and monetary representation; localization refers to the mechanism by which locale-specific glyphs are input and displayed. This work is being done in stages:
  1. Identify message and documentation text in the source code: this is done using the nmake(1) msgcat common action and the msgcc(1) command to extract the text. This mechanism has identified over 90% of the text in our code with no source modifications required. The remaining strings have been or will be manually marked using the GNU-style '_(message-text)' translate macro. Messages are associated with an individual command or with a group of commands and libraries. The source contains no message numbers; each message is identified by its US-English text. See msgcc(1) for more details.
  2. Gather messages into text catalogs for the LC_MESSAGES and LC_TIME categories in the C locale: machine-independent message catalogs are generated by the msggen(1) command for use on all architectures. Strings in the machine-independent files are UTF-8 encoded. The machine-independent message files are placed in the directory $PACKAGEROOT/share/lib/locale/locale/category, where the trailing locale and category components stand for the locale name and the category name.
  3. Since the ast commands are self-documenting (try the --man or --html option on any ast command), manual pages are automatically included in the message translation catalogs.
  4. Provide translations for the LC_MESSAGES and LC_TIME message files: the package-locale packages contain machine-independent message files for the C (US-English), de (German), es (Spanish), fr (French), it (Italian) and pt (Portuguese) languages. NOTE: the initial translations were machine-generated by Babelfish; we hope that interested users will humanize these translations. If you are interested, mail ast-users for coordination and details.
  5. Next we will focus on LC_CTYPE, LC_COLLATE, LC_MONETARY and LC_NUMERIC. These will hinge on an implementation of localedef(1), either from GNU or our own.
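
The idea in stage 1 of marking strings with '_(message-text)' and keying messages by their US-English text, rather than by message number, can be sketched in C roughly as follows. This is only an illustration: translate(), struct msg_entry and demo_catalog are hypothetical names invented for the example, the German strings are made up, and the real lookup goes through the catalogs generated by msggen(1) rather than an in-memory table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: the US-English text is the lookup key,
 * so the source never needs a message number. */
struct msg_entry {
    const char *english;
    const char *translated;
};

/* Hypothetical in-memory German catalog, standing in for a generated
 * message catalog. The translations are illustrative only. */
static const struct msg_entry demo_catalog[] = {
    { "file not found",    "Datei nicht gefunden" },
    { "permission denied", "Zugriff verweigert"   },
};

/* Hypothetical lookup: return the translation for the given English
 * text, or the English text itself when no translation exists. */
static const char *translate(const char *english)
{
    size_t i;

    for (i = 0; i < sizeof(demo_catalog) / sizeof(demo_catalog[0]); i++)
        if (strcmp(demo_catalog[i].english, english) == 0)
            return demo_catalog[i].translated;
    return english;
}

/* GNU-style translate macro used to mark message text in the source. */
#define _(text) translate(text)
```

With this in place, a call such as puts(_("permission denied")) prints the catalog entry, while an unmarked or untranslated string falls back to the US-English text, so a missing translation never hides a message.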

The problem with this work is that although the user interfaces are clearly defined by the standards, implementation details such as generated file formats and locations are unspecified, which makes the interfaces difficult to implement efficiently. For example, how do you implement strcoll(3) if you don't own localedef(1)?
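
To make the strcoll(3) point concrete, here is a minimal C sketch of the user-interface side, which the standards do define clearly: a caller selects a collation locale with setlocale(3) and compares strings with strcoll(3). The helper names compare_by_locale() and sort_by_locale() are made up for this example. What the sketch cannot show is the part the standards leave open, namely where strcoll(3) finds its collation tables and in what format.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort(3) comparator for an array of string pointers: ordering is
 * delegated to strcoll(3), which consults the current LC_COLLATE locale. */
static int compare_by_locale(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;

    return strcoll(*left, *right);
}

/* Sort n strings in place using the collation order of whatever
 * locale the caller has selected with setlocale(LC_COLLATE, ...). */
static void sort_by_locale(const char **words, size_t n)
{
    qsort(words, n, sizeof(words[0]), compare_by_locale);
}
```

A caller would typically run setlocale(LC_COLLATE, "") once at startup to pick up the user's environment and then call sort_by_locale(). In the C locale strcoll(3) orders strings the same way strcmp(3) does, so "Zebra" sorts before "apple"; any other ordering depends entirely on locale data produced by something like localedef(1).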

July 28, 2011