15. Other Programming Languages

While the presentation of gettext focuses mostly on C and implicitly applies to C++ as well, its scope is far broader than that: Many programming languages, scripting languages and other textual data like GUI resources or package descriptions can make use of the gettext approach.

15.1 The Language Implementor's View

All programming and scripting languages that have the notion of strings are eligible to support gettext. Supporting gettext means the following:

  1. You should add to the language a syntax for translatable strings. In principle, a function call of gettext would do, but a shorthand syntax helps keep internationalized programs legible. For example, in C we use the syntax _("string"), and in GNU awk we use the shorthand _"string".
  2. You should arrange that evaluation of such a translatable string at runtime calls the gettext function, or performs equivalent processing.
  3. Similarly, you should make the functions ngettext, dcgettext, dcngettext available from within the language. These functions are less often used, but are nevertheless necessary for particular purposes: ngettext for correct plural handling, and dcgettext and dcngettext for obeying other locale-related environment variables than LC_MESSAGES, such as LC_TIME or LC_MONETARY. For these latter functions, you need to make the LC_* constants, available in the C header <locale.h>, referenceable from within the language, usually either as enumeration values or as strings.
  4. You should allow the programmer to designate a message domain, either by making the textdomain function available from within the language, or by introducing a magic variable called TEXTDOMAIN. Similarly, you should allow the programmer to designate where to search for message catalogs, by providing access to the bindtextdomain function or — on native Windows platforms — to the wbindtextdomain function.
  5. You should either perform a setlocale (LC_ALL, "") call during the startup of your language runtime, or allow the programmer to do so. Remember that gettext will act as a no-op if the LC_MESSAGES and LC_CTYPE locale categories are not both set.
  6. A programmer should have a way to extract translatable strings from a program into a PO file. The GNU xgettext program is being extended to support very different programming languages. Please contact the GNU gettext maintainers to help them do this. The GNU gettext maintainers will need from you a formal description of the lexical structure of source files. It should answer the questions:

    Based on this description, the GNU gettext maintainers can add support to xgettext.

    If the string extractor is best integrated into your language's parser, GNU xgettext can function as a front end to your string extractor.

  7. The language's library should have a string formatting facility. Additionally:
    1. There must be a way, in the format string, to denote the arguments by a positional number or a name. This is needed because for some languages and some messages with more than one substitutable argument, the translation will need to output the substituted arguments in a different order. See section Special Comments preceding Keywords.
    2. The syntax of format strings must be documented in a way that translators can understand. The GNU gettext manual will be extended to include a pointer to this documentation.

    Based on this, the GNU gettext maintainers can add a format string equivalence checker to msgfmt, so that translators get told immediately when they have made a mistake during the translation of a format string.

  8. If the language has more than one implementation, and not all of the implementations use gettext, but the programs should be portable across implementations, you should provide a no-i18n emulation, that makes the other implementations accept programs written for yours, without actually translating the strings.
  9. To help the programmer in the task of marking translatable strings, which is sometimes performed using the Emacs PO mode (see section Marking Translatable Strings), you are welcome to contact the GNU gettext maintainers, so they can add support for your language to ‘po-mode.el’.
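
As an illustration of points 1 to 5, the following sketch uses the gettext and locale modules of the Python standard library, which implement this interface in pure Python rather than by calling the C function. The domain name 'example-app' and the catalog directory 'locale' are hypothetical. Since no catalog is assumed to be installed under them, every lookup falls back to the untranslated string.

```python
import gettext
import locale

# Point 5: initialise the locale from the environment. Python's gettext
# module reads the environment variables itself, so this call mainly
# affects other locale-dependent functions. An unsupported locale is
# tolerated here instead of aborting the program.
try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    pass

# Point 4: designate where catalogs live and which message domain to use.
# Both names are hypothetical and exist only for this sketch.
gettext.bindtextdomain("example-app", "locale")
gettext.textdomain("example-app")

# Points 1 and 2: a shorthand for translatable strings that performs the
# catalog lookup at runtime.
_ = gettext.gettext

greeting = _("Hello, world!")


def file_count(n):
    """Point 3: ngettext selects the plural form that matches n."""
    return gettext.ngettext("%d file", "%d files", n) % n
```

Without a catalog, greeting is the original string and file_count chooses between the two untranslated forms by testing whether n equals 1.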

On the implementation side, two approaches are possible, with different effects on portability and copyright:

15.2 The Programmer's View

For the programmer, the general procedure is the same as for the C language. The Emacs PO mode marking supports other languages, and the GNU xgettext string extractor recognizes other languages based on the file extension or a command-line option. In some languages, setlocale is not needed because it is already performed by the underlying language runtime.

15.3 The Translator's View

The translator works exactly as in the C language case. The only difference is that when translating format strings, she has to be aware of the language's particular syntax for positional arguments in format strings.
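
What the translator must preserve can be illustrated with a small check: a translation may reorder the arguments, but it should refer to the same set of arguments as the original string. The following is a rough sketch for Python brace format strings only, using string.Formatter from the Python standard library. The helper names argument_names and same_arguments are hypothetical, and this is not a description of how msgfmt performs its own check.

```python
from string import Formatter


def argument_names(fmt):
    """Return the set of field names used in a brace-style format string."""
    names = set()
    for _literal, field, _spec, _conversion in Formatter().parse(fmt):
        # field is None for chunks of literal text with no directive.
        if field is not None:
            names.add(field)
    return names


def same_arguments(msgid, msgstr):
    """True if the translation refers to exactly the same arguments."""
    return argument_names(msgid) == argument_names(msgstr)


# The reordered string stands in for a translation.
reordered_ok = same_arguments("'{0}' has only {1} bytes free.",
                              "Only {1} bytes free on '{0}'.")
missing_argument = same_arguments("{0} and {1}", "{0}")
```

Here reordered_ok is True because only the order of the arguments differs, while missing_argument is False because the second string drops an argument.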

15.3.1 C Format Strings

C format strings are described in POSIX (IEEE P1003.1 2001), section XSH 3 fprintf(), http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html. See also the fprintf() manual page, http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php, http://informatik.fh-wuerzburg.de/student/i510/man/printf.html.

Although format strings with positions that reorder arguments, such as

    "Only %2$d bytes free on '%1$s'."

which is semantically equivalent to

    "'%s' has only %d bytes free."

are a POSIX/XSI feature and not specified by ISO C 99, translators can rely on this reordering ability: On the few platforms where printf(), fprintf() etc. don't support this feature natively, ‘libintl.a’ or ‘libintl.so’ provides replacement functions, and GNU <libintl.h> activates these replacement functions automatically.

As a special feature for Farsi (Persian) and maybe Arabic, translators can insert an ‘I’ flag into numeric format directives. For example, the translation of "%d" can be "%Id". The effect of this flag, on systems with GNU libc, is that in the output, the ASCII digits are replaced with the ‘outdigits’ defined in the LC_CTYPE locale category. On other systems, the gettext function removes this flag, so that it has no effect.

Note that the programmer should not put this flag into the untranslated string. (Putting the ‘I’ format directive flag into an msgid string would lead to undefined behaviour on platforms without glibc when NLS is disabled.)

15.3.2 Objective C Format Strings

Objective C format strings are like C format strings. They support an additional format directive: "%@", which when executed consumes an argument of type Object *.

15.3.3 Python Format Strings

There are two kinds of format strings in Python: those acceptable to the Python built-in format operator %, labelled as ‘python-format’, and those acceptable to the format method of the ‘str’ object, labelled as ‘python-brace-format’.

Python % format strings are described in Python Library reference / 5. Built-in Types / 5.6. Sequence Types / 5.6.2. String Formatting Operations. https://docs.python.org/2/library/stdtypes.html#string-formatting-operations.

Python brace format strings are described in PEP 3101 – Advanced String Formatting, https://www.python.org/dev/peps/pep-3101/.
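
Both kinds allow a translation to reorder its arguments. The following sketch shows this with named arguments for the % operator and with numbered arguments for the format method. The message text and the values are made up for illustration, and the reordered English strings stand in for translations.

```python
# python-format: named directives are looked up in a mapping, so the
# order in which they appear in the string does not matter.
msgid_percent = "'%(disk)s' has only %(free)d bytes free."
msgstr_percent = "Only %(free)d bytes free on '%(disk)s'."

values = {"disk": "/home", "free": 1024}
original_percent = msgid_percent % values
reordered_percent = msgstr_percent % values

# python-brace-format: numbered fields refer to the positional
# arguments of str.format, again independent of their order in the string.
msgid_brace = "'{0}' has only {1} bytes free."
msgstr_brace = "Only {1} bytes free on '{0}'."

original_brace = msgid_brace.format("/home", 1024)
reordered_brace = msgstr_brace.format("/home", 1024)
```

In both cases the program passes the same arguments regardless of which string the catalog returns. Only the format string decides where each argument appears in the output.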

15.3.4 Java Format Strings

There are two kinds of format strings in Java: those acceptable to the MessageFormat.format function, labelled as ‘java-format’, and those acceptable to the String.format and PrintStream.printf functions, labelled as ‘java-printf-format’.

Java format strings are described in the JDK documentation for class java.text.MessageFormat, https://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html. See also the ICU documentation http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html.

Java printf format strings are described in the JDK documentation for class java.util.Formatter, https://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html.

15.3.5 C# Format Strings

C# format strings are described in the .NET documentation for class System.String and in