While the presentation of gettext focuses mostly on C and
implicitly applies to C++ as well, its scope is far broader than that:
many programming languages, scripting languages, and other textual data
such as GUI resources or package descriptions can make use of the gettext
approach.
All programming and scripting languages that have the notion of strings
are eligible to support gettext. Supporting gettext
means the following:
gettext would do, but a shorthand
syntax helps keep internationalized programs legible. For
example, in C we use the syntax _("string"), and in GNU awk we use
the shorthand _"string".
gettext function, or performs equivalent
processing.
ngettext,
dcgettext, dcngettext available from within the language.
These functions are used less often, but are nevertheless necessary for
particular purposes: ngettext for correct plural handling, and
dcgettext and dcngettext for obeying locale-related
environment variables other than LC_MESSAGES, such as LC_TIME or
LC_MONETARY. For these latter functions, you need to make the
LC_* constants, available in the C header <locale.h>,
referenceable from within the language, usually either as enumeration
values or as strings.
textdomain function available from within the
language, or by introducing a magic variable called TEXTDOMAIN.
Similarly, you should allow the programmer to designate where to search
for message catalogs, by providing access to the bindtextdomain
function or — on native Windows platforms — to the wbindtextdomain
function.
setlocale (LC_ALL, "") call during
the startup of your language runtime, or allow the programmer to do so.
Remember that gettext will act as a no-op if the LC_MESSAGES and
LC_CTYPE locale categories are not both set.
xgettext program is being
extended to support very different programming languages. Please
contact the GNU gettext maintainers to help them do this.
The GNU gettext maintainers will need from you a formal
description of the lexical structure of source files. It should
answer the questions:
Based on this description, the GNU gettext maintainers
can add support to xgettext.
If the string extractor is best integrated into your language's parser,
GNU xgettext can function as a front end to your string extractor.
gettext manual will be extended to
include a pointer to this documentation.
Based on this, the GNU gettext maintainers can add a format string
equivalence checker to msgfmt, so that translators get told
immediately when they have made a mistake during the translation of a
format string.
gettext, but the programs should be portable
across implementations, you should provide a no-i18n emulation that
makes the other implementations accept programs written for yours,
without actually translating the strings.
gettext maintainers, so they can add support for
your language to ‘po-mode.el’.
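As a rough illustration of the runtime-side items in this list, the following C sketch shows the calls that a language binding would typically wrap: a shorthand for gettext, ngettext for plurals, dcgettext with an LC_* category, and setlocale, bindtextdomain and textdomain at startup. The helper names i18n_init, files_message and time_category_message are invented for this example and are not part of GNU gettext; the domain name and catalog directory passed to i18n_init are likewise placeholders chosen by the caller.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Shorthand for marking and translating a string, as in _("string"). */
#define _(msgid) gettext(msgid)

/* One-time startup for a program or language runtime.
   DOMAIN and LOCALEDIR are supplied by the caller (placeholders here).
   Returns 0 on success, -1 if the domain could not be set up. */
static int i18n_init(const char *domain, const char *localedir)
{
    assert(domain != NULL && localedir != NULL);

    /* Adopt the user's locale. A NULL result means the environment names
       an unavailable locale; the process then stays in the "C" locale
       and lookups return the untranslated strings. */
    setlocale(LC_ALL, "");

    /* Tell libintl where the message catalogs for DOMAIN live. */
    if (bindtextdomain(domain, localedir) == NULL)
        return -1;

    /* Make DOMAIN the default for gettext() and ngettext(). */
    if (textdomain(domain) == NULL)
        return -1;

    return 0;
}

/* Plural-aware lookup: the catalog selects the form that fits N. */
static const char *files_message(unsigned long n)
{
    return ngettext("%lu file removed", "%lu files removed", n);
}

/* Lookup governed by the LC_TIME category instead of LC_MESSAGES.
   A NULL domain name selects the current default domain. */
static const char *time_category_message(const char *msgid)
{
    assert(msgid != NULL);
    return dcgettext(NULL, msgid, LC_TIME);
}
```

A caller would run i18n_init once at startup. Until a catalog is installed for the chosen domain, every lookup simply returns its untranslated argument, so the program keeps working without translations.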
On the implementation side, two approaches are possible, with different effects on portability and copyright:
gettext functions if they are found in
the C library. For example, an autoconf test for gettext() and
ngettext() will detect this situation. For the moment, this test
will succeed on GNU systems and on Solaris 11 platforms. No severe
copyright restrictions apply, except if you want to distribute statically
linked binaries.
gettext functionality.
This has the advantage of full portability and no copyright
restrictions, but also the drawback that you have to reimplement the GNU
gettext features (such as the LANGUAGE environment
variable, the locale aliases database, the automatic charset conversion,
and plural handling).
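The first approach can be sketched in C as a thin wrapper that calls the C library's functions when a configure-time check has found them. The sketch assumes that the check defines a macro named HAVE_GETTEXT; the wrapper names rt_gettext and rt_ngettext are hypothetical. The fallback branch is an addition for illustration: it performs no translation at all and only keeps programs running, so it is not a reimplementation in the sense of the second approach.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_GETTEXT is assumed to be defined by a configure-time test for
   gettext() and ngettext() in the C library. */
#ifdef HAVE_GETTEXT
#include <libintl.h>

/* Delegate to the C library's implementation. */
static const char *rt_gettext(const char *msgid)
{
    assert(msgid != NULL);
    return gettext(msgid);
}

static const char *rt_ngettext(const char *singular, const char *plural,
                               unsigned long n)
{
    assert(singular != NULL && plural != NULL);
    return ngettext(singular, plural, n);
}

#else

/* No-op fallback: return the untranslated string. */
static const char *rt_gettext(const char *msgid)
{
    assert(msgid != NULL);
    return msgid;
}

/* Untranslated plural choice: singular only when N is exactly 1. */
static const char *rt_ngettext(const char *singular, const char *plural,
                               unsigned long n)
{
    assert(singular != NULL && plural != NULL);
    return n == 1 ? singular : plural;
}

#endif
```

Keeping the two branches behind the same function names means the rest of the language runtime does not need to know which variant was compiled in.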
For the programmer, the general procedure is the same as for the C
language. The Emacs PO mode marking supports other languages, and the GNU
xgettext string extractor recognizes other languages based on the
file extension or a command-line option. In some languages,
setlocale is not needed because it is already performed by the
underlying language runtime.
The translator works exactly as in the C language case. The only difference is that when translating format strings, she has to be aware of the language's particular syntax for positional arguments in format strings.
C format strings are described in POSIX (IEEE P1003.1 2001), section XSH 3 fprintf(), http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html. See also the fprintf() manual page, http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php, http://informatik.fh-wuerzburg.de/student/i510/man/printf.html.
Although format strings with positions that reorder arguments, such as
"Only %2$d bytes free on '%1$s'."
which is semantically equivalent to
"'%s' has only %d bytes free."
are a POSIX/XSI feature and not specified by ISO C 99, translators can rely
on this reordering ability: On the few platforms where printf(),
fprintf() etc. don't support this feature natively, ‘libintl.a’
or ‘libintl.so’ provides replacement functions, and GNU <libintl.h>
activates these replacement functions automatically.
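The equivalence of the two format strings above can be checked with a small C sketch. It assumes a printf implementation that supports POSIX positional arguments, for example GNU libc; format_free_space is a hypothetical helper name, not a gettext function.

```c
#include <assert.h>
#include <stdio.h>

/* Formats a volume name and a free-byte count according to FMT.
   FMT must consume exactly a char * and an int, in that argument order,
   either implicitly (%s, %d) or by position (%1$s, %2$d).
   Returns the result of snprintf. */
static int format_free_space(char *buf, size_t size, const char *fmt,
                             const char *volume, int bytes_free)
{
    assert(buf != NULL && fmt != NULL && volume != NULL);
    /* The argument list is the same for both format strings; only the
       format string decides the order in which they appear. */
    return snprintf(buf, size, fmt, volume, bytes_free);
}
```

Calling the helper once with "'%s' has only %d bytes free." and once with "Only %2$d bytes free on '%1$s'." produces two sentences with different word order from the same argument list, which is what lets a translator reorder the arguments without any change to the program.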
As a special feature for Farsi (Persian) and maybe Arabic, translators can
insert an ‘I’ flag into numeric format directives. For example, the
translation of "%d" can be "%Id". The effect of this flag,
on systems with GNU libc, is that in the output, the ASCII digits are
replaced with the ‘outdigits’ defined in the LC_CTYPE locale
category. On other systems, the gettext function removes this flag,
so that it has no effect.
Note that the programmer should not put this flag into the untranslated string. (Putting the ‘I’ format directive flag into an msgid string would lead to undefined behaviour on platforms without glibc when NLS is disabled.)
Objective C format strings are like C format strings. They support an
additional format directive: "%@", which when executed consumes an argument
of type Object *.
There are two kinds of format strings in Python: those acceptable to
the Python built-in format operator %, labelled as
‘python-format’, and those acceptable to the format method
of the ‘str’ object, labelled as ‘python-brace-format’.
Python % format strings are described in
Python Library reference /
5. Built-in Types /
5.6. Sequence Types /
5.6.2. String Formatting Operations.
https://docs.python.org/2/library/stdtypes.html#string-formatting-operations.
Python brace format strings are described in PEP 3101 – Advanced String Formatting, https://www.python.org/dev/peps/pep-3101/.
There are two kinds of format strings in Java: those acceptable to the
MessageFormat.format function, labelled as ‘java-format’,
and those acceptable to the String.format and
PrintStream.printf functions, labelled as ‘java-printf-format’.
Java format strings are described in the JDK documentation for class
java.text.MessageFormat,
https://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html.
See also the ICU documentation
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html.
Java printf format strings are described in the JDK documentation
for class java.util.Formatter,
https://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html.
C# format strings are described in the .NET documentation for class
System.String and in