Localizing XML Data

The XML data that defines the TOC, index, and helpset files can be localized as specified in the XML 1.0 specification (http://w3c.org/XML/). Both the character encoding and language can be set for these files.

Character Encoding

Character encoding is an unambiguous mapping of the members of a character set (letters, ideographs, digits, symbols, or control functions) to specific numeric code values. The specified encoding applies to the entire file. Character encoding can be set for XML files using the following methods (listed in order of precedence):

HTTP Protocol
XML Prolog Declaration

Only one encoding can be specified for any file.

HTTP Protocol

If the XML file is provided by a server via the HTTP protocol, the server can specify the character set using the charset parameter in the HTTP Content-Type field.

XML Prolog Declaration

Typically, the character encoding is specified with the encoding attribute in the prolog of each XML metadata file. For example, the following prolog specifies the Latin-1 (ISO-8859-1) encoding:

<?xml version='1.0' encoding='ISO-8859-1' standalone='yes' ?>
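The precedence described above can be sketched in a few lines of Python. This is an illustration only, not how any particular help viewer resolves encodings: the function names are hypothetical, the UTF-8 fallback is an assumption for files that declare nothing, and the prolog check assumes the file starts with an ASCII-compatible XML declaration.

```python
import re
from email.message import Message

# Matches the encoding attribute of an XML declaration at the very start of a file.
_PROLOG_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # the email package lowercases the name


def encoding_from_prolog(xml_bytes):
    """Return the encoding named in the XML prolog (lowercased), or None."""
    match = _PROLOG_ENCODING.match(xml_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def effective_encoding(content_type, xml_bytes, default="utf-8"):
    """HTTP charset wins; otherwise use the prolog, then the assumed default."""
    return (
        charset_from_content_type(content_type)
        or encoding_from_prolog(xml_bytes)
        or default
    )
```

With the prolog shown above, effective_encoding("text/xml", data) yields the prolog's ISO-8859-1 (lowercased), while a Content-Type of "text/xml; charset=UTF-8" overrides it.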

Setting the Language

The language can be set for the XML files using the following methods (listed in order of precedence):

The xml:lang Attribute
HTTP Protocol

It is possible to mix languages in these files. A different language can be specified for each element (tag); however, only one character encoding can be specified for each file.

The xml:lang Attribute

The language for any element (tag) in the XML files can be set using the xml:lang attribute. For example, the following code sets the language of a table of contents entry to German. Any elements (<tocitem> tags) nested in that element automatically inherit its language:

<tocitem xml:lang="de" target="jde.intro">Homepage der JDE Online-Hilfe</tocitem>

Typically, the xml:lang attribute is set in the opening tag of the root element (for example, <toc xml:lang="de">), so that all of the other elements in the TOC inherit the attribute. In this case the entire TOC is in German.
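The inheritance rule can be illustrated with a short Python sketch that walks a TOC and reports the language in effect for each entry. The sample TOC, the jde.setup and jde.faq targets, and the effective_languages function are hypothetical and exist only for this illustration; the namespace URI is the one that XML parsers bind to the xml: prefix.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TOC: German by default, with one English entry.
SAMPLE_TOC = """
<toc xml:lang="de">
  <tocitem target="jde.intro">Homepage der JDE Online-Hilfe
    <tocitem target="jde.setup">Installation</tocitem>
    <tocitem xml:lang="en" target="jde.faq">Frequently Asked Questions</tocitem>
  </tocitem>
</toc>
"""


def effective_languages(element, inherited=None):
    """Return (target, language) pairs for every tocitem under element."""
    # An element's own xml:lang overrides whatever it inherited.
    lang = element.get(XML_LANG, inherited)
    results = []
    if element.tag == "tocitem":
        results.append((element.get("target"), lang))
    for child in element:
        results.extend(effective_languages(child, lang))
    return results


root = ET.fromstring(SAMPLE_TOC)
for target, lang in effective_languages(root):
    print(target, lang)
```

Here the first two entries inherit "de" from the root element, and only the entry that sets xml:lang="en" is reported as English.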

The syntax of the xml:lang attribute value is:

lang = language-code
language-code = primarycode ('-' subcode)*
primarycode = ISO639 | IanaCode | UserCode
ISO639 = 2 alpha characters
IanaCode = (i | I) '-' (alpha characters)
UserCode = (x | X) '-' (alpha characters)
subcode = (alpha characters)
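As a rough check of language codes against the grammar above, the rules can be written as a regular expression. This Python sketch follows only the grammar as shown here and assumes that any number of '-' subcode suffixes may follow the primary code; the is_valid_language_code name is hypothetical, and the check does not verify that a code is actually registered.

```python
import re

# primarycode: two letters, or an i-/x- prefix followed by letters;
# then zero or more '-' subcode suffixes made of letters.
_LANGUAGE_CODE = re.compile(r"(?:[A-Za-z]{2}|[iIxX]-[A-Za-z]+)(?:-[A-Za-z]+)*")


def is_valid_language_code(value):
    """Return True if value matches the language-code grammar shown above."""
    return _LANGUAGE_CODE.fullmatch(value) is not None
```

Under these rules, "de", "en-US", and "x-klingon" are accepted, while "deu" and "en_US" are rejected.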

For more information about the xml:lang attribute, refer to the XML recommendation at the World Wide Web Consortium web site (http://w3c.org/XML/).

HTTP Protocol

If the XML file is provided by a server via the HTTP protocol, the server can specify the language for that file using the HTTP Content-Language header (for example, Content-Language: en-US).
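The order of precedence for language can be sketched the same way: an xml:lang value in effect for an element is used first, and the HTTP header is consulted only when no xml:lang applies. The effective_language function is hypothetical. Because a Content-Language header may list several languages separated by commas, this sketch assumes the first one listed is the one to use.

```python
def effective_language(xml_lang, content_language):
    """Pick the language for an element.

    xml_lang: the xml:lang value in effect for the element, or None.
    content_language: the HTTP Content-Language header value, or None.
    """
    if xml_lang:
        return xml_lang  # xml:lang takes precedence over the HTTP header
    if content_language:
        # Assumption: when several languages are listed, use the first.
        first = content_language.split(",")[0].strip()
        return first or None
    return None
```

For example, effective_language(None, "en-US") falls back to the header value, while effective_language("de", "en-US") keeps the German setting from the file.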

See also:

Localizing Help Information
Localizing Help Presentation
Localizing Helpsets
Localizing HTML Data
Localization and Fonts
Localizing the Full-Text Search Database