Localization/Localization Industry Standards

In the localization industry, several key standards keep data more open and more compatible across different tools than proprietary formats would.

The modern global economy is driven by technology, which creates a great need for translation and localization. More and more information, in more and more languages, fuels this economic growth. This information needs to be electronically available and retrievable in common formats, which requires standards. The original railroad builders faced a similar need: if rail gauges were incompatible, collaboration between potential partners would be difficult or impossible.

Standards not only provide reliability, they also enable efficiency, quality, and most importantly, interoperability. They are developed to make possible the exchange of localizable data between tools; in other words, to ensure the data remain more open and re-usable than with a proprietary format.

XLIFF
XML Localisation Interchange File Format (XLIFF) is an industry standard based on XML. It is the core standard for data exchange in the localization industry. XLIFF is designed to standardize the way localizable data is passed between platforms and tools during a localization process. The current version is 2.1.

XLIFF 2.1 has been approved by the OASIS open standards organization as an official OASIS Standard, a status that signifies the highest level of ratification. The International Organization for Standardization (ISO) has also approved XLIFF for release under the designation 'ISO 21720:2017'. XLIFF gives a multilingual content owner a single interchange file format that can be understood by any localization provider, using any conformant localization tool.

Structure: The XLIFF specification includes Core, Module and Extension elements:

1) Core: The core includes the most essential elements needed to store content and to translate it.

2) Modules: There are eight specialized pre-defined modules which can be used to add features such as metadata, translation candidates, terminology, etc.

3) Extension elements: Standard XML namespaces are used to store data in elements or attributes defined in a custom XML Schema.

An XLIFF-compliant tool must support the Core; support for Modules and Extensions is optional. However, a tool should preserve any modules and extensions it does not consider relevant to the translation process and simply roundtrip them.
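The split between Core, Module and Extension data is visible in the XML namespaces of an XLIFF 2.x file. The following is a minimal sketch, not a conformance checker: it assumes the XLIFF 2.0 core namespace and the Metadata module namespace, and the `my:reviewInfo` extension element, its namespace URI and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

CORE_NS = "urn:oasis:names:tc:xliff:document:2.0"
# Assumption: module namespaces share this OASIS prefix, the core one excluded.
MODULE_PREFIX = "urn:oasis:names:tc:xliff:"

XLIFF_DOC = """\
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0"
       xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0"
       xmlns:my="http://example.com/my-extension"
       version="2.0" srcLang="en" trgLang="fr">
  <file id="f1">
    <unit id="u1">
      <mda:metadata>
        <mda:metaGroup category="ui">
          <mda:meta type="screen">login</mda:meta>
        </mda:metaGroup>
      </mda:metadata>
      <my:reviewInfo priority="high"/>
      <segment>
        <source>Hello</source>
        <target>Bonjour</target>
      </segment>
    </unit>
  </file>
</xliff>
"""


def classify(tag):
    """Return 'core', 'module' or 'extension' for an ElementTree tag name."""
    namespace = tag[1:].split("}")[0] if tag.startswith("{") else ""
    if namespace == CORE_NS:
        return "core"
    if namespace.startswith(MODULE_PREFIX):
        return "module"
    return "extension"


def count_by_kind(xml_text):
    """Count the elements of each kind in an XLIFF document."""
    counts = {"core": 0, "module": 0, "extension": 0}
    for element in ET.fromstring(xml_text).iter():
        counts[classify(element.tag)] += 1
    return counts


print(count_by_kind(XLIFF_DOC))
```

A tool that does not understand the `mda` or `my` elements would still be expected to write them back unchanged when it saves the file.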

CORE

As a general rule, an XLIFF-compliant translation management system (TMS) should be able to read, write and roundtrip every element in the XLIFF Core. Typically, an XLIFF document contains a file element (e.g. a document), optional group elements, and at least one unit element (e.g. a paragraph). Each unit contains one or more segments (e.g. a sentence). Each segment carries a state attribute (with the values initial, translated, reviewed or final). A segment may also have a custom value specified in subState (e.g. ‘failed review’).
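As an illustration of the unit, segment and state structure described above, the sketch below reads a small hand-written XLIFF 2.0 sample and lists each segment with its state. The sample content and the function name `segment_states` are assumptions made for this example; a segment without a state attribute is treated as initial, which is the default value in XLIFF 2.0.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}

SAMPLE = """\
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0"
       version="2.0" srcLang="en" trgLang="de">
  <file id="f1">
    <unit id="u1">
      <segment id="s1" state="translated">
        <source>Save the file.</source>
        <target>Datei speichern.</target>
      </segment>
      <segment id="s2">
        <source>Close the window.</source>
      </segment>
    </unit>
  </file>
</xliff>
"""


def segment_states(xml_text):
    """Return (unit id, segment id, state) for every segment in the document."""
    root = ET.fromstring(xml_text)
    rows = []
    for unit in root.iterfind(".//x:unit", NS):
        for segment in unit.iterfind("x:segment", NS):
            # A missing state attribute means the segment is still 'initial'.
            state = segment.get("state", "initial")
            rows.append((unit.get("id"), segment.get("id"), state))
    return rows


for row in segment_states(SAMPLE):
    print(row)
```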

1) Segmentation (Core): XLIFF's Core features built-in segmentation, which can be controlled at any level: file, group, unit and segment. A TMS should be able to make use of both segmented and unsegmented content interchangeably. For example, translators may want to see a whole paragraph at the unit level for context but reorder segments to better fit the target convention. This can be achieved with XLIFF's order attribute. A recycling engine might return more matches for discrete segments, whereas a machine translation system might give better results with more context at the unit level. Therefore, it is important to maintain segmentation flexibility within the XLIFF file.

2) Inline Codes (Core): XLIFF also protects and preserves the original inline codes of an extracted document. For instance, HTML may contain a span of tagged text. When the text is extracted, the tags should be preserved. This is done with the pc and ph elements. Tags can be moved in the target if they need to be repositioned.

3) Notes (Core). Context from the original documents or from the content creators can be provided to translators using notes and the metadata module.

4) Annotations (Core). Annotations provide information that is not contained in the original document. Unlike notes and metadata, this information can refer to recycling or to matches from either machine translation or translation memory.
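The inline codes described in item 2 can be illustrated with a short sketch. It flattens the content of a source element and shows each pc (paired code) and ph (placeholder) element as a numbered marker, roughly the way a translation editor might display protected tags. The sample unit and the marker notation are assumptions made for this example, not something the specification prescribes.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"

UNIT = """\
<unit xmlns="urn:oasis:names:tc:xliff:document:2.0" id="u1">
  <segment>
    <source>Click <pc id="1">Save</pc> to continue.<ph id="2"/></source>
  </segment>
</unit>
"""


def render(element):
    """Flatten inline content, showing each inline code as a numbered marker."""
    parts = [element.text or ""]
    for child in element:
        name = child.tag.split("}")[-1]
        code_id = child.get("id")
        if name == "pc":
            # A paired code wraps translatable text, so recurse into it.
            parts.append(f"<{code_id}>{render(child)}</{code_id}>")
        elif name == "ph":
            # A placeholder stands alone and has no translatable content.
            parts.append(f"<{code_id}/>")
        else:
            parts.append(render(child))
        parts.append(child.tail or "")
    return "".join(parts)


source = ET.fromstring(UNIT).find(f".//{{{XLIFF_NS}}}source")
print(render(source))  # Click <1>Save</1> to continue.<2/>
```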

Additional XLIFF Modules:

1) Glossary Module: simple glossaries consisting of a list of terms with a definition or translation

2) Format Style Module: gives the information needed to create a quick HTML generated preview of the content using a predefined set of simple HTML formatting elements

3) Resource Data Module: can contain resources such as screen grabs that show the content in context; a reference to the screen grab is stored within the XLIFF and provided as context to the translator

4) Size and Length Restriction Module: helps constrain content within general size restrictions, e.g. in software localization it can help prevent truncations.

5) Validation Module: contains a set of validation rules that can be applied to the target text both globally and locally (e.g. if the source contains a placeholder, a rule can check whether that placeholder also appears in the translation).

6) ITS Module: the Internationalization Tag Set (ITS) is a standard set of attributes and elements designed to provide internationalization and localization support in XML documents. The ITS Module describes how ITS data categories are expressed within XLIFF. It can be used, for example, to mark source content such as product or brand names as not to be translated.
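The placeholder check mentioned under the Validation Module can be sketched in a few lines. This is only an illustration of the idea in plain Python; it does not use the module's own rule syntax, and the placeholder pattern (`{name}`, `%s`, `%d`) and the function name `missing_placeholders` are assumptions.

```python
import re

# Assumed placeholder styles for this sketch: {name}, %s and %d.
PLACEHOLDER = re.compile(r"\{\w+\}|%[sd]")


def missing_placeholders(source, target):
    """Return the source placeholders that do not appear in the target."""
    remaining = PLACEHOLDER.findall(target)
    missing = []
    for placeholder in PLACEHOLDER.findall(source):
        if placeholder in remaining:
            # Consume one occurrence so repeated placeholders are counted.
            remaining.remove(placeholder)
        else:
            missing.append(placeholder)
    return missing


print(missing_placeholders("Hello {name}, you have %d messages",
                           "Bonjour {name}"))  # ['%d']
```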

In terms of economic benefit, XLIFF streamlines localization by simplifying file management and formatting. The standard is also ideal for use in web services.

As an open, standardized, and tool-independent format, XLIFF is widely supported in Computer-Assisted Translation (CAT) and Content Management System (CMS) tools. Translators need to understand only one standard format instead of several proprietary ones. Because XLIFF is XML, files display well in most web browsers and can be opened and modified in a simple text editor without specialized software.

TMX
Translation Memory eXchange (TMX) is an open XML standard designed to support the exchange of translation memories between computer-aided translation and localization tools, with little or no loss of critical data. The current version is 1.4b. The purpose of TMX is to allow any tool that uses translation memories to convert its databases between its own format and a common format, so that the databases can be used over time with different versions of the same software, or with different software entirely.

Using TMX gives localization services a greater choice of computer-aided translation tools, so they are not restricted to a specific tool.
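To show what exchanging a translation memory looks like in practice, the sketch below reads source and target pairs from a small hand-written TMX sample. The element names (tu, tuv, seg) and the xml:lang attribute follow TMX 1.4; the sample content, the header values and the function name `load_pairs` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TMX_SAMPLE = """\
<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Goodbye</seg></tuv>
      <tuv xml:lang="de"><seg>Auf Wiedersehen</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def load_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) text pairs for one language combination."""
    pairs = []
    for unit in ET.fromstring(tmx_text).iter("tu"):
        by_lang = {}
        for variant in unit.findall("tuv"):
            seg = variant.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline elements.
                by_lang[variant.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in by_lang and tgt_lang in by_lang:
            pairs.append((by_lang[src_lang], by_lang[tgt_lang]))
    return pairs


print(load_pairs(TMX_SAMPLE, "en", "fr"))
```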

TBX
TermBase eXchange (TBX), which is identical to ISO 30042:2008, is an XML-based standard that allows sharing of glossaries between tools. It supports various processes involving terminological data, including analysis, descriptive representation, dissemination, and interchange.

TBX also offers a set of data-categories that are used in terminological work.

SRX
Segmentation Rules eXchange (SRX) is an XML-based standard that describes the ways in which translation and other language-processing tools segment text for processing. It is designed to facilitate the exchange of segmentation rules, using regular expressions. The current version is 2.0.

Because SRX provides compatibility among CAT tools, it offers business value by increasing options and facilitating tool use by translation companies.
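An SRX rule pairs a regular expression for the text before a possible break with one for the text after it, and states whether the position is a break or an exception. The sketch below mimics that model in plain Python instead of parsing an SRX file. The two rules, the sample sentence and the function names are assumptions, and the first matching rule decides each position.

```python
import re

# Each rule: (is_break, pattern before the position, pattern after it).
RULES = [
    (False, r"\b(Mr|Dr)\.", r"\s"),  # exception: no break after these abbreviations
    (True, r"[.?!]", r"\s"),         # break after sentence-ending punctuation
]


def break_positions(text, rules):
    """Return the character offsets at which the text should be split."""
    positions = []
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            ends_with_before = re.search(rf"(?:{before})\Z", text[:pos])
            starts_with_after = re.match(after, text[pos:])
            if ends_with_before and starts_with_after:
                if is_break:
                    positions.append(pos)
                break  # the first matching rule decides this position
    return positions


def segment(text, rules=RULES):
    """Split text into segments at the positions selected by the rules."""
    pieces, start = [], 0
    for pos in break_positions(text, rules):
        pieces.append(text[start:pos])
        start = pos
    pieces.append(text[start:])
    return pieces


print(segment("Dr. Smith arrived. He sat down."))
# ['Dr. Smith arrived.', ' He sat down.']
```

Because the break falls between the two patterns, the whitespace after the full stop stays with the following segment in this sketch.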

ITS
The Internationalization Tag Set (ITS) is a set of attributes and elements that identify, for example, translatable attribute values and protected (non-translatable) element content in XML documents. Attaching these special attributes to elements supports internationalization and localization and enables the automated creation and processing of multilingual Web content.

The current version, ITS 2.0, focuses on HTML and XML-based formats, and leverages processing based on the XML Localization Interchange File Format (XLIFF) and the Natural Language Processing Interchange Format (NIF).
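As an example of attaching ITS attributes to elements, the sketch below honours the local its:translate attribute while collecting translatable text from an XML document. It is a simplified illustration only: it handles local markup with inheritance and ignores global its:rules. The sample document and the function name `translatable_text` are assumptions.

```python
import xml.etree.ElementTree as ET

ITS_TRANSLATE = "{http://www.w3.org/2005/11/its}translate"

DOC = """\
<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <p>Install <name its:translate="no">AcmeSync</name> today.</p>
  <p its:translate="no">ACME-1234</p>
</doc>
"""


def translatable_text(element, inherited=True):
    """Collect text chunks that are not protected by its:translate="no"."""
    flag = element.get(ITS_TRANSLATE)
    # Without a local attribute, an element inherits its parent's setting.
    translate = inherited if flag is None else (flag == "yes")
    chunks = []
    if translate and element.text and element.text.strip():
        chunks.append(element.text.strip())
    for child in element:
        chunks.extend(translatable_text(child, translate))
        # Tail text sits after the child but belongs to this element.
        if translate and child.tail and child.tail.strip():
            chunks.append(child.tail.strip())
    return chunks


print(translatable_text(ET.fromstring(DOC)))  # ['Install', 'today.']
```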