Translation memory

A translation memory, or TM, is a database that stores segments that have been previously translated. A translation-memory system stores the words, phrases and paragraphs that have already been translated in order to aid human translators. The translation memory stores each source text and its corresponding translation as a language pair called a “translation unit”.

Some software programs that use translation memories are known as translation memory managers (TMM).

Translation memories are typically used in conjunction with a dedicated computer-assisted translation (CAT) tool, word processing program, terminology management system, multilingual dictionary, or even raw machine translation output.

A translation memory consists of text segments in a source language and their translations into one or more target languages. These segments can be blocks, paragraphs, sentences, or phrases. Individual words are handled by terminology bases and are not within the domain of TM.
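As a rough illustration of this structure, a translation unit can be modelled as one source segment plus its translations keyed by target language. The sketch below is a minimal in-memory model only; the names `TranslationUnit` and `TranslationMemory` are hypothetical and do not correspond to any particular tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TranslationUnit:
    """One source segment with its translations, keyed by target language code."""
    source_lang: str
    source: str
    targets: Dict[str, str] = field(default_factory=dict)


class TranslationMemory:
    """Minimal in-memory TM: maps (source language, source text) to a unit."""

    def __init__(self) -> None:
        self._units: Dict[Tuple[str, str], TranslationUnit] = {}

    def add(self, source_lang: str, source: str, target_lang: str, target: str) -> None:
        key = (source_lang, source)
        unit = self._units.setdefault(key, TranslationUnit(source_lang, source))
        unit.targets[target_lang] = target

    def lookup(self, source_lang: str, source: str, target_lang: str) -> Optional[str]:
        unit = self._units.get((source_lang, source))
        return unit.targets.get(target_lang) if unit else None


tm = TranslationMemory()
tm.add("en", "Press the power button.", "de", "Drücken Sie die Ein/Aus-Taste.")
tm.add("en", "Press the power button.", "fr", "Appuyez sur le bouton d'alimentation.")
print(tm.lookup("en", "Press the power button.", "fr"))
```

A real system would persist the units in a database and index them for similarity search; the dictionary here only shows the shape of the data.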

Research indicates that many companies producing multilingual documentation use translation memory systems. In a 2006 survey of language professionals, 82.5% of 874 respondents confirmed that they used a TM. TM usage correlated with text types characterised by technical terms and simple sentence structure (technical and, to a lesser degree, marketing and financial texts), with computing skills, and with repetitiveness of content. [Elina Lagoudaki (2006), "Translation Memory systems: Enlightening users' perspective. Key findings of the TM Survey 2006 carried out during July and August 2006" (Imperial College London, Translation Memories Survey 2006), p. 16]

Using translation memories

The program breaks the source text (the text to be translated) into segments, looks for matches between segments and the source half of previously translated source-target pairs stored in a translation memory, and presents such matching pairs as translation candidates. The translator can accept a candidate, replace it with a fresh translation, or modify it to match the source. In the last two cases, the new or modified translation goes into the database.

Some translation memory systems search for 100% matches only; that is, they can only retrieve segments of text that match entries in the database exactly. Others employ fuzzy matching algorithms to retrieve similar segments, which are presented to the translator with the differences flagged. Typical translation memory systems search only for text in the source segment. The flexibility and robustness of the matching algorithm largely determine the performance of the translation memory, although for some applications the recall rate of exact matches can be high enough to justify the 100%-match approach.
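A minimal sketch of fuzzy retrieval, assuming similarity is scored with the character-based ratio from Python's standard `difflib` module. This is only one possible scoring method; commercial systems use their own algorithms, and the 0.7 threshold and the function names `best_match` and `flag_differences` are illustrative assumptions.

```python
import difflib


def best_match(segment, memory, threshold=0.7):
    """Return (score, stored_source, stored_target) for the closest stored source, or None."""
    best = None
    for source, target in memory.items():
        score = difflib.SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


def flag_differences(stored, new):
    """List the word-level edits that turn the stored source into the new segment."""
    a, b = stored.split(), new.split()
    ops = difflib.SequenceMatcher(None, a, b).get_opcodes()
    return [(tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]


memory = {"Remove the battery cover.": "Entfernen Sie die Batterieabdeckung."}
segment = "Remove the rear battery cover."
match = best_match(segment, memory)
if match:
    score, source, target = match
    print(f"{score:.0%} match: {target}")
    print(flag_differences(source, segment))
```

Only the source side is compared, in line with the behaviour described above; the stored target is returned unchanged for the translator to adapt.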

Segments where no match is found will have to be translated by the translator manually. These newly translated segments are stored in the database where they can be used for future translations as well as repetitions of that segment in the current text.
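The lookup-then-store cycle can be sketched as follows. The example assumes exact matching only and uses a hypothetical `translate_manually` callback to stand in for the human translator.

```python
def translate_document(segments, memory, translate_manually):
    """Translate segments in order, reusing stored translations and storing new ones.

    `memory` maps source segments to target segments; `translate_manually` is a
    hypothetical callback standing in for the human translator.
    """
    output = []
    reused = 0
    for segment in segments:
        if segment in memory:
            reused += 1
        else:
            # No match: the new translation is stored immediately so that later
            # repetitions in the same text can reuse it.
            memory[segment] = translate_manually(segment)
        output.append(memory[segment])
    return output, reused


memory = {"Click OK.": "Klicken Sie auf OK."}
segments = ["Click OK.", "Close the dialog.", "Close the dialog."]
fake_translator = {"Close the dialog.": "Schließen Sie den Dialog."}.get
output, reused = translate_document(segments, memory, fake_translator)
print(output, reused)
```

Here the second occurrence of "Close the dialog." is served from the memory, because the first occurrence was stored as soon as it was translated.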

Translation memories work best on texts which are highly repetitive, such as technical manuals. They are also helpful for translating incremental changes in a previously translated document, corresponding, for example, to minor changes in a new version of a user manual. Traditionally, translation memories have not been considered appropriate for literary or creative texts, for the simple reason that there is so little repetition in the language used. However, others find them of value even for non-repetitive texts, because the database resources created have value for concordance searches to determine appropriate usage of terms, for quality assurance (no empty segments), and the simplification of the review process (source and target segment are always displayed together while translators have to work with two documents in a traditional review environment).

If a translation memory system is used consistently on appropriate texts over a period of time, it can save translators considerable work.

Main benefits

Translation memory managers are most suitable for translating technical documentation and documents containing specialized vocabularies. Their benefits include:

* Ensuring that the document is completely translated (translation memories do not accept empty target segments)
* Ensuring that the translated documents are consistent, including common definitions, phrasings and terminology. This is important when different translators are working on a single project.
* Enabling translators to translate documents in a wide variety of formats without having to own the software typically required to process these formats.
* Accelerating the overall translation process; since translation memories "remember" previously translated material, translators have to translate it only once.
* Reducing costs of long-term translation projects; for example the text of manuals, warning messages or series of documents needs to be translated only once and can be used several times.
* For large documentation projects, savings (in time or money) thanks to the use of a TM package may already be apparent even for the first translation of a new project, but normally such savings are only apparent when translating subsequent versions of a project that was translated before using translation memory.

Main obstacles

The main problems hindering wider use of translation memory managers include:

* The concept of "translation memories" is based on the premise that sentences used in previous translations can be "recycled". However, a guiding principle of translation is that the translator must translate the "message" of the text, and not its component "sentences".
* Translation memory managers do not easily fit into existing translation or localization processes. In order to take advantage of TM technology, the translation processes must be redesigned.
* Translation memory managers do not presently support all documentation formats, and filters may not exist to support all file types.
* There is a learning curve associated with using translation memory managers, and the programs must be customized for greatest effectiveness.
* In cases where all or part of the translation process is outsourced or handled by freelance translators working off-site, the off-site workers require special tools to be able to work with the texts generated by the translation memory manager.
* Full versions of many translation memory managers can cost from US$500 to US$2,500 per seat, which can represent a considerable investment (although lower cost programs are also available). However, some developers produce free or low-cost versions of their tools with reduced feature sets that individual translators can use to work on projects set up with full versions of those tools. (Note that there are freeware and shareware TM packages available, but none of these has yet gained a large market share.)
* The costs involved in importing the user's past translations into the translation memory database, training, as well as any add-on products may also represent a considerable investment.
* Maintenance of translation memory databases still tends to be a manual process in most cases, and failure to maintain them can result in significantly decreased usability and quality of TM matches.
* As stated previously, translation memory managers may not be suitable for text that lacks internal repetition or which does not contain unchanged portions between revisions. Technical text is generally best suited for translation memory, while marketing or creative texts will be less suitable.
* The quality of the text recorded in the translation memory is not guaranteed; if the translation for a particular segment is incorrect, it is in fact more likely that the incorrect translation will be reused the next time the same source text, or a similar source text, is translated, thereby perpetuating the error.
* There is also a potential, and probably unconscious, effect on the translated text. Different languages order the logical elements within a sentence differently, and a translator presented with a multiple-clause sentence that is already half translated is less likely to rebuild the sentence completely.
* There is also a potential for the translator to deal with the text mechanically sentence-by-sentence, instead of focusing on how each sentence relates to those around it and to the text as a whole.
* Translation memories also raise certain industrial relations issues as they make exploitation of human translators easier. [citation needed]

Functions of a translation memory

The following is a summary of the main functions of a translation memory.

Off-line functions


Import

This function is used to transfer a text and its translation from a text file to the TM. Import can be done from a "raw format", in which an external source text is available for importing into a TM along with its translation. Sometimes the texts have to be reprocessed by the user. Import can also be done from the "native format", which is the format the TM itself uses to save translation memories in a file.


Analysis

The process of analysis involves the following steps:

* Textual parsing: Punctuation must be recognised correctly in order to distinguish, for example, the end of a sentence from an abbreviation. Mark-up thus acts as a kind of pre-editing. Material that has been processed through translators' aid programs usually contains mark-up, because the translation stage is embedded in a multilingual document production line. Other special text elements may also be set off by mark-up. Some special elements, such as proper names and codes, do not need to be translated, while others may need to be converted to native format.
* Linguistic parsing: Base form reduction is used to prepare lists of words and a text for automatic retrieval of terms from a term bank. Syntactic parsing, in turn, may be used to extract multi-word terms or phraseology from a source text. Parsing is thus used to normalise word order variation in phraseology, that is, to determine which words can form a phrase.
* Segmentation: Its purpose is to choose the most useful translation units. Segmentation is a type of parsing. It is done monolingually using superficial parsing, and alignment is based on segmentation. If translators correct the segmentation manually, later versions of the document will not find matches against the TM based on the corrected segmentation, because the program will repeat its own errors. Translators usually proceed sentence by sentence, although the translation of one sentence may depend on the translation of the surrounding ones.
* Alignment: This is the task of defining translation correspondences between source and target texts. There should be feedback from alignment to segmentation, and a good alignment algorithm should be able to correct the initial segmentation.
* Term extraction: It can take an existing dictionary as input. When extracting unknown terms, it can also use parsing based on text statistics.

Text statistics are also used to estimate the amount of work involved in a translation job, which is very useful for planning and scheduling the work. Translation statistics usually count the words and estimate the amount of repetition in the text.
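A hedged sketch of such translation statistics, assuming the text has already been segmented and that only exact matches are counted. The category names (`tm_match`, `repetition`, `new`) are illustrative and not taken from any particular tool.

```python
from collections import Counter


def translation_statistics(segments, memory):
    """Count words and classify segments as TM matches, internal repetitions, or new."""
    counts = Counter()
    seen = set()
    for segment in segments:
        words = len(segment.split())
        if segment in memory:
            kind = "tm_match"       # already stored from an earlier job
        elif segment in seen:
            kind = "repetition"     # repeated within the current text
        else:
            kind = "new"            # must be translated from scratch
        seen.add(segment)
        counts[kind + "_words"] += words
        counts["total_words"] += words
    return dict(counts)


memory = {"Switch off the device."}
segments = [
    "Switch off the device.",
    "Open the cover.",
    "Open the cover.",
    "Replace the filter cartridge.",
]
stats = translation_statistics(segments, memory)
print(stats)
```

Word counts per category are what a planner would typically weight differently when estimating effort; the whitespace-based word count is a simplification that does not suit every language.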


Export

Export transfers the text from the TM into an external text file. Import and export should be inverses.
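The requirement that import and export be inverses can be checked with a round trip. The sketch below assumes a simple tab-separated layout chosen purely for illustration; it is not the raw or native format of any actual tool.

```python
import csv
import io


def export_tm(units):
    """Write (source, target) pairs as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(units)
    return buffer.getvalue()


def import_tm(text):
    """Read tab-separated text back into a list of (source, target) pairs."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [tuple(row) for row in reader]


units = [
    ("Save the file.", "Speichern Sie die Datei."),
    ("Name\tValue", 'Say "hi"'),   # embedded tab and quotes must survive the round trip
]
exported = export_tm(units)
print(import_tm(exported) == units)
```

Using the `csv` module rather than a plain `split("\t")` means that segments containing tabs or quotation marks are quoted on export and restored on import.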

Online functions

When translating, one of the main purposes of the TM is to retrieve the most useful matches in the memory so that the translator can choose the best one. The TM must show both the source and target text, highlighting where the stored segment is identical to the new one and where it differs.


Retrieval

One or more types of matches can be retrieved from the TM.

* Exact match: An exact match occurs when the current source segment matches a stored one character by character. When translating a sentence, an exact match means that the same sentence has been translated before. Exact matches are also called "100% matches".
* In Context Exact (ICE) match or Guaranteed Match: An ICE match is an exact match that occurs in exactly the same context, that is, the same location in a paragraph. Context is often defined by the surrounding sentences and by attributes such as document file name, date, and permissions.
* Fuzzy match: When the match is not exact, it is a "fuzzy" match. Some systems assign percentages to these matches, in which case a fuzzy match is greater than 0% and less than 100%. The figures are not comparable across systems unless the scoring method is specified.
* Concordance: This feature lets translators select one or more words in the source segment, and the system retrieves segment pairs that match the search criteria. It is helpful for finding translations of terms and idioms in the absence of a terminology database.
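The difference between an exact match, an ICE match and a concordance search can be sketched as below. As a simplifying assumption, context is reduced to the preceding source segment; real systems may also use following segments and document attributes. The function names are hypothetical.

```python
def classify_match(segment, previous_segment, memory):
    """Return ("ice" | "exact" | "none", target) for a segment.

    `memory` is a list of dicts with keys "source", "target" and "context",
    where "context" is assumed to be the preceding source segment.
    """
    exact = [u for u in memory if u["source"] == segment]
    for unit in exact:
        if unit["context"] == previous_segment:
            return "ice", unit["target"]
    if exact:
        return "exact", exact[0]["target"]
    return "none", None


def concordance(words, memory):
    """Return units whose source contains every search word (case-insensitive)."""
    wanted = [w.lower() for w in words]
    return [u for u in memory if all(w in u["source"].lower() for w in wanted)]


memory = [
    {"source": "Insert the disc.", "target": "Legen Sie die Disc ein.", "context": None},
    {"source": "Press Start.", "target": "Drücken Sie Start.", "context": "Insert the disc."},
]
print(classify_match("Press Start.", "Insert the disc.", memory))
print(classify_match("Press Start.", "Close the tray.", memory))
print([u["target"] for u in concordance(["disc"], memory)])
```

The concordance helper uses plain substring search, so it would also find "disc" inside longer words; a production tool would normally tokenise first.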


Updating

A TM is updated with a new translation when it has been accepted by the translator. As always when updating a database, there is the question of what to do with the previous contents of the database. A TM can be modified by changing or deleting entries in the TM. Some systems allow translators to save multiple translations of the same source segment.

Automatic translation

Translation memories can perform retrieval and substitution automatically, without the help of the translator.

* Automatic retrieval: A TM features automatic retrieval and evaluation of translation correspondences in a translator's workbench.
* Automatic substitution: Exact matches come up when translating new versions of a document. During automatic substitution, the translator does not check the translation against the original, so any mistakes in the previous translation will carry over.
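Automatic substitution can be sketched as a pre-translation pass. The example assumes exact matching on the source text and marks every substituted segment, so that a later review step can find segments that were filled in without a human check. The field names are illustrative.

```python
def pretranslate(segments, memory):
    """Fill in exact matches automatically; leave other segments for the translator."""
    result = []
    for segment in segments:
        target = memory.get(segment)
        result.append({
            "source": segment,
            "target": target,                      # None means "still to be translated"
            "auto_substituted": target is not None,
        })
    return result


memory = {"Open the lid.": "Öffnen Sie den Deckel."}
rows = pretranslate(["Open the lid.", "Clean the filter."], memory)
to_review = [row["source"] for row in rows if row["auto_substituted"]]
print(to_review)
```

Keeping the `auto_substituted` flag is a design choice rather than a requirement: it makes it possible to audit substituted segments, since an error stored in the memory would otherwise be copied silently.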


Networking

When a TM is shared over a network during translation, a group of translators can translate a text together efficiently, because the translations entered by one translator are available to the others. Moreover, if translation memories are shared before the final translation, there is a chance that mistakes made by one translator will be corrected by other team members.

Text memory

"Text memory" is the basis of the proposed LISA OSCAR xml:tm standard. Text memory comprises author memory and translation memory.

Translation memory

The unique identifiers are remembered during translation so that the target language document is 'exactly' aligned at the text unit level. If the source document is subsequently modified, then those text units that have not changed can be directly transferred to the new target version of the document without the need for any translator interaction. This is the concept of 'exact' or 'perfect' matching to the translation memory. xml:tm can also provide mechanisms for in-document leveraged and fuzzy matching.
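The idea of carrying unchanged text units over to a new version can be sketched as follows. This is not the xml:tm format itself: it only assumes that every text unit has a stable identifier, and the function name `carry_over` is hypothetical.

```python
def carry_over(old_source, new_source, old_target):
    """Transfer translations for text units whose source text is unchanged.

    All three arguments map a text-unit identifier to text. Returns the new
    target units plus the identifiers that still need translation.
    """
    new_target, pending = {}, []
    for unit_id, text in new_source.items():
        if old_source.get(unit_id) == text and unit_id in old_target:
            new_target[unit_id] = old_target[unit_id]   # unchanged: reuse as is
        else:
            pending.append(unit_id)                     # changed or new unit
    return new_target, pending


old_source = {"u1": "Welcome.", "u2": "Version 1.0"}
old_target = {"u1": "Willkommen.", "u2": "Version 1.0"}
new_source = {"u1": "Welcome.", "u2": "Version 2.0", "u3": "New section."}
new_target, pending = carry_over(old_source, new_source, old_target)
print(new_target, pending)
```

Units listed in `pending` would then go through ordinary exact or fuzzy matching against the memory.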

History of translation memories

The concept behind translation memories is not recent: university research into the concept began in the late 1970s, and the earliest commercializations became available in the late 1980s, but they became commercially viable only in the late 1990s. Originally, translation memory systems stored aligned source and target sentences in a database, from which they could be recalled during translation. The problem with this 'leveraged' approach is that there is no guarantee that the new source-language sentence comes from the same context as the original database sentence. All 'leveraged' matches therefore require a translator to review the memory match for relevance in the new document. Although cheaper than outright translation, this review still carries a cost.

Support for new languages

Translation memory tools from most vendors do not support many emerging languages. Asian countries such as India have recently entered language computing, and there is considerable scope for translation memories in such developing countries. Because most CAT software companies concentrate on legacy languages, little is happening for Asian languages.

Recent trends

One recent development is the concept of 'text memory' in contrast to translation memory (see "Translating XML Documents with xml:tm"). This is also the basis of the proposed LISA OSCAR xml:tm standard. Text memory within xml:tm comprises 'author memory' and 'translation memory'. Author memory is used to keep track of changes during the authoring cycle. Translation memory uses the information from author memory to implement translation memory matching. Although primarily targeted at XML documents, xml:tm can be used on any document that can be converted to XLIFF format.

Second generation translation memories

Second-generation translation memories are described as much more powerful than first-generation TMs: they include a linguistic analysis engine, use chunk technology to break segments down into intelligent terminological groups, and automatically generate specific glossaries.

Translation memory and related standards


TMX: Translation Memory Exchange format. This standard enables the interchange of translation memories between translation suppliers. TMX has been adopted by the translation community as the best way of importing and exporting translation memories. The current version is 1.4b; it allows for the recreation of the original source and target documents from the TMX data. An updated version, 2.0, is due to be released in 2008.
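As an illustration of what a TMX file looks like, the sketch below builds and reads back a very small TMX-style document with Python's standard `xml.etree.ElementTree`. The element names (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`) and header attributes reflect a reading of TMX 1.4b and should be checked against the specification; the tool name and helper functions are hypothetical, and inline mark-up elements are ignored.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, source_lang="en"):
    """Build a small TMX 1.4-style tree from dicts mapping language code to text."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",      # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        tu = ET.SubElement(body, "tu")          # one translation unit
        for lang, text in unit.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


def read_tmx(tmx):
    """Read the tree back into a list of {language: text} dicts."""
    return [{tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
            for tu in tmx.iter("tu")]


units = [{"en": "Save the file.", "de": "Speichern Sie die Datei."}]
xml_text = ET.tostring(build_tmx(units), encoding="unicode")
print(xml_text)
print(read_tmx(ET.fromstring(xml_text)) == units)
```

Each `tu` element holds one translation unit, with one `tuv` per language, which mirrors the source and target pairing described earlier in the article.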


TBX: Termbase Exchange format. This LISA standard, which is currently being revised and republished as ISO 30042, allows for the interchange of terminology data including detailed lexical information. The framework for TBX is provided by three ISO standards: ISO 12620, ISO 12200 and ISO 16642. ISO 12620 provides an inventory of well-defined “data categories” with standardized names that function as data element types or as predefined values. ISO 12200 (also known as MARTIF) provides the basis for the core structure of TBX. ISO 16642 (also known as Terminological Markup Framework) includes a structural metamodel for Terminology Markup Languages in general.


SRX: Segmentation Rules Exchange format. SRX is intended to enhance the TMX standard so that translation memory data that is exchanged between applications can be used more effectively. The ability to specify the segmentation rules that were used in the previous translation increases the leveraging that can be achieved.
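To show why shared segmentation rules matter, the sketch below applies an ordered list of break and no-break rules, each with a pattern for the text before and after a candidate position. This loosely mimics the rule model understood to underlie SRX, but it does not read or write SRX files, and the rule list itself is an invented example.

```python
import re

# Each rule: (is_break, pattern before the position, pattern after the position).
# Rules are tried in order; the first one that matches at a position decides.
RULES = [
    (False, r"\b(?:e\.g|i\.e|etc|Mr|Dr|No)\.", r"\s"),   # abbreviation: no break
    (True, r"[.?!]", r"\s+"),                            # sentence end: break
]


def segment(text, rules=RULES):
    """Split text into segments using ordered break / no-break rules."""
    compiled = [(is_break, re.compile(before + r"$"), re.compile(after))
                for is_break, before, after in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # `before` must match text ending at pos; `after` must match at pos.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


text = "Use a soft cloth, e.g. microfibre. Do not use solvents! See No. 5 for details."
for part in segment(text):
    print(part)
```

If two tools applied different abbreviation lists to this text, they would produce different segments and the stored units would no longer match, which is the problem that exchanging segmentation rules is meant to reduce. The scan is quadratic in the text length and is meant for illustration only.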


GMX: GILT Metrics. GILT stands for Globalization, Internationalization, Localization, and Translation. The GILT Metrics standard comprises three parts: GMX-V for volume metrics, GMX-C for complexity metrics and GMX-Q for quality metrics. The proposed GILT Metrics standard is tasked with quantifying the workload and quality requirements for any given GILT task.


OLIF: Open Lexicon Interchange Format. OLIF is an open, XML-compliant standard for the exchange of terminological and lexical data. Although originally intended as a means for the exchange of lexical data between proprietary machine translation lexicons, it has evolved into a more general standard for terminology exchange.


XLIFF: XML Localisation Interchange File Format. It is intended to provide a single interchange file format that can be understood by any localization provider. XLIFF is the preferred way of exchanging data in XML format in the translation industry.


TransWS: Translation Web Services. TransWS specifies the calls needed to use Web services for the submission and retrieval of files and messages relating to localization projects. It is intended as a detailed framework for the automation of much of the current localization process by the use of Web Services.


xml:tm: This approach to translation memory is based on the concept of text memory, which comprises author memory and translation memory. xml:tm has been donated to LISA OSCAR by XML-INTL.


PO: Gettext Portable Object format. Though often not regarded as a translation memory format, Gettext PO files are bilingual files that are used in translation memory processes in the same way translation memories are used. Typically, a PO translation memory system will consist of various separate files in a directory tree structure. Common tools that work with PO files include the GNU Gettext Tools and the Translate Toolkit. Several tools and programs also exist that edit PO files as if they were mere source text files.
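To illustrate how a PO file can serve as a bilingual resource, the sketch below pulls source and target pairs out of simple PO entries. It is deliberately limited: it assumes single-line `msgid` and `msgstr` strings and ignores plural forms, contexts, escape sequences and multi-line strings, for which dedicated tools such as those named above are the better choice.

```python
import re

# Matches one entry whose msgid and msgstr each fit on a single line.
ENTRY = re.compile(r'^msgid "(?P<source>.*)"\nmsgstr "(?P<target>.*)"$', re.MULTILINE)


def po_to_pairs(po_text):
    """Extract (msgid, msgstr) pairs from single-line PO entries, skipping untranslated ones."""
    pairs = []
    for match in ENTRY.finditer(po_text):
        source, target = match.group("source"), match.group("target")
        if source and target:          # the empty msgid is the PO header entry
            pairs.append((source, target))
    return pairs


po_text = '''msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\\n"

#: main.c:12
msgid "File not found"
msgstr "Datei nicht gefunden"

msgid "Quit"
msgstr ""
'''
pairs = po_to_pairs(po_text)
print(pairs)
```

The resulting pairs have the same shape as translation units, which is why PO files can feed a translation memory workflow.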

See also

* Translation
* text corpus
* Eurodicautom
* Computer-assisted reviewing

Desktop translation memory software

Desktop translation memory tools are typically what individual translators use to complete translations. They are a specialized tool for translation in the same way that a word processor is a specialized tool for writing.

Free and Open Source Software (FOSS)

* Anaphraseus. A macro set for Writer. Former name: OpenWordfast.
* OmegaT+, cross-platform computer assisted translation tool suite. No language limitation (source and target), directly supports MS Office XML formats in newer version, formats, OpenDocument Format, DocBook XML, (X)HTML, HTML Help Compiler files, plain text, Java .properties, PO. License: GPL. Suite includes other tools: bitext aligner/converter to TMX, Validator. Requires: Java JRE
* Open Language Tools, multiplatform computer aided translation tool. No language limitation (source and target), works with its own format (compressed XLIFF 1.0), round trip conversion provided from HTML, DocBook SGML, JSP, XML (needs configuration file), formats, Open Document Format, plain text, PO, Java .properties, Java ResourceBundle, Mozilla .DTD resource files. License: CDDL. Requires: Java JRE
* Transolution, multiplatform computer aided translation tool. No language limitation (source and target), supports XLIFF files. License: GPL. Requires: Python
* TinyTM, a translation memory consisting of a TinyTM server and a Microsoft Word client. In addition, the OmegaT and MetaTexis TMs provide interfaces to the TinyTM server.


Freeware

* AidTrans Studio Basic - free edition of commercial tool. No limitation in languages (source and target), supports a single TM/terminology database per project. Supports: MS Office 2007 files (Word, Excel, PowerPoint), HTML, XML, Trados TTX, OpenOffice Writer files. TMX import/export supported by the TM/terminology database managers. Segmentation and processing of non-translatable expressions fully configurable using regular expressions.
* Appletrans, Mac OS X computer aided translation tool. No language limitations (source and target). Supports RTF, HTML, XML. No access to source.
* Similis, second generation translation memory, offers a freeware version.


Commercial

* Across Personal Edition - translation memory tool
* AidTrans Studio Professional - Free beta version available for download
* Araya Translation Editor and related tools
* Déjà Vu X (DVX) - Evaluation version available
* FastHelp - Windows Help File Generator
* Felix - Time-unlimited trial version available; registration only required for exceeding 500 entries
* Heartsome Translation Suite
* Linear B Searchable Translation Memories
* Lingobit Localization Tool - Software localization tool with translation memory
* Logoport (Lionbridge)
* MetaTexis - Evaluation version available
* MLTS
* MemoQ Translator Pro - Free freelancer's version available
* MultiCorpora MultiTrans - Next-generation TM-tool for advanced leveraging
* MultiLing Fortis
* SIMILIS (2nd Generation Translation Memory)
* Sisulizer Localization Tool - Software localization tool with translation memory
* STAR Transit
* SDL TRADOS - Portal to all SDL Trados based solutions, including Translation Memory Server
* Swordfish - Cross-Platform Translation Environment based on open standards (XLIFF, TMX, SRX, Unicode). Evaluation version available.
* TransAssist - Name changed to Felix
* Tr-AID v.2 - Demo download available
* T-Remote Memory
* Wordfast - Time-unlimited shareware, registration only required for translation memories exceeding 500 translation units
* XTM - Totally Open Standards Based translation memory using LISA OSCAR standard and supporting all other XML based Open Standards including SRX, Unicode Standard Annex #29-9, XLIFF 1.2, GMX-V, TMX, DITA and W3C ITS.

Centralized translation memory

Centralized translation memory systems store the TM on a central server. They work together with desktop TM tools and can raise TM match rates 30-60% above the leverage attained by desktop TM alone. They export prebuilt "translation kits" or "t-kits" to desktop TM tools. A t-kit contains the content to be translated, pre-segmented on the central server, and a subset of the TM containing all applicable TM matches. Centralized TM is usually part of a globalization management system (GMS), which may also include a centralized terminology database (or glossary), a workflow engine, cost estimation, and other tools.


Free

* GlobalSight
** GlobalSight is an Open Source Initiative, sponsored by Welocalize
* Lingotek
** Free web application that supports public and private translation memories, an alignment tool, terminology management features, and a workflow tool. Supports both TMX and XLIFF.
* MyMemory
** Free collaborative TM with an open standard API. On-the-fly creation of memories from a document.
* Pootle
** Free web application with internal translation memory for the collaborative translation of PO and XLIFF files.
* Ando Tools
** Free package of MS Word and Excel add-ons designed for translators, proofreaders and localization engineers.


Commercial

* Translation Search Engine (TSE)
** Translation memory that is pre-populated with over 30 million translated sentences. Supports TMX. Created by Elanex. Private beta test version available for use without charge by invitation from Elanex.
* MultiTrans
** MultiTrans Expert Software - hosting available
** MultiTrans Freelance Software
* AidTrans Studio Enterprise
** Free beta version available for download; it is currently under beta testing. Translation Memory and Terminology server.
* XTM
** Totally Open Standards Based translation memory using LISA OSCAR standard and supporting all other XML based Open Standards including SRX, Unicode Standard Annex #29-9, XLIFF 1.2, GMX-V, TMX, DITA and W3C ITS.
* Araya
** XML-RPC based translation memory
* Idiom WorldServer
** Idiom WorldServer Globalization Management System (GMS) -- hosted or standalone
* MemoQ LSP5 / Enterprise
** Kilgray MemoQ Integrated Localization Environment (ILE) LSP5 or Enterprise edition -- standalone
* MEMOrg - Online Translation Memory
** Access by invitation only
* MetaTexis Server
** MetaTexis Server (available versions: Team, Office, Enterprise)
* SDL/Trados
** SDL Translation Management System -- hosted or standalone
** SDL TRADOS TeamWorks -- standalone
** SDL TRADOS TM Server -- standalone
** TRADOS GXT -- hosted (discontinued, formerly Uniscape GXT)
* STAR James
** Workflow solution, can be integrated with GRIPS CMS system, STAR Transit, TermStar, WebTerm, FormatChecker and the SPIDER publishing engine.
* Déjà Vu DVX Workgroup
* Transware (acquired by Welocalize)
* ESTeam Translator

References and interesting links

* Index of Translation Memory Tools
* Translating XML Documents with xml:tm - a revolutionary new approach to translation memory
* xml:tm - A Radical New Approach to Translating XML Documents
* How to Leverage the Maximum Potential of XML for Localization
* Coping with Babel: How to localize XML
* Articles on the TMX translation memory standard
* Translation memories
* Benchmarking translation memories
* Translation Memory Software
* Extending translation memories (PDF document)
* Translation technology: Machine Translation and TM
* Translators Now And Then - How Technology Has Changed Their Trade
* LISA/OSCAR 2004 Translation Memory Survey
* LISA/OSCAR 2002 Translation Memory Survey
* Evolution toward TM
* XML in localisation: Reuse translations with TM and TMX
* Imperial College London Translation Memories Survey 2006 (PDF document)
* Ecolore survey of TM use by freelance translators (Word document)
* Powerful English-Russian parallel texts corpora compiled mostly on the basis of translation memories
* bAbel: Translation Systems with Learning Capabilities
* Power Shifts in Web-Based Translation Memory


Wikimedia Foundation. 2010.
