Transfer-based machine translation

Transfer-based machine translation is a type of machine translation. It is based on the idea of an interlingua and is currently one of the most widely used methods of machine translation.

Overview

Transfer-based and interlingua-based machine translation share the same idea: to produce a translation, it is necessary to have an intermediate representation that captures the "meaning" of the original sentence, from which the correct translation can be generated. In interlingua-based MT this intermediate representation must be independent of the languages in question, whereas in transfer-based MT it has some dependence on the language pair involved.

The way in which transfer-based machine translation systems work varies substantially, but in general they follow the same pattern: they apply sets of linguistic rules which are defined as correspondences between the structure of the source language and that of the target language. The first stage involves analysing the input text for morphology and syntax (and sometimes semantics) to create an internal representation. The translation is generated from this representation using both bilingual dictionaries and grammatical rules.

With this translation strategy it is possible to obtain fairly high-quality translations, with accuracy in the region of 90%, although this depends heavily on the language pair in question, for example on the linguistic distance between the two languages.

How it works

In a rule-based machine translation system the original text is first analysed morphologically and syntactically in order to obtain a syntactic representation. This representation can then be refined to a more abstract level, emphasising the parts relevant for translation and ignoring other types of information. The transfer process then converts this final representation (still in the original language) to a representation at the same level of abstraction in the target language. These two representations are referred to as "intermediate" representations. Starting from the target language representation, the analysis stages are then applied in reverse to generate the target text.
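The sequence described above (analysis, refinement to a more abstract level, transfer, then generation) can be sketched roughly as follows. This is a minimal illustration rather than the design of any particular system: the `Unit` type, the two-word Spanish lexicon and the plural rule are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    lemma: str
    pos: str
    num: str          # "sg" or "pl"
    gen: str = ""     # grammatical gender; empty once abstracted away


# Hypothetical Spanish analyses and Spanish-to-English lexicon (toy data).
SOURCE_FORMS = {
    "las": Unit("el", "det", "pl", "f"),
    "casas": Unit("casa", "noun", "pl", "f"),
}
LEXICON = {"el": "the", "casa": "house"}


def analyse(text):
    """Analysis: surface forms -> source-language syntactic representation."""
    return [SOURCE_FORMS[w] for w in text.lower().split()]


def refine(units):
    """Abstraction: drop gender, which this English generator does not need."""
    return [Unit(u.lemma, u.pos, u.num) for u in units]


def transfer(units):
    """Transfer: same level of abstraction, now with target-language lemmas."""
    return [Unit(LEXICON[u.lemma], u.pos, u.num) for u in units]


def generate(units):
    """Generation: the reverse of analysis, from representation to surface forms."""
    words = []
    for u in units:
        plural = u.pos == "noun" and u.num == "pl"
        words.append(u.lemma + "s" if plural else u.lemma)
    return " ".join(words)


source_repr = refine(analyse("las casas"))   # intermediate representation, source side
target_repr = transfer(source_repr)          # intermediate representation, target side
print(generate(target_repr))                 # prints "the houses"
```

Here `source_repr` and `target_repr` play the role of the two intermediate representations: they carry the same kind of information, and only the lemmas differ.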

Analysis and transformation

Various methods of analysis and transformation can be used before obtaining the final result. These may be augmented with statistical approaches, producing hybrid systems. The methods chosen and their relative emphasis depend largely on the design of the system; however, most systems include at least the following stages:

* Morphological analysis. Surface forms of the input text are classified by part of speech (e.g. noun, verb, etc.) and sub-category (number, gender, tense, etc.). All of the possible "analyses" for each surface form are typically output at this stage, along with the lemma of the word.
* Lexical categorisation. In any given text some of the words may have more than one meaning, causing ambiguity in analysis. Lexical categorisation looks at the context of a word to try to determine its correct meaning in the input. This can involve part-of-speech tagging and word sense disambiguation.
* Lexical transfer. This is essentially dictionary translation: the source language lemma (perhaps with sense information) is looked up in a bilingual dictionary and the translation is chosen.
* Structural transfer. While the previous stages deal with words, this stage deals with larger constituents, for example phrases and chunks. Typical features of this stage include concordance of gender and number, and re-ordering of words or phrases.
* Morphological generation. From the output of the structural transfer stage, the target language surface forms are generated.
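The five stages above can be chained into a toy pipeline. The sketch below translates a single English noun phrase into Spanish; every dictionary entry, tag name and rule in it is invented for illustration, and the disambiguation heuristic (prefer a noun reading after a determiner or adjective) is only a stand-in for real part-of-speech tagging.

```python
# 1. Morphological analysis: surface form -> all (lemma, tags) analyses.
ANALYSES = {
    "the": [("the", {"pos": "det"})],
    "white": [("white", {"pos": "adj"})],
    "houses": [
        ("house", {"pos": "verb", "tense": "pres"}),   # as in "she houses guests"
        ("house", {"pos": "noun", "num": "pl"}),       # as in "two houses"
    ],
}

# 3. Bilingual dictionary: (source lemma, pos) -> (target lemma, inherent tags).
BILINGUAL = {
    ("the", "det"): ("el", {}),
    ("white", "adj"): ("blanco", {}),
    ("house", "noun"): ("casa", {"gen": "f"}),
    ("house", "verb"): ("alojar", {}),
}

# 5. Generation table: (lemma, gender, number) -> target surface form.
FORMS = {
    ("el", "f", "pl"): "las",
    ("casa", "f", "pl"): "casas",
    ("blanco", "f", "pl"): "blancas",
}


def analyse(tokens):
    return [ANALYSES[t] for t in tokens]


def categorise(ambiguous):
    """2. Prefer a noun reading directly after a determiner or adjective."""
    chosen = []
    for options in ambiguous:
        pick = options[0]
        if chosen and chosen[-1][1]["pos"] in ("det", "adj"):
            pick = next((o for o in options if o[1]["pos"] == "noun"), pick)
        chosen.append(pick)
    return chosen


def lexical_transfer(units):
    out = []
    for lemma, tags in units:
        target, inherent = BILINGUAL[(lemma, tags["pos"])]
        out.append((target, {**tags, **inherent}))
    return out


def structural_transfer(units):
    """4. Reorder adj + noun to noun + adj, then copy gender/number from the noun."""
    out = list(units)
    for i in range(len(out) - 1):
        if out[i][1]["pos"] == "adj" and out[i + 1][1]["pos"] == "noun":
            out[i], out[i + 1] = out[i + 1], out[i]
    noun_tags = next(tags for _, tags in out if tags["pos"] == "noun")
    agree = {"gen": noun_tags["gen"], "num": noun_tags["num"]}
    return [(lemma, {**tags, **agree}) for lemma, tags in out]


def generate(units):
    return " ".join(FORMS[(lemma, t["gen"], t["num"])] for lemma, t in units)


def translate(sentence):
    units = categorise(analyse(sentence.lower().split()))
    return generate(structural_transfer(lexical_transfer(units)))


print(translate("the white houses"))   # prints "las casas blancas"
```

Note that the gender of "casa" only becomes known at lexical transfer, so agreement on the determiner and adjective has to wait until structural transfer.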

Transfer types

One of the main features of transfer-based machine translation systems is a phase that "transfers" an intermediate representation of the text in the original language to an intermediate representation of the text in the target language. This can work at one of two levels of linguistic analysis, or somewhere in between. The levels are:

* Superficial (or syntactic) transfer. This level is characterised by transferring "syntactic structures" between the source and target languages. It is suitable for languages in the same family or of the same type, for example between Romance languages such as Spanish, Catalan, French and Italian.
* Deep (or semantic) transfer. This level constructs a semantic representation that is dependent on the source language. This representation can consist of a series of structures which represent the meaning. In these transfer systems predicates are typically produced. The translation also typically requires structural transfer. This level is used to translate between more distantly related languages, or languages which have no genetic relationship at all (e.g. Spanish-English or Spanish-Basque).
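As a rough illustration of deep transfer, the sketch below maps a source-language predicate frame to a target-language one. The frame layout, the role names ("experiencer", "theme") and the rule table are assumptions chosen for this example, and the analysis step that would build the frame from Spanish text is omitted.

```python
# Source-language semantic representation of Spanish "me gusta el libro".
# The frame layout and role names are assumptions made for this sketch.
source_frame = {
    "pred": "gustar",
    "args": {"theme": "el libro", "experiencer": "yo"},
}

# Transfer rules: source predicate -> (target predicate, role-to-slot mapping).
PREDICATES = {
    "gustar": ("like", {"experiencer": "subject", "theme": "object"}),
}

# Toy translations for the argument phrases.
PHRASES = {"yo": "I", "el libro": "the book"}


def deep_transfer(frame):
    """Map a source predicate frame to a target frame with grammatical slots."""
    pred, slots = PREDICATES[frame["pred"]]
    target = {"pred": pred}
    for role, phrase in frame["args"].items():
        target[slots[role]] = PHRASES[phrase]
    return target


def generate(frame):
    """English generation: subject, verb, object word order."""
    return f'{frame["subject"]} {frame["pred"]} {frame["object"]}'


print(generate(deep_transfer(source_frame)))   # prints "I like the book"
```

In the Spanish sentence "el libro" is the grammatical subject of "gusta", while in English it surfaces as the object of "like"; the role mapping in `PREDICATES` is what handles that swap.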

See also

* Statistical machine translation


Wikimedia Foundation. 2010.
