Machine translation software usability
The sections below give objective criteria for evaluating the usability of machine translation software output.

Stationarity or Canonical Form
Do repeated translations converge on a single expression in both languages? That is, does the translation method show stationarity or produce a canonical form? In the example under Semantics preservation below, the translation does become stationary, although the original meaning is lost. See Round-trip translation for further discussion. This metric has been criticized as not being well correlated with Bilingual Evaluation Understudy (BLEU) scores. [Somers, H. (2005) "[http://www.co.umist.ac.uk/~harold/RoundTrip.doc Round-trip Translation: What Is It Good For?]"]
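Whether a system reaches such a fixed point can be checked mechanically. The following is a minimal sketch in Python, assuming a hypothetical translate(text, source, target) wrapper around whichever translation service is being evaluated (the wrapper is not part of any real API); it repeats the English-to-pivot-and-back round trip until the English rendering stops changing or a round limit is reached.

```python
def translate(text: str, source: str, target: str) -> str:
    """Placeholder for a call to the machine translation system under test."""
    raise NotImplementedError

def round_trip_fixed_point(text: str, pivot: str = "fr", max_rounds: int = 10) -> str:
    """Translate English -> pivot -> English repeatedly until the English
    rendering stops changing (stationarity / a canonical form) or the round
    limit is exhausted; return the last rendering either way."""
    current = text
    for _ in range(max_rounds):
        back = translate(translate(current, "en", pivot), pivot, "en")
        if back == current:      # fixed point reached: a canonical form
            return back
        current = back
    return current               # no convergence within max_rounds

# With a real translate() wired in, a call such as
#   round_trip_fixed_point("Better a day earlier than a day late.")
# shows whether the system settles on a single expression.
```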
Adaptive to colloquialism, argot or slang

Is the system adaptive to colloquialism, argot or slang? The French language has many rules for creating words in the speech and writing of popular culture. Two such rules are: (a) the reverse spelling of words, such as "femme" becoming "meuf" (this is called verlan); and (b) the attachment of the suffix "-ard" to a noun or verb to form a proper noun. For example, the noun "faluche" means "student hat". The word "faluchard", formed from "faluche", can colloquially mean, depending on context, "a group of students", "a gathering of students" or "behavior typical of a student". The Google translator as of 28 December 2006 does not derive constructed words such as those produced by rule (b), as shown here: Il y a une chorale falucharde mercredi, venez nombreux, les faluchards chantent des paillardes! (roughly, "There is a student choir on Wednesday, come in large numbers, the faluchards sing bawdy songs!") => "There is a choral society falucharde Wednesday, come many, the faluchards sing loose-living women!" A sketch of how rule (b) could be handled mechanically follows the list of argot usage levels below.
French argot has three levels of usage: [http://chitlinsandcamembert.blogspot.com/2005/10/agony-of-argot.html "The Agony of Argot", Chitlins & Camembert, October 28, 2005]
#"familier" or friendly, acceptable among friends, family and peers but not at work
#"grossier" or swear words, acceptable among friends and peers but not at work or in family
#"verlan" or ghetto slang, acceptable among lower classes but not among middle or upper classesThe United States
The United States National Institute of Standards and Technology conducts annual evaluations [http://www.nist.gov/speech/tests/mt/] of machine translation systems based on the BLEU-4 criterion [http://www.nist.gov/speech/tests/mt/doc/mt06_evalplan.v4.pdf]. A combined method called IQmt, which incorporates BLEU and the additional metrics NIST, GTM, ROUGE and METEOR, has been implemented by Gimenez and Amigo [http://www.lsi.upc.edu/~nlp/IQMT/IQMT.v1.0.pdf].
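To illustrate the kind of n-gram comparison that the BLEU-4 criterion performs, the sketch below scores a candidate translation against a single reference sentence using NLTK's BLEU implementation. The two sentences are made up for the example, and smoothing is applied because very short single sentences otherwise yield degenerate scores.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "there is a student choir on wednesday".split()
candidate = "there is a choral society falucharde wednesday".split()

# BLEU-4: geometric mean of 1- to 4-gram precisions times a brevity penalty.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smooth)
print(f"BLEU-4: {score:.3f}")
```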
Well-formed output

Is the output grammatical or well-formed in the target language? Using an interlingua should be helpful in this regard, because with a fixed interlingua one should be able to write a grammatical mapping from the interlingua to the target language. Consider the following Arabic-language input and English-language translation result from the Google translator as of 27 December 2006 [http://www.google.com/language_tools?hl=en]. This Google translator output does not parse using a reasonable English grammar: وعن حوادث التدافع عند شعيرة رمي الجمرات -التي كثيرا ما يسقط فيها العديد من الضحايا- أشار الأمير نايف إلى إدخال "تحسينات كثيرة في جسر الجمرات ستمنع بإذن الله حدوث أي تزاحم". => And incidents at the push Carbuncles-throwing ritual, which often fall where many of the victims - Prince Nayef pointed to the introduction of "many improvements in bridge Carbuncles God would stop the occurrence of any competing."
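Well-formedness of this kind can be approximated mechanically by asking whether the output parses under a grammar of the target language. Below is a minimal sketch using NLTK and a toy context-free grammar; a real check would need a broad-coverage grammar or a statistical parser, and the vocabulary here is invented for the example.

```python
import nltk

# A toy grammar covering a tiny fragment of English, for illustration only.
grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the' | 'a'
N  -> 'prince' | 'bridge' | 'improvements'
V  -> 'announced'
P  -> 'on'
""")
parser = nltk.ChartParser(grammar)

def is_well_formed(sentence: str) -> bool:
    """Return True if the sentence has at least one parse under the grammar."""
    tokens = sentence.lower().split()
    try:
        return any(True for _ in parser.parse(tokens))
    except ValueError:
        # Tokens outside the grammar's vocabulary cannot be parsed.
        return False

print(is_well_formed("the prince announced the improvements"))    # True
print(is_well_formed("improvements in bridge Carbuncles would"))  # False
```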
Semantics preservation
Do repeated re-translations preserve the semantics of the original sentence? For example, consider the following English input passed multiple times into and out of French using the Google translator as of 27 December 2006: Better a day earlier than a day late. => "Améliorer un jour plus tôt qu'un jour tard." => To improve one day earlier than a day late. => "Pour améliorer un jour plus tôt qu'un jour tard." => To improve one day earlier than a day late.
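One crude way to quantify such drift is to compare the original sentence with each successive re-translation using a surface-similarity measure; this is only a rough proxy for preserved semantics. Below is a minimal sketch, again assuming a hypothetical translate() wrapper (not a real API) around the system being evaluated.

```python
from difflib import SequenceMatcher

def translate(text: str, source: str, target: str) -> str:
    """Placeholder for the machine translation system being evaluated."""
    raise NotImplementedError

def semantic_drift(sentence: str, pivot: str = "fr", rounds: int = 3) -> list:
    """Return one similarity score (0..1) per round trip, comparing the
    original English sentence with each successive re-translation.
    Surface similarity is only a stand-in for preserved meaning."""
    scores = []
    current = sentence
    for _ in range(rounds):
        current = translate(translate(current, "en", pivot), pivot, "en")
        scores.append(SequenceMatcher(None, sentence.lower(), current.lower()).ratio())
    return scores

# With a real translate() wired in:
#   semantic_drift("Better a day earlier than a day late.")
# A flat, high score suggests the meaning survives the round trips;
# a falling score suggests semantic drift.
```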
Trustworthiness and Security
An interesting peculiarity of [http://www.google.com/translate_t?langpair=en|es Google Translate] as of 24 January 2008 (corrected as of 25 January 2008) was the following result when translating from English to Spanish, which appears to show a joke embedded in the English-Spanish dictionary and which had some added poignancy given recent events: Heath Ledger is dead => "Tom Cruise está muerto" ("Tom Cruise is dead").

This raises the issue of trust when a machine translation system is embedded in a life-critical system, in which the translation system has input to a safety-critical decision-making process. Conjointly, it raises the issue of whether, in a given use, the software of the machine translation system is safe from hackers.

It is not known whether this feature of Google Translate was the result of a joke or hack, or perhaps an unintended consequence of the use of a method such as statistical machine translation. Reporters from CNET Networks asked Google for an explanation on January 24, 2008; Google said only that it was an "internal issue with Google Translate". [http://www.news.com/8301-13577_3-9857280-36.html?tag=newsmap "Google Translate bug mixes up Heath Ledger, Tom Cruise", by Caroline McCarthy, CNET Networks, January 24, 2008] The mistranslation was the subject of much hilarity and speculation on the Internet. [http://gawker.com/5002510/tom-cruise-is-spanish-for-heath-ledger '"Tom Cruise" is Spanish for "Heath Ledger"', gawker.com, January 24, 2008] [http://rayhey2.blogspot.com/2008/01/tom-cruise-est-muerto.html "Tom Cruise está muerto", Ray Leon Blog Project, January 24, 2008]

If it was an unintended consequence of a method such as statistical machine translation, and not a joke or hack, then the event demonstrates a potential source of critical unreliability in the statistical machine translation method.

In human translation, particularly by interpreters, selectivity on the part of the translator in performing a translation is often commented on when one of the two parties being served by the interpreter knows both languages.
This leads to the issue of whether a particular translation could be considered "verifiable". In this case, a converging round-trip translation would be a kind of verification.
Notes
References
* Gimenez, Jesus and Enrique Amigo. (2005) [http://www.lsi.upc.edu/~nlp/IQMT/IQMT.v1.0.pdf IQmt: A framework for machine translation evaluation] .
* NIST. [http://www.nist.gov/speech/tests/mt Annual machine translation system evaluations] and [http://www.nist.gov/speech/tests/mt/doc/mt06_evalplan.v4.pdf evaluation plan] .
* Papineni, Kishore, Salim Roukos, Todd Ward and Wei-Jing Zhu. (2002) BLEU: A Method for automatic evaluation of machine translation. Proc. 40th Annual Meeting of the ACL, July, 2002, pp. 311-318.
See also
* Evaluation of machine translation
* Round-trip translation
* Translation