- Asia Online
Asia Online is a Thailand-based company undertaking what it calls the world's largest literacy project: translating vast quantities of the world's English-language knowledge into Asian languages. It does this using statistical machine translation (SMT) technologies developed and enhanced in Thailand with a specific focus on Asian languages. The company was founded in 2006 by the University of Edinburgh's Philipp Koehn; Gregory Binger, a leading technologist and IT/IP lawyer; and former Gartner senior analysts Bob Hayward and Dion Wiggins.
Asia Online's statistically based translation software is an instance of a recent advance in automated translation. Earlier machine translation technology relied on collections of linguistic rules to analyze the source sentence and then map its syntactic and semantic structure into the target language. Asia Online instead uses statistical techniques from cryptography, applying machine learning algorithms that automatically acquire statistical models from existing parallel collections of human translations.
Until early 2008, Google, Microsoft and Language Weaver had publicly available SMT systems. Asia Online claims there are flaws in the existing processes and techniques of SMT and has worked to resolve these issues. It claims three key differences from traditional SMT approaches:
* Clean data - The traditional approach leveraged content found on the web, in corporate sites, news articles and other similar sources where the same content was available in multiple languages. The quality of this data was very low. Asia Online has focused machine and human resources on this area to ensure that its data is as clean and accurate as possible. Data is sourced from high-quality translations provided by book publishers and translation companies, aligned at the segment level (usually sentences), and converted into a consistent format so that it can be processed by the learning software. This step includes:
**Extracting segments from files and documents if they are not already in TMX format.
**Aligning segments (if necessary) once they have been extracted. Alignment is automated, but humans also validate its accuracy.
**Converting the data to a base UTF-8 encoding for training the SMT system.
**Extracting small subsets of the data to guide training.
**Reviewing, cleaning and analyzing the data to ensure optimal training impact.
* Multiple Domains - Extensive efforts have been put into a system that allows for training in many domains. This is done by extending a base set of information with multiple additional learning sources.
* Real Time Corrections
Languages Available
Asia Online currently has 203 language pairs available in a baseline form and several with domain data. These systems are currently used to build customized translation systems for corporate and language service provider (LSP) customers, who add their own bilingual parallel corpora to the existing data to create higher-quality translation systems. The available languages include English, French, Italian, German, Spanish, Portuguese, Dutch, Swedish, Danish, Greek, Finnish, Thai, Simplified Chinese and Hindi.
Asia Online is also building SMT systems for English to Indonesian, Malay, Vietnamese, Tagalog, Traditional Chinese, Japanese and Korean.
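The "Clean data" preparation steps listed earlier (extracting segments from TMX, normalizing encoding, cleaning, and holding out a small subset) can be sketched roughly as below. This is a hypothetical Python pipeline, not Asia Online's software: the function names, the character length-ratio threshold, the use of Unicode NFC normalization, and the toy English/French sample are all assumptions made for illustration. Segment alignment is skipped because TMX translation units are already aligned.

```python
import random
import unicodedata
import xml.etree.ElementTree as ET

# Attribute name ElementTree uses for TMX 1.4's xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical toy input; real TMX files carry required <header> attributes.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Good morning.</seg></tuv><tuv xml:lang="fr"><seg>Bonjour.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Good  morning.</seg></tuv><tuv xml:lang="fr"><seg>Bonjour.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Yes</seg></tuv><tuv xml:lang="fr"><seg>Oui, c'est absolument certain.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Thank you.</seg></tuv><tuv xml:lang="fr"><seg>Merci.</seg></tuv></tu>
</body></tmx>"""


def read_tmx(tmx_data, src_lang, tgt_lang):
    """Yield (source, target) segment pairs from TMX data (str or bytes)."""
    root = ET.fromstring(tmx_data)
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; TMX 1.1 used a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() flattens any inline markup inside <seg>.
                segs[lang.lower()] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]


def normalize(text):
    """NFC-normalize and collapse whitespace (also removes embedded newlines)."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def clean_pairs(pairs, max_ratio=3.0):
    """Drop empty, duplicate, and badly length-mismatched segment pairs."""
    seen = set()
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        if not src or not tgt:
            continue
        # Character-length ratio as a crude misalignment signal (assumed threshold).
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_ratio:
            continue
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        yield src, tgt


def split_off_subset(pairs, size, seed=0):
    """Shuffle pairs and hold out `size` of them as a small guiding subset."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    return pairs[size:], pairs[:size]


def write_parallel(pairs, src_path, tgt_path):
    """Write aligned segments one per line as UTF-8 text files."""
    with open(src_path, "w", encoding="utf-8") as fs, \
         open(tgt_path, "w", encoding="utf-8") as ft:
        for src, tgt in pairs:
            fs.write(src + "\n")
            ft.write(tgt + "\n")


if __name__ == "__main__":
    cleaned = list(clean_pairs(read_tmx(SAMPLE_TMX, "en", "fr")))
    train, subset = split_off_subset(cleaned, size=1)
    print(len(cleaned), len(train), len(subset))
```

In the toy sample, the second unit is dropped as a duplicate once whitespace is collapsed, and the third is dropped by the length-ratio filter. A production pipeline would need language-aware thresholds, since character counts differ greatly between, say, Thai and English.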
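The statistical approach described earlier, in which models are learned automatically from parallel collections of human translations, can be illustrated with the classic IBM Model 1 word-translation model trained by expectation-maximization (EM). This is a textbook sketch under stated assumptions, not Asia Online's system: the three-sentence German/English corpus and the function names `train_ibm_model1` and `best_translation` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus of (foreign tokens, English tokens) sentence pairs.
TOY_CORPUS = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]


def train_ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(f | e) with IBM Model 1 EM."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    # Uniform initialization over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in corpus:
            es = ["NULL"] + es  # lets a foreign word align to no English word
            for f in fs:
                # E-step: split one count for f across all English words.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / z
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize so the sum over f of t(f | e) is 1 for each e.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, e):
    """Return the foreign word f that maximizes t(f | e)."""
    candidates = {f: p for (f, ee), p in t.items() if ee == e}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    t = train_ibm_model1(TOY_CORPUS)
    for e in ("the", "house", "book", "a"):
        print(e, "->", best_translation(t, e))
```

No word pairs are given to the model in advance; because "das" co-occurs with "the" in two sentences, EM gradually attributes "Haus" to "house" and "ein" to "a". Real SMT systems build phrase tables and language models on top of alignments like these, trained on far larger corpora.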
See also
* Google Translate
External links
* [http://www.asiaonline.net/ Company Homepage]
Wikimedia Foundation. 2010.