Chinese speech synthesis
Chinese speech synthesis is the application of speech synthesis to the Chinese language (usually Standard Chinese). It poses additional difficulties because of the Chinese characters, which frequently have different pronunciations in different contexts; the complex prosody, which is essential for conveying the meaning of words; and, at times, the difficulty of getting native speakers to agree on the correct pronunciation of certain phonemes.
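The dependence of pronunciation on context can be illustrated with a minimal sketch. It assumes a tiny hypothetical lexicon; `WORD_LEXICON`, `CHAR_DEFAULT` and `to_pinyin` are illustrative names, not part of any product described in this article. Multi-character words are matched first, longest first, so that a polyphonic character such as 行 receives the reading its word requires (xíng in 行走 "walk", háng in 银行 "bank"). Characters outside any known word fall back to a default reading. Greedy longest-match lookup is only one simple technique; practical systems use much larger dictionaries and additional context.

```python
# Hypothetical miniature lexicon: multi-character words fix the reading of
# a polyphonic character that would otherwise be ambiguous.
WORD_LEXICON = {
    "银行": ["yin2", "hang2"],   # "bank": 行 read as hang2
    "行走": ["xing2", "zou3"],   # "walk": 行 read as xing2
    "长大": ["zhang3", "da4"],   # "grow up": 长 read as zhang3
    "很长": ["hen3", "chang2"],  # "very long": 长 read as chang2
}

# Hypothetical fallback readings for single characters (one reading each).
CHAR_DEFAULT = {
    "银": "yin2", "行": "xing2", "走": "zou3", "长": "chang2",
    "大": "da4", "很": "hen3", "我": "wo3", "要": "yao4",
}

MAX_WORD_LEN = max(len(w) for w in WORD_LEXICON)


def to_pinyin(text: str) -> list[str]:
    """Convert Chinese text to numbered pinyin by greedy longest-match lookup."""
    result = []
    i = 0
    while i < len(text):
        # Try the longest dictionary word starting at position i first.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
            word = text[i:i + length]
            if word in WORD_LEXICON:
                result.extend(WORD_LEXICON[word])
                i += length
                break
        else:
            # No multi-character word matched: use the default reading,
            # or "?" if the character is not in the lexicon at all.
            result.append(CHAR_DEFAULT.get(text[i], "?"))
            i += 1
    return result
```

For example, `to_pinyin("我要长大")` gives `['wo3', 'yao4', 'zhang3', 'da4']`: 长 is read as zhǎng because it falls inside the word 长大, whereas on its own it would get the default reading cháng.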
Approaches taken
Corpus-based
Anhui USTC iFlyTek Co., Ltd (iFlyTek) published a W3C paper in which they adapted Speech Synthesis Markup Language (SSML) to produce a markup language called Chinese Speech Synthesis Markup Language (CSSML), which can include additional markup to clarify the pronunciation of characters and to add some prosody information.[1] Their synthesiser takes a "corpus-based" approach: it can sound very natural in most cases, but it can err on unusual phrases that cannot be matched against the corpus. iFlyTek does not disclose the amount of data involved, but it can be estimated from the commercial products to which iFlyTek has licensed its technology; for example, Bider's SpeechPlus is a 1.3 gigabyte download, of which 1.2 gigabytes is highly compressed data for a single Chinese voice. iFlyTek's synthesiser can also synthesise mixed Chinese and English text with the same voice (e.g. Chinese sentences containing some English words); the company describes its English synthesis as "average".
The iFlyTek corpus appears to depend heavily on Chinese characters, and it is not possible to synthesize from pinyin alone. CSSML can sometimes be used to attach pinyin to characters to disambiguate between multiple possible pronunciations, but this does not always work.
A corpus-based approach is also taken by Tsinghua University's SinoSonic, whose Harbin voice data occupies 800 megabytes. As of 2007 (and again in 2011), the download link for SinoSonic had not been activated, raising the possibility that it is vapourware.
Concatenation (KeyTip)
A less complex approach is taken by cjkware.com's KeyTip Putonghua Reader, which contains 120 megabytes of sound recordings (GSM-compressed to 40 megabytes in the evaluation version), comprising 10,000 multi-syllable dictionary words plus single-syllable recordings in six different prosodies (the four tones, the neutral tone, and an extra third-tone recording for use at the end of a phrase). These recordings can be concatenated in any desired combination, but the joins sound forced (as is usual for simple concatenation-based speech synthesis), which can severely affect prosody; the synthesizer is also inflexible in terms of speed and expression. However, because this synthesizer does not rely on a corpus, its performance does not noticeably degrade when it is given unusual or awkward phrases.
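The clip-selection step of a KeyTip-style reader can be sketched as follows. This is a hypothetical illustration, not KeyTip's code: the word set, the function names `plan_clips` and `syllable_clip`, the numbered-pinyin input (with 5 marking the neutral tone) and the `.wav` file naming are all assumptions. The sketch prefers a recorded multi-syllable word when one matches, and otherwise falls back to one of the six single-syllable variants, using the separate third-tone recording only for a phrase-final syllable.

```python
# Hypothetical set of recorded multi-syllable words (the article gives
# KeyTip about 10,000; this stand-in has two).
RECORDED_WORDS = {("ni3", "hao3"), ("zhong1", "guo2")}
MAX_WORD_SYLLABLES = max(len(w) for w in RECORDED_WORDS)


def syllable_clip(syllable: str, phrase_final: bool) -> str:
    """Pick one of six single-syllable variants: tones 1-4, neutral (5),
    or a separate phrase-final third-tone recording (assumed file naming)."""
    if syllable.endswith("3") and phrase_final:
        return f"{syllable}_final.wav"
    return f"{syllable}.wav"


def plan_clips(phrase: list[str]) -> list[str]:
    """Choose recordings for a phrase given as numbered-pinyin syllables."""
    clips = []
    i = 0
    while i < len(phrase):
        # Prefer the longest recorded multi-syllable word starting here.
        for n in range(min(MAX_WORD_SYLLABLES, len(phrase) - i), 1, -1):
            chunk = tuple(phrase[i:i + n])
            if chunk in RECORDED_WORDS:
                clips.append("-".join(chunk) + ".wav")
                i += n
                break
        else:
            # Fall back to a single-syllable recording.
            clips.append(syllable_clip(phrase[i], i == len(phrase) - 1))
            i += 1
    return clips
```

For instance, `plan_clips(["wo3", "hen3", "hao3"])` returns `['wo3.wav', 'hen3.wav', 'hao3_final.wav']`, while `["ni3", "hao3"]` is served by the single whole-word clip `ni3-hao3.wav`. Playing the chosen clips back to back is where the forced-sounding joins come from.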
eSpeak
The lightweight open-source speech project eSpeak, which has its own approach to synthesis, has started experimenting with Chinese synthesis. It was used by Google Translate from May 2010[2] until December 2010[3].
Ekho
Ekho is another open-source TTS system, which simply concatenates sampled syllables. It currently supports Cantonese, Mandarin, and Korean. Some of the Mandarin syllables have been pitch-normalised in Praat. A modified version of these recordings is used in Gradint's "synthesis from partials".
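Plain syllable concatenation of this kind can be sketched with Python's standard-library `wave` module. This is not Ekho's implementation; `concatenate_syllables` is a hypothetical helper, and it assumes every recording is an uncompressed WAV file with the same channel count, sample width and sample rate. It copies the audio data end to end and does no pitch or amplitude adjustment at the joins.

```python
import wave


def concatenate_syllables(inputs, output):
    """Join syllable recordings end to end into one WAV file.

    `inputs` and `output` may be file paths or binary file objects.
    All inputs must share channel count, sample width and sample rate.
    """
    params = None
    frames = []
    for src in inputs:
        with wave.open(src, "rb") as w:
            p = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if params is None:
                params = p
            elif p != params:
                raise ValueError(f"format mismatch: {p} != {params}")
            frames.append(w.readframes(w.getnframes()))
    if params is None:
        raise ValueError("no input recordings")
    with wave.open(output, "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        # No cross-fading or smoothing: the samples are simply appended.
        out.writeframes(b"".join(frames))
```

A call such as `concatenate_syllables(["ni3.wav", "hao3.wav"], "nihao.wav")` (hypothetical file names) would write one file containing both syllables in sequence.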
Online Demos and Bell Labs
There is an online interactive demonstration for NeoSpeech speech synthesis,[4] but it is not possible to customize the Chinese pronunciation by entering pinyin. iFlyTek has two demos available online.[5][6]
Bell Labs has an online Mandarin text-to-speech demo[7] dated 1997, but it is now non-functional (the server to which queries are submitted no longer exists in the DNS) and the contact email is no longer valid. However, their approach was described in the monograph "Multilingual Text-to-Speech Synthesis: The Bell Labs Approach" (Springer, October 31, 1997, ISBN 978-0792380276), and Chilin Shih, the former employee who was responsible for the project and who now works at the University of Illinois, has some notes about her methods on her website.[8]
Non-Windows systems
The Chinese speech synthesis systems mentioned above (apart from the online demos) are available only for Windows. However, the spaced-interval repetition language-practice program Gradint includes code and instructions for using KeyTip and SpeechPlus data on other operating systems, either by reading the data directly or by running the software under the WINE emulator.
There are some reports[9] that SAPI 5-based speech synthesizers can be run on recent versions of the WINE emulator.
Mac OS offered Chinese speech synthesizers up to version 9, but they were removed in Mac OS X. Starting with release 10.5 (Leopard), the built-in VoiceOver application claims to support third-party Chinese voices,[10] but no Chinese voice is built into the operating system and Apple does not provide any links to actual Mac OS X Chinese voice products. In 10.7 (Lion), voice packs are downloaded automatically as needed when they are selected in the Speech settings of System Preferences.
Notable approaches not yet taken
As of 2007, it appears that there have been no projects to synthesize Chinese by simulating the human vocal tract, as Gnuspeech is doing for English.[11] Chinese is also not one of the languages being synthesized in the multilingual MBROLA project.
References
- [1] http://www.w3.org/2005/08/SSML/Papers/iFLYTech.pdf
- [2] http://googletranslate.blogspot.com/2010/05/giving-voice-to-more-languages-on.html
- [3] http://googletranslate.blogspot.com/2010/12/listen-to-us-now.html
- [4] http://www.neospeech.com/demo/demo_text.php
- [5] Anhui USTC iFlyTek Co., Ltd Demo
- [6] Anhui USTC iFlyTek Co., Ltd Beta 1.0
- [7] Mandarin TTS
- [8] Home Page: Chilin Shih
- [9] Text To Speech Blog (www.TextToSpeechBlog.com): February 2007
- [10] Apple's VoiceOver page
- [11]
External links
- Anhui USTC iFlyTek Co., Ltd homepage
Wikimedia Foundation. 2010.