Chinese speech synthesis

Chinese speech synthesis is the application of speech synthesis to the Chinese language (usually Standard Chinese). It poses additional difficulties because Chinese characters frequently have different pronunciations in different contexts, because the complex prosody is essential to conveying the meaning of words, and because native speakers sometimes disagree about the correct pronunciation of certain phonemes.

Approaches taken

Corpus-based

Anhui USTC iFlyTek Co., Ltd (iFlyTek) published a W3C paper in which they adapted Speech Synthesis Markup Language to produce a mark-up language called Chinese Speech Synthesis Markup Language (CSSML), which can include additional markup to clarify the pronunciation of characters and to add some prosody information.[1] Their synthesiser takes a "corpus-based" approach, which means it can sound very natural in most cases but can err on unusual phrases that cannot be matched with the corpus. iFlyTek does not disclose the amount of data involved, but it can be inferred from the commercial products that license iFlyTek's technology; for example, Bider's SpeechPlus is a 1.3 Gigabyte download, 1.2 Gigabytes of which is the highly compressed data for a single Chinese voice. iFlyTek's synthesiser can also synthesise mixed Chinese and English text with the same voice (e.g. Chinese sentences containing some English words); they claim their English synthesis to be "average".

The iFlyTek corpus appears to be heavily dependent on Chinese characters, and it is not possible to synthesize from pinyin alone. It is sometimes possible by means of CSSML to add pinyin to the characters to disambiguate between multiple possible pronunciations, but this does not always work.
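The disambiguation problem described above can be illustrated with a minimal Python sketch. This is not iFlyTek's method and does not use CSSML syntax; the two small tables and the function name `to_pinyin` are hypothetical, chosen only to show how a word-level lexicon can pick a context-dependent reading and how an explicit pinyin annotation can override it.

```python
# Hypothetical toy data; a real system would need a full pronunciation lexicon.
DEFAULT_READING = {"银": "yin2", "行": "xing2", "走": "zou3", "长": "chang2", "大": "da4"}
WORD_READINGS = {
    "银行": ["yin2", "hang2"],   # "bank": 行 is read hang2 here, not xing2
    "长大": ["zhang3", "da4"],   # "grow up": 长 is read zhang3 here, not chang2
}

def to_pinyin(text, overrides=None):
    """Return one pinyin syllable per character of `text`.

    `overrides` maps a character index to an explicit reading, playing the
    role of a pinyin annotation attached to an ambiguous character.
    """
    overrides = overrides or {}
    max_len = max(len(word) for word in WORD_READINGS)
    result = []
    i = 0
    while i < len(text):
        match = None
        # Prefer the longest multi-character word that starts at position i.
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in WORD_READINGS:
                match = text[i:i + n]
                break
        if match is not None:
            readings = WORD_READINGS[match]
        else:
            # Fall back to the single-character default ("?" if unknown).
            match = text[i]
            readings = [DEFAULT_READING.get(match, "?")]
        for offset, reading in enumerate(readings):
            result.append(overrides.get(i + offset, reading))
        i += len(match)
    return result
```

For instance, `to_pinyin("银行")` picks the word-level reading for 行, while `to_pinyin("行", {0: "hang2"})` forces a reading that the lexicon alone would not choose.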

A corpus-based approach is also taken by Tsinghua University's SinoSonic, with the Harbin voice data taking 800 Megabytes. As of 2011, the download link for SinoSonic had still not been activated (it was already inactive in 2007), so the software may be vapourware.

Concatenation (KeyTip)

A less complex approach is taken by cjkware.com's KeyTip Putonghua Reader, which contains 120 Megabytes of sound recordings (GSM-compressed to 40 Megabytes in the evaluation version), comprising 10,000 multi-syllable dictionary words plus single-syllable recordings in 6 different prosodies (4 tones, neutral tone, and an extra third-tone recording for use at the end of a phrase). These recordings can be concatenated in any desired combination, but the joins sound forced (as is usual for simple concatenation-based speech synthesis) and this can severely affect prosody; the synthesizer is also inflexible in terms of speed and expression. However, because this synthesizer does not rely on a corpus, there is no noticeable degradation in performance when it is given more unusual or awkward phrases.
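The selection step of such a concatenative synthesizer can be sketched roughly as follows. This is an illustration only, not KeyTip's actual algorithm: the file names, the `WORD_RECORDINGS` table and the `select_units` function are hypothetical, and syllables are assumed to be given as pinyin with tone numbers (5 for the neutral tone).

```python
# Hypothetical table of multi-syllable dictionary-word recordings.
WORD_RECORDINGS = {
    ("ni3", "hao3"): "word_nihao.wav",
}

def select_units(syllables):
    """Choose one recording per unit for a phrase given as toned pinyin syllables.

    Multi-syllable dictionary words are preferred (longest match first);
    remaining syllables use single-syllable recordings, with a separate
    recording for a third tone at the end of the phrase.
    """
    max_len = max(len(key) for key in WORD_RECORDINGS)
    units = []
    i = 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 1, -1):
            key = tuple(syllables[i:i + n])
            if key in WORD_RECORDINGS:
                units.append(WORD_RECORDINGS[key])
                i += n
                break
        else:
            # No dictionary word starts here: use a single-syllable recording.
            syllable = syllables[i]
            is_phrase_final = i == len(syllables) - 1
            if syllable.endswith("3") and is_phrase_final:
                units.append(f"{syllable}_final.wav")
            else:
                units.append(f"{syllable}.wav")
            i += 1
    return units
```

The returned list would then be played back to back. Because each recording is fixed, nothing in this scheme smooths pitch or timing across the joins, which is consistent with the forced-sounding transitions described above.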

eSpeak

The lightweight open-source speech project eSpeak, which has its own approach to synthesis, has started experimenting with Chinese synthesis. It was used by Google Translate from May 2010[2] until December 2010[3].

Ekho

Ekho is another open-source TTS engine, which simply concatenates sampled syllables. It currently supports Cantonese, Mandarin, and Korean. Some of the Mandarin syllables have been pitch-normalised in Praat. A modified version of these is used in Gradint's "synthesis from partials".

Online Demos and Bell Labs

There is an online interactive demonstration for NeoSpeech speech synthesis,[4] but it is not possible to customize the Chinese pronunciation by entering pinyin. iFlyTek has two demos available online.[5][6]

Bell Labs have an online Mandarin text-to-speech demo[7] dated 1997, but it is now non-functional (the server that the query is to be submitted to does not exist in the DNS) and the contact email is no longer valid. However, their approach was described in a monograph "Multilingual Text-to-Speech Synthesis: The Bell Labs Approach" (Springer, October 31, 1997, ISBN 978-0792380276), and the former employee who was responsible for the project, Chilin Shih (who now works at the University of Illinois), has some notes about her methods on her website.[8]

Non-Windows systems

The above-mentioned Chinese speech synthesis systems (apart from the online demos) are available only for Windows. However, the spaced-interval repetition language-practice program Gradint includes code and instructions for using KeyTip and SpeechPlus data on other operating systems, either by reading the data directly or by using the Wine compatibility layer.

There are some reports[9] that SAPI 5-based speech synthesizers can be run on recent versions of Wine.

Mac OS had Chinese speech synthesizers available up to version 9. These were removed in Mac OS X. From release 10.5 (Leopard), the built-in VoiceOver application claims to support third-party Chinese voices,[10] but no Chinese voice is built into the operating system and Apple does not provide any links to actual Mac OS X Chinese voice products. In 10.7 (Lion), voice packs are downloaded automatically as needed when selected in the Speech settings in System Preferences.

Notable approaches not yet taken

As of 2007, it appears that there have been no projects to synthesize Chinese by simulating the human vocal tract, as Gnuspeech is doing for English.[11] Chinese is also not one of the languages being synthesized in the multilingual MBROLA project.


Wikimedia Foundation. 2010.
