Articulatory synthesis

Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. The shape of the vocal tract can be controlled in a number of ways, usually by modifying the positions of the speech articulators, such as the tongue, jaw, and lips. Speech is then produced by digitally simulating the flow of air through this representation of the vocal tract.
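As a rough illustration of the kind of computation involved, the sketch below approximates the vocal tract as a chain of short cylindrical tube sections, a common simplification in digital tube models, and computes two basic quantities: the pressure-wave reflection coefficient at each junction between sections, and the resonances of a uniform tube that is closed at the glottis and open at the lips. The function names, the example area values, the 350 m/s speed of sound, and the sign convention are assumptions made for this example; the sketch is not taken from any particular synthesizer.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction between
    adjacent tube sections, using the convention (A_i - A_next) / (A_i + A_next).
    """
    if any(a <= 0 for a in areas):
        raise ValueError("cross-sectional areas must be positive")
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def uniform_tube_resonances(length_m, count, speed_of_sound=350.0):
    """Resonance frequencies (Hz) of a lossless uniform tube that is
    closed at one end and open at the other: F_n = (2n - 1) * c / (4 * L).
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# Hypothetical area function (cm^2) with a constriction in the third section.
junctions = reflection_coefficients([2.0, 2.0, 0.5, 4.0])

# A 17.5 cm uniform tube, a textbook stand-in for a neutral vowel.
formants = uniform_tube_resonances(0.175, 3)
```

For the 17.5 cm uniform tube the resonances come out at roughly 500, 1500 and 2500 Hz, and a junction between two sections of equal area has a reflection coefficient of zero, so a wave passes through it unchanged. A full synthesizer would go further and propagate forward and backward travelling waves through these junctions sample by sample.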

Mechanical Talking Heads

There is a long history of attempts to build mechanical "talking heads." Gerbert (d. 1003), Albertus Magnus (1198-1280) and Roger Bacon (1214-1294) are all said to have built speaking heads (Wheatstone 1837). However, historically confirmed speech synthesis begins with Wolfgang von Kempelen (1734-1804), who published an account of his research in 1791 (see also Dudley and Tarnoczy 1950).

Electrical Vocal Tract Analogs

The first electrical vocal tract analogs were static, like those of Dunn (1950), Ken Stevens and colleagues (1953), and Gunnar Fant (1960). Rosen (1958) built a dynamic vocal tract (DAVO), which Dennis (1963) later attempted to control by computer. Dennis et al. (1964), Hiki et al. (1968) and Baxter and Strong (1969) have also described hardware vocal-tract analogs. Kelly and Lochbaum (1962) made the first computer simulation; later digital computer simulations were made by, among others, Nakata and Mitsuoka (1965), Matsui (1968) and Paul Mermelstein (1971). Honda et al. (1968) made an analog computer simulation.

Haskins and Maeda Models

The first software articulatory synthesizer regularly used for laboratory experiments was developed at Haskins Laboratories in the mid-1970s by Philip Rubin, Tom Baer, and Paul Mermelstein. This synthesizer, known as ASY, was a computational model of speech production based on vocal tract models developed at Bell Laboratories in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues. Another frequently used model is that of Shinji Maeda, which uses a factor-based approach to control tongue shape.
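A factor-based approach of this general kind can be pictured as a linear combination: a mean articulator shape plus a weighted sum of a few deformation patterns (factors), with one control parameter per factor. The sketch below illustrates only that idea and is not Maeda's published model; the function name, the two factors, and every number in it are hypothetical.

```python
def shape_from_factors(mean_shape, loadings, params):
    """Return mean_shape plus the weighted sum of factor loadings.

    mean_shape: list of measurements describing the neutral shape.
    loadings:   one list per factor, each the same length as mean_shape.
    params:     one weight per factor (the control parameters).
    """
    if len(loadings) != len(params):
        raise ValueError("need exactly one parameter per factor")
    shape = list(mean_shape)
    for weight, factor in zip(params, loadings):
        if len(factor) != len(shape):
            raise ValueError("factor length must match the mean shape")
        for i, delta in enumerate(factor):
            shape[i] += weight * delta
    return shape


# Hypothetical data: four measurement points and two made-up factors.
MEAN_SHAPE = [1.0, 1.2, 1.5, 1.2]
LOADINGS = [
    [0.1, 0.2, 0.3, 0.2],    # made-up "overall height" pattern
    [0.3, 0.1, -0.1, -0.3],  # made-up "front/back" pattern
]

neutral = shape_from_factors(MEAN_SHAPE, LOADINGS, [0.0, 0.0])
deformed = shape_from_factors(MEAN_SHAPE, LOADINGS, [1.0, -1.0])
```

With all parameters at zero the function returns the mean shape unchanged. The appeal of such a design is that a handful of parameters can drive a shape described by many measurement points.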

Modern Models

Recent progress in speech production imaging, articulatory control modeling, and tongue biomechanics modeling has led to changes in the way articulatory synthesis is performed. Examples include the Haskins CASY model (Configurable Articulatory Synthesis), designed by Philip Rubin, Mark Tiede, and Louis Goldstein, which matches midsagittal vocal tracts to actual magnetic resonance imaging (MRI) data, and uses MRI data to construct a 3D model of the vocal tract. A full 3D articulatory synthesis model has been described by Olaf Engwall. The ArtiSynth project, headed by Sidney Fels at the University of British Columbia, is a 3D biomechanical modeling toolkit for the human vocal tract and upper airway. Biomechanical modeling of articulators such as the tongue has been pioneered by a number of scientists, including Reiner Wilhelms-Tricarico, Yohan Payan, Pascal Perrier and Jean-Michel Gerard, and Jianwu Dang and Kiyoshi Honda.

Commercial Models

One of the few commercial articulatory speech synthesis systems is the NeXT-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the University of Calgary, where much of the original research was conducted. Following the demise of the various incarnations of NeXT (started by Steve Jobs in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under the GNU General Public License, with work continuing as gnuspeech. The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by René Carré's "distinctive region model".


References

* Baxter, Brent, and William J. Strong. (1969). WINDBAG -- a vocal-tract analog speech synthesizer. "Journal of the Acoustical Society of America", 45, 309(A).
* Coker, C. H. (1968). Speech synthesis with a parametric articulatory model. "Proc. Speech. Symp., Kyoto, Japan", paper A-4.
* Coker, C. H. (1976). A model for articulatory dynamics and control. "Proceedings of the IEEE", 64/4, 452-460.
* Coker, C. & Fujimura, O. (1966). Model for the specification of the vocal tract area function. "Journal of the Acoustical Society of America", 40, 1271.
* Dennis, Jack B. (1963). Computer control of an analog vocal tract. "Journal of the Acoustical Society of America", 35, 1115(A).
* Dudley, Homer, and Thomas H. Tarnoczy. (1950). The speaking machine of Wolfgang von Kempelen. "Journal of the Acoustical Society of America", 22, 151-66.
* Dunn, Hugh K. (1950). Calculation of vowel resonances, and an electrical vocal tract. "Journal of the Acoustical Society of America", 22, 740-53.
* Engwall, O. (1998). A 3D vocal tract model for articulatory and visual speech synthesis. Proc. Fonetik 98, The Swedish Phonetics Conference, 196-199.
* Fant, C. Gunnar M. (1960). "Acoustic theory of speech production". The Hague, Mouton.
* Gariel. (1879). Machine parlante de M. Faber. "J. Physique Théorique et Appliquée" 8, 274-5
* Gerard J.M., Wilhelms-Tricarico R., Perrier P. & Payan Y. (2003). A 3D dynamical biomechanical tongue model to study speech motor control. "Recent Research Developments in Biomechanics", Vol.1, pp. 49-64
* Henke, W. L. (1966). Dynamic Articulatory Model of Speech Production Using Computer Simulation. Unpublished doctoral dissertation, MIT, Cambridge, MA.
* Honda, Takashi, Seiichi Inoue, and Yasuo Ogawa. (1968). A hybrid control system of a human vocal tract simulator. "Reports of the 6th International Congress on Acoustics", ed. by Y. Kohasi, pp. 175-8. Tokyo, International Council of Scientific Unions.
* Kelly, John L., and Carol Lochbaum. (1962). Speech synthesis. "Proceedings of the Speech Communications Seminar", paper F7. Stockholm, Speech Transmission Laboratory, Royal Institute of Technology.
* Kempelen, Wolfgang R. Von. (1791). "Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine". Wien, J. B. Degen.
* Maeda, S. (1988). Improved articulatory model. "Journal of the Acoustical Society of America", 84, Sup. 1, S146.
* Maeda, S. (1990). Compensatory articulation during speech: evidence from the analysis and synthesis of vocal-tract shapes using an articulatory model. In W. J. Hardcastle and A. Marchal (Eds.), "Speech Production and Speech Modelling", Kluwer Academic, Dordrecht, 131-149.
* Matsui, Eiichi. (1968). Computer-simulated vocal organs. "Reports of the 6th International Congress on Acoustics", ed. by Y. Kohasi, pp. 151-4. Tokyo, International Council of Scientific Unions.
* Mermelstein, Paul. (1969). Computer simulation of articulatory activity in speech production. "Proceedings of the International Joint Conference on Artificial Intelligence", Washington, D.C., 1969, ed. by D. E. Walker and L. M. Norton. New York, Gordon & Breach.
* Mermelstein, P. (1973). Articulatory model for the study of speech production. "Journal of the Acoustical Society of America", 53, 1070-1082.
* Nakata, Kazuo, and T. Mitsuoka. (1965). Phonemic transformation and control aspects of synthesis of connected speech. "J. Radio Res. Labs.", 12, 171-86.
* Payan, Y. & Perrier, P. (1997). Synthesis of V-V sequences with a 2D biomechanical tongue model controlled by the Equilibrium Point Hypothesis. "Speech Communication", 22, 185-205.
* Rahim, M., Goodyear, C., Kleijn, W., Schroeter, J., & Sondhi, M. (1993). On the use of neural networks in articulatory speech synthesis. "Journal of the Acoustical Society of America", 93, 1109-1121.
* Rosen, George. (1958). Dynamic analog speech synthesizer. "Journal of the Acoustical Society of America", 30, 201-9
* Rubin, P. E., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research, "Journal of the Acoustical Society of America", 70, 321-328.
* Rubin, P., Saltzman, E., Goldstein, L., McGowan, R., Tiede, M., & Browman, C. (1996). CASY and extensions to the task-dynamic model. "Proceedings of the 1st ESCA Tutorial and Research Workshop on Speech Production Modeling - 4th Speech Production Seminar", 125-128.
* Stevens, Kenneth N., S. Kasowski, and C. Gunnar M. Fant. (1953). An electrical analog of the vocal tract. "Journal of the Acoustical Society of America", 25, 734-42.

External links

* ArtiSynth
* ASY
* CASY
* From MRI and Acoustic Data to Articulatory Synthesis
* Praat
* Real-time articulatory speech-synthesis-by-rules
* Smithsonian Speech Synthesis History Project (SSSHP) 1986-2002
* Talking Heads
* TractSyn
* Introduction to Articulatory Speech Synthesis

Wikimedia Foundation. 2010.
