Viseme

A viseme is a supposed basic unit of speech in the visual domain. The term "viseme" was introduced by analogy with the "phoneme", which was interpreted as a basic unit of speech in the acoustic/auditory domain (Fisher 1968). This interpretation is, however, at variance with the accepted definition of the phoneme as the smallest structural unit that distinguishes meaning within a given language, that is, as a cognitive abstraction that is not bound to any sensory modality.

A "viseme" describes the particular facial and oral positions and movements that accompany the production of phonemes. The analogous term for the acoustic reflection of a phoneme would be "audieme", but this term is not in use.

Phonemes and visemes do not always share a one-to-one correspondence; often, several phonemes share the same viseme. In other words, several phonemes look the same on the face when produced, such as /k/, /g/, /ŋ/ (viseme: /k/) or /ʧ/, /ʃ/, /ʤ/, /ʒ/ (viseme: /ch/). However, the visual "signature" of a given gesture may differ in timing and duration during actual speech, and such differences cannot be captured in a single photograph.

Conversely, some sounds that are hard to distinguish acoustically are clearly distinguished on the face (Chen 2001). For example, English /l/ and /r/ can be acoustically quite similar, especially in clusters such as "grass" vs. "glass", yet visual information can show a clear contrast. This is illustrated by the more frequent mishearing of words on the telephone than in person. Some linguists have argued that speech is best understood as bimodal (aural and visual), and that comprehension can be compromised if one of these two domains is absent (McGurk and MacDonald 1976). The comprehension of speech from visemes alone is known as speechreading or "lip reading".
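The many-to-one relation between phonemes and visemes can be illustrated with a small lookup table. The following Python sketch uses only the two viseme classes named above; the function names and the rule that unlisted phonemes are passed through unchanged are assumptions made for illustration, not part of any standard viseme inventory.

```python
# Viseme classes taken from the two examples in the text. A real system
# would need a complete, language-specific table (assumption: none is
# given here, so the table is deliberately partial).
VISEME_CLASSES = {
    "k": ["k", "g", "ŋ"],
    "ch": ["ʧ", "ʃ", "ʤ", "ʒ"],
}

# Invert the table into a many-to-one phoneme -> viseme lookup.
PHONEME_TO_VISEME = {
    phoneme: viseme
    for viseme, phonemes in VISEME_CLASSES.items()
    for phoneme in phonemes
}


def to_visemes(phonemes):
    """Map a sequence of phoneme symbols to viseme labels.

    Assumption: phonemes missing from the table are returned unchanged.
    """
    return [PHONEME_TO_VISEME.get(p, p) for p in phonemes]


def visually_confusable(a, b):
    """Return True if two phoneme sequences yield the same viseme sequence."""
    return to_visemes(a) == to_visemes(b)


back = ["b", "æ", "k"]
bag = ["b", "æ", "g"]
print(to_visemes(bag))                 # ['b', 'æ', 'k']
print(visually_confusable(back, bag))  # True
```

Under this simplified table, "back" and "bag" differ in one phoneme but collapse to the same viseme sequence. The sketch models static classes only and ignores the timing and duration differences mentioned above.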

Applications of the study of visemes include speech processing, speech recognition and computer facial animation.

References

* Chen, T. (1998, May). "Audio-visual integration in multi-modal communication." Proceedings of the IEEE 86, 837–852.
* Chen, T. (2001). "Audiovisual speech processing." IEEE Signal Processing Magazine, 9–31.
* Fisher, C.G. (1968). "Confusions among visually perceived consonants." Journal of Speech and Hearing Research 11(4), 796–804.
* McGurk, H. and J. MacDonald (1976, December). "Hearing lips and seeing voices." Nature, 746–748.
* Lucey, P., T. Martin and S. Sridharan (2004, December). "Confusability of Phonemes Grouped According to their Viseme Classes in Noisy Environments." Presented at the Tenth Australian International Conference on Speech Science & Technology, Macquarie University, Sydney, 8–10 December 2004. http://www.assta.org/sst/2004/proceedings/papers/sst2004-377.pdf (PDF)


Wikimedia Foundation. 2010.
