Acoustic Model

An acoustic model is created by taking audio recordings of speech, and their text transcriptions, and using software to create statistical representations of the sounds that make up each word. It is used by a speech recognition engine to recognize speech.


Speech recognition engines require two types of files to recognize speech. The first is an acoustic model, which is created by taking audio recordings of speech and their transcriptions (taken from a speech corpus) and 'compiling' them into statistical representations of the sounds that make up each word (through a process called 'training'). The second is a language model or a grammar file. A language model is a file containing the probabilities of sequences of words. A grammar is a much smaller file containing sets of predefined combinations of words. Language models are used for dictation applications, whereas grammars are used in desktop command-and-control or telephony interactive voice response (IVR) applications.
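To make the distinction between a language model and a grammar concrete, the following minimal sketch estimates word-pair probabilities from a few sentences and, separately, checks a phrase against a fixed set of commands. The function names, the toy sentences, and the command set are all hypothetical; real engines use their own file formats and far larger data.

```python
from collections import Counter

def bigram_probabilities(sentences):
    """Estimate P(next word | previous word) from raw counts (no smoothing)."""
    pair_counts = Counter()
    prev_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}

# A toy "grammar": the complete set of word combinations the engine accepts.
# These commands are illustrative assumptions, not taken from any real engine.
GRAMMAR = {"open file", "close file", "open window"}

def grammar_accepts(phrase):
    """Return True only if the phrase is one of the predefined combinations."""
    return phrase.lower() in GRAMMAR

if __name__ == "__main__":
    probs = bigram_probabilities(
        ["open the file", "open the window", "close the file"]
    )
    print(probs[("the", "file")])
    print(grammar_accepts("open window"), grammar_accepts("delete file"))
```

The language model assigns a probability to any word sequence it has statistics for, which suits open-ended dictation; the grammar simply accepts or rejects, which suits a small, fixed command vocabulary.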

Speech Audio Characteristics

Audio can be encoded at different sampling rates (samples per second, the most common being 8 kHz, 16 kHz, 32 kHz, 44.1 kHz, 48 kHz and 96 kHz) and with different numbers of bits per sample (the most common being 8, 16 or 32 bits). Speech recognition engines work best if the acoustic model they use was trained with speech audio recorded at the same sampling rate and bits per sample as the speech being recognized.
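One way to check whether a recording matches the format an acoustic model was trained on is to read the header of the audio file. The sketch below uses Python's standard `wave` module and assumes uncompressed PCM WAV input; the function names and the default 16 kHz/16-bit target are illustrative assumptions.

```python
import wave

def wav_format(path):
    """Return (sampling rate in Hz, bits per sample, channel count) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        # getsampwidth() reports bytes per sample, so multiply by 8 for bits.
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()

def matches_model(path, model_rate_hz=16000, model_bits=16):
    """Check a recording against the (assumed) training format of an acoustic model."""
    rate, bits, _channels = wav_format(path)
    return rate == model_rate_hz and bits == model_bits
```

For example, `matches_model("utterance.wav", model_rate_hz=8000, model_bits=8)` would test a file (here a hypothetical `utterance.wav`) against a telephony-style model.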

Telephony-based Speech Recognition

The limiting factor for telephony-based speech recognition is the bandwidth at which speech can be transmitted. For example, a standard land-line telephone channel carries only 64 kbit/s, corresponding to a sampling rate of 8 kHz at 8 bits per sample (8000 samples per second * 8 bits per sample = 64000 bit/s). Therefore, for telephony-based speech recognition, you need acoustic models trained with 8 kHz/8-bit speech audio files.
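The arithmetic above generalizes to any uncompressed format: bit rate is the sampling rate multiplied by the bits per sample (and by the number of channels). A minimal sketch, with a hypothetical function name:

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate in bit/s of uncompressed audio with the given format."""
    return sample_rate_hz * bits_per_sample * channels

if __name__ == "__main__":
    # Land-line telephone: 8000 samples per second at 8 bits per sample.
    print(pcm_bit_rate(8000, 8))  # 64000
```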

In the case of Voice over IP, the codec determines the sampling rate/bits per sample of speech transmission. If you use a codec with a higher sampling rate/bits per sample for speech transmission (to improve the sound quality), then your acoustic model must be trained with audio data that matches that sampling rate/bits per sample.

Desktop-based Speech Recognition

For speech recognition on a standard desktop PC, the limiting factor is the sound card. Most sound cards today can record at sampling rates between 16 kHz and 48 kHz, with bit depths of 8 to 16 bits per sample, and play back at up to 96 kHz.

As a general rule, a speech recognition engine works better with acoustic models trained with speech audio data recorded at higher sampling rates and bits per sample. But using audio with too high a sampling rate or too many bits per sample can slow the recognition engine down, so a compromise is needed. Thus, for desktop speech recognition, the current standard is acoustic models trained with speech audio data recorded at a sampling rate of 16 kHz with 16 bits per sample.
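The trade-off can be illustrated by comparing how much raw data each format produces. The sketch below uses data volume as a rough proxy for processing cost, which is an assumption; actual recognition speed depends on the engine. The format labels and function names are hypothetical.

```python
# (label, sampling rate in Hz, bits per sample); labels are illustrative only.
FORMATS = [
    ("telephony", 8000, 8),
    ("desktop", 16000, 16),
    ("high-rate", 48000, 16),
]

def bytes_per_minute(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed audio size, in bytes, for one minute of recording."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * 60

def relative_cost(sample_rate_hz, bits_per_sample, baseline=(16000, 16)):
    """Data volume relative to the 16 kHz/16-bit desktop format."""
    return bytes_per_minute(sample_rate_hz, bits_per_sample) / bytes_per_minute(*baseline)

if __name__ == "__main__":
    for label, rate, bits in FORMATS:
        print(label, bytes_per_minute(rate, bits), relative_cost(rate, bits))
```

Under this measure, 48 kHz/16-bit audio carries three times the data of 16 kHz/16-bit audio, while 8 kHz/8-bit telephony audio carries a quarter of it.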

External links

* Acoustic models (last modified: November 25, 2002) from CMU Sphinx
* Japanese acoustic models for use with Julius
* Open source acoustic models at VoxForge

Wikimedia Foundation. 2010.
