Speech coding

Speech coding is the application of data compression to digital audio signals containing speech. It uses speech-specific parameter estimation, based on audio signal processing techniques, to model the speech signal, and combines this with generic data compression algorithms to represent the resulting model parameters in a compact bitstream.

The two most important applications of speech coding are mobile telephony and Voice over IP.

The techniques used in speech coding are similar to those used in audio data compression and audio coding, where knowledge of psychoacoustics is used to transmit only data that is relevant to the human auditory system. For example, in narrowband speech coding, only information in the frequency band from roughly 300 Hz to 3400 Hz is transmitted, but the reconstructed signal is still adequate for intelligibility.

Speech coding differs from other forms of audio coding in that speech is a much simpler signal than most other audio signals, and considerably more statistical information is available about its properties. As a result, some auditory information that is relevant in audio coding can be unnecessary in the speech coding context. In speech coding, the most important criterion is the preservation of intelligibility and "pleasantness" of speech under a constrained amount of transmitted data.

It should be emphasised that the intelligibility of speech includes, besides the literal content, speaker identity, emotions, intonation, timbre and so on, all of which are important for perfect intelligibility. The more abstract concept of pleasantness of degraded speech is a different property from intelligibility, since degraded speech can be completely intelligible yet subjectively annoying to the listener.

In addition, most speech applications require low coding delay, as long coding delays interfere with speech interaction.

Sample companding viewed as a form of speech coding

From this viewpoint, the A-law and μ-law algorithms used in traditional PCM digital telephony can be seen as a very early precursor of speech coding, requiring only 8 bits per sample but giving effectively 12 bits of resolution. Although this would generate unacceptable distortion in a music signal, the peaky nature of speech waveforms, combined with the simple frequency structure of speech as a periodic waveform with a single fundamental frequency and occasional added noise bursts, makes these very simple instantaneous compression algorithms acceptable for speech.
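As an illustration of instantaneous companding, the following is a minimal sketch of μ-law compression followed by uniform 8-bit quantization. It uses the continuous μ-law formula with μ = 255 on samples normalized to [-1, 1]. It is not the bit-exact segmented encoding of a telephony standard, and the function names are hypothetical.

```python
import math

MU = 255  # assumed mu-law parameter; samples are assumed normalized to [-1, 1]


def mu_law_compress(x: float) -> float:
    """Compress a sample in [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Invert mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def quantize(y: float, bits: int = 8) -> int:
    """Uniformly quantize a value in [-1, 1] to an integer code."""
    levels = 2 ** bits
    code = int(round((y + 1) / 2 * (levels - 1)))
    return max(0, min(levels - 1, code))


def dequantize(code: int, bits: int = 8) -> float:
    """Map an integer code back to a value in [-1, 1]."""
    levels = 2 ** bits
    return code / (levels - 1) * 2 - 1


def encode(x: float) -> int:
    """Compand, then quantize to 8 bits."""
    return quantize(mu_law_compress(x))


def decode(code: int) -> float:
    """Dequantize, then expand back to the linear domain."""
    return mu_law_expand(dequantize(code))
```

Because the compression curve is steep near zero, quiet samples receive finer quantization steps than they would under a uniform 8-bit quantizer, at the cost of coarser steps for loud samples.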

A wide variety of other algorithms were tried at the time, mostly variants on delta modulation, but after careful consideration, the A-law/μ-law algorithms were chosen by the designers of the early digital telephony systems. At the time of their design, their 33% bandwidth reduction for a very low complexity made them an excellent engineering compromise. Their audio performance remains acceptable, and there has been no need to replace them in the stationary phone network.

Modern speech compression

Much of the later work in speech compression was motivated by military research into digital communications for secure military radios, where very low data rates were required to allow effective operation in a hostile radio environment. At the same time, far more processing power had become available, in the form of VLSI integrated circuits, than had been available for earlier compression techniques. As a result, modern speech compression algorithms could use much more complex techniques than those available in the 1960s and achieve considerably higher compression ratios.

These techniques were available through the open research literature to be used for civilian applications, allowing the creation of digital mobile phone networks with substantially higher channel capacities than the analog systems that preceded them.

The most common speech coding scheme is Code Excited Linear Prediction (CELP) coding, variants of which are used, for example, in GSM codecs such as EFR and AMR. In CELP, the modelling is divided into two stages: a linear predictive stage that models the spectral envelope, and a codebook-based model of the residual of the linear predictive model.
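The two stages can be sketched in a simplified form as follows. This is a toy illustration, not a CELP implementation: linear prediction coefficients are estimated with the autocorrelation method and the Levinson-Durbin recursion, and a small codebook is then searched directly against the prediction residual. A real CELP coder searches the codebook through the synthesis filter with perceptual weighting. All function names here are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of a frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] so that the
    prediction is x_hat[n] = sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # degenerate frame (e.g. silence); stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:]


def residual(frame, coeffs):
    """Stage 1 output: prediction error of the linear predictor."""
    out = []
    for n, sample in enumerate(frame):
        pred = sum(c * frame[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(sample - pred)
    return out


def search_codebook(target, codebook):
    """Stage 2 (toy): pick the codebook entry and gain that minimize
    the squared error against the target vector."""
    best_index, best_gain, best_err = 0, 0.0, float("inf")
    for index, entry in enumerate(codebook):
        energy = sum(e * e for e in entry)
        if energy == 0.0:
            continue
        gain = sum(t * e for t, e in zip(target, entry)) / energy
        err = sum((t - gain * e) ** 2 for t, e in zip(target, entry))
        if err < best_err:
            best_index, best_gain, best_err = index, gain, err
    return best_index, best_gain
```

In this sketch, the encoder would transmit the predictor coefficients together with the chosen codebook index and gain instead of the waveform samples themselves.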

In addition to the actual speech coding of the signal, it is often necessary to use channel coding for transmission, to avoid losses due to transmission errors. Usually, speech coding and channel coding methods have to be chosen in pairs, with the more important bits in the speech data stream protected by more robust channel coding, in order to get the best overall coding results.
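The idea of protecting the more important bits more strongly can be illustrated with a minimal sketch. It assumes the speech encoder orders each frame so that the first `n_important` bits matter most, and it protects only those bits with a simple repetition code decoded by majority vote. Deployed systems typically use stronger codes, such as convolutional codes. The function names and frame layout are hypothetical.

```python
def protect(frame_bits, n_important, repeat=3):
    """Repeat each of the first n_important bits `repeat` times;
    append the remaining bits unprotected."""
    important = frame_bits[:n_important]
    rest = frame_bits[n_important:]
    coded = [bit for bit in important for _ in range(repeat)]
    return coded + rest


def recover(channel_bits, n_important, repeat=3):
    """Majority-vote the protected bits; pass the rest through."""
    split = n_important * repeat
    coded, rest = channel_bits[:split], channel_bits[split:]
    important = [1 if 2 * sum(coded[i:i + repeat]) > repeat else 0
                 for i in range(0, split, repeat)]
    return important + rest
```

With `repeat=3`, a single bit error within any protected group is corrected, whereas an error in the unprotected part reaches the speech decoder unchanged. The trade-off is the extra channel bits spent on the protected class.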

The Speex project is an attempt to create a free software speech coder, unencumbered by patent restrictions.

Major subfields:
* Wide-band speech coding
** AMR-WB for WCDMA networks
** VMR-WB for CDMA2000 networks
* Narrow-band speech coding
** FNBDT for military applications
** SMV for CDMA networks
** Full Rate, Half Rate, EFR, AMR for GSM networks

See also

* Audio data compression
* Audio signal processing
* Data compression
* Digital signal processing
* Mobile phone
* Pulse-code modulation
* Psychoacoustic model
* Speech processing
* Telecommunication
* Vector quantization
* Vocoder


Wikimedia Foundation. 2010.
