Auditory scene analysis

In psychophysics, auditory scene analysis (ASA) is the process by which the human auditory system organizes sound into perceptually meaningful elements. The term was coined by psychologist Albert Bregman. [Bregman, A. S. (1990) Auditory scene analysis. MIT Press: Cambridge, MA] The related concept in machine perception is computational auditory scene analysis (CASA), which is closely related to source separation and blind signal separation.

The three key aspects of Bregman's ASA model are: segmentation, integration, and segregation.


Sound reaches the ear and the eardrum vibrates as a whole, so the auditory system must analyze this single combined signal. The model proposes that sounds are heard either as "integrated" (heard as a whole, much like harmony in music) or as "segregated" into individual components (which leads to counterpoint). For example, a bell can be heard as a single sound (integrated), but some listeners can hear its individual components; that is, they can segregate the sound. The same applies to chords, which can be heard either as a single "color" or as individual notes.

In many circumstances the segregated elements can be linked together in time, producing an auditory stream. Auditory streaming is demonstrated by the so-called cocktail party effect: up to a point, when several voices speak at the same time or background sounds are present, a listener can still follow one particular voice. In this example, the ear segregates this voice from the other sounds (which are integrated), and the mind "streams" the segregated sounds into an auditory stream. This skill is highly developed in musicians, notably conductors, who can listen to one, two, three or more instruments at the same time (segregating them) and follow each as an independent line through auditory streaming.

Most natural sounds, such as the human voice, musical instruments, or cars passing in the street, are made up of many frequencies, which contribute to the perceived quality (or timbre) of the sounds. When two or more natural sounds occur at once, the components of all the simultaneously active sounds reach the listener's ears at the same time, or overlapping in time. This presents the auditory system with a problem: which parts of the sound should be grouped together and treated as parts of the same source or object? Grouping them incorrectly can cause the listener to hear non-existent sounds built from the wrong combinations of the original components.
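The grouping problem described above can be made concrete with a toy sketch. It is not Bregman's model or any published algorithm: it assumes two hypothetical sources whose components are exact harmonics of known fundamental frequencies (200 Hz and 310 Hz, chosen arbitrarily), and it groups the mixed components with a simple harmonicity test. The function names and the tolerance values are illustrative assumptions.

```python
def harmonic_partials(f0_hz, n_harmonics):
    """Frequencies of the first n harmonics of a fundamental (hypothetical source)."""
    return [f0_hz * k for k in range(1, n_harmonics + 1)]

def group_by_harmonicity(components_hz, candidate_f0s, tolerance):
    """Assign each component to the first candidate fundamental of which it is
    (nearly) an integer multiple. Assumes the candidate fundamentals are known,
    which a real listener or system would have to estimate."""
    groups = {f0: [] for f0 in candidate_f0s}
    unassigned = []
    for f in components_hz:
        for f0 in candidate_f0s:
            ratio = f / f0
            nearest = round(ratio)
            if nearest >= 1 and abs(ratio - nearest) <= tolerance:
                groups[f0].append(f)
                break
        else:
            unassigned.append(f)
    return groups, unassigned

# Two simultaneous sources: the ear receives only the merged set of components.
source_a = harmonic_partials(200.0, 4)   # 200, 400, 600, 800 Hz
source_b = harmonic_partials(310.0, 4)   # 310, 620, 930, 1240 Hz
mixture = sorted(source_a + source_b)

# A strict test recovers the two sources; a loose one misassigns a component.
strict_groups, strict_left = group_by_harmonicity(mixture, [200.0, 310.0], tolerance=0.03)
loose_groups, loose_left = group_by_harmonicity(mixture, [200.0, 310.0], tolerance=0.15)
```

With the strict tolerance each component lands with its own source. With the loose tolerance the 620 Hz component is attached to the 200 Hz source, a small-scale analogue of the incorrect grouping described in the paragraph above.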

Grouping and streams

A number of grouping principles appear to underlie ASA, many of which are related to principles of perceptual organization discovered by the school of Gestalt psychology. These can be broadly categorised into sequential grouping cues (those that operate across time -- segregated) and simultaneous grouping cues (those that operate across frequency -- integrated). In addition, schemas (learned patterns) play an important role.

Errors in simultaneous grouping can lead to the blending of sounds that should be heard as separate, the blended sounds having different perceived qualities (such as pitch or timbre) than any of the actually received sounds.

Errors in sequential grouping can lead, for example, to hearing a word created out of syllables originating from two different voices. The job of ASA is to group incoming sensory information to form an accurate mental representation of the environmental sounds.

When sounds are grouped by the auditory system into a perceived sequence, distinct from other co-occurring sequences, each of these perceived sequences is called an "auditory stream". Normally, a stream corresponds to a distinct environmental sound pattern that persists over time, such as a person talking, a piano playing, or a dog barking, but perceptual errors and illusions are possible under unusual circumstances. One example of this is the laboratory phenomenon of "streaming", also called "stream segregation". If two sounds, A and B, are rapidly alternated in time, after a few seconds the perception may seem to "split" so that the listener hears two streams of sound rather than one, each stream corresponding to the repetitions of one of the two sounds, for example, A-A-A-A-, etc. accompanied by B-B-B-B-, etc. The tendency towards segregation into separate streams is favored by differences in the acoustical properties of sounds A and B. Among the differences that favor segregation are those of frequency (for pure tones), fundamental frequency (for rich tones), frequency composition, spatial position, and speed of the sequence (faster sequences segregate more readily).
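The alternating A-B stimulus described above can be sketched in a few lines. This is a minimal illustration and not a reproduction of any specific experiment: the tone frequencies (400 Hz and 800 Hz), the 50 ms tone and gap durations, the sample rate, and all function names are assumptions chosen for the example.

```python
import math

def tone(freq_hz, dur_s, sample_rate=16000):
    """Return a pure sine tone as a list of float samples in [-1, 1]."""
    n = int(round(dur_s * sample_rate))
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def alternating_sequence(freq_a, freq_b, tone_s, gap_s, repeats, sample_rate=16000):
    """Build the sequence A-B-A-B-..., with a silent gap after every tone."""
    silence = [0.0] * int(round(gap_s * sample_rate))
    samples = []
    for _ in range(repeats):
        for freq in (freq_a, freq_b):
            samples.extend(tone(freq, tone_s, sample_rate))
            samples.extend(silence)
    return samples

def semitone_separation(freq_a, freq_b):
    """Frequency difference between the two tones, in semitones."""
    return abs(12 * math.log2(freq_b / freq_a))

# Assumed parameters: A = 400 Hz, B = 800 Hz, 50 ms tones, 50 ms gaps, 20 A-B pairs.
seq = alternating_sequence(400.0, 800.0, tone_s=0.05, gap_s=0.05, repeats=20)
separation = semitone_separation(400.0, 800.0)
```

The two parameters the paragraph names as favoring segregation map directly onto the arguments: a larger gap between `freq_a` and `freq_b` increases the frequency difference, and shorter `tone_s` and `gap_s` values make the sequence faster.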

Experimental basis

Many experiments have studied the segregation of more complex patterns of sound, such as a sequence of high notes of different pitches, interleaved with low ones. In such sequences, the segregation of co-occurring sounds into distinct streams has a profound effect on the way they are heard. Perception of a melody is formed more easily if all its notes fall in the same auditory stream. We tend to hear the rhythms among notes that are in the same stream, excluding those that are in other streams. Judgments of timing are more precise between notes in the same stream than between notes in separate streams. Even perceived spatial location and perceived loudness can be affected by sequential grouping.

While the initial research on this topic was done on human adults, recent studies have shown that some ASA capabilities are present in newborn infants, showing that they are built-in, rather than learned through experience. Other research has shown that non-human animals also display ASA. Currently, scientists are studying the activity of neurons in the auditory regions of the cerebral cortex to discover the mechanisms underlying ASA.


Wikimedia Foundation. 2010.
