Speech analytics
Speech analytics is a term used to describe automatic methods of analyzing speech to extract useful information about the speech content or the speakers. Although it often includes elements of automatic speech recognition, in which the identities of spoken words or phrases are determined, it may also include analysis of one or more of the following:
* the topic(s) being discussed
* the identities of the speaker(s)
* the genders of the speakers
* the emotional character of the speech
* the amount and locations of speech versus non-speech (e.g. background noise or silence)

One use of speech analytics applications is to spot spoken keywords or phrases, either as real-time alerts on live audio or as a post-processing step on recorded speech. This technique is also known as audio mining. Other uses include categorization of speech, for example in the contact center environment, to identify calls from dissatisfied customers. Speech analytics technology may combine results from different techniques to achieve its aims. For example, knowledge of where certain keywords were spoken in a customer telephone conversation could be combined with knowledge of which speaker (customer or contact center agent) spoke the words, and perhaps with knowledge of how often the two speakers were talking at the same time.

Speech analytics in contact centers can be used to extract critical business intelligence that would otherwise be lost. By analyzing and categorizing recorded phone conversations between companies and their customers, useful information can be discovered relating to strategy, products, processes, and operational issues. This information gives decision-makers insight into what customers really think about their company so that they can react quickly.
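The combination described above can be sketched in a few lines. The following is a minimal illustration, not any vendor's actual method: the data classes, field names, and timestamps are all hypothetical, standing in for the output of a keyword spotter and a speaker-separation step.

```python
from dataclasses import dataclass

@dataclass
class KeywordHit:
    word: str
    time: float   # seconds from the start of the call

@dataclass
class SpeakerTurn:
    speaker: str  # e.g. "customer" or "agent"
    start: float
    end: float

def attribute_hits(hits, turns):
    """Map each keyword hit to whichever speaker was talking at that moment."""
    attributed = []
    for hit in hits:
        speaker = next((t.speaker for t in turns
                        if t.start <= hit.time < t.end), "unknown")
        attributed.append((hit.word, speaker))
    return attributed

# Hypothetical results from keyword spotting and speaker separation:
hits = [KeywordHit("cancel", 12.4), KeywordHit("refund", 45.1)]
turns = [SpeakerTurn("agent", 0.0, 10.0),
         SpeakerTurn("customer", 10.0, 30.0),
         SpeakerTurn("agent", 30.0, 60.0)]
print(attribute_hits(hits, turns))
# [('cancel', 'customer'), ('refund', 'agent')]
```

Knowing that "cancel" came from the customer rather than the agent is exactly the kind of combined result the text describes.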
Technology
There are two main approaches "under the hood": the phonetic approach and LVCSR (large-vocabulary continuous speech recognition). There are some variations built on top of these technologies, such as "direct" analysis, which is built on top of LVCSR. Some speech analytics vendors use the recognition engine of a third party; the biggest names creating core engines today include IBM and Nuance/ScanSoft. There are, however, smaller players such as SER and Philips, and some speech analytics vendors have developed their own proprietary engines (such as Nexidia).
Phonetic
This is the fastest approach, mostly because the size of the grammar is very small. The basic recognition unit is a phoneme; there are only a few tens of unique phonemes in most languages, and the output of this recognition is a stream of phonemes rendered as text. Speed is one advantage; another is that, because it deals with phonemes rather than words at its core, it can correctly handle any "new" word that is not part of the language model (names, product names, etc.).
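Searching such a phoneme stream amounts to matching a short phoneme sequence against a long one. Here is a minimal sketch; the phoneme symbols are illustrative (loosely ARPAbet-style) and the stream is invented, not real recognizer output.

```python
def find_phoneme_sequence(stream, query):
    """Return every start index where the query phoneme
    sequence occurs in the recognized phoneme stream."""
    n, m = len(stream), len(query)
    return [i for i in range(n - m + 1) if stream[i:i + m] == query]

# Hypothetical phoneme output for speech containing a product name:
stream = ["k", "ao", "l", "n", "eh", "k", "s", "ih", "d", "iy", "ah"]
query = ["n", "eh", "k", "s"]  # phonemes of the search term's onset
print(find_phoneme_sequence(stream, query))  # [3]
```

Because the search term is itself converted to phonemes at query time, out-of-vocabulary words need no special handling, which is the advantage the text describes.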
LVCSR
This approach is much slower: since the basic unit is a word, the engine needs hundreds of thousands of words to match the audio against. The output, however, is a stream of words, making it much easier to work with.
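A word stream supports ordinary text processing directly, for example the call categorization mentioned earlier. The following sketch uses invented category names and keywords; a real system would use far richer rules.

```python
def categorize_call(transcript, categories):
    """Tag a word-level transcript with every category whose
    keywords appear in it (simple bag-of-words matching)."""
    words = set(transcript.lower().split())
    return [name for name, keywords in categories.items()
            if words & keywords]

# Hypothetical contact-center categories:
categories = {
    "churn_risk": {"cancel", "switch", "competitor"},
    "billing": {"invoice", "charge", "refund"},
}
transcript = "I want to cancel my account and get a refund"
print(categorize_call(transcript, categories))
# ['churn_risk', 'billing']
```

No phoneme-to-word mapping is needed here, which is why the word-stream output of LVCSR is described as easier to work with.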
Quality
The best LVCSR engines may reach about 50% WER (word error rate), which is considered top-of-the-line performance by today's standards. Note, however, that this means roughly every second word in the output is wrong; still, it may provide more than enough accuracy for statistical analytics. The quality of phonetic engines is considerably lower. While it is hard to compare the two directly, it is in the range of what would be a 20% WER.
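WER is the standard accuracy measure for speech recognition: the minimum number of word substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. A minimal implementation using word-level Levenshtein distance (the example sentences are invented):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("please cancel my account",
                      "police cancel the account"))
# 0.5 — two of the four reference words were misrecognized
```

A 50% WER means this function would return 0.5 when averaged over a transcript, i.e. half the reference words are substituted, inserted around, or deleted.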
See also
* Customer intelligence