List of speech recognition software

Modern speech recognition software lets a single computer user speak text and commands to the computer, largely, but not entirely, bypassing the keyboard and mouse.

The idea has been portrayed in science fiction for many decades, quite frequently with computers that have no keyboard or mouse at all. Such computers are typically depicted as keeping up no matter how fast a person speaks, and regardless of who the speaker is, which language is spoken, or how many speakers there are. In other words, the fiction depicts a computer that hears the way a multilingual person does.

Attempts to develop usable speech recognition software began in the mid-20th century and proved far more daunting than anyone had imagined. The task also turned out to require so much computing power that only recent computers can perform it in real time (i.e., as fast as a person can speak).

The first commercially practical products became available around 1990 (e.g., the Voice Navigator, a standalone computer dedicated entirely to speech recognition). They used all the available computing power of the machine, which sent its output to a second computer. They were not particularly accurate and could understand only one person at a time: to work for another person, the machine itself, not the operator, had to be retrained. Despite these limitations, they could type so rapidly that, even after taking time to make corrections, a person with disabilities could easily accomplish more work with the machine than without it. For people with physical disabilities, the ability to simply talk to a computer can be a priceless asset. Consider, for instance, an author with Parkinson's disease who can barely control his hands yet can still conveniently write an article.

There are other scenarios where the benefits easily outweigh the deficiencies of the equipment. Consider a facility where corrosive materials or high-voltage equipment are handled: the bulky gloves required for that type of work typically preclude using a keyboard.

Most modern telephones now include voice dialing. Because the requirements of voice dialing are much simpler than those of dictation, it works without training the computer for a specific user.

As of 2008, the state of the art is that a properly trained computer with an Intel Core Duo 1.5 GHz CPU (or faster), operated by a healthy adult with no speech impediment, can achieve approximately 99% accuracy while transcribing up to about 150 words per minute, using most of the available computing power. Superficially this might sound very good. Note, however, that a very stable voice is required. A successful operator who develops a nasty head cold may suddenly find that the machine does not understand him at all, even though most humans would have no trouble understanding him in that situation.
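To put the accuracy figure in perspective, the following minimal sketch estimates how many corrections an operator should expect. It assumes that "accuracy" means the fraction of words transcribed correctly, which the text does not state explicitly; the function name is hypothetical and used only for illustration.

```python
def expected_errors_per_minute(words_per_minute: float, accuracy: float) -> float:
    """Expected number of misrecognized words per minute of dictation.

    Assumes `accuracy` is a word-level rate between 0 and 1
    (an interpretation, not something the article defines).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return words_per_minute * (1.0 - accuracy)


# Figures quoted above: about 150 words per minute at about 99% accuracy.
per_minute = expected_errors_per_minute(150, 0.99)
print(f"{per_minute:.1f} expected errors per minute")
print(f"{per_minute * 60:.0f} expected errors per hour")
```

Under that assumption, even the best case quoted above leaves roughly one or two words to correct for every minute of continuous dictation.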

The machines are also not yet intelligent enough to properly process a child's voice. Obstacles include the fact that most children do not yet fully understand how language is used (e.g., how to construct a complete sentence) and that their voices change continuously as they grow. (How many times have you had to ask a youngster's parents what the child said?)

There are now both proprietary and open source systems available, with development emphasis placed on serving the legal and medical markets.

Free and open source software

* CMU Sphinx - open source under a BSD license
* HTK - copyrighted by Microsoft, but altering the software for the licensee's internal use is allowed
* Julius - BSD-style license
* VoxForge - open source, GPL

Proprietary software

* CSLU Toolkit
* Dragon NaturallySpeaking - from Nuance Communications; the continuous-speech successor to the older DragonDictate product, and apparently the focus of all of Nuance's current development effort in the dictation area. It does not run on 64-bit operating systems.
* IBM ViaVoice - the Linux, Mac OS, and Windows versions were licensed to Nuance Communications (formerly ScanSoft) a few years ago, while control and development of the versions for embedded processors remain with IBM. Functionality is similar to Dragon NaturallySpeaking. The Linux and Mac OS X versions are available but no longer maintained. The Nuance website lists which legacy systems can run the final versions. It is unclear whether the Windows version will be updated beyond XP; so far, Nuance does not list Vista as a recommended system.
* MacSpeech Dictate - Mac OS X speech recognition using the Dragon NaturallySpeaking engine. It replaces MacSpeech's former iListen product, which was based on Philips Speech Technology.
* Microsoft Speech API - speech recognition functionality included in Microsoft Office and on Tablet PCs running Microsoft Windows XP Tablet PC Edition. It can also be downloaded as part of the Speech SDK 5.1 for Windows applications, but because the SDK is aimed at developers building speech applications, it has no user interface and is unsuitable for end users. Windows Vista includes version 8.0 of the Microsoft speech recognition engine along with a completely new speech interface, known as Windows Speech Recognition.
* Philips SpeechMagic - market leader in the medical industry according to Frost & Sullivan; a recognition engine that can run either as a stand-alone product or integrated into other applications. [http://www.forbes.com/businesswire/feeds/businesswire/2007/12/10/businesswire20071209005015r1.html] [http://www.frost.com/prod/servlet/press-release.pag?docid=54492494 Philips SpeechMagic named European Technology Leader by Frost & Sullivan]
* Proteus Conversational Interface
* Quack.com (acquired by AOL)
* SpeechWorks
* Tellme Networks

Wikimedia Foundation. 2010.
