SpeechWeb

A SpeechWeb is a collection of hyperlinked speech applications, accessed remotely by speech browsers running on end-user devices. Links are activated through spoken commands.

The idea of surfing the web by voice dates back at least to the work of Hemphill and Thrift in 1995 (Hemphill, C. T. and Thrift, P. R., "Surfing the Web by Voice", Proceedings of the Third ACM International Multimedia Conference, San Francisco, 1995, pp. 215–222). They developed a system in which HTML pages were downloaded and processed on client-side computers. This enabled voice access to web-page content and the activation of hyperlinks through spoken commands.

Also in the mid-1990s, researchers at AT&T were discussing the development of a new markup language that would enable the web to be accessed through regular phones. From 1995 to 1999, AT&T, Lucent, Motorola and IBM each developed its own version of a phone and speech markup language. These companies then created the [http://www.voicexml.org/ VoiceXML Forum] and jointly designed the Voice Extensible Markup Language (VoiceXML, or VXML), which was submitted to the W3C in 2000. VXML is typically used to create hyperlinked speech applications (Lucas, B., "VoiceXML for Web-based distributed conversational applications", Commun. ACM 43, 9, 2000, pp. 53–57). VXML pages include commands for prompting the user for speech input, invoking recognition grammars, outputting synthesized speech, iterating through blocks of code, calling local JavaScript, and hyperlinking to other remote VXML pages. These linked pages are downloaded in much the same way as linked HTML pages on the conventional web.

Around the time VXML emerged, a [http://www.myspeechweb.org research group] at the University of Windsor in Canada was developing an alternative approach. In this approach, speech applications deployed on the web are accessed by client-side speech browsers. The browser provides the speech-recognition capability and tailors it to each application by downloading an application-specific recognition grammar from the remote application's web site. Input recognized by the client-side browser is sent to the remote server. The server processes it and returns a text result to the browser, which outputs it as synthesized speech. The term SpeechWeb was used in 1999 to describe the collection of hyperlinked speech applications in this architecture (Frost, R. A. and Chitte, S., "A New Approach for Providing Natural-Language Speech Access to Large Knowledge Bases", Proc. of PACLING '99, The Conference of the Pacific Association for Computational Linguistics, University of Waterloo, Ontario, Canada, 1999, pp. 82–90). The first SpeechWeb browser was demonstrated at the AAAI Sixteenth National Conference on Artificial Intelligence (Frost, R. A., "A Natural-Language Speech Interface Constructed Entirely as a Set of Executable Specifications", Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, Orlando, Florida, USA, 1999, pp. 908–909).
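The division of labour in this architecture can be illustrated with a minimal Python sketch. Recognition happens on the client and is constrained by a downloaded grammar. Processing happens on the server, which only ever sees text. Everything in the sketch is hypothetical and is not the Windsor group's implementation. The class names `SpeechApplication` and `SpeechBrowser` are invented for illustration. A plain set of phrases stands in for a real recognition grammar, and string matching stands in for speech recognition and synthesis. Treating a spoken phrase that names a linked application as a hyperlink is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpeechApplication:
    """Hypothetical remote speech application (the server side)."""

    grammar: set[str]  # phrases this application can handle (its "grammar")
    answers: dict[str, str]  # canned text replies, for illustration only
    # Hypothetical spoken hyperlinks: phrase -> linked application.
    links: dict[str, "SpeechApplication"] = field(default_factory=dict)

    def process(self, text: str) -> str:
        # The server receives recognized text, never audio, and returns text.
        return self.answers.get(text, "Sorry, I cannot answer that.")


class SpeechBrowser:
    """Hypothetical client-side speech browser that owns the recognizer."""

    def __init__(self, start: SpeechApplication) -> None:
        self.current = start
        self.grammar: set[str] = set()
        self.load(start)

    def load(self, app: SpeechApplication) -> None:
        # "Download" the application-specific grammar so that recognition is
        # tailored to this application; link phrases are added as well.
        self.current = app
        self.grammar = set(app.grammar) | set(app.links)

    def recognize(self, utterance: str) -> Optional[str]:
        # Stand-in for real speech recognition: an utterance is accepted only
        # if it matches a phrase in the currently loaded grammar.
        phrase = utterance.strip().lower()
        return phrase if phrase in self.grammar else None

    def hear(self, utterance: str) -> str:
        text = self.recognize(utterance)
        if text is None:
            return "Not recognized."
        if text in self.current.links:
            # Spoken hyperlink: switch to the linked application.
            self.load(self.current.links[text])
            return "Switched to a new application."
        # Send the recognized text to the remote application; a real browser
        # would output the reply as synthesized speech.
        return self.current.process(text)


if __name__ == "__main__":
    weather = SpeechApplication(
        grammar={"what is the forecast"},
        answers={"what is the forecast": "Sunny."},
    )
    home = SpeechApplication(
        grammar={"hello"},
        answers={"hello": "Welcome."},
        links={"go to weather": weather},
    )
    browser = SpeechBrowser(home)
    for said in ["Hello", "go to weather", "what is the forecast", "hello"]:
        print(f"{said!r} -> {browser.hear(said)}")
```

The sequence above produces the replies "Welcome.", "Switched to a new application." and "Sunny.". The final "hello" is then answered with "Not recognized." because, after the link is followed, only the weather application's grammar is loaded. This mirrors the per-application tailoring of the recognizer described in the paragraph above.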

Since the 1990s, the term "speechweb" has also been [http://www.SpeechWeb.org used] in a different context. There it describes a web-based network of information on speech, language and speech-language pathology, which was also intended to provide a meeting place for professionals and for people affected by communication disorders. The term "speechWeb™" has been trademarked by PipeBeach, a company now owned by HP. It refers to a software product that bridges telephone networks and conventional web servers.

In 2005 it was recognized that very few voice applications were available to the public through the Internet, despite the maturity of VXML at that time. It was also observed that nearly all of the VXML applications that were available had been built by people working in commerce and industry. This contrasted sharply with the rapid growth of the conventional web, and with the extensive public involvement in creating regular web pages only a few years after HTML was developed. This observation led to the [http://portal.acm.org/citation.cfm?id=1096000.1096003 call for a Public-Domain SpeechWeb] (Frost, R. A., "A call for a public-domain SpeechWeb", Commun. ACM 48, 11, 2005, pp. 45–49). The proposed SpeechWeb would be accessible to the public through existing web browsers (with speech plug-ins). It would contain hyperlinked speech applications created and deployed by the public, much as HTML pages are created and deployed on the conventional web. A [http://www2007.org/posters/poster927.pdf browser for the Public-Domain SpeechWeb] was demonstrated at the 16th International World Wide Web Conference, held in Banff, Canada, in 2007 (Frost, R. A., Ma, X. and Shi, Y., "A browser for a public-domain SpeechWeb", World Wide Web Conference, Banff, Canada, 2007, pp. 1307–1308). The browser is a small X+V page that runs in the freely available [http://www.opera.com/ Opera] web browser with the free IBM speech-recognition plug-in.

Two research groups are developing software to facilitate the construction and deployment of SpeechWeb applications by non-experts:

* The [http://www.myspeechweb.org "MySpeechWeb"] research group at the University of Windsor has developed documentation and software for people who want to access and/or create SpeechWeb applications. The group has also created a prototype Public-Domain SpeechWeb containing examples of [http://cs.uwindsor.ca/~speechweb/p_d_speechweb.html speech applications] that are accessible through a portal.
* The [http://w3voice.jp/skeleton/ "w3voice skeleton"] research group at the Auditory Media Laboratory, Wakayama University in Japan has created software that facilitates the construction and deployment of speech applications for the Japanese language.

References

External links

* [http://www.myspeechweb.org MySpeechWeb] - research group at the University of Windsor
* [http://davinci.newcs.uwindsor.ca/~speechweb/movie.mov Video demonstration of Public Domain SpeechWeb]


Wikimedia Foundation. 2010.