- IVONA
Infobox Software
name = IVONA
caption = IVONA visualisation
developer = [http://www.ivosoftware.com IVO Software]
released = 2005
frequently_updated = yes
programming language = C/C++
operating system = Cross-platform
language = English / Polish / Romanian / more coming soon
genre = Text-To-Speech
license = Commercial
website = [http://www.ivona.com/ www.ivona.com]
IVONA is a multilingual speech synthesis system developed at IVO Software. It offers a full text-to-speech system with various APIs.
Inside IVONA
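One front-end task covered in this section is text normalization: turning numbers and abbreviations into written-out words. As a rough illustration only, the Python sketch below expands a few abbreviations and integers up to 999. The abbreviation table and the function names `number_to_words` and `normalize` are hypothetical; they are not taken from IVONA, whose actual rules are not given here.

```python
import re

# Hypothetical abbreviation table; a real front-end would rely on a much
# larger, language-specific lexicon.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 in English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Expand known abbreviations and small integers into written-out words."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    # Only standalone runs of one to three digits are spelled out;
    # longer numbers are left untouched by this sketch.
    return re.sub(r"\b\d{1,3}\b",
                  lambda match: number_to_words(int(match.group())),
                  text)


print(normalize("Dr. Smith lives at 221 Baker St."))
```

A production normalizer also has to resolve ambiguity (for example, "St." as "Street" or "Saint") and handle dates, currencies and ordinals, none of which this sketch attempts.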
The IVONA text-to-speech system was described at [http://www.festvox.org/blizzard/blizzard2006.html Blizzard Challenge 2006] [ [http://festvox.org/blizzard/bc2006/ivo_blizzard2006.pdf Lukasz Osowski & Michal Kaszczuk, IVO Blizzard 2006 Entry. Blizzard Challenge 2006 Workshop.] ] and, in a special version prepared for the challenge, at [http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_010.html Blizzard Challenge 2007]. [ [http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_010.html Kaszczuk, Michal / Osowski, Lukasz: The IVO software Blizzard 2007 entry: improving Ivona speech synthesis system] ]
It is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called "text normalization", "pre-processing", or "tokenization". The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end, often referred to as the "synthesizer", then converts the symbolic linguistic representation into sound.
Unit selection synthesis
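To illustrate the search that unit selection performs, that is, picking the best chain of candidate units from the database, here is a minimal Python sketch using dynamic programming. The `Unit` fields, the pitch-only `target_cost` and `join_cost` functions, and the toy database are assumptions made for illustration; they are not IVONA's USLTM algorithm or its actual cost functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str    # phone label of the recorded segment
    pitch: float  # fundamental frequency in Hz


def target_cost(unit, wanted_pitch):
    """Assumed cost: distance between the unit's pitch and the requested pitch."""
    return abs(unit.pitch - wanted_pitch)


def join_cost(left, right):
    """Assumed cost: pitch discontinuity at the concatenation point."""
    return abs(left.pitch - right.pitch)


def select_units(targets, database):
    """Return the cheapest chain of units for a list of (phone, pitch) targets."""
    if not targets:
        return []
    # Candidate units for each target slot are those with a matching phone label.
    candidates = [[u for u in database if u.phone == phone] for phone, _ in targets]
    if any(not slot for slot in candidates):
        raise ValueError("no candidate unit for some target phone")

    # costs[j] = cost of the cheapest chain ending in candidate j of the current slot.
    costs = [target_cost(u, targets[0][1]) for u in candidates[0]]
    back = []  # back[i - 1][j] = best predecessor index for candidate j of slot i
    for i in range(1, len(targets)):
        previous = candidates[i - 1]
        new_costs, pointers = [], []
        for unit in candidates[i]:
            best = min(range(len(previous)),
                       key=lambda k: costs[k] + join_cost(previous[k], unit))
            new_costs.append(costs[best] + join_cost(previous[best], unit)
                             + target_cost(unit, targets[i][1]))
            pointers.append(best)
        costs = new_costs
        back.append(pointers)

    # Trace the cheapest chain backwards from the best final candidate.
    j = min(range(len(costs)), key=costs.__getitem__)
    chain = [candidates[-1][j]]
    for i in range(len(back) - 1, -1, -1):
        j = back[i][j]
        chain.append(candidates[i][j])
    chain.reverse()
    return chain


# Toy database with made-up values.
database = [Unit("h", 120.0), Unit("h", 180.0), Unit("i", 125.0), Unit("i", 200.0)]
print(select_units([("h", 150.0), ("i", 190.0)], database))
```

In the usage line both "h" candidates are equally far from the requested pitch, so the join cost decides: the chain that is smoother at the concatenation point is chosen.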
IVONA uses Unit Selection with Limited Time-scale Modification (USLTM), described in [ [http://festvox.org/blizzard/bc2006/ivo_blizzard2006.pdf Lukasz Osowski & Michal Kaszczuk, IVO Blizzard 2006 Entry. Blizzard Challenge 2006 Workshop.] ]. Unit selection synthesis uses large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, syllables, morphemes, words, phrases, and sentences. The division into segments is done using a specially modified speech recognizer. [ [http://www.cs.cmu.edu/~awb/papers/IEEE2002/allthetime/allthetime.html Alan W. Black, Perfect synthesis for all of the people all of the time. IEEE TTS Workshop 2002.] ] An index of the units in the speech database is then created based on the segmentation and acoustic parameters like the fundamental frequency (pitch), duration, position in the syllable, and neighboring phones. At runtime, the desired target utterance is created by determining the best chain of candidate units from the database (unit selection).
Unit selection provides the greatest naturalness, because it applies digital signal processing (DSP) to the recorded speech only at concatenation points. Because DSP often makes recorded speech sound less natural, restricting it to the joins keeps the output of IVONA TTS natural-sounding.
Generated speech quality
The IVONA text-to-speech system received the highest Mean Opinion Score (MOS) at the scientific contest [http://www.festvox.org/blizzard/blizzard2007.html Blizzard Challenge 2007] in Bonn, Germany. The sentences read out by IVONA were evaluated by experts, a group of British and American students, and volunteers recruited via the Internet. IVONA's average MOS of 3.9 points was the highest among all speech synthesizers; a recording of a real person scored 4.7. [ [http://www.festvox.org/blizzard/bc2007/blizzard_2007/full_papers/blz3_003.pdf Clark, Robert A. J. / Podsiadlo, Monika / Fraser, Mark / Mayo, Catherine / King, Simon: "Statistical analysis of the Blizzard Challenge 2007 listening test results" (IVONA is identified as a system with letter P)] ]
IVONA was also evaluated at [http://www.festvox.org/blizzard/blizzard2006.html Blizzard Challenge 2006] in Pittsburgh, USA, where it received the best Mean Opinion Score (MOS) from both speech experts and undergraduates in the full-database results. [ [http://festvox.org/blizzard/bc2006/eval_blizzard2006.pdf Christina L. Bennett and Alan W Black, Blizzard Challenge 2006: Results (IVONA is identified as a system with letter K)] ]
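For readers unfamiliar with the metric, a Mean Opinion Score is the arithmetic mean of listener ratings, conventionally given on a 1 to 5 scale. The Python sketch below shows the calculation under that assumption; the function name `mean_opinion_score` is hypothetical and the ratings are made-up values, not data from the Blizzard Challenge.

```python
def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)


# Made-up ratings from five hypothetical listeners for one synthesized sentence.
print(mean_opinion_score([4, 4, 3, 5, 4]))
```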
Voices and languages
IVONA speaks in three languages (US-English, Romanian and Polish) and with four voices (Jennifer, Carmen, Ewa, Jacek).
System compatibility
IVONA is compatible with Windows and Unix-based systems.
See also
*Speech synthesis
*Language
*Natural language processing
*Speech processing
*Speech recognition
*List of screen readers
References
External links
* [http://www.ivona.com IVONA TTS] on-line.
* See [http://www.youtube.com/watch?v=ir9iLfuWvGY IVONA TTS in action].
* [http://www.expressivo.com Expressivo] Text Reader application voiced by IVONA TTS.
* Free web service [http://say.expressivo.com say.expressivo.com] - send and publish prompts spoken by IVONA TTS voices.
* [http://www.festvox.org Festvox] - advancing speech synthesis project.
Wikimedia Foundation. 2010.