CMU Pronouncing Dictionary

Developer(s): Carnegie Mellon University
Stable release: 0.7a / February 18, 2008
Development status: Maintained
Available in: English
License: Public Domain

The CMU Pronouncing Dictionary (also known as cmudict) is a public domain pronouncing dictionary created by Carnegie Mellon University (CMU). It serves as the American English lexicon for the Festival Speech Synthesis System and for the CMU Sphinx speech recognition system. The latest release is 0.7a, which contains 133,746 entries (from 123,442 baseforms).

Database Format

The database is distributed as a text file in which each line has the format word <two spaces> pronunciation. If a word has multiple pronunciations, every entry after the first has an index in parentheses appended to the word. Pronunciations are encoded in a modified form of the Arpabet system: vowels carry a stress digit, 0 (no stress), 1 (primary stress), or 2 (secondary stress), although not every entry includes stress marks.
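
The line format described above can be read with a few lines of Python. The following is a minimal sketch, not an official reader: the function names `parse_line` and `load` are hypothetical, the sample lines are illustrative rather than copied from a particular release, and the code assumes that comment lines in the file start with `;;;`.

```python
import re
from collections import defaultdict

# Headword with an optional variant index in parentheses, e.g. "READ" or "READ(1)".
_HEADWORD = re.compile(r"^(?P<word>.+?)(?:\((?P<index>\d+)\))?$")


def parse_line(line):
    """Parse one dictionary line into (word, [(phone, stress), ...]).

    Returns None for blank lines, comment lines (assumed to start with
    ';;;'), and lines without the two-space separator.
    """
    line = line.rstrip("\n")
    if not line.strip() or line.startswith(";;;"):
        return None
    # The word and its pronunciation are separated by two spaces.
    head, sep, pron = line.partition("  ")
    if not sep or not head:
        return None
    word = _HEADWORD.match(head).group("word")
    phones = []
    for token in pron.split():
        if token[-1].isdigit():
            # Vowel with a stress digit: 0, 1, or 2.
            phones.append((token[:-1], int(token[-1])))
        else:
            # Consonant, or a vowel without a stress mark.
            phones.append((token, None))
    return word, phones


def load(lines):
    """Group all pronunciations under their headword."""
    entries = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            word, phones = parsed
            entries[word].append(phones)
    return dict(entries)


# Illustrative sample input (an assumption, not taken from a specific release).
sample = [
    ";;; comment line",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
    "HELLO  HH AH0 L OW1",
]
entries = load(sample)
```

Here `entries["READ"]` holds two pronunciations, because the indexed variant `READ(1)` is folded into the same headword. Keeping the stress digit separate from the phone symbol makes it easy to ignore stress for entries that lack it.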

History

Version   Release date [1]
0.1       16 September 1993
0.2       10 March 1994
0.3       28 September 1994
0.4       8 November 1995
0.5       No public release
0.6       11 August 1998
0.7a      19 February 2008 [2]

Applications

  • The Unifon converter is based on the CMU Pronouncing Dictionary.
  • The Natural Language Toolkit contains an interface to the CMU Pronouncing Dictionary.
  • The Carnegie Mellon Logios [3] tool incorporates the CMU Pronouncing Dictionary.

References

  1. ftp://ftp.cs.cmu.edu/project/speech/dict/
  2. http://sourceforge.net/forum/forum.php?forum_id=787627
  3. https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/logios/

Wikimedia Foundation. 2010.
