Non-native speech database

A non-native speech database is a speech database of non-native pronunciations of a language, most commonly English. Such databases are essential for the ongoing development of multilingual automatic speech recognition systems, text-to-speech systems, pronunciation trainers, and even fully featured second-language learning systems. Because these databases are comparatively small, however, many of them are not available through the common distributors of speech databases. As a result, it is hard for researchers in speech recognition to keep track of which kinds of databases have already been collected and which purposes still lack a collection.

This article is based on a paper presented at the ASRU speech conference (#38), which aimed to provide a useful resource for the problem described above. This online article is intended as a place where information about non-native speech databases can be updated continuously by the speech research community.

Legend

The table of non-native databases uses abbreviations for language names; these are listed in Table 1. Table 2 gives the following information about each corpus: the name of the corpus; the author or collector; the institution where the corpus can be obtained, or where at least further information should be available; the language actually spoken by the speakers; the number of speakers; the native language of the speakers; the total number of non-native utterances the corpus contains; the duration in hours of the non-native part; the date of the first public reference to the corpus; free text highlighting special aspects of the database; and a reference to a publication. In most cases the reference in the last field is the paper in which the original collectors describe the corpus. Where no such paper could be identified, a paper that uses the corpus is referenced instead.

Some entries are left blank and others are marked "unknown". A blank entry means that the value is simply not known to the compilers of the table. An entry marked "unknown" means that the database itself provides no information about this attribute. For example, the Jupiter weather database (#1) gives no information about the origin of its speakers, which makes the data less useful for evaluating accent detection and similar tasks.

Where possible, the name given is the standard name of the corpus. Some of the smaller corpora, however, have no established name, so an identifier had to be created. In such cases, a combination of the institution and the collector of the database is used.

Where a database contains both native and non-native speech, only the attributes of the non-native part of the corpus are listed. Most of the corpora are collections of read speech. If a corpus consists partly or completely of spontaneous utterances, this is mentioned in the Specials column.
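
As an illustration of the conventions described in this legend, the following Python sketch reads one tab-separated row of Table 2 into a record. The column keys, the function name parse_row and the use of None for blank cells are assumptions made for this example; they are not part of any published tool for these corpora.

```python
# Column keys in the order of the header row of Table 2 (names are illustrative).
COLUMNS = [
    "corpus", "author", "available_at", "languages", "speakers",
    "native_language", "utterances", "duration", "date", "specials",
    "reference",
]


def parse_row(line):
    """Split one tab-separated row of Table 2 into a column -> value mapping.

    Blank cells become None (value not known to the compilers of the table).
    Cells reading "unknown" are kept as the string "unknown" (the database
    itself gives no information about the attribute).
    """
    cells = line.rstrip("\n").split("\t")
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return {name: (cell.strip() or None) for name, cell in zip(COLUMNS, cells)}


# The Jupiter row from Table 2: speakers are "unknown", duration is blank.
row = parse_row(
    "Jupiter\tZue\tMIT\tE\tunknown\tunknown\t5146\t\t1999\ttelephone speech\t#1"
)
print(row["speakers"], row["duration"])
```

Keeping None and "unknown" distinct preserves the difference between a value that is missing from this table and a value that the corpus itself does not document.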

Overview of non-native databases

Table 1: Abbreviations for languages used in Table 2

Arabic	A	Japanese	J
Chinese	C	Korean	K
Czech	Cze	Malaysian	M
Danish	D	Norwegian	N
Dutch	Dut	Portuguese	P
English	E	Russian	R
French	F	Spanish	S
German	G	Swedish	Swe
Greek	Gre	Thai	T
Indonesian	Ind	Vietnamese	V
Italian	I
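
The abbreviations in Table 1 can be applied mechanically to the language columns of Table 2. The sketch below is one possible way to do this in Python; the dictionary is copied from Table 1, while the function name expand_languages and the decision to pass unrecognized tokens through unchanged are assumptions of this example.

```python
from typing import List

# Abbreviations copied from Table 1.
LANGUAGES = {
    "A": "Arabic", "C": "Chinese", "Cze": "Czech", "D": "Danish",
    "Dut": "Dutch", "E": "English", "F": "French", "G": "German",
    "Gre": "Greek", "Ind": "Indonesian", "I": "Italian", "J": "Japanese",
    "K": "Korean", "M": "Malaysian", "N": "Norwegian", "P": "Portuguese",
    "R": "Russian", "S": "Spanish", "Swe": "Swedish", "T": "Thai",
    "V": "Vietnamese",
}


def expand_languages(cell: str) -> List[str]:
    """Expand a space-separated language cell into full language names.

    Tokens that are not listed in Table 1 (free text such as "many", or an
    abbreviation the table does not define) are returned unchanged.
    """
    return [LANGUAGES.get(token, token) for token in cell.split()]


print(expand_languages("C G F J Ind"))
```

Passing unknown tokens through rather than raising an error keeps free-text cells such as "50 countries" usable without special handling.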

Information about the individual databases is given in Table 2.

Table 2: Overview of non-native databases

Corpus	Author	Available at	Language(s)	#Speakers	Native language	#Utt.	Duration	Date	Specials	Reference
AMI		EU	E		Dut and other		100h		meeting recordings	#40
ATR-Gruhn	Gruhn	ATR	E	96	C G F J Ind	15000		2004	proficiency rating	#4
BAS Strange Corpus I+II		ELRA	G	139	50 countries	7500		1998		#5
Berkeley Restaurant		ICSI	E	55	G I H C F S J	2500		1994		#41
Broadcast News		LDC	E					1997		#6
Cambridge-Witt	Witt	U. Cambridge	E	10	J I K S	1200		1999		#7
Cambridge-Ye	Ye	U. Cambridge	E	20	C	1600		2005		#8
Children News	Tomokiyo	CMU	E	62	J C	7500		2000	partly spontaneous	#6
CLIPS-IMAG	Tan	CLIPS-IMAG	F	15	C V		6h	2006		#3
CSLU		LDC	E		22 countries	5000		2007	telephone, spontaneous	#9
CMU		CMU	E	64	G	452	0.9h		not available	#10
Cross Towns	Schaden	U. Bochum	E F G I Cze Dut	161	E F G I S	72000	133h	2006	city names	#11
Duke-Arslan	Arslan	Duke University	E	93	15 countries	2200		1995	partly telephone speech	#12
ERJ	Minematsu	U. Tokyo	E	200	J	68000		2002	proficiency rating	#13
Fisher		LDC	E		many		200h		telephone speech	#39
Fitt	Fitt	U. Edinburgh	F I N Gre	10	E	700		1995	city names	#14
Fraenki		U. Erlangen	E	19	G	2148				#15
Hispanic	Byrne		E	22	S		20h	1998	partly spontaneous	#16
IBM-Fischer		IBM	E	40	S F G I	2000		2002	digits	#17
ISLE	Atwell	EU/ELDA	E	46	G I	4000	18h	2000		#18
Jupiter	Zue	MIT	E	unknown	unknown	5146		1999	telephone speech	#1
K-SEC	Rhee	SiTEC	E	unknown	K			2004		#42
LDC WSJ1		LDC		10		800	1h	1994		#6
MIST		ELRA	E F G	75	Dut	2200		1996		#19
NATO HIWIRE		NATO	E	81	F Gre I S	8100		2007	clean speech	#2
NATO M-ATC	Pigeon	NATO	E	622	F G I S	9833	17h	2007	heavy background noise	#20
NATO N4		NATO	E	115	unknown		7.5h	2006	heavy background noise	#21
Onomastica			D Dut E F G Gre I N P S Swe			(121000)		1995	only lexicon	#22
PF-STAR		U. Erlangen	E	57	G	4627	3.4h	2005	children speech	#23
Sunstar		EU	E	100	G S I P D	40000		1992	parliament speech	#24
TC-STAR	Heuvel	ELDA	E S	unknown	EU countries		13h	2006	multiple data sets	#25
TED	Lamel	ELDA	E	40(188)	many		10h(47h)	1994	eurospeech 93	#26
TLTS		DARPA	A		E		1h	2004		#27
Tokyo-Kikuko		U. Tokyo	J	140	10 countries	35000		2004	proficiency rating	#28
Verbmobil		U. Munich	E	44	G		1.5h	1994	very spontaneous	#29
VODIS		EU	F G	178	F G	2500		1998	about car navigation	#30
WP Arabic	Rocca	LDC	A	35	E	800	1h	2002		#31
WP Russian	Rocca	LDC	R	26	E	2500	2h	2003		#32
WP Spanish	Morgan	LDC	S		E			2006		#33
WSJ Spoke			E	10	unknown	800		1993		#34

References

1: K. Livescu, "Analysis and modeling of non-native speech for automatic speech recognition," M.S. thesis, Massachusetts Institute of Technology, Cambridge, MA, 1999.
2: J.C. Segura et al., "The HIWIRE database, a noisy and non-native English speech corpus for cockpit communication," 2007, http://www.hiwire.org/.
3: T. P. Tan and L. Besacier, "A French non-native corpus for automatic speech recognition," in LREC, Genoa, Italy, 2006.
4: R. Gruhn, T. Cincarek, and S. Nakamura, "A multi-accent non-native English database," in ASJ, 2004.
5: University Munich, "Bavarian archive for speech signals strange corpus," http://www.phonetik.uni-muenchen.de/Bas/.
6: L. Tomokiyo, Recognizing Non-native Speech: Characterizing and Adapting to Non-native Usage in Speech Recognition, Ph.D. thesis, Carnegie Mellon University, Pennsylvania, 2001.
7: S. Witt, Use of Speech Recognition in Computer-Assisted Language Learning, Ph.D. thesis, Cambridge University Engineering Department, UK, 1999.
8: H. Ye and S. Young, "Improving the speech recognition performance of beginners in spoken conversational interaction for language learning," in Proc. Interspeech, Lisbon, Portugal, 2005.
9: T. Lander, "CSLU: Foreign accented English release 1.2," Tech. Rep., LDC, Philadelphia, Pennsylvania, 2007.
10: Z. Wang, T. Schultz, and A. Waibel, "Comparison of acoustic model adaptation techniques on non-native speech," in Proc. ICASSP, 2003.
11: S. Schaden, Regelbasierte Modellierung fremdsprachlich akzentbehafteter Aussprachevarianten, Ph.D. thesis, University Duisburg-Essen, 2006.
12: L. M. Arslan and J. H. Hansen, "Frequency characteristics of foreign accented speech," in Proc. of ICASSP, Munich, Germany, 1997, pp. 1123-1126.
13: N. Minematsu et al., "Development of English speech database read by Japanese to support CALL research," in ICA, Kyoto, Japan, 2004, pp. 577-560.
14: S. Fitt, "The pronunciation of unfamiliar native and non-native town names," in Proc. of Eurospeech, 1995, pp. 2227-2230.
15: G. Stemmer, E. Noeth, and H. Niemann, "Acoustic modeling of foreign words in a German speech recognition system," in Proc. Eurospeech, P. Dalsgaard, B. Lindberg, and H. Benner, Eds., 2001, vol. 4, pp. 2745-2748.
16: W. Byrne, E. Knodt, S. Khudanpur, and J. Bernstein, "Is automatic speech recognition ready for non-native speech? A data-collection effort and initial experiments in modeling conversational Hispanic English," in STiLL, Marholmen, Sweden, 1998, pp. 37-40.
17: V. Fischer, E. Janke, and S. Kunzmann, "Recent progress in the decoding of non-native speech with multilingual acoustic models," in Proc. of Eurospeech, 2003, pp. 3105-3108.
18: W. Menzel, E. Atwell, P. Bonaventura, D. Herron, P. Howarth, R. Morton, and C. Souter, "The ISLE corpus of non-native spoken English," in LREC, Athens, Greece, 2000, pp. 957-963.
19: TNO Human Factors Research Institute, "Mist multi-lingual interoperability in speech technology database," Tech. Rep., ELRA, Paris, France, 2007, ELRA Catalog Reference S0238.
20: S. Pigeon, W. Shen, and D. van Leeuwen, "Design and characterization of the non-native military air traffic communications database," in ICSLP, Antwerp, Belgium, 2007.
21: L. Benarousse et al., "The NATO native and non-native (n4) speech corpus," in Proc. of the MIST workshop (ESCA-NATO), Leusden, Sep 1999.
22: Onomastica Consortium, "The ONOMASTICA interlanguage pronunciation lexicon," in Proc. Eurospeech, Madrid, Spain, 1995, pp. 829-832.
23: C. Hacker, T. Cincarek, A. Maier, A. Hessler, and E. Noeth, "Boosting of prosodic and pronunciation features to detect mispronunciations of non-native children," in Proc. of ICASSP, Honolulu, Hawaii, 2007, pp. 197-200.
24: C. Teixeira, I. Trancoso, and A. Serralheiro, "Recognition of non-native accents," in Proc. Eurospeech, Rhodes, Greece, 1997, pp. 2375-2378.
25: H. Heuvel, K. Choukri, C. Gollan, A. Moreno, and D. Mostefa, "TC-STAR: New language resources for ASR and SLT purposes," in LREC, Genoa, 2006, pp. 2570-2573.
26: L.F. Lamel, F. Schiel, A. Fourcin, J. Mariani, and H. Tillmann, "The translanguage English database TED," in ICSLP, Yokohama, Japan, Sep 1994.
27: N. Mote, L. Johnson, A. Sethy, J. Silva, and S. Narayanan, "Tactical language detection and modeling of learner speech errors: The case of Arabic tactical language training for American English speakers," in Proc. of InSTIL, June 2004.
28: K. Nishina, "Development of Japanese speech database read by non-native speakers for constructing CALL system," in ICA, Kyoto, Japan, 2004, pp. 561-564.
29: University Munich, "The Verbmobil project," http://www.phonetik.uni-muenchen.de/Forschung/Verbmobil/VerbOverview.html.
30: I. Trancoso, C. Viana, I. Mascarenhas, and C. Teixeira, "On deriving rules for nativised pronunciation in navigation queries," in Proc. Eurospeech, 1999.
31: A. LaRocca and R. Chouairi, "West point Arabic speech corpus," Tech. Rep., LDC, Philadelphia, Pennsylvania, 2002.
32: A. LaRocca and C. Tomei, "West point Russian speech corpus," Tech. Rep., LDC, Philadelphia, Pennsylvania, 2003.
33: J. Morgan, "West point heroico Spanish speech," Tech. Rep., LDC, Philadelphia, Pennsylvania, 2006.
34: I. Amdal, F. Korkmazskiy, and A. C. Surendran, "Joint pronunciation modelling of non-native speakers using data-driven methods," in ICSLP, Beijing, China, 2000, pp. 622-625.
35: Speech Resources Consortium, "UME-ERJ English speech database read by Japanese students," http://research.nii.ac.jp/src/eng/list/index.html.
36: Federal Aviation Administration, "Controller pilot datalink communications (CPDLC)," http://tf.tc.faa.gov/capabilities/cpdlc.htm.
37: S. Schaden, "Casselberveetovallarga and other unpronounceable places: The CrossTowns corpus," in Proc. LREC, Genova, Italy, 2006.
38: M. Raab, R. Gruhn, and E. Noeth, "Non-Native speech databases," in Proc. ASRU, Kyoto, Japan, 2007.
39: C. Cieri, D. Miller, and K. Walker, "The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text," in Proc. LREC, 2004.
40: AMI Project, "AMI Meeting Corpus," http://corpus.amiproject.org/.
41: Jurafsky et al., "The Berkeley Restaurant Project," in Proc. ICSLP, 1994.
42: S-C. Rhee, S-H. Lee, S-K. Kang, and Y-J. Lee, "Design and Construction of Korean-Spoken English Corpus (K-SEC)," in Proc. ICSLP, 2004.

Categories:

Speech recognition

Wikimedia Foundation. 2010.
