Metaphone

Lawrence Philips redirects here. For the football player, see Lawrence Phillips.

Metaphone is a phonetic algorithm, published in 1990, for indexing words by their English pronunciation. It improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation, which yields a more accurate encoding that better matches words and names that sound alike. As with Soundex, similar-sounding words should share the same key.

Metaphone was developed by Lawrence Philips in response to deficiencies in the Soundex algorithm, and it uses a larger set of rules for English pronunciation. Metaphone is available as a built-in function in a number of systems, including later versions of PHP.

The original author later produced a new version of the algorithm, which he named Double Metaphone. Unlike the original algorithm, which applies to English only, this version takes into account spelling peculiarities of a number of other languages. In 2009 Lawrence Philips released a third version, called Metaphone 3, which achieves an accuracy of approximately 99% for English words, non-English words familiar to Americans, and first names and family names commonly found in the United States. It was developed according to modern engineering standards against a test harness of prepared correct encodings.


Procedure

Metaphone codes use the 16 consonant symbols 0BFHJKLMNPRSTWXY.[1] The '0' represents "th" (as an ASCII approximation of Θ), 'X' represents "sh" or "ch", and the others represent their usual English pronunciations. The vowels AEIOU are also used, but only at the beginning of the code.[2]
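The alphabet described above can be turned into a simple well-formedness check. The following is a minimal sketch, not part of any Metaphone implementation: the name `is_valid_code` is hypothetical, and the pattern assumes that at most one vowel can appear and only as the first character of a code.

```python
import re

# Assumption: a code is an optional leading vowel followed by zero or more
# of the 16 consonant symbols 0BFHJKLMNPRSTWXY ('0' stands for "th").
CODE_PATTERN = re.compile(r"[AEIOU]?[0BFHJKLMNPRSTWXY]*")


def is_valid_code(code: str) -> bool:
    """Return True if `code` uses only the symbols of the Metaphone alphabet."""
    return CODE_PATTERN.fullmatch(code) is not None


print(is_valid_code("SM0"))  # consonant symbols only
print(is_valid_code("EJ"))   # leading vowel is allowed
print(is_valid_code("SME"))  # vowel after the first position is rejected
```

Such a check is only a sanity test on stored keys; it says nothing about whether a key is the right encoding for a given word.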

  1. Drop duplicate adjacent letters, except for C.
  2. If the word begins with 'KN', 'GN', 'PN', 'AE', or 'WR', drop the first letter.
  3. Drop 'B' if it follows 'M' at the end of the word.
  4. 'C' transforms to 'X' if followed by 'IA' or 'H' (unless, in the latter case, it is part of '-SCH-', in which case it transforms to 'K'). 'C' transforms to 'S' if followed by 'I', 'E', or 'Y'. Otherwise, 'C' transforms to 'K'.
  5. 'D' transforms to 'J' if followed by 'GE', 'GY', or 'GI'. Otherwise, 'D' transforms to 'T'.
  6. Drop 'G' if followed by 'H' and the 'H' is neither at the end of the word nor before a vowel. Drop 'G' if followed by 'N' or 'NED' at the end of the word.
  7. 'G' transforms to 'J' if before 'I', 'E', or 'Y', and it is not in 'GG'. Otherwise, 'G' transforms to 'K'.
  8. Drop 'H' if after a vowel and not before a vowel.
  9. 'CK' transforms to 'K'.
  10. 'PH' transforms to 'F'.
  11. 'Q' transforms to 'K'.
  12. 'S' transforms to 'X' if followed by 'H', 'IO', or 'IA'.
  13. 'T' transforms to 'X' if followed by 'IA' or 'IO'. 'TH' transforms to '0'. Drop 'T' if followed by 'CH'.
  14. 'V' transforms to 'F'.
  15. 'WH' transforms to 'W' if at the beginning. Drop 'W' if not followed by a vowel.
  16. 'X' transforms to 'S' if at the beginning. Otherwise, 'X' transforms to 'KS'.
  17. Drop 'Y' if not followed by a vowel.
  18. 'Z' transforms to 'S'.
  19. Drop all vowels except one at the beginning of the word.
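The rules above can be sketched in Python as a single left-to-right scan. This is an illustrative reading of the list, not the original implementation and not a drop-in replacement for PHP's built-in function. The names `metaphone_sketch` and `build_index` are hypothetical, and the sketch makes three assumptions the list does not state: the 'H' of 'CH', 'GH', 'PH', 'SH', 'TH' and initial 'WH' is consumed together with the preceding letter, the 'G' of 'DGE', 'DGY', 'DGI' is consumed together with the 'D', and non-letters are ignored. Other implementations may produce different keys for some words.

```python
from collections import defaultdict

VOWELS = frozenset("AEIOU")
SOFTENERS = frozenset("IEY")              # letters that soften a preceding C or G
SIMPLE = {"Q": "K", "V": "F", "Z": "S"}   # rules 11, 14, 18


def metaphone_sketch(word: str) -> str:
    """Encode `word` under one reading of the 19 rules listed above."""
    # Rule 1: collapse doubled letters (except C); remember which letters were
    # doubled so that rule 7 can still recognise an original 'GG'.
    s, doubled = "", []
    for ch in word.upper():
        if not "A" <= ch <= "Z":
            continue                      # assumption: ignore non-letters
        if s and ch == s[-1] and ch != "C":
            doubled[-1] = True
        else:
            s += ch
            doubled.append(False)
    if s[:2] in ("KN", "GN", "PN", "AE", "WR"):    # rule 2
        s, doubled = s[1:], doubled[1:]
    if s.endswith("MB"):                           # rule 3
        s, doubled = s[:-1], doubled[:-1]

    out, i, n = [], 0, len(s)
    while i < n:
        c = s[i]
        prev = s[i - 1] if i else ""
        nxt = s[i + 1] if i + 1 < n else ""
        rest = s[i + 1:]                  # everything after the current letter
        step = 1
        if c in VOWELS:                   # rule 19: keep only an initial vowel
            if i == 0:
                out.append(c)
        elif c == "C":                    # rules 4 and 9
            if rest.startswith("IA"):
                out.append("X")
            elif nxt == "H":
                out.append("K" if prev == "S" else "X")
                step = 2                  # assumption: consume the H
            elif nxt in SOFTENERS:
                out.append("S")
            elif nxt != "K":              # in 'CK' the K itself is emitted next
                out.append("K")
        elif c == "D":                    # rule 5
            if rest[:2] in ("GE", "GY", "GI"):
                out.append("J")
                step = 2                  # assumption: consume the G
            else:
                out.append("T")
        elif c == "G":                    # rules 6 and 7
            if nxt == "H":
                after = s[i + 2:i + 3]
                if after == "" or after in VOWELS:
                    out.append("K")       # H at the end or before a vowel
                step = 2                  # assumption: consume the H
            elif rest in ("N", "NED"):
                pass                      # silent G in final -GN / -GNED
            elif nxt in SOFTENERS and not doubled[i]:
                out.append("J")
            else:
                out.append("K")
        elif c == "H":                    # rule 8
            if not (prev in VOWELS and nxt not in VOWELS):
                out.append("H")
        elif c == "P":                    # rule 10
            out.append("F" if nxt == "H" else "P")
            step = 2 if nxt == "H" else 1
        elif c == "S":                    # rule 12
            if nxt == "H":
                out.append("X")
                step = 2
            else:
                out.append("X" if rest[:2] in ("IO", "IA") else "S")
        elif c == "T":                    # rule 13
            if rest[:2] in ("IA", "IO"):
                out.append("X")
            elif nxt == "H":
                out.append("0")
                step = 2
            elif rest[:2] != "CH":        # T before CH is dropped
                out.append("T")
        elif c == "W":                    # rule 15
            if i == 0 and nxt == "H":
                out.append("W")
                step = 2
            elif nxt in VOWELS:
                out.append("W")
        elif c == "X":                    # rule 16
            out.append("S" if i == 0 else "KS")
        elif c == "Y":                    # rule 17
            if nxt in VOWELS:
                out.append("Y")
        else:                             # B, F, J, K, L, M, N, R and rules 11, 14, 18
            out.append(SIMPLE.get(c, c))
        i += step
    return "".join(out)


def build_index(words):
    """Group words by their key so that similar-sounding words share a bucket."""
    index = defaultdict(list)
    for w in words:
        index[metaphone_sketch(w)].append(w)
    return dict(index)


print(build_index(["Knight", "Night", "Which", "Witch", "Bigger"]))
```

Under this reading, "Knight" and "Night" both encode to NT and "Which" and "Witch" both encode to WX, so each pair lands in the same bucket, while "Bigger" keeps a hard 'G' and encodes to BKR.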

Double Metaphone

The Double Metaphone search algorithm is the second generation of this algorithm. Its implementation was described in the June 2000 issue of C/C++ Users Journal.

It is called "Double" because it can return both a primary and a secondary code for a string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. For example, encoding the name "Smith" yields a primary code of SM0 and a secondary code of XMT, while the name "Schmidt" yields a primary code of XMT and a secondary code of SMT. Both names have XMT in common.
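The matching idea can be illustrated without implementing the encoder itself. In the sketch below, the codes for "Smith" and "Schmidt" are copied by hand from the example above rather than computed, and the names `DoubleKey` and `shared_codes` are hypothetical.

```python
from typing import NamedTuple, Set


class DoubleKey(NamedTuple):
    """A primary and a secondary code for one name (illustrative type)."""
    primary: str
    secondary: str


def shared_codes(a: DoubleKey, b: DoubleKey) -> Set[str]:
    """Return the codes two names have in common; empty means no match."""
    return {a.primary, a.secondary} & {b.primary, b.secondary}


# Codes taken from the text above, not produced by an encoder.
SMITH = DoubleKey(primary="SM0", secondary="XMT")
SCHMIDT = DoubleKey(primary="XMT", secondary="SMT")

print(shared_codes(SMITH, SCHMIDT))
```

Here the two names are treated as a match because the intersection contains XMT. A search could also rank a primary-to-primary match above a match that involves a secondary code, but that ranking is a design choice and not something the text prescribes.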

Double Metaphone tries to account for the many irregularities in English words of Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other origins. It therefore uses a much more complex rule set than its predecessor; for example, it tests for approximately 100 different contexts of the letter C alone.

Metaphone 3

Developed by the same author, this algorithm aims to further improve the accuracy of phonetic encoding of words in the English language. It adds the option to take non-initial vowels into account when encoding keys and to encode voiced and unvoiced consonants differently, which allows the result set to be more closely focused if desired. Versions for other languages have been announced. Metaphone 3 is sold as source code in C++, Java and C# for 40 USD each.


References

  1. http://www.sound-ex.com/alternative_zu_soundex
  2. http://www.morfoedro.it/doc.php?n=222&lang=en

Wikimedia Foundation. 2010.
