Caverphone

The Caverphone phonetic matching algorithm was created in 2002 by David Hood in the Caversham Project at the University of Otago in New Zealand. It was created to assist in data matching between late 19th century and early 20th century electoral rolls, where the name only needed to be in a "commonly recognisable form". The algorithm was intended to apply to those names that could not easily be matched between electoral rolls after the exact matches had been removed from the pool of potential matches. The algorithm is optimised for accents present in the study area (the southern part of the city of Dunedin, New Zealand).

The rules of the algorithm are applied consecutively to any particular name, as a series of replacements.
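
Because each rule operates on the output of the previous one, the order of the rules matters. The following minimal Python sketch illustrates this with two of the replacements listed below; the helper name `apply_rules` and the sample name are illustrative assumptions, not part of the published algorithm.

```python
def apply_rules(name, rules):
    """Apply (old, new) replacement pairs one after another, in order."""
    for old, new in rules:
        name = name.replace(old, new)
    return name

# The "ce" rule has to run before the general "c" rule,
# otherwise the soft c is turned into k first and "ce" never matches.
in_order = [("ce", "se"), ("c", "k")]
reversed_order = [("c", "k"), ("ce", "se")]

print(apply_rules("pearce", in_order))        # pearse
print(apply_rules("pearce", reversed_order))  # pearke
```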

The exact algorithm is as follows:
# Convert to lowercase
# Remove anything not a-z
# If the name starts with
## cough make it cou2f
## rough make it rou2f
## tough make it tou2f
## enough make it enou2f
## gn make it 2n
# If the name ends with
## mb make it m2
# Replace
## cq with 2q
## ci with si
## ce with se
## cy with sy
## tch with 2ch
## c with k
## q with k
## x with k
## v with f
## dg with 2g
## tio with sio
## tia with sia
## d with t
## ph with fh
## b with p
## sh with s2
## z with s
## any initial vowel with an A
## all other vowels with a 3
## 3gh3 with 3kh3
## gh with 22
## g with k
## groups of the letter s with an S
## groups of the letter t with a T
## groups of the letter p with a P
## groups of the letter k with a K
## groups of the letter f with an F
## groups of the letter m with an M
## groups of the letter n with an N
## w3 with W3
## wy with Wy
## wh3 with Wh3
## why with Why
## w with 2
## any initial h with an A
## all other occurrences of h with a 2
## r3 with R3
## ry with Ry
## r with 2
## l3 with L3
## ly with Ly
## l with 2
## j with y
## y3 with Y3
## y with 2
# Remove all
## 2s
## 3s
# Put six 1s on the end
# Take the first six characters as the code
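
The steps above can be sketched in Python as an ordered table of regular-expression replacements. This is a minimal illustration rather than a reference implementation: the function name `caverphone` and the table name `_RULES` are assumptions, and the `mb` rule is anchored at the end of the name, which is also an assumption of this sketch.

```python
import re

# (pattern, replacement) pairs, in the order the rules are listed.
# Upper-case letters mark finished sounds; later lower-case rules skip them.
_RULES = [
    (r"^cough", "cou2f"), (r"^rough", "rou2f"), (r"^tough", "tou2f"),
    (r"^enough", "enou2f"), (r"^gn", "2n"),
    (r"mb$", "m2"),  # assumption: applies only at the end of the name
    ("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"),
    ("tch", "2ch"), ("c", "k"), ("q", "k"), ("x", "k"), ("v", "f"),
    ("dg", "2g"), ("tio", "sio"), ("tia", "sia"), ("d", "t"),
    ("ph", "fh"), ("b", "p"), ("sh", "s2"), ("z", "s"),
    (r"^[aeiou]", "A"),   # initial vowel
    (r"[aeiou]", "3"),    # all other vowels
    ("3gh3", "3kh3"), ("gh", "22"), ("g", "k"),
    (r"s+", "S"), (r"t+", "T"), (r"p+", "P"), (r"k+", "K"),
    (r"f+", "F"), (r"m+", "M"), (r"n+", "N"),
    ("w3", "W3"), ("wy", "Wy"), ("wh3", "Wh3"), ("why", "Why"), ("w", "2"),
    (r"^h", "A"), ("h", "2"),
    ("r3", "R3"), ("ry", "Ry"), ("r", "2"),
    ("l3", "L3"), ("ly", "Ly"), ("l", "2"),
    ("j", "y"), ("y3", "Y3"), ("y", "2"),
]


def caverphone(name: str) -> str:
    """Return the six-character code for a name, following the listed steps."""
    # Steps 1 and 2: lowercase, then drop everything that is not a letter a-z.
    code = re.sub(r"[^a-z]", "", name.lower())
    # Apply every replacement in order, each on the result of the last.
    for pattern, replacement in _RULES:
        code = re.sub(pattern, replacement, code)
    # Remove the placeholder digits 2 and 3.
    code = code.replace("2", "").replace("3", "")
    # Pad with six 1s and keep the first six characters.
    return (code + "111111")[:6]


print(caverphone("Lee"))       # L11111
print(caverphone("Thompson"))  # TMPSN1
```

Keeping the rules in a single ordered list mirrors the way the algorithm is stated and makes it straightforward to compare the table against the list line by line.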

Examples

Lee -> lee -> l33 -> L33 -> L -> L111111 -> L11111
Thompson -> thompson -> th3mps3n -> th3mpS3n -> Th3mpS3n -> Th3mPS3n -> Th3MPS3n -> Th3MPS3N -> T23MPS3N -> TMPSN -> TMPSN111111 -> TMPSN1

External links

* Project Dedupe http://dedupe.sourceforge.net
* Caversham Project http://caversham.otago.ac.nz/
* Original (2002) Caverphone algorithm http://caversham.otago.ac.nz/files/working/ctp060902.pdf
* Revised (2004) Caverphone algorithm http://caversham.otago.ac.nz/files/working/ctp150804.pdf


Wikimedia Foundation. 2010.
