Normalized Google distance

The normalized Google distance (NGD) is a semantic similarity measure derived from the number of hits returned by the Google search engine for a given set of keywords. Keywords with the same or similar meanings in a natural-language sense tend to be "close" in units of NGD, while words with dissimilar meanings tend to be farther apart.

Specifically, the normalized Google distance between two search terms x and y is

$\operatorname{NGD}(x,y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x,y)} {\log M - \min\{\log f(x), \log f(y)\}}$

where M is the total number of web pages searched by Google; f(x) and f(y) are the number of hits for search terms x and y, respectively; and f(x, y) is the number of web pages on which both x and y occur.
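
As a worked illustration with made-up counts (assumed for the arithmetic only, not real search results), suppose M = 10^6, f(x) = 1000, f(y) = 100 and f(x, y) = 10. Taking base-10 logarithms, which is harmless because the base cancels in the ratio, the formula gives

$\operatorname{NGD}(x,y) = \frac{\max\{3, 2\} - 1}{6 - \min\{3, 2\}} = \frac{2}{4} = 0.5$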

If the two search terms x and y never occur together on the same web page, but do occur separately, then f(x, y) = 0 and the normalized Google distance between them is infinite. If both terms always occur together, so that f(x) = f(y) = f(x, y), their NGD is zero.
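
The formula and its two limiting cases can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `ngd` and its parameters are hypothetical, the hit counts are assumed to be supplied by the caller (no search engine is queried), and the input checks are a design choice of this sketch rather than part of the definition.

```python
import math


def ngd(fx, fy, fxy, m):
    """Normalized Google distance from raw hit counts.

    fx, fy -- hits for each term alone (hypothetical inputs)
    fxy    -- hits for pages containing both terms
    m      -- total number of pages indexed
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term must occur at least once")
    if m <= min(fx, fy):
        raise ValueError("m must exceed the smaller hit count")
    if fxy == 0:
        # The terms never co-occur: log f(x, y) diverges, so the distance is infinite.
        return math.inf
    log_fx, log_fy = math.log(fx), math.log(fy)
    numerator = max(log_fx, log_fy) - math.log(fxy)
    denominator = math.log(m) - min(log_fx, log_fy)
    return numerator / denominator


# Made-up counts, chosen only to make the arithmetic easy to follow.
print(ngd(1000, 100, 10, 10**6))     # partial co-occurrence
print(ngd(1000, 1000, 1000, 10**6))  # terms always occur together
print(ngd(1000, 100, 0, 10**6))      # terms never occur together
```

The natural logarithm is used here, but any base gives the same result because the base cancels between numerator and denominator.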



Wikimedia Foundation. 2010.
