Term Discrimination

Term discrimination is a way to rank keywords by how useful they are for information retrieval.

Overview

This method is similar to tf-idf, but its purpose is to separate keywords that are suitable for information retrieval from those that are not. It builds on the Vector Space Model, so please refer to that first.

This method uses the concept of "vector-space density": the less dense the document space described by an occurrence matrix is (that is, the further apart its document vectors lie), the better an information retrieval query is expected to perform.

An optimal index term is one that can distinguish two different documents from each other and relate two similar documents. A sub-optimal index term, on the other hand, cannot tell different documents apart from similar ones.

The discrimination value of an index term is the difference between the vector-space density of the occurrence matrix with that term removed and the density of the full occurrence matrix.

Let:
* A be the occurrence matrix
* A_k be the occurrence matrix without the index term k
* Q(A) be the density of A

Then the discrimination value of the index term k is:

DV_k = Q(A_k) - Q(A)
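The definition above leaves the density function Q open. As a minimal sketch, assume Q is the average cosine similarity between each document and the centroid, and that the occurrence matrix has one row per document and one column per index term. Both are assumptions made for illustration, as are the function names `density` and `discrimination_value`.

```python
import numpy as np

def density(A):
    """Assumed density Q(A): mean cosine similarity of each document to the centroid.

    A is taken to have one row per document and one column per index term.
    """
    A = np.asarray(A, dtype=float)
    centroid = A.mean(axis=0)
    norms = np.linalg.norm(A, axis=1) * np.linalg.norm(centroid)
    # Documents with a zero norm are given similarity 0 to avoid dividing by zero.
    sims = np.divide(A @ centroid, norms, out=np.zeros(len(A)), where=norms > 0)
    return sims.mean()

def discrimination_value(A, k):
    """DV_k = Q(A_k) - Q(A), where A_k is A with column k removed."""
    A = np.asarray(A, dtype=float)
    A_k = np.delete(A, k, axis=1)
    return density(A_k) - density(A)

# Hypothetical toy matrix: 2 documents, 3 index terms.
# Term 0 occurs in both documents; terms 1 and 2 occur in one document each.
A = [[1, 1, 0],
     [1, 0, 1]]
for k in range(3):
    print(k, discrimination_value(A, k))
```

With this choice of Q, the term shared by both documents gets a negative value (removing it makes the space less dense), while the terms that occur in only one document get a positive value.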

How to compute

Given an occurrence matrix A and one keyword k:
* Find the global document centroid C (this is just the average document vector)
* Find the average Euclidean distance from every document vector D_i to C
* Find the average Euclidean distance from every document vector D_i to C again, this time ignoring k in both D_i and C
* Subtract the second average from the first; the result is the "discrimination value" for keyword k

A higher value is better: it means that including the keyword spreads the documents further apart, which should result in better information retrieval.
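The procedure can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a reference implementation: rows are taken to be documents and columns keywords, the function names are hypothetical, and the sign is chosen as "distance with k minus distance without k" so that a higher value is better.

```python
import numpy as np

def mean_distance_to_centroid(A):
    """Average Euclidean distance from each document (row) to the centroid C."""
    A = np.asarray(A, dtype=float)
    centroid = A.mean(axis=0)  # the average document vector
    return np.linalg.norm(A - centroid, axis=1).mean()

def discrimination_value_euclid(A, k):
    """Assumed sign convention: spread with keyword k minus spread without it."""
    A = np.asarray(A, dtype=float)
    with_k = mean_distance_to_centroid(A)
    # Dropping column k ignores the keyword in every document and in the centroid.
    without_k = mean_distance_to_centroid(np.delete(A, k, axis=1))
    return with_k - without_k

# Hypothetical toy matrix: 4 documents, 3 keywords.
A = [[2, 1, 0],
     [2, 0, 1],
     [2, 1, 1],
     [2, 0, 0]]
print([discrimination_value_euclid(A, k) for k in range(3)])
```

Because dropping a column can only shorten each document's distance to the centroid, the value under this convention is never negative; a keyword with the same weight in every document scores zero.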

Qualitative Observations

Keywords that are "sparse" should be poor discriminators because they have poor "recall," whereas keywords that are "frequent" should be poor discriminators because they have poor "precision." Keywords of medium frequency are therefore expected to discriminate best.
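A toy example can make these observations concrete. The sketch below uses hypothetical data and the Euclidean-distance procedure, with the discrimination value taken as the spread with the keyword minus the spread without it (an assumed sign convention). It illustrates one small matrix only and does not prove the general claim.

```python
import numpy as np

def spread(A):
    """Mean Euclidean distance from each document (row) to the centroid."""
    return np.linalg.norm(A - A.mean(axis=0), axis=1).mean()

# Hypothetical toy data: 6 documents (rows) x 3 keywords (columns).
# "frequent" occurs in every document, "sparse" in one, "medium" in half.
names = ["frequent", "sparse", "medium"]
A = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
], dtype=float)

# Discrimination value per keyword: spread with the keyword minus spread without it.
dv = {name: spread(A) - spread(np.delete(A, k, axis=1))
      for k, name in enumerate(names)}

# Print the keywords from best to worst discriminator.
for name, value in sorted(dv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

In this matrix the medium-frequency keyword ranks first, the sparse keyword second, and the keyword that occurs in every document last, with a value of zero.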

References

* G. Salton, A. Wong, and C. S. Yang (1975), "A Vector Space Model for Automatic Indexing," Communications of the ACM, vol. 18, no. 11, pages 613–620. http://www.cs.uiuc.edu/class/fa05/cs511/Spring05/other_papers/p613-salton.pdf (The article in which the vector space model was first presented.)


Wikimedia Foundation. 2010.
