Relevance (information retrieval)

In the context of information science and information retrieval, relevance denotes how well a retrieved set of documents (or a single document) meets the information need of the user.

Topical relevance and other kinds of relevance

"Relevance" most commonly refers to "topical" relevance or "aboutness", i.e. the extent to which the topic of a result matches the topic of the query or information need. Relevance can also be interpreted more broadly, referring generally to how "good" a retrieved result is with regard to the information need. The latter definition of relevance, sometimes referred to as "user" relevance, encompasses "topical" relevance and possibly other concerns of the user, such as the timeliness, authority or novelty of the result.

History

The concern with the problem of finding relevant information dates back at least to the first publication of scientific journals in the 17th century.

The formal study of relevance began in the 20th century with the study of what would later be called bibliometrics. In the 1930s and 1940s, S. C. Bradford used the term "relevant" to characterize articles relevant to a subject (cf. Bradford's law). In the 1950s, the first information retrieval systems emerged, and researchers noted the retrieval of irrelevant articles as a significant concern. In 1958, B. C. Vickery made the concept of relevance explicit in an address at the International Conference on Scientific Information. [Mizzaro, S. (1997). Relevance: The Whole History. Journal of the American Society for Information Science. 48, 810-832.]

Since 1958, information scientists have explored and debated definitions of relevance. A particular focus of the debate was the distinction between "relevance to a subject" or "topical relevance" and "user relevance".

Evaluation and Relevance

The information retrieval community has emphasized the use of test collections and benchmark tasks to measure topical relevance, starting with the Cranfield Experiments of the early 1960s and culminating in the TREC evaluations that continue to this day as the main evaluation framework for information retrieval research.

To evaluate how well an information retrieval system retrieves topically relevant results, the relevance of the retrieved results must be quantified. In Cranfield-style evaluations, this typically involves assigning a "relevance level" to each retrieved result, a process known as "relevance assessment". Relevance levels can be binary, indicating that a result is or is not relevant, or graded, indicating the degree to which the topic of the result matches the information need. Once relevance levels have been assigned to the retrieved results, information retrieval performance measures can be used to assess the quality of a retrieval system's output.
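As an illustration of how assigned relevance levels feed into performance measures, the following is a minimal sketch in Python. It computes precision at a cutoff (binary view) and normalized discounted cumulative gain (graded view). The function names, the 0 to 2 grading scale and the sample assessments are assumptions made for this example; they are not taken from Cranfield or TREC. For simplicity, nDCG is normalized here against the best ordering of the retrieved results only, whereas evaluations normally use all judged documents for the topic.

```python
import math

def precision_at_k(levels, k):
    """Fraction of the top-k results whose relevance level is above zero."""
    return sum(1 for r in levels[:k] if r > 0) / k

def dcg_at_k(levels, k):
    """Discounted cumulative gain: each level is divided by log2(rank + 1)."""
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum(r / math.log2(i + 2) for i, r in enumerate(levels[:k]))

def ndcg_at_k(levels, k):
    """DCG divided by the DCG of the same levels sorted in the ideal order."""
    ideal = dcg_at_k(sorted(levels, reverse=True), k)
    return dcg_at_k(levels, k) / ideal if ideal > 0 else 0.0

# Hypothetical assessments for one ranked list, in rank order
# (0 = not relevant, 1 = partially relevant, 2 = highly relevant).
graded = [2, 0, 1, 0, 2]

p5 = precision_at_k(graded, 5)
ndcg5 = ndcg_at_k(graded, 5)
print(f"P@5 = {p5:.2f}, nDCG@5 = {ndcg5:.3f}")
```

Binary levels are the special case in which every level is 0 or 1, so the same functions cover both kinds of assessment.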

In contrast to this focus solely on topical relevance, the information science community has emphasized user studies that consider user relevance. These studies often focus on aspects of human-computer interaction (see also human-computer information retrieval).

Clustering and Relevance

The cluster hypothesis, proposed by C. J. van Rijsbergen in 1979, asserts that two documents that are similar to each other have a high likelihood of being relevant to the same information need. With respect to the embedding similarity space, the cluster hypothesis can be interpreted globally or locally. [F. Diaz, Autocorrelation and Regularization of Query-Based Retrieval Scores. PhD thesis, University of Massachusetts Amherst, Amherst, MA, February 2008, Chapter 3.] The global interpretation assumes that there exists some fixed set of underlying topics derived from inter-document similarity. These global clusters or their representatives can then be used to relate the relevance of two documents (e.g. two documents in the same cluster should both be relevant to the same request). Methods in this spirit include:
* cluster-based information retrieval [W. B. Croft, “A model of cluster searching based on classification,” Information Systems, vol. 5, pp. 189–195, 1980.] [A. Griffiths, H. C. Luckhurst, and P. Willett, “Using interdocument similarity information in document retrieval systems,” Journal of the American Society for Information Science, vol. 37, no. 1, pp. 3–11, 1986.]
* cluster-based document expansion such as latent semantic analysis or its language modeling equivalents. [X. Liu and W. B. Croft, “Cluster-based retrieval using language models,” in SIGIR ’04: Proceedings of the 27th annual international conference on Research and development in information retrieval, (New York, NY, USA), pp. 186–193, ACM Press, 2004.]

It is important to ensure that clusters, either in isolation or in combination, successfully model the set of possible relevant documents.
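The global interpretation can be sketched in a few lines of Python. The example below assumes that every document already has a query score and a single, fixed cluster label, and it blends each score with the mean score of the document's cluster. It is a deliberately simplified illustration, not the model of any of the works cited above; the function name, the interpolation weight `lam` and the sample data are all hypothetical.

```python
from collections import defaultdict

def cluster_smoothed_scores(scores, cluster_of, lam=0.5):
    """Interpolate each document's score with the mean score of its cluster.

    scores:     dict mapping document id -> initial query score
    cluster_of: dict mapping document id -> cluster label (fixed, global)
    lam:        weight on the cluster mean (0 keeps the original scores)
    """
    members = defaultdict(list)
    for doc, score in scores.items():
        members[cluster_of[doc]].append(score)
    means = {c: sum(v) / len(v) for c, v in members.items()}
    return {
        doc: (1 - lam) * score + lam * means[cluster_of[doc]]
        for doc, score in scores.items()
    }

# Hypothetical scores and cluster assignments.
scores = {"d1": 0.9, "d2": 0.1, "d3": 0.4}
cluster_of = {"d1": "A", "d2": "A", "d3": "B"}

smoothed = cluster_smoothed_scores(scores, cluster_of, lam=0.5)
for doc in sorted(smoothed, key=smoothed.get, reverse=True):
    print(doc, round(smoothed[doc], 3))
```

In this sketch the low-scoring document d2 is pulled upward because it shares a cluster with the high-scoring d1, which is the behaviour the global interpretation predicts when the clusters really do model the relevant documents.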

A second interpretation, most notably advanced by Ellen Voorhees, [E. M. Voorhees, “The cluster hypothesis revisited,” in SIGIR ’85: Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval, (New York, NY, USA), pp. 188–196, ACM Press, 1985.] focuses on the local relationships between documents. The local interpretation avoids having to model the number or size of clusters in the collection and allows relevance at multiple scales. Methods in this spirit include:
* multiple cluster retrieval
* spreading activation [S. Preece, A spreading activation network model for information retrieval. PhD thesis, University of Illinois, Urbana-Champaign, 1981.] and relevance propagation [T. Qin, T.-Y. Liu, X.-D. Zhang, Z. Chen, and W.-Y. Ma, “A study of relevance propagation for web search,” in SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, (New York, NY, USA), pp. 408–415, ACM Press, 2005.] methods
* local document expansion [A. Singhal and F. Pereira, “Document expansion for speech retrieval,” in SIGIR ’99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, (New York, NY, USA), pp. 34–41, ACM Press, 1999.]
* score regularization [F. Diaz, “Regularizing query-based retrieval scores,” Information Retrieval, vol. 10, pp. 531–562, December 2007.]

Local methods require an accurate and appropriate document similarity measure.
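The local interpretation can be illustrated with a small nearest-neighbour smoothing step in Python. Each document's score is blended with the similarity-weighted mean score of its k most similar documents, so no clusters need to be defined. This is a rough sketch in the spirit of the local methods listed above and not the algorithm of any cited work; the function name, the parameters `k` and `alpha`, and the similarity matrix are assumptions made for the example.

```python
def regularize_scores(scores, sim, k=2, alpha=0.5):
    """Blend each score with the weighted mean score of its k nearest neighbours.

    scores: list of initial query scores, one per document
    sim:    square matrix where sim[i][j] is the similarity of documents i and j
    k:      number of neighbours considered for each document
    alpha:  weight on the neighbourhood mean (0 keeps the original scores)
    """
    n = len(scores)
    result = []
    for i in range(n):
        # Most similar documents to i, excluding i itself.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sim[i][j],
            reverse=True,
        )[:k]
        total = sum(sim[i][j] for j in neighbours)
        if total > 0:
            local = sum(sim[i][j] * scores[j] for j in neighbours) / total
        else:
            local = scores[i]  # no usable neighbours: leave the score alone
        result.append((1 - alpha) * scores[i] + alpha * local)
    return result

# Hypothetical data: documents 0 and 1 are near-duplicates, document 2 is distant.
scores = [1.0, 0.0, 0.2]
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

regularized = regularize_scores(scores, sim, k=1, alpha=0.5)
print([round(s, 3) for s in regularized])
```

The result depends entirely on `sim`: if the similarity measure places unrelated documents close together, the smoothing spreads scores to the wrong neighbours.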

Epistemological issues

Are users best placed to evaluate the relevance of a given document, or is it better to use experts? Most research about relevance in information retrieval in recent years has implicitly assumed that users' evaluations of the output of a given system should be used to increase the "relevance" of that output. An alternative strategy would be to use journal impact factor to rank output and thus base relevance on expert evaluations. Other strategies may be used. The important thing to recognize, however, is that relevance is fundamentally a question of epistemology, not psychology. (People's psychology reflects certain epistemological influences.)

Additional reading

* Sperber, D.; Wilson, D. (2001). Relevance: Communication and Cognition. 2nd ed. Oxford; Cambridge, MA: Blackwell Publishers. ISBN 9780631198789

* Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: Nature and manifestations of relevance. Journal of the American Society for Information Science and Technology, 58(13), 1915-1933. (pdf: http://www.scils.rutgers.edu/~tefko/Saracevic%20relevance%20pt%20II%20JASIST%20%2707.pdf)

* Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance. Journal of the American Society for Information Science and Technology, 58(13), 2126-2144. (pdf: http://www.scils.rutgers.edu/~tefko/Saracevic%20relevance%20pt%20III%20JASIST%20%2707.pdf)

* Saracevic, T. (2007). Relevance in information science. Invited Annual Thomson Scientific Lazerow Memorial Lecture at School of Information Sciences, University of Tennessee. Sept. 19, 2007. (video: http://www.sis.utk.edu/lazerow2007)


Wikimedia Foundation. 2010.
