Document retrieval

Document retrieval is defined as the matching of a stated user query against a set of free-text records. These records can be any type of mainly unstructured text, such as newspaper articles, real estate records or paragraphs in a manual. User queries can range from a few words to multi-sentence descriptions of an information need.

Document retrieval is sometimes referred to as text retrieval, or treated as a branch of it. Text retrieval is a branch of information retrieval in which the information is stored primarily in the form of text. Text databases became decentralized thanks to the personal computer and the CD-ROM. Text retrieval is a critical area of study today, since it is the fundamental basis of internet search engines.

Contents

1 Description

2 Variations

3 Example: PubMed

4 References

5 See also

6 External links

Description

Document retrieval systems find information matching given criteria by comparing text records (documents) against user queries, as opposed to expert systems, which answer questions by inferring over a logical knowledge base. A document retrieval system consists of a database of documents, a classification algorithm to build a full-text index, and a user interface to access the database.

A document retrieval system has two main tasks:

Find documents relevant to user queries

Evaluate the matching results and sort them according to relevance, using algorithms such as PageRank.
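The two tasks above can be illustrated with a minimal sketch. It assumes a deliberately simple relevance score (the number of query-term occurrences in a document) in place of an algorithm such as PageRank; the function names and the sample documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_and_rank(query, documents):
    """Return (doc_id, score) pairs for matching documents, best first.

    The score is the total number of query-term occurrences in the
    document, an illustrative stand-in for a real relevance measure.
    """
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # task 1: keep only documents that match at all
            scored.append((doc_id, score))
    # task 2: sort by descending score, breaking ties by document id
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Made-up sample records for illustration only.
documents = {
    "d1": "House for sale near the river",
    "d2": "River flooding damages house; house owners seek aid",
    "d3": "Manual paragraph on printer setup",
}
results = retrieve_and_rank("river house", documents)
print(results)
```

Separating the matching step from the sorting step mirrors the two tasks: a production system would typically replace the counting score with a statistical or link-based measure while keeping the same overall shape.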

Internet search engines are classical applications of document retrieval. Most retrieval systems currently in use range from simple Boolean systems to systems using statistical or natural language processing techniques.
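A simple Boolean system can be sketched as a filter over the set of words in each document. The example below is only an illustration under assumed names: the `all_of`, `any_of` and `none_of` parameters (standing for AND, OR and NOT) and the sample documents are hypothetical, and no ranking is attempted.

```python
import re


def terms_of(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def boolean_search(documents, all_of=(), any_of=(), none_of=()):
    """Return sorted ids of documents satisfying the Boolean constraints."""
    hits = []
    for doc_id, text in documents.items():
        terms = terms_of(text)
        if not all(t in terms for t in all_of):  # AND: every term required
            continue
        if any_of and not any(t in terms for t in any_of):  # OR: at least one
            continue
        if any(t in terms for t in none_of):  # NOT: none may appear
            continue
        hits.append(doc_id)
    return sorted(hits)


# Made-up sample records for illustration only.
docs = {
    "d1": "Boolean retrieval with exact term matching",
    "d2": "Statistical retrieval with ranked results",
    "d3": "Natural language processing for question answering",
}
print(boolean_search(docs, all_of=["retrieval"], none_of=["statistical"]))
print(boolean_search(docs, any_of=["boolean", "natural"]))
```

A Boolean system of this kind returns an unordered set of matches; every document either satisfies the query or does not.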

Variations

There are two main classes of indexing schemata for document retrieval systems: form-based (or word-based) and content-based indexing. The document classification scheme (or indexing algorithm) in use determines the nature of the document retrieval system.

Form-based document retrieval addresses the exact syntactic properties of a text, comparable to substring matching in string searches. The text is generally unstructured and not necessarily in a natural language; the system could, for example, be used to process large sets of chemical representations in molecular biology. A suffix tree algorithm is an example of form-based indexing.
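The idea of form-based indexing can be illustrated with a short sketch. Note the substitution: instead of a suffix tree, it builds a suffix array, a simpler related structure that supports the same kind of exact substring lookup. The function names and the sample sequence are hypothetical, and the naive construction shown here would be too slow for large texts.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order."""
    # Naive construction that compares whole suffixes; fine for a sketch.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted start positions where pattern occurs in text."""
    m = len(pattern)
    # First suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes in [start, lo) begin with the pattern.
    return sorted(suffix_array[start:lo])


# Made-up sequence standing in for a non-natural-language text.
sequence = "GATTACAGATTA"
sa = build_suffix_array(sequence)
print(find_occurrences(sequence, sa, "ATTA"))
```

Because the suffixes are sorted, all occurrences of a pattern occupy one contiguous block of the array, which two binary searches can locate without scanning the whole text.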

The content-based approach exploits semantic connections between documents and parts thereof, and semantic connections between queries and documents. Most content-based document retrieval systems use an inverted index algorithm.
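An inverted index maps each term to the documents that contain it, so a query can be answered by intersecting a few small sets rather than scanning every record. The sketch below shows only the index structure, not any semantic analysis; the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def lookup(index, query):
    """Return sorted ids of documents containing every term in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    # Fetch the posting set of each term; unknown terms give an empty set.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Made-up sample records for illustration only.
corpus = {
    "d1": "Suffix trees index every substring",
    "d2": "An inverted index maps terms to documents",
    "d3": "Documents are matched against queries",
}
index = build_inverted_index(corpus)
print(lookup(index, "index"))
print(lookup(index, "index documents"))
```

The index is built once and reused for every query, which is what makes this structure attractive for large collections.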

Example: PubMed

The PubMed^[1] form interface features a "related articles" search, which compares words from the documents' titles, abstracts, and MeSH terms using a word-weighted algorithm.^[2]
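The general idea of a word-weighted comparison can be sketched as follows. This is not PubMed's actual algorithm, whose details are not given here: the sketch assumes TF-IDF weights and cosine similarity as a common stand-in, and the function names and sample records (each imagined as a title, abstract and MeSH terms joined into one string) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_vectors(records):
    """Return {doc_id: {term: weight}} using tf * log(N / df) weights."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in records.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    n = len(records)
    return {
        doc_id: {t: tf * math.log(n / doc_freq[t]) for t, tf in c.items()}
        for doc_id, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related(doc_id, vectors):
    """Return the other document ids, most similar first."""
    target = vectors[doc_id]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != doc_id]
    return [other for _, other in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Made-up sample records for illustration only.
records = {
    "a1": "Automatic MeSH term assignment for biomedical abstracts",
    "a2": "Quality assessment of automatic MeSH term assignment",
    "a3": "Suffix trees for genome sequence search",
}
vectors = weighted_vectors(records)
print(related("a1", vectors))
```

Weighting by inverse document frequency means that rare shared words count for more than common ones, which is the intuition behind word-weighted matching in general.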

References

^[1] Kim W, Aronson AR, Wilbur WJ (2001). "Automatic MeSH term assignment and quality assessment". Proc AMIA Symp: 319–23. PMC 2243528. PMID 11825203. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2243528.

^[2] Computation

See also

Compound term processing

Document classification

Enterprise search

Information retrieval

Latent semantic indexing

Search engine

External links

Document Summary System, a commercial product that performs document retrieval and summarization

Searching semantic information and XML/RDF documents

