Standard Boolean model

Standard Boolean model

The Boolean model of information retrieval (BIR) is a classical information retrieval (IR) model and, at the same time, the first and most adopted one. It is used by virtually all commercial IR systems today. The BIR is based on Boolean Logic and classical Sets Theory in that both the documents to be searched and the user's query are conceived as sets of terms. Retrieval is based on whether or not the documents contain the query terms. Given a finite set

T = {t1, t2, ..., tj, ..., tm}

of elements called index terms (e.g. words or expressions  which may be stemmed  describing or characterising documents such as keywords given for a journal article), a finite set

D = {D1, ..., Di, ..., Dn}, where Di is an element of the powerset of T

of elements called documents. Given a Boolean expression  in a normal form  Q called a query as follows:

Q = (Wi OR Wk OR ...) AND ... AND (Wj OR Ws OR ...), with Wi=ti, Wk=tk, Wj=tj, Ws=ts, or Wi=NON ti, Wk=NON tk, Wj=NON tj, Ws=NONts

where ti means that the term ti is present in document Di, whereas NON ti means that it is not.

Equivalently, Q can be given in a disjunctive normal form, too. An operation called retrieval, consisting of two steps, is defined as follows:

1. The sets Sj of documents are obtained that contain or not term tj (depending on whether Wj=tj or Wj=NON tj) :

Sj = {Di|Wj element of Di}

2. Those documents are retrieved in response to Q which are the result of the corresponding sets operations, i.e. the answer to Q is as follows:

UNION ( INTERSECTION Sj)

EXAMPLE.

Let the set of original (real) documents be, for example

O = {O1, O2, O3}

where

O1 = Bayes' Principle: The principle that, in estimating a parameter, one should initially assume that each possible value has equal probability (a uniform prior distribution).

O2 = Bayesian Decision Theory: A mathematical theory of decision-making which presumes utility and probability functions, and according to which the act to be chosen is the Bayes act, i.e. the one with highest Subjective Expected Utility. If one had unlimited time and calculating power with which to make every decision, this procedure would be the best way to make any decision.

O3 = Bayesian Epistemology: A philosophical theory which holds that the epistemic status of a proposition (i.e. how well proven or well established it is) is best measured by a probability and that the proper way to revise this probability is given by Bayesian conditionalisation or similar procedures. A Bayesian epistemologist would use probability to define, and explore the relationship between, concepts such as epistemic status, support or explanatory power.

Let the set T of terms be:

T = {t1 = Bayes' Principle, t2 = probability, t3 = decision-making, t4 = Bayesian Epistemology}

Then, the set D of documents is as follows:

D = {D1, D2, D3}

where

D1 = {Bayes' Principle, probability}

D2 = {probability, decision-making}

D3 = {probability, Bayesian Epistemology}

Let the query Q be:

Q = probability AND decision-making

1. Firstly, the following sets S1 and S2 of documents Di are obtained (retrieved):

S1 = {D1, D2, D3}

S2 = {D2}

2. Finally, the following documents Di are retrieved in response to Q:{D1, D2, D3} INTERSECTION {D2} = {D2}

This means that the original document O2 (corresponding to D2) is the answer to Q.

Obviously, if there are more than one documents with the same representation, every such document is retrieved. Such documents are, in the BIR, indistinguishable (or, in other words, equivalent). From a pure formal mathematical point of view, the BIR is straightforward. From a practical point of view, however, several further problems should be solved that relate to e.g. algorithms and data structures, such as, for example, the choice of terms (manual or automatic selection or both), stemming, hash tables, inverted file structure, and so on.


Wikimedia Foundation. 2010.

Игры ⚽ Поможем сделать НИР

Look at other dictionaries:

  • Boolean algebras canonically defined — Boolean algebras have been formally defined variously as a kind of lattice and as a kind of ring. This article presents them more neutrally but equally formally as simply the models of the equational theory of two values, and observes the… …   Wikipedia

  • Boolean logic — is a complete system for logical operations. It was named after George Boole, who first defined an algebraic system of logic in the mid 19th century. Boolean logic has many applications in electronics, computer hardware and software, and is the… …   Wikipedia

  • Model theory — This article is about the mathematical discipline. For the informal notion in other parts of mathematics and science, see Mathematical model. In mathematics, model theory is the study of (classes of) mathematical structures (e.g. groups, fields,… …   Wikipedia

  • Boolean satisfiability problem — For the concept in mathematical logic, see Satisfiability. 3SAT redirects here. For the Central European television network, see 3sat. In computer science, satisfiability (often written in all capitals or abbreviated SAT) is the problem of… …   Wikipedia

  • Non-standard model — See also Interpretation (logic) In model theory, a discipline within mathematical logic, a non standard model is a model of a theory that is not isomorphic to the intended model (or standard model). If the intended model is infinite and the… …   Wikipedia

  • List of Boolean algebra topics — This is a list of topics around Boolean algebra and propositional logic. Contents 1 Articles with a wide scope and introductions 2 Boolean functions and connectives 3 Examples of Boolean algebras …   Wikipedia

  • Relational model — The relational model for database management is a database model based on first order predicate logic, first formulated and proposed in 1969 by Edgar Codd. [ Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks , E.F …   Wikipedia

  • Java 2 Platform Standard Edition — Die Java Platform, Standard Edition oder kurz Java SE (deutsch „Java Plattform, Standardausgabe“ bis Version 5.0 „Java 2 Platform, Standard Edition“, J2SE [ˌdʒeɪˈtuː ˌɛsˈiː]) ist eine Sammlung von Java APIs. Die Java SE dient als Grundlage für… …   Deutsch Wikipedia

  • Java Platform Standard Edition — Die Java Platform, Standard Edition oder kurz Java SE (deutsch „Java Plattform, Standardausgabe“ bis Version 5.0 „Java 2 Platform, Standard Edition“, J2SE [ˌdʒeɪˈtuː ˌɛsˈiː]) ist eine Sammlung von Java APIs. Die Java SE dient als Grundlage für… …   Deutsch Wikipedia

  • Java Standard Edition — Die Java Platform, Standard Edition oder kurz Java SE (deutsch „Java Plattform, Standardausgabe“ bis Version 5.0 „Java 2 Platform, Standard Edition“, J2SE [ˌdʒeɪˈtuː ˌɛsˈiː]) ist eine Sammlung von Java APIs. Die Java SE dient als Grundlage für… …   Deutsch Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”