 Dempster–Shafer theory

The Dempster–Shafer theory (DST) is a mathematical theory of evidence.^{[1]} It allows one to combine evidence from different sources and arrive at a degree of belief (represented by a belief function) that takes into account all the available evidence. The theory was first developed by Arthur P. Dempster^{[2]} and Glenn Shafer.^{[1]}
In a narrow sense, the term Dempster–Shafer theory refers to the original conception of the theory by Dempster and Shafer. However, it is more common to use the term in the wider sense of the same general approach, as adapted to specific kinds of situations. In particular, many authors have proposed different rules for combining evidence, often with a view to handling conflicts in evidence better.^{[3]}
Overview
Dempster–Shafer theory is a generalization of the Bayesian theory of subjective probability; whereas the latter requires probabilities for each question of interest, belief functions base degrees of belief (or confidence, or trust) for one question on the probabilities for a related question. These degrees of belief may or may not have the mathematical properties of probabilities; how much they differ depends on how closely the two questions are related.^{[4]} Put another way, it is a way of representing epistemic plausibilities but it can yield answers that contradict those arrived at using probability theory.
Often used as a method of sensor fusion, Dempster–Shafer theory is based on two ideas: obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster's rule^{[5]} for combining such degrees of belief when they are based on independent items of evidence. In essence, the degree of belief in a proposition depends primarily upon the number of answers (to the related questions) containing the proposition, and the subjective probability of each answer. Also contributing are the rules of combination that reflect general assumptions about the data.
In this formalism a degree of belief (also referred to as a mass) is represented as a belief function rather than a Bayesian probability distribution. Probability values are assigned to sets of possibilities rather than single events: their appeal rests on the fact they naturally encode evidence in favor of propositions.
Dempster–Shafer theory assigns its masses to all of the nonempty subsets of the entities that comprise a system. Suppose, for example, that a system has five members, that is to say five independent states, exactly one of which is actual. If the original set is called S, so that |S| = 5, then the set of all subsets (the power set) is called 2^{S}. Since each possible subset can be expressed as a binary vector (describing whether any particular member is present or not by writing a “1” or a “0” for that member's slot), there are 2^{5} = 32 possible subsets (2^{|S|} in general), ranging from the empty subset (0, 0, 0, 0, 0) to the “everything” subset (1, 1, 1, 1, 1). The empty subset represents a contradiction, which is not true in any state, and is thus assigned a mass of zero; the remaining masses are normalised so that their total is 1. The “everything” subset is often labelled “unknown” as it represents the state where all elements are present, in the sense that you cannot tell which is actual.
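The binary-vector view of the power set can be sketched in a few lines of Python. This is only an illustration: the state labels a to e and the helper name power_set are assumptions made for the example and do not come from the theory itself.

```python
from itertools import product

def power_set(states):
    """Enumerate every subset of `states` through its binary membership vector."""
    subsets = []
    for bits in product((0, 1), repeat=len(states)):
        # A 1 in slot i means the i-th state belongs to this subset.
        subsets.append(frozenset(s for s, bit in zip(states, bits) if bit))
    return subsets

# Hypothetical labels for the five states of the example system.
S = ["a", "b", "c", "d", "e"]
subsets = power_set(S)

print(len(subsets))                 # number of subsets, 2 ** len(S)
print(subsets[0] == frozenset())    # vector (0, 0, 0, 0, 0): the empty subset
print(subsets[-1] == frozenset(S))  # vector (1, 1, 1, 1, 1): the "everything" subset
```

Representing subsets as frozenset objects keeps them hashable, so they can later serve as dictionary keys for mass assignments.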
Belief and plausibility
Shafer's framework allows for belief about propositions to be represented as intervals, bounded by two values, belief (or support) and plausibility:
 belief ≤ plausibility.
Belief in a hypothesis is constituted by the sum of the masses of all sets enclosed by it (i.e. the sum of the masses of all subsets of the hypothesis). It is the amount of belief that directly supports a given hypothesis at least in part, forming a lower bound. Plausibility is 1 minus the sum of the masses of all sets whose intersection with the hypothesis is empty. It is an upper bound on the possibility that the hypothesis could be true, i.e. it “could possibly be the true state of the system” up to that value, because there is only so much evidence that contradicts that hypothesis.
For example, suppose we have a belief of 0.5 and a plausibility of 0.8 for a proposition, say “the cat in the box is dead.” This means that we have evidence that allows us to state strongly that the proposition is true with a confidence of 0.5. However, the evidence contrary to that hypothesis (i.e. “the cat is alive”) only has a confidence of 0.2. The remaining mass of 0.3 (the gap between the 0.5 supporting evidence on the one hand, and the 0.2 contrary evidence on the other) is “indeterminate,” meaning that the cat could either be dead or alive. This interval represents the level of uncertainty based on the evidence in your system.
Hypothesis                      Mass   Belief   Plausibility
Null (neither alive nor dead)   0      0        0
Alive                           0.2    0.2      0.5
Dead                            0.5    0.5      0.8
Either (alive or dead)          0.3    1.0      1.0

The null hypothesis is set to zero by definition (it corresponds to “no solution”). The orthogonal hypotheses “Alive” and “Dead” have masses of 0.2 and 0.5, respectively. This could correspond to “Live/Dead Cat Detector” signals, which have respective reliabilities of 0.2 and 0.5. Finally, the all-encompassing “Either” hypothesis (which simply acknowledges there is a cat in the box) picks up the slack so that the sum of the masses is 1. The belief for the “Alive” and “Dead” hypotheses matches their corresponding masses because they have no nonempty proper subsets; belief for “Either” consists of the sum of all three masses (Either, Alive, and Dead) because “Alive” and “Dead” are each subsets of “Either”. The “Alive” plausibility is 1 − m(Dead) and the “Dead” plausibility is 1 − m(Alive). Finally, the “Either” plausibility sums m(Alive) + m(Dead) + m(Either). The universal hypothesis (“Either”) will always have 100% belief and plausibility; it acts as a checksum of sorts.
Here is a somewhat more elaborate example where the behavior of belief and plausibility begins to emerge. We're looking through a variety of detector systems at a single faraway signal light, which can only be coloured in one of three colours (red, yellow, or green):
Hypothesis        Mass   Belief   Plausibility
Null              0      0        0
Red               0.35   0.35     0.56
Yellow            0.25   0.25     0.45
Green             0.15   0.15     0.34
Red or Yellow     0.06   0.66     0.85
Red or Green      0.05   0.55     0.75
Yellow or Green   0.04   0.44     0.65
Any               0.1    1.0      1.0

Events of this kind would not be modeled as disjoint sets in probability space as they are here in mass assignment space. Rather the event "Red or Yellow" would be considered as the union of the events "Red" and "Yellow", and (see the axioms of probability theory) P(Red or Yellow) ≥ P(Yellow), and P(Any) = 1, where Any refers to Red or Yellow or Green. In DST the mass assigned to Any refers to the proportion of evidence that can't be assigned to any of the other states, which here means evidence that says there is a light but doesn't say anything about what color it is. In this example, the proportion of evidence that shows the light is either Red or Green is given a mass of 0.05. Such evidence might, for example, be obtained from a R/G color blind person. DST lets us extract the value of this sensor's evidence. Also, in DST the Null set is considered to have zero mass, meaning here that the signal light system exists and we are examining its possible states, not speculating as to whether it exists at all.
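The belief and plausibility columns of the signal-light table can be recomputed from the mass column alone. The sketch below assumes masses are stored in a dictionary keyed by frozenset; the function names belief and plausibility are illustrative choices, not an established API.

```python
def belief(mass, hypothesis):
    """Sum the masses of all focal sets contained in the hypothesis."""
    return sum(m for s, m in mass.items() if s <= hypothesis)

def plausibility(mass, hypothesis):
    """Sum the masses of all focal sets that intersect the hypothesis."""
    return sum(m for s, m in mass.items() if s & hypothesis)

R, Y, G = "red", "yellow", "green"
# Mass column of the signal-light table (the null set has mass 0 and is omitted).
mass = {
    frozenset({R}): 0.35,
    frozenset({Y}): 0.25,
    frozenset({G}): 0.15,
    frozenset({R, Y}): 0.06,
    frozenset({R, G}): 0.05,
    frozenset({Y, G}): 0.04,
    frozenset({R, Y, G}): 0.10,
}

for hypothesis in mass:
    label = " or ".join(sorted(hypothesis))
    print(f"{label:24s} bel={belief(mass, hypothesis):.2f} "
          f"pl={plausibility(mass, hypothesis):.2f}")
```

For every hypothesis the printed belief is no greater than the plausibility, which matches the interval interpretation given earlier.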
Combining beliefs
Beliefs corresponding to independent pieces of information are combined using Dempster's rule of combination, which is a generalization of the special case of Bayes' theorem where events are independent. Note that the probability masses from propositions that contradict each other can also be used to obtain a measure of how much conflict there is in a system. This measure has been used as a criterion for clustering multiple pieces of seemingly conflicting evidence around competing hypotheses.
In addition, one of the computational advantages of the Dempster–Shafer framework is that priors and conditionals need not be specified, unlike Bayesian methods, which often use a symmetry (minimax error) argument to assign prior probabilities to random variables (e.g. assigning 0.5 to binary values for which no information is available about which is more likely). However, any information contained in the missing priors and conditionals is not used in the Dempster–Shafer framework unless it can be obtained indirectly—and arguably is then available for calculation using Bayes equations.
Dempster–Shafer theory allows one to specify a degree of ignorance in this situation instead of being forced to supply prior probabilities that add to unity. This sort of situation, and whether there is a real distinction between risk and ignorance, has been extensively discussed by statisticians and economists. See, for example, the contrasting views of Daniel Ellsberg, Howard Raiffa, Kenneth Arrow and Frank Knight.
Formal definition
Let X be the universal set: the set representing all possible states of a system under consideration. The power set
$$2^X$$
is the set of all subsets of X, including the empty set $\emptyset$. For example, if:
$$X = \{a, b\}$$
then
$$2^X = \{\emptyset, \{a\}, \{b\}, X\}.$$
The elements of the power set can be taken to represent propositions concerning the actual state of the system, by containing all and only the states in which the proposition is true.
The theory of evidence assigns a belief mass to each element of the power set. Formally, a function
$$m : 2^X \rightarrow [0, 1]$$
is called a basic belief assignment (BBA), when it has two properties. First, the mass of the empty set is zero:
$$m(\emptyset) = 0.$$
Second, the masses of the remaining members of the power set add up to a total of 1:
$$\sum_{A \in 2^X} m(A) = 1.$$
The mass m(A) of A, a given member of the power set, expresses the proportion of all relevant and available evidence that supports the claim that the actual state belongs to A but to no particular subset of A. The value of m(A) pertains only to the set A and makes no additional claims about any subsets of A, each of which has, by definition, its own mass.
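The two defining properties of a basic belief assignment (zero mass on the empty set, and masses that sum to 1) are easy to check mechanically. The following is a minimal sketch; the function name is_valid_bba and the tolerance parameter are assumptions of this example.

```python
import math

def is_valid_bba(mass, frame, tol=1e-9):
    """Check that `mass` is a basic belief assignment over the frame X."""
    frame = frozenset(frame)
    # Every focal set must be a subset of the frame.
    if any(not s <= frame for s in mass):
        return False
    # Every mass must lie in the interval [0, 1].
    if any(m < 0.0 or m > 1.0 for m in mass.values()):
        return False
    # Property 1: the empty set carries no mass (a missing key counts as zero).
    if mass.get(frozenset(), 0.0) != 0.0:
        return False
    # Property 2: the masses add up to 1, within floating-point tolerance.
    return math.isclose(sum(mass.values()), 1.0, abs_tol=tol)

frame = {"alive", "dead"}
cat = {
    frozenset({"alive"}): 0.2,
    frozenset({"dead"}): 0.5,
    frozenset({"alive", "dead"}): 0.3,
}
print(is_valid_bba(cat, frame))
```

Note that the mass on the whole frame (0.3 here) is not split among the singletons; it stays attached to the set as a whole.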
From the mass assignments, the upper and lower bounds of a probability interval can be defined. This interval contains the precise probability of a set of interest (in the classical sense), and is bounded by two nonadditive continuous measures called belief (or support) and plausibility:
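The belief bel(A) for a set A is defined as the sum of all the masses of subsets of the set of interest:
$$\mathrm{bel}(A) = \sum_{B \mid B \subseteq A} m(B).$$
The plausibility pl(A) is the sum of all the masses of the sets B that intersect the set of interest A:
$$\mathrm{pl}(A) = \sum_{B \mid B \cap A \neq \emptyset} m(B).$$
The two measures are related to each other as follows:
$$\mathrm{pl}(A) = 1 - \mathrm{bel}(\overline{A}).$$
And conversely, for finite A, given the belief measure bel(B) for all subsets B of A, we can find the masses m(A) with the following inverse function:
$$m(A) = \sum_{B \mid B \subseteq A} (-1)^{|A - B|} \, \mathrm{bel}(B)$$
where |A − B| is the difference of the cardinalities of the two sets.^{[3]}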
It follows from the last two equations that, for a finite set X, you need know only one of the three (mass, belief, or plausibility) to deduce the other two; though you may need to know the values for many sets in order to calculate one of the other values for a particular set. In the case of an infinite X, there can be well-defined belief and plausibility functions but no well-defined mass function.^{[6]}
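The claim that belief alone determines mass and plausibility on a finite frame can be illustrated with the cat example. The sketch below assumes the inverse relation m(A) = Σ over B ⊆ A of (−1)^{|A − B|} bel(B) and the identity pl(A) = 1 − bel(complement of A); all function names are hypothetical.

```python
from itertools import combinations

def subsets(s):
    """Yield every subset of the finite set `s`, including the empty set."""
    items = sorted(s)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def mass_from_belief(bel, a):
    """Recover m(A) from a belief function by alternating-sign summation."""
    return sum((-1) ** len(a - b) * bel(b) for b in subsets(a))

def plausibility_from_belief(bel, frame, a):
    """pl(A) = 1 - bel(complement of A within the frame)."""
    return 1.0 - bel(frame - a)

frame = frozenset({"alive", "dead"})
cat = {
    frozenset({"alive"}): 0.2,
    frozenset({"dead"}): 0.5,
    frame: 0.3,
}

def bel(a):
    # Belief computed from the known masses, used here as the only input.
    return sum(m for s, m in cat.items() if s <= a)

for a in subsets(frame):
    print(sorted(a), round(mass_from_belief(bel, a), 10),
          round(plausibility_from_belief(bel, frame, a), 10))
```

The loop over all subsets of A is what makes the inversion expensive: recovering a single mass may require belief values for exponentially many sets.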
Dempster's rule of combination
The problem we now face is how to combine two independent sets of mass assignments. That is, how do we combine evidence from different sources? We do this through Dempster's rule of combination. This rule strongly emphasises the agreement between multiple sources and ignores all the conflicting evidence through a normalization factor. Use of that rule has come under serious criticism when significant conflict in the information is encountered.
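Specifically, the combination (called the joint mass) is calculated from the two sets of masses m_{1} and m_{2} in the following manner:
$$m_{1,2}(\emptyset) = 0$$
$$m_{1,2}(A) = (m_1 \oplus m_2)(A) = \frac{1}{1 - K} \sum_{B \cap C = A \neq \emptyset} m_1(B) \, m_2(C)$$
where
$$K = \sum_{B \cap C = \emptyset} m_1(B) \, m_2(C).$$
K is a measure of the amount of conflict between the two mass sets.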
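Dempster's rule can be sketched directly from its definition: multiply the masses of every pair of focal sets, credit the product to their intersection, accumulate the products with empty intersection as the conflict K, and divide what remains by 1 − K. The function name combine and the two-sensor masses below are assumptions made up for illustration.

```python
import math

def combine(m1, m2):
    """Combine two mass assignments with Dempster's rule.

    Returns the joint mass dictionary and the conflict K.
    """
    joint = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            intersection = b & c
            if intersection:
                joint[intersection] = joint.get(intersection, 0.0) + mass_b * mass_c
            else:
                # Products of contradictory focal sets make up K.
                conflict += mass_b * mass_c
    if math.isclose(conflict, 1.0):
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Normalise by 1 - K so that the joint masses sum to 1 again.
    return {a: v / (1.0 - conflict) for a, v in joint.items()}, conflict

# Hypothetical readings from two independent sensors on a red/green light.
RED, GREEN = frozenset({"red"}), frozenset({"green"})
ANY = RED | GREEN
sensor1 = {RED: 0.6, ANY: 0.4}
sensor2 = {GREEN: 0.3, ANY: 0.7}

joint, K = combine(sensor1, sensor2)
print("K =", round(K, 4))
for focal_set, value in joint.items():
    print(sorted(focal_set), round(value, 4))
```

Here only the pair (red, green) conflicts, so K = 0.6 × 0.3 = 0.18 and each surviving product is divided by 0.82.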
Effects of conflict
The normalization factor above, 1 − K, has the effect of completely ignoring conflict and attributing any mass associated with conflict to the null set. This combination rule for evidence can therefore produce counterintuitive results when there is significant conflict, as we show next.
Example with low conflict
The following example, where Dempster's rule is more appropriate, results from reversing the probability values of the high-conflict example (Zadeh's example).
 Suppose that one doctor believes a patient has either a brain tumor, with a probability of 0.99, or meningitis, with a probability of only 0.01. A second doctor also believes the patient has a brain tumor, with a probability of 0.99, and believes the patient suffers from concussion, with a probability of only 0.01. If we calculate m(brain tumor) with Dempster's rule, we obtain
$$m(\text{brain tumor}) = \mathrm{Bel}(\text{brain tumor}) = 1.$$
This result implies complete support for the diagnosis of a brain tumor, which both doctors believed very likely. The agreement arises from the low degree of conflict between the two sets of evidence comprised by the two doctors' opinions.
In either case, it would be reasonable to expect that:
$$m(\text{brain tumor}) < 1 \quad \text{and} \quad \mathrm{Bel}(\text{brain tumor}) < 1,$$
since the existence of nonzero belief probabilities for other diagnoses implies less than complete support for the brain tumor diagnosis.
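The arithmetic behind the low-conflict result can be written out under the assumption that each doctor's probabilities are read as masses on singleton diagnoses. The abbreviations T (brain tumor), M (meningitis) and C (concussion) are introduced here only for brevity. Only the pair (T, T) has a nonempty intersection, so every other product contributes to the conflict K:

```latex
\begin{align*}
K &= m_1(T)\,m_2(C) + m_1(M)\,m_2(T) + m_1(M)\,m_2(C) \\
  &= 0.99 \cdot 0.01 + 0.01 \cdot 0.99 + 0.01 \cdot 0.01 = 0.0199, \\
m_{1,2}(T) &= \frac{m_1(T)\,m_2(T)}{1 - K} = \frac{0.9801}{0.9801} = 1.
\end{align*}
```

The small residual conflict of about 2% is normalised away, which is why the combined mass reaches exactly 1 rather than a value slightly below it.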
Example with high conflict
The following example was introduced by Zadeh in 1979^{[7]}^{[8]}^{[9]} to point out the counterintuitive result generated by Dempster's rule.
Suppose that one has two equally reliable doctors. One doctor believes a patient has either a brain tumor, with a probability (i.e. a basic belief assignment, or mass of belief) of 0.99, or meningitis, with a probability of only 0.01. A second doctor believes the patient has either a concussion, with a probability of 0.99, or meningitis, with a probability of only 0.01. Applying Dempster's rule to combine these two sets of masses of belief, one finally gets m(meningitis) = 1 (meningitis is diagnosed with 100 percent confidence). Such a result goes against common sense, since both doctors agree that there is little chance that the patient has meningitis.
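The calculation behind this result is short. Assuming again that the stated probabilities are masses on singleton diagnoses, and writing T, M and C for brain tumor, meningitis and concussion (abbreviations introduced here), the only pair of focal sets with a nonempty intersection is (M, M):

```latex
\begin{align*}
K &= m_1(T)\,m_2(C) + m_1(T)\,m_2(M) + m_1(M)\,m_2(C) \\
  &= 0.99 \cdot 0.99 + 0.99 \cdot 0.01 + 0.01 \cdot 0.99 = 0.9999, \\
m_{1,2}(M) &= \frac{m_1(M)\,m_2(M)}{1 - K} = \frac{0.0001}{0.0001} = 1.
\end{align*}
```

Normalising by 1 − K = 0.0001 discards 99.99% of the product mass as conflict, so the one diagnosis that both doctors considered unlikely ends up carrying all of the combined belief.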
This example has been the starting point of many research works that try either to find a solid justification for Dempster's rule and for the foundations of Dempster–Shafer theory,^{[10]}^{[11]} or to show the inconsistencies of this theory.^{[12]}^{[13]}^{[14]}
Criticism
Judea Pearl (1988a, chapter 9;^{[15]} 1988b;^{[16]} 1990^{[17]}) has argued that it is misleading to interpret belief functions as representing either “probabilities of an event,” or “the confidence one has in the probabilities assigned to various outcomes,” or “degrees of belief (or confidence, or trust) in a proposition,” or “degree of ignorance in a situation.” Instead, belief functions represent the probability that a given proposition is provable from a set of other propositions, to which probabilities are assigned. Confusing probabilities of truth with probabilities of provability may lead to counterintuitive results in reasoning tasks such as (1) representing incomplete knowledge, (2) belief updating and (3) evidence pooling. He further demonstrated that, if partial knowledge is encoded and updated by belief function methods, the resulting beliefs cannot serve as a basis for rational decisions.
Kłopotek and Wierzchoń^{[18]} proposed to interpret the Dempster–Shafer theory in terms of statistics of decision tables (of the rough set theory), whereby the operator of combining evidence should be seen as a relational join of decision tables. In another interpretation, M.A. Kłopotek and S.T. Wierzchoń^{[19]} propose to view this theory as describing destructive material processing (under loss of properties), as in some semiconductor production processes. Under both interpretations, reasoning in DST gives correct results, contrary to the earlier probabilistic interpretations criticized by Pearl in the cited papers and by other researchers.
See also
 Imprecise probability
 Upper and lower probabilities
 Possibility theory
 Probability theory
 Bayes' theorem
 Bayesian network
 G. L. S. Shackle
 Transferable belief model
 Info-gap decision theory
 Subjective logic
 Doxastic logic
 Linear belief function
References
 ^ ^{a} ^{b} Shafer, Glenn; A Mathematical Theory of Evidence, Princeton University Press, 1976, ISBN 0608025089
 ^ Dempster, A. P. (1967). "Upper and lower probabilities induced by a multivalued mapping". The Annals of Mathematical Statistics 38 (2): 325–339. doi:10.1214/aoms/1177698950.
 ^ ^{a} ^{b} Kari Sentz and Scott Ferson (2002); Combination of Evidence in Dempster–Shafer Theory, Sandia National Laboratories SAND 2002-0835
 ^ Shafer, Glenn; Dempster–Shafer theory, 2002
 ^ Dempster, Arthur P.; A generalization of Bayesian inference, Journal of the Royal Statistical Society, Series B, Vol. 30, pp. 205–247, 1968
 ^ J.Y. Halpern (2003) Reasoning about Uncertainty MIT Press
 ^ L. Zadeh, On the validity of Dempster's rule of combination, Memo M79/24, Univ. of California, Berkeley, USA, 1979
 ^ L. Zadeh, Book review: A mathematical theory of evidence, The AI Magazine, Vol. 5, No. 3, pp. 81–83, 1984
 ^ L. Zadeh, A simple view of the Dempster–Shafer Theory of Evidence and its implication for the rule of combination, The AI Magazine, Vol. 7, No. 2, pp. 85–90, Summer 1986.
 ^ E. Ruspini, The logical foundations of evidential reasoning, SRI Technical Note 408, December 20, 1986 (revised April 27, 1987)
 ^ N. Wilson, The assumptions behind Dempster's rule, in Proceedings of the 9th Conference on Uncertainty in Artificial Intelligence, pages 527–534, Morgan Kaufmann Publishers, San Mateo, CA, USA, 1993
 ^ F. Voorbraak, On the justification of Dempster's rule of combination, Artificial Intelligence, Vol. 48, pp. 171–197, 1991
 ^ Pei Wang, A Defect in Dempster–Shafer Theory, in Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, pages 560–566, Morgan Kaufmann Publishers, San Mateo, CA, USA, 1994
 ^ P. Walley, Statistical Reasoning with Imprecise Probabilities, Chapman and Hall, London, pp. 278–281, 1991
 ^ Pearl, J. (1988a), Probabilistic Reasoning in Intelligent Systems, (Revised Second Printing) San Mateo, CA: Morgan Kaufmann.
 ^ Pearl, J. (1988b). "On Probability Intervals". International Journal of Approximate Reasoning 2 (3): 211–216. doi:10.1016/0888-613X(88)90117-X.
 ^ Pearl, J. (1990). "Reasoning with Belief Functions: An Analysis of Compatibility". The International Journal of Approximate Reasoning 4 (5/6): 363–389. doi:10.1016/0888-613X(90)90013-R.
 ^ M.A. Kłopotek, S.T. Wierzchoń: A New Qualitative Rough-Set Approach to Modeling Belief Functions. [in:] L. Polkowski, A. Skowron eds: Rough Sets And Current Trends In Computing. Proc. 1st International Conference RSCTC'98, Warsaw, June 22–26, 1998, Lecture Notes in Artificial Intelligence 1424, Springer-Verlag, pp. 346–353.
 ^ M.A. Kłopotek, S.T. Wierzchoń: Empirical Models for the Dempster–Shafer Theory. in: Srivastava, R.P., Mock, T.J., (Eds.). Belief Functions in Business Decisions. Series: Studies in Fuzziness and Soft Computing. Vol. 88, Springer-Verlag. March 2002. ISBN 3790814512, pp. 62–112
 Joseph C. Giarratano and Gary D. Riley (2005); Expert Systems: principles and programming, ed. Thomson Course Tech., ISBN 0534384471
Further reading
 Yager, R. R., & Liu, L. (2008). Classic works of the Dempster–Shafer theory of belief functions. Studies in fuzziness and soft computing, v. 219. Berlin: Springer. ISBN 9783540253815.
Wikimedia Foundation. 2010.