Multifactor dimensionality reduction

Multifactor dimensionality reduction (MDR) is a data mining approach for detecting and characterizing combinations of attributes or independent variables that interact to influence a dependent or class variable. MDR was designed specifically to identify interactions among discrete variables that influence a binary outcome and is considered a nonparametric alternative to traditional statistical methods such as logistic regression.

The basis of the MDR method is a constructive induction algorithm that converts two or more variables or attributes to a single attribute. This process of constructing a new attribute changes the representation space of the data. The end goal is to create or discover a representation that facilitates the detection of nonlinear or nonadditive interactions among the attributes such that prediction of the class variable is improved over that of the original representation of the data.

Contents

1 Illustrative example

2 Data mining with MDR

3 Applications

4 Software

5 See also

6 References

7 Further reading

Illustrative example

Consider the following simple example using the exclusive OR (XOR) function. XOR is a logical operator that is commonly used in data mining and machine learning as an example of a function that is not linearly separable. The table below represents a simple dataset where the relationship between the attributes (X1 and X2) and the class variable (Y) is defined by the XOR function such that Y = X1 XOR X2.

Table 1

X1	X2	Y
0	0	0
0	1	1
1	0	1
1	1	0
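As a rough illustration of why XOR is called not linearly separable, the sketch below searches a grid of linear threshold rules of the form "predict 1 when w1*X1 + w2*X2 + b > 0" and counts how many reproduce Table 1. The grid, the helper name linear_rule_fits, and the OR table used for contrast are assumptions made for this example; a grid search only illustrates the point and is not a proof.

```python
from itertools import product

# Table 1 from the text: rows of (X1, X2, Y) with Y = X1 XOR X2.
XOR_TABLE = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
# For contrast only (not in the article): Y = X1 OR X2, which is linearly separable.
OR_TABLE = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]


def linear_rule_fits(w1, w2, b, rows):
    """True if 'predict 1 when w1*X1 + w2*X2 + b > 0' matches every row."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(y) for x1, x2, y in rows)


# Hypothetical search grid: weights and bias from -5 to 5 in steps of 0.5.
grid = [i / 2 for i in range(-10, 11)]

xor_fits = [p for p in product(grid, repeat=3) if linear_rule_fits(*p, XOR_TABLE)]
or_fits = [p for p in product(grid, repeat=3) if linear_rule_fits(*p, OR_TABLE)]

print(len(xor_fits))     # 0: no rule on the grid reproduces XOR
print(len(or_fits) > 0)  # True: many rules reproduce OR
```

No choice of weights can work for XOR in general: the rows with Y=1 require w1 + b > 0 and w2 + b > 0, the row (0, 0) requires b <= 0, and together these force w1 + w2 + b > 0, which contradicts the row (1, 1).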

A data mining algorithm would need to discover or approximate the XOR function in order to accurately predict Y using information about X1 and X2. An alternative strategy would be to first change the representation of the data using constructive induction to facilitate predictive modeling. The MDR algorithm would change the representation of the data (X1 and X2) in the following manner.

MDR starts by selecting two attributes. In this simple example, X1 and X2 are selected. Each combination of values for X1 and X2 is examined, and the number of times Y=1 and the number of times Y=0 are counted. In this simple example, Y=1 occurs zero times and Y=0 occurs once for the combination of X1=0 and X2=0. With MDR, the ratio of these counts is computed and compared to a fixed threshold. Here, the ratio of counts is 0/1, which is less than our fixed threshold of 1. Since 0/1 < 1, we encode a new attribute (Z) as a 0. When the ratio is greater than one, we encode Z as a 1. For X1=0 and X2=1, for instance, Y=1 occurs once and Y=0 never occurs, so the ratio 1/0 is unbounded, exceeds the threshold, and Z is encoded as a 1.

This process is repeated for all unique combinations of values for X1 and X2. Table 2 illustrates our new transformation of the data.

Table 2

Z	Y
0	0
1	1
1	1
0	0

The data mining algorithm now has much less work to do to find a good predictive function. In fact, in this very simple example, the function Y = Z has a classification accuracy of 1. A nice feature of constructive induction methods such as MDR is the ability to use any data mining or machine learning method to analyze the new representation of the data. Decision trees, neural networks, or a naive Bayes classifier could be used.
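The encoding step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference MDR implementation: the function name mdr_encode is hypothetical, a combination where Y=0 never occurs is treated as having an infinite ratio, and a ratio exactly equal to the threshold is encoded as 0 (the text does not say how ties are handled).

```python
from collections import Counter

# Table 1 from the text: rows of (X1, X2, Y) with Y = X1 XOR X2.
TABLE_1 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]


def mdr_encode(rows, threshold=1.0):
    """Map each observed (X1, X2) combination to a new binary attribute Z.

    rows: iterable of (x1, x2, y) tuples with y in {0, 1}.
    Returns a dict {(x1, x2): z}.
    """
    ones = Counter()   # times Y=1 was seen for each combination
    zeros = Counter()  # times Y=0 was seen for each combination
    for x1, x2, y in rows:
        if y == 1:
            ones[(x1, x2)] += 1
        else:
            zeros[(x1, x2)] += 1

    encoding = {}
    for combo in set(ones) | set(zeros):
        n1, n0 = ones[combo], zeros[combo]
        # Assumption: with no Y=0 observations the ratio is treated as infinite.
        ratio = n1 / n0 if n0 > 0 else float("inf")
        # Assumption: a ratio equal to the threshold is encoded as 0.
        encoding[combo] = 1 if ratio > threshold else 0
    return encoding


encoding = mdr_encode(TABLE_1)
table_2 = [(encoding[(x1, x2)], y) for x1, x2, y in TABLE_1]
accuracy = sum(z == y for z, y in table_2) / len(table_2)

print(table_2)   # [(0, 0), (1, 1), (1, 1), (0, 0)]
print(accuracy)  # 1.0
```

The printed rows match Table 2, and the accuracy of 1.0 corresponds to the rule Y = Z.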

Data mining with MDR

As illustrated above, the basic constructive induction algorithm in MDR is very simple. However, its implementation for mining patterns from real data can be computationally complex. As with any data mining algorithm, overfitting is a concern: data mining algorithms are good at finding patterns even in completely random data, so it is often difficult to determine whether a reported pattern is an important signal or just chance. One approach is to estimate the generalizability of a model to independent datasets using methods such as cross-validation. Models that describe random data typically do not generalize. Another approach is to generate many random permutations of the data to see what the data mining algorithm finds when given the chance to overfit. Permutation testing makes it possible to generate an empirical p-value for the result. These approaches have all been shown to be useful for choosing and evaluating MDR models.
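The permutation-testing idea above can be sketched as follows. Everything here is an assumption made for illustration: the synthetic dataset (200 samples, XOR labels with every tenth label flipped), the helper names mdr_accuracy and permutation_p_value, the use of training accuracy as the test statistic, and the (count + 1) / (permutations + 1) form of the empirical p-value. MDR software may compute these quantities differently.

```python
import random
from collections import Counter


def mdr_accuracy(xs, ys):
    """Training accuracy of an MDR-style rule on attribute combinations xs.

    Each combination is labeled 1 when Y=1 outnumbers Y=0 (ratio > 1), else 0.
    """
    ones, totals = Counter(), Counter()
    for x, y in zip(xs, ys):
        totals[x] += 1
        ones[x] += y
    correct = 0
    for combo, n in totals.items():
        n1 = ones[combo]
        n0 = n - n1
        # Z=1 classifies the n1 cases correctly; Z=0 classifies the n0 controls.
        correct += n1 if n1 > n0 else n0
    return correct / len(ys)


def permutation_p_value(xs, ys, n_permutations=1000, seed=0):
    """Empirical p-value: how often shuffled labels score at least as well."""
    rng = random.Random(seed)
    observed = mdr_accuracy(xs, ys)
    shuffled = list(ys)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real link between xs and ys
        if mdr_accuracy(xs, shuffled) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


# Hypothetical dataset: two binary attributes, Y = X1 XOR X2 with 10% of labels flipped.
data_rng = random.Random(42)
xs = [(data_rng.randint(0, 1), data_rng.randint(0, 1)) for _ in range(200)]
ys = [(x1 ^ x2) if i % 10 else 1 - (x1 ^ x2) for i, (x1, x2) in enumerate(xs)]

observed, p_value = permutation_p_value(xs, ys)
print(observed, p_value)
```

Because the rule is refit on every shuffled dataset, the permuted accuracies tend to sit somewhat above 0.5. That upward drift is the overfitting the permutation distribution is meant to expose, and the observed accuracy is judged against it rather than against chance alone.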

Applications

MDR has mostly been applied to detecting gene-gene interactions or epistasis in genetic studies of common human diseases such as atrial fibrillation, autism, bladder cancer, breast cancer, cardiovascular disease, hypertension, prostate cancer, schizophrenia, and type II diabetes. However, it can be applied to other domains, such as economics, engineering, and meteorology, where interactions among discrete attributes might be important for predicting a binary outcome.

Software

An open-source, freely available MDR software package is provided at www.epistasis.org.

See also

Dimensionality reduction

Machine Learning

Data Mining

References

Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet. 2001 Jul;69(1):138-47. PubMed

Moore JH, Williams SM. New strategies for identifying gene-gene interactions in hypertension. Ann Med. 2002;34(2):88-95. PubMed

Ritchie MD, Hahn LW, Moore JH. Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genet Epidemiol. 2003 Feb;24(2):150-7. PubMed

Hahn LW, Ritchie MD, Moore JH. Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics. 2003 Feb 12;19(3):376-82. PubMed

Moore JH. The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Hum Hered. 2003;56(1-3):73-82. PubMed

Cho YM, Ritchie MD, Moore JH, Park JY, Lee KU, Shin HD, Lee HK, Park KS. Multifactor-dimensionality reduction shows a two-locus interaction associated with Type 2 diabetes mellitus. Diabetologia. 2004 Mar;47(3):549-54. PubMed

Tsai CT, Lai LP, Lin JL, Chiang FT, Hwang JJ, Ritchie MD, Moore JH, Hsu KL, Tseng CD, Liau CS, Tseng YZ. Renin-angiotensin system gene polymorphisms and atrial fibrillation. Circulation. 2004 Apr 6;109(13):1640-6. PubMed

Hahn LW, Moore JH. Ideal discrimination of discrete clinical endpoints using multilocus genotypes. In Silico Biol. 2004;4(2):183-94. PubMed

Coffey CS, Hebert PR, Ritchie MD, Krumholz HM, Gaziano JM, Ridker PM, Brown NJ, Vaughan DE, Moore JH. An application of conditional logistic regression and multifactor dimensionality reduction for detecting gene-gene interactions on risk of myocardial infarction: the importance of model validation. BMC Bioinformatics. 2004 Apr 30;5:49. PubMed

Moore JH. Computational analysis of gene-gene interactions using multifactor dimensionality reduction. Expert Rev Mol Diagn. 2004 Nov;4(6):795-803. PubMed

Williams SM, Ritchie MD, Phillips JA 3rd, Dawson E, Prince M, Dzhura E, Willis A, Semenya A, Summar M, White BC, Addy JH, Kpodonu J, Wong LJ, Felder RA, Jose PA, Moore JH. Multilocus analysis of hypertension: a hierarchical approach. Hum Hered. 2004;57(1):28-38. PubMed

Bastone L, Reilly M, Rader DJ, Foulkes AS. MDR and PRP: a comparison of methods for high-order genotype-phenotype associations. Hum Hered. 2004;58(2):82-92. PubMed

Ma DQ, Whitehead PL, Menold MM, Martin ER, Ashley-Koch AE, Mei H, Ritchie MD, Delong GR, Abramson RK, Wright HH, Cuccaro ML, Hussman JP, Gilbert JR, Pericak-Vance MA. Identification of significant association and gene-gene interaction of GABA receptor subunit genes in autism. Am J Hum Genet. 2005 Sep;77(3):377-88. PubMed

Soares ML, Coelho T, Sousa A, Batalov S, Conceicao I, Sales-Luis ML, Ritchie MD, Williams SM, Nievergelt CM, Schork NJ, Saraiva MJ, Buxbaum JN. Susceptibility and modifier genes in Portuguese transthyretin V30M amyloid polyneuropathy: complexity in a single-gene disease. Hum Mol Genet. 2005 Feb 15;14(4):543-53. PubMed

Qin S, Zhao X, Pan Y, Liu J, Feng G, Fu J, Bao J, Zhang Z, He L. An association study of the N-methyl-D-aspartate receptor NR1 subunit gene (GRIN1) and NR2B subunit gene (GRIN2B) in schizophrenia with universal DNA microarray. Eur J Hum Genet. 2005 Jul;13(7):807-14. PubMed

Wilke RA, Moore JH, Burmester JK. Relative impact of CYP3A genotype and concomitant medication on the severity of atorvastatin-induced muscle damage. Pharmacogenet Genomics. 2005 Jun;15(6):415-21. PubMed

Xu J, Lowey J, Wiklund F, Sun J, Lindmark F, Hsu FC, Dimitrov L, Chang B, Turner AR, Liu W, Adami HO, Suh E, Moore JH, Zheng SL, Isaacs WB, Trent JM, Gronberg H. The interaction of four genes in the inflammation pathway significantly predicts prostate cancer risk. Cancer Epidemiol Biomarkers Prev. 2005 Nov;14(11 Pt 1):2563-8. PubMed

Wilke RA, Reif DM, Moore JH. Combinatorial pharmacogenetics. Nat Rev Drug Discov. 2005 Nov;4(11):911-8. PubMed

Ritchie MD, Motsinger AA. Multifactor dimensionality reduction for detecting gene-gene and gene-environment interactions in pharmacogenomics studies. Pharmacogenomics. 2005 Dec;6(8):823-34. PubMed

Andrew AS, Nelson HH, Kelsey KT, Moore JH, Meng AC, Casella DP, Tosteson TD, Schned AR, Karagas MR. Concordance of multiple analytical approaches demonstrates a complex relationship between DNA repair gene SNPs, smoking and bladder cancer susceptibility. Carcinogenesis. 2006 May;27(5):1030-7. PubMed

Moore JH, Gilbert JC, Tsai CT, Chiang FT, Holden T, Barney N, White BC. A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. J Theor Biol. 2006 Jul 21;241(2):252-61. PubMed

Further reading

R. S. Michalski, "Pattern Recognition as Knowledge-Guided Computer Induction," Department of Computer Science Reports, No. 927, University of Illinois, Urbana, June 1978.
