Random forest

In machine learning, a random forest is a classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. The algorithm for inducing a random forest was developed by Leo Breiman and Adele Cutler, and "Random Forests" is their trademark. The term comes from "random decision forests", which were first proposed by Tin Kam Ho of Bell Labs in 1995. The method combines Breiman's "bagging" idea with Ho's "random subspace method" to construct a collection of decision trees with controlled variation.
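The voting rule described above can be illustrated with a minimal sketch. The name `forest_predict` and the three toy "trees" are hypothetical, chosen only for this illustration; each tree is assumed to be a plain function from a feature vector to a class label.

```python
from collections import Counter

def forest_predict(trees, x):
    """Return the mode of the class labels predicted by the individual trees."""
    votes = [tree(x) for tree in trees]
    # most_common(1) gives the label with the most votes; on a tie, the label
    # that was seen first wins (an arbitrary choice made for this sketch).
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical trees, each reduced to a single hand-written rule.
trees = [
    lambda x: "a" if x[0] > 0.5 else "b",
    lambda x: "a" if x[1] > 0.5 else "b",
    lambda x: "a" if x[0] + x[1] > 1.0 else "b",
]

print(forest_predict(trees, [0.9, 0.2]))  # two of three trees vote "a", prints: a
```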

Learning algorithm

Each tree is constructed using the following algorithm:
# Let the number of training cases be "N", and the number of variables in the classifier be "M".
# The number "m" of input variables used to determine the decision at a node of the tree is given as a parameter; "m" should be much less than "M".
# Choose a training set for this tree by sampling "N" times with replacement from all "N" available training cases (i.e. take a bootstrap sample). Use the cases left out of the sample to estimate the error of the tree, by predicting their classes.
# At each node of the tree, randomly choose "m" of the "M" variables on which to base the decision at that node, and calculate the best split of the training set on these "m" variables.
# Grow each tree fully, without pruning (pruning may be done when constructing a normal tree classifier).
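The steps above can be sketched in Python for a single tree. This is an illustrative sketch under stated assumptions, not the reference implementation: the function names are hypothetical, features are assumed numeric, splits take the form `x[f] <= t`, and Gini impurity stands in for the "best split" criterion, which the text does not specify. Class labels must not be tuples, because tuples represent internal nodes here.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, rows, features):
    """Best (feature, threshold, left, right) over `features`, or None."""
    best, best_score = None, None
    for f in features:
        for t in sorted({X[i][f] for i in rows}):
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best_score is None or score < best_score:
                best, best_score = (f, t, left, right), score
    return best

def grow(X, y, rows, m, rng):
    """Grow a node fully (no pruning); a leaf is a label, a node is a tuple."""
    labels = [y[i] for i in rows]
    if len(set(labels)) == 1:
        return labels[0]
    features = rng.sample(range(len(X[0])), m)  # m of the M variables, at random
    split = best_split(X, y, rows, features)
    if split is None:  # simplification: the chosen variables cannot separate the rows
        return Counter(labels).most_common(1)[0][0]
    f, t, left, right = split
    return (f, t, grow(X, y, left, m, rng), grow(X, y, right, m, rng))

def build_tree(X, y, m, rng):
    """Draw a bootstrap sample of size N, grow a tree, return it with the left-out cases."""
    N = len(X)
    sample = [rng.randrange(N) for _ in range(N)]  # N draws with replacement
    left_out = sorted(set(range(N)) - set(sample))
    return grow(X, y, sample, m, rng), left_out

def predict(tree, x):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

# Toy data: N = 4 cases, M = 3 variables, m = 1 variable tried per node.
X = [[0.1, 1.0, 5.0], [0.2, 0.9, 4.0], [0.8, 0.2, 5.5], [0.9, 0.1, 4.5]]
y = ["b", "b", "a", "a"]
tree, left_out = build_tree(X, y, m=1, rng=random.Random(0))
print(tree, left_out)
```

The cases returned in `left_out` are the ones the third step sets aside for estimating the error of this tree.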

Advantages

The advantages of random forest are:
* For many data sets, it produces a highly accurate classifier.
* It handles a very large number of input variables.
* It estimates the importance of variables in determining classification.
* It generates an internal unbiased estimate of the generalization error as the forest building progresses.
* It includes a good method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
* It provides an experimental way to detect variable interactions.
* It can balance error in data sets with unbalanced class populations.
* It computes proximities between cases, useful for clustering, detecting outliers, and (by scaling) visualizing the data.
* Using these proximities, it can be extended to unlabeled data, leading to unsupervised clustering, outlier detection and data views.
* Learning is fast.
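The internal error estimate mentioned in the list can be sketched as follows: each case is scored only by the trees whose bootstrap sample left it out, and the majority of those votes is compared with its true class. This is a minimal sketch under assumptions. The names `oob_error` and `fit_nearest` are hypothetical, and a 1-nearest-neighbour learner stands in for a decision tree purely to keep the example short; it is not part of the random forest method.

```python
import random
from collections import Counter

def oob_error(X, y, fit, n_trees, rng):
    """Fraction of cases misclassified by the models that did not train on them.

    `fit(X_sub, y_sub)` must return a callable mapping a feature vector to a label.
    Returns None if no case was ever left out of a bootstrap sample.
    """
    N = len(X)
    votes = [Counter() for _ in range(N)]
    for _ in range(n_trees):
        sample = [rng.randrange(N) for _ in range(N)]  # bootstrap sample
        in_bag = set(sample)
        model = fit([X[i] for i in sample], [y[i] for i in sample])
        for i in range(N):
            if i not in in_bag:  # only left-out cases receive a vote
                votes[i][model(X[i])] += 1
    scored = [i for i in range(N) if votes[i]]
    if not scored:
        return None
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)

def fit_nearest(X_sub, y_sub):
    """Stand-in base learner (1-nearest neighbour), NOT a decision tree."""
    def model(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X_sub]
        return y_sub[dists.index(min(dists))]
    return model

# Toy data: two well-separated groups of cases.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2],
     [1.0, 0.9], [0.9, 1.0], [1.1, 1.0], [1.0, 1.1]]
y = ["b", "b", "b", "b", "a", "a", "a", "a"]
print(oob_error(X, y, fit_nearest, n_trees=25, rng=random.Random(0)))
```

Because every case is judged only by models that never saw it, no separate test set is needed for this estimate.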

External links

* [http://cm.bell-labs.com/cm/cs/who/tkh/papers/odt.pdf Ho, Tin Kam (1995). "Random Decision Forest". Proc. of the 3rd Int'l Conf. on Document Analysis and Recognition, Montreal, Canada, August 14-18, 1995, 278-282] (Preceding Work)
* [http://cm.bell-labs.com/cm/cs/who/tkh/papers/df.pdf Ho, Tin Kam (1998). "The Random Subspace Method for Constructing Decision Forests". IEEE Trans. on Pattern Analysis and Machine Intelligence 20 (8), 832-844] (Preceding Work)
* [http://www.cis.jhu.edu/publications/papers_in_database/GEMAN/shape.pdf Amit, Yali and Geman, Donald (1997) "Shape quantization and recognition with randomized trees". Neural Computation 9, 1545-1588.] (Preceding work)
* [http://www.ics.uci.edu/~liang/seminars/win05/papers/wald2002-2.pdf Breiman, Leo "Looking Inside The Black Box". Wald Lecture II] (Lecture)
* [http://www.springerlink.com/content/u0p06167n6173512/fulltext.pdf Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1), 5-32] (Original Article)
* [http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm Random Forest classifier description] (Site of Leo Breiman)
* [http://cran.r-project.org/doc/Rnews/Rnews_2002-3.pdf Liaw, Andy & Wiener, Matthew "Classification and Regression by randomForest" R News (2002) Vol. 2/3 p. 18] (Discussion of the use of the random forest package for R)
* [http://cm.bell-labs.com/cm/cs/who/tkh/papers/compare.pdf Ho, Tin Kam (2002). "A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors". Pattern Analysis and Applications 5, p. 102-112] (Comparison of bagging and random subspace method)
* [http://dx.doi.org/10.1007/978-3-540-74469-6_35 Prinzie, A., Van den Poel, D. (2007). Random Multiclass Classification: Generalizing Random Forests to Random MNL and Random NB, Dexa 2007, Lecture Notes in Computer Science, 4653, 349-358.] Generalizing Random Forest framework to other methods. The paper introduces Random MNL and Random NB as two generalizations of Random Forests.
* [http://dx.doi.org/10.1016/j.eswa.2007.01.029 Prinzie, A., Van den Poel, D. (2008). Random Forests for multiclass classification: Random MultiNomial Logit, Expert Systems with Applications, 34(3), 1721-1732.] Generalization of Random Forests to choice models like the Multinomial Logit Model (MNL): Random Multinomial Logit.

See also

*Random multinomial logit
*Random naive bayes


Wikimedia Foundation. 2010.
