Semi-supervised learning

In computer science, semi-supervised learning is a class of machine learning techniques that use both labeled and unlabeled data for training, typically a small amount of labeled data together with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (no labeled training data) and supervised learning (fully labeled training data). Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can considerably improve learning accuracy. Acquiring labeled data for a learning problem often requires a skilled human agent to classify training examples manually. The cost of labeling may therefore make a fully labeled training set infeasible, whereas unlabeled data is relatively inexpensive to acquire. In such situations, semi-supervised learning can be of great practical value.

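One example of a semi-supervised learning technique is co-training, in which two or more learners are each trained on the same set of examples, but with each learner using a different, and ideally independent, set of features for each example. Each learner then labels the unlabeled examples it is most confident about, and those newly labeled examples are added to the training data available to the other learners.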
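The co-training idea can be illustrated with a minimal sketch. It assumes two one-dimensional feature views and uses a nearest-centroid rule as each learner; the function names (`fit_centroids`, `predict`, `co_train`), the margin-based confidence score, and the toy data are hypothetical choices made for illustration, not part of any published co-training implementation.

```python
# Co-training sketch: one nearest-centroid learner per feature view.
# All names and data below are hypothetical illustrations.

def fit_centroids(values, labels):
    """Return the mean feature value of each class for a 1-D view."""
    sums, counts = {}, {}
    for v, y in zip(values, labels):
        sums[y] = sums.get(y, 0.0) + v
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, v):
    """Return (label, confidence) for a value in one view.

    Confidence is the gap between the distances to the two nearest
    centroids (an assumed, simple confidence measure).
    """
    ranked = sorted(centroids, key=lambda y: abs(v - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(v - centroids[runner_up]) - abs(v - centroids[best])
    return best, margin


def co_train(labeled, unlabeled, rounds=10):
    """labeled: list of ((view_a, view_b), label); unlabeled: list of (view_a, view_b).

    Assumes at least two classes appear in the labeled data.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            # Train this view's learner on everything labeled so far.
            centroids = fit_centroids([x[view] for x, _ in labeled],
                                      [y for _, y in labeled])
            # Let it label the pool example it is most confident about.
            scored = [(predict(centroids, x[view]), x) for x in pool]
            (label, _), x = max(scored, key=lambda s: s[0][1])
            # The new label becomes training data for the other view too.
            labeled.append((x, label))
            pool.remove(x)
    models = [fit_centroids([x[v] for x, _ in labeled], [y for _, y in labeled])
              for v in (0, 1)]
    return models, labeled


seed = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
pool = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.3), (0.8, 0.7)]
models, grown = co_train(seed, pool)
```

Here `grown` holds the two seed examples plus every pool example with the label assigned during co-training, and `models` holds one centroid table per view. Practical systems typically add several examples per round and use stronger base learners.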

An alternative approach is to model the joint probability distribution of the features and the labels. For the unlabeled data, the labels can then be treated as 'missing data'. It is common to use the expectation-maximization (EM) algorithm to maximize the likelihood of the model.
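As a rough sketch of this generative approach, the example below fits one one-dimensional Gaussian per class with EM, treating the labels of the unlabeled points as missing data. The Gaussian class model, the variance floor `min_var`, the function names, and the toy data are all assumptions made for illustration; the text above does not prescribe a particular model.

```python
import math

# Semi-supervised EM sketch for a 1-D Gaussian model per class.
# Model choice, names, and data are hypothetical illustrations.


def log_joint(x, prior, mean, var):
    """log p(x, y=k) for a Gaussian class-conditional density."""
    return (math.log(prior)
            - 0.5 * math.log(2.0 * math.pi * var)
            - (x - mean) ** 2 / (2.0 * var))


def posterior(x, params):
    """p(y=k | x) for every class k, computed stably in log space."""
    logs = [log_joint(x, *p) for p in params]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    norm = sum(weights)
    return [w / norm for w in weights]


def m_step(xs, resp, n_classes, min_var):
    """Re-estimate (prior, mean, variance) per class from soft assignments."""
    params = []
    for k in range(n_classes):
        weight = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, xs)) / weight
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / weight
        # min_var is an assumed floor that keeps the variance positive.
        params.append((weight / len(xs), mean, max(var, min_var)))
    return params


def semi_supervised_em(labeled, unlabeled, n_classes=2, iters=50, min_var=1e-2):
    """labeled: list of (x, class_index); unlabeled: list of x.

    Assumes every class has at least one labeled example.
    """
    lab_x = [x for x, _ in labeled]
    # Observed labels act as fixed one-hot responsibilities.
    lab_resp = [[1.0 if k == y else 0.0 for k in range(n_classes)]
                for _, y in labeled]
    # Initialise the parameters from the labeled data alone.
    params = m_step(lab_x, lab_resp, n_classes, min_var)
    xs = lab_x + list(unlabeled)
    for _ in range(iters):
        # E-step: replace each missing label by its posterior distribution.
        unl_resp = [posterior(x, params) for x in unlabeled]
        # M-step: re-fit the parameters on labeled and unlabeled data together.
        params = m_step(xs, lab_resp + unl_resp, n_classes, min_var)
    return params


def classify(x, params):
    """Return the index of the most probable class for x."""
    post = posterior(x, params)
    return max(range(len(post)), key=lambda k: post[k])


labeled = [(-2.1, 0), (-1.9, 0), (1.8, 1), (2.2, 1)]
unlabeled = [-2.5, -2.0, -1.5, -1.0, 1.0, 1.5, 2.0, 2.5, 3.0]
params = semi_supervised_em(labeled, unlabeled)
```

The labeled points keep their known class throughout, so they anchor each component to a class, while the unlabeled points contribute to the parameter estimates in proportion to their posterior class probabilities.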

See also

* Constrained clustering
* Transductive learning


Wikimedia Foundation. 2010.
