Constrained clustering

Constrained clustering

In computer science, constrained clustering is a class of semi-supervised learning algorithms. Typically, constrained clustering incorporates either a set of must-link constraints, cannot-link constraints, or both, with a Data clustering algorithm. Both a must-link and a cannot-link constraint define a relationship between two data instances. A must-link constraint is used to specify that the two instances in the must-link relation should be associated with the same cluster. A cannot-link constraint is used to specify that the two instances in the cannot-link relation should not be associated with the same cluster. These sets of constraints acts as a guide for which a constrained clustering algorithm will attempt to find clusters in a data set which satisfy the specified must-link and cannot-link constraints. Some constrained clustering algorithms will abort if no such clustering exists which satisfies the specified constraints. Others will try to minimize the amount of constraint violation should it be impossible to find a clustering which satisfies the constraints.

Examples of constrained clustering algorithms include:

  • COP K-means
  • PCKmeans

References


Wikimedia Foundation. 2010.

Игры ⚽ Нужно сделать НИР?

Look at other dictionaries:

  • Document clustering — (also referred to as Text clustering) is closely related to the concept of data clustering. Document clustering is a more specific technique for unsupervised document organization, automatic topic extraction and fast information retrieval or… …   Wikipedia

  • Cluster analysis — The result of a cluster analysis shown as the coloring of the squares into three clusters. Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more… …   Wikipedia

  • Expectation-maximization algorithm — An expectation maximization (EM) algorithm is used in statistics for finding maximum likelihood estimates of parameters in probabilistic models, where the model depends on unobserved latent variables. EM alternates between performing an… …   Wikipedia

  • Semi-supervised learning — In computer science, semi supervised learning is a class of machine learning techniques that make use of both labeled and unlabeled data for training typically a small amount of labeled data with a large amount of unlabeled data. Semi supervised… …   Wikipedia

  • Elsevier BIOBASE — Not to be confused with BIOBASE (company), a German publisher of molecular biology databases. Elsevier BIOBASE Producer Elsevier History 1954–present Languages English Access Providers DataStar, DIALOG, DIMDI, STN …   Wikipedia

  • 2-satisfiability — In computer science, 2 satisfiability (abbreviated as 2 SAT or just 2SAT) is the problem of determining whether a collection of two valued (Boolean or binary) variables with constraints on pairs of variables can be assigned values satisfying all… …   Wikipedia

  • Non-negative matrix factorization — NMF redirects here. For the bridge convention, see new minor forcing. Non negative matrix factorization (NMF) is a group of algorithms in multivariate analysis and linear algebra where a matrix, , is factorized into (usually) two matrices, and… …   Wikipedia

  • Classification sous contrainte — En intelligence artificielle, la classification sous contrainte désigne une famille d algorithmes d apprentissage semi supervisée. Cette dernière occupe une place intermédiaire entre l apprentissage supervisé (pour laquelle les étiquettes de… …   Wikipédia en Français

  • Database — A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels), in a way that supports… …   Wikipedia

  • Mixture model — See also: Mixture distribution In statistics, a mixture model is a probabilistic model for representing the presence of sub populations within an overall population, without requiring that an observed data set should identify the sub population… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”