 Constrained Conditional Models

A Constrained Conditional Model (CCM) is a machine learning and inference framework that augments the learning of conditional (probabilistic or discriminative) models with declarative constraints. The constraint can be used as a way to incorporate expressive prior knowledge into the model and bias the assignments made by the learned model to satisfy these constraints. The framework can be used to support decisions in an expressive output space while maintaining modularity and tractability of training and inference.
Models of this kind have recently attracted much attention within the Natural Language Processing (NLP) community. Formulating problems as constrained optimization problems over the output of learned models has several advantages. It allows one to focus on the modeling of problems by providing the opportunity to incorporate domainspecific knowledge as global constraints using a first order language. Using this declarative framework frees the developer from low level feature engineering while capturing the problem's domainspecific properties and guarantying exact inference. From a machine learning perspective it allows decoupling the stage of model generation (learning) from that of the constrained inference stage, thus helping to simplify the learning stage while improving the quality of the solutions. For example, in the case of generating compressed sentences, rather than simply relaying on a language model to keep in the sentence the most commonly used ngrams, constraints can be use to make sure that if a modifier is kept in the compressed sentence, its subject will also be kept.
Contents
Motivation
Making decisions in many domains (such as natural language processing and computer vision problems) often involve assigning values to sets of interdependent variables where the expressive dependency structure can influence, or even dictate, what assignments are possible. These settings are applicable to Structured Learning problems such as semantic role labeling but also for cases that require making use of multiple prelearned components, such as summarization, textual entailment and question answering. In all these cases, it is natural to formulate the decision problem as a constrained optimization problem, with an objective function that is composed of learned models, subject to domain or problem specific constraints.
Constrained Conditional Models is a learning and inference framework that augments the learning of conditional (probabilistic or discriminative) models with declarative constraints (written, for example, using a firstorder representation) as a way to support decisions in an expressive output space while maintaining modularity and tractability of training and inference. These constraints can express either hard restrictions, completely prohibiting some assignments, or soft restrictions, penalizing unlikely assignments. In most applications of this framework in NLP, following ^{[1]}, Integer Linear Programming (ILP) was used as the inference framework, although other algorithms can be used for that purpose.
Formal Definition
Given a set of feature functions {ϕ_{i}(x,y)} and a set of constraints {C_{i}(x,y)}, defined over an input structure and an output structure , a constraint conditional model is characterized by two weight vectors, w and ρ, and is defined as the solution to the following optimization problem:
 .
Each constraint is a boolean mapping indicating if the joint assignment (x,y) violates a constraint, and ρ is the penalty incurred for violating the constraints. Constraints assigned an infinite penalty are known as hard constraints, and represent unfeasible assignments to the optimization problem.
Training Paradigms
Learning Local VS. Global Models
The objective function used by CCMs can be decomposed and learned in several ways, ranging from a complete joint training of the model along with the constraints to completely decoupling between the learning and the inference stage. In the latter case, several local models are learned independently and the dependency between these models is considered only at decision time via a global decision process. The advantages of each approach are discussed in ^{[2]}, which studies the two training paradigms: (1) local models: L+I (learning+inference) and (2) global model: IBT (Inference based training), and shows both theoretically and experimentally that while IBT (joint training) is best in the limit, under some conditions (basically, ”good” components”) L+I can generalize better.
The ability of CCM to combine local model is especially beneficial in cases where joint leaning is computationally intractable or when training data is not available for joint learning. This flexibility distinguishes CCM from other learning frameworks (e.g., Markov logic network) which emphasize joint training.
Minimally Supervised CCM
CCM can help reduce supervision by using domain knowledge (expressed as constraints) to drive learning. These setting were studied in ^{[3]} and ^{[4]}. These works introduce semisupervised Constraints Driven Learning (CODL) and show that by incorporating domain knowledge the performance of the learned model improves significantly.
Learning over Latent Representations
CCMs were also applied to latent learning frameworks, where the learning problem is defined over a latent representation layer. Since the notion of a correct representation is inherently illdefined no goldlabeled data regarding the representation decision is available to the learner. Identifying the correct (or optimal) learning representation is viewed as a structured prediction process and therefore modeled as a CCM. This problem was studied by several papers, in both supervised ^{[5]} and unsupervised ^{[6]} settings and in all cases showed that explicitly modeling the interdependencies between representation decisions via constraints results in an improved performance.
Integer Linear Programming for Natural Language Processing Applications
The advantages of the CCM declarative formulation and the availability of offtheshelf solvers have led to a large variety of natural language processing tasks being formulated within framework, including semantic role labeling ^{[7]}, syntactic parsing ^{[8]}, coreference resolution ^{[9]}, summarization ^{[10]} ^{[11]} ^{[12]}., transliteration ^{[13]}, natural language generation ^{[14]} and joint information extraction ^{[15]} ^{[16]}.
Most of these works use an Integer Linear Programming (ILP) solver to solve the decision problem. Although theoretically solving an Integer Linear Program is exponential in the size of the decision problem in practice using stateoftheart solvers and approximated inference techniques ^{[17]} large scale problems can be solved efficiently.
The key advantage of using an ILP solver for solving the optimization problem defined by a constrained conditional model is the declarative formulation used as input for the ILP solver, consisting of a linear objective function and a set of linear constraints.
Resources
 CCM Tutorial Integer Linear Programming in NLP – Constrained Conditional Models, NAACL2010
 CCM Software Learning Based Java
External links
 University of Illinois Cognitive Computation Group
 Workshop on Integer Linear Programming for Natural Language Processing, NAACL2009
References
 ^ Dan Roth and Wentau Yih, "A Linear Programming Formulation for Global Inference in Natural Language Tasks." CoNLL, (2004).
 ^ Vasin Punyakanok and Dan Roth and WenTau Yih and Dav Zimak, "Learning and Inference over Constrained Output." IJCAI, (2005).
 ^ MingWei Chang and Lev Ratinov and Dan Roth, "Guiding SemiSupervision with ConstraintDriven Learning." ACL, (2007).
 ^ MingWei Chang and Lev Ratinov and Dan Roth, "Constraints as Prior Knowledge." ICML Workshop on Prior Knowledge for Text and Language Processing}, (2008).
 ^ MingWei Chang and Dan Goldwasser and Dan Roth and Vivek Srikumar, " Discriminative Learning over Constrained Latent Representations." NAACL, (2010).
 ^ MingWei Chang Dan Goldwasser Dan Roth and Yuancheng Tu, "Unsupervised Constraint Driven Learning For Transliteration Discovery." NAACL, (2009).
 ^ Vasin Punyakanok, Dan Roth, Wentau Yih and Dav Zimak, "Semantic Role Labeling via Integer Linear Programming Inference." COLING, (2004).
 ^ Kenji Sagae and Yusuke Miyao and Jun’ichi Tsujii, "HPSG Parsing with Shallow Dependency Constraints." ACL, (2007).
 ^ Pascal Denis and Jason Baldridge, "Joint Determination of Anaphoricity and Coreference Resolution using Integer Programming." NAACLHLT, (2007).
 ^ James Clarke and Mirella Lapata, "Global Inference for Sentence Compression: An Integer Linear Programming Approach." Journal of Artificial Intelligence Research (JAIR), (2008).
 ^ Katja Filippova and Michael Strube, "Dependency Tree Based Sentence Compression." INLG, (2008).
 ^ Katja Filippova and Michael Strube, "Sentence Fusion via Dependency Graph Compression." EMNLP, (2008).
 ^ Dan Goldwasser and Dan Roth, "Transliteration as Constrained Optimization." EMNLP, (2008).
 ^ Regina Barzilay and Mirrela Lapata, "Aggregation via Set Partitioning for Natural Language Generation." NAACL, (2006).
 ^ Dan Roth and Wentau Yih, "A Linear Programming Formulation for Global Inference in Natural Language Tasks." CoNLL, (2004).
 ^ Yejin Choi and Eric Breck and Claire Cardie, "Joint Extraction of Entities and Relations for Opinion Recognition." EMNLP, (2006).
 ^ André F. T. Martins, Noah A. Smith, and Eric P. Xing , "Concise Integer Linear Programming Formulations for Dependency Parsing ." ACL, (2009).
Categories: Artificial intelligence
 Theoretical computer science
 Machine learning algorithms
Wikimedia Foundation. 2010.