Optimal matching

Optimal matching is a sequence analysis method used in the social sciences to assess the dissimilarity of two ordered arrays of tokens, each usually representing the time-ordered sequence of socio-economic states an individual has experienced. Once such distances have been calculated for a set of observations (e.g. individuals in a cohort), classical tools such as cluster analysis can be applied. The method was tailored to the social sciences^[1] from a technique originally introduced to study molecular biology (protein or genetic) sequences (see sequence alignment).

Contents

1 Algorithm

2 Criticism

3 Optimal matching in causal modelling

4 Software

5 References and notes

Algorithm

Let $S = (s_1, s_2, s_3, \ldots s_T)$ be a sequence of states $s_i$ belonging to a finite set of possible states. Let us denote by ${\mathbf S}$ the sequence space, i.e. the set of all possible sequences of states.

Optimal matching algorithms work by defining simple operator algebras that manipulate sequences, i.e. a set of operators $a_i: {\mathbf S} \rightarrow {\mathbf S}$. In the simplest approach, only three basic operations are used to transform sequences:

one state $s'$ is inserted into the sequence: $a^{\rm Ins}_{s'} (s_1, s_2, s_3, \ldots s_T) = (s_1, s_2, s_3, \ldots, s', \ldots s_T)$;

one state is deleted from the sequence: $a^{\rm Del}_{s_2} (s_1, s_2, s_3, \ldots s_T) = (s_1, s_3, \ldots s_T)$; and

a state $s_1$ is replaced (substituted) by a state $s'_1$: $a^{\rm Sub}_{s_1,s'_1} (s_1, s_2, s_3, \ldots s_T) = (s'_1, s_2, s_3, \ldots s_T)$.
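The three operators above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sequences are represented as tuples of single-letter state labels, the operators take an explicit position argument (the notation above leaves the position implicit), and the function names and the labels "E", "U" and "S" are hypothetical.

```python
from typing import Tuple

State = str
Seq = Tuple[State, ...]


def insert(seq: Seq, pos: int, state: State) -> Seq:
    """Insertion: return a copy of seq with `state` placed before index `pos`."""
    return seq[:pos] + (state,) + seq[pos:]


def delete(seq: Seq, pos: int) -> Seq:
    """Deletion: return a copy of seq without the state at index `pos`."""
    return seq[:pos] + seq[pos + 1:]


def substitute(seq: Seq, pos: int, new_state: State) -> Seq:
    """Substitution: return a copy of seq with the state at `pos` replaced."""
    return seq[:pos] + (new_state,) + seq[pos + 1:]


# Hypothetical labels: E = employed, U = unemployed, S = in school.
s = ("E", "U", "E")
inserted = insert(s, 1, "S")         # an "S" is inserted before index 1
deleted = delete(s, 1)               # the "U" at index 1 is removed
substituted = substitute(s, 0, "S")  # the first state is replaced by "S"
```

Each function returns a new tuple and leaves its argument untouched, which mirrors the view of an operator as a map from the sequence space to itself.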

Imagine now that a cost $c(a_i) \in {\mathbf R}^+_0$ is associated with each operator. Given two sequences $S_1$ and $S_2$, the idea is to measure the cost of obtaining $S_2$ from $S_1$ using operators from the algebra. Let $A=\{a_1, a_2, \ldots a_n\}$ be a sequence of operators such that applying all the operators of $A$ to the first sequence $S_1$ gives the second sequence $S_2$: $S_2 = a_1 \circ a_2 \circ \ldots \circ a_{n} (S_1)$, where $a_1 \circ a_2$ denotes the compound operator. To this sequence we associate the cost $c(A) = \sum_{i=1}^n c(a_i)$, which represents the total cost of the transformation. There may exist several such sequences $A$ that transform $S_1$ into $S_2$; a reasonable choice is to select the cheapest of them. We thus call distance
$d(S_1,S_2)= \min_A \left \{ c(A)~{\rm such~that}~S_2 = A (S_1) \right \}$
that is, the cost of the least expensive sequence of transformations that turns $S_1$ into $S_2$. Notice that $d(S_1, S_2)$ is by definition nonnegative, since it is a sum of nonnegative costs, and, provided every operator that changes the sequence has a strictly positive cost, $d(S_1, S_2) = 0$ if and only if $S_1 = S_2$, in which case no transformation is needed. The distance function is symmetric if insertion and deletion costs are equal, $c(a^{\rm Ins}) = c(a^{\rm Del})$; the term indel cost usually refers to the common cost of insertion and deletion.
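In practice the minimum over all operator sequences is not found by enumeration but by dynamic programming, in the same way as the edit distance between strings. The sketch below is one minimal way to do this, assuming a single indel cost and a substitution cost that depends only on the two states involved. The function name `om_distance`, its parameters, and the default costs (indel 1, substitution 2) are illustrative assumptions, not values prescribed by the method.

```python
from typing import Callable, Sequence


def om_distance(
    s1: Sequence[str],
    s2: Sequence[str],
    indel: float = 1.0,
    sub_cost: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 2.0,
) -> float:
    """Cheapest total cost of turning s1 into s2 with indels and substitutions."""
    n, m = len(s1), len(s2)
    # d[i][j] holds the minimal cost of turning s1[:i] into s2[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel          # delete the first i states of s1
    for j in range(1, m + 1):
        d[0][j] = j * indel          # insert the first j states of s2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                                # deletion
                d[i][j - 1] + indel,                                # insertion
                d[i - 1][j - 1] + sub_cost(s1[i - 1], s2[j - 1]),   # substitution or match
            )
    return d[n][m]


# Hypothetical state labels: E = employed, U = unemployed.
same = om_distance("EEUE", "EEUE")
one_change = om_distance("EEUE", "EEEE")
shifted = om_distance("EU", "UE")
```

With these default costs a single substitution costs as much as one deletion plus one insertion, so the two routes tie. The table has (n+1)(m+1) cells, so the running time grows with the product of the two sequence lengths.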

Considering a set composed of only the three basic operations described above, this proximity measure satisfies the triangle inequality. Transitivity, however, depends on the definition of the set of elementary operations.

Criticism

Although optimal matching is widely used in sociology and demography, several criticisms have been raised against it. As several authors have pointed out (see for example^[2]), the main problem in applying optimal matching concerns the definition of the costs $c(a_i)$.
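The sensitivity to the choice of costs can be illustrated with a small, purely hypothetical example. The sketch below assumes a dynamic-programming distance with one indel cost and one constant substitution cost. The sequences, the state labels and both cost settings are invented for illustration and are not taken from the cited literature.

```python
from typing import Sequence, Tuple


def om_distance(s1: Sequence[str], s2: Sequence[str], indel: float, sub: float) -> float:
    """Edit-style distance with one indel cost and one constant substitution cost."""
    n, m = len(s1), len(s2)
    # d[i][j] holds the minimal cost of turning s1[:i] into s2[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0.0 if s1[i - 1] == s2[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel, d[i][j - 1] + indel, d[i - 1][j - 1] + step)
    return d[n][m]


def compare(indel: float, sub: float) -> Tuple[float, float]:
    """Distances from a reference sequence to two hypothetical candidates."""
    reference = "EUEU"    # alternating employment (E) and unemployment (U)
    shifted = "UEUE"      # the same pattern, moved by one period
    different = "SSSU"    # mostly a different state (S) at the same times
    return (
        om_distance(reference, shifted, indel, sub),
        om_distance(reference, different, indel, sub),
    )


cheap_indel = compare(indel=1.0, sub=2.0)   # the shifted sequence is the closer one
cheap_sub = compare(indel=4.0, sub=1.0)     # the ranking is reversed
```

Under the first setting the shifted sequence is nearer to the reference; under the second the other candidate is nearer. Which sequences count as similar therefore depends on the costs chosen by the analyst.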

Optimal matching in causal modelling

Optimal matching is also a term used in statistical modelling of causal effects. In this context it refers to matching "cases" with "controls", and is completely separate from the sequence-analytic sense.

Software

Abbott's original program is available, both as source code and as a Windows binary.

TDA is a powerful program, offering access to some of the latest developments in transition data analysis.

A package for running optimal matching analysis is available for Stata.

TraMineR is an open-source R package for analysing and visualizing sequences of states and events, including optimal matching analysis.

References and notes

[1] A. Abbott and A. Tsay (2000). Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect. Sociological Methods & Research, Vol. 29, 3-33. DOI: 10.1177/0049124100029001001

[2] L. L. Wu (2000). Some Comments on "Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect". Sociological Methods & Research, Vol. 29, 41-64. DOI: 10.1177/0049124100029001003

