Closest string

Closest string

In theoretical computer science, closest string is the name of an NP-hard computational problem, which tries to find the geometrical center of a set of input strings.

To understand the word "center" it is necessary to define a distance between two strings. Usually, this problem is studied with the Hamming distance in mind.

Contents

Formal definition

More formally, given n length-m strings s_1,s_2,\dots,s_n and a radius k, the closest string problem seeks for a new length-m string s such that d(s,s_i)\le k\  \forall s_i where d denotes the Hamming distance. [1]

Motivation

Closest string is an intensively studied facet of the problem of finding signals in DNA.

Simplifications and data reductions

Instances of closest string may contain information that is not essential to the problem. In some sense, the usual input of closest string contains information, that does not contribute to the hardness of the problem. For example, if some strings contain the character a, but none contains the character z, we could replace all as with zs and this modified instance would essentially be equivalent, that is: given a solution of the modified instance, we could tell the original solution and vice versa.

Normalizing the input

When all input strings that share the same length are written on top of each other, they form a matrix. Now, we can see that certain row types have essentially the same implications to the solution. Say, given a column with three entries, (a,a,b), then we might change the solution string, but not the solvability, by replacing this column with another column (x,x,y), because they express the same structure. This structure being that the first two entries are the same while the third is different.

We can, in each column, replace the character that occurs the most often with a, the character that occurs the second most often, and so forth. An input instance with this property is called normalized.

If we have a solution to the modified instance, we can find the original instance by remapping the characters of the solution to its original version in every column.

The order of the columns does not contribute to the hardness of the problem. That means, if we permute all input strings according to a certain permutation π and obtain a solution string s to that modified instance, then π − 1(s) will be a solution to the original instance.

Example

Given an instance with three input strings 'uvwx', 'xuwv', 'xvwu'. This could be written as a matrix like this:


\begin{matrix}
u&v&w&x\\
x&u&w&v\\
x&v&w&u
\end{matrix}

The first column has the values (u,x,x). As x is the character that appears the most often, we replace it by a, and we replace u, the second most often character, by b, obtaining the new first column (b,a,a).

Doing the same with all columns gives the new normalized instance


\begin{matrix}
b&a&a&a\\
a&b&a&b\\
a&a&a&c
\end{matrix}

Data reduction obtained from normalization

Normalizing the input reduces the alphabet size to at most the number of input strings. This can be useful for algorithms whose running times depend on the alphabet size.

Approximability

Li et al. evolved a polynomial-time approximation scheme[2] which is practically unusable because of the large hidden constants.

Fixed-parameter tractability

Closest String can be solved in O(kL+kd\cdot d^d), where k is the number of input strings, L is the length of all strings and d is the desired maximum distance from the solution string to any input string.[3]

Relations to other problems

Closest string is a special case of the more general closest substring problem, which is strictly more difficult. While closest string turns out to be fixed-parameter tractable in a number of ways, closest substring is [[W[1]-hard]] with regard to these parameters.

references

  1. ^ More Efficient Algorithms for Closest String and Substring Problems, Bin Ma and Xiaming Sun, http://www.springerlink.com/content/f0087236k70043j2/
  2. ^ M. Li, B. Ma, and L. Wang. On the Closest String and Substring Problems. Journal of the ACM, 49(2):157-171,2002
  3. ^ Fixed-Parameter Algorithms for Closest String and Related Problems, Jens Gramm, Rolf Niedermeier and Peter Rossmanith, Algoirthmica (2003) 37,25–42

Wikimedia Foundation. 2010.

Игры ⚽ Поможем сделать НИР

Look at other dictionaries:

  • Scale (string instruments) — For the musical (rather than instrumental) scale, see Pythagorean tuning. In a string instrument, the scale length (often simply called the scale ) is the sounding length of the strings. On instruments with strings which are not stopped (harp,… …   Wikipedia

  • Nut (string instrument) — For other uses, see Nut (disambiguation). The nut of a string instrument is a small piece of hard material which supports the strings at the end closest to the headstock or scroll. The nut marks one end of the speaking length of each open string …   Wikipedia

  • cosmos — /koz meuhs, mohs/, n., pl. cosmos, cosmoses for 2, 4. 1. the world or universe regarded as an orderly, harmonious system. 2. a complete, orderly, harmonious system. 3. order; harmony. 4. any composite plant of the genus Cosmos, of tropical… …   Universalium

  • Glossary of cue sports terms — The following is a glossary of traditional English language terms used in the three overarching cue sports disciplines: carom (or carambole) billiards referring to the various carom games played on a billiard table without pockets; pool (pocket… …   Wikipedia

  • Cello — This article is about the stringed musical instrument. For other uses, see Cello (disambiguation). Cello Cello, front and side view String Other names Violoncello Hornbos …   Wikipedia

  • Double bass — Contrabass redirects here. For other uses, see Contrabass (disambiguation). Not to be confused with Acoustic bass guitar. For the technique used in percussion, see Double bass drum. Double Bass Side and front views of a modern double bass with a… …   Wikipedia

  • Rotation (pool) — Rotation, sometimes called rotation pool or 61, is a pocket billiards (pool) game, requiring a standard pool table, Cuegloss|Cue ball|cue ball and triangular rack of fifteen pool balls, in which the lowest numbered Cuegloss|Object ball|object… …   Wikipedia

  • arts, East Asian — Introduction       music and visual and performing arts of China, Korea, and Japan. The literatures of these countries are covered in the articles Chinese literature, Korean literature, and Japanese literature.       Some studies of East Asia… …   Universalium

  • United States — a republic in the N Western Hemisphere comprising 48 conterminous states, the District of Columbia, and Alaska in North America, and Hawaii in the N Pacific. 267,954,767; conterminous United States, 3,022,387 sq. mi. (7,827,982 sq. km); with… …   Universalium

  • Floating point — In computing, floating point describes a method of representing real numbers in a way that can support a wide range of values. Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”