Disjunct matrix

Disjunct and separable matrices play a pivotal role in the mathematical area of non-adaptive group testing. This area investigates efficient designs and procedures to identify 'needles in haystacks' by conducting tests on groups of items instead of on each item alone. The main concept is that if there are very few special items (needles) and the groups are constructed according to certain combinatorial guidelines, then one can test the groups and find all the needles. This can reduce the cost and labor associated with large-scale experiments.

The grouping pattern can be represented by a t\times n binary matrix, where each column represents an item and each row represents a pool. The symbol '1' denotes participation in the pool and '0' absence from the pool. The d-disjunctness and the d-separability of the matrix describe sufficient conditions to identify d special items.

In a matrix that is d-separable, the Boolean sum of every d columns is unique. In a matrix that is d-disjunct, the Boolean sum of every d columns does not contain any other column in the matrix. Theoretically, for the same number of columns (items), one can construct d-separable matrices with fewer rows (tests) than d-disjunct matrices. However, designs based on d-separable matrices are less applicable, since the decoding time to identify the special items is exponential in d. In contrast, the decoding time for d-disjunct matrices is polynomial.

d-separable

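Definition: A t\times n matrix M is d-separable if and only if for all S_1, S_2 \subseteq [n] with S_1 \neq S_2 and |S_1|,|S_2| \leq d, we have \bigcup_{j \in S_1} M_j \neq \bigcup_{i \in S_2} M_i, where M_j denotes the jth column of M and the union is the Boolean sum (componentwise OR) of the columns.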

Decoding algorithm

First we will describe another way to look at the problem of group testing and how to decode it from a different notation. We can give a new interpretation of how group testing works as follows:

Group testing: Given input M and \mathbf{r} such that \mathbf{r} = M \mathbf{x} output \mathbf{x}

  • Take M_j to be the jth column of M
  • Define  S_{M_j} \subseteq [t] so that M_j(i) = 1 if and only if  i \in S_{M_j}
  • This gives that  S_\mathbf{r} = \bigcup_{j \in [n], \mathbf{x}_j = 1} S_{M_j}

This formalizes the relation between \mathbf{x} and the columns of M and \mathbf{r} in a way more suitable to the thinking of d-separable and d-disjunct matrices. The algorithm to decode a d-separable matrix is as follows:

Given a t\times n matrix M such that M is d-separable:

  1. For each T \subseteq [n] such that |T| \leq d check if S_\mathbf{r} = \bigcup_{j \in T} S_{M_j}; if so, output T. Because M is d-separable, at most one such T can match.

This algorithm runs in time n^{\mathcal{O}(d)}.
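The exhaustive search above can be sketched in Python as follows. This is only an illustration under stated assumptions: the matrix is stored as a list of rows, indices are 0-based rather than 1-based, and the function names (column_support, outcome_support, decode_separable) are hypothetical, not taken from the source.

```python
from itertools import combinations

def column_support(M, j):
    """Return S_{M_j}: the set of rows (pools) in which item j takes part."""
    return {i for i, row in enumerate(M) if row[j] == 1}

def outcome_support(M, items):
    """Return the union of the supports of the given columns (the positive pools)."""
    support = set()
    for j in items:
        support |= column_support(M, j)
    return support

def decode_separable(M, r, d):
    """Try every subset T of at most d columns and return the one whose union equals S_r."""
    n = len(M[0])
    s_r = {i for i, bit in enumerate(r) if bit == 1}
    for size in range(d + 1):
        for T in combinations(range(n), size):
            if outcome_support(M, T) == s_r:
                return set(T)
    return None  # r is not the outcome of any set of at most d items

# Toy example (assumption): the 4x4 identity matrix, which is d-separable for every d.
M = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
print(decode_separable(M, [1, 0, 1, 0], 2))
```

The loop visits every subset of size at most d, which is where the n^{\mathcal{O}(d)} running time comes from.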

d-disjunct

In the literature, disjunct matrices are also called superimposed codes and d-cover-free families.

Definition: A t\times n matrix M is d-disjunct if \forall S \subseteq [n] such that |S| \leq d, \forall j \notin S, \exists i such that M_{i,j} = 1 but \forall k \in S, M_{i,k} = 0. Denoting by M_a the ath column of M and by S_{M_a} \subseteq [t] the set with M_a(b) = 1 if and only if b \in S_{M_a}, M is d-disjunct if and only if S_{M_j} \not\subseteq \bigcup_{k \in S} S_{M_k} for every such S and j.
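For small matrices the definition can be checked directly. The following is a brute-force sketch, assuming the matrix is a list of 0/1 rows with 0-based indices; the name is_d_disjunct is hypothetical. It takes time exponential in d and is meant only to make the definition concrete.

```python
from itertools import combinations

def is_d_disjunct(M, d):
    """Check that no column is covered by the union of at most d other columns."""
    n = len(M[0])
    supports = [{i for i, row in enumerate(M) if row[j] == 1} for j in range(n)]
    for size in range(0, min(d, n - 1) + 1):
        for S in combinations(range(n), size):
            union = set().union(*(supports[k] for k in S))
            for j in range(n):
                # Column j violates the definition if its support lies inside the union.
                if j not in S and supports[j] <= union:
                    return False
    return True

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
covered = [[1, 1, 0], [0, 1, 1]]  # column 0 is contained in column 1
print(is_d_disjunct(identity, 2), is_d_disjunct(covered, 1))
```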

Claim: M is d-disjunct implies M is d-separable

Proof: (by contradiction) Let M be a t\times n d-disjunct matrix. Assume for contradiction that M is not d-separable. Then there exist T_1, T_2 \subseteq [n] with T_1 \neq T_2 and |T_1|,|T_2| \leq d such that \bigcup_{i \in T_1} S_{M_i} = \bigcup_{i \in T_2} S_{M_i}. Since T_1 \neq T_2, we may assume (swapping the two sets if necessary) that there exists j \in T_2 \setminus T_1, and for this j we have S_{M_j} \subseteq \bigcup_{k \in T_1} S_{M_k}. This contradicts the fact that M is d-disjunct. Therefore M is d-separable. \Box

Decoding algorithm

The algorithm for d-separable matrices runs in time n^{\mathcal{O}(d)}, which is polynomial in n only for constant d. For d-disjunct matrices there is a faster algorithm whose running time, given our bounds for t, depends on d only as a multiplicative factor rather than as an exponent. The algorithm is given in the proof of the following lemma:

Lemma 1: There exists an \mathcal{O}(nt) time decoding for any d-disjunct t x n matrix.

  • Observation 1: For any matrix M and given M\mathbf{x} = \mathbf{r}, if \mathbf{r}_i = 1 then  \exists j such that M_{i,j} = 1 and  \mathbf{x}_j = 1, where  1 \leq i \leq t and  1 \leq j \leq n . The opposite is also true: if \mathbf{r}_i = 0 then  \forall j, if M_{i,j} = 1 then  \mathbf{x}_j = 0 . This is the case because \mathbf{r}_i is the logical OR of all the  \mathbf{x}_j's with M_{i,j} = 1.
  • Observation 2: For any d-disjunct matrix and every set  T = \{j | \mathbf{x}_j = 1\} where  |T| \leq d, and for each  j \notin T where  1 \leq j \leq n, there exists some i where  1 \leq i \leq t such that M_{i,j} = 1 but  M_{i,l} = 0 \text{ }\forall l \in T. Since no item of T is in pool i, \mathbf{r}_i = 0, which certifies that \mathbf{x}_j = 0.

Proof of Lemma 1: Given as input  \mathbf{r} \in \{0,1\}^t, M use the following algorithm:

  1. For each  j \in [n] set \mathbf{x}_j = 1
  2. For  i = 1 \ldots t , if  \mathbf{r}_i = 0 then for all  j \in [n] , if M_{i,j} = 1 set  \mathbf{x}_j = 0

By Observation 1, whenever \mathbf{r}_i = 0 every \mathbf{x}_j with M_{i,j} = 1 must be 0, so step 2 only sets to 0 entries that are truly 0; equivalently, if \mathbf{x}_j is supposed to be 1, then every row i with M_{i,j} = 1 has \mathbf{r}_i = 1, and step 2 never assigns \mathbf{x}_j the value 0. By Observation 2, if \mathbf{x}_j is supposed to be 0, there is at least one i such that M_{i,j} = 1 and \mathbf{r}_i = 0, so step 2 sets \mathbf{x}_j to 0. Hence the algorithm outputs exactly \mathbf{x}. Step 2 examines each entry of M at most once, so this takes time \mathcal{O}(nt) overall. \Box
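The two steps of the algorithm translate almost line for line into code. The sketch below assumes a list-of-rows matrix with 0-based indices; the name decode_disjunct and the 4\times 6 example matrix are illustrative assumptions, not part of the source. The example columns are the six 2-element subsets of four pools, so no column contains another and the matrix is 1-disjunct.

```python
def decode_disjunct(M, r):
    """Decode x from r = Mx (Boolean) for a d-disjunct M, assuming at most d ones in x."""
    t, n = len(M), len(M[0])
    x = [1] * n                      # step 1: start with every item marked special
    for i in range(t):               # step 2: a negative pool clears all of its items
        if r[i] == 0:
            for j in range(n):
                if M[i][j] == 1:
                    x[j] = 0
    return x

# Columns are the pairs {0,1},{0,2},{0,3},{1,2},{1,3},{2,3} of pool indices.
M = [[1, 1, 1, 0, 0, 0],
     [1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
r = [0, 1, 1, 0]                     # outcome when only item 3 (pools 1 and 2) is special
print(decode_disjunct(M, r))
```

Each entry of M is read at most once, matching the \mathcal{O}(nt) bound of Lemma 1.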

Upper bounds for non-adaptive group testing

The results for these upper bounds rely mostly on the properties of d-disjunct matrices. Not only are the upper bounds nice, but from Lemma 1 we know that there is also a nice decoding algorithm for these bounds. First the following lemma will be proved since it is relied upon for both constructions:

Lemma 2: Given  1 \leq d \leq n let M be a t\times n matrix and:

  1.  \forall j \in [n] \text{, } |S_{M_j}| \geq w_\min
  2.  \forall i \neq j \in [n], |S_{M_i} \cap S_{M_j}| \leq a_\max

for some integers a_\max \leq w_\min \leq t, then M is d'-disjunct, where d' = \left\lfloor \frac{w_\min - 1}{a_\max} \right\rfloor.

Note: these conditions are stronger than the definition of disjunctness, since they are stated for every pair of columns rather than for subsets of size d. No matter which column is chosen, it contains at least w_\min 1's, and any two columns share at most a_\max 1's.

Proof of Lemma 2: Fix an arbitrary S \subseteq [n] with |S| \leq \left\lfloor \frac{w_\min - 1}{a_\max} \right\rfloor and a column j \notin S. Say that there is a match between i \in S and j in a given row if columns i and j both have a 1 in that row. By condition 2 each i \in S contributes at most a_\max matches, so the total number of matches is \leq a_\max \cdot |S| \leq a_\max \cdot \frac{w_\min - 1}{a_\max} = w_\min - 1 < w_\min, i.e. column j has fewer matches than it has ones. Therefore there must be a row with a 1 in column j but all 0s in the columns of S. \Box
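Lemma 2 gives a quick, sufficient (not necessary) test that needs only column weights and pairwise overlaps. The following sketch computes w_\min, a_\max and the resulting bound for a list-of-rows matrix with at least two columns; the name lemma2_bound is hypothetical, and returning None when a_\max = 0 is a choice made here because the quotient in the lemma is then undefined.

```python
def lemma2_bound(M):
    """Return (w_min, a_max, floor((w_min - 1) / a_max)) for a 0/1 matrix M."""
    n = len(M[0])
    supports = [{i for i, row in enumerate(M) if row[j] == 1} for j in range(n)]
    w_min = min(len(s) for s in supports)
    a_max = max(len(supports[i] & supports[j])
                for i in range(n) for j in range(i + 1, n))
    if a_max == 0:
        return w_min, a_max, None    # columns are pairwise disjoint; quotient undefined
    return w_min, a_max, (w_min - 1) // a_max

# Example (assumption): columns are the six 2-element subsets of four pools.
M = [[1, 1, 1, 0, 0, 0],
     [1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
print(lemma2_bound(M))
```

This costs \mathcal{O}(n^2 t) time, in contrast to checking the definition over all subsets of size d.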

We will now generate constructions for the bounds.

Randomized construction

This first construction will use a probabilistic argument, in particular the Chernoff bound, to show the desired property. This randomized construction gives  t(d,n) \leq \mathcal{O}(d^2 \log n) . The following theorem gives the result needed.

Theorem 1: There exists a random d-disjunct matrix with \mathcal{O}(d^2 \log n) rows.

Proof of Theorem 1: Begin by building a random t\times n matrix M with t = cd^2\log n (where c will be picked later). It will be shown that M is \Omega(d)-disjunct. First note that M_{i,j} \in \{0,1\} and let M_{i,j} = 1 independently with probability \frac{1}{d} for i \in [t] and j \in [n] . Now fix  j \in [n] . Denote the support of the jth column of M as T_j \subseteq [t] . Then the expectation is \mathbb{E}[|T_j|] = \frac{t}{d}. Using the Chernoff bound with relative deviation \frac{1}{2} gives  \mathrm{Pr}[ |T_j| < \frac{t}{2d}] \leq e^{\frac{-t}{12d}} = e^{\frac{-cd\log n}{12}} \leq n^{-2d} [if  c \geq 24 ]. Taking the union bound over all columns gives  \mathrm{Pr}[\exists j,  |T_j| < \frac{t}{2d}] \leq n \cdot n^{-2d} \leq n^{-d}. This gives  \mathrm{Pr}[\forall j ,  |T_j| \geq \frac{t}{2d}] \geq 1 - n^{-d}. Therefore  w_\min \geq \frac{t}{2d} with probability  \geq 1 - n^{-d} .

Now suppose j \neq k \in [n] and  i \in [t]; then \mathrm{Pr} [M_{i,j} = M_{i,k} = 1] = \frac{1}{d^2} , so \mathbb{E}[|T_j \cap T_k|] = \frac{t}{d^2}. Using the Chernoff bound on the upper tail gives \mathrm{Pr}[ |T_j \cap T_k| > \frac{2t}{d^2}] \leq e^{\frac{-t}{3d^2}} = e^{\frac{-c\log n}{3}} \leq n^{-4} [if  c \geq 12 ]. By the union bound over (j,k) pairs,  \mathrm{Pr}[\exists (j,k) such that  |T_j \cap T_k| > \frac{2t}{d^2}] \leq n^2 \cdot n^{-4} = n^{-2}. This gives that  a_\max \leq \frac{2t}{d^2} and w_\min \geq \frac{t}{2d} with probability  \geq 1 - n^{-d} - n^{-2} \geq 1 - \frac{1}{n} (for d \geq 2 and n \geq 2). Note that by changing c the probability 1 - \frac{1}{n} can be made to be 1 - \frac{1}{poly(n)}. Thus, by Lemma 2, M is d'-disjunct with  d' = \lfloor\frac{\frac{t}{2d} - 1}{\frac{2t}{d^2}}\rfloor \approx \frac{d}{4} . Running the same argument with 4d in place of d shows that the resulting M is d-disjunct.

Note that in this proof t = \mathcal{O}(d^2\log n), thus giving the upper bound of  t(d,n) \leq \mathcal{O}(d^2 \log n) . \Box
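The sampling step of the proof can be sketched as follows. This is an illustration only: the function name random_design, the default c = 24, the use of the natural logarithm, and the seed parameter are assumptions made here. The output is disjunct only with high probability, as the proof states, so a matrix drawn this way would still need to be verified if a guarantee is required.

```python
import math
import random

def random_design(n, d, c=24, seed=0):
    """Draw a t x n 0/1 matrix with t = ceil(c * d^2 * ln n) and Pr[entry = 1] = 1/d."""
    rng = random.Random(seed)        # local generator so the draw is reproducible
    t = math.ceil(c * d * d * math.log(n))
    return [[1 if rng.random() < 1.0 / d else 0 for _ in range(n)]
            for _ in range(t)]

M = random_design(n=8, d=2, seed=1)
column_weights = [sum(row[j] for row in M) for j in range(8)]
print(len(M), min(column_weights))   # number of rows and the observed w_min
```

As in the proof, sampling with probability 1/d yields roughly (d/4)-disjunctness, so one would call the sketch with 4d to target d-disjunctness.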

Strongly explicit construction

It is possible to prove a bound of  t(d,n) \leq \mathcal{O}(d^2\log^2{n}) using a strongly explicit code. Although this bound is worse by a \log n factor, it is preferable because it comes from a strongly explicit construction instead of a randomized one.

Theorem 2: There exists a strongly explicit d-disjunct matrix with \mathcal{O}(d^2\log^2{n}) rows.

This proof will use the properties of concatenated codes along with the properties of disjunct matrices to construct a code that will satisfy the bound we are after.

Proof of Theorem 2: Let  C \subseteq \{0,1\}^t, |C| = n such that  C = \{\mathbf{c}_1,\ldots,\mathbf{c}_n\} . Denote by M_C the matrix with its ith column being \mathbf{c}_i. If C^* can be found such that

  1.  \forall \mathbf{c}_i \in C^* \text{, } |\mathbf{c}_i| \geq w_\min
  2.  \forall \mathbf{c}^1 \neq \mathbf{c}^2 \in C^* \text{, } |\{i  | \mathbf{c}^1_i = \mathbf{c}^2_i = 1\}| \leq a_\max ,

then  M_{C^*} is  \lfloor \frac{w_\min - 1}{a_\max} \rfloor -disjunct. To complete the proof another concept must be introduced. This concept uses code concatenation to obtain the result we want.

Kautz-Singleton '64

Let C^* = C_{out} \circ C_{in}. Let C_{out} be a [q,k]_q Reed–Solomon code. Let C_{in} : [q] \rightarrow \{0,1\}^q be such that for i \in [q], C_{in}(i) = (0,\ldots,0,1,0,\ldots,0), where the 1 is in the ith position. Then n = q^k, t = q^2, and w_\min = q.

---

Example: Let k = 1, q = 3, C_{out} = \{(0,0,0),(1,1,1),(2,2,2)\}. Below, M_C denotes the matrix of codewords for C_{out} and M_{C^*} denotes the matrix of codewords for C^* = C_{out} \circ C_{in}, where each column is a codeword. The arrow shows the transition from the outer code to the concatenated code.

M_C=\begin{bmatrix}0&1&2\\0&1&2\\0&1&2\end{bmatrix}\quad\Rightarrow\quad M_{C^*}=\begin{bmatrix}0&0&1\\0&1&0\\1&0&0\\0&0&1\\0&1&0\\1&0&0\\0&0&1\\0&1&0\\1&0&0\end{bmatrix}

---

Divide the rows of M_{C^*} into sets of size q and number them as (i,j) \in [q] \times [q], where i indexes the set of rows and j indexes the row in the set. If M_{(i,j),k_1} = M_{(i,j),k_2} = 1 then note that \mathbf{c}_{k_1}(i) = \mathbf{c}_{k_2}(i) = j, where \mathbf{c}_{k_1}, \mathbf{c}_{k_2} \in C_{out} . So that means |M_{k_1} \cap M_{k_2}| = q - \Delta(\mathbf{c}_{k_1}, \mathbf{c}_{k_2}). Since  \Delta(\mathbf{c}_{k_1}, \mathbf{c}_{k_2}) \geq q - k + 1, it gives that |M_{k_1} \cap M_{k_2}| \leq k - 1, so let a_\max = k - 1. Since t = q^2, the entries in each column of M_{C^*} can be looked at as q sets of q entries where only one of the entries is nonzero (by definition of C_{in}), which gives a total of q nonzero entries in each column. Therefore w_\min = q and d =_{def} \lfloor \frac{w_\min - 1}{a_\max} \rfloor (so M_{C^*} is d-disjunct).

Now pick q and k such that \lfloor \frac{q-1}{k-1}\rfloor = d (so \lfloor \frac{q}{k}\rfloor \approx d). Since q^k = n we have k = \frac{\log n}{\log q} \leq \log n. Since q \approx kd and t = q^2, it gives that t = q^2 \approx (kd)^2 \leq (d \log n)^2. \Box

Thus we have a strongly explicit construction for a code that can be used to form a group testing matrix and so t(d,n) \leq (d \log n)^2.
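The concatenation can be sketched in code for a prime q, so that arithmetic modulo q is a field. This is an illustration with assumptions: the name kautz_singleton is hypothetical, symbols and rows are 0-indexed, prime powers are not handled, and symbol s is mapped to row s of its block, so the row order inside each block may differ from the example above by a permutation that does not affect disjunctness.

```python
from itertools import product

def kautz_singleton(q, k):
    """Build the q^2 x q^k matrix of a [q,k] Reed-Solomon code concatenated with unit vectors.

    Assumes q is prime.
    """
    columns = []
    for coeffs in product(range(q), repeat=k):       # one message polynomial per item
        # Outer codeword: evaluate the polynomial at every point of the field.
        codeword = [sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q
                    for a in range(q)]
        column = [0] * (q * q)
        for block, symbol in enumerate(codeword):    # inner code: symbol -> unit vector
            column[block * q + symbol] = 1
        columns.append(column)
    return [[col[i] for col in columns] for i in range(q * q)]  # transpose to rows

M = kautz_singleton(3, 2)                            # t = 9 tests, n = 9 items
supports = [{i for i in range(9) if M[i][j]} for j in range(9)]
w_min = min(len(s) for s in supports)
a_max = max(len(supports[i] & supports[j]) for i in range(9) for j in range(i + 1, 9))
print(w_min, a_max, (w_min - 1) // a_max)
```

With q = 3 and k = 2 the parameters are w_\min = q = 3 and a_\max = k - 1 = 1, so Lemma 2 gives a 2-disjunct matrix.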

For non-adaptive testing we have shown that \Omega(d\log n) \leq t(d,n) and we have that (i) t(d,n) \leq \mathcal{O}(d^2\log^2{n}) (strongly explicit) and (ii) t(d,n) \leq \mathcal{O}(d^2\log n) (randomized). More recently, Porat and Rothschild presented an explicit construction (i.e. computable in deterministic time, but not strongly explicit) for t(d,n) \leq \mathcal{O}(d^2\log n)[1]; it is not shown here. There is also a lower bound for disjunct matrices of t(d,n) \geq \Omega(\frac{d^2}{\log d}\log n)[2][3][4], which is not shown here either.

Notes

  1. ^ Porat, E., & Rothschild, A. (2008). Explicit Non-adaptive Combinatorial Group Testing Schemes. In Proceedings of the 35th International Colloquium on Automata, Languages and Programming (ICALP) (pp. 748–759).
  2. ^ D'yachkov, A. G., & Rykov, V. V. (1982). Bounds on the length of disjunctive codes. Problemy Peredachi Informatsii [Problems of Information Transmission], 18(3), 7–13.
  3. ^ D'yachkov, A. G., Rashad, A. M., & Rykov, V. V. (1989). Superimposed distance codes. Problemy Upravlenija i Teorii Informacii [Problems of Control and Information Theory], 18(4), 237–250.
  4. ^ Füredi, Z. (1996). On r-Cover-free Families. Journal of Combinatorial Theory, Series A, 73(1), 172–173. ISSN 0097-3165, DOI: 10.1006/jcta.1996.0012. (http://www.sciencedirect.com/science/article/B6WHS-45NJMVF-39/2/172ef8c5c4aee2d85d1ddd56b107eef3)

References

  1. Atri Rudra's course on Error Correcting Codes: Combinatorics, Algorithms, and Applications (Spring 2010), Lectures 28, 29.

Wikimedia Foundation. 2010.
