N50 statistic

In computational biology, the N50 statistic is a measure of the typical length of a set of sequences, with greater weight given to longer sequences (a weighted median rather than a mean). It is widely used in genome assembly, especially in reference to contig lengths within a draft assembly. Given a set of sequences of varying lengths, the N50 length is defined as the length N for which 50% of all bases in the sequences are in a sequence of length L < N (equivalently, the remaining 50% are in sequences of length L ≥ N).

N50 may also be defined as the contig length such that using contigs of equal or greater length produces half the bases of the assembly. The N50 size is computed by sorting all contigs from largest to smallest and determining the minimum set of contigs whose sizes total at least 50% of the entire assembly; the N50 is the length of the smallest contig in that set. For example, for a genome of 600 Mb, if the assembled sequences add up to 500 Mb, the N50 would be calculated by sorting the contigs from largest to smallest and finding the length of the contig at which the cumulative size reaches 250 Mb. Thus, N50 is calculated in the context of the assembly size rather than the genome size. The NG50 statistic is the same as the N50 except that the genome size is used rather than the assembly size, so in this example the cumulative size would have to reach 300 Mb.
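The sort-and-accumulate procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names nx, n50 and ng50 and the example contig lengths are assumptions made for this sketch and do not come from any particular assembly tool.

```python
def nx(lengths, fraction=0.5, total=None):
    """Length of the contig at which the running sum (largest first)
    first reaches fraction * total.

    total defaults to the assembly size (the sum of all lengths), which
    gives N50 for fraction=0.5. Passing the genome size instead gives NG50.
    """
    if total is None:
        total = sum(lengths)
    threshold = fraction * total
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    # The contigs never reach the threshold. This can happen for NG50
    # when the assembly covers less than half of the genome.
    return None


def n50(lengths):
    return nx(lengths, 0.5)


def ng50(lengths, genome_size):
    return nx(lengths, 0.5, total=genome_size)


# Hypothetical contig lengths, in arbitrary units.
contigs = [80, 70, 50, 40, 30, 20]   # assembly size 290, half is 145
print(n50(contigs))                  # 80 + 70 = 150 >= 145, so 70
print(ng50(contigs, 400))            # half is 200: 80 + 70 + 50 = 200, so 50
```

Returning None when the threshold is never reached is a design choice of this sketch; other conventions, such as reporting 0, are equally possible.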

N50 can be found mathematically as follows: Take a list L of positive integers. Create another list L', which is identical to L except that every element n in L has been replaced with n copies of itself. Then the median of L' is the N50 of L. For example: If L = {2, 2, 2, 3, 3, 4, 8, 8}, then L' consists of six 2s, six 3s, four 4s, and sixteen 8s (each of the three 2s in L is replaced by two 2s, so there are six 2s in L', and so on). L' has 32 elements, so the N50 of L is the median of L', which is the average of the 16th element (4) and the 17th element (8): (4+8)/2 = 6.
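The expanded-list construction can be checked directly with a short Python sketch. The function names here are chosen for this illustration only. The second helper applies the sort-and-accumulate rule to the same list for comparison; under the assumptions of this sketch it returns a different value, which is one way the two definitions can disagree.

```python
from statistics import median


def n50_as_median(lengths):
    """Median of L', where each length n in L is repeated n times."""
    expanded = [n for n in lengths for _ in range(n)]
    return median(expanded)


def n50_cumulative(lengths):
    """Length at which the running sum (largest first) reaches half the total."""
    half = sum(lengths) / 2
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if running >= half:
            return n
    return None


L = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50_as_median(L))    # 6.0, the average of the 16th and 17th elements
print(n50_cumulative(L))   # 8, since 8 + 8 = 16 already reaches half of 32
```

The two results differ here because exactly half of the bases fall on either side of a boundary between contig lengths; the median averages across that boundary, while the cumulative rule always reports an actual contig length.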

Contradictory definitions

Some contradictions between definitions of the N50 value have been identified, as discussed in a thread on the SEQ Answers forum.

Wikimedia Foundation. 2010.
