The 1000 Genomes Project
The 1000 Genomes Project, launched in January 2008, is an international research effort to establish by far the most detailed catalogue of human genetic variation. Scientists plan to sequence the genomes of at least one thousand anonymous participants from a number of different ethnic groups within the next three years, using newly developed, faster and less expensive sequencing technologies. The project unites multidisciplinary research teams from institutes around the world, including in Britain, China and the United States. Each will contribute to the enormous sequence dataset and to a refined human genome map, which will be freely accessible through public databases to the scientific community and the general public alike. By providing an overview of all genetic variation, not only what is biomedically relevant, the consortium will generate a valuable tool for all fields of natural science, especially the disciplines of genetics, medicine, pharmacology, biochemistry and bioinformatics. [G Spencer, International Consortium Announces the 1000 Genomes Project, EMBARGOED (2008) http://www.1000genomes.org/files/1000Genomes-NewsRelease.pdf]
Background
Within the past few decades, advances in human population genetics and comparative genomics have made it possible to gain increasing insight into the nature of genetic diversity. However, we are only beginning to understand how processes such as the random sampling of gametes, structural variations (insertions/deletions (indels), copy number variations (CNVs), retroelements), single nucleotide polymorphisms (SNPs) and natural selection have shaped the level and pattern of variation within and between species. [JC Long, Human Genetic Variation: The mechanisms and results of microevolution, American Anthropological Association (2004)] [T Anzai et al., Comparative sequencing of human and chimpanzee MHC class I regions unveils insertions/deletions as the major path to genomic divergence, PNAS (2003) vol. 100 no. 13: 7708–7713] [R Redon et al., Global variation in copy number in the human genome, Nature, Volume 444 (2006)] [LB Barreiro et al., Natural selection has driven population differentiation in modern humans, Nature Genetics (2008) published online]
Human Genetic Variation
The random sampling of gametes during sexual reproduction leads to genetic drift (a random fluctuation in the population frequency of a trait) in subsequent generations and, in the absence of external influence, would result in the loss of all variation. It is postulated that the rate of genetic drift is inversely proportional to population size, and that it may be accelerated in specific situations such as bottlenecks, where the population size is reduced for a certain period of time, and by the founder effect (individuals in a population tracing back to a small number of founding individuals).
Anzai et al. demonstrated that indels account for 90.4% of all observed variations in the sequence of the major histocompatibility locus (MHC) between humans and chimpanzees. After taking multiple indels into consideration, the high degree of genomic similarity between the two species (98.6% nucleotide sequence identity) drops to only 86.7%. For example, a large deletion of 95 kilobases (kb) between the loci of the human "MICA" and "MICB" genes results in a single hybrid chimpanzee "MIC" gene, linking this region to a species-specific handling of several retroviral infections and the resultant susceptibility to various autoimmune diseases. The authors conclude that indels, rather than the more subtle SNPs, were the driving mechanism in primate speciation.
Besides mutations, SNPs and other structural variants such as CNVs contribute to the genetic diversity in human populations. Almost 1,500 copy number variable regions, covering around 12% of the genome and containing hundreds of genes, disease loci, functional elements and segmental duplications, have been identified using the HapMap collection, a project describing common patterns of human DNA sequence variation. Although the specific function of CNVs remains elusive, the fact that CNVs span more nucleotide content per genome than SNPs emphasizes their importance in genetic diversity and evolution.
Investigating human genomic variations holds great potential for identifying genes that might underlie differences in disease resistance (e.g. the MHC region) or drug metabolism. [R Nielsen et al., Recent and ongoing selection in the human genome, Nature Reviews, Volume 8 (2007)]
Natural Selection
Natural selection in the evolution of a trait can be divided into three classes. Directional or positive selection refers to a situation where a certain allele has a greater fitness than other alleles, consequently increasing its population frequency (e.g. antibiotic resistance of bacteria). In contrast, stabilizing or negative selection (also known as purifying selection) lowers the frequency of an allele, or even removes it from the population, because of disadvantages associated with it relative to other alleles. Finally, a number of forms of balancing selection exist; these maintain genetic variation within a species, either through overdominance (heterozygous individuals are fitter than homozygous individuals, e.g. "G6PD", a gene in which deficiency alleles cause haemolytic anaemia but are associated with malaria resistance) or through selection that varies spatially within a species inhabiting different niches, thus favouring different alleles. [EE Harris et al., The molecular signature of selection underlying human adaptations, Yearbook of Physical Anthropology 49: 89-130 (2006)]
Some genomic differences may not affect fitness. Neutral variation, previously thought to be "junk" DNA, is unaffected by natural selection, resulting in higher genetic variation at such sites than at sites where variation does influence fitness. [M Bamshad, Signatures of natural selection in the human genome, Nature Reviews, Volume 4 (2003)]
It is not fully clear how natural selection has shaped population differences; however, candidate genomic regions under selection have been identified recently. Patterns of DNA polymorphisms can be used to reliably detect signatures of selection and may help to identify genes that might underlie variation in disease resistance or drug metabolism. Barreiro et al. found evidence that negative selection has reduced population differentiation at the amino acid-altering level (particularly in disease-related genes), whereas positive selection has ensured regional adaptation of human populations by increasing population differentiation in gene regions (mainly nonsynonymous and 5'-untranslated region variants).
It is thought that most complex and Mendelian diseases (except diseases with late onset, assuming that older individuals no longer contribute to the fitness of their offspring) will have an effect on survival and/or reproduction; thus, genetic factors underlying those diseases should be influenced by natural selection.
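The three classes of selection described in this section can be illustrated with a minimal sketch of the standard one-locus, two-allele viability selection recursion. This is a textbook deterministic model, not a method used by the studies cited here; the function names and all fitness values below are hypothetical and chosen only for illustration. Genetic drift and mutation are ignored.

```python
def next_freq(p, w_AA, w_Aa, w_aa):
    """Frequency of allele A after one generation of viability selection.

    p is the current frequency of A; w_AA, w_Aa and w_aa are the
    (hypothetical) relative fitnesses of the three genotypes.
    """
    q = 1.0 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def iterate(p0, w_AA, w_Aa, w_aa, generations=2000):
    """Apply the recursion repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, w_AA, w_Aa, w_aa)
    return p


# Positive selection: allele A is favoured and rises towards fixation.
positive = iterate(0.01, w_AA=1.0, w_Aa=0.95, w_aa=0.90)

# Negative selection: allele A is disadvantageous and is removed.
negative = iterate(0.50, w_AA=0.90, w_Aa=0.95, w_aa=1.0)

# Balancing selection (overdominance): heterozygotes are fittest, so
# both alleles persist at an intermediate equilibrium frequency.
balanced = iterate(0.01, w_AA=0.90, w_Aa=1.0, w_aa=0.80)

print(f"positive: {positive:.4f}")
print(f"negative: {negative:.4f}")
print(f"balanced: {balanced:.4f}")
```

Under overdominance the recursion settles at the frequency (w_Aa - w_aa) / (2 w_Aa - w_AA - w_aa), which is 2/3 for the illustrative values above, whereas the other two cases end with one allele effectively fixed or lost.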
Gaucher disease (mutations in the "GBA" gene), Crohn's disease (mutation of "NOD2") and familial hypertrophic cardiomyopathy (mutations in "CMH1", "CMH2", "CMH3" and "CMH4") are all examples of negative selection. These disease mutations are primarily recessive and segregate, as expected, at a low frequency, supporting the hypothesized negative selection. Few cases have been reported in which disease-causing mutations appear at the high frequencies maintained by balancing selection. The most prominent example is mutations at the "G6PD" locus, which in the homozygous state result in G6PD enzyme deficiency and consequently haemolytic anaemia, but in the heterozygous state are partially protective against malaria. Other possible explanations for segregation of disease alleles at moderate or high frequencies include genetic drift and recent shifts towards positive selection due to environmental changes (such as diet) or genetic hitch-hiking.
Genome-wide comparative analyses of different human populations, as well as between species (e.g. human versus chimpanzee), are helping us to understand the relationship between diseases and selection, and provide evidence that mutations in constrained genes are disproportionately associated with heritable disease phenotypes. Genes implicated in complex disorders tend to be under less negative selection than Mendelian disease genes or non-disease genes.
The Project
Goals
There are two kinds of genetic variants related to disease. The first are rare genetic variants that have a severe effect, predominantly on simple traits (e.g. cystic fibrosis, Huntington disease). The second are more common genetic variants that have a mild effect and are thought to be implicated in complex traits (e.g. diabetes, heart disease). Between these two types of genetic variants lies a significant gap in knowledge, which the 1000 Genomes Project is designed to address.
The primary goal of this project is to create a complete and detailed catalogue of human genetic variations, which in turn can be used for association studies relating genetic variation to disease. By doing so, the consortium aims to discover more than 95% of the variants (e.g. SNPs, CNVs, indels) with minor allele frequencies as low as 1% across the genome and 0.1-0.5% in gene regions, as well as to estimate the population frequencies, haplotype backgrounds and linkage disequilibrium patterns of variant alleles. [Meeting Report: A Workshop to Plan a Deep Catalog of Human Genetic Variation, (2007) http://www.1000genomes.org/files/1000Genomes-MeetingReport.pdf]
Secondary goals include supporting better SNP and probe selection for genotyping platforms in future studies and improving the human reference sequence. Furthermore, the completed database will be a useful tool for studying regions under selection and variation in multiple populations, and for understanding the underlying processes of mutation and recombination.
Outline
The human genome consists of approximately 3 billion DNA base pairs and is estimated to carry 20,000–25,000 protein-coding genes. In designing the study, the consortium needed to address several critical issues regarding the project metrics, such as technology challenges, data quality standards and sequence coverage.
Over the course of the next three years, scientists at the Sanger Institute, BGI Shenzhen and the National Human Genome Research Institute's Large-Scale Sequencing Network are planning to sequence a minimum of 1,000 human genomes. Because of the large amount of sequence data that needs to be generated and analyzed, it is possible that other participants may be recruited over time.
Almost 10 billion bases will be sequenced per day over the two-year production phase. This equates to more than two human genomes every 24 hours, a groundbreaking capacity. Challenging the leading experts in bioinformatics and statistical genetics, the sequence dataset will comprise 6 trillion DNA bases, 60-fold more sequence data than has been published in DNA databases over the past 25 years.
To determine the final design of the full project, three pilot studies were designed and will be carried out within the first year of the project. Providing comprehensive variation, haplotype, and transmission data on a few samples, the first pilot intends to sequence the genomes of two nuclear families (both parents and an adult child) with deep coverage (20x per genome). For the second pilot study, 180 people from 3 major geographic groups are going to be sequenced at low coverage (2x). The last pilot study involves sequencing the coding regions (exons) of 1,000 genes in 1,000 people with deep coverage (20x).
It has been estimated that the project would likely cost more than $500 million if standard DNA sequencing technologies were used. Therefore, several new technologies (e.g. Solexa, 454, SOLiD) will be applied, lowering the expected costs to between $30 million and $50 million. The major support will be provided by the Wellcome Trust Sanger Institute in Hinxton, England; the Beijing Genomics Institute, Shenzhen (BGI Shenzhen), China; and the NHGRI, part of the National Institutes of Health (NIH).
The compiled genome sequence data will be made freely available.
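The throughput and coverage figures quoted in this section can be checked with a short back-of-the-envelope calculation. The sketch below uses only numbers stated in the text; the constant and function names are illustrative, and it assumes a full 10 billion bases per day (the text says "almost"), uninterrupted production, and 365-day years.

```python
GENOME_SIZE_BP = 3_000_000_000     # approx. size of the human genome (from the text)
DAILY_OUTPUT_BP = 10_000_000_000   # "almost 10 billion bases" per day, taken as an upper bound
PRODUCTION_DAYS = 2 * 365          # two-year production phase; continuous output is an assumption

# Daily output expressed as single-pass (1x) genome equivalents.
genomes_per_day = DAILY_OUTPUT_BP / GENOME_SIZE_BP

# Upper bound on the total number of bases over the production phase.
total_output_bp = DAILY_OUTPUT_BP * PRODUCTION_DAYS


def bases_required(individuals, coverage, target_bp=GENOME_SIZE_BP):
    """Raw bases needed to sequence `individuals` genomes at `coverage`-fold depth."""
    return individuals * coverage * target_bp


# Pilot 1: two nuclear families of three people each, at 20x coverage.
pilot1_bp = bases_required(individuals=6, coverage=20)

# Pilot 2: 180 people at 2x coverage.
pilot2_bp = bases_required(individuals=180, coverage=2)

print(f"1x genome equivalents per day: {genomes_per_day:.2f}")
print(f"upper bound on total output:   {total_output_bp:.2e} bases")
print(f"pilot 1 requirement:           {pilot1_bp:.2e} bases")
print(f"pilot 2 requirement:           {pilot2_bp:.2e} bases")
```

Under these assumptions the daily output corresponds to roughly 3.3 single-pass genomes, consistent with "more than two human genomes every 24 hours", and the two-year upper bound of about 7.3 trillion bases is of the same order as the 6 trillion quoted. The source does not say how the 6 trillion figure was derived. The third pilot is omitted because the text does not give the combined exon length of the 1,000 genes.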
Human Genome Samples
Based on the overall goals for the project, the samples will be chosen to provide power in populations where association studies for common diseases are being carried out. Furthermore, the samples do not need to have medical or phenotype information, since the proposed catalogue will be a basic resource on human variation.
For the pilot studies, human genome samples from the HapMap collection will be sequenced. It will be useful to focus on samples that have additional data available (such as ENCODE sequence, genome-wide genotypes, fosmid-end sequence, structural variation assays, and gene expression) so that the results can be compared with those from other projects.
Complying with extensive ethical procedures, the 1000 Genomes Project will then use samples from volunteer donors. The following populations will be included in the study: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States.
See also
* Human Genome Project
* HapMap Project
* Personal genomics
External links
* http://www.1000genomes.org/ official web page: 1000 Genomes - A Deep Catalog of Human Genetic Variation
* http://www.hapmap.org/ official web page: International HapMap Project
* http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml Human Genome Project Information
* http://www.gdb.org/ official web page: The GDB Human Genome Database
References
Wikimedia Foundation. 2010.