- Human genome
The human genome is the genome of "Homo sapiens", which is stored on 23 chromosome pairs. Twenty-two of these are autosomal chromosome pairs, while the remaining pair is sex-determining. The haploid human genome occupies a total of just over 3 billion DNA base pairs and has a data size of approximately 750 megabytes, [cite web | title = DNA - Encoded Messages - Dennis Overbye - Essay - New York Times | url = http://www.nytimes.com/2007/06/26/science/26DNA.html ] which is slightly larger than the capacity of a standard Compact Disc. The Human Genome Project produced a reference sequence of the euchromatic human genome, which is used worldwide in the biomedical sciences.
The haploid human genome contains an estimated 20,000–25,000 protein-coding genes, far fewer than had been expected before its sequencing. [cite journal | author = International Human Genome Sequencing Consortium | title = Finishing the euchromatic sequence of the human genome. | journal = Nature | volume = 431 | issue = 7011 | pages = 931–45 | year = 2004 | pmid = 15496913 | doi = 10.1038/nature03001 [http://www.nature.com/nature/journal/v431/n7011/full/nature03001.html] ] In fact, only about 1.5% of the genome codes for proteins, while the rest consists of RNA genes, regulatory sequences, introns and (controversially) "junk" DNA. [cite journal | author = International Human Genome Sequencing Consortium | title = Initial sequencing and analysis of the human genome. | journal = Nature | volume = 409 | issue = 6822 | pages = 860–921 | year = 2001 | pmid = 11237011 | doi = 10.1038/35057062 [http://www.nature.com/nature/journal/v409/n6822/full/409860a0.html] ]
Features
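The data size quoted in the introduction can be checked with simple arithmetic. The sketch below assumes that each base (A, C, G or T) is stored in 2 bits and that one megabyte is 10^6 bytes. The function name and the round figure of 3 billion base pairs are illustrative choices, not values taken from the cited sources.

```python
# Rough storage estimate for a haploid genome.
# Assumption: each of the four bases (A, C, G, T) is encoded in 2 bits.
BITS_PER_BASE = 2


def genome_size_megabytes(base_pairs: int) -> float:
    """Return the 2-bit-encoded size in megabytes (1 MB = 10**6 bytes)."""
    total_bits = base_pairs * BITS_PER_BASE
    total_bytes = total_bits / 8
    return total_bytes / 10**6


if __name__ == "__main__":
    # Round figure; the text says "just over 3 billion" base pairs.
    haploid_bp = 3_000_000_000
    print(f"{genome_size_megabytes(haploid_bp):.0f} MB")
```

Under these assumptions, 3 billion base pairs come to 750 megabytes, which matches the figure in the introduction. A real sequence file would usually be larger because of headers, ambiguous bases and quality information.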
Chromosomes
There are 24 "distinct" human chromosomes: 22 autosomal chromosomes, plus the sex-determining X and Y chromosomes. Chromosomes 1–22 are numbered roughly in order of decreasing size. Somatic cells usually have 23 chromosome pairs: one copy of chromosomes 1–22 from each parent, plus an X chromosome from the mother, and either an X or Y chromosome from the father, for a total of 46.
Genes
There are an estimated 20,000–25,000 human protein-coding genes. The estimate of the number of human genes has been repeatedly revised down from initial predictions of 100,000 or more as genome sequence quality and gene finding methods have improved, and could continue to drop further. [Science 316 p 1113 25-May-2007, probably in the range 20,488-20,588. (note, this is a news article in Science magazine reporting on a conference presentation. It is not a peer-reviewed publication, and therefore its figures should not be considered "authoritative")]
Surprisingly, the number of human genes seems to be less than a factor of two greater than that of many much simpler organisms, such as the roundworm and the fruit fly. However, human cells make extensive use of alternative splicing to produce several different proteins from a single gene, and the human proteome is thought to be much larger than those of the aforementioned organisms. Besides, most human genes have multiple exons, and human introns are frequently much longer than the flanking exons.
Human genes are distributed unevenly across the chromosomes. Each chromosome contains various gene-rich and gene-poor regions, which seem to be correlated with chromosome bands and GC-content. The significance of these nonrandom patterns of gene density is not well understood. In addition to protein-coding genes, the human genome contains thousands of RNA genes, including tRNA, ribosomal RNA, microRNA, and other non-coding RNA genes.
Regulatory sequences
The human genome has many different regulatory sequences which are crucial to controlling gene expression. These are typically short sequences that appear near or within genes. A systematic understanding of these regulatory sequences and how they together act as a gene regulatory network is only beginning to emerge from computational, high-throughput expression and comparative genomics studies.
Identification of regulatory sequences relies in part on evolutionary conservation. The evolutionary split between the human and mouse lineages, for example, occurred 70–90 million years ago. [cite journal | author = Nei M, Xu P, Glazko G | title = Estimation of divergence times from multiprotein sequences for a few mammalian species and several distantly related organisms. | journal = Proc Natl Acad Sci U S A | volume = 98 | issue = 5 | pages = 2497–502 | year = 2001 | pmid = 11226267 | url=http://www.pnas.org/cgi/content/full/051611498 | doi = 10.1073/pnas.051611498 ] Non-coding sequences that computer comparisons show to be conserved between the two genomes are therefore likely to be important in functions such as gene regulation. [cite journal | author = Loots G, Locksley R, Blankespoor C, Wang Z, Miller W, Rubin E, Frazer K | title = Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. | journal = Science | volume = 288 | issue = 5463 | pages = 136–40 | year = 2000 | pmid = 10753117 | doi = 10.1126/science.288.5463.136 [http://www.lbl.gov/Science-Articles/Archive/mouse-dna-model.html Summary] ]
Another comparative genomic approach to locating regulatory sequences in humans is the genome sequencing of the puffer fish. These vertebrates have essentially the same genes and regulatory sequences as humans, but with only one-eighth the "junk" DNA. The compact DNA sequence of the puffer fish makes it much easier to locate the regulatory sequences. [cite web | last = Meunier | first = Monique | url = http://www.cns.fr/externe/English/Actualites/Presse/261001_1.html | title = Genoscope and Whitehead announce a high sequence coverage of the Tetraodon nigroviridis genome | publisher = Genoscope | accessdate = 2006-09-12 ]
Other DNA
Protein-coding sequences (specifically, coding exons) comprise less than 1.5% of the human genome. Aside from genes and known regulatory sequences, the human genome contains vast regions of DNA the function of which, if any, remains unknown. These regions in fact comprise the vast majority, by some estimates 97%, of the human genome size. Much of this is composed of:
*Repeat elements
**Tandem repeats
***Satellite DNA
***Minisatellite
***Microsatellite
**Interspersed repeats
***SINEs
***LINEs
*Transposons
**Retrotransposons
***LTR
****Ty1-copia
****Ty3-gypsy
***Non-LTR
****SINEs
****LINEs
*Pseudogenes
However, there is also a large amount of sequence that does not fall under any known classification.
Much of this sequence may be an evolutionary artifact that serves no present-day purpose, and these regions are sometimes collectively referred to as "junk" DNA. There are, however, a variety of emerging indications that many sequences within are likely to function in ways that are not fully understood. Recent experiments using microarrays have revealed that a substantial fraction of non-genic DNA is in fact transcribed into RNA, ["...a tiling array with 5-nucleotide resolution that mapped transcription activity along 10 human chromosomes revealed that an average of 10% of the genome (compared to the 1 to 2% represented by bona fide exons) corresponds to polyadenylated transcripts, of which more than half do not overlap with known gene locations." cite journal | author = Claverie J | title = Fewer genes, more noncoding RNA. | journal = Science | volume = 309 | issue = 5740 | pages = 1529–30 | year = 2005 | pmid = 16141064 | doi = 10.1126/science.1116800 ] which leads to the possibility that the resulting transcripts may have some unknown function. Also, the evolutionary conservation across the mammalian genomes of much more sequence than can be explained by protein-coding regions indicates that many, and perhaps most, functional elements in the genome remain unknown. ["...the proportion of small (50-100 bp) segments in the mammalian genome that is under (purifying) selection can be estimated to be about 5%. This proportion is much higher than can be explained by protein-coding sequences alone, implying that the genome contains many additional features (such as untranslated regions, regulatory elements, non-protein-coding genes, and chromosomal structural elements) under selection for biological function." cite journal | author = Mouse Genome Sequencing Consortium | title = Initial sequencing and comparative analysis of the mouse genome. | journal = Nature | volume = 420 | issue = 6915 | pages = 520–62 | year = 2002 | pmid = 12466850 | doi = 10.1038/nature01262 ] The investigation of the vast quantity of sequence information in the human genome whose function remains unknown is currently a major avenue of scientific inquiry. [cite journal | author = The ENCODE Project Consortium | title = "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project" | journal = Nature | volume = 447 | pages = 799–816 | year = 2007 | doi = 10.1038/nature05874 ]
Variation
Most studies of human genetic variation have focused on single nucleotide polymorphisms (SNPs), which are substitutions in individual bases along a chromosome. Most analyses estimate that SNPs occur on average once every 100 to 1,000 base pairs in the euchromatic human genome, although they do not occur at a uniform density. Thus follows the popular statement that "we are all, regardless of race, genetically 99.9% the same", [from Bill Clinton's 2000 State of the Union address [http://clinton4.nara.gov/WH/SOTU00/sotu-text.html] ] although this would be somewhat qualified by most geneticists. For example, a much larger fraction of the genome is now thought to be involved in copy number variation. [ [http://www.nature.com/nature/journal/v444/n7118/full/nature05329.html Global variation in copy number in the human genome : Article : Nature ] ] A large-scale collaborative effort to catalog SNP variations in the human genome is being undertaken by the International HapMap Project.
The genomic loci and length of certain types of small repetitive sequences are highly variable from person to person, which is the basis of DNA fingerprinting and DNA paternity testing technologies. The heterochromatic portions of the human genome, which total several hundred million base pairs, are also thought to be quite variable within the human population (they are so repetitive and so long that they cannot be accurately sequenced with current technology). These regions contain few genes, and it is unclear whether any significant phenotypic effect results from typical variation in repeats or heterochromatin.
Most gross genomic mutations in germ cells probably result in inviable embryos; however, a number of human diseases are related to large-scale genomic abnormalities. Down syndrome, Turner syndrome, and a number of other diseases result from nondisjunction of entire chromosomes. Cancer cells frequently have aneuploidy of chromosomes and chromosome arms, although a cause and effect relationship between aneuploidy and cancer has not been established.
Genetic disorders
Most aspects of human biology involve both genetic (inherited) and non-genetic (environmental) factors. Some inherited variation influences aspects of our biology that are not medical in nature (height, eye color, ability to taste or smell certain compounds, etc). Moreover, some genetic disorders only cause disease in combination with the appropriate environmental factors (such as diet). With these caveats, genetic disorders may be described as clinically defined diseases caused by genomic DNA sequence variation. In the most straightforward cases, the disorder can be associated with variation in a single gene. For example, cystic fibrosis is caused by mutations in the CFTR gene, and is the most common recessive disorder in Caucasian populations, with over 1300 different mutations known. Disease-causing mutations in specific genes are usually severe in terms of gene function and are fortunately rare; genetic disorders are therefore individually rare as well. However, since there are many genes that can vary to cause genetic disorders, in aggregate they comprise a significant component of known medical conditions, especially in pediatric medicine. Molecularly characterized genetic disorders are those for which the underlying causal gene has been identified; currently there are approximately 2200 such disorders annotated in the OMIM database. [ Online Mendelian Inheritance in Man (OMIM) [http://www.ncbi.nlm.nih.gov/Omim/mimstats.html] ]
Studies of genetic disorders are often performed by means of family-based studies. In some instances population-based approaches are employed, particularly in the case of so-called founder populations such as those in Finland, French-Canada, Utah, Sardinia, etc. Diagnosis and treatment of genetic disorders are usually performed by a geneticist-physician trained in clinical/medical genetics. The results of the Human Genome Project are likely to provide increased availability of genetic testing for gene-related disorders, and eventually improved treatment. Parents can be screened for hereditary conditions and counselled on the consequences, the probability that a condition will be inherited, and how to avoid or ameliorate it in their offspring.
As noted above, there are many different kinds of DNA sequence variation, ranging from complete extra or missing chromosomes down to single nucleotide changes. It is generally presumed that much naturally occurring genetic variation in human populations is phenotypically neutral, i.e. has little or no detectable effect on the physiology of the individual (although there may be fractional differences in fitness defined over evolutionary time frames). Genetic disorders can be caused by any or all known types of sequence variation. To molecularly characterize a new genetic disorder, it is necessary to establish a causal link between a particular genomic sequence variant and the clinical disease under investigation. Such studies constitute the realm of human molecular genetics.
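The single nucleotide changes mentioned above can be put in rough numbers using the SNP densities quoted under Variation (one per 100 to one per 1,000 base pairs). The sketch below is only back-of-the-envelope arithmetic. It assumes a uniform SNP density (which the text notes is not the case), treats the quoted density as the per-base difference between two genomes, and uses a round 3 billion base-pair genome. The function names are hypothetical.

```python
def expected_snp_count(genome_bp: int, bp_per_snp: int) -> int:
    """Expected number of SNP sites if one occurs every `bp_per_snp` bases."""
    return genome_bp // bp_per_snp


def percent_identity(bp_per_snp: int) -> float:
    """Percentage of positions that match when one base in `bp_per_snp` differs."""
    return 100.0 * (1 - 1 / bp_per_snp)


if __name__ == "__main__":
    genome_bp = 3_000_000_000  # round figure, an assumption for illustration
    for bp_per_snp in (100, 1_000):
        count = expected_snp_count(genome_bp, bp_per_snp)
        identity = percent_identity(bp_per_snp)
        print(f"1 SNP per {bp_per_snp} bp: {count} sites, {identity:.1f}% identical")
```

At one SNP per 1,000 base pairs this gives about 3 million sites and 99.9% identity, the figure behind the popular statement. The calculation ignores copy number variation and repeat-length differences, which is one reason geneticists qualify that statement.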
With the advent of the Human Genome Project and the International HapMap Project, it has become feasible to explore subtle genetic influences on many common disease conditions such as diabetes, asthma, migraine, schizophrenia, etc. Although some causal links have been made between genomic sequence variants in particular genes and some of these diseases, often with much publicity in the general media, these are usually not considered to be genetic disorders "per se" as their causes are complex, involving many different genetic and environmental factors. Thus there may be disagreement in particular cases whether a specific medical condition should be termed a genetic disorder.
Evolution
Comparative genomics studies of mammalian genomes suggest that approximately 5% of the human genome, containing the vast majority of genes, has been conserved by evolution since the divergence of those species approximately 200 million years ago. Intriguingly, since genes and known regulatory sequences probably comprise less than 2% of the genome, this suggests that there may be more unknown functional sequence than known functional sequence. A smaller, yet still large, fraction of human genes seems to be shared among most known vertebrates.
The chimpanzee genome is 95% identical to the human genome. On average, a typical human protein-coding gene differs from its chimpanzee ortholog by only two amino acid substitutions; nearly one third of human genes have exactly the same protein translation as their chimpanzee orthologs. A major difference between the two genomes is human chromosome 2, which is equivalent to a fusion product of chimpanzee chromosomes 12 and 13. ["Human chromosome 2 resulted from a fusion of two ancestral chromosomes that remained separate in the chimpanzee lineage" cite journal | author = The Chimpanzee Sequencing and Analysis Consortium | title = Initial sequence of the chimpanzee genome and comparison with the human genome. | journal = Nature | volume = 437 | issue = 7055 | pages = 69–87 | year = 2005 | pmid = 16136131 | doi = 10.1038/nature04072 ] ["Large-scale sequencing of the chimpanzee genome is now imminent." cite journal | author = Olson M, Varki A | title = Sequencing the chimpanzee genome: insights into human evolution and disease. | journal = Nat Rev Genet | volume = 4 | issue = 1 | pages = 20–8 | year = 2003 | pmid = 12509750 | doi = 10.1038/nrg981 ]
Humans have undergone an extraordinary loss of olfactory receptor genes during our recent evolution, which explains our relatively crude sense of smell compared to most other mammals. Evolutionary evidence suggests that the emergence of color vision in humans and several other primate species has diminished the need for the sense of smell. ["Our findings suggest that the deterioration of the olfactory repertoire occurred concomitant with the acquisition of full trichromatic color vision in primates." cite journal | author = Gilad Y, Wiebe V, Przeworski M, Lancet D, Pääbo S | title = Loss of olfactory receptor genes coincides with the acquisition of full trichromatic vision in primates. | journal = PLoS Biol | volume = 2 | issue = 1 | pages = E5 | year = 2004 | pmid = 14737185 | doi = 10.1371/journal.pbio.0020005 ]
Mitochondrial genome
The human mitochondrial genome, while usually not included when referring to the "human genome", is of tremendous interest to geneticists, since it undoubtedly plays a role in mitochondrial disease. It also sheds light on human evolution; for example, analysis of variation in the human mitochondrial genome has led to the postulation of a recent common ancestor for all humans on the maternal line of descent (see Mitochondrial Eve).
Due to the lack of a system for checking for copying errors, mitochondrial DNA (mtDNA) has a more rapid rate of variation than nuclear DNA. This 20-fold increase in the mutation rate allows mtDNA to be used for more accurate tracing of maternal ancestry. Studies of mtDNA in populations have allowed ancient migration paths to be traced, such as the migration of Native Americans from Siberia or Polynesians from southeastern Asia. It has also been used to show that there is no trace of Neanderthal DNA in the European gene mixture inherited through purely maternal lineage. [cite web | last = Sykes | first = Bryan | date = 2003-10-09 | url = http://genome.wellcome.ac.uk/doc_WTD020876.html | title = Mitochondrial DNA and human history | publisher = The Human Genome | accessdate = 2006-09-19 ]
Epigenome
A variety of features of the human genome that transcend its primary DNA sequence, such as chromatin packaging, histone modifications and DNA methylation, are important in regulating gene expression, genome replication and other cellular processes. [ [http://www.cell.com/content/article/abstract?uid=PIIS0092867407001262 Cell - Misteli ] ] [ [http://www.cell.com/content/article/abstract?uid=PIIS0092867407001286 Cell - Bernstein et al ] ] These "epigenetic" features are thought to be involved in cancer and other abnormalities, and some may be heritable across generations.
See also
*Eukaryotic chromosome fine structure
*Eugenics
*Human Genome Project
*Genomic organization
*The Genographic Project
*Karyotype
*Mitochondrial Eve
*Y-chromosomal Adam
*genetic distance
*Human genetic engineering
*Craig Venter's genome
References
* [cite journal | author = Lindblad-Toh K, et al. | title = Genome sequence, comparative analysis and haplotype structure of the domestic dog. | journal = Nature | volume = 438 | issue = 7069 | pages = 803–19 | year = 2005 | pmid = 16341006 | doi = 10.1038/nature04338 [http://www.nature.com/nature/journal/v438/n7069/abs/nature04338.html] ]
External links
* [http://www.genome.gov/ The National Human Genome Research Institute]
* [http://www.ensembl.org/ Ensembl]
* [http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606 National Library of Medicine human genome viewer]
* [http://genome.ucsc.edu/ UCSC Genome Browser]
* [http://www.ornl.gov/sci/techresources/Human_Genome/project/info.shtml Human Genome Project]
* [http://www.sabanciuniv.edu/do/eng/PodCast/files/podcast18.mp3 Sabancı University School of Languages Podcasts What makes us different from chimpanzees? by Andrew Berry] (MP3 file)
* [http://www.cdc.gov/genomics/default.htm The National Office of Public Health Genomics]
*New findings: established views about human genome challenged [http://www.genome.gov/25521554] [http://www.france24.com/france24Public/en/news/science/20070613-biothec-genome-dna-genes-discoveries-biology-medecine.html] [http://www.spiritindia.com/health-care-news-articles-10638.html]
Wikimedia Foundation. 2010.