Open Regulatory Annotation Database

The Open Regulatory Annotation Database (also known as ORegAnno) is designed to promote community-based curation of regulatory information. Specifically, the database contains information about regulatory regions, transcription factor binding sites, regulatory variants, and haplotypes.

Overview

Data Management

For each entry, cross-references are maintained to EnsEMBL, dbSNP, Entrez Gene, the NCBI Taxonomy database and PubMed. The information in ORegAnno is regularly mapped and provided as a UCSC Genome Browser track. Each entry is also linked to its experimental evidence, which is described using an evidence ontology embedded in ORegAnno. Researchers can therefore filter regulatory data according to their own criteria for what counts as adequate supporting evidence.
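
Because each entry carries its supporting evidence, a user can apply their own evidence threshold before, for example, loading records into a genome browser. The Python sketch below assumes a hypothetical, simplified record layout (`RegulatoryRecord`, with 1-based inclusive coordinates and free-text evidence labels). It is not the ORegAnno schema, API, or evidence ontology. It keeps only records supported by at least one accepted evidence type and writes them as minimal four-column BED lines, the plain-text format that UCSC custom tracks accept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegulatoryRecord:
    # Hypothetical, simplified record. Field names are illustrative only and
    # are not the actual ORegAnno schema or export format.
    record_id: str
    chrom: str
    start: int            # assumed 1-based, inclusive
    end: int              # assumed 1-based, inclusive
    evidence: frozenset   # illustrative evidence labels, not ontology terms


def filter_by_evidence(records, accepted):
    """Keep records supported by at least one evidence type the user accepts."""
    return [r for r in records if r.evidence & accepted]


def to_bed(records):
    """Format records as minimal 4-column BED lines (0-based start, half-open)."""
    return [f"{r.chrom}\t{r.start - 1}\t{r.end}\t{r.record_id}" for r in records]


records = [
    RegulatoryRecord("site_A", "chr1", 1001, 1012, frozenset({"EMSA", "reporter assay"})),
    RegulatoryRecord("site_B", "chr2", 501, 520, frozenset({"computational prediction"})),
]

# This (hypothetical) user only trusts direct binding or in vivo occupancy evidence.
kept = filter_by_evidence(records, accepted=frozenset({"EMSA", "ChIP"}))
print(to_bed(kept))  # ['chr1\t1000\t1012\tsite_A']
```

The set intersection makes the evidence rule explicit and easy to change: a stricter user could instead require that every listed evidence type be accepted by testing `r.evidence <= accepted`.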

Software and data access

The project is open source: all data and all software produced in the project can be freely accessed and used.

Database contents

As of 20 December 2006, ORegAnno contained 4220 regulatory sequences (excluding deprecated records): 2190 transcription factor binding sites, 1853 regulatory regions (enhancers, promoters, etc.), 170 regulatory polymorphisms, and 7 regulatory haplotypes from 17 different organisms (predominantly, in decreasing order, Drosophila melanogaster, Homo sapiens, Mus musculus, Caenorhabditis elegans, and Rattus norvegicus). These records were obtained by manual curation of 828 publications by 45 ORegAnno users from the gene regulation community. The ORegAnno publication queue contained 4215 publications, of which 858 were closed, 34 were in progress (open status), and 3321 were awaiting annotation (pending status). Because ORegAnno is continually updated, current database contents should be obtained from www.oreganno.org.
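
The four record categories reported above add up exactly to the 4220 regulatory sequences. A short Python check, using only the counts stated in this section, confirms the total and shows each category's share; binding sites and regulatory regions together account for nearly all records.

```python
# Record counts reported for ORegAnno as of 20 December 2006
# (excluding deprecated records), taken from the text above.
counts = {
    "transcription factor binding sites": 2190,
    "regulatory regions": 1853,
    "regulatory polymorphisms": 170,
    "regulatory haplotypes": 7,
}

total = sum(counts.values())
assert total == 4220  # matches the reported number of regulatory sequences

for record_type, n in counts.items():
    print(f"{record_type}: {n} ({100 * n / total:.1f}%)")

# transcription factor binding sites: 2190 (51.9%)
# regulatory regions: 1853 (43.9%)
# regulatory polymorphisms: 170 (4.0%)
# regulatory haplotypes: 7 (0.2%)
```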

RegCreative Jamboree 2006

The RegCreative jamboree grew out of a community initiative to curate, in perpetuity, the genomic sequences that have been experimentally shown to control gene expression. This objective is of fundamental importance to evolutionary analysis and translational research, as regulatory mechanisms are widely implicated in species-specific adaptation and in the etiology of disease. The initiative culminated in the formation of an international consortium of like-minded scientists dedicated to this task. The RegCreative jamboree was the first opportunity for these groups to meet, to assess the current state of knowledge in gene regulation, and to begin developing standards for curating regulatory information.

In total, 44 researchers from 23 institutions in 9 countries attended the workshop. Funding was also obtained from ENFIN, the BioSapiens Network, the FWO Research Foundation, Genome Canada and Genome British Columbia.

The specific outcomes of the RegCreative meeting to date are:

  • Prior to the RegCreative Jamboree, attendees were asked to take part in an interannotator agreement assessment. Two ORegAnno mirrors were set up with identical sets of publications in their annotation queues. In total, 79 annotations for 31 papers and 60 annotations for 21 papers were collected on servers 1 and 2, respectively; of these, 33 annotations from 18 publications were redundant, that is, made on both servers. This effort was used as a baseline from which to establish annotator efficiency.
  • Hands-on annotation took place during the first 2 days of the 3-day workshop. In total, 39 researchers contributed 184 TFBS and 317 regulatory regions from 96 papers. Many of these researchers were also trained on the ORegAnno system, significantly enlarging its community of experienced users. By species, these annotations comprised 339 in Homo sapiens, 42 in Mus musculus, 72 in Drosophila melanogaster, 24 in Ciona intestinalis, 14 in Rattus norvegicus, 6 in Halocynthia roretzi, 2 in Ciona savignyi and 2 in HIV. They also included one new dataset for ORegAnno: 274 human enhancers from Visel et al., Nucleic Acids Research, 2006, annotated programmatically by Maximillian Haessler, Institute Alfred Fessard. In total, 130 scientific studies were examined in depth. The annotated papers were pre-selected from expert-curated publications in the ORegAnno queue whose full text was available through HighWire Press.
  • There exists an immediate need for improved data standardization and development of associated ontologies. Specifically, this should include the open access development and integration of transcription factor naming conventions and sequence, cell type, cell line, tissue, and evidence ontologies. The groundwork for addressing and prioritizing these needs was accomplished in several ways during the meeting:
    • Transcription factor naming issues were addressed by discussing how to integrate resources built on transcription factor prediction pipelines supplemented with manual curation, such as DBD or flyTF, with purely manually curated resources such as TFcat.
    • Marc Halfon, University at Buffalo, led a breakout session on improving the Sequence Ontology using existing ORegAnno and REDfly database conventions, within the framework being developed as part of the Open Biomedical Ontologies. A preliminary version of these improvements can be found on the ORegAnno wiki.
    • Learning-based ontology development was widely regarded as an essential feature of the annotation process: annotators should not be prevented from annotating by the limitations of the controlled vocabulary, and the exceptions they record can be used to further develop the backbone ontologies.
    • Ontology development should be decentralized from the ORegAnno annotation framework. Specifically, the ORegAnno evidence ontology is to be separated from ORegAnno and made available for broader community development.
    • There was renewed focus on integrating species-specific resources with the annotation framework.
  • A specific focus of the workshop was the role of text mining in facilitating regulatory annotation. Sessions led by Dr. Lynette Hirschman, MITRE, and Dr. Martin Krallinger, CNIO, identified where text mining can help. A short-term objective for text-mining-based analyses was formulated around two tasks: populating the ORegAnno queue, and using the expert-curated portion of the queue to validate text-mining-based publication acquisition. These objectives are being led by Dr. Stein Aerts, University of Leuven.
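
The interannotator agreement assessment compared annotations of the same publications made on two mirrors, but the text does not say how agreement was scored. The Python sketch below shows one hypothetical way to do it, not ORegAnno's method: a base-level Jaccard index between the genomic intervals two annotators recorded for the same paper, assuming `(chrom, start, end)` tuples with 1-based inclusive coordinates.

```python
def covered_bases(annotations):
    """Expand (chrom, start, end) intervals, assumed 1-based and inclusive, into a set of bases."""
    return {
        (chrom, pos)
        for chrom, start, end in annotations
        for pos in range(start, end + 1)
    }


def jaccard_agreement(annotations_1, annotations_2):
    """Base-level Jaccard index between two annotators' intervals for one paper."""
    bases_1 = covered_bases(annotations_1)
    bases_2 = covered_bases(annotations_2)
    union = bases_1 | bases_2
    if not union:
        return 1.0  # neither annotator marked anything: treat as full agreement
    return len(bases_1 & bases_2) / len(union)


# Hypothetical annotations of the same binding site made on the two mirrors.
server_1 = [("chr1", 100, 119)]
server_2 = [("chr1", 110, 129)]
print(f"{jaccard_agreement(server_1, server_2):.2f}")  # 0.33
```

Expanding intervals into per-base sets keeps the logic simple and is adequate for short elements such as binding sites; for long regulatory regions an interval-sweep that sums overlaps without materializing every base would scale better.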

References

  • Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED, Prychyna Y, Zhang X, Jones SJ. (2006). "ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation". Bioinformatics 22 (5): 637–40. doi:10.1093/bioinformatics/btk027. PMID 16397004.

Wikimedia Foundation. 2010.
