Rfam


Rfam is a database containing information about non-coding RNA (ncRNA) families and other structured RNA elements. It is an annotated, open-access database hosted by the Wellcome Trust Sanger Institute in collaboration with Janelia Farm.[Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR (2003). "Rfam: an RNA family database". Nucleic Acids Res. 31 (1): 439–41. PMID 12520045.] Rfam is designed to be similar to the Pfam database for annotating protein families.

Unlike protein families, ncRNA families often share a similar secondary structure without much similarity in primary sequence. Rfam groups ncRNAs into families on the basis of descent from a common ancestor.[Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A (2005). "Rfam: annotating non-coding RNAs in complete genomes". Nucleic Acids Res. 33 (Database issue): D121–4. doi:10.1093/nar/gki081. PMID 15608160.] As with protein families, multiple sequence alignments (MSAs) of these families can provide insight into their structure and function, and such alignments become more useful when secondary structure information is added.
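The point about structure versus sequence can be made concrete with a toy example. The two sequences below are invented for illustration: they differ at every stem position, yet both can form the same four-base-pair hairpin, because each substitution on one side of the stem is matched by a compensating substitution on the other. This is a minimal Python sketch, assuming a simple dot-bracket structure and counting Watson-Crick and G-U wobble pairs as valid.

```python
# Toy illustration: two RNAs with low sequence identity but the same
# secondary structure. Sequences and structure are invented examples.
VALID_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick + G-U wobble


def pairs_from_dot_bracket(structure):
    """Return sorted (i, j) index pairs for a nested dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def percent_identity(a, b):
    """Percentage of aligned positions with identical residues."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def forms_structure(seq, structure):
    """True if every base pair in the structure is a valid pair in seq."""
    return all(seq[i] + seq[j] in VALID_PAIRS
               for i, j in pairs_from_dot_bracket(structure))


structure = "((((....))))"
seq_a = "GCGCAUUAGCGC"
seq_b = "CGCGAUUACGCG"  # every stem pair swapped (compensatory changes)

print(f"identity: {percent_identity(seq_a, seq_b):.1f}%")     # identity: 33.3%
print("seq_a forms structure:", forms_structure(seq_a, structure))  # True
print("seq_b forms structure:", forms_structure(seq_b, structure))  # True
```

A search based only on primary sequence would see these as poor matches (4 of 12 positions identical), while a structure-aware model sees the same hairpin in both.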

Uses of Rfam

The Rfam database can be used for a variety of purposes. For each ncRNA family, the interface allows users to view and download multiple sequence alignments, read annotation, and examine the species distribution of family members. Links are also provided to literature references and other RNA databases. Rfam also links to Wikipedia, so that users can create or edit entries.

The interface at the Rfam website allows users to search for ncRNAs by keyword, family name or genome, as well as by ncRNA sequence or EMBL accession number.[http://www.sanger.ac.uk/Software/Rfam/index.shtml] The database can also be downloaded, installed locally and used with the INFERNAL software package.[Eddy SR (2002). "A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure". BMC Bioinformatics 3: 18. PMID 12095421.] INFERNAL can also be used with Rfam to annotate sequences, including complete genomes, for homologues of known ncRNAs.

Methods

In the database, the secondary structure and primary sequence information, both represented in the MSA, are combined in statistical models called profile stochastic context-free grammars (SCFGs), also known as covariance models. These are analogous to the profile hidden Markov models used for protein family annotation in the Pfam database. Each family in the database is represented by two multiple sequence alignments and an SCFG.
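To give a feel for how an SCFG scores sequence and structure together, the sketch below uses a deliberately tiny grammar with a single nonterminal S and four rules (end, emit one unpaired base, emit a base pair, bifurcate) and finds the most probable parse with the CYK dynamic-programming algorithm. It is not the covariance model architecture used by INFERNAL, which has many states per consensus alignment column; all probabilities are made-up assumptions chosen only so that canonical pairs score well.

```python
import math
from itertools import product

# Toy SCFG with one nonterminal S (all probabilities are illustrative assumptions):
#   S -> (empty)   P_END
#   S -> x S       P_LEFT * single(x)    unpaired base
#   S -> x S y     P_PAIR * pair(x, y)   base pair
#   S -> S S       P_BIF                 bifurcation
P_END, P_LEFT, P_PAIR, P_BIF = 0.1, 0.4, 0.3, 0.2
BASES = "ACGU"
SINGLE = {b: 0.25 for b in BASES}
PAIR = {}
for x, y in product(BASES, repeat=2):
    if x + y in {"AU", "UA", "GC", "CG"}:
        PAIR[x, y] = 0.2      # Watson-Crick pairs
    elif x + y in {"GU", "UG"}:
        PAIR[x, y] = 0.07     # wobble pairs
    else:
        PAIR[x, y] = 0.006    # the remaining 10 pairs share 0.06


def cyk(seq):
    """Return (log-probability of the best parse, its dot-bracket structure)."""
    n = len(seq)
    best = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    back = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i] = math.log(P_END)          # empty subsequence
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length                    # subsequence seq[i:j]
            score = math.log(P_LEFT * SINGLE[seq[i]]) + best[i + 1][j]
            choice = ("left",)
            if length >= 2:
                s = math.log(P_PAIR * PAIR[seq[i], seq[j - 1]]) + best[i + 1][j - 1]
                if s > score:
                    score, choice = s, ("pair",)
            for k in range(i + 1, j):         # both halves non-empty
                s = math.log(P_BIF) + best[i][k] + best[k][j]
                if s > score:
                    score, choice = s, ("bif", k)
            best[i][j], back[i][j] = score, choice
    # Trace back the best parse to recover the structure.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if i == j:
            continue
        choice = back[i][j]
        if choice[0] == "left":
            stack.append((i + 1, j))
        elif choice[0] == "pair":
            structure[i], structure[j - 1] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            stack.extend([(i, choice[1]), (choice[1], j)])
    return best[0][n], "".join(structure)


score, structure = cyk("GGGAAACCC")
print(structure, round(score, 2))  # prints: (((...))) -17.65
```

The grammar rewards a G-C pair more than two unpaired bases, so the best parse of GGGAAACCC is a hairpin, while a sequence such as GGGAAAGGG, which cannot form canonical pairs, gets a lower score and no pairs.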

The first MSA is the “seed” alignment. It is a hand-curated alignment that contains representative members of the ncRNA family and is annotated with structural information. The seed alignment is used to build the SCFG, which is then used with the INFERNAL software to identify additional family members and add them to the alignment. A family-specific score threshold is chosen to avoid false positives.
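Rfam alignments are distributed in Stockholm format, where the consensus secondary structure of an alignment is given on a "#=GC SS_cons" line. The following is a minimal Python reader for a single alignment, assuming well-formed input; it ignores per-sequence #=GS and #=GR markup and the letter-based pseudoknot notation, and the two-sequence alignment is invented for illustration.

```python
# Minimal reader for one Stockholm-format alignment (sketch; assumes
# well-formed input and ignores #=GS / #=GR per-sequence markup).
OPEN_TO_CLOSE = {"<": ">", "(": ")", "[": "]", "{": "}"}


def parse_stockholm(text):
    """Return (GF annotations, GC annotations, sequences) from Stockholm text."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("# STOCKHOLM 1.0"):
        raise ValueError("missing '# STOCKHOLM 1.0' header")
    gf, gc, seqs = {}, {}, {}
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        if line == "//":                          # end of alignment
            break
        if line.startswith("#=GF"):
            _, tag, value = line.split(None, 2)
            gf[tag] = f"{gf[tag]} {value}" if tag in gf else value
        elif line.startswith("#=GC"):
            _, tag, value = line.split(None, 2)
            gc[tag] = gc.get(tag, "") + value     # blocks may be interleaved
        elif line.startswith("#"):
            continue                              # other markup ignored here
        else:
            name, chunk = line.split()
            seqs[name] = seqs.get(name, "") + chunk
    return gf, gc, seqs


def consensus_pairs(ss_cons):
    """Base pairs (i, j) from bracket characters in an SS_cons string."""
    close_to_open = {c: o for o, c in OPEN_TO_CLOSE.items()}
    stacks = {o: [] for o in OPEN_TO_CLOSE}
    pairs = []
    for i, ch in enumerate(ss_cons):
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(i)
        elif ch in close_to_open:
            pairs.append((stacks[close_to_open[ch]].pop(), i))
    return sorted(pairs)


example = "\n".join([
    "# STOCKHOLM 1.0",
    "#=GF ID   toy_hairpin",
    "seq_a          GCGCAUUAGCGC",
    "seq_b          CGCGAUUACGCG",
    "#=GC SS_cons   <<<<....>>>>",
    "//",
])

gf, gc, seqs = parse_stockholm(example)
print(gf["ID"], list(seqs))          # toy_hairpin ['seq_a', 'seq_b']
print(consensus_pairs(gc["SS_cons"]))  # [(0, 11), (1, 10), (2, 9), (3, 8)]
```

Keeping one stack per bracket type lets the reader accept the mixed <>, (), [] and {} brackets found in consensus structure lines while still matching each opening bracket with its own closing type.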

Searching with profile SCFGs is computationally very expensive: even for a small ncRNA family, a full search would take an unreasonable amount of computer time. To reduce the search time, an initial BLAST search is used to narrow the search space to a manageable size.
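The two-stage strategy can be sketched in Python. As a stand-in for BLAST, the cheap first stage below keeps only candidates that share an exact k-mer with a known family member; real BLAST performs seeded local alignment with statistical scoring, not this simple set test. The expensive second stage is a hypothetical placeholder that counts valid base pairs at fixed stem positions. The scorer, the stem positions and k = 7 are all assumptions for illustration. The example also shows the cost of the shortcut: a candidate with the same structure but compensatory sequence changes never reaches the structural scorer.

```python
# Two-stage search sketch: a cheap sequence prefilter (a k-mer stand-in
# for BLAST, not BLAST itself) followed by an expensive structure-aware
# scorer that is only run on candidates that survive the filter.
VALID_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}
STEM = [(0, 11), (1, 10), (2, 9), (3, 8)]   # hypothetical family structure


def kmers(seq, k):
    """All substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefilter(candidates, family_members, k=7):
    """Keep candidates sharing at least one exact k-mer with a family member."""
    seeds = set().union(*(kmers(m, k) for m in family_members))
    return [c for c in candidates if kmers(c, k) & seeds]


def structure_score(seq):
    """Placeholder for an expensive covariance-model score: valid stem pairs."""
    return sum(seq[i] + seq[j] in VALID_PAIRS for i, j in STEM)


def two_stage_search(candidates, family_members, threshold, k=7):
    """Return (hits at or above threshold, number of candidates scored)."""
    survivors = prefilter(candidates, family_members, k)
    scored = [(c, structure_score(c)) for c in survivors]
    return [(c, s) for c, s in scored if s >= threshold], len(survivors)


family = ["GCGCAUUAGCGC"]
candidates = ["GCGCAUUAGCGC", "CGCGAUUACGCG", "AAAAAAAAAAAA"]
hits, n_scored = two_stage_search(candidates, family, threshold=4)
print(n_scored, hits)                      # 1 [('GCGCAUUAGCGC', 4)]
# The compensatory variant would also score 4, but it was filtered out:
print(structure_score("CGCGAUUACGCG"))     # 4
```

Only one of the three candidates is passed to the expensive scorer, which is where the time saving comes from; the same filter is also what silently discards the structurally valid variant.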

The second MSA is the “full” alignment, created by searching the sequence database with the covariance model. All detected homologues are aligned to the model to give the full alignment.

History

Version 1.0 of Rfam was launched in 2003 with 25 ncRNA families, annotating about 50,000 ncRNA genes. Version 6.1, released in 2005, contained 379 families annotating over 280,000 genes. As of July 2008, the current version, 9.0, contains 603 RNA families.

Problems

#Certain ncRNA families, such as snoRNA and miRNA, do not have conserved primary or secondary structure, and therefore cannot be well represented in the Rfam database. However, Rfam provides a link to miRBase, a separate miRNA database.
#Using a BLAST search to narrow the ncRNA search space to a computationally manageable size reduces sensitivity: true homologues with little primary sequence similarity to known family members may be missed.
#The genomes of higher eukaryotes contain many ncRNA-derived pseudogenes and repeats. Distinguishing these non-functional copies from functional ncRNA is a formidable challenge.
#Introns are not modeled by covariance models.

References

External links

* [http://rfam.sanger.ac.uk/ Rfam Web site at the Sanger Institute]
* [http://infernal.janelia.org/ INFERNAL software package]
* [http://microrna.sanger.ac.uk/ miRBase]


Wikimedia Foundation. 2010.
