Beam RNA Interaction mOtif search tool

Documentation

Info
BRIO (BEAM RNA Interaction mOtifs) is a web server for searching RNA molecules of interest for known protein-binding sequence and structure motifs.
BRIO is intended for users who have a collection of RNA sequences and want to identify sequence or structure motifs involved in the interaction of these molecules with RNA-binding proteins or domains.
The dataset of known motifs contains 2508 sequence and 2296 structure motifs, associated with the binding of 186 single proteins and 69 single protein domains.
The motifs were identified in Homo sapiens in three types of experiments (PAR-CLIP, eCLIP, HITS) and in Mus musculus in two (PAR-CLIP, HITS) (Adinolfi et al., 2019).
The search is based on the BEAM algorithm for motif finding (Pietrosanto et al., 2016).
Input
Users can paste into the input window either the RNA sequence(s) of interest alone, or each sequence together with its secondary structure in dot-bracket notation, in multiFASTA format. Alternatively, the input can be uploaded as a file. Sequences submitted without a secondary structure are folded with the RNAfold program (Lorenz et al., 2011). Each RNA molecule must be at least 3 and fewer than 3000 nucleotides long. To search for structure motifs, input sequences must be at least 50 nucleotides long. At most 100 sequences can be submitted at a time.

Example of an input in multiFASTA format with dot-bracket notation:

>chr1:149783661-149783992(-)
AGCACUUUGCGAGUCUUCAUUUGCAUACGGGCUCUAUAAGUAGCGCAUAACCAGCCCGUUUUGCGGUAGUUCGGAUUACUUCUUUAAGUCUCUUUUCUCUUUUUUCGCGCAAAAAUGCCGGAUCCAGCGAAAUCCGCUCCUGCUCCCAAGAAGGGCUCCAAAAAGGCUGUUACGAAAGUGCAGAAGAAGGACGGCAAGAAGCGCAAGCGCAGCCGCAAGGAGAGCUACUCCGUUUACGUGUACAAGGUGCUGAAGCAGGUCCACCCCGACACCGGCAUCUCGUCCAAGGCCAUGGGCAUCAUGAACUCCUUCGUCAACGACAUCUUCGAGC
.(((((((..((((((............)))))).....(((((((......(((((.(((((.(((((..((((((((((....))))......((.((...((((.(((....)))))))...)).))))))))...)))))..))))).)))))........)).))))).))))))).((.((((((((((.....((....))...))))...((((.....)))).............((((((((..(..((......))..)..)))))))).(((((......))))).........)))))).))..(((.....)))...
>chr1:149784741-149784985(-)
CUUCCAGAGCUCGGCCGUGAUGGCGCUGCAGGAGGCCAGCGAGGCCUACCUGGUGGGGCUGUUCGAAGACACGAACCUGUGCGCCAUCCAUGCCAAGCGCGUGACCAUCAUGCCCAAGGACAUCCAGUUGGCCCGCCGCAUCCGCGGGGAGCGGGCCUAAGGCAUAUUUUUAAGUGGUCGAUCUAAAGGCUCUUUUCAGAGCCACUGCCGUUUUCAUCAAGAGCAGCUGUACCGGCUCUCCAUC
.....(((((.(((..(.((((((((.(((((((.((.((.(((....))).)).)).)).((((......))))))))))))))))))((((.....))))((((((.(((((...((....))....(((((((....(((....))))))))))...)))))........))))))........((((((....))))))((.((.(((((.....))))).)).)).)))))))).....
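
The input rules stated above can be checked locally before a submission. The following Python sketch is illustrative only and is not BRIO's own parser: the function names, the accepted nucleotide alphabet (A, C, G, U, T, N) and the wording of the messages are assumptions, while the length and count limits are the ones given in the Input section.

```python
import re

MIN_LEN = 3          # minimum sequence length (from the Input section)
MAX_LEN = 3000       # sequences must be shorter than this
MIN_STRUCT_LEN = 50  # minimum length for the structure motif search
MAX_RECORDS = 100    # maximum number of sequences per submission

SEQ_RE = re.compile(r"[ACGUTN]+", re.IGNORECASE)  # assumed alphabet
DB_RE = re.compile(r"[().]+")                     # dot-bracket symbols


def parse_records(text):
    """Parse multiFASTA text into (name, sequence, structure or None) tuples."""
    records, name, seq, db = [], None, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(seq).upper(), "".join(db) or None))
            name, seq, db = line[1:].strip(), [], []
        elif name is None:
            raise ValueError("data found before the first '>' header")
        elif SEQ_RE.fullmatch(line):
            seq.append(line)
        elif DB_RE.fullmatch(line):
            db.append(line)
        else:
            raise ValueError(f"unrecognised line in record {name!r}")
    if name is not None:
        records.append((name, "".join(seq).upper(), "".join(db) or None))
    return records


def is_balanced(structure):
    """Check that every '(' has a matching ')' in a dot-bracket string."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def validate(records):
    """Return a list of problems (empty if the input respects the limits)."""
    problems = []
    if len(records) > MAX_RECORDS:
        problems.append(f"{len(records)} sequences submitted, at most {MAX_RECORDS} allowed")
    for name, seq, db in records:
        if not MIN_LEN <= len(seq) < MAX_LEN:
            problems.append(f"{name}: length {len(seq)} outside [{MIN_LEN}, {MAX_LEN})")
        if db is not None and (len(db) != len(seq) or not is_balanced(db)):
            problems.append(f"{name}: structure does not match the sequence")
    return problems


def allows_structure_search(seq):
    """Structure motifs are searched only in sequences of at least 50 nt."""
    return len(seq) >= MIN_STRUCT_LEN


# Toy input: one valid record with a structure, one sequence that is too short.
example = ">toy1\nGGGAAACCC\n(((...)))\n>toy2\nAU\n"
records = parse_records(example)
print(records)
print(validate(records))
```

Records without a dot-bracket line are returned with `None` as structure, mirroring the case in which the server folds the sequence itself.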

Users can compare their RNAs with the whole dataset of motifs, or with human or mouse motifs only. Users can also restrict the search to a subset of the experiment types (PAR-CLIP, eCLIP, HITS) by deleting the unwanted ones. Species or experiment types are added by clicking on the empty lines of the selection box, and removed by clicking on the X at the right of each species/dataset. Because the selected eCLIP datasets come from human experiments only, searching for this type of experiment in mouse will produce no results.
Method
BRIO uses substitution matrices (the classic MBR, Matrix of BEAR-encoded RNA, for structure, and a classic substitution matrix for nucleotides, scoring 3 for a nucleotide match and -2 otherwise). It scans each motif model along every input RNA with a sliding-window ungapped alignment and keeps the best match, whose score is then compared with the original minimum score of the model. BRIO returns the protein binding motifs identified in the input RNA molecules, with related statistics (position of the motif in the set of RNA molecules in which the motif was originally identified, p-value, odds ratio, BEAM score). BRIO applies Fisher’s Test to determine whether a motif is enriched in the input RNA molecules with respect to a set of background RNAs. The background can be specified by the user. By default, all Rfam 14.3 sequences are used (Bateman et al., 2017).
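
The nucleotide part of the scan described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not BRIO's implementation: the motif model is reduced to a single consensus string, the function names and the threshold values are hypothetical, and structure motifs (scored with the MBR on BEAR strings) are not covered. Only the match score of 3 and the mismatch score of -2 come from the text.

```python
MATCH, MISMATCH = 3, -2  # nucleotide scores stated in the Method section


def window_score(window, motif):
    """Ungapped score of one window against the motif, position by position."""
    return sum(MATCH if a == b else MISMATCH for a, b in zip(window, motif))


def best_ungapped_match(rna, motif):
    """Slide the motif along the RNA and return (best_score, start).

    Returns None when the RNA is shorter than the motif or the motif is empty.
    Ties are resolved in favour of the leftmost window.
    """
    width = len(motif)
    if width == 0 or len(rna) < width:
        return None
    best_score, best_start = None, None
    for start in range(len(rna) - width + 1):
        score = window_score(rna[start:start + width], motif)
        if best_score is None or score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def is_hit(rna, motif, min_score):
    """A motif is reported when its best window reaches the model's minimum score."""
    best = best_ungapped_match(rna, motif)
    return best is not None and best[0] >= min_score


# Toy example: the consensus UGCA occurs exactly once, starting at index 2.
rna = "AAUGCAUGG"
print(best_ungapped_match(rna, "UGCA"))
print(is_hit(rna, "UGCU", min_score=8))  # hypothetical threshold
```

Keeping only the best window per RNA means that each input sequence contributes at most one hit per motif, which is what the later enrichment statistics count.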
Data collected
BRIO compares the input RNA molecules with the secondary structure and sequence motifs identified for 186 RNA binding proteins and 69 protein domains by analyzing 228 PAR-CLIP, eCLIP and HITS-CLIP experiments in human and mouse (Blin et al., 2014; Adinolfi et al., 2019).
Motif type         # of motifs
sequence motifs    2296 (2112 hg19 + 184 mm9)
structure motifs   2508 (2319 hg19 + 189 mm9)
Output
The server compares each RNA sequence and secondary structure with the dataset of sequence and secondary structure motifs identified in the CLIP experiments selected by the user.
The output consists of two tables: one reports the enriched motifs found in the entire input set, the other reports the results for each individual input sequence.
An example of the Enriched Motifs results table follows:

In the Enriched Motifs Table, the motifs are sorted by default according to their Fisher’s Test p-value (see below for more details).
The columns report:
  1. the logo of the secondary structure motif in the BEAR alphabet or, for sequence motifs, in the IUPAC nucleic acid notation (logos were generated using WebLogo (Crooks et al., 2004));
  2. the type of motif: structural or sequence;
  3. the type of mapping region, from the GENCODE annotation of the RNA datasets in which the motif was originally found: UTR, CDS, or the entire transcript for motifs involving RBPs known to act in the nucleus on unspliced RNAs;
  4. the coverage, i.e. the number of input sequences in which the motif has a score higher than its associated threshold, divided by the number of query sequences;
  5. the Fisher’s Test p-value. Fisher’s Test is applied to determine whether a motif is enriched in the input RNA molecules with respect to a set of background RNAs;
  6. the type of CLIP experiment analyzed (parCLIP/eCLIP/hitsCLIP);
  7. the protein associated with the RNA secondary structure motif in the CLIP experiment analyzed;
  8. the protein domain associated with the RNA secondary structure motif (this information is not always available);
  9. the cell line used in the eCLIP experiments;
  10. the link to the experiment page (for eCLIP data) or to the corresponding paper (for PAR-CLIP and HITS data);
  11. the organism in which the experiment was performed (human or mouse);
  12. a link to download the information on the motif described in the row.
The Table contents can be sorted in ascending or descending order by clicking on the appropriate arrow in each column header.
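
The coverage and the Fisher’s Test p-value listed among the columns can be reproduced from four counts: the number of input and background sequences, and how many of each contain the motif. The Python sketch below is an illustration under assumptions, not BRIO's code: the function names are hypothetical, the test is taken to be one-sided (enrichment in the input), and the odds ratio is the plain sample odds ratio; the server may differ on these points.

```python
from math import comb


def coverage(hits_in, n_in):
    """Fraction of query sequences in which the motif passes its threshold."""
    return hits_in / n_in


def fisher_enrichment(hits_in, n_in, hits_bg, n_bg):
    """One-sided Fisher's exact test for enrichment in the input set.

    The 2x2 table is:
                    with motif     without motif
        input       hits_in        n_in - hits_in
        background  hits_bg        n_bg - hits_bg
    Returns (p_value, odds_ratio).
    """
    total = n_in + n_bg
    total_hits = hits_in + hits_bg
    # Hypergeometric tail: probability of seeing at least hits_in motif-bearing
    # sequences among n_in sequences drawn from the pooled set.
    upper = min(n_in, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_in - k)
        for k in range(hits_in, upper + 1)
    )
    p_value = tail / comb(total, n_in)

    misses_in = n_in - hits_in
    misses_bg = n_bg - hits_bg
    if misses_in * hits_bg == 0:
        odds_ratio = float("inf")
    else:
        odds_ratio = (hits_in * misses_bg) / (misses_in * hits_bg)
    return p_value, odds_ratio


# Toy counts: the motif is found in 8 of 10 inputs and 2 of 10 background RNAs.
p_value, odds_ratio = fisher_enrichment(8, 10, 2, 10)
print(coverage(8, 10), p_value, odds_ratio)
```

Exact integer arithmetic with `math.comb` keeps the sketch free of external dependencies; for large backgrounds a statistics library would be the more practical choice.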
The sequence-focused table can be accessed by selecting the appropriate tab at the top of the Table:
Each row of this Table represents one of the input sequences provided by the user; the four columns contain the name of the sequence, the number of sequence motifs found in it, the number of structural motifs, and the total length of the input sequence.
For each sequence, clicking on the “+” symbol opens another table with detailed information about all the motifs found in that sequence:

In this table, the columns display the following information for each motif:
  • the start and end positions of the motif in the selected sequence;
  • the representation of the motif in BEAR alphabet for structure motifs or in IUPAC nucleic acid notation for sequence motifs;
  • the type of the motif: sequence or structural;
  • the protein associated with the RNA secondary structure motif in the CLIP experiment analyzed by Adinolfi et al. (2019) and the type of the CLIP experiment.
Encodings
BEAR: the BEAR encoding represents each position of an RNA molecule with a character taken from a set of 83, each describing its structural context and its length (Mattei et al., 2014).
quickBEAR (qBEAR): qBEAR was originally developed as a way to show alignments of RNA structures in a Logo form (Crooks et al., 2004), where each position of an RNA molecule is described by one of 18 characters encoding for structural contexts and length groups (Pietrosanto et al., 2016).
The following table illustrates the mapping between Secondary Structure Elements and the qBEAR and BEAR encodings used throughout the work. Motivations for BEAR can be found in the original work (Mattei et al., 2014).

Browser compatibility
OS        Version        Chrome      Firefox   Microsoft Edge   Safari
MacOS     Catalina       87.0.4280   84.0      n/a              14.0
MacOS     High Sierra    87.0.4280   81.0.1    n/a              13.1.2
Linux     Ubuntu 20.04   87.0.4280   84.0      n/a              n/a
Windows   10             87.0.4280   84.0      87.0.664.66      n/a
Citation
Please cite us if you make use of BRIO:

Guarracino A, Pepe G, Ballesio F, Adinolfi M, Pietrosanto M, Sangiovanni E, Vitale I, Ausiello G, Helmer-Citterich M. BRIO: a web server for RNA sequence and structure motif scan. Nucleic Acids Res.
References
  • Adinolfi,M., Pietrosanto,M., Parca,L., Ausiello,G., Ferrè,F. and Helmer-Citterich,M. (2019) Discovering sequence and structure landscapes in RNA interaction motifs. Nucleic Acids Res., 47, 4958–4969.
  • Bateman,A., Finn,R.D. and Petrov,A.I. (2017) Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res., 46, 335–342.
  • Blin, K., Dieterich, C., Wurmus, R., Rajewsky, N., Landthaler, M., & Akalin, A. (2014). DoRiNA 2.0—upgrading the doRiNA database of RNA interactions in post-transcriptional regulation. Nucleic Acids Research.
  • Crooks,G.E., et al. (2004) WebLogo: a sequence logo generator. Genome Res., 14, 1188–1190.
  • Lorenz,R., Bernhart,S.H., Hoener zu Siederdissen,C., Tafer,H., Flamm,C., Stadler,P.F. and Hofacker,I.L. (2011) ViennaRNA Package 2.0. Algorithms Mol. Biol., 6, 26.
  • Mattei,E., Ausiello,G., Ferrè,F. and Helmer-Citterich,M. (2014) A novel approach to represent and compare RNA secondary structures. Nucleic Acids Res., 42, 6146–6157.
  • Pietrosanto,M., Mattei,E., Helmer-Citterich,M. and Ferrè,F. (2016) A novel method for the identification of conserved structural patterns in RNA: From small scale to high-throughput applications. Nucleic Acids Res., 44, 8600–8609.