ESEfinder



Luca Cartegni, Jinhua Wang, Zhengwei Zhu, Michael Q. Zhang and Adrian R. Krainer





NOT ALL HIGH-SCORE MOTIFS ARE ESEs!

- The presence of a high-score motif in a sequence does NOT necessarily identify that sequence as an exonic splicing enhancer in its native context. For example, a nearby silencer element may prevent the SR protein from binding.

- The default threshold values are based on statistical analysis and empirical data, but they are still somewhat arbitrary. Any refinements or updates will be incorporated as they become available.

- There is at best only a rough correlation between the numerical scores and the ESE activity of high-score motifs. For example, the motif with the maximum score is not necessarily the most effective ESE.

- The score values of ESEs corresponding to different SR proteins cannot be compared to each other.

- The program currently searches for the ESE motifs corresponding to four SR proteins. There are several other SR proteins for which the ESE motifs have not yet been identified, at least by the strategy used here.

- The ESE motifs were identified using human SR proteins. Their relevance to other species depends on the extent of conservation of each SR protein.


SEQUENCE FORMAT

Important notice:

To improve performance, the maximum length of a single search string is 5000 characters (one continuous sequence or the total of multiple sequences), regardless of the format used.

The query sequences can be directly pasted into the input box, or can be uploaded from a text file.

Multiple sequences can be analyzed simultaneously, provided that a FASTA-format descriptive line (beginning with ">") precedes them.

FASTA format:

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column.

An example sequence in FASTA format is:

>your query
TCGTAGACCAGTCAAGTGCAACTTCCAGTGATCAGATCACTGCTTATTGCCTATTGTTTT
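To prepare several sequences locally before pasting them in, the FASTA layout described above can be split into records with a few lines of Python. This is only a minimal sketch of the format as described here, not ESEfinder's own parser; the function name `parse_fasta` is hypothetical, and the choice to ignore any text before the first ">" line is an assumption.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (description, sequence) pairs.

    Illustrative helper only; it is not part of ESEfinder.
    """
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A ">" in the first column starts a new record.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is not None and line.strip():
            # Sequence data may span several lines; join them.
            chunks.append(line.strip())
        # Assumption: text before the first ">" line is ignored.
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


example = ">your query\nTCGTAGACCAGTCAAGTGCAACTTCCAGTG\nATCAGATCACTGCTTATTGCCTATTGTTTT\n"
for name, seq in parse_fasta(example):
    print(name, len(seq))
```

Each record's description and sequence are kept together, so the per-sequence and total lengths can be checked against the 5000-character limit before submission.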

Even though ESEfinder is an RNA analysis tool, only standard DNA notation is accepted (A, C, G, and T, not U). The program will ignore any character other than A, C, G, and T, including spaces, paragraph breaks and so on.

In other words, the input can be sequence interspersed with numbers and/or spaces, such as the sequence portion of a GenBank/GenPept flatfile report:

1   gaattcactt tatttcagca ccagtcctct ccccttttcc ttcccaggat cttcactcaa
61  cttgaagatt tctgctttca taggagtttg tagtctgcac tgtaaccacc caatacatcg
121 acaggttaaa aaaagagagc tcttgctcag aaagagctag aaagactgta gagcctaagg
181 ggtttgtttt tacctccctc ctggaagcca atagcccttt tttttttttt cctgggaggt

Both upper and lower cases are accepted, but the output lines will be in upper case.
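The filtering described above (only A, C, G and T are kept, in either case, and the output is upper case) can be reproduced locally to preview what will be analyzed. The following is a rough sketch under that reading of the text, not the server's actual code; the function name `clean_sequence` is hypothetical. Note that U is dropped rather than converted to T, which is an assumption drawn from the statement that every character other than A, C, G and T is ignored.

```python
import re


def clean_sequence(raw):
    """Return only the A, C, G, T characters of raw, in upper case.

    Illustrative helper only; it is not part of ESEfinder.
    """
    # Upper-case first so that lower-case input is accepted,
    # then remove digits, spaces, line breaks and any other character.
    return re.sub(r"[^ACGT]", "", raw.upper())


genbank_fragment = """
1   gaattcactt tatttcagca
61  cttgaagatt
"""
print(clean_sequence(genbank_fragment))
```

Because U is removed in this sketch, an RNA sequence should be converted to DNA notation (U to T) before it is cleaned or submitted.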



Web site created and maintained by Luca Cartegni (cartegni@cshl.edu). CGI scripts implemented by Jinhua Wang (wang@cshl.edu).

 
