Arabidopsis Unannotated Secreted Peptide Database

Created by Kevin Lease and John Walker, University of Missouri

If you have questions, email leasek@missouri.edu

This is a searchable database of bioinformatic predictions of putative intergenic unannotated secreted peptides from Arabidopsis thaliana.

Go to the search form


The protein list was created as follows:

1.) Search the Arabidopsis thaliana v5 chromosomal DNA sequences (http://www.arabidopsis.org; The Arabidopsis Genome Initiative, 2000, Nature, 408:796-815) for open reading frames (ORFs) encoding proteins of 25 to 250 amino acids.

2.) Where ORFs overlap in the same reading frame, keep only the longest ORF.

3.) Filter the ORFs for the presence of an amino-terminal cleavable signal peptide predicted by SignalP 3.0 NN (Bendtsen et al. 2004, J. Mol. Biol. 340: 783-795).

4.) Filter the proproteins (signal peptide removed) for the absence of a transmembrane domain predicted by TMHMM 2.0 (Krogh et al. 2001, J. Mol. Biol. 305: 567-580).

5.) Filter the proproteins for the absence of an endoplasmic reticulum lumenal retention sequence (KDEL or HDEL immediately before the stop codon).

6.) Filter the ORFs for those that do not overlap genes annotated in the TAIR6.0 release of the Arabidopsis thaliana genome (http://www.arabidopsis.org/).
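The ORF-finding and filtering steps above can be sketched in Python. This is a hypothetical illustration, not the authors' original code. It assumes that an ORF runs from an ATG to the first in-frame stop codon under the standard genetic code, that length is counted in amino acids excluding the stop, and that annotated genes are given as half-open (start, end) intervals on the same chromosome. In-frame ORFs that overlap necessarily end at the same stop codon, so keeping the longest one (step 2) amounts to taking the most upstream ATG whose ORF passes the step 1 length filter. Steps 3 and 4 depend on the external SignalP and TMHMM programs and are not reproduced; the names find_orfs, lacks_er_retention and is_intergenic are illustrative only.

```python
# Hypothetical sketch of steps 1, 2, 5 and 6; not the database's original pipeline.
from dataclasses import dataclass

# Standard genetic code (NCBI table 1), codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(nt):
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


@dataclass
class Orf:
    strand: str   # "top" or "bottom"
    start: int    # 0-based, inclusive, in top-strand coordinates
    end: int      # 0-based, exclusive; includes the stop codon
    protein: str  # translation, stop codon excluded


def find_orfs(seq, min_aa=25, max_aa=250):
    """Steps 1-2: ATG-to-stop ORFs on both strands, longest per shared stop."""
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("top", seq), ("bottom", revcomp(seq))):
        for frame in range(3):
            starts = []  # in-frame ATG positions since the previous stop
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if codon == "ATG":
                    starts.append(i)
                elif codon in STOPS:
                    # Upstream ATGs come first, so the first one passing the
                    # length filter gives the longest in-frame ORF.
                    for st in starts:
                        if min_aa <= (i - st) // 3 <= max_aa:
                            if strand == "top":
                                lo, hi = st, i + 3
                            else:  # map reverse-strand positions back to top strand
                                lo, hi = n - (i + 3), n - st
                            orfs.append(Orf(strand, lo, hi, translate(s[st:i])))
                            break
                    starts = []
    return orfs


def lacks_er_retention(protein):
    """Step 5: reject proteins ending in KDEL or HDEL."""
    return not protein.endswith(("KDEL", "HDEL"))


def is_intergenic(orf, genes):
    """Step 6: genes is a list of half-open (start, end) intervals."""
    return all(orf.end <= g_start or orf.start >= g_end for g_start, g_end in genes)
```

For example, find_orfs("CC" + "ATG" + "GCT" * 30 + "TAA" + "CC") returns a single top-strand ORF spanning positions 2 to 98 whose protein is M followed by 30 alanines.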

Additional information for ORFs provided:

1.) SALK whole-genome tiling array analysis (Yamada et al. 2003).

Thank you to Dong Xu and Trupti Joshi for help accessing the data and providing tips on the analysis.

We identified all unique probes across the 12 chips.

We calculated the median intensity of each chip in the root, leaf, suspension cell, and flower experiments.

We kept unique probes whose intensity was at least twice the median value of their chip.

For each ORF, we report the raw intensities of the overlapping probes that meet these criteria. The sequence shown is the reverse complement of the actual probe sequence.

2.) BLASTP (Altschul et al., 1990) searches of the preproproteins against the plant protein database, showing the top ten hits with their scores and E-values.

3.) TBLASTN searches of the TIGR chromosome assembly v4.0 of Oryza sativa cv. japonica, using the preproproteins as queries, showing the best hit.

4.) Single-linkage clustering of all the preproproteins with blastclust. The first number is the cluster number, and the number in parentheses is the number of members in that cluster.

5.) The 20-nucleotide-signature Arabidopsis Massively Parallel Signature Sequencing (MPSS) dataset was downloaded from the Meyers lab MPSS database (http://mpss.udel.edu/at/public_data/20bp/20bp_summary.txt). Class four (intergenic) signatures were extracted from the dataset, and their positions were compared with those of the database ORFs, identifying 177 supporting signatures.
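The tiling-array filter, the ORF overlap test (usable for both probes and MPSS signatures) and single-linkage clustering can likewise be sketched in Python. This is hypothetical, not the code behind the database: it assumes each chip's median is taken over all of its probes, that probe, signature and ORF positions are half-open intervals on the same chromosome with strand ignored, and that the pairs of preproproteins linked by BLAST similarity are already known. blastclust applies its own score and coverage thresholds, which are not reproduced here, and numbering clusters largest-first is an assumption about the label format. The function names are illustrative.

```python
# Hypothetical sketch of the tiling-array, MPSS overlap and clustering annotations.
from statistics import median


def probes_above_cutoff(chip_intensities, unique_probe_ids, fold=2.0):
    """Keep unique probes whose raw intensity is >= fold x the chip median.

    chip_intensities: {chip_name: {probe_id: raw_intensity}}.
    Assumption: the median is taken over every probe on the chip.
    """
    kept = {}
    for chip, values in chip_intensities.items():
        cutoff = fold * median(values.values())
        kept[chip] = {
            probe: value
            for probe, value in values.items()
            if probe in unique_probe_ids and value >= cutoff
        }
    return kept


def overlapping(features, orf_start, orf_end):
    """IDs of features (probes or MPSS signatures) overlapping an ORF.

    features: {feature_id: (start, end)}, half-open, same chromosome as the ORF.
    Strand is ignored here, which is an assumption.
    """
    return [fid for fid, (start, end) in features.items()
            if start < orf_end and orf_start < end]


def single_linkage(ids, linked_pairs):
    """Single-linkage clusters via union-find; any chain of links joins a cluster.

    linked_pairs: (id, id) pairs judged similar. Returns labels such as
    "1 (3)" (cluster number, member count); largest-first numbering is assumed.
    """
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in linked_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    clusters = sorted(groups.values(), key=len, reverse=True)  # stable for ties
    return {member: f"{n} ({len(c)})"
            for n, c in enumerate(clusters, start=1) for member in c}
```

Single linkage places two proteins in the same cluster whenever a chain of pairwise links connects them, even if they are not directly similar; union-find merges such chains in near-linear time.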

Helpful hints for searching: Most fields are combined with AND logic. For example, if you enter MAL in the preproprotein field and select "top" for strand, only records that match both criteria are retrieved. A field left blank is not used as a filter. In the blast result columns of the table, a blank cell means that no significant hit was found. To display more data in the table, check additional boxes at the bottom of the search form. If you have any questions, please send an email to leasek@missouri.edu.
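The AND behaviour described above can be illustrated with a small Python sketch. It is not the site's actual search code: the record layout, the field names preproprotein and strand, and the choice of exact matching for strand but case-insensitive substring matching for text fields are all assumptions made for illustration.

```python
# Hypothetical sketch of AND-combined search filters; not the site's real code.
def matches(record, criteria):
    """True if the record satisfies every non-blank criterion (AND logic).

    Assumed rules: 'strand' must match exactly; other fields match on a
    case-insensitive substring.
    """
    for field, query in criteria.items():
        if not query:
            continue  # a blank field is not used as a filter
        value = record.get(field, "")
        if field == "strand":
            if value != query:
                return False
        elif query.upper() not in value.upper():
            return False
    return True


def search(records, criteria):
    return [record for record in records if matches(record, criteria)]
```

With records whose preproproteins are MALKK (top), MALRR (bottom) and MKKL (top), the criteria {"preproprotein": "MAL", "strand": "top"} return only the MALKK record, while leaving preproprotein blank returns both top-strand records.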


Acknowledgements: Special thanks to Alan Marshall and Josh Hartley for their help with the server. Thanks to Dong Xu and Trupti Joshi for access to and help with the tiling array data. Thanks to Bill Spollen and Gordon Springer for help submitting blast jobs on the research cluster. Thanks to the Walker lab for suggestions on the web interface. Thanks to the anonymous reviewers for their helpful comments and suggestions. A special thanks to Jim Burnette for introducing me to Perl.