This is a searchable database of bioinformatic predictions of putative intergenic unannotated secreted peptides from Arabidopsis thaliana.

The protein list was created as follows:

1.) Search the Arabidopsis thaliana v5 chromosomal DNA sequences (The Arabidopsis Genome Initiative, 2000, Nature, 408:796-815) for open reading frames (ORFs) encoding proteins between 25 and 250 amino acids.

2.) Where ORFs overlap in the same reading frame, keep only the longest ORF.

3.) Filter the ORFs for the presence of an amino terminal cleavable signal peptide by SignalP3.0 NN (Bendtsen et al. 2004, J. Mol. Biol. 340: 783-95).

4.) Filter the proproteins (signal peptide removed) for the absence of a transmembrane domain, as predicted by TMHMM2.0 (Krogh et al. 2001, J. Mol. Biol. 305: 567-580).

5.) Filter the proproteins for the absence of an endoplasmic reticulum lumenal retention sequence (KDEL or HDEL before stop).

6.) Filter the ORFs for those that do not overlap annotated genes in the TAIR6.0 annotated release of the Arabidopsis thaliana genome.
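
As an illustration only, steps 1, 2, 5 and 6 above can be sketched in a few lines of Python. This is not the code used to build the database. It assumes that an ORF runs from an ATG to the next in-frame stop codon (the description above does not say how ORF starts were defined), that the standard genetic code applies, and that coordinates are 0-based and half-open. All function names are hypothetical. The SignalP and TMHMM predictions of steps 3 and 4 come from external programs and are not reproduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_aa=25, max_aa=250):
    """Yield (strand, start, end, protein) for ORFs in all six frames.

    Only the first ATG after the previous stop is used as a start, so of
    several in-frame ORFs sharing a stop codon only the longest is reported.
    Coordinates refer to the forward strand and include the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first ATG seen since the last stop in this frame
            for i in range(frame, n - 2, 3):
                aa = CODON_TABLE.get(s[i:i + 3], "X")
                if aa == "M" and start is None:
                    start = i
                elif aa == "*":
                    if start is not None:
                        protein = "".join(
                            CODON_TABLE.get(s[j:j + 3], "X")
                            for j in range(start, i, 3)
                        )
                        if min_aa <= len(protein) <= max_aa:
                            end = i + 3
                            if strand == "+":
                                yield strand, start, end, protein
                            else:
                                yield strand, n - end, n - start, protein
                    start = None


def has_er_retention(protein):
    """True if the protein ends in KDEL or HDEL (step 5)."""
    return protein.endswith(("KDEL", "HDEL"))


def overlaps_any(start, end, genes):
    """True if [start, end) overlaps any (gene_start, gene_end) interval (step 6)."""
    return any(start < g_end and g_start < end for g_start, g_end in genes)


def candidate_orfs(seq, annotated_genes):
    """Apply the ORF search plus the ER-retention and annotation filters."""
    for strand, start, end, protein in find_orfs(seq):
        if has_er_retention(protein):
            continue
        if overlaps_any(start, end, annotated_genes):
            continue
        yield strand, start, end, protein
```

In a full pipeline the signal peptide and transmembrane predictions would sit between the ORF search and the last two filters. The ER-retention test can be run on the full-length protein because removing the signal peptide does not change the C-terminus.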

Additional information for ORFs provided:

1.) SALK whole-genome tiling array (Yamada et al.) analysis.

Thank you to Dong Xu and Trupti Joshi for help accessing the data and providing tips on the analysis.

We identified all unique probes among the 12 chips.

We identified the median value for each chip for the root, leaf, suspension cell, and flower experiments.

We retained the unique probes whose intensity was at least twice the median value for their chip.

We report the probe data (raw intensity) meeting the aforementioned criteria that overlap a given ORF. The sequence shown is the reverse complement of the actual probe sequence.
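
The probe selection described above can be sketched as follows. This is a minimal illustration, not the original analysis code. It reads the twice-the-median threshold as the criterion for keeping a probe, which is an interpretation of the text, and it assumes each probe is a dictionary with hypothetical keys `start`, `end`, `seq` and `intensity`, with 0-based half-open coordinates.

```python
from statistics import median


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def probes_above_threshold(chip, fold=2.0):
    """Return the probes on one chip whose intensity is >= fold * chip median."""
    cutoff = fold * median(p["intensity"] for p in chip)
    return [p for p in chip if p["intensity"] >= cutoff]


def probes_for_orf(chip, orf_start, orf_end, fold=2.0):
    """Return qualifying probes that overlap the ORF interval [orf_start, orf_end).

    Each returned probe carries an extra 'shown_seq' field holding the
    reverse complement of the probe sequence, as displayed in the report.
    """
    hits = []
    for p in probes_above_threshold(chip, fold):
        if p["start"] < orf_end and orf_start < p["end"]:
            hits.append({**p, "shown_seq": reverse_complement(p["seq"])})
    return hits
```

If the threshold was instead meant to discard bright probes, only the comparison in `probes_above_threshold` would need to be reversed.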

2.) Blastp (Altschul et al.) reports of the preproproteins against the plant protein database, showing the top ten hits with their scores and E-values.

3.) Tblastn of the preproproteins against the Oryza sativa cv. japonica chromosomes (TIGR assembly v4.0), showing the best hit.

4.) Single-linkage clustering with blastclust of all of the preproproteins. The first number is the cluster number and the number in parentheses is the number of members in that cluster.
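
To make the idea of single-linkage clustering concrete: two proteins end up in the same cluster whenever a chain of pairwise similarities connects them, even if they are not directly similar to each other. The sketch below shows this with a small union-find structure. It is not blastclust itself. The input pairs stand in for whatever pairwise hits pass the similarity threshold, and the numbering of clusters from 1, largest first, is an assumption made for the example.

```python
def single_linkage(ids, linked_pairs):
    """Group ids into clusters; any linked pair joins two clusters."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in linked_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first; ties broken by the first member.
    return sorted(clusters.values(), key=lambda m: (-len(m), m[0]))


def cluster_labels(clusters):
    """Map each id to a label of the form 'cluster number (cluster size)'."""
    labels = {}
    for number, members in enumerate(clusters, start=1):
        for member in members:
            labels[member] = f"{number} ({len(members)})"
    return labels
```

For example, with proteins a to e and the links a-b and b-c, proteins a, b and c form one cluster of three even though a and c are not linked directly, while d and e each form a cluster of one.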

