Example 1.
1. Go to Search at the top of the web page.
2. Paste NP_757069.1 in the search box and click the search button.
3. The resulting page shows the results of domain searches for the
annotated protein (defined as frame +1) and for the putative protein
derived from the alternate ORF (Figure 1). In this example, the
putative protein is derived from the translation of alternate
frame -1.
4. Exploring the result buttons shows that the annotated protein
NP_757069.1 has no known domains or motifs, whereas the putative
protein from frame -1 has a hit to COG1027 (Figure 2). This is an
example of potential mis-annotation in which frame -1 is the more
likely gene candidate.
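To make the frame labels used above concrete, the following minimal Python sketch translates a gene's nucleotide sequence in all six frames. It is an illustration only, not AlterORF's code. It assumes the common convention that +1 is the annotated reading frame and that -1 is the reverse complement read from its first base; AlterORF may number the negative frames differently. The function names and the short test sequence are hypothetical.

```python
from itertools import product

# Standard codon table (NCBI translation table 1), built in TCAG order.
# Bacterial table 11 assigns the same amino acids to every codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(gene_dna: str) -> dict:
    """Translate a gene in frames +1..+3 and -1..-3.

    Assumption: frame -N is the reverse complement read from offset N-1.
    """
    rc = reverse_complement(gene_dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(gene_dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Hypothetical toy sequence, not taken from NP_757069.1.
for label, protein in six_frames("ATGGCCTAA").items():
    print(label, protein)
```

In this sketch a stop codon is shown as `*`; an alternate ORF would be the stretch of a non-annotated frame that runs without a stop.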
Example 2.
1. A BlastP search using NP_757069.1 (from Escherichia coli CFT073) against the microbial database at NCBI reveals one additional significant hit, NP_737680.1 in Corynebacterium efficiens YS-314 (Figure 3).
2. When the alternate ORFs of NP_737680.1 are evaluated using
AlterORF, a putative protein from frame -1 has domain hits. This
suggests that one gene was mis-annotated and that the error was
later propagated to a second genome. This example shows how using
AlterORF during genome annotation could help reduce annotation
errors that would otherwise propagate to other genomes.
Search by organism.
1. Go to Organism Search at the top of the page.
2. Choose “Mesorhizobium loti MAFF303099, complete genome” in the
organism list.
A table listing all Mesorhizobium loti MAFF303099 protein-coding genes with at least one alternate ORF will be shown (Figure 4).
1. Click on a protein ID to go to the page described in the section
“Search by protein ID” of this tutorial. Choose NP_107992.1.
2. Look at the Pfam table for each alternate ORF and for the +1 gene.
3. Both the +1 and -1 genes have Pfam hits (pfam02530 and pfam05244,
respectively). This is an example of poor annotation in the sense
that a protein with a good Pfam hit was not annotated.
Interestingly, standard annotation pipelines reject this kind of
case because they are not designed to find overlapping genes.
Conserved overlapping gene pairs have been described in viruses,
Bacteria, Archaea and Eukarya, and one of the aims of the AlterORF
database is to provide insight into the biology of these special
gene pairs.
Search by sequence.
1. If you do not know the protein ID of your sequence of interest, or
it is not in our data set, you can start a sequence search by pasting
your protein sequence here.
2. From the summary table you can choose a protein ID (possibly your
own) and go to the page described in the section “Search by protein
ID” of this tutorial.
What information can I extract from alternate ORF families?
The cross-genera conservation of some alternate ORFs suggests that
they might represent new protein families or domains. Hierarchical
clustering was used to build sequence families with the hcluster_sg
software, developed as part of the TreeFam project, because it is
fast and avoids loading the whole matrix into memory (our matrix is
a square matrix of ~3 million elements). BLAST e-values were
normalized to a 0 to 100 scale (with 100 corresponding to
e-value = 0) using the formula (-log10(e-value))/2.
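The normalization described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to build the database: the text gives only the formula and the meaning of 100, so the clamping to the 0 to 100 range and the rounding to an integer are assumptions, and the function name is hypothetical.

```python
import math


def normalize_evalue(evalue: float) -> int:
    """Map a BLAST e-value onto a 0-100 similarity score.

    Uses (-log10(e-value)) / 2 as given in the text. An e-value of 0
    is mapped to 100, as the text states.
    """
    if evalue < 0:
        raise ValueError("e-values cannot be negative")
    if evalue == 0:
        return 100
    score = -math.log10(evalue) / 2
    # Assumption: clamp to [0, 100] and round to the nearest integer.
    return max(0, min(100, round(score)))


# Example: an e-value of 1e-50 gives -log10(1e-50) / 2 = 25.
for e in (0.0, 1e-50, 1e-10, 1.0):
    print(e, normalize_evalue(e))
```

With this formula, any e-value of 1e-200 or smaller reaches the maximum score of 100, and e-values of 1 or larger fall to 0 after clamping.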
Some proteins link to the corresponding alternate ORF protein
family. We built these families because they are useful for finding
mis-annotations and for searching for new protein families encoded
in alternate ORFs. In this AlterORF release the protein families
have not been curated. In future releases we will curate them,
probably collapsing many families, and provide a method to evaluate
the significance of each family (taking into account organism G+C
content, amino acid usage and number of members).