Example 1.

 

 

1.      Go to Search at the top of the web page.

2.      Paste NP_757069.1 in the search box and click on the search button.

Protein ID Result

3.      The resulting page shows the results of domain searches for the annotated protein (defined as frame +1) and for the putative protein derived from the alternate ORF (Figure 1). In this example, the putative protein is derived from the translation of alternate frame -1.

4.      Exploring the results buttons shows that the annotated protein NP_757069.1 has no known domains or motifs, but the putative protein from frame -1 has a hit with COG1027 (Figure 2). This is an example of potential mis-annotation where frame -1 is the more likely gene candidate.

Protein ID Search 
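
To make the frame labels used in Example 1 concrete, the following minimal Python sketch translates a nucleotide sequence in frame +1 and in one antisense frame. It is an illustration only, not AlterORF's own code: the function names and the toy sequence are hypothetical, and it assumes that frame -1 means the reverse complement read from its first base, which may differ from the exact convention AlterORF uses.

```python
from itertools import product

# Standard codon assignments, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate a DNA string codon by codon, starting at its first base.

    Stop codons are written as '*'; codons with unknown bases become 'X'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


# Hypothetical toy sequence standing in for an annotated gene.
gene = "ATGAAATAA"
frame_plus_1 = translate(gene)                       # annotated frame
frame_minus_1 = translate(reverse_complement(gene))  # assumed frame -1
```

Each translated frame could then be submitted to a domain search, which is the comparison the result page shows for the annotated protein and its alternate ORFs.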

 

Example 2.

 

1. A BlastP search using NP_757069.1 (from Escherichia coli CFT073) against the microbial database at NCBI reveals one additional significant hit, NP_737680.1 in Corynebacterium efficiens YS-314 (Fig. 3).

BlastP Result 

2. When the alternate ORFs of NP_737680.1 are evaluated using AlterORF, a putative protein from frame -1 has domain hits. This suggests that one gene was mis-annotated and that the error was later extended to a second genome. This example shows how using AlterORF as a tool during genome annotation could help reduce annotation errors that would otherwise propagate through other genomes.

Search by organism.

1.    Go to Organism Search at the top of the page.

2.    Choose “Mesorhizobium loti MAFF303099, complete genome” in the organism list.

A table listing all Mesorhizobium loti MAFF303099 protein coding genes with at least one ORF will be shown. (Figure 4) 

 

Organism Search 

1.    Click on a protein ID to go to the page described in the “Search by protein ID” section of this tutorial. For this example, choose NP_107992.1.

2.    Look at the PFAM table for each alternate ORF and the +1 gene.

3.    Both the +1 and -1 genes have hits in Pfam (pfam02530 and pfam05244, respectively). This is an example of bad annotation in the sense that a protein with a good Pfam hit was not annotated. Interestingly, this is the kind of case that standard annotation pipelines reject, because they are not designed to find overlapping genes. Conserved overlapping gene pairs have been described in viruses, Bacteria, Archaea and Eukarya, and one of the aims of the AlterORF database is to provide insight into the biology of these special gene pairs.

Search by sequence.

1.      If you do not know the protein ID of your sequence of interest, or it is not in our data set, you can start a sequence search by pasting your protein sequence here.

2.      From the summary table you can choose a protein ID (possibly your own) and go to the page described under “Search by protein ID” in this tutorial.

What information can I extract from alternate ORFs families?

 

The cross-genera conservation of some alternate ORFs suggests that they might represent new protein families or domains. Hierarchical clustering was used to build sequence families with the hcluster_sg software, developed as part of the TreeFam project, because it is fast and avoids loading the whole matrix into memory (our matrix is a square matrix of ~3 million elements). Blast e-values were normalized to a scale from 0 to 100 (with 100 meaning e-value = 0) using the formula (-log10(e-value))/2.
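
The normalization described above can be sketched in Python as follows. This is a minimal illustration, not the code used to build the database: the function name is hypothetical, and clamping the result to the 0 to 100 range (so that an e-value of 0 maps to 100 and e-values above 1 map to 0) is an assumption made to match the stated scale.

```python
import math


def normalize_evalue(evalue):
    """Map a Blast e-value to a 0-100 score using -log10(e-value) / 2.

    Assumptions (not stated in the tutorial): scores are clamped to
    [0, 100], and an e-value of exactly 0 is assigned the maximum, 100.
    """
    if evalue < 0:
        raise ValueError("e-value must be non-negative")
    if evalue == 0:
        return 100.0  # log10(0) is undefined; treat as the best score
    score = -math.log10(evalue) / 2
    return max(0.0, min(100.0, score))
```

Under this formula an e-value of 1e-50 gives a score of about 25, and any e-value of 1e-200 or smaller reaches the maximum of 100.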

Some proteins will have a link to the corresponding alternate ORF protein family. We build these families because they are useful for finding mis-annotations and for searching for new protein families encoded in alternate ORFs. In this AlterORF release we have not curated the protein families, but in future releases we will, probably collapsing many families, and we will provide a method to evaluate the significance of each family (taking into account organism G+C content, amino acid usage and number of members).