SRI International's Bioinformatics Research Group (BRG) previously analyzed the contents of the ENZYME database and determined that no gene or protein sequence is known for more than 1400 enzyme activities, corresponding to 38% of Enzyme Commission (E.C.) numbers.
This lack of sequence data for such a large fraction of enzyme activities hinders research and biotechnology in a number of areas ranging from genome annotation to metabolic engineering and pathway prediction. Fortunately, the growing availability of large numbers of completely sequenced genomes enables the eventual identification of the genes encoding these enzymes using data available in the literature, combined with computational and experimental analyses.
Consequently, the BRG performed a literature research project with three goals: to confirm the lack of sequences for these "orphan" enzymatic activities, to find sequences that do exist in the literature, and to determine what types of information the literature typically contains for these orphan activities. The results serve to assess the difficulty of, and the best strategies for, a proposed Enzyme Genomics Initiative (Karp, 2004) that would experimentally elucidate sequences for orphan activities.