BioSPICE Warehouse Loader Handoff Readme

Directory Structure

Java Code:
- biospice/warehouse/util/src/java
- SLO Java code is written in an interface/implementation format. Each business object is made up of an interface file and a class file. An object's interface declares the public methods of the class. An object's class contains the implementation of those public methods as well as any private methods. The class has the same name as the interface, with the suffix "Impl" added. This split between interface and class makes future changes easier: instead of rewriting an object's class file, the class can be extended and its methods overridden. A third file, the factory class, controls the creation of new objects. Each package of classes should have its own factory class to create new objects.
- Example: A new Chemical object is obtained from the SchemaFactory class.
  Chemical tempChem = SchemaFactory.getFactory().newChemical( "zinc" );
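
The interface / Impl / factory arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the loader's actual code: the single getName() method and the singleton-style getFactory() are assumptions, and the real Chemical and SchemaFactory classes almost certainly carry more members.

```java
// Hypothetical, simplified versions of the interface / Impl / factory trio.
interface Chemical {
    String getName();
}

class ChemicalImpl implements Chemical {
    private final String name;

    ChemicalImpl(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }
}

class SchemaFactory {
    private static final SchemaFactory INSTANCE = new SchemaFactory();

    // Callers always go through the factory, so the concrete class
    // can be swapped here without touching calling code.
    static SchemaFactory getFactory() {
        return INSTANCE;
    }

    Chemical newChemical(String name) {
        return new ChemicalImpl(name);
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        Chemical tempChem = SchemaFactory.getFactory().newChemical("zinc");
        System.out.println(tempChem.getName());
    }
}
```

Because calling code only sees the Chemical interface, a subclass of ChemicalImpl could be returned from newChemical without changes elsewhere.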
SQL:
- biospice/warehouse/loader/src/sql
- Warehouse-Schema.sql
Perl:
- biospice/warehouse/util/src/perl
- deleteDatasetWID.pl
  - One argument, the WID of the DataSet to be deleted.
- dataSetStats.pl
  - No arguments.
- getReaction.pl
  - One argument, the WID of the protein involved in a reaction.
Grammar:
- biospice/warehouse/loader/src/grammars
  1. Enzyme
  2. Swiss-Prot
- Lexer
  - A program that reads in a text file and converts words into tokens based on a grammar file.
- Parser
  - A program that compares the order of tokens to the order specified by the grammar rules.

Ant / Compiling

"ant" : Builds all grammars
"ant help" : Displays this help message
"ant enzyme" : Builds a warehouse database loader which will load data files that match the grammar defined in src/grammars/enzyme.g
"ant swissprot" : Builds a warehouse database loader which will load data files that match the grammar defined in src/grammars/swissprot.g
"ant build" : This target may be used to build a warehouse database loader for a given grammar. The ANTLR grammar is defined in the "GrammarFile" property.
- For example: ant -DGrammarFile="c:/myGrammars/testGrammar.g" build
- Note that the file supplied must be an absolute path.
"ant doc" : Generates javadoc for the java source files. The javadoc is written to the docs/api directory.
"ant clean" : Cleans the distribution, removing all class files and jar files. This target also deletes the distribution directory.
"ant cleandoc" : Cleans the javadoc, if any
"ant cleanall" : Executes both the "clean" and "cleandoc" tasks.

Logging

Loader uses log4j, a logging system from the Apache Software Foundation. The web site is http://jakarta.apache.org/log4j/docs/index.html.
The jars necessary to create the log files are included in the jar folder of CVS. Logging is done by creating a Category variable:
- private static Category log = Category.getInstance( class name );
Information that is logged is shown in the console window and written to a text file.

Running

GUI Parameters
- java DBLoader.java -l logfile -p propertiesFile -t grammarType
Command Line Parameters
- java DSLoader.java -l logfile -p propertiesFile -f dataFile -t grammarType [-d] [-s sqlFile] [-o]

General Outline of Loader Process.

The command line arguments and the database properties file are read in. The program connects to the database. A ParserControl object is created to run the parser. Lexer and parser objects are created and initialized. An SQLOutput object is created to output the parsed information to a specific destination. A DataSet object is created and passed the SQLOutput object as a parameter; the DataSet object is also passed into the parser object for reference. As the parser reads the file, information is loaded into the DataSet object. Once the end of a data set flat file entry is reached, the DataSet object is told to save itself, which it does by passing itself into the SQLOutput object. The SQLOutput object distributes the information to the specified destination, the DataSet object resets itself, and the next data set flat file entry is loaded. The loading and saving of data set flat file entries continues until the end of the file is reached.
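
The load / save / reset cycle in the outline above can be sketched as below. This is only an illustration under stated assumptions: the SQLOutput and DataSet types here are cut-down stand-ins, the save, add, getFields, and saveAndReset methods are hypothetical names, and the "//" end-of-entry marker is assumed for the sake of the example. The real loader drives this cycle from the generated parser rather than from a plain loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the loader's SQLOutput type.
interface SQLOutput {
    void save(DataSet dataSet);
}

// Hypothetical stand-in for DataSet: holds one flat file entry at a time.
class DataSet {
    private final SQLOutput output;
    private final List<String> fields = new ArrayList<>();

    DataSet(SQLOutput output) {
        this.output = output;
    }

    void add(String field) {
        fields.add(field);
    }

    List<String> getFields() {
        return new ArrayList<>(fields);
    }

    // Hands itself to the output object, then clears its state for the next entry.
    void saveAndReset() {
        output.save(this);
        fields.clear();
    }
}

class LoaderLoopSketch {
    // Returns the number of entries saved. "//" as entry terminator is an assumption.
    static int load(List<String> lines, DataSet dataSet) {
        int entries = 0;
        for (String line : lines) {
            if (line.equals("//")) {
                dataSet.saveAndReset();
                entries++;
            } else {
                dataSet.add(line);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> saved = new ArrayList<>();
        DataSet dataSet = new DataSet(ds -> saved.add(String.join("|", ds.getFields())));
        int count = load(Arrays.asList("ID 1", "DE first", "//", "ID 2", "//"), dataSet);
        System.out.println(count + " entries: " + saved);
    }
}
```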

Class Layout:

warehouse

DBLoader and MainWindow

DBLoader contains the main method to run the GUI. It also contains constants for the program. MainWindow runs the interface and interprets the user's commands.

DSLoader

DSLoader also contains a main method. This method does not show the GUI; instead, more information is passed in through the command line arguments.

warehouse.schema

SchemaFactory

Factory class for obtaining any schema object.

ObjectTable / ObjectTableImpl

The ObjectTable files are for storing any piece of information that can be present in all schema objects. Currently, that information includes Source information, Entry information, data set specific identification numbers (DBID), terms, synonyms, comments, citations, and cross-references. Since each Impl class in the schema extends the ObjectTableImpl class, methods for storing and retrieving the listed pieces of information are already taken care of. The power and ease of the ObjectTable files can be seen when creating a new class for the schema. (See "Adding a new Object" file for more information.)
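
As a rough sketch of why extending ObjectTableImpl saves work, the example below models only synonyms and comments. The method names and the Pathway / PathwayImpl pair are hypothetical and chosen purely for illustration; the real ObjectTable interface covers more kinds of information and may name its methods differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down ObjectTable: only synonyms and comments are modeled.
interface ObjectTable {
    void addSynonym(String synonym);
    List<String> getSynonyms();
    void addComment(String comment);
    List<String> getComments();
}

class ObjectTableImpl implements ObjectTable {
    private final List<String> synonyms = new ArrayList<>();
    private final List<String> comments = new ArrayList<>();

    public void addSynonym(String synonym) { synonyms.add(synonym); }
    public List<String> getSynonyms() { return new ArrayList<>(synonyms); }
    public void addComment(String comment) { comments.add(comment); }
    public List<String> getComments() { return new ArrayList<>(comments); }
}

// A new schema class only declares its own fields;
// storage for the shared information is inherited.
interface Pathway extends ObjectTable {
    String getName();
}

class PathwayImpl extends ObjectTableImpl implements Pathway {
    private final String name;

    PathwayImpl(String name) {
        this.name = name;
    }

    public String getName() { return name; }
}
```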

SourceInformation / SourceInformationImpl

The SourceInformation files contain the information that will be loaded into the DataSet table in the database. Only one SourceInformation object is created while loading a data set.

Chemical / ChemicalImpl

A useful template from which to copy code needed in other schema classes. ChemicalImpl.java contains examples of get/set routines and other methods used throughout the loader.

Description of important methods in an object class.

getSQLInsert( String parentName )

This method returns a string containing an SQL command that inserts all information specific to the object class. Only the information that goes directly into the object's table is used here. The returned string should be identical to an SQL line entered at the sqlplus command line.
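
A minimal sketch of what such a method might look like is shown below. The table name Chemical and the columns WID and Name are assumptions made for the example, as is the quote helper. The role of parentName is not described in this readme, so the sketch accepts the parameter and ignores it.

```java
// Hypothetical sketch: the Chemical table and its WID / Name columns are assumed.
class ChemicalSqlSketch {
    private final long wid;
    private final String name;

    ChemicalSqlSketch(long wid, String name) {
        this.wid = wid;
        this.name = name;
    }

    // Doubles single quotes so the value is a valid SQL string literal.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // parentName mirrors the documented signature; its meaning is not
    // described in the readme, so this sketch does not use it.
    String getSQLInsert(String parentName) {
        return "INSERT INTO Chemical (WID, Name) VALUES (" + wid + ", " + quote(name) + ")";
    }
}
```

Only columns that belong to the object's own table appear in the statement; linking and look-up tables are handled elsewhere, as described under SQLOutput.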

toLoader()

This method returns a string that can be placed in an Oracle loader control file. The order of the information depends on the order specified in the object class's get(Object's name)LoaderSetUp method found in warehouse/dboutput/DataLoader.java. These pieces of information are comma delimited, and large strings should be enclosed in double quotes.
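
The comma-delimited format can be illustrated with the sketch below. The field set (WID, name, description) and the helper names are hypothetical, and doubling embedded double quotes is an assumed convention that should be checked against the actual control file settings.

```java
// Hypothetical sketch of building one comma-delimited loader input line.
class LoaderLineSketch {
    // Wraps a string in double quotes; embedded quotes are doubled (assumed convention).
    static String quoted(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // The field order must match the order declared in the control file header.
    static String toLoader(long wid, String name, String description) {
        return wid + "," + name + "," + quoted(description);
    }

    public static void main(String[] args) {
        System.out.println(toLoader(7, "zinc", "a metal, essential in trace amounts"));
    }
}
```

Quoting the long field keeps any commas inside it from being read as field separators.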

warehouse.parser

ParserFactory

Generates new BioSpiceLexer and BioSpiceParser objects based on the name of the grammar passed in from the command line.
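
Selecting a parser by grammar name could look roughly like the sketch below. Treating BioSpiceParser as an interface with one implementation per grammar is an assumption, as are the EnzymeParser and SwissProtParser names, the newParser method, and the accepted strings "enzyme" and "swissprot"; the actual factory may be organized differently.

```java
import java.util.Locale;

// Hypothetical stand-in for the generated parser type.
interface BioSpiceParser {
    String grammarName();
}

class EnzymeParser implements BioSpiceParser {
    public String grammarName() { return "enzyme"; }
}

class SwissProtParser implements BioSpiceParser {
    public String grammarName() { return "swissprot"; }
}

class ParserFactory {
    // Maps the grammar name given on the command line to a parser instance.
    static BioSpiceParser newParser(String grammarType) {
        switch (grammarType.toLowerCase(Locale.ROOT)) {
            case "enzyme":
                return new EnzymeParser();
            case "swissprot":
                return new SwissProtParser();
            default:
                throw new IllegalArgumentException("Unknown grammar type: " + grammarType);
        }
    }
}
```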

ParserControl / ParserControlImpl

These class files take care of running the parser and coordinating where the output goes.

Report / ReportImpl

These class files take care of generating an error report after the loader has finished.

warehouse.translator

TranslatorFactory

Generates a new DataSet object for storage of a database flat file entry.

DataSet / DataSetImpl

These class files are used to store a single dataset flat file entry while parsing. The DataSet object is used to reference schema objects. It is also where more complex operations are done, for example the parsing of a full reaction. (See the setReaction method for more details.) Once the end of the dataset flat file entry is reached, the DataSet object is passed on to SQLOutput to save the data.

warehouse.dboutput

OutputFactory

Generates a new SQLOutput object for distributing the data set information.

SQLOutput / SQLOutputImpl

A single output class that handles sending SQL commands to the database, sending SQL commands to a file, and writing input lines for the Oracle loader control files. The destination of the information is determined by boolean flags set in the ParserControl object. The complex part of this class is saving the information needed to build linking tables and look-up tables. Hashtables are the most common means of storage, but vectors are used when there is a chance of collision. The doTables method asks the DataSet object for each list of schema objects, then looks through each list to save object table information. Next, doTables checks with each object for information about linking tables and look-up tables. After all lists have been checked, doTables exports the information needed to fill the linking tables and look-up tables.

DataLoader

Public class containing only static methods. The methods in this class return the header information for the Oracle loader control files. (See the doTables method in SQLOutputImpl.java.)

warehouse.dbaccess

dbInquirer

A public class that contains all contact methods with the database. Static methods allow queries to be performed from any class. Connection information is loaded from the BioSPICE.properties file.
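
Loading connection information from a properties file can be sketched with java.util.Properties as below. The ConnectionSettings class and the key names db.url and db.user are assumptions made for this example; the keys actually used in BioSPICE.properties are not listed in this readme.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for connection settings; the key names are assumptions.
class ConnectionSettings {
    final String url;
    final String user;

    ConnectionSettings(String url, String user) {
        this.url = url;
        this.user = user;
    }

    // Reads settings in java.util.Properties format from any Reader.
    static ConnectionSettings load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String url = props.getProperty("db.url");
        if (url == null) {
            throw new IllegalArgumentException("Missing required key: db.url");
        }
        // The user name is optional in this sketch and defaults to an empty string.
        return new ConnectionSettings(url, props.getProperty("db.user", ""));
    }
}
```

To read from disk, a java.io.FileReader opened on the properties file could be passed to load.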

warehouse.error

Contains some exceptions used throughout the loader.

warehouse.fileloader

Contains classes that allow the user to choose a file from an interface window.