BioSPICE Warehouse Loader Handoff Readme


Directory Structure


Ant / Compiling


Logging


Running


General Outline of Loader Process.

The command-line arguments and database properties files are read in, and the program connects to the database. A ParserControl object is created to run the parser. Lexer and parser objects are created and initialized. An SQLOutput object is created to send the parsed information to a specific destination. A DataSet object is created and passed the SQLOutput object as a parameter; the DataSet object is also passed into the parser object for reference. As the parser reads the file, information is loaded into the DataSet object. Once the end of a data set flat file entry is reached, the DataSet object is told to save itself, which it does by passing itself into the SQLOutput object. The SQLOutput object distributes the information to the specified destination, the DataSet object resets itself, and a new data set flat file entry is loaded. Loading and saving of data set flat file entries continues until the end of the file is reached.
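
The load/save/reset cycle described above can be sketched roughly as follows. This is only an illustration, not the loader's actual code: SketchOutput and SketchDataSet are hypothetical stand-ins for SQLOutput and DataSet, the "//" entry terminator is an assumed convention, and the real parser is driven by a grammar rather than a line loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for SQLOutput: receives one completed entry at a time. */
interface SketchOutput {
    void save(SketchDataSet dataSet);
}

/** Hypothetical stand-in for DataSet: holds the fields of a single flat file entry. */
class SketchDataSet {
    private final SketchOutput output;
    private final List<String> fields = new ArrayList<>();

    SketchDataSet(SketchOutput output) {
        this.output = output;
    }

    void addField(String field) {
        fields.add(field);
    }

    boolean isEmpty() {
        return fields.isEmpty();
    }

    /** Returns a copy so the caller keeps the data after this object resets. */
    List<String> getFields() {
        return new ArrayList<>(fields);
    }

    /** Saves by passing this object to the output, then resets for the next entry. */
    void saveAndReset() {
        output.save(this);
        fields.clear();
    }
}

class LoaderFlowSketch {
    /** Assumed end-of-entry marker; the real flat file grammar may differ. */
    static final String END_OF_ENTRY = "//";

    /** Loads entries line by line and returns the number of entries saved. */
    static int load(List<String> lines, SketchOutput output) {
        SketchDataSet dataSet = new SketchDataSet(output);
        int saved = 0;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.equals(END_OF_ENTRY)) {
                if (!dataSet.isEmpty()) {
                    dataSet.saveAndReset();
                    saved++;
                }
            } else if (!line.isEmpty()) {
                dataSet.addField(line);
            }
        }
        // In this sketch, fields after the last terminator are not saved.
        return saved;
    }

    public static void main(String[] args) {
        List<String> flatFile = Arrays.asList(
                "ID 1", "NAME water", "//",
                "ID 2", "NAME ethanol", "//");
        List<List<String>> sink = new ArrayList<>();
        int count = load(flatFile, ds -> sink.add(ds.getFields()));
        System.out.println(count + " entries saved: " + sink);
    }
}
```

Reusing one data set object and resetting it after each save keeps memory use flat regardless of file size, which matches the single-entry storage described above.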

Class Layout:

  1. warehouse
    1. DBLoader and MainWindow
      1. DBLoader contains the main method that runs the GUI. It also contains constants for the program. MainWindow runs the interface and interprets the user's commands.
    2. DSLoader
      1. DSLoader also contains a main method. This method does not show the GUI; instead, more information is passed in through the command-line arguments.
  2. warehouse.schema
    1. SchemaFactory
      1. Factory class for obtaining any schema object.
    2. ObjectTable / ObjectTableImpl
      1. The ObjectTable files are for storing any piece of information that can be present in all schema objects. Currently, that information includes Source information, Entry information, data set specific identification numbers (DBID), terms, synonyms, comments, citations, and cross-references. Since each Impl class in the schema extends the ObjectTableImpl class, methods for storing and retrieving the listed pieces of information are already taken care of. The power and ease of the ObjectTable files can be seen when creating a new class for the schema. (See "Adding a new Object" file for more information.)
    3. SourceInformation / SourceInformationImpl
      1. The SourceInformation files contain the information that will be loaded into the DataSet table in the database. Only one SourceInformation object is created while loading a data set.
    4. Chemical / ChemicalImpl
      1. An effective starting point for copying code needed in other schema classes. ChemicalImpl.java contains examples of get/set routines and other methods used throughout the loader.
    5. Description of important methods in an object class.
      1. getSQLInsert( String parentName )
        1. This method returns a string containing an SQL command that inserts all information specific to the object class. Only the information that goes directly into the object's table is used here. The returned string should be identical to an SQL line entered at the sqlplus command line.
      2. toLoader()
        1. This method returns a string that can be placed in an Oracle loader control file. The order of the information depends on the order specified in the object class's get(Object's name)LoaderSetUp method, found in warehouse/dboutput/DataLoader.java. The pieces of information are comma delimited, and large strings should be enclosed in double quotes.
  3. warehouse.parser
    1. ParserFactory
      1. Generates new BioSpiceLexer and BioSpiceParser objects based on the name of the grammar passed in from the command line.
    2. ParserControl / ParserControlImpl
      1. These class files take care of running the parser and coordinating where the output goes.
    3. Report / ReportImpl
      1. These class files take care of generating an error report after the loader has finished.
  4. warehouse.translator
    1. TranslatorFactory
      1. Generates a new DataSet object for storage of a database flat file entry.
    2. DataSet / DataSetImpl
      1. These class files are used to store a single data set flat file entry while parsing. The DataSet object is used to reference schema objects. It is also where more complex operations are done, for example the parsing of a full reaction. (See the setReaction method for more details.) Once the end of the data set flat file entry is reached, the DataSet object is passed on to SQLOutput to save the data.
  5. warehouse.dboutput
    1. OutputFactory
      1. Generates a new SQLOutput object for distributing the data set information.
    2. SQLOutput / SQLOutputImpl
      1. A single output class that handles sending SQL commands to the database, sending SQL commands to a file, or writing input lines for the Oracle loader control files. The destination of the information is determined by boolean flags set in the ParserControl object. The complex part of this class is saving the information needed to build linking tables and look-up tables. Hashtables are the most common means of storage, but vectors are used when there is a chance of collision. The doTables method asks the DataSet object for each list of schema objects, then looks through each list to save object table information. Next, doTables checks with each object for information about linking tables and look-up tables. After all lists have been checked, doTables exports the information needed to fill the linking tables and look-up tables.
    3. DataLoader
      1. Public class containing only static methods. The methods in this class return the header information for the Oracle loader control files. (See the doTables method in SQLOutputImpl.java.)
  6. warehouse.dbaccess
    1. dbInquirer
      1. A public class that contains all methods that contact the database. Its static methods allow queries to be performed from any class. Connection information is loaded from the BioSPICE.properties file.
  7. warehouse.error
    1. Contains some exceptions used throughout the loader.
  8. warehouse.fileloader
    1. Contains classes that allow the user to choose a file from an interface window.
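
As an illustration of the two output methods described for schema objects (getSQLInsert and toLoader), here is a minimal sketch. ChemicalSketch is a hypothetical, simplified class and not the loader's ChemicalImpl; the column names (ID, NAME, FORMULA) are invented for the example, and the sketch assumes that parentName is the name of the target table, which these notes do not confirm. The quote-doubling in toLoader assumes the control file declares fields as optionally enclosed by double quotes.

```java
/** Hypothetical, simplified schema object; not the loader's actual ChemicalImpl. */
class ChemicalSketch {
    private final long id;
    private final String name;
    private final String formula;

    ChemicalSketch(long id, String name, String formula) {
        this.id = id;
        this.name = name;
        this.formula = formula;
    }

    /** Wraps a value in single quotes for an SQL literal, doubling embedded single quotes. */
    private static String sqlQuote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Wraps a value in double quotes for a comma-delimited line, doubling embedded double quotes. */
    private static String loaderQuote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /**
     * Builds an INSERT statement for this object's own table only.
     * Assumption: parentName is used here as the table name.
     */
    String getSQLInsert(String parentName) {
        return "INSERT INTO " + parentName + " (ID, NAME, FORMULA) VALUES ("
                + id + ", " + sqlQuote(name) + ", " + sqlQuote(formula) + ")";
    }

    /** Builds one comma-delimited data line, with string fields in double quotes. */
    String toLoader() {
        return id + "," + loaderQuote(name) + "," + loaderQuote(formula);
    }

    public static void main(String[] args) {
        ChemicalSketch water = new ChemicalSketch(1L, "water", "H2O");
        System.out.println(water.getSQLInsert("Chemical"));
        System.out.println(water.toLoader());
    }
}
```

The field order in toLoader has to match the column order declared in the corresponding control file header. When statements are executed directly over JDBC rather than written out as text, a PreparedStatement with bind parameters avoids the manual quoting shown here.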