FilterKit Documentation

written by Ram Nathaniel <ramn@math.tau.ac.il>

Table of contents:

  • What is FilterKit
  • About this document
  • The Filter Architecture
  • Using the filters in a script
  • File Format
  • Classes
  • Skeleton
    What is FilterKit
    FilterKit is a software architecture, combined with a set of utilities and classes, built on top of the GAMB++ library, which provides a framework for geometric algorithms in molecular biology. The main idea behind the design is to separate the algorithm into many phases, each running in a different process, so that the processes can run in any order. The motivation for this approach is the need for different filters, and the fact that we can never be sure in advance what the order of these filters should be.

    Advantages of working with the FilterKit:

    Disadvantages of working with the FilterKit:

    The filters are built so that a filter can be used as a daemon that runs in the background and provides services to the main program, communicating through FIFOs (named pipes). This allows the algorithm to run several tasks in parallel (see the Using the filters in a script section for more details).
    About this document
    This document is meant to help developers of new applications in the GAMBA group fit them into the filter architecture. It is also intended for advanced users of such applications who want to take advantage of the flexibility of the architecture without recompiling the modules (see the Using the filters in a script section).
    The Filter Architecture
    The Filter Architecture is a software architecture designed to make the development of new applications in the GAMBA group more flexible. It is essentially one large pipeline, in which the data travels through the stages of the geometric hashing algorithm and then through a number of filters designed to remove some of the false positives the algorithm introduces.
    The filters can also be used as semi-daemon programs, which stay alive until the main program sends a TERMINATE line through stdin. For more details see the Using the filters in a script section.
    The Skeleton section presents a basic empty filter that demonstrates how to use the architecture.
    Using the filters in a script
    The main advantage of the filter architecture is that the filters can be used in a shell script. The user can then call the different filters in any order he or she pleases. Any filter can be introduced at any point in the algorithm, since its input and its output use the same file format. For example, if a user has four different filters, A, B, C and D, a script that uses them might be:
    Matching_Program | A | B | C | D > output_file
    or:
    Matching_Program | B | A | A | C | D | B > output_file
    An advanced user might want to run one of the filters as a parallel task that does part of the work. For example, if we want to perform clustering several times during our algorithm, we can start the clustering filter in the background and let it wait; every time we need it, we send it a collection of transformations and get back the clustered set.
    The shell script would look something like this:
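    # create the two named pipes (FIFOs)
    mkfifo instream outstream
    # the clustering filter waits in the background, reading requests from instream
    Clustering_filter < instream > outstream &
    # open instream for writing before outstream for reading, otherwise both opens block forever
    Main_Program > instream < outstream
    wait
    rm instream outstream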
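
    A main program that talks to a filter running this way has to follow a simple protocol. The following C++ sketch shows one possible main-program side, assuming that each request is a complete session closed by END, that the filter answers with a session in the same file format closed by END, and that TERMINATE shuts the filter down. The names send_session, receive_session and stop_filter are hypothetical; they are not part of GAMB++ or FilterKit.

```cpp
#include <cassert>   // only needed to exercise the helpers
#include <istream>
#include <ostream>
#include <sstream>   // in-memory streams can stand in for the FIFOs when testing
#include <string>
#include <vector>

// Hypothetical helper: sends one session to the filter and closes it with END,
// which ends the session but keeps the filter alive.
void send_session(std::ostream& to_filter, const std::vector<std::string>& lines) {
    for (const std::string& line : lines) to_filter << line << '\n';
    to_filter << "END\n";
    // Without the flush the request may stay in the buffer while both sides wait.
    to_filter.flush();
}

// Hypothetical helper: reads the filter's answer, every line up to
// (not including) the closing END.
std::vector<std::string> receive_session(std::istream& from_filter) {
    std::vector<std::string> answer;
    std::string line;
    while (std::getline(from_filter, line) && line != "END") answer.push_back(line);
    return answer;
}

// Hypothetical helper: asks the daemon to exit once it has processed everything sent so far.
void stop_filter(std::ostream& to_filter) {
    to_filter << "TERMINATE\n";
    to_filter.flush();
}
```

    With the script above, Main_Program's standard output is instream and its standard input is outstream, so it would call send_session(std::cout, ...), then receive_session(std::cin), and finally stop_filter(std::cout). The explicit flush matters: standard output connected to a FIFO is normally fully buffered, so an unflushed request can leave both processes waiting for each other.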
    File Format
    The files passed along the pipeline have the following format:
    Notation: fields marked with $ are mandatory; all other fields are optional.
    Each line begins with a keyword (HEADER, PARAMS, TRANS, DETAIL, END or TERMINATE) that indicates the type of the line.
    The first part is the header, which is copied to the output file as is. It may contain any information that makes it easier for the user to understand the context in which the file was created (e.g. date, name, molecules ...).
    HEADER (free text)

    The PARAMS part specifies the structure of the TRANS record: it lists every optional field that is in use, such as MolName and PartNumber, and states whether there are DETAIL records.
    PARAMS (indication of which optional fields will be used)
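
    The exact syntax of the PARAMS line is not spelled out here. As an illustration only, the following sketch assumes that the line lists the names of the optional fields in use, separated by whitespace (e.g. PARAMS MolName PartNumber DETAIL), and that the word DETAIL announces DETAIL records. Params and parse_params are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical summary of a PARAMS line: the set of optional fields in use.
struct Params {
    std::set<std::string> optional_fields;
    bool uses(const std::string& field) const { return optional_fields.count(field) > 0; }
    // Assumption: the token DETAIL on the PARAMS line means DETAIL records follow.
    bool has_details() const { return uses("DETAIL"); }
};

// Parses "PARAMS <field> <field> ..."; returns false (leaving params untouched)
// if the line is not a PARAMS line.
bool parse_params(const std::string& line, Params& params) {
    std::istringstream in(line);
    std::string keyword;
    if (!(in >> keyword) || keyword != "PARAMS") return false;
    Params parsed;
    std::string field;
    while (in >> field) parsed.optional_fields.insert(field);
    params = parsed;
    return true;
}
```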

    A basic record containing information about the transformation.
    TRANS ($ Result Number) ($ Result ID) ($ score1) (MoleculeName) (PartNumber) ($ Rotx, roty, rotz, transx, transy, transz)
    ($ score1) (score2) (size)
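
    A filter usually reads a TRANS record and writes it back in the same layout. This sketch handles only the mandatory fields of the TRANS line above, in the order shown, assuming that PARAMS declares no optional fields and that fields are separated by whitespace rather than commas. TransRecord, parse_trans and format_trans are hypothetical names, not the MyTrans class.

```cpp
#include <array>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical in-memory form of a TRANS record (mandatory fields only).
struct TransRecord {
    int result_number = 0;
    std::string result_id;
    double score1 = 0.0;
    std::array<double, 3> rotation{};     // rotx, roty, rotz
    std::array<double, 3> translation{};  // transx, transy, transz
};

// Parses "TRANS <number> <id> <score1> <rotx> <roty> <rotz> <transx> <transy> <transz>".
// Returns false if the keyword is wrong or a mandatory field is missing or malformed.
bool parse_trans(const std::string& line, TransRecord& rec) {
    std::istringstream in(line);
    std::string keyword;
    TransRecord r;
    if (!(in >> keyword) || keyword != "TRANS") return false;
    if (!(in >> r.result_number >> r.result_id >> r.score1)) return false;
    for (double& v : r.rotation)
        if (!(in >> v)) return false;
    for (double& v : r.translation)
        if (!(in >> v)) return false;
    rec = r;
    return true;
}

// Writes the record back in the same layout, so the output can feed the next filter.
std::string format_trans(const TransRecord& r) {
    std::ostringstream out;
    out << "TRANS " << r.result_number << ' ' << r.result_id << ' ' << r.score1;
    for (double v : r.rotation) out << ' ' << v;
    for (double v : r.translation) out << ' ' << v;
    return out.str();
}
```

    Writing the record in the input layout is what lets filters be chained in any order. The sketch uses the default stream precision (6 significant digits) for brevity; a real filter would raise it with std::setprecision so that values survive a long chain of filters.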

    Optional DETAIL records contain the actual matching list for this transformation. The first field, Result Number, indicates the transformation the record belongs to.
    DETAIL ($ Result Number) ($ model index) ($ scene index) (dist)
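
    Since each DETAIL record points back to its transformation through the Result Number, a filter can collect the records into one matching list per transformation. The following sketch assumes whitespace-separated fields, integer model and scene indices, and a dist field that may be missing. DetailRecord and add_detail are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of a DETAIL record: one matched model/scene pair.
struct DetailRecord {
    int model_index = 0;
    int scene_index = 0;
    double dist = 0.0;     // optional field; stays 0.0 when absent
    bool has_dist = false;
};

// Parses "DETAIL <result number> <model index> <scene index> [dist]" and appends
// the pair to the matching list of its transformation (keyed by Result Number).
bool add_detail(const std::string& line, std::map<int, std::vector<DetailRecord>>& matches) {
    std::istringstream in(line);
    std::string keyword;
    int result_number = 0;
    DetailRecord d;
    if (!(in >> keyword) || keyword != "DETAIL") return false;
    if (!(in >> result_number >> d.model_index >> d.scene_index)) return false;
    if (in >> d.dist) d.has_dist = true;
    matches[result_number].push_back(d);
    return true;
}
```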

    Ends the current set of inputs. This does not necessarily end the program after it finishes processing this input; another session may follow.
    END

    Ends all input. The program may terminate once it has processed everything before this line.
    TERMINATE
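
    END and TERMINATE are what let a filter stay alive as a daemon: it processes one session per END and exits at TERMINATE. A minimal sketch of that outer loop follows, using only the standard library. Forwarding END and TERMINATE downstream, so that the next filter in a pipeline sees the same session boundaries, is an assumption of this sketch, as are the names Session and run_sessions. Lines that are not followed by END are discarded.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using Session = std::vector<std::string>;  // hypothetical: the raw lines of one session

// Collects lines until END, hands the session to `process`, writes its result
// followed by END, and starts over; stops at TERMINATE or at end of input.
void run_sessions(std::istream& in, std::ostream& out,
                  const std::function<Session(const Session&)>& process) {
    Session session;
    std::string line;
    while (std::getline(in, line)) {
        if (line == "TERMINATE") {
            out << "TERMINATE\n";  // assumption: let downstream filters stop too
            break;
        }
        if (line == "END") {
            for (const std::string& l : process(session)) out << l << '\n';
            out << "END\n";
            out.flush();  // the answer must reach the reader now, not at exit
            session.clear();
            continue;
        }
        session.push_back(line);
    }
    out.flush();
}
```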

    Skeleton
    The skeleton filter is an empty filter that provides a good starting point for writing a new filter. Its source file shows a typical usage of the FileReader and MyTrans classes. This filter can be used in a pipeline or as a daemon (see the Using the filters in a script section).
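
    The skeleton itself uses the FileReader and MyTrans classes, whose interfaces are not described in this section, so the following standard-library sketch is not the skeleton's code. It only illustrates the same idea: every line is copied to the output unless the filter rejects a transformation, in which case its TRANS record and DETAIL records are dropped. With the default predicate it is a pure pass-through, like the empty skeleton. keyword_of, result_number_of and run_filter are hypothetical names, and the sketch assumes that DETAIL records follow the TRANS record they refer to.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Returns the first word of a line (the record keyword), or "" for an empty line.
std::string keyword_of(const std::string& line) {
    std::istringstream in(line);
    std::string word;
    in >> word;
    return word;
}

// Returns the Result Number (second field) of a TRANS or DETAIL line, or -1 if missing.
int result_number_of(const std::string& line) {
    std::istringstream in(line);
    std::string keyword;
    int number = 0;
    if (in >> keyword >> number) return number;
    return -1;
}

// Copies HEADER, PARAMS, END and TERMINATE lines unchanged and forwards each
// TRANS record (with its DETAIL records) only if `keep` accepts it.
void run_filter(std::istream& in, std::ostream& out,
                const std::function<bool(const std::string&)>& keep =
                    [](const std::string&) { return true; }) {
    std::set<int> dropped;  // Result Numbers of rejected transformations
    std::string line;
    while (std::getline(in, line)) {
        const std::string kw = keyword_of(line);
        if (kw == "TRANS" && !keep(line)) {
            dropped.insert(result_number_of(line));
            continue;  // reject the transformation
        }
        if (kw == "DETAIL" && dropped.count(result_number_of(line)) > 0) {
            continue;  // reject the matching list of a rejected transformation
        }
        out << line << '\n';  // everything else is copied as is
        if (kw == "END" || kw == "TERMINATE") {
            out.flush();      // hand the finished session to the next stage
            dropped.clear();  // assumption: Result Numbers may be reused by the next session
        }
        if (kw == "TERMINATE") break;
    }
}
```

    A real filter would replace the predicate with its own test, for example parsing the TRANS record and comparing score1 with a threshold, and its main function would simply call run_filter(std::cin, std::cout).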


    Compiled Filters