RMSDCluster

A template intended to provide good-quality clustering of transformations, where the distance between two transformations is defined as the RMSD between the images of a set of points under the two transformations. (The set of points is supplied by the user in the pre-processing stage.) RMSDCluster provides two clustering algorithms: one is faster but less accurate, while the other is more thorough but limited in the number of transformations it can handle. The choice of algorithm is left to the user, who selects the faster one by calling the useFastClustering method. For the fast clustering the user must also supply the hash cube size. The larger the cube, the more accurate the algorithm, but each bin then holds more transformations and so the algorithm is slower.

Both algorithms are iterative and try to join pairs of close transformations using a weighted-average heuristic. The difference is that the thorough algorithm calculates the entire distance matrix of all the transformations at once and then repeatedly contracts the smallest edge connecting vertices that have not yet been used in the current iteration, while the faster algorithm uses a hash table and at first clusters only transformations that place the centroid of the molecule in nearby locations. Calculating the distance between any two rigid transformations is O(n), where n is the number of points given in the pre-processing stage.

template< class TransT>

Quick Index

AUTHORS
GOALS
USAGE

Class Summary

class RMSDCluster
{
public:
Edge() ;
class TransformationRecord ;
TransformationRecord(TransT& trans);
void join(TransformationRecord& record);
RMSDCluster();
void cluster(float maxDist, vector< TransT>& trans, vector< TransT>& output);
template< class ParticleT> void preProcess(const vector< ParticleT>& particles);
float dist2(const TransT& trans1, const TransT& trans2);
void useFastClustering(float binSize_);
protected:
int work(float maxDist, vector< TransformationRecord *>& records);
int fastWork(float maxDist, vector< TransformationRecord *>& records);
void clean(vector< TransformationRecord*>*& records);
}; // RMSDCluster


AUTHORS

Ram Nathaniel (ramn@math.tau.ac.il)
Dina Duhovny (duhovka@math.tau.ac.il)
Oranit Shem-Tov (ornit@math.tau.ac.il)

Copyright: SAMBA group, Tel-Aviv Univ. Israel, 2001


GOALS

The RMSDCluster class serves as an easy interface to a clustering paradigm partially similar to the one introduced by FLEXX.

Complexity: pre-processing is O(N). Overall complexity is O(N*M^2 * log(M)) (but with a small constant), where:
- N is the number of particles, given in the pre-processing stage
- M is the number of transformations to be clustered

There are actually two different clustering algorithms: one is thorough, and one is faster but less accurate. Both are iterative and try to join pairs of close transformations using a weighted-average heuristic. The difference is that the thorough algorithm calculates the entire distance matrix of all the transformations at once, while the faster algorithm first divides the transformations by the image of the molecule centroid under each transformation and only then checks distances within each bin separately. Once this division yields no further joins, the first (thorough) algorithm is run.

The basic algorithm:

    while something was done
        create a graph from the transformations left in play.
        sort the edges (according to the distances).
        contract every edge smaller than maxDist, starting from the shortest,
        BUT ONLY USE EACH VERTEX ONCE.
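The loop above can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions, not the library's implementation: the hypothetical Item type replaces a real transformation with a 1-D position, the distance is the absolute difference of positions, and joining is a score-weighted average (scores are assumed to be positive). The names Item, Edge, contractPass and clusterAll are invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a transformation: a 1-D position plus a score.
struct Item {
    double pos;
    double score;
    bool alive;   // false once the item has been merged into another one
};

struct Edge {
    double dist;
    std::size_t a, b;
};

// One iteration: build the graph, sort its edges, and contract every edge
// shorter than maxDist, using each vertex at most once.
// Returns the number of contractions performed.
int contractPass(std::vector<Item>& items, double maxDist) {
    std::vector<Edge> edges;
    for (std::size_t i = 0; i < items.size(); ++i) {
        for (std::size_t j = i + 1; j < items.size(); ++j) {
            if (!items[i].alive || !items[j].alive) continue;
            const double d = std::fabs(items[i].pos - items[j].pos);
            if (d < maxDist) edges.push_back(Edge{d, i, j});
        }
    }
    std::sort(edges.begin(), edges.end(),
              [](const Edge& l, const Edge& r) { return l.dist < r.dist; });

    std::vector<bool> used(items.size(), false);
    int joined = 0;
    for (const Edge& e : edges) {
        if (used[e.a] || used[e.b]) continue;   // each vertex only once per pass
        used[e.a] = used[e.b] = true;
        Item& keep = items[e.a];
        Item& drop = items[e.b];
        const double total = keep.score + drop.score;
        assert(total > 0.0);                    // sketch assumes positive scores
        keep.pos = (keep.pos * keep.score + drop.pos * drop.score) / total;
        keep.score = total;
        drop.alive = false;
        ++joined;
    }
    return joined;
}

// Repeats passes until nothing was done, then returns the surviving items.
std::vector<Item> clusterAll(std::vector<Item> items, double maxDist) {
    while (contractPass(items, maxDist) > 0) {}
    std::vector<Item> out;
    for (const Item& it : items) {
        if (it.alive) out.push_back(it);
    }
    return out;
}
```

Restricting each vertex to one contraction per pass means a merged item's distances are recomputed on the next pass before it can be merged again, at the cost of more passes.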

The faster algorithm:

    calculate the image of the centroid of the set under each transformation
    and put it in a GeomHash (the cube size is given by the user).
    while something was done
        foreach bucket
            cluster the transformations in the bucket.
    call the basic algorithm.
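The binning step can be illustrated as follows. This is a minimal sketch that assumes cubic bins keyed by the floor of each coordinate divided by the bin size; the library's GeomHash may be organised differently. Vec3, BinKey, binOf and binCentroids are hypothetical names used only here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical 3-D point type standing in for the image of the centroid.
struct Vec3 { double x, y, z; };

using BinKey = std::tuple<long, long, long>;

// Maps a point to the integer coordinates of the cube that contains it.
BinKey binOf(const Vec3& p, double binSize) {
    assert(binSize > 0.0);
    return BinKey(static_cast<long>(std::floor(p.x / binSize)),
                  static_cast<long>(std::floor(p.y / binSize)),
                  static_cast<long>(std::floor(p.z / binSize)));
}

// Groups the indices of the centroid images by bin. centroidImages[i] is
// assumed to be the centroid of the point set under transformation i.
std::map<BinKey, std::vector<std::size_t>>
binCentroids(const std::vector<Vec3>& centroidImages, double binSize) {
    std::map<BinKey, std::vector<std::size_t>> bins;
    for (std::size_t i = 0; i < centroidImages.size(); ++i) {
        bins[binOf(centroidImages[i], binSize)].push_back(i);
    }
    return bins;
}
```

Two centroid images that lie close together but on opposite sides of a cube boundary land in different bins, which is one reason a final run of the basic algorithm over the remaining transformations is still needed.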

Notes: The distance between two transformations is calculated as the RMSD between the images of a user-supplied set of points under the two transformations.
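As an illustration of this distance, the sketch below computes the RMSD directly from the definition in a single pass over the points. It assumes a rigid transformation is stored as a 3x3 rotation matrix plus a translation; Vec3, Rigid and rmsdBetween are hypothetical names and do not correspond to the library's RigidTrans3 interface, and any speed-up the library gains from its pre-processing stage is not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-ins for the library's point and rigid transformation types.
struct Vec3 { double x, y, z; };

struct Rigid {
    double rot[3][3];   // rotation matrix, row-major
    Vec3 shift;         // translation applied after the rotation

    Vec3 apply(const Vec3& p) const {
        return Vec3{rot[0][0] * p.x + rot[0][1] * p.y + rot[0][2] * p.z + shift.x,
                    rot[1][0] * p.x + rot[1][1] * p.y + rot[1][2] * p.z + shift.y,
                    rot[2][0] * p.x + rot[2][1] * p.y + rot[2][2] * p.z + shift.z};
    }
};

// RMSD between the images of `points` under transformations a and b.
// One pass over the points, hence linear in their number.
double rmsdBetween(const Rigid& a, const Rigid& b, const std::vector<Vec3>& points) {
    assert(!points.empty());
    double sum = 0.0;
    for (const Vec3& p : points) {
        const Vec3 u = a.apply(p);
        const Vec3 v = b.apply(p);
        const double dx = u.x - v.x, dy = u.y - v.y, dz = u.z - v.z;
        sum += dx * dx + dy * dy + dz * dz;
    }
    return std::sqrt(sum / static_cast<double>(points.size()));
}
```

For two transformations that differ only by a translation, every point moves by the same vector, so the RMSD equals the length of that vector regardless of the point set.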


USAGE

RMSDCluster is a template that takes a class parameter named TransT. TransT should be a class that implements the following methods:
- void join(TransT& trans);
- float getScore() const;
- void incScore(float s);
- const RigidTrans3& rigidTrans();

Usage of RMSDCluster is quite simple:

First you have to define the reference set. This is done by invoking the method template< class ParticleT> void preProcess(const vector< ParticleT>& particles); Of course, passing a Molecule<Particle> object will also do.

Then invoke the cluster method, where the maxDist parameter indicates the maximal length of an edge that will be contracted (i.e. the maximal distance allowed between two transformations in the same cluster).


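  class Transformation {
     private:
        RigidTrans3 rt;   // the rigid transformation itself
        float score;      // score of this transformation
     public:
        Transformation() {...}

        // merge t into this transformation
        void join(Transformation& t) {...}

        float getScore() const {
           return score;
        }

        // not const: it modifies score
        void incScore(float s) {
           score += s;
        }

        const RigidTrans3& rigidTrans() const {
           return rt;
        }
    };

    vector<Transformation> trans;    // transformations to be clustered
    Molecule<Particle> mol;          // reference set of points

    // no parentheses: "clusterer()" would declare a function
    RMSDCluster<Transformation> clusterer;
    clusterer.preProcess(mol);
    vector<Transformation> result;
    // maxDist comes first; the clusters are written to result
    clusterer.cluster(5.5, trans, result);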

To use the faster algorithm, just add: clusterer.useFastClustering(cubeSize); before invoking the cluster method.


Edge() ;

The default constructor

    Edge() ;

Function is currently defined inline.



class TransformationRecord ;

Private class intended to hold the TransT objects throughout the clustering process.

  class TransformationRecord {
  public:


TransformationRecord(TransT& trans);

standard constructor.

    inline TransformationRecord(TransT& trans);

Function is currently defined inline.



void join(TransformationRecord& record);

Join the transformation held by record into this one. record is left unchanged.

    virtual void join(TransformationRecord& record);


RMSDCluster();

default constructor

  RMSDCluster();

Function is currently defined inline.



void cluster(float maxDist, vector< TransT>& trans, vector< TransT>& output);

The main function, which should be called after the pre-processing is done.

  void cluster(float maxDist, vector< TransT>& trans, vector< TransT>& output);


template< class ParticleT> void preProcess(const vector< ParticleT>& particles);

Pre-processes a set of points so that distances can later be calculated on them.

  template< class ParticleT>
  void preProcess(const vector< ParticleT>& particles);


float dist2(const TransT& trans1, const TransT& trans2);

Returns the RMSD of the preProcessed set of points under the two transformations.

  virtual float dist2(const TransT& trans1, const TransT& trans2);


void useFastClustering(float binSize_);

Sets the bin size used for the first stage of the clustering. If not used, the class will perform clustering on all the transformations at once, without binning.

  void useFastClustering(float binSize_);


int work(float maxDist, vector< TransformationRecord *>& records);

The main loop: builds the graph of transformations and contracts the edges.

  virtual int work(float maxDist, vector< TransformationRecord *>& records);


int fastWork(float maxDist, vector< TransformationRecord *>& records);

Just like work, but handles each bin independently until there is nothing else to be done this way. The bin size is set through useFastClustering (the source comment refers to it as SetSpeed).

  virtual int fastWork(float maxDist, vector< TransformationRecord *>& records);


void clean(vector< TransformationRecord*>*& records);

Remove transformations which are not valid. Should be used after each invocation of work.

  virtual void clean(vector< TransformationRecord*>*& records);


All Members

public:
Edge() ;
class TransformationRecord ;
TransformationRecord(TransT& trans);
void join(TransformationRecord& record);
void cluster(float maxDist, vector< TransT>& trans, vector< TransT>& output);
template< class ParticleT> void preProcess(const vector< ParticleT>& particles);
float dist2(const TransT& trans1, const TransT& trans2);
void useFastClustering(float binSize_);
protected:
int work(float maxDist, vector< TransformationRecord *>& records);
int fastWork(float maxDist, vector< TransformationRecord *>& records);
void clean(vector< TransformationRecord*>*& records);


Ancestors

Class does not inherit from any other class.


Descendants

Class is not inherited by any others.


Generated from source by the Cocoon utilities on Sun Nov 15 13:35:26 2009.
