ClusterList

Class template defines a framework for performing clustering operation. Clustering is performed by adding a list of potentially neighboring data elements and then calling the cluster method.

[ GAMB | Source | Keywords | Summary | Ancestors | All Members | Descendants ]

Quick Index

AUTHORS
CHANGES LOG
GOALS
USAGE

Class Summary

class ClusterList

{
public:
ClusterList();
void addNeighbors(const Data& data1, const Data& data2);
void addNeighbors(const Data& data1, const Data& data2, float dist);
iterator addCluster(const Data& data);
iterator addCluster(const Data& data, const Cluster& cluster);
void cluster(float maxDist);
void fastCluster(float maxDist);
const_iterator getCluster(const Data& data) const;
iterator getCluster(const Data& data);
// Container Methods
const_iterator begin() const;
iterator begin();
const_iterator end() const;
iterator end();
size_type size() const;
protected:
}; // ClusterList

Back to the top of ClusterList


AUTHORS

Meir Fuchs (meirfux@math.tau.ac.il).

Copyright: SAMBA group, Tel-Aviv Univ. Israel, 1997.

Back to the top of ClusterList


CHANGES LOG

Back to the top of ClusterList


GOALS

The ClusterList class template serves as a framework for clustering algorithms. It was designed to allow the user to focus on distance functions and cluster types instead of algorithm implementation and code mechanics. The class is used by first adding a list of potential neighbors and then calling the cluster method. The cluster methods works by sorting the neighbor list by distance and joining potential neighbors into a single cluster if they're clusters are not too far apart.

Back to the top of ClusterList


USAGE

The ClusterList template is defined using a Data class describing the elements being clustered and a Cluster class describing the class of element clusters. Several methods must be defined within the Cluster class if it is to be used as a parameter to the ClusterList class template a. A dist method returning the distance between two Cluster instances.

    float dist(const Cluster&);
b. A join method capable of joining two clusters. The join method receives a single argument of type Cluster and should join the given Cluster object with the parent object (this).

    void join(const Cluster&);
c. The updates() method. Under normal circumstances the updates() method should return the number of times the method join was called on the given class instance. The updates method is used as a signature for the Cluster instance whether it has been updated. If two Cluster instances have not been updated then the distance between them need not be recalculated. The updates function may be used to minimize the calls to the dist method when calculating the distance between two Cluster objects is time consuming. For example if the user has the updates() method return a constant 0, the distances between the Cluster objects will never be calculated and the distance between the Cluster elements will be used. (This is good for nearest neighbor clustering).

    unsigned int updates() const;
d. An explicit constructor that constructs a Cluster object from a Data object. This constructor is needed only if the single argument addCluster method will be used.

    explicit Cluster(const Data&);

For Data elements a function object must be defined. The function object template hash<> is used by the STL hash implementation. Therefore hash<> must be specialized for Data i.e. hash with operator() must be defined. It is recommended that the Data elements be light-weight. For example, if user wishes to cluster Vector3 objects then he should define the Data class parameter to be Vector3* and use pointers.

The following is an example of a class named Transformation used to cluster 3-dimensional rigid transformations using 3 rotational parameter and 3 translational parameters. The transformations are clustered (joined) using a weighted average. The Data elements are defined to be Transformation*.

    class Transformation
    {
    public:
      Transformation(const ReferenceFrame& rt, const float transScore)
        : trans(rt), scr(transScore), joins(0)  {}

      float dist(const Transformation& t)  const  {
        return (trans.rotation().dist(t.trans.rotation()) +
                trans.translation().dist(t.trans.translation())/5);
      }
      void join(const Transformation& t)  {
        float sum = scr + t.scr;
        const Rotation3& rot = trans.rotation();
        const Rotation3& trot = t.trans.rotation();
        trans = ReferenceFrame(Rotation3(rot[0]*scr + trot[0]*t.scr,
                                         rot[1]*scr + trot[1]*t.scr,
                                         rot[2]*scr + trot[2]*t.scr,
                                         rot[3]*scr + trot[3]*t.scr),
                               (trans.translation()*scr + 
                                t.trans.translation()*t.scr) / sum);
        scr = sum;
        ++joins;
      }

      unsigned int updates() const          { return joins; }

    private:
      ReferenceFrame trans;
      float scr;
      int joins;
    };

For the Transformation class a hash function object is defined as follows:

  __STL_TEMPLATE_NULL struct hash<const Transformation*> {
    size_t operator()(const Transformation* const t) const { 
      return ((size_t)t/sizeof(Transformation));
    }
  };

To use the ClusterList class, the user adds a list of possible neighboring elements and calls the cluster or the fastCluster methods. At any point the user may use ClusterList as an STL-like container to inspect what clusters are present.

The clustering algorithms used by the ClusterList keep the neighbor pairs sorted by their distance. When cluster or fastCluster are called, neighbor pairs with shorter distances are accessed first. Each such pair is a associated with a Cluster. If the distance between the two Cluster objects does not exceed a given threshold they are joined. In the fastCluster version of the algorithm neighbors are processed in the order in which they are found. In the cluster version, neighbor pairs may be repositioned in the distance sorted list if the clusters are changed via previous join operations. Both versions are quite efficient although putting an upper bound on the complexity of cluster method seems difficult.

A function that uses Transformation class to cluster 3-D rigid transformations may look something like the following:

    typedef ClusterList<int, Transformation> TransCluster;
    TransCluster clusterList;
    for (int i=0; i<count; ++i)
      clusterList.addCluster(i, transformation[i]);
    for (int i=0; i<count; ++i)
      for (int j=i+1; j<count; ++j)
        if (transformation[i].dist(transformation[j]) < maxDist*2)
          clusterList.addNeighbors(i, j);

    clusterList.cluster(maxDist);

    for (TransCluster::iterator it = clusterList.begin();
         it != clusterList.end();
         ++it) {
      cout << (*it);
This example assumes the existence of an array of Transformation objects. Also a maxDist parameter sets an upper-bound on the inter-cluster distance for clusters that may be joined. It is assumed that operator<< was defined.

Back to the top of ClusterList


ClusterList();

Default and only constructor. Initializes an empty ClusterList object.

  ClusterList();

Back to the top of ClusterList


void addNeighbors(const Data& data1, const Data& data2);

Specify that two Data elements are neighbors. A new cluster is added for each data element that has not been used before. The distance between the two data elements is calculated using the dist() method of the two single element clusters.

  void addNeighbors(const Data& data1, const Data& data2);

Back to the top of ClusterList


void addNeighbors(const Data& data1, const Data& data2, float dist);

Specify that two Data elements are neighbors and the distance between them. A new cluster is added for each data element that has not been used before.

  void addNeighbors(const Data& data1, const Data& data2, float dist);

Back to the top of ClusterList


iterator addCluster(const Data& data);

Adds a single element cluster with Data element data in it. Returns an iterator within the ClusterList container to the new or existing cluster.

  iterator addCluster(const Data& data);

Back to the top of ClusterList


iterator addCluster(const Data& data, const Cluster& cluster);

Adds a single element cluster assuming that it contains a single Data element - data, within it. Returns an iterator within the ClusterList container to the new or existing cluster.

  iterator addCluster(const Data& data, const Cluster& cluster);

Back to the top of ClusterList


void cluster(float maxDist);

The slower but more precise clustering algorithm. This version of the algorithm makes sure that join is called for neighbor pairs only if their corresponding Cluster objects are the closest ones. An exact order of join calls is kept even when clusters are updated.

  void cluster(float maxDist);

Back to the top of ClusterList


void fastCluster(float maxDist);

fastCluster is the faster but less precise version of the clustering algorithm. Neighbor pairs are processed in the order they are found.

  void fastCluster(float maxDist);

Back to the top of ClusterList


const_iterator getCluster(const Data& data) const;

Returns the cluster to which Data element data belongs.

  const_iterator getCluster(const Data& data) const;

Back to the top of ClusterList


iterator getCluster(const Data& data);

Returns the cluster to which Data element data belongs.

  iterator getCluster(const Data& data);

Back to the top of ClusterList


const_iterator begin() const;

Returns iterator to first Cluster in ClusterList. May be called before or after cluster or fastCluster are called.

  const_iterator begin() const;

Back to the top of ClusterList


iterator begin();

Returns iterator to first Cluster in ClusterList. May be called before or after cluster or fastCluster are called.

  iterator begin();

Back to the top of ClusterList


const_iterator end() const;

Returns iterator to end of ClusterList container. May be called before or after cluster or fastCluster are called.

  const_iterator end() const;

Back to the top of ClusterList


iterator end();

Returns iterator to end of ClusterList container. May be called before or after cluster or fastCluster are called.

  iterator end();

Back to the top of ClusterList


size_type size() const;

Returns the number of Cluster instances in the ClusterList container. May be called before or after cluster or fastCluster are called. Note: size is calculated by iterating over all elements in list.

  size_type size() const;

Back to the top of ClusterList


All Members

public:
void addNeighbors(const Data& data1, const Data& data2);
void addNeighbors(const Data& data1, const Data& data2, float dist);
iterator addCluster(const Data& data);
iterator addCluster(const Data& data, const Cluster& cluster);
void cluster(float maxDist);
void fastCluster(float maxDist);
const_iterator getCluster(const Data& data) const;
iterator getCluster(const Data& data);
// Container Methods
const_iterator begin() const;
iterator begin();
const_iterator end() const;
iterator end();
size_type size() const;
protected:

Back to the top of ClusterList


Ancestors

Class does not inherit from any other class.

Back to the top of ClusterList


Descendants

Back to the top of ClusterList


Generated from source by the Cocoon utilities on Sun Nov 15 13:35:25 2009 .

Report problems to jkotula@unimax.com