public class HierarchicalAgglomerativeClustering extends Object implements Clustering
Clustering interface,
 which allows this method to be used in place of other clustering algorithms.
  In addition to clustering, this implementation also exposes the ability
 to view the iterative bottom-up merge through the buildDendogram methods.
 These methods return a series of Merge operations that can be used to
 construct a dendrogram
 and see the partial clustering at any point during the agglomerative merging
 process.  For example, to view the clustering solution after four steps, the
 following code might be used:
   Matrix matrix; 
   List<Merge> merges = buildDendogram(matrix, ...);
   List<Merge> fourMergeSteps = merges.subList(0, 4);
   MultiMap<Integer,Integer> clusterToRows =  new HashMultiMap<Integer,Integer>();
   for (int i = 0; i < matrix.rows(); ++i)
       clusterToElements.put(i, i);
   for (Merge m : fourMergeSteps) {
       clusterToElements.putMulti(m.remainingCluster(), 
           clusterToElements.remove(m.mergedCluster()));
   }
 
 The resulting MultiMap clusterToRows
 contains the mapping from each cluster to the rows that are a part of it.
 Implementation Note: The current version runs in O(n3) worst case time for the number of rows in the matrix. While O(n2 * log(n)) methods exist, these require storing similarity comparisons in a priority queue, which has a substantially higher memory overhead. Therefore, this implementation has opted for a more expensive running time in order to be able to process larger matrices.
 When using the Clustering.cluster(Matrix,Properties) interface,
 this class supports the following properties for controlling the clustering.
 
"edu.ucla.sspace.clustering.HierarchicalAgglomerativeClustering.clusterThreshold"
       "edu.ucla.sspace.clustering.HierarchicalAgglomerativeClustering.clusterLinkage"
       HierarchicalAgglomerativeClustering.ClusterLinkage to use when computing cluster similarity. 
 "edu.ucla.sspace.clustering.HierarchicalAgglomerativeClustering.simFunc"
       COSINE
 Similarity.SimType to use when computing the similarity of two data points. 
 "edu.ucla.sspace.clustering.HierarchicalAgglomerativeClustering.numClusters"
       | Modifier and Type | Class and Description | 
|---|---|
| static class  | HierarchicalAgglomerativeClustering.ClusterLinkageThe method to use when comparing the similarity of two clusters. | 
| Modifier and Type | Field and Description | 
|---|---|
| static String | CLUSTER_LINKAGE_PROPERTYThe property for specifying the cluster linkage to use. | 
| static String | DEFAULT_CLUSTER_LINKAGE_PROPERTYThe default linkage method to use. | 
| static String | MIN_CLUSTER_SIMILARITY_PROPERTYThe property for specifying the cluster similarity threshold. | 
| static String | NUM_CLUSTERS_PROPERTYThe property for specifying the similarity function to use. | 
| static String | PROPERTY_PREFIXA prefix for specifying properties. | 
| static String | SIMILARITY_FUNCTION_PROPERTYThe property for specifying the similarity function to use. | 
| Constructor and Description | 
|---|
| HierarchicalAgglomerativeClustering() | 
| Modifier and Type | Method and Description | 
|---|---|
| List<Merge> | buildDendogram(Matrix m,
              HierarchicalAgglomerativeClustering.ClusterLinkage linkage,
              Similarity.SimType similarityFunction)Builds a dendrogram of the rows of similarity matrix by iteratelyve
 linking each row according to the linkage policy in a bottom up manner. | 
| List<Merge> | buildDendrogram(Matrix similarityMatrix,
               HierarchicalAgglomerativeClustering.ClusterLinkage linkage)Builds a dendrogram of the rows of similarity matrix by iteratively
 linking each row according to the linkage policy in a bottom up manner. | 
| Assignments | cluster(Matrix m,
       int numClusters,
       Properties props)Clusters the set of rows in the given  Matrixinto the specified
 number of clusters. | 
| Assignments | cluster(Matrix matrix,
       Properties props)Clusters the set of rows in the given  Matrixwithout a specified
 number of clusters (optional operation). | 
| static int[] | clusterRows(Matrix m,
           double clusterSimilarityThreshold,
           HierarchicalAgglomerativeClustering.ClusterLinkage linkage,
           Similarity.SimType similarityFunction)Clusters all rows in the matrix using the specified cluster similarity
 measure for comparison and threshold for when to stop clustering. | 
| static int[] | partitionRows(Matrix m,
             int numClusters,
             HierarchicalAgglomerativeClustering.ClusterLinkage linkage,
             Similarity.SimType similarityFunction)Clusters all rows in the matrix using the specified cluster similarity
 measure for comparison and stopping when the number of clusters is equal
 to the specified number. | 
public static final String PROPERTY_PREFIX
public static final String MIN_CLUSTER_SIMILARITY_PROPERTY
public static final String CLUSTER_LINKAGE_PROPERTY
public static final String SIMILARITY_FUNCTION_PROPERTY
public static final String NUM_CLUSTERS_PROPERTY
public static final String DEFAULT_CLUSTER_LINKAGE_PROPERTY
public Assignments cluster(Matrix matrix, Properties props)
Matrix without a specified
 number of clusters (optional operation).  The set of cluster assignments
 are returned for each row in the matrix.cluster in interface Clusteringmatrix - the Matrix whose row data points are to be
        clusteredprops - the properties to use for any parameters each clustering
        algorithm may needAssignment instances that indicate zero or
         more clusters to which each row belongs.public Assignments cluster(Matrix m, int numClusters, Properties props)
Matrix into the specified
 number of clusters.  The set of cluster assignments are returned for each
 row in the matrix. The value of the numClusters parameter will
 override the "edu.ucla.sspace.clustering.HierarchicalAgglomerativeClustering.numClusters" if it was specified.cluster in interface Clusteringm - the Matrix whose row data points are to be
        clusterednumClusters - the number of clusters to generateprops - the properties to use for any parameters each clustering
        algorithm may needAssignment instances that indicate zero or
         more clusters to which each row belongs.public static int[] partitionRows(Matrix m, int numClusters, HierarchicalAgglomerativeClustering.ClusterLinkage linkage, Similarity.SimType similarityFunction)
m - a matrix whose rows are to be clusterednumClusters - the number of clusters into which the matrix should
        dividedlinkage - the method to use for computing the similarity of two
        clusterspublic static int[] clusterRows(Matrix m, double clusterSimilarityThreshold, HierarchicalAgglomerativeClustering.ClusterLinkage linkage, Similarity.SimType similarityFunction)
m - a matrix whose rows are to be clusteredclusterSimilarityThreshold - the threshold to use when deciding
        whether two clusters should be merged.  If the similarity of the
        clusters is below this threshold, they will not be merged and the
        clustering process will be stopped.linkage - the method to use for computing the similarity of two
        clusterspublic List<Merge> buildDendogram(Matrix m, HierarchicalAgglomerativeClustering.ClusterLinkage linkage, Similarity.SimType similarityFunction)
m can be determined.  For example, to
 find the partitioning after 4 merge operations, one might do the
 following:
Matrix matrix; ListThe resultingmerges = buildDendogram(matrix, ...); List fourMergeSteps = merges.subList(0, 4); MultiMap clusterToRows = new HashMultiMap (); for (int i = 0; i < matrix.rows(); ++i) clusterToElements.put(i, i); for (Merge m : fourMergeSteps) { clusterToElements.putMulti(m.remainingCluster(), clusterToElements.remove(m.mergedCluster())); } 
MultiMap clusterToRows
 contains the mapping from each cluster to the rows that are a part of it.m - a matrix whose rows are to be compared and agglomeratively
        merged into clusterslinkage - how two clusters should be compared for similarity when
        deciding which clusters to merge togethersimilarityFunction - how to compare two rows of a matrix for
        similaritypublic List<Merge> buildDendrogram(Matrix similarityMatrix, HierarchicalAgglomerativeClustering.ClusterLinkage linkage)
similarityMatrix - a square matrix whose (i, j) values denote the
        similarity of row i to row j.IllegalArgumentException - if similarityMatrix is not a
         square matrixCopyright © 2012. All Rights Reserved.