public class LinkClustering extends Object implements Clustering, Serializable
Calling the cluster method with a fixed number of clusters still clusters
the rows, but ignores the requested number of clusters.
Note that this class is not thread-safe. Each call to cluster caches local
information about the clustering result to support the getSolution(int) and
getSolutionDensity(int) methods.
This class provides one configurable property:
Property: "edu.ucla.sspace.clustering.LinkClustering.keepSimilarityMatrixInMemory"
Default: true
If true, this property specifies that the edge similarity matrix used by
HierarchicalAgglomerativeClustering should be computed once and then kept in
memory, which is the default behavior. If false, the similarity of two edges
is recomputed on-the-fly whenever it is requested. Computing these values
on-the-fly slows clustering by an amount that depends on the complexity of
the edge similarity function. However, the on-the-fly setting allows
clustering large graphs whose edge similarity matrix would not otherwise fit
into memory. Users are advised not to change this parameter unless the
similarity matrix is known not to fit in memory.
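As a minimal sketch of how the property above might be set, the following builds a java.util.Properties object that requests on-the-fly computation. The property key is copied from this page. The class name LinkClusteringConfig and both helper methods are hypothetical, and the way the value is parsed (Boolean.parseBoolean with a default of true) is an assumption, not a description of the library's internals.

```java
import java.util.Properties;

final class LinkClusteringConfig {
    // Property key as documented on this page.
    static final String KEEP_SIMILARITY_MATRIX_IN_MEMORY_PROPERTY =
        "edu.ucla.sspace.clustering.LinkClustering.keepSimilarityMatrixInMemory";

    /** Builds properties that request on-the-fly edge similarity computation. */
    static Properties onTheFly() {
        Properties props = new Properties();
        props.setProperty(KEEP_SIMILARITY_MATRIX_IN_MEMORY_PROPERTY, "false");
        return props;
    }

    /**
     * Reads the setting, falling back to the documented default of true.
     * Assumption: the value is interpreted as a boolean string.
     */
    static boolean keepInMemory(Properties props) {
        return Boolean.parseBoolean(
            props.getProperty(KEEP_SIMILARITY_MATRIX_IN_MEMORY_PROPERTY, "true"));
    }
}
```

The resulting Properties object would then be passed as the props argument of cluster(Matrix, Properties).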
Modifier and Type | Class and Description |
---|---|
protected static class | LinkClustering.Edge: A utility data structure for representing a directed edge between two ordinally labeled nodes. |
Modifier and Type | Field and Description |
---|---|
static String | KEEP_SIMILARITY_MATRIX_IN_MEMORY_PROPERTY: The property to specify if the edge similarity matrix should be kept in memory during clustering, or if its values should be computed on the fly. |
static String | PROPERTY_PREFIX: A prefix for specifying properties. |
Constructor and Description |
---|
LinkClustering(): Instantiates a new LinkClustering instance. |
Modifier and Type | Method and Description |
---|---|
Assignments | cluster(Matrix matrix, int numClusters, Properties props): Ignores the specified number of clusters and returns the clustering solution according to the partition density. |
Assignments | cluster(Matrix matrix, Properties props): Clusters the set of rows in the given Matrix without a specified number of clusters (optional operation). |
protected double | getEdgeSimilarity(SparseMatrix sm, LinkClustering.Edge e1, LinkClustering.Edge e2): Computes the similarity of the two edges as the Jaccard index of the neighbors of two impost nodes. |
Assignments | getSolution(int solutionNum): Returns the clustering solution after the specified number of merge steps. |
double | getSolutionDensity(int solutionNum): Returns the partition density of the clustering solution. |
int | numberOfSolutions(): Returns the number of clustering solutions found by this instance for the prior clustering run. |
public static final String PROPERTY_PREFIX
public static final String KEEP_SIMILARITY_MATRIX_IN_MEMORY_PROPERTY
public LinkClustering()
Instantiates a new LinkClustering instance.
public Assignments cluster(Matrix matrix, int numClusters, Properties props)
Ignores the specified number of clusters and returns the clustering solution according to the partition density.
Specified by: cluster in interface Clustering
Parameters:
numClusters - this parameter is ignored.
matrix - the Matrix whose row data points are to be clustered
props - the properties to use for any parameters each clustering algorithm may need
Returns: Assignment instances that indicate zero or more clusters to which each row belongs.
Throws:
IllegalArgumentException - if matrix is not square, or is not an instance of SparseMatrix
public Assignments cluster(Matrix matrix, Properties props)
Clusters the set of rows in the given Matrix without a specified number of clusters (optional operation). The set of cluster assignments is returned for each row in the matrix.
Specified by: cluster in interface Clustering
Parameters:
matrix - the Matrix whose row data points are to be clustered
props - the properties to use for any parameters each clustering algorithm may need
Returns: Assignment instances that indicate zero or more clusters to which each row belongs.
Throws:
IllegalArgumentException - if matrix is not square, or is not an instance of SparseMatrix
protected double getEdgeSimilarity(SparseMatrix sm, LinkClustering.Edge e1, LinkClustering.Edge e2)
Computes the similarity of the two edges as the Jaccard index of the neighbors of two impost nodes.
Implementation Note: Subclasses that override this behavior should be aware that this method is likely to be called by multiple threads and should therefore be thread safe. In addition, this method may be called more than once per edge pair if the similarity matrix is being computed on-the-fly.
Parameters:
sm - a matrix containing the connections between nodes. A non-zero value in location (i,j) indicates that node i is connected to node j by an edge.
e1 - an edge to be compared with e2
e2 - an edge to be compared with e1
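The edge similarity described for getEdgeSimilarity can be illustrated with a small standalone sketch. It does not use the library's SparseMatrix or LinkClustering.Edge types: the graph is a hypothetical adjacency map, edges are given as pairs of node indices, and the impost nodes are taken to be the two endpoints that the edges do not share. Whether a node counts as its own neighbor is left to the caller's map, and returning 0 for edges with no shared node is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class EdgeJaccardSketch {
    /** Jaccard index |A intersect B| / |A union B|; 0 when both sets are empty. */
    static double jaccard(Set<Integer> a, Set<Integer> b) {
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    /**
     * Similarity of edges (from1, to1) and (from2, to2): find the node they
     * share, then compare the neighbor sets of the two remaining (impost)
     * nodes. Edges that share no node get similarity 0 (an assumption).
     */
    static double edgeSimilarity(Map<Integer, Set<Integer>> neighbors,
                                 int from1, int to1, int from2, int to2) {
        int impost1;
        int impost2;
        if (from1 == from2) {
            impost1 = to1;
            impost2 = to2;
        } else if (from1 == to2) {
            impost1 = to1;
            impost2 = from2;
        } else if (to1 == from2) {
            impost1 = from1;
            impost2 = to2;
        } else if (to1 == to2) {
            impost1 = from1;
            impost2 = from2;
        } else {
            return 0.0;
        }
        Set<Integer> n1 = neighbors.getOrDefault(impost1, Collections.<Integer>emptySet());
        Set<Integer> n2 = neighbors.getOrDefault(impost2, Collections.<Integer>emptySet());
        return jaccard(n1, n2);
    }
}
```

Both methods only read their arguments and keep no shared state, which is one simple way to meet the thread-safety expectation in the implementation note.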
public double getSolutionDensity(int solutionNum)
Returns the partition density of the clustering solution.
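For readers unfamiliar with partition density, the following sketch computes it from per-cluster edge and node counts using the commonly cited definition for link communities: D = (2/M) * sum over clusters of m * (m - (n - 1)) / ((n - 2) * (n - 1)), where m and n are the number of edges and nodes in a cluster and M is the total number of edges. This is an assumption about what getSolutionDensity reports; the library's implementation may differ in detail. The class and method names are hypothetical.

```java
final class PartitionDensitySketch {
    /**
     * Computes partition density from parallel arrays: edgeCounts[c] is the
     * number of edges in cluster c and nodeCounts[c] is the number of distinct
     * nodes those edges touch. Clusters with two or fewer nodes contribute 0.
     */
    static double partitionDensity(int[] edgeCounts, int[] nodeCounts) {
        if (edgeCounts.length != nodeCounts.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        long totalEdges = 0;
        double sum = 0.0;
        for (int c = 0; c < edgeCounts.length; c++) {
            double m = edgeCounts[c];
            double n = nodeCounts[c];
            totalEdges += edgeCounts[c];
            if (n > 2) {
                // m - (n - 1) is the number of edges beyond a spanning tree.
                sum += m * (m - (n - 1)) / ((n - 2) * (n - 1));
            }
        }
        return totalEdges == 0 ? 0.0 : 2.0 * sum / totalEdges;
    }
}
```

Under this definition a cluster whose edges form a clique has density 1 and a cluster whose edges form a tree has density 0.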
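public Assignments getSolution(int solutionNum)
Returns the clustering solution after the specified number of merge steps.
Parameters:
solutionNum - the number of merge steps to take prior to returning the clustering solution.
Throws:
IllegalArgumentException - if solutionNum is less than 0 or is greater than or equal to numberOfSolutions().
IllegalStateException - if this instance has not yet finished a clustering solution.
public int numberOfSolutions()
Returns the number of clustering solutions found by this instance for the prior clustering run.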
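The three accessors above can be combined to scan the cached solutions after a clustering run. The sketch below shows one way to find the index with the highest density. To stay self-contained it takes the solution count and a density lookup as arguments rather than a LinkClustering instance; the class and method names are hypothetical. It assumes valid indices run from 0 to numberOfSolutions() - 1, which matches the documented IllegalArgumentException condition.

```java
import java.util.function.IntToDoubleFunction;

final class BestSolutionSketch {
    /**
     * Returns the index of the solution with the highest density, preferring
     * the earliest index on ties. Each density is looked up exactly once.
     */
    static int densestSolution(int numberOfSolutions, IntToDoubleFunction density) {
        if (numberOfSolutions <= 0) {
            throw new IllegalStateException("no clustering solutions available");
        }
        int best = 0;
        double bestDensity = density.applyAsDouble(0);
        for (int i = 1; i < numberOfSolutions; i++) {
            double d = density.applyAsDouble(i);
            if (d > bestDensity) {
                bestDensity = d;
                best = i;
            }
        }
        return best;
    }
}
```

With a LinkClustering instance lc that has completed a call to cluster, the call would presumably look like densestSolution(lc.numberOfSolutions(), lc::getSolutionDensity), and the returned index could then be passed to lc.getSolution(int).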
Copyright © 2012. All Rights Reserved.