public class LinkClustering extends Object implements Serializable
cluster
method with a fixed number of
elements will still cluster the rows, but will ignore the requester number of
clusters.
Note that this class is not thread-safe. Each call to clustering
will cache local information about the clustering result to facilitate the
#getSolution(int)
and #getSolutionDensity(int)
functions.
This class provides one configurable property:
true
true
, this property specifies the
edge similarity matrix used by HierarchicalAgglomerativeClustering
should be computed once and then
kept in memory, which is the default behavior. If false
, this
causes the similarity of two edges to be recomputed on-the-fly whenever
it is requester. By computing these values on-the-fly, the performance
will be slowed down, depending on the complexity of the edge similarity
function. However, this on-the-fly setting allows for clustering large
graphs whose edge similarity matrix would not regularly fit into memory.
It is advised that users not tune this parameter unless it is known that
the similarity matrix will not fit in memory.
Modifier and Type | Field and Description |
---|---|
static String |
MINIMUM_EDGE_SIMILARITY_PROPERTY |
Constructor and Description |
---|
LinkClustering()
Instantiates a new
LinkClustering instance. |
Modifier and Type | Method and Description |
---|---|
<E extends Edge> |
cluster(Graph<E> graph,
int numClusters,
Properties props)
Computes the similarity of the graph's edges and merges them until the
specified number of clusters has been reached.
|
<E extends Edge> |
cluster(Graph<E> graph,
Properties props)
Computes the similarity of the graph's edges and merges them to select
the final partitioning that maximizes the overall cluster density.
|
protected double |
computeDensity(int numNodes,
int numEdges)
Computes the density of the provided partition of edges
|
protected <E extends Edge> |
getConnectionSimilarity(Graph<E> graph,
int keystone,
int impost1,
int impost2)
Computes the similarity of the two edges as the Jaccard index of the
neighbors of two impost nodes.
|
public static final String MINIMUM_EDGE_SIMILARITY_PROPERTY
public LinkClustering()
LinkClustering
instance.public <E extends Edge> MultiMap<Integer,Integer> cluster(Graph<E> graph, int numClusters, Properties props)
numClusters
- the number of clusters to returnpublic <E extends Edge> MultiMap<Integer,Integer> cluster(Graph<E> graph, Properties props)
protected double computeDensity(int numNodes, int numEdges)
protected <E extends Edge> double getConnectionSimilarity(Graph<E> graph, int keystone, int impost1, int impost2)
Implementation Note: Subclasses that wish to override this behavior should be aware that this method is likely to be called by multiple threads and therefor should make provisions to be thread safe. In addition, this method may be called more than once per edge pair if the similarity matrix is being computed on-the-fly.
sm
- a matrix containing the connections between edges. A non-zero
value in location (i,j) indicates a node i is connected to
node j by an edge.e1
- an edge to be compared with e2
e2
- an edge to be compared with e1
Copyright © 2012. All Rights Reserved.