public class ClusteringByCommittee extends Object implements Clustering
This class offers five parameters for configuring how the clustering occurs:
"edu.ucla.sspace.clustering.ClusteringByCommittee.averageLinkMergeThreshold"
double
threshold below which clusters are not merged: two clusters whose
average-link similarity falls below this value will not be merged
(i.e. they stay two clusters).
"edu.ucla.sspace.clustering.ClusteringByCommittee.maxCommitteeSimilarity"
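double
the maximum similarity between two committees, above which a new committee
will not be included (see Phase II.3).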
"edu.ucla.sspace.clustering.ClusteringByCommittee.residueSimilarityThreshold"
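double
the similarity threshold for residue: an element whose similarity to all
existing committees is less than this value is marked as "residue" and
recursively clustered (see Phase II.5).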
"edu.ucla.sspace.clustering.ClusteringByCommittee.useHardClustering"
true
"edu.ucla.sspace.clustering.ClusteringByCommittee.softClusteringThreshold"
double
the threshold used during soft clustering: a point will not be labeled
with committees that are more similar than this value. If hard clustering
is enabled, the value of this property has no effect. See Phase III of the
CBC algorithm for more details.
This class is thread-safe.
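The five properties listed above are plain string keys, so they can be collected in a standard `java.util.Properties` object. The following is a minimal sketch using only the JDK: the key names are copied from this page, while the class name `CbcPropertiesSketch` and every value shown are illustrative placeholders, not the library's defaults.

```java
import java.util.Properties;
import java.util.TreeSet;

public class CbcPropertiesSketch {
    // Common prefix of the property keys documented on this page.
    static final String PREFIX =
        "edu.ucla.sspace.clustering.ClusteringByCommittee.";

    /**
     * Builds a Properties object holding all five CBC parameters.
     * The values are placeholders chosen for illustration only.
     */
    public static Properties buildProperties() {
        Properties props = new Properties();
        props.setProperty(PREFIX + "averageLinkMergeThreshold", "0.25");
        props.setProperty(PREFIX + "maxCommitteeSimilarity", "0.35");
        props.setProperty(PREFIX + "residueSimilarityThreshold", "0.25");
        props.setProperty(PREFIX + "useHardClustering", "false");
        props.setProperty(PREFIX + "softClusteringThreshold", "0.5");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProperties();
        // Print the keys in sorted order so the output is stable.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(key + " = " + props.getProperty(key));
        }
    }
}
```

An object built this way is what would be passed as the `props` argument of `cluster(Matrix, Properties)`; in code that depends on the library, the `*_PROPERTY` constants of this class can be used instead of the literal key strings.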
Modifier and Type | Field and Description |
---|---|
static String | AVERGAGE_LINK_MERGE_THRESHOLD_PROPERTY: The property that specifies when, during Phase II.1, to stop the agglomerative clustering of the nearest neighbors. |
static String | COMMITTEE_SIMILARITY_THRESHOLD_PROPERTY: The property that specifies, for Phase II.3, the maximum similarity between two committees above which a new committee will not be included. |
static String | DEFAULT_AVERGAGE_LINK_MERGE_THRESHOLD: The default value of the "edu.ucla.sspace.clustering.ClusteringByCommittee.averageLinkMergeThreshold" property. |
static String | DEFAULT_COMMITTEE_SIMILARITY_THRESHOLD: The default value of the "edu.ucla.sspace.clustering.ClusteringByCommittee.maxCommitteeSimilarity" property. |
static String | DEFAULT_RESIDUE_SIMILARITY_THRESHOLD: The default value of the "edu.ucla.sspace.clustering.ClusteringByCommittee.residueSimilarityThreshold" property. |
static String | DEFAULT_SOFT_CLUSTERING_SIMILARITY_THRESHOLD: The default value of the "edu.ucla.sspace.clustering.ClusteringByCommittee.softClusteringThreshold" property. |
static String | HARD_CLUSTERING_PROPERTY: Specifies whether CBC should use a hard (single class) or soft (multi-class) cluster labeling. |
static String | RESIDUE_SIMILARITY_THRESHOLD_PROPERTY: The property for specifying the similarity threshold in Phase II.5: if an element has a similarity less than this threshold to all existing committees, the element is marked as "residue" and recursively clustered. |
static String | SOFT_CLUSTERING_SIMILARITY_THRESHOLD_PROPERTY: The property for specifying a double threshold used during soft clustering, where a point will not be labeled with committees that are more similar than this value. |
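The committee-similarity and residue thresholds described in the table can be illustrated with a small stand-alone sketch. This is not the library's implementation: it assumes cosine similarity, represents each committee by a single vector, and processes candidates in the order given. The class `CommitteeFilterSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommitteeFilterSketch {
    /** Cosine similarity; an assumed similarity measure for this sketch. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Keeps a candidate only if no accepted committee is more similar than maxSim. */
    public static List<double[]> filterCommittees(List<double[]> candidates,
                                                  double maxSim) {
        List<double[]> accepted = new ArrayList<>();
        for (double[] cand : candidates) {
            boolean tooSimilar = false;
            for (double[] c : accepted) {
                if (cosine(cand, c) > maxSim) {
                    tooSimilar = true;
                    break;
                }
            }
            if (!tooSimilar) {
                accepted.add(cand);
            }
        }
        return accepted;
    }

    /** Returns indices of elements whose similarity to every committee is below the threshold. */
    public static List<Integer> findResidue(List<double[]> elements,
                                            List<double[]> committees,
                                            double residueThresh) {
        List<Integer> residue = new ArrayList<>();
        for (int i = 0; i < elements.size(); i++) {
            boolean covered = false;
            for (double[] c : committees) {
                if (cosine(elements.get(i), c) >= residueThresh) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                residue.add(i);
            }
        }
        return residue;
    }

    public static void main(String[] args) {
        List<double[]> candidates = Arrays.asList(
            new double[] {1, 0}, new double[] {0.9, 0.1}, new double[] {0, 1});
        List<double[]> committees = filterCommittees(candidates, 0.5);
        System.out.println("accepted committees: " + committees.size());

        List<double[]> elements = Arrays.asList(
            new double[] {1, 0.1}, new double[] {-1, -1});
        System.out.println("residue indices: "
            + findResidue(elements, committees, 0.3));
    }
}
```

In this toy data the second candidate is nearly parallel to the first and is therefore dropped, and the second element points away from both remaining committees and is reported as residue.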
Constructor and Description |
---|
ClusteringByCommittee(): Creates a new ClusteringByCommittee instance. |
Modifier and Type | Method and Description |
---|---|
static List<edu.ucla.sspace.clustering.ClusteringByCommittee.CandidateCommittee> | buildCommitteesForRow(Collection<Integer> rows, SparseMatrix sm, double avgLinkMergeThresh): Builds a set of candidate committees from the clusters formed by the average-link clustering of the provided rows. |
Assignments | cluster(Matrix m, int numClusters, Properties props): Ignores the provided number of clusters and clusters the rows of the provided matrix using the CBC algorithm. |
Assignments | cluster(Matrix m, Properties props): Clusters the rows of m according to the CBC algorithm, using props to specify the configurable parameters of the algorithm. |
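The average-link merge threshold taken by `buildCommitteesForRow` can be pictured with the following stand-alone sketch. It assumes cosine similarity and non-empty clusters given as arrays of vectors. The class `AverageLinkSketch` and its methods are hypothetical and only illustrate the rule that clusters whose average-link similarity falls below the threshold are not merged.

```java
public class AverageLinkSketch {
    /** Cosine similarity; an assumed similarity measure for this sketch. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Mean pairwise similarity between the members of two non-empty clusters. */
    public static double averageLink(double[][] a, double[][] b) {
        double sum = 0;
        for (double[] x : a) {
            for (double[] y : b) {
                sum += cosine(x, y);
            }
        }
        return sum / (a.length * b.length);
    }

    /** Two clusters are merged only if their average-link similarity reaches the threshold. */
    public static boolean shouldMerge(double[][] a, double[][] b,
                                      double threshold) {
        return averageLink(a, b) >= threshold;
    }

    public static void main(String[] args) {
        double[][] left = { {1, 0}, {1, 1} };
        double[][] near = { {1, 0.5} };
        double[][] far = { {0, 1} };
        double[][] axis = { {1, 0} };
        System.out.println("merge left/near: " + shouldMerge(left, near, 0.5));
        System.out.println("merge axis/far:  " + shouldMerge(axis, far, 0.5));
    }
}
```

Agglomerative clustering repeats this test on the most similar pair of clusters and stops once no pair reaches the threshold, which is the stopping behaviour the `avgLinkMergeThresh` parameter controls.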
public static final String AVERGAGE_LINK_MERGE_THRESHOLD_PROPERTY
A double threshold: clusters whose average-link similarity falls below this
value will not be merged (i.e. they stay two clusters).

public static final String DEFAULT_AVERGAGE_LINK_MERGE_THRESHOLD

public static final String COMMITTEE_SIMILARITY_THRESHOLD_PROPERTY
The property value is a double.

public static final String DEFAULT_COMMITTEE_SIMILARITY_THRESHOLD

public static final String RESIDUE_SIMILARITY_THRESHOLD_PROPERTY
The property value is a double.

public static final String DEFAULT_RESIDUE_SIMILARITY_THRESHOLD

public static final String SOFT_CLUSTERING_SIMILARITY_THRESHOLD_PROPERTY
A double threshold used during soft clustering: a point will not be labeled
with committees that are more similar than this value. See Phase III of the
CBC algorithm for more details.

public static final String DEFAULT_SOFT_CLUSTERING_SIMILARITY_THRESHOLD

public static final String HARD_CLUSTERING_PROPERTY
public ClusteringByCommittee()
Creates a new ClusteringByCommittee instance.

public Assignments cluster(Matrix m, int numClusters, Properties props)
Ignores the provided number of clusters and clusters the rows of the
provided matrix using the CBC algorithm, as cluster(Matrix,Properties) does
without specifying the number of clusters.
Specified by: cluster in interface Clustering
Parameters:
m - the Matrix whose row data points are to be clustered
numClusters - the number of clusters to generate
props - the properties to use for any parameters each clustering algorithm may need
Returns: Assignment instances that indicate zero or more clusters to which each row belongs.
Throws: IllegalArgumentException - if m is not an instance of SparseMatrix.

public Assignments cluster(Matrix m, Properties props)
Clusters the rows of m according to the CBC algorithm, using props to
specify the configurable parameters of the algorithm.
Specified by: cluster in interface Clustering
Parameters:
m - the Matrix whose row data points are to be clustered
props - the properties to use for any parameters each clustering algorithm may need
Returns: Assignment instances that indicate zero or more clusters to which each row belongs.
Throws: IllegalArgumentException - if m is not an instance of SparseMatrix.

public static List<edu.ucla.sspace.clustering.ClusteringByCommittee.CandidateCommittee> buildCommitteesForRow(Collection<Integer> rows, SparseMatrix sm, double avgLinkMergeThresh)
Builds a set of candidate committees from the clusters formed by the
average-link clustering of the provided rows.
Parameters:
avgLinkMergeThresh - the parameter used by HAC to determine when to stop merging clusters on the basis of their dissimilarity

Copyright © 2012. All Rights Reserved.