public class GapStatistic extends Object implements Clustering
A Clustering implementation that iteratively computes the k-means
clustering of a data set and compares it to a random sample of
reference data points. It recomputes k-means with increasing values of
k until the difference between the original data set and the reference data
sets begins to decline: clustering stops at the first k value where this
difference is less than the previous difference. This clustering method is
an implementation of the method specified in the following paper:
Modifier and Type | Field and Description |
---|---|
static String | METHOD_PROPERTY |
static String | NUM_CLUSTERS_START - The number of clusters to start clustering at. |
static String | NUM_REFERENCE_DATA_SETS - The number of reference data sets to use. |
static String | PROPERTY_PREFIX - A property prefix used for properties. |
Constructor and Description |
---|
GapStatistic() |
Modifier and Type | Method and Description |
---|---|
Assignments | cluster(Matrix m, int maxClusters, Properties props) - Clusters the set of rows in the given Matrix into the specified number of clusters. |
Assignments | cluster(Matrix matrix, Properties props) - Clusters the set of rows in the given Matrix without a specified number of clusters (optional operation). |
protected void | verbose(String msg) |
protected void | verbose(String format, Object... args) |
public static final String PROPERTY_PREFIX
public static final String NUM_CLUSTERS_START
public static final String NUM_REFERENCE_DATA_SETS
public static final String METHOD_PROPERTY
public Assignments cluster(Matrix matrix, Properties props)
Clusters the set of rows in the given Matrix without a specified
number of clusters (optional operation). The set of cluster assignments
is returned for each row in the matrix.
Specified by:
cluster in interface Clustering
Parameters:
matrix - the Matrix whose row data points are to be clustered
props - the properties to use for any parameters each clustering algorithm may need
Returns:
Assignment instances that indicate zero or more clusters to which each row belongs.
public Assignments cluster(Matrix m, int maxClusters, Properties props)
Clusters the set of rows in the given Matrix into the specified
number of clusters. The set of cluster assignments is returned for each
row in the matrix.
Iteratively computes the k-means clustering of the dataset m
using the Gap Statistic.
Specified by:
cluster in interface Clustering
Parameters:
m - the Matrix whose row data points are to be clustered
maxClusters - the number of clusters to generate
props - the properties to use for any parameters each clustering algorithm may need
Returns:
Assignment instances that indicate zero or more clusters to which each row belongs.
protected void verbose(String msg)
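The stopping rule from the class overview (stop at the first k whose gap is smaller than the previous one) can be illustrated with a small standalone sketch. It does not use or reproduce this class's internals. The class name GapStoppingSketch and the methods gap and chooseK are hypothetical. The gap formula (mean log dispersion of the reference sets minus the log dispersion of the data) follows the usual definition of the gap statistic and is an assumption here, since this page does not spell it out. Returning the k just before the decline is likewise an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the documented API.
class GapStoppingSketch {

    // Assumed gap definition: mean of log(reference dispersion)
    // minus log(dispersion of the original data) for one value of k.
    static double gap(double dataDispersion, double[] referenceDispersions) {
        double meanLogRef = Arrays.stream(referenceDispersions)
                .map(Math::log)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("no reference sets"));
        return meanLogRef - Math.log(dataDispersion);
    }

    // gaps[i] is the gap for k = startK + i. Scans for the first k whose gap
    // is smaller than the previous gap and returns the k just before it
    // (assumption). If the gap never declines, returns the last k examined.
    static int chooseK(double[] gaps, int startK) {
        if (gaps.length == 0) {
            throw new IllegalArgumentException("at least one gap value is required");
        }
        for (int i = 1; i < gaps.length; i++) {
            if (gaps[i] < gaps[i - 1]) {
                return startK + i - 1;
            }
        }
        return startK + gaps.length - 1;
    }

    public static void main(String[] args) {
        // Made-up gap values for k = 2, 3, 4, 5, 6.
        double[] gaps = {0.10, 0.35, 0.50, 0.42, 0.60};
        System.out.println("chosen k = " + chooseK(gaps, 2));
    }
}
```

Because the scan stops at the first decline, a later and larger gap (k = 6 in the made-up values) is never considered. This matches the behavior described in the overview and keeps the number of k-means runs small.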
Copyright © 2012. All Rights Reserved.