public class FastStreamingKMeans extends Object
Modifier and Type | Field | Description |
---|---|---|
static String | BETA_PROPERTY | The property for specifying the beta value, which determines increases to the facility cost for creating a new facility; see page 6 of the paper for details. |
static double | DEFAULT_BETA | The default value of beta that appeared to work best according to Michael Shindler. |
static SimilarityFunction | DEFAULT_SIMILARITY_FUNCTION | The default similarity function is the inverse of the square of the Euclidean distance, which preserves all the properties specified in the Shindler et al. (2011) paper. |
static String | KAPPA_PROPERTY | The property for specifying kappa, the maximum number of facilities. |
static String | SIMILARITY_FUNCTION_PROPERTY | The property for specifying the similarity function with which to compare data points. |
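
The property fields above are String keys meant to be looked up in a java.util.Properties object. The following is a minimal sketch of that lookup pattern with fallback defaults. The class name, the key strings, and the fallback values are hypothetical stand-ins: this page does not give the actual values of BETA_PROPERTY, KAPPA_PROPERTY, or DEFAULT_BETA, so the real constants should be used in their place.

```java
import java.util.Properties;

// Hypothetical helper, not part of the documented class.
final class StreamingKMeansConfig {
    // Placeholder key names. The real key strings are the values of
    // BETA_PROPERTY and KAPPA_PROPERTY, which this page does not list.
    static final String BETA_KEY = "example.fastStreamingKMeans.beta";
    static final String KAPPA_KEY = "example.fastStreamingKMeans.kappa";

    /** Returns the configured beta, or the fallback when the key is unset. */
    static double beta(Properties props, double fallback) {
        String value = props.getProperty(BETA_KEY);
        return value == null ? fallback : Double.parseDouble(value);
    }

    /** Returns the configured kappa, or the fallback when the key is unset. */
    static int kappa(Properties props, int fallback) {
        String value = props.getProperty(KAPPA_KEY);
        return value == null ? fallback : Integer.parseInt(value);
    }
}
```

A caller would set only the keys it wants to override and leave the rest unset, so that the defaults apply.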
Constructor and Description |
---|
FastStreamingKMeans() |
Modifier and Type | Method | Description |
---|---|---|
Assignments | cluster(Matrix matrix, int numClusters, int kappa, double beta, SimilarityFunction simFunc) | Clusters the rows of the provided matrix into the specified number of clusters in a single pass, using the parameters to guide how clusters are formed. |
Assignments | cluster(Matrix matrix, int numClusters, Properties props) | Clusters the set of rows in the given Matrix into the specified number of clusters, using the default values for beta, kappa, and the SimilarityFunction unless otherwise specified in the properties. |
Assignments | cluster(Matrix matrix, Properties props) | Throws an UnsupportedOperationException if called. |
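
The five-argument cluster overload takes kappa explicitly, and its parameter documentation gives numClusters * Math.log(matrix.rows()) as the upper limit. A small sketch of computing a kappa within that limit follows. The class and method names are hypothetical, and two choices here are assumptions rather than documented behavior: the product is truncated so the result stays at or below the limit, and the result is never allowed to drop below numClusters (which matters for very small matrices, where the logarithm is near zero).

```java
// Hypothetical helper, not part of the documented class.
final class KappaBound {
    /**
     * Returns floor(numClusters * ln(rows)), but never less than numClusters.
     * The lower clamp is an assumption made for this sketch.
     */
    static int maxFacilities(int numClusters, int rows) {
        if (numClusters < 1 || rows < 1) {
            throw new IllegalArgumentException(
                "numClusters and rows must both be positive");
        }
        // Math.log is the natural logarithm; the cast truncates toward zero.
        int bound = (int) (numClusters * Math.log(rows));
        return Math.max(numClusters, bound);
    }
}
```

The returned value could then be passed as the kappa argument alongside the matrix, numClusters, beta, and the similarity function.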
public static final String BETA_PROPERTY
public static final String KAPPA_PROPERTY
k * log(num data points).
public static final String SIMILARITY_FUNCTION_PROPERTY
public static final double DEFAULT_BETA
public static final SimilarityFunction DEFAULT_SIMILARITY_FUNCTION
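
The field summary describes the default similarity function as the inverse of the square of the Euclidean distance. The sketch below illustrates that formula on plain double arrays. It is not the library's SimilarityFunction implementation: the class name is hypothetical, and returning positive infinity for identical vectors is simply what IEEE division by zero yields here, not documented behavior.

```java
// Hypothetical illustration of the formula, not the library's implementation.
final class InverseSquaredEuclidean {
    /** Sum of squared component differences between a and b. */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** 1 / squaredDistance(a, b); identical vectors give +Infinity. */
    static double similarity(double[] a, double[] b) {
        return 1.0 / squaredDistance(a, b);
    }
}
```

Closer rows therefore score higher: two rows at Euclidean distance 5 have similarity 1/25, while rows at distance 1 have similarity 1.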
public Assignments cluster(Matrix matrix, Properties props)
Throws an UnsupportedOperationException if called.
public Assignments cluster(Matrix matrix, int numClusters, Properties props)
Clusters the set of rows in the given Matrix into the specified number of clusters, using the default values for beta, kappa, and the SimilarityFunction unless otherwise specified in the properties.
Parameters:
matrix -
numClusters -
props -
public Assignments cluster(Matrix matrix, int numClusters, int kappa, double beta, SimilarityFunction simFunc)
Clusters the rows of the provided matrix into the specified number of clusters in a single pass, using the parameters to guide how clusters are formed. Note that fewer than numClusters may be returned.
Parameters:
matrix - the matrix whose rows are to be clustered
numClusters - the number of clusters to be returned. Note that under some circumstances, the algorithm may return fewer clusters than this amount.
kappa - the maximum number of facilities (clusters) to keep in memory at any given point. At most this should be numClusters * Math.log(matrix.rows()).
beta - the initial cost for creating a new facility. The default value (DEFAULT_BETA) is recommended for this parameter unless specific customization is required.
simFunc - the similarity function used to compare rows of the matrix. In the original paper, this is the inverse of the square of the Euclidean distance.

Copyright © 2012. All Rights Reserved.