public abstract class BaseFunction extends Object implements CriterionFunction
This CriterionFunction implements the basic functionality needed for a majority of the functions available. It works by first gathering metadata for the data set, such as the cluster sizes, initial cluster assignments, and initial centroids. It then implements update and requires subclasses to implement the functions that determine the change in the criterion score due to moving a data point.
Subclasses must implement getOldCentroidScore and getNewCentroidScore. The first function returns the score for the data point's current cluster when that data point is removed from the cluster. The second function returns the score for an alternate cluster when the current data point is placed in that cluster.
This base class also provides two key methods to assist with computing the above changes: modifiedMagnitudeSqrd and modifiedMagnitude. For both functions, the first vector is considered to be the cluster centroid that is being modified and the second vector is the data point that is being added to the centroid, without actually affecting the cluster.
Modifier and Type | Field and Description |
---|---|
protected int[] | assignments: The cluster assignment for each data point. |
protected DoubleVector[] | centroids: The centroids representing each cluster. |
protected int[] | clusterSizes: The number of data points found in each cluster. |
protected double[] | costs: The cost computed for each cluster. |
protected List<DoubleVector> | matrix: The Matrix holding the data points. |
Constructor and Description |
---|
BaseFunction(): Constructs a new BaseFunction. |
Modifier and Type | Method and Description |
---|---|
int[] | assignments(): Returns the cluster assignment indices for each data point in the original matrix passed to setup. |
DoubleVector[] | centroids(): Returns the final set of centroids computed for the dataset passed to setup. |
int[] | clusterSizes(): Returns the number of data points assigned to each cluster. |
protected abstract double | getNewCentroidScore(int newCentroidIndex, DoubleVector dataPoint): Returns the new score for the cluster centroid indexed by newCentroidIndex when dataPoint is added to it. |
protected abstract double | getOldCentroidScore(DoubleVector vector, int oldCentroidIndex, int altClusterSize): Returns the new score for the cluster centroid represented by altCurrentCentroid with the new altClusterSize. |
protected static double | modifiedMagnitude(DoubleVector c, DoubleVector v): Returns the magnitude of c as if v were added to the vector. |
protected static double | modifiedMagnitudeSqrd(DoubleVector c, DoubleVector v): Returns the magnitude squared of c as if v were added to the vector. |
double | score(): Returns the score computed by this CriterionFunction. |
void | setup(Matrix m, int[] initialAssignments, int numClusters): Creates the cluster centroids and any other metadata needed by this CriterionFunction. |
protected void | subSetup(Matrix m): Sets up any extra information needed before computing the cost values for each cluster. |
protected static DoubleVector | subtract(DoubleVector c, DoubleVector v): Returns a DoubleVector that is equal to c - v. |
protected static double | subtractedMagnitude(DoubleVector c, DoubleVector v): Returns the magnitude of c as if v were subtracted from the vector. |
protected static double | subtractedMagnitudeSqrd(DoubleVector c, DoubleVector v): Returns the magnitude squared of c as if v were subtracted from the vector. |
boolean | update(int currentVectorIndex): Updates the clustering assignment for the data point indexed by currentVectorIndex. |
protected void | updateScores(int newCentroidIndex, int oldCentroidIndex, DoubleVector vector) |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface CriterionFunction: isMaximize
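
protected List<DoubleVector> matrix
The Matrix holding the data points.

protected int[] assignments
The cluster assignment for each data point.

protected DoubleVector[] centroids
The centroids representing each cluster.

protected int[] clusterSizes
The number of data points found in each cluster.

protected double[] costs
The cost computed for each cluster.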
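
public BaseFunction()
Constructs a new BaseFunction.

public void setup(Matrix m, int[] initialAssignments, int numClusters)
Creates the cluster centroids and any other metadata needed by this CriterionFunction.
Specified by: setup in interface CriterionFunction
Parameters:
m - The Matrix holding the data points. This will be used as read only.
initialAssignments - The cluster assignments for each data point in m. This is used as read only and discarded.

protected void subSetup(Matrix m)
Sets up any extra information needed before computing the cost values for each cluster.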

public boolean update(int currentVectorIndex)
Updates the clustering assignment for the data point indexed by currentVectorIndex. This returns true if the data point is left in the same cluster and false if it was relocated to another cluster.
Specified by: update in interface CriterionFunction
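
The interplay between update, getOldCentroidScore, and getNewCentroidScore can be illustrated with a small self-contained sketch. This is not the library's implementation: it uses plain double[] arrays in place of DoubleVector and Matrix, stores each cluster as the sum of its members, and assumes a hypothetical criterion (squared magnitude of the cluster sum divided by the cluster size, to be maximized). The class name UpdateSketch and the helper shiftedCost are invented for the example. The return value follows the convention documented for update: true when the point stays, false when it is relocated.

```java
// Illustrative sketch only; every name here is hypothetical.
class UpdateSketch {
    final double[][] points;   // the data points (read only)
    final int[] assignments;   // cluster index for each data point
    final double[][] sums;     // per-cluster sum of member points
    final int[] sizes;         // number of points in each cluster
    final double[] costs;      // cached criterion value for each cluster

    UpdateSketch(double[][] points, int[] initialAssignments, int numClusters) {
        this.points = points;
        this.assignments = initialAssignments.clone();
        int dim = points[0].length;
        sums = new double[numClusters][dim];
        sizes = new int[numClusters];
        costs = new double[numClusters];
        for (int i = 0; i < points.length; i++) {
            int c = assignments[i];
            sizes[c]++;
            for (int d = 0; d < dim; d++) sums[c][d] += points[i][d];
        }
        double[] zero = new double[dim];
        for (int c = 0; c < numClusters; c++)
            costs[c] = shiftedCost(sums[c], zero, 0, sizes[c]);
    }

    // Assumed criterion: ||s + sign * v||^2 / n, computed without allocating.
    static double shiftedCost(double[] s, double[] v, int sign, int n) {
        if (n == 0) return 0;
        double sq = 0;
        for (int d = 0; d < s.length; d++) {
            double x = s[d] + sign * v[d];
            sq += x * x;
        }
        return sq / n;
    }

    // Score of the old cluster once v is removed from it.
    double getOldCentroidScore(double[] v, int old) {
        return shiftedCost(sums[old], v, -1, sizes[old] - 1);
    }

    // Score of an alternate cluster once v is added to it.
    double getNewCentroidScore(int alt, double[] v) {
        return shiftedCost(sums[alt], v, +1, sizes[alt] + 1);
    }

    // Returns true if the point stays in its cluster, false if it moved.
    boolean update(int index) {
        int old = assignments[index];
        double[] v = points[index];
        if (sizes[old] == 1) return true;   // never empty a cluster
        double oldCost = getOldCentroidScore(v, old);
        int best = old;
        double bestGain = 0;
        double bestCost = 0;
        for (int c = 0; c < sizes.length; c++) {
            if (c == old) continue;
            double newCost = getNewCentroidScore(c, v);
            double gain = (oldCost - costs[old]) + (newCost - costs[c]);
            if (gain > bestGain) { bestGain = gain; best = c; bestCost = newCost; }
        }
        if (best == old) return true;
        // Commit the move: only now are the cluster sums actually modified.
        for (int d = 0; d < v.length; d++) { sums[old][d] -= v[d]; sums[best][d] += v[d]; }
        sizes[old]--;
        sizes[best]++;
        costs[old] = oldCost;
        costs[best] = bestCost;
        assignments[index] = best;
        return false;
    }

    double score() {
        double total = 0;
        for (double c : costs) total += c;
        return total;
    }

    public static void main(String[] args) {
        double[][] pts = {{0}, {1}, {10}, {11}};
        UpdateSketch f = new UpdateSketch(pts, new int[] {0, 0, 0, 1}, 2);
        System.out.println(f.update(2));   // the point at 10 moves to cluster 1
        System.out.println(f.update(0));   // the point at 0 stays in cluster 0
        System.out.println(f.score());
    }
}
```

The candidate scores are evaluated for every alternate cluster before anything is mutated, which is why the two scoring hooks must not modify the centroid they are given.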

protected abstract double getOldCentroidScore(DoubleVector vector, int oldCentroidIndex, int altClusterSize)
Returns the new score for the cluster centroid represented by altCurrentCentroid with the new altClusterSize.
Parameters:
altCurrentCentroid - The current updated cluster centroid
altClusterSize - The current updated cluster size

protected abstract double getNewCentroidScore(int newCentroidIndex, DoubleVector dataPoint)
Returns the new score for the cluster centroid indexed by newCentroidIndex when dataPoint is added to it. Implementations of this method should not actually add dataPoint to the centroid, but should instead use the helper functions provided to compute the new score.
Parameters:
newCentroidIndex - The index of the current alternate centroid
dataPoint - The current data point that is being reassigned

protected void updateScores(int newCentroidIndex, int oldCentroidIndex, DoubleVector vector)

protected static DoubleVector subtract(DoubleVector c, DoubleVector v)
Returns a DoubleVector that is equal to c - v. This method is used instead of the one in VectorMath so that a DenseDynamicMagnitudeVector can be used to represent the difference. This vector type is optimized for when many calls to magnitude are interleaved with updates to a few dimensions in the vector.

public int[] assignments()
Returns the cluster assignment indices for each data point in the original matrix passed to setup.
Specified by: assignments in interface CriterionFunction

public DoubleVector[] centroids()
Returns the final set of centroids computed for the dataset passed to setup.
Specified by: centroids in interface CriterionFunction

public int[] clusterSizes()
Returns the number of data points assigned to each cluster.
Specified by: clusterSizes in interface CriterionFunction

public double score()
Returns the score computed by this CriterionFunction.
Specified by: score in interface CriterionFunction
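
The magnitude helpers of this class (modifiedMagnitudeSqrd, modifiedMagnitude, subtractedMagnitudeSqrd, subtractedMagnitude) report the magnitude a centroid would have after a data point is added or removed, without building the combined vector. A minimal sketch of that idea follows. It assumes plain double[] arrays of equal length as stand-ins for DoubleVector, and the class name MagnitudeSketch is invented for the example; the library's own code may compute these values differently.

```java
// Illustrative sketch only: double[] stands in for DoubleVector (an assumption).
class MagnitudeSketch {
    /** Squared magnitude of (c + v), computed without allocating c + v. */
    static double modifiedMagnitudeSqrd(double[] c, double[] v) {
        double sum = 0;
        for (int i = 0; i < c.length; i++) {
            double x = c[i] + v[i];
            sum += x * x;
        }
        return sum;
    }

    /** Magnitude of (c + v). */
    static double modifiedMagnitude(double[] c, double[] v) {
        return Math.sqrt(modifiedMagnitudeSqrd(c, v));
    }

    /** Squared magnitude of (c - v), computed without allocating c - v. */
    static double subtractedMagnitudeSqrd(double[] c, double[] v) {
        double sum = 0;
        for (int i = 0; i < c.length; i++) {
            double x = c[i] - v[i];
            sum += x * x;
        }
        return sum;
    }

    /** Magnitude of (c - v). */
    static double subtractedMagnitude(double[] c, double[] v) {
        return Math.sqrt(subtractedMagnitudeSqrd(c, v));
    }

    public static void main(String[] args) {
        double[] c = {1, 2};
        double[] v = {2, 2};
        System.out.println(modifiedMagnitudeSqrd(c, v));  // c + v = (3, 4), prints 25.0
        System.out.println(modifiedMagnitude(c, v));      // prints 5.0
        System.out.println(subtractedMagnitudeSqrd(
                new double[] {3, 4}, new double[] {1, 2})); // (2, 2), prints 8.0
    }
}
```

Because neither input array is written to, the same centroid can be probed against many candidate data points in a loop with no temporary vectors created.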

protected static double modifiedMagnitudeSqrd(DoubleVector c, DoubleVector v)
Returns the magnitude squared of c as if v were added to the vector. We do this because it would be more costly, in terms of garbage collection, to create a new vector for each alternate cluster and then throw away all but one of them.

protected static double modifiedMagnitude(DoubleVector c, DoubleVector v)
Returns the magnitude of c as if v were added to the vector. We do this because it would be more costly, in terms of garbage collection, to create a new vector for each alternate cluster and then throw away all but one of them.

protected static double subtractedMagnitudeSqrd(DoubleVector c, DoubleVector v)
Returns the magnitude squared of c as if v were subtracted from the vector. We do this because it would be more costly, in terms of garbage collection, to create a new vector for each alternate cluster and then throw away all but one of them.

protected static double subtractedMagnitude(DoubleVector c, DoubleVector v)
Returns the magnitude of c as if v were subtracted from the vector. We do this because it would be more costly, in terms of garbage collection, to create a new vector for each alternate cluster and then throw away all but one of them.

Copyright © 2012. All Rights Reserved.