public abstract class BaseSpectralCut extends Object implements EigenCut
Matrix
that represents a set of data points. The spectral cut attempts to find the
a separation that minimizes the conductance between the two resulting
regions. Often, this requires computing a complete affinity matrix from the
data points, which requires O(n^2) time and space complexity. SparseMatrix
s are a special case, Instead of computing the full affinity
matrix, a centroid for the complete data set, and possible divided regions
can be computed and used to evaluate the conductance (based on the
transitivity of the dot product).
There are several variations on computing the conductance of a matrix. This
class does nearly all of the heavy computation, except for computing the
second eigen vector of an affinity matrix. Subclasses need only implement
computeSecondEigenVector(edu.ucla.sspace.matrix.Matrix, int)
, which should return a dense DoubleVector
that represents the eigen values of the affinity matrix.
Implementations are suggested to avoid explicitly computing the full affinity
matrix.
This abstract class provides the rest of the needed functionality for EigenCut
, such as computing the objective functions, selecting an optimal
conductance cut accross the affinity matrix, and computing the split
partitions.Modifier and Type | Class and Description |
---|---|
protected static class |
BaseSpectralCut.Index
A simple comparable data struct holding a row vector's weight and the
vector's original index in a matrix.
|
Modifier and Type | Field and Description |
---|---|
protected Matrix |
dataMatrix
The
Matrix containing the data points. |
protected int[] |
leftReordering
The final ordering of data points in the first created region.
|
protected Matrix |
leftSplit
The data points in the left region.
|
protected DoubleVector |
matrixRowSums
The centroid of the entire data set.
|
protected int |
numRows
The number of rows in the data matrix.
|
protected double |
pSum
The summation of the
rho values. |
protected DoubleVector |
rho
The sum similarity values from each data point to all other data points,
which is equivalent to the simiarltiy between each data point and the
centroid of the entire data set.
|
protected int[] |
rightReordering
The final ordering of data points in the first created region.
|
protected Matrix |
rightSplit
The data points in the right region.
|
Constructor and Description |
---|
BaseSpectralCut() |
Modifier and Type | Method and Description |
---|---|
protected static int |
comparisonCount(int[] clusterSizes)
Returns the number of comparisons made for a cluster.
|
void |
computeCut(Matrix matrix)
Compute the cut with the lowest conductance for the data set.
|
protected static int |
computeCut(Matrix matrix,
DoubleVector rho,
double rhoSum,
DoubleVector matrixRowSums)
Returns the index at which
matrix should be cut such that the
conductance between the two partitions is minimized. |
protected static void |
computeMatrixDotV(Matrix matrix,
DoubleVector newV,
DoubleVector v)
Computes the dot product between a given matrix and a given vector
newV . |
protected static <T extends Matrix> |
computeMatrixRowSum(T matrix)
Compute the row sums of the values in
matrix and returns the
values in a vector of length matrix.columns() . |
protected static DoubleVector |
computeMatrixTransposeV(Matrix matrix,
DoubleVector v)
Returns the dot product between the transpose of a given matrix and a
given vector.
|
DoubleVector |
computeRhoSum(Matrix matrix)
Computes the similarity between each data point and centroid of the data
set.
|
protected abstract DoubleVector |
computeSecondEigenVector(Matrix matrix,
int vectorLength)
Returns a
DoubleVector representing the secord largest eigen
vector for the data set. |
double |
getKMeansObjective()
Returns the K-Means objective score of the entire data set, i.e.
|
double |
getKMeansObjective(double alpha,
double beta,
int leftNumClusters,
int[] leftAssignments,
int rightNumClusters,
int[] rightAssignments)
Returns the K-Means objective computed over the two regions computed over
the data set.
|
Matrix |
getLeftCut()
Returns the data set in the first (left) region.
|
int[] |
getLeftReordering()
Return the ordering of the first region with respect to the original data
set.
|
double |
getMergedObjective(double alpha,
double beta)
Returns the score for the relaxed correlation objective over the entire
data set, undivided.
|
Matrix |
getRightCut()
Returns the data set in the second (right) region.
|
int[] |
getRightReordering()
Return the ordering of the second region with respect to the original
data set.
|
double |
getSplitObjective(double alpha,
double beta,
int leftNumClusters,
int[] leftAssignments,
int rightNumClusters,
int[] rightAssignments)
Returns the score for the relaxed correlation objective when the data
matrix is divided into multiple clusters.
|
static double |
kMeansObjective(int numClusters,
int[] assignments,
Matrix data)
Returns the K-Means objective over an arbitrary clustering assignment for
the data set.
|
protected static DoubleVector |
orthonormalize(DoubleVector v,
DoubleVector other)
|
double |
rhoSum()
Returns the sum of values in
rho . |
protected int numRows
protected DoubleVector rho
protected DoubleVector matrixRowSums
protected double pSum
rho
values.protected int[] leftReordering
protected int[] rightReordering
protected Matrix leftSplit
protected Matrix rightSplit
public double rhoSum()
rho
. This is equivalent to
sum(matrix
* matrix
').public DoubleVector computeRhoSum(Matrix matrix)
matrix
.computeRhoSum
in interface EigenCut
public void computeCut(Matrix matrix)
EigenCut.getLeftCut()
and EigenCut.getRightCut()
.computeCut
in interface EigenCut
protected abstract DoubleVector computeSecondEigenVector(Matrix matrix, int vectorLength)
DoubleVector
representing the secord largest eigen
vector for the data set.public Matrix getLeftCut()
getLeftCut
in interface EigenCut
public Matrix getRightCut()
getRightCut
in interface EigenCut
public int[] getLeftReordering()
getLeftReordering
in interface EigenCut
public int[] getRightReordering()
getRightReordering
in interface EigenCut
public double getKMeansObjective()
getKMeansObjective
in interface EigenCut
public double getKMeansObjective(double alpha, double beta, int leftNumClusters, int[] leftAssignments, int rightNumClusters, int[] rightAssignments)
getKMeansObjective
in interface EigenCut
public static double kMeansObjective(int numClusters, int[] assignments, Matrix data)
public double getSplitObjective(double alpha, double beta, int leftNumClusters, int[] leftAssignments, int rightNumClusters, int[] rightAssignments)
getSplitObjective
in interface EigenCut
alpha
- The weight given to the inter-cluster similarity.beta
- The weight given to the intra-cluster similarity.leftNumClusters
- The number of clusters found in the left splitleftAssignments
- The assignments for data points in the left regionrightNumClusters
- The number of clusters found in the right splitrightAssignments
- The assignments for data points in the right
regionpublic double getMergedObjective(double alpha, double beta)
getMergedObjective
in interface EigenCut
protected static int comparisonCount(int[] clusterSizes)
protected static DoubleVector orthonormalize(DoubleVector v, DoubleVector other)
DoubleVector
that is the orthonormalized version of
v
with respect to other
. This orthonormalization is done
by simply modifying the value of v[0] such that it balances out the
similarity between v
and other
in all other dimensions.protected static int computeCut(Matrix matrix, DoubleVector rho, double rhoSum, DoubleVector matrixRowSums)
matrix
should be cut such that the
conductance between the two partitions is minimized. This is done such
that the sparsity of the data matrix is maintained and all the entire
operation is linear with respect to the number of non zeros in the
matrix.protected static DoubleVector computeMatrixTransposeV(Matrix matrix, DoubleVector v)
SparseMatrix
.
This method also assumes that matrix
is row based and iterates
over each of the values in the row before iterating over another row.protected static void computeMatrixDotV(Matrix matrix, DoubleVector newV, DoubleVector v)
newV
. The result is stored in v
. This method has special
casing for when matrix
is a SparseMatrix
. This method
also assumes that matrix
is row based and iterates over each of
the values in the row before iterating over another row.protected static <T extends Matrix> DoubleVector computeMatrixRowSum(T matrix)
matrix
and returns the
values in a vector of length matrix.columns()
.Copyright © 2012. All Rights Reserved.