BaseSpectralCut (S-Space Package 2.0.1 API)

java.lang.Object
- edu.ucla.sspace.clustering.BaseSpectralCut

All Implemented Interfaces:

EigenCut

Direct Known Subclasses:

CKVWSpectralClustering03.SpectralCut, CKVWSpectralClustering06.SuperSpectralCut
```
public abstract class BaseSpectralCut
extends Object
implements EigenCut
```
An abstract class for computing a spectral cut over a data Matrix that represents a set of data points. The spectral cut attempts to find the separation that minimizes the conductance between the two resulting regions. Often, this requires computing a complete affinity matrix from the data points, which takes O(n^2) time and space. A SparseMatrix is a special case: instead of computing the full affinity matrix, a centroid for the complete data set and for each candidate region can be computed and used to evaluate the conductance (this works because the dot product distributes over vector addition).
There are several variations on computing the conductance of a matrix. This class does nearly all of the heavy computation, except for computing the second eigenvector of an affinity matrix. Subclasses need only implement computeSecondEigenVector(edu.ucla.sspace.matrix.Matrix, int), which should return a dense DoubleVector that represents the second eigenvector of the affinity matrix. Implementations should avoid explicitly computing the full affinity matrix.
This abstract class provides the rest of the functionality needed for EigenCut, such as computing the objective functions, selecting an optimal conductance cut across the affinity matrix, and computing the split partitions.
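
The centroid shortcut described above can be illustrated with a minimal sketch. It uses plain arrays in place of Matrix and DoubleVector, and the class and method names are hypothetical rather than part of the S-Space API. It assumes the affinity between two data points is their dot product, and it uses the unnormalized sum of the rows as the "centroid".

```java
// Hypothetical names; plain arrays stand in for Matrix and DoubleVector.
final class CentroidTrickSketch {

    /** Sums the row vectors of data; the result has one entry per column. */
    static double[] rowVectorSum(double[][] data) {
        double[] sum = new double[data[0].length];
        for (double[] row : data)
            for (int c = 0; c < row.length; ++c)
                sum[c] += row[c];
        return sum;
    }

    /** Plain dot product of two equal-length vectors. */
    static double dot(double[] a, double[] b) {
        double total = 0;
        for (int i = 0; i < a.length; ++i)
            total += a[i] * b[i];
        return total;
    }

    /** rho[i] = data[i] . (sum of all rows), computed in one pass over the data. */
    static double[] rhoViaCentroid(double[][] data) {
        double[] centroid = rowVectorSum(data);
        double[] rho = new double[data.length];
        for (int r = 0; r < data.length; ++r)
            rho[r] = dot(data[r], centroid);
        return rho;
    }

    /** The same values from the explicit n-by-n affinities; for comparison only. */
    static double[] rhoViaAffinity(double[][] data) {
        double[] rho = new double[data.length];
        for (int i = 0; i < data.length; ++i)
            for (int j = 0; j < data.length; ++j)
                rho[i] += dot(data[i], data[j]);
        return rho;
    }
}
```

Both methods return the same values, but rhoViaCentroid never touches more than one row at a time. Summing the entries of rho also gives the dot product of the row-vector sum with itself, which matches the description of rhoSum() as sum(matrix * matrix').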

Author:

Keith Stevens

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`protected static class`	`BaseSpectralCut.Index` A simple comparable data struct holding a row vector's weight and the vector's original index in a matrix.

Field Summary

Fields
Modifier and Type	Field and Description
`protected Matrix`	`dataMatrix` The `Matrix` containing the data points.
`protected int[]`	`leftReordering` The final ordering of data points in the first created region.
`protected Matrix`	`leftSplit` The data points in the left region.
`protected DoubleVector`	`matrixRowSums` The centroid of the entire data set.
`protected int`	`numRows` The number of rows in the data matrix.
`protected double`	`pSum` The summation of the `rho` values.
`protected DoubleVector`	`rho` The sum of the similarity values from each data point to all other data points, which is equivalent to the similarity between each data point and the centroid of the entire data set.
`protected int[]`	`rightReordering` The final ordering of data points in the second created region.
`protected Matrix`	`rightSplit` The data points in the right region.

Constructor Summary

Constructors
Constructor and Description

`BaseSpectralCut()`

Method Summary

Methods
Modifier and Type	Method and Description
`protected static int`	`comparisonCount(int[] clusterSizes)` Returns the number of comparisons made for a cluster.
`void`	`computeCut(Matrix matrix)` Compute the cut with the lowest conductance for the data set.
`protected static int`	`computeCut(Matrix matrix, DoubleVector rho, double rhoSum, DoubleVector matrixRowSums)` Returns the index at which `matrix` should be cut such that the conductance between the two partitions is minimized.
`protected static void`	`computeMatrixDotV(Matrix matrix, DoubleVector newV, DoubleVector v)` Computes the dot product between a given matrix and a given vector `newV`.
`protected static <T extends Matrix> DoubleVector`	`computeMatrixRowSum(T matrix)` Sums the row vectors of `matrix` and returns the result as a vector of length `matrix.columns()`.
`protected static DoubleVector`	`computeMatrixTransposeV(Matrix matrix, DoubleVector v)` Returns the dot product between the transpose of a given matrix and a given vector.
`DoubleVector`	`computeRhoSum(Matrix matrix)` Computes the similarity between each data point and the centroid of the data set.
`protected abstract DoubleVector`	`computeSecondEigenVector(Matrix matrix, int vectorLength)` Returns a `DoubleVector` representing the second largest eigenvector for the data set.
`double`	`getKMeansObjective()` Returns the K-Means objective score of the entire data set, i.e. the sum of the similarity between each data point and the centroid.
`double`	`getKMeansObjective(double alpha, double beta, int leftNumClusters, int[] leftAssignments, int rightNumClusters, int[] rightAssignments)` Returns the K-Means objective computed over the two regions of the data set.
`Matrix`	`getLeftCut()` Returns the data set in the first (left) region.
`int[]`	`getLeftReordering()` Return the ordering of the first region with respect to the original data set.
`double`	`getMergedObjective(double alpha, double beta)` Returns the score for the relaxed correlation objective over the entire data set, undivided.
`Matrix`	`getRightCut()` Returns the data set in the second (right) region.
`int[]`	`getRightReordering()` Return the ordering of the second region with respect to the original data set.
`double`	`getSplitObjective(double alpha, double beta, int leftNumClusters, int[] leftAssignments, int rightNumClusters, int[] rightAssignments)` Returns the score for the relaxed correlation objective when the data matrix is divided into multiple clusters.
`static double`	`kMeansObjective(int numClusters, int[] assignments, Matrix data)` Returns the K-Means objective over an arbitrary clustering assignment for the data set.
`protected static DoubleVector`	`orthonormalize(DoubleVector v, DoubleVector other)` Returns a `DoubleVector` that is the orthonormalized version of `v` with respect to `other`.
`double`	`rhoSum()` Returns the sum of values in `rho`.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - dataMatrix
```
protected Matrix dataMatrix
```
    The Matrix containing the data points.
  - numRows
```
protected int numRows
```
    The number of rows in the data matrix. Used as a shorthand in computations.
  - rho
```
protected DoubleVector rho
```
    The sum of the similarity values from each data point to all other data points, which is equivalent to the similarity between each data point and the centroid of the entire data set.
  - matrixRowSums
```
protected DoubleVector matrixRowSums
```
    The centroid of the entire data set.
  - pSum
```
protected double pSum
```
    The summation of the rho values.
  - leftReordering
```
protected int[] leftReordering
```
    The final ordering of data points in the first created region.
  - rightReordering
```
protected int[] rightReordering
```
    The final ordering of data points in the second created region.
  - leftSplit
```
protected Matrix leftSplit
```
    The data points in the left region.
  - rightSplit
```
protected Matrix rightSplit
```
    The data points in the right region.
- Constructor Detail
  - BaseSpectralCut
```
public BaseSpectralCut()
```
- Method Detail
  - rhoSum
```
public double rhoSum()
```
    Returns the sum of values in rho. This is equivalent to sum(matrix * matrix').
    
    Specified by:
    
    rhoSum in interface EigenCut
  - computeRhoSum
```
public DoubleVector computeRhoSum(Matrix matrix)
```
    Computes the similarity between each data point and the centroid of the data set. This is essentially the row sums of the affinity matrix for matrix.
    
    Specified by:
    
    computeRhoSum in interface EigenCut
  - computeCut
```
public void computeCut(Matrix matrix)
```
    Compute the cut with the lowest conductance for the data set. This involves the following main steps:
    1. Computing the second eigenvector of the data set.
    2. Sorting both the eigenvector and each dimension's corresponding data point, based on the eigenvector's values.
    The resulting regions are accessible through EigenCut.getLeftCut() and EigenCut.getRightCut().
    Specified by:
    
    computeCut in interface EigenCut
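    
    The sorting step above can be sketched as follows. This is an illustration with plain arrays, not the class's actual code: the Index class here only mirrors the description of BaseSpectralCut.Index (a weight plus an original index), and the ascending sort order and the split convention are assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch; names and conventions are illustrative assumptions.
final class EigenSortSketch {

    /** Pairs an eigenvector entry with the row it came from. */
    static final class Index implements Comparable<Index> {
        final double weight;
        final int index;

        Index(double weight, int index) {
            this.weight = weight;
            this.index = index;
        }

        @Override
        public int compareTo(Index other) {
            return Double.compare(weight, other.weight);
        }
    }

    /** Returns the original row indices ordered by ascending eigenvector entry. */
    static int[] sortedOrder(double[] eigenVector) {
        Index[] entries = new Index[eigenVector.length];
        for (int i = 0; i < entries.length; ++i)
            entries[i] = new Index(eigenVector[i], i);
        Arrays.sort(entries);
        int[] order = new int[entries.length];
        for (int i = 0; i < order.length; ++i)
            order[i] = entries[i].index;
        return order;
    }

    /** Splits an ordering: positions [0, cutIndex) go left, the rest go right. */
    static int[][] split(int[] order, int cutIndex) {
        return new int[][] {
            Arrays.copyOfRange(order, 0, cutIndex),
            Arrays.copyOfRange(order, cutIndex, order.length)
        };
    }
}
```

    The two arrays returned by split play the role of the left and right reorderings: each entry names the position of a data point in the original data set.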
  - computeSecondEigenVector
```
protected abstract DoubleVector computeSecondEigenVector(Matrix matrix,
                                    int vectorLength)
```
    Returns a DoubleVector representing the second largest eigenvector for the data set.
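    
    One way a subclass might approach this without building the affinity matrix is power iteration with the first eigenvector projected out at every step. The sketch below is an assumption-laden illustration, not the method used by any S-Space subclass: it works on the unnormalized affinity matrix X * X^T, uses plain arrays, runs a fixed number of iterations, and assumes the start vector is not orthogonal to the eigenvectors being sought. All names are hypothetical.

```java
// Hypothetical sketch of deflated power iteration on X * X^T.
final class SecondEigenvectorSketch {

    /** Computes X * (X^T * v) without forming the n-by-n matrix X * X^T. */
    static double[] affinityTimes(double[][] x, double[] v) {
        double[] xtv = new double[x[0].length];
        for (int r = 0; r < x.length; ++r)
            for (int c = 0; c < xtv.length; ++c)
                xtv[c] += x[r][c] * v[r];
        double[] result = new double[x.length];
        for (int r = 0; r < x.length; ++r)
            for (int c = 0; c < xtv.length; ++c)
                result[r] += x[r][c] * xtv[c];
        return result;
    }

    static double dot(double[] a, double[] b) {
        double total = 0;
        for (int i = 0; i < a.length; ++i)
            total += a[i] * b[i];
        return total;
    }

    /** Removes the component of v along unit vector u (if any), then scales v to unit length. */
    static void projectOutAndNormalize(double[] v, double[] u) {
        if (u != null) {
            double projection = dot(v, u);
            for (int i = 0; i < v.length; ++i)
                v[i] -= projection * u[i];
        }
        double norm = Math.sqrt(dot(v, v));
        for (int i = 0; i < v.length; ++i)
            v[i] /= norm;
    }

    /** Power iteration kept orthogonal to avoid (which may be null). */
    static double[] powerIterate(double[][] x, double[] avoid, int iterations) {
        double[] v = new double[x.length];
        for (int i = 0; i < v.length; ++i)
            v[i] = 1.0 + i; // arbitrary deterministic start vector
        projectOutAndNormalize(v, avoid);
        for (int it = 0; it < iterations; ++it) {
            v = affinityTimes(x, v);
            projectOutAndNormalize(v, avoid);
        }
        return v;
    }

    /** Finds the dominant eigenvector first, then iterates in its orthogonal complement. */
    static double[] secondEigenVector(double[][] x, int iterations) {
        double[] first = powerIterate(x, null, iterations);
        return powerIterate(x, first, iterations);
    }
}
```

    Each iteration costs two passes over the data matrix, so the n-by-n affinity matrix is never stored. The sign of the returned vector is arbitrary.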
  - getLeftCut
```
public Matrix getLeftCut()
```
    Returns the data set in the first (left) region.
    
    Specified by:
    
    getLeftCut in interface EigenCut
  - getRightCut
```
public Matrix getRightCut()
```
    Returns the data set in the second (right) region.
    
    Specified by:
    
    getRightCut in interface EigenCut
  - getLeftReordering
```
public int[] getLeftReordering()
```
    Return the ordering of the first region with respect to the original data set.
    
    Specified by:
    
    getLeftReordering in interface EigenCut
  - getRightReordering
```
public int[] getRightReordering()
```
    Return the ordering of the second region with respect to the original data set.
    
    Specified by:
    
    getRightReordering in interface EigenCut
  - getKMeansObjective
```
public double getKMeansObjective()
```
    Returns the K-Means objective score of the entire data set, i.e. the sum of the similarity between each data point and the centroid.
    
    Specified by:
    
    getKMeansObjective in interface EigenCut
  - getKMeansObjective
```
public double getKMeansObjective(double alpha,
                        double beta,
                        int leftNumClusters,
                        int[] leftAssignments,
                        int rightNumClusters,
                        int[] rightAssignments)
```
    Returns the K-Means objective computed over the two regions of the data set.
    
    Specified by:
    
    getKMeansObjective in interface EigenCut
  - kMeansObjective
```
public static double kMeansObjective(int numClusters,
                     int[] assignments,
                     Matrix data)
```
    Returns the K-Means objective over an arbitrary clustering assignment for the data set.
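    
    A minimal sketch of such an objective is shown below. It assumes that the score is the sum, over all data points, of the dot product between a point and the mean of its assigned cluster; the similarity function and centroid normalization actually used by this class are not stated here, so treat both as assumptions. The class name is hypothetical and plain arrays stand in for Matrix.

```java
// Hypothetical sketch; dot-product similarity and mean centroids are assumptions.
final class KMeansObjectiveSketch {

    static double objective(int numClusters, int[] assignments, double[][] data) {
        int dims = data[0].length;
        double[][] centroids = new double[numClusters][dims];
        int[] sizes = new int[numClusters];

        // Accumulate the sum of the points assigned to each cluster.
        for (int i = 0; i < data.length; ++i) {
            int cluster = assignments[i];
            sizes[cluster]++;
            for (int c = 0; c < dims; ++c)
                centroids[cluster][c] += data[i][c];
        }

        // Turn each non-empty sum into a mean.
        for (int k = 0; k < numClusters; ++k)
            if (sizes[k] > 0)
                for (int c = 0; c < dims; ++c)
                    centroids[k][c] /= sizes[k];

        // Score every point against the centroid of its own cluster.
        double score = 0;
        for (int i = 0; i < data.length; ++i) {
            double[] centroid = centroids[assignments[i]];
            for (int c = 0; c < dims; ++c)
                score += data[i][c] * centroid[c];
        }
        return score;
    }
}
```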
  - getSplitObjective
```
public double getSplitObjective(double alpha,
                       double beta,
                       int leftNumClusters,
                       int[] leftAssignments,
                       int rightNumClusters,
                       int[] rightAssignments)
```
    Returns the score for the relaxed correlation objective when the data matrix is divided into multiple clusters. The relaxed correlation objective measures both inter-cluster similarity and intra-cluster dissimilarity. A high score means that values within a cluster are highly similar and each cluster is highly distinct. This is to be used after clustering values in each subregion.
    
    Specified by:
    
    getSplitObjective in interface EigenCut
    
    Parameters:
    alpha - The weight given to the inter-cluster similarity.
    beta - The weight given to the intra-cluster similarity.
    leftNumClusters - The number of clusters found in the left split
    leftAssignments - The assignments for data points in the left region
    rightNumClusters - The number of clusters found in the right split
    rightAssignments - The assignments for data points in the right region
  - getMergedObjective
```
public double getMergedObjective(double alpha,
                        double beta)
```
    Returns the score for the relaxed correlation objective over the entire data set, undivided.
    
    Specified by:
    
    getMergedObjective in interface EigenCut
  - comparisonCount
```
protected static int comparisonCount(int[] clusterSizes)
```
    Returns the number of comparisons made for a cluster.
  - orthonormalize
```
protected static DoubleVector orthonormalize(DoubleVector v,
                          DoubleVector other)
```
    Returns a DoubleVector that is the orthonormalized version of v with respect to other. This orthonormalization is done by simply modifying the value of v[0] such that it balances out the similarity between v and other in all other dimensions.
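    
    The adjustment described above can be sketched as follows. This is an illustration under stated assumptions, not the class's code: it copies v rather than modifying it in place, it assumes other[0] is non-zero, and it assumes "orthonormalized" means the result is also scaled to unit length. The class name is hypothetical.

```java
// Hypothetical sketch; assumes other[0] != 0 and that the result is unit length.
final class OrthonormalizeSketch {

    static double[] orthonormalize(double[] v, double[] other) {
        double[] result = v.clone();

        // Dot product of v and other over every dimension except the first.
        double rest = 0;
        for (int i = 1; i < v.length; ++i)
            rest += v[i] * other[i];

        // Choose result[0] so that result . other == 0.
        result[0] = -rest / other[0];

        // Scale to unit length.
        double norm = 0;
        for (double value : result)
            norm += value * value;
        norm = Math.sqrt(norm);
        for (int i = 0; i < result.length; ++i)
            result[i] /= norm;
        return result;
    }
}
```

    For v = (5, 1, 1) and other = (1, 1, 1), the first entry becomes -2 before scaling, so the scaled result is (-2, 1, 1) / sqrt(6), which is orthogonal to other.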
  - computeCut
```
protected static int computeCut(Matrix matrix,
             DoubleVector rho,
             double rhoSum,
             DoubleVector matrixRowSums)
```
    Returns the index at which matrix should be cut such that the conductance between the two partitions is minimized. This is done such that the sparsity of the data matrix is maintained and the entire operation is linear with respect to the number of non-zeros in the matrix.
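    
    A sketch of how such a scan can stay linear is given below. It assumes the standard definition of conductance (the total affinity crossing the cut divided by the smaller of the two regions' volumes, where a region's volume is the sum of its rho values), dot-product affinities, and rows that are already sorted by eigenvector entry. The names, the reduced parameter list, and the meaning of the returned index are choices made for this illustration, not the signature or convention of the method above.

```java
// Hypothetical sketch; conductance definition and index convention are assumptions.
final class ConductanceCutSketch {

    /**
     * Returns k in [1, n-1] such that putting rows [0, k) on the left and
     * rows [k, n) on the right gives the lowest conductance.
     */
    static int bestCut(double[][] rows, double[] rho, double rhoSum) {
        double[] prefix = new double[rows[0].length]; // sum of the rows moved left so far
        double crossing = 0;    // affinity between the left and right regions
        double leftVolume = 0;  // sum of rho over the left region
        double bestConductance = Double.POSITIVE_INFINITY;
        int bestIndex = 1;

        for (int k = 1; k < rows.length; ++k) {
            double[] row = rows[k - 1];

            // Moving row x to the left changes the crossing affinity by
            // rho_x - 2 * (x . prefix) - (x . x); with a sparse row, only
            // its non-zero entries would need to be visited.
            double rowDotPrefix = 0;
            double rowDotRow = 0;
            for (int c = 0; c < row.length; ++c) {
                rowDotPrefix += row[c] * prefix[c];
                rowDotRow += row[c] * row[c];
                prefix[c] += row[c];
            }
            crossing += rho[k - 1] - 2 * rowDotPrefix - rowDotRow;
            leftVolume += rho[k - 1];

            double conductance =
                crossing / Math.min(leftVolume, rhoSum - leftVolume);
            if (conductance < bestConductance) {
                bestConductance = conductance;
                bestIndex = k;
            }
        }
        return bestIndex;
    }
}
```

    The incremental update is what makes rho useful as an input: the affinity between a row and the whole data set is already known, so no candidate cut requires revisiting rows that were processed earlier.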
  - computeMatrixTransposeV
```
protected static DoubleVector computeMatrixTransposeV(Matrix matrix,
                                   DoubleVector v)
```
    Returns the dot product between the transpose of a given matrix and a given vector. This method has special casing for a SparseMatrix. This method also assumes that matrix is row based and iterates over each of the values in the row before iterating over another row.
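    
    The row-by-row access pattern can be sketched with a simple sparse layout. The layout here (one array of column indices and one array of values per row) and the names are assumptions for illustration; they are not the SparseMatrix API.

```java
// Hypothetical sketch; the sparse row layout is an illustrative assumption.
final class TransposeTimesVectorSketch {

    /**
     * Computes M^T * v where row r of M holds vals[r][k] at column cols[r][k].
     * Rows are visited one at a time, and only non-zero entries are touched.
     */
    static double[] transposeTimes(int[][] cols, double[][] vals,
                                   int numColumns, double[] v) {
        double[] result = new double[numColumns];
        for (int r = 0; r < cols.length; ++r)
            for (int k = 0; k < cols[r].length; ++k)
                result[cols[r][k]] += vals[r][k] * v[r];
        return result;
    }
}
```

    Because every entry M[r][c] contributes M[r][c] * v[r] to result[c], the transpose never has to be materialized and each row is read exactly once.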
  - computeMatrixDotV
```
protected static void computeMatrixDotV(Matrix matrix,
                     DoubleVector newV,
                     DoubleVector v)
```
    Computes the dot product between a given matrix and a given vector newV. The result is stored in v. This method has special casing for when matrix is a SparseMatrix. This method also assumes that matrix is row based and iterates over each of the values in the row before iterating over another row.
  - computeMatrixRowSum
```
protected static <T extends Matrix> DoubleVector computeMatrixRowSum(T matrix)
```
    Sums the row vectors of matrix and returns the result as a vector of length matrix.columns().
