GeneralizedOrssSeed (S-Space Package 2.0.1 API)

java.lang.Object
- edu.ucla.sspace.clustering.seeding.GeneralizedOrssSeed

All Implemented Interfaces:

KMeansSeed
```
public class GeneralizedOrssSeed
extends Object
implements KMeansSeed
```
A utility class for selecting k data points as seeds from a list of n >> k data points using a general method for comparing the similarity (distance) of data points. The seeds are intended to be used as input to the k-means algorithm as the initial data points (facilities) to which other points are assigned. This implementation is based on the work of
- Rafail Ostrovsky, Yuval Rabani, Leonard Schulman, and Chaitanya Swamy. The Effectiveness of Lloyd-Type Methods for the k-Means Problem. In FOCS, 2006.
Unlike the OrssSeed implementation, this implementation uses a SimilarityFunction to compare points. In contrast, OrssSeed uses the Euclidean distance to compare points, as in the Ostrovsky et al. (2006) formulation. The properties defined in the ORSS paper are preserved if the similarity is defined as the inverse of the squared Euclidean distance, which produces the same results as the OrssSeed implementation. However, this implementation generalizes the notion of distance to inverse-similarity, which allows data to be compared using alternate methods, such as CosineSimilarity, which is frequently used in comparing text documents. Note that the similarity values returned by any SimilarityFunction used by this class must always be non-negative.
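The inverse-similarity idea described above can be illustrated with a small, self-contained sketch. It does not use the S-Space classes: the class `InverseSimilarityDemo` and all of its method names are hypothetical and exist only for this illustration. It assumes that "inverse-similarity" means 1 / similarity, and shows that when the similarity is the inverse of the squared Euclidean distance, the conversion recovers the squared Euclidean distance itself.

```java
class InverseSimilarityDemo {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredEuclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Similarity defined as the inverse of the squared Euclidean distance. */
    static double inverseSquaredEuclideanSimilarity(double[] a, double[] b) {
        // Identical points yield +Infinity (1.0 / 0.0 in double arithmetic).
        return 1.0 / squaredEuclidean(a, b);
    }

    /**
     * Cosine similarity. It is non-negative when all vector components are
     * non-negative (e.g. term counts); zero vectors produce NaN here.
     */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Converts a non-negative similarity into an inverse-similarity distance. */
    static double toDistance(double similarity) {
        if (similarity < 0)
            throw new IllegalArgumentException("similarity must be non-negative");
        // A similarity of zero maps to +Infinity; +Infinity maps to 0.
        return 1.0 / similarity;
    }

    public static void main(String[] args) {
        double[] p = {0, 0};
        double[] q = {0, 2};
        // Inverse of the inverse squared distance: the squared distance again.
        System.out.println(toDistance(inverseSquaredEuclideanSimilarity(p, q)));
        // The same conversion applied to a cosine similarity.
        System.out.println(toDistance(
            cosineSimilarity(new double[] {1, 0}, new double[] {1, 1})));
    }
}
```

The non-negativity check mirrors the requirement stated above: a negative similarity would yield a negative "distance", which has no meaning when distances are used as sampling weights.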
In addition, this class provides an overload of the chooseSeeds method that allows the input data points to be weighted. Weighting makes it possible to find seeds when the input points are representative of different sample sizes.
This implementation is in part derived from the ORSS seed implementation of Michael Shindler as a part of the Fast Streaming K-Means implementation available here.
Author:

David Jurgens

See Also:
OrssSeed

Constructor Summary

Constructors
Constructor and Description

`GeneralizedOrssSeed(SimilarityFunction simFunc)`

Method Summary

Methods
Modifier and Type	Method and Description
`DoubleVector[]`	`chooseSeeds(int k, Matrix dataPoints)` Selects `k` rows of `dataPoints` to be seeds of a k-means instance.
`DoubleVector[]`	`chooseSeeds(Matrix dataPoints, int k, int[] weights)` Selects `k` rows of `dataPoints`, weighted by the specified amount, to be seeds of a k-means instance.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - GeneralizedOrssSeed
```
public GeneralizedOrssSeed(SimilarityFunction simFunc)
```
- Method Detail
  - chooseSeeds
```
public DoubleVector[] chooseSeeds(int k,
                         Matrix dataPoints)
```
    Selects k rows of dataPoints to be seeds of a k-means instance. If more seeds are requested than are available, all possible rows are returned.
    
    Specified by:
    
    chooseSeeds in interface KMeansSeed
    
    Parameters:
    dataPoints - a matrix whose rows are to be evaluated and from which k data points will be selected
    k - the number of data points (rows) to select
    
    Returns:
    the set of rows that were selected
  - chooseSeeds
```
public DoubleVector[] chooseSeeds(Matrix dataPoints,
                         int k,
                         int[] weights)
```
    Selects k rows of dataPoints, weighted by the specified amount, to be seeds of a k-means instance. If more seeds are requested than are available, all possible rows are returned.
    
    Parameters:
    dataPoints - a matrix whose rows are to be evaluated and from which k data points will be selected
    k - the number of data points (rows) to select
    weights - a set of scalar int weights that reflect the importance of each data point.
    
    Returns:
    the set of rows that were selected
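To make the role of the weights concrete, the following is a simplified, self-contained sketch of weighted seed selection. It is not the S-Space implementation and not the full ORSS procedure (which, for example, selects its first two centers as a pair): it assumes a plain D²-style sampling scheme in which each unchosen point is picked with probability proportional to its weight times its inverse-similarity distance to the nearest seed chosen so far. The class `WeightedSeedSketch`, its methods, and the cap of `1e12` on distances are all hypothetical choices made for this illustration.

```java
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

class WeightedSeedSketch {

    /** Example similarity: inverse of the squared Euclidean distance. */
    static double inverseSquaredEuclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return 1.0 / sum; // +Infinity for identical points
    }

    /** Inverse-similarity distance, capped so that sampling masses stay finite. */
    static double distance(double similarity) {
        if (similarity < 0)
            throw new IllegalArgumentException("similarity must be non-negative");
        return Math.min(1.0 / similarity, 1e12);
    }

    /** Returns the row indices of min(k, n) distinct seeds. */
    static int[] chooseSeedIndices(double[][] points, int[] weights, int k,
                                   ToDoubleBiFunction<double[], double[]> sim,
                                   Random rng) {
        int n = points.length;
        int numSeeds = Math.min(k, n); // more seeds requested than rows: use all rows
        int[] seeds = new int[numSeeds];
        boolean[] chosen = new boolean[n];
        double[] nearest = new double[n]; // distance to the closest seed so far
        double[] mass = new double[n];    // unnormalized selection probabilities

        for (int s = 0; s < numSeeds; s++) {
            for (int i = 0; i < n; i++) {
                if (chosen[i])
                    mass[i] = 0;
                else // the first seed is drawn by weight alone
                    mass[i] = (s == 0) ? weights[i] : weights[i] * nearest[i];
            }
            int pick = sample(mass, chosen, rng);
            chosen[pick] = true;
            seeds[s] = pick;
            // Update each point's distance to its nearest seed.
            for (int i = 0; i < n; i++) {
                double d = distance(sim.applyAsDouble(points[i], points[pick]));
                if (s == 0 || d < nearest[i])
                    nearest[i] = d;
            }
        }
        return seeds;
    }

    /** Draws an unchosen index with probability proportional to its mass. */
    static int sample(double[] mass, boolean[] chosen, Random rng) {
        double total = 0;
        for (double m : mass)
            total += m;
        double r = rng.nextDouble() * total;
        int fallback = -1;
        for (int i = 0; i < mass.length; i++) {
            if (chosen[i])
                continue;
            if (fallback < 0)
                fallback = i; // used when all remaining mass is zero
            r -= mass[i];
            if (r < 0)
                return i;
        }
        return fallback;
    }
}
```

A call such as `WeightedSeedSketch.chooseSeedIndices(points, weights, 2, WeightedSeedSketch::inverseSquaredEuclidean, new Random(42))` returns two distinct row indices. In this sketch, a point with a larger weight is proportionally more likely to be drawn, which is one way a single row can stand in for many underlying samples.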
