StreamingWordsi (S-Space Package 2.0.1 API)

java.lang.Object
- edu.ucla.sspace.wordsi.BaseWordsi
- - edu.ucla.sspace.wordsi.StreamingWordsi

All Implemented Interfaces:

SemanticSpace, Wordsi
```
public class StreamingWordsi
extends BaseWordsi
```
A Wordsi implementation that uses streaming (online) clustering algorithms. This model immediately assigns each context vector to one of the clusters generated for a particular focus word, or creates a new cluster if needed. After processing is complete, the AssignmentReporter is informed of all the data point assignments made by the clustering algorithm for each word.
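
The immediate-assignment behavior described above can be illustrated with a minimal, self-contained sketch. It is not the S-Space implementation: the class `OnlineClusteringSketch`, its cosine-similarity threshold, its cluster limit, and the use of `Map<Integer, Double>` as a stand-in for `SparseDoubleVector` are all assumptions made for illustration. Each incoming vector either joins the most similar existing cluster or opens a new one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical online clustering sketch; not an S-Space class. */
final class OnlineClusteringSketch {
    // One summed centroid per cluster; Map<Integer, Double> stands in for a sparse vector.
    private final List<Map<Integer, Double>> centroids = new ArrayList<>();
    private final double threshold;  // assumed: minimum cosine similarity to join a cluster
    private final int maxClusters;   // assumed: upper bound on the number of clusters

    OnlineClusteringSketch(double threshold, int maxClusters) {
        this.threshold = threshold;
        this.maxClusters = maxClusters;
    }

    /** Assigns the vector to a cluster right away and returns that cluster's id. */
    int assign(Map<Integer, Double> vector) {
        int best = -1;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double sim = cosine(centroids.get(i), vector);
            if (sim > bestSim) {
                bestSim = sim;
                best = i;
            }
        }
        // Open a new cluster if none exists yet, or if nothing is similar
        // enough and the cluster limit has not been reached.
        if (best == -1 || (bestSim < threshold && centroids.size() < maxClusters)) {
            centroids.add(new HashMap<>(vector));
            return centroids.size() - 1;
        }
        // Otherwise fold the vector into the most similar centroid.
        Map<Integer, Double> centroid = centroids.get(best);
        for (Map.Entry<Integer, Double> e : vector.entrySet()) {
            centroid.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return best;
    }

    int size() {
        return centroids.size();
    }

    /** Cosine similarity of two sparse vectors; returns 0 if either is all zeros. */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        OnlineClusteringSketch clusters = new OnlineClusteringSketch(0.5, 2);
        System.out.println(clusters.assign(Collections.singletonMap(0, 1.0)));
        System.out.println(clusters.assign(Collections.singletonMap(1, 1.0)));
        System.out.println("clusters: " + clusters.size());
    }
}
```

In a model like the one described here, one such clustering object would presumably be kept per focus word, so that contexts for different words never share clusters.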

Author:

Keith Stevens

Constructor Summary

Constructors
Constructor and Description
`StreamingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Generator<OnlineClustering<SparseDoubleVector>> clusterGenerator, AssignmentReporter reporter, int numClusters)` Creates a new `StreamingWordsi`.

Method Summary

Methods
Modifier and Type	Method and Description
`SparseDoubleVector`	`getVector(String term)` Returns the semantic vector for the provided word.
`Set<String>`	`getWords()` Returns the set of words that are represented in this semantic space.
`void`	`handleContextVector(String focusKey, String secondaryKey, SparseDoubleVector context)` Performs some operation with the context vector (`context`), which can be indexed by the primary key (`focusKey`), `secondaryKey`, or both.
`void`	`processSpace(Properties props)` Once all the documents have been processed, performs any post-processing steps on the data.

Methods inherited from class edu.ucla.sspace.wordsi.BaseWordsi
acceptWord, getSpaceName, getVectorLength, processDocument

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - StreamingWordsi
```
public StreamingWordsi(Set<String> acceptedWords,
               ContextExtractor extractor,
               Generator<OnlineClustering<SparseDoubleVector>> clusterGenerator,
               AssignmentReporter reporter,
               int numClusters)
```
    Creates a new StreamingWordsi.
    
    Parameters:
    acceptedWords - The set of words that Wordsi should represent. This may be null or empty.
    extractor - The ContextExtractor used to parse documents
    trackSecondaryKeys - If true, cluster assignments and secondary keys will be tracked. If this is false, the AssignmentReporter will not be used.
    clusterGenerator - A Generator responsible for creating new instances of an OnlineClustering algorithm.
    reporter - The AssignmentReporter responsible for generating a report that details the cluster assignments. This may be null. If trackSecondaryKeys is false, this is not used.
- Method Detail
  - getWords
```
public Set<String> getWords()
```
    Returns the set of words that are represented in this semantic space.
    
    Returns:
    the set of words that are represented in this semantic space.
  - getVector
```
public SparseDoubleVector getVector(String term)
```
    Returns the semantic vector for the provided word.
    
    Parameters:
    term - a word that may be in the semantic space
    
    Returns:
    The Vector for the provided word or null if the word was not in the space.
  - handleContextVector
```
public void handleContextVector(String focusKey,
                       String secondaryKey,
                       SparseDoubleVector context)
```
    Performs some operation with the context vector (context), which can be indexed by the primary key (focusKey), secondaryKey, or both. This operation will likely assign the context vector to some cluster immediately or store it so that it may be clustered with all other context vectors generated for the primary key.
    The secondaryKey does not need to be used, but some experiments may require it, such as the SenseEval/SemEval evaluation or pseudo-word disambiguation. For SenseEval/SemEval evaluations, a SenseEvalContextExtractor should be used, which will provide the context id as the secondaryKey; reporting should be done with a SenseEvalReporter. For pseudo-word disambiguation/discrimination, a PseudoWordContextExtractor should be used, which will create pseudo-words for some set of tokens. This extractor will use the pseudo-word for the primaryKey and the original token as the secondaryKey.
    
    Parameters:
    focusKey - The primary key for contextVector
    context - a SparseDoubleVector that represents a single context for a word
  - processSpace
```
public void processSpace(Properties props)
```
    Once all the documents have been processed, performs any post-processing steps on the data. An algorithm should treat this as a no-op if no post-processing is required. Callers may specify the values for any exposed parameters using the properties argument.
    By general contract, once this method has been called, processDocument will not be called again.
    
    Parameters:
    props - a set of properties and values that may be used to configure any exposed parameters of the algorithm.
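
The role of the two keys passed to handleContextVector can be sketched with a small bookkeeping class. This is an illustration under stated assumptions, not S-Space code: `AssignmentLogSketch`, its `record` and `report` methods, the tab-separated report format, and the sample keys are all hypothetical. The sketch stores each cluster assignment under its focus key together with the secondary key, then emits the full list once processing is finished, which mirrors how assignments are reported only after all documents have been seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical assignment log keyed by focus word; not an S-Space class. */
final class AssignmentLogSketch {
    // focus key -> "secondaryKey<TAB>clusterId" records, in arrival order
    private final Map<String, List<String>> byFocusKey = new TreeMap<>();

    /** Records that a context for focusKey, tagged with secondaryKey, joined clusterId. */
    void record(String focusKey, String secondaryKey, int clusterId) {
        byFocusKey.computeIfAbsent(focusKey, k -> new ArrayList<>())
                  .add(secondaryKey + "\t" + clusterId);
    }

    /** Returns one tab-separated line per assignment, grouped by focus key in sorted order. */
    List<String> report() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : byFocusKey.entrySet()) {
            for (String rec : entry.getValue()) {
                lines.add(entry.getKey() + "\t" + rec);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        AssignmentLogSketch log = new AssignmentLogSketch();
        // Sample keys are made up; in an evaluation the secondary key might be a context id.
        log.record("bank", "context-1", 0);
        log.record("bank", "context-2", 1);
        log.record("apple", "context-3", 0);
        log.report().forEach(System.out::println);
    }
}
```

Keeping the secondary key alongside the cluster id is what would let an evaluation map each labeled context back to the cluster it was placed in.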
