public class StreamingWordsi extends BaseWordsi
Wordsi implementation that uses streaming, or online, clustering algorithms. This model immediately assigns each context vector to one of the clusters generated for its focus word, or creates a new cluster if needed. After processing is complete, the AssignmentReporter is informed of all the data point assignments made by the clustering algorithm for each word.

| Constructor and Description |
|---|
| StreamingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Generator<OnlineClustering<SparseDoubleVector>> clusterGenerator, AssignmentReporter reporter, int numClusters) Creates a new StreamingWordsi. |
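
The constructor takes a clusterGenerator rather than a single clustering instance, which suggests that each focus word receives its own independent clustering. The Java sketch below shows one way that per-word bookkeeping could work. It is an illustration under assumptions, not the StreamingWordsi source: Generator, RecordingClustering and PerWordClusters are hypothetical stand-ins, and contexts are plain double[] arrays instead of SparseDoubleVector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical factory interface standing in for the library's Generator type. */
interface Generator<T> {
    T generate();
}

/** Hypothetical placeholder clustering that only records the contexts it receives. */
class RecordingClustering {
    private final List<double[]> points = new ArrayList<>();

    synchronized void addVector(double[] context) {
        points.add(context);
    }

    synchronized int size() {
        return points.size();
    }
}

/** Keeps one clustering instance per focus word, created lazily on first use. */
class PerWordClusters {
    private final Map<String, RecordingClustering> clusterMap = new ConcurrentHashMap<>();
    private final Generator<RecordingClustering> clusterGenerator;

    PerWordClusters(Generator<RecordingClustering> clusterGenerator) {
        this.clusterGenerator = clusterGenerator;
    }

    void handleContextVector(String focusKey, double[] context) {
        // computeIfAbsent runs atomically on a ConcurrentHashMap, so two threads
        // seeing the same new word still end up sharing a single clustering.
        clusterMap.computeIfAbsent(focusKey, k -> clusterGenerator.generate())
                  .addVector(context);
    }

    int numWords() {
        return clusterMap.size();
    }

    int numContexts(String focusKey) {
        RecordingClustering clustering = clusterMap.get(focusKey);
        return clustering == null ? 0 : clustering.size();
    }
}
```

Passing RecordingClustering::new as the generator, for example, gives every focus word a fresh, empty clustering the first time one of its contexts arrives.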
| Modifier and Type | Method and Description | 
|---|---|
| SparseDoubleVector | getVector(String term) Returns the semantic vector for the provided word. |
| Set<String> | getWords() Returns the set of words that are represented in this semantic space. |
| void | handleContextVector(String focusKey, String secondaryKey, SparseDoubleVector context) Performs some operation with context, which can be indexed by either focusKey, secondaryKey, or both. |
| void | processSpace(Properties props) Once all the documents have been processed, performs any post-processing steps on the data. |
Methods inherited from class BaseWordsi: acceptWord, getSpaceName, getVectorLength, processDocument

public StreamingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Generator<OnlineClustering<SparseDoubleVector>> clusterGenerator, AssignmentReporter reporter, int numClusters)
Creates a new StreamingWordsi.
acceptedWords - The set of words that Wordsi should represent. This may be null or empty.
extractor - The ContextExtractor used to parse documents.
trackSecondaryKeys - If true, cluster assignments and secondary keys will be tracked. If this is false, the AssignmentReporter will not be used.
clusterGenerator - A Generator responsible for creating new instances of an OnlineClustering algorithm.
reporter - The AssignmentReporter responsible for generating a report that details the cluster assignments. This may be null. If trackSecondaryKeys is false, this is not used.

public Set<String> getWords()
Returns the set of words that are represented in this semantic space.

public SparseDoubleVector getVector(String term)
Returns the semantic vector for the provided word.
term - a word that may be in the semantic space
Returns: the vector for the provided word, or null if the word is not in the space.

public void handleContextVector(String focusKey, String secondaryKey, SparseDoubleVector context)
Performs some operation with context, which can be indexed by either focusKey, secondaryKey, or both. This operation will likely assign context to some cluster immediately, or store it so that it may be clustered with all other context vectors generated for focusKey.
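
For a streaming model, the immediate assignment described above can be sketched with a simple threshold rule: compare a new context to every existing centroid, join the most similar cluster if its cosine similarity clears a threshold, and otherwise open a new cluster. ThresholdOnlineClustering, its threshold and its maxClusters cap are hypothetical; the OnlineClustering implementations actually supplied through clusterGenerator may use different criteria.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical threshold-based online clustering: each vector is assigned as
 * soon as it arrives, either to the most similar existing cluster or to a new one.
 * All vectors are assumed to have the same length.
 */
class ThresholdOnlineClustering {
    private final double threshold;  // minimum cosine similarity needed to join a cluster
    private final int maxClusters;   // hypothetical cap on how many clusters may be opened
    private final List<double[]> sums = new ArrayList<>();  // running sum of each cluster's members
    private final List<Integer> sizes = new ArrayList<>();  // number of members per cluster

    ThresholdOnlineClustering(double threshold, int maxClusters) {
        this.threshold = threshold;
        this.maxClusters = maxClusters;
    }

    /** Assigns the vector immediately and returns the index of its cluster. */
    int addVector(double[] v) {
        int best = -1;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < sums.size(); i++) {
            double sim = cosine(sums.get(i), v);
            if (sim > bestSim) {
                bestSim = sim;
                best = i;
            }
        }
        // The first vector always opens a cluster; later ones open a new cluster
        // only if nothing is similar enough and the cap has not been reached.
        if (best == -1 || (bestSim < threshold && sums.size() < maxClusters)) {
            sums.add(v.clone());
            sizes.add(1);
            return sums.size() - 1;
        }
        // Otherwise fold the vector into the most similar cluster.
        double[] sum = sums.get(best);
        for (int d = 0; d < sum.length; d++) {
            sum[d] += v[d];
        }
        sizes.set(best, sizes.get(best) + 1);
        return best;
    }

    int size() {
        return sums.size();
    }

    int clusterSize(int i) {
        return sizes.get(i);
    }

    private static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int d = 0; d < a.length; d++) {
            dot += a[d] * b[d];
            normA += a[d] * a[d];
            normB += b[d] * b[d];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Keeping a running sum instead of a mean is enough for this rule because cosine similarity ignores vector length, so the sum points in the same direction as the cluster centroid.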
 
The secondaryKey does not need to be used, but some experiments may require it, such as the SenseEval/SemEval evaluation or pseudo-word disambiguation. For SenseEval/SemEval evaluations, a SenseEvalContextExtractor should be used, which will provide the context id as the secondaryKey; reporting should be done with a SenseEvalReporter. For pseudo-word disambiguation/discrimination, a PseudoWordContextExtractor should be used, which will create pseudo-words for some set of tokens. This extractor will use the pseudo-word as the focusKey and the original token as the secondaryKey.
focusKey - The primary key for context
context - a SparseDoubleVector that represents a single context for a word

public void processSpace(Properties props)
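Once all the documents have been processed, performs any post-processing steps on the data. Any exposed parameters of the algorithm may be set through the properties argument.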
 
 By general contract, once this method has been called, processDocument will not be called again.
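
A sketch of what this post-processing might look like: each word's final centroids are published as sense vectors that getWords and getVector then expose, and further input is rejected once processSpace has run, in line with the contract above. The SenseSpaceSketch class and its sense-naming scheme (word, word-1, word-2, ...) are assumptions for illustration, not documented StreamingWordsi behavior.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the post-processing step; not the actual S-Space code. */
class SenseSpaceSketch {
    private final Map<String, List<double[]>> centroidsByWord = new HashMap<>();
    private final Map<String, double[]> wordSpace = new HashMap<>();
    private boolean processed = false;

    /** Stands in for the per-word clusters accumulated while documents are read. */
    void addCentroids(String word, List<double[]> centroids) {
        // Mirrors the contract that no more input arrives after processSpace().
        if (processed) {
            throw new IllegalStateException("no input is accepted after processSpace()");
        }
        centroidsByWord.put(word, centroids);
    }

    /** Publishes one vector per sense; the naming scheme is an assumption. */
    void processSpace() {
        for (Map.Entry<String, List<double[]>> entry : centroidsByWord.entrySet()) {
            List<double[]> centroids = entry.getValue();
            for (int i = 0; i < centroids.size(); i++) {
                String key = (i == 0) ? entry.getKey() : entry.getKey() + "-" + i;
                wordSpace.put(key, centroids.get(i));
            }
        }
        processed = true;
    }

    /** Read-only view of every sense key in the space. */
    Set<String> getWords() {
        return Collections.unmodifiableSet(wordSpace.keySet());
    }

    /** Returns null when the term is not in the space, as getVector documents. */
    double[] getVector(String term) {
        return wordSpace.get(term);
    }
}
```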
props - a set of properties and values that may be used to configure any exposed parameters of the algorithm.

Copyright © 2012. All Rights Reserved.