public class WaitingWordsi extends BaseWordsi
Wordsi implementation that performs batch clustering. Each context
vector is stored and later clustered using a Clustering algorithm.

| Constructor and Description |
|---|
| `WaitingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Clustering clustering, AssignmentReporter reporter)` Creates a new `WaitingWordsi` that lets the `Clustering` algorithm decide on the number of clusters. |
| `WaitingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Clustering clustering, AssignmentReporter reporter, int numClusters)` Creates a new `WaitingWordsi` that generates `numClusters` clusters for each term. |
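The two constructors differ only in whether the number of clusters is fixed up front or left for the clustering algorithm to decide. The following is a minimal sketch of how that choice could be dispatched, assuming a toy clustering interface. `SimpleClusterer` and `ClusterCountSketch` are hypothetical names, not the `Clustering` interface documented here, and `OptionalInt` stands in for "number of clusters left unset".

```java
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical stand-in for a clustering algorithm; not the real Clustering interface. */
interface SimpleClusterer {
    /** Clusters the rows, letting the algorithm decide how many clusters to form. */
    int[] cluster(List<double[]> rows);

    /** Clusters the rows into exactly numClusters groups. */
    int[] cluster(List<double[]> rows, int numClusters);
}

/** Hypothetical sketch mirroring the two constructors: cluster count unset or fixed. */
class ClusterCountSketch {
    private final SimpleClusterer clusterer;
    // Empty when the number of clusters is left unset.
    private final OptionalInt numClusters;

    ClusterCountSketch(SimpleClusterer clusterer) {
        this.clusterer = clusterer;
        this.numClusters = OptionalInt.empty();
    }

    ClusterCountSketch(SimpleClusterer clusterer, int numClusters) {
        if (numClusters < 1) {
            throw new IllegalArgumentException("numClusters must be positive");
        }
        this.clusterer = clusterer;
        this.numClusters = OptionalInt.of(numClusters);
    }

    /** Returns one cluster id per context vector of a single term. */
    int[] assign(List<double[]> contexts) {
        // A fixed count is passed through; otherwise the algorithm chooses.
        return numClusters.isPresent()
            ? clusterer.cluster(contexts, numClusters.getAsInt())
            : clusterer.cluster(contexts);
    }
}
```

Keeping the "unset" case explicit (rather than using a magic value such as 0) makes it clear that the clustering algorithm must be able to pick a cluster count on its own in that mode.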
| Modifier and Type | Method and Description |
|---|---|
| `SparseDoubleVector` | `getVector(String term)` Returns the semantic vector for the provided word. |
| `Set<String>` | `getWords()` Returns the set of words that are represented in this semantic space. |
| `void` | `handleContextVector(String focusKey, String secondaryKey, SparseDoubleVector context)` Adds the context vector to the end of the list of context vectors associated with `focusKey`. |
| `void` | `processSpace(Properties props)` Once all the documents have been processed, performs any post-processing steps on the data. |
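The class description and the method summary suggest a store-then-cluster design: `handleContextVector` only appends a context to its word's list, and the stored contexts are presumably clustered in one batch once `processSpace` is called. The following is a minimal sketch of that pattern under stated assumptions: `WaitingSketch` and `assignmentsFor` are hypothetical, plain `double[]` arrays stand in for `SparseDoubleVector`, and a `Function` stands in for the `Clustering` algorithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical batch-clustering sketch: buffer contexts per word, cluster them at the end. */
class WaitingSketch {
    // Every context vector seen so far, grouped by its focus word, in arrival order.
    private final Map<String, List<double[]>> contexts = new HashMap<>();
    // One cluster id per stored context, filled in by processSpace().
    private final Map<String, int[]> assignments = new HashMap<>();
    // Hypothetical clustering function: rows in, one cluster id per row out.
    private final Function<List<double[]>, int[]> clusterer;
    private boolean processed = false;

    WaitingSketch(Function<List<double[]>, int[]> clusterer) {
        this.clusterer = clusterer;
    }

    /** Appends the context to the end of focusKey's list; nothing is clustered yet. */
    void handleContextVector(String focusKey, double[] context) {
        // This sketch rejects late input, echoing the processSpace contract.
        if (processed) {
            throw new IllegalStateException("space already processed");
        }
        contexts.computeIfAbsent(focusKey, k -> new ArrayList<>()).add(context);
    }

    /** Clusters every word's buffered contexts in a single batch pass. */
    void processSpace() {
        for (Map.Entry<String, List<double[]>> e : contexts.entrySet()) {
            assignments.put(e.getKey(), clusterer.apply(e.getValue()));
        }
        processed = true;
    }

    /** Words for which at least one context has been stored. */
    Set<String> getWords() {
        return contexts.keySet();
    }

    /** Cluster ids for a word's contexts, or null before processSpace() has run. */
    int[] assignmentsFor(String word) {
        return assignments.get(word);
    }
}
```

Waiting until all contexts are available lets the clustering algorithm see each word's full data set at once, at the cost of keeping every context vector in memory until `processSpace` runs.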
Methods inherited from class `BaseWordsi`: `acceptWord`, `getSpaceName`, `getVectorLength`, `processDocument`

`public WaitingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Clustering clustering, AssignmentReporter reporter)`

Creates a new `WaitingWordsi`. The number of clusters is left unset, which requires that the `Clustering` algorithm be able to decide on an appropriate number of clusters.

Parameters:
- `acceptedWords` - The set of words that Wordsi should represent. This may be `null` or empty.
- `extractor` - The `ContextExtractor` used to parse documents.
- `clustering` - The `Clustering` algorithm to use on each data set.
- `reporter` - The `AssignmentReporter` responsible for generating a report that details the cluster assignments. This may be `null`.

`public WaitingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Clustering clustering, AssignmentReporter reporter, int numClusters)`
Creates a new `WaitingWordsi` that generates a fixed number of clusters for each term.

Parameters:
- `acceptedWords` - The set of words that Wordsi should represent. This may be `null` or empty.
- `extractor` - The `ContextExtractor` used to parse documents.
- `clustering` - The `Clustering` algorithm to use on each data set.
- `reporter` - The `AssignmentReporter` responsible for generating a report that details the cluster assignments. This may be `null`.
- `numClusters` - Specifies the number of clusters to generate for each term.

`public Set<String> getWords()`

Returns the set of words that are represented in this semantic space.
`public SparseDoubleVector getVector(String term)`

Returns the semantic vector for the provided word.

Parameters:
- `term` - a word that may be in the semantic space

Returns: the vector for the provided word, or `null` if the word was not in the space.

`public void handleContextVector(String focusKey, String secondaryKey, SparseDoubleVector context)`
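Adds the context vector to the end of the list of context vectors associated with `focusKey`.

Parameters:
- `focusKey` - The primary key for `context`.
- `context` - a `SparseDoubleVector` that represents a single context for a word.

`public void processSpace(Properties props)`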
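Once all the documents have been processed, performs any post-processing steps on the data, taking any configuration from the `props` argument.
By general contract, once this method has been called, `processDocument` will not be called again.

Parameters:
- `props` - a set of properties and values that may be used to configure any exposed parameters of the algorithm.

Copyright © 2012. All Rights Reserved.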