public class WaitingWordsi extends BaseWordsi
A Wordsi implementation that performs batch clustering. Each context vector is stored and later clustered using a Clustering algorithm.
Constructor and Description |
---|
WaitingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Clustering clustering, AssignmentReporter reporter) Creates a new WaitingWordsi. |
WaitingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Clustering clustering, AssignmentReporter reporter, int numClusters) Creates a new WaitingWordsi. |
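The two constructors differ only in whether a cluster count is supplied. The following is a minimal sketch of how an optional count might steer the clustering call. `ToyClustering`, `RoundRobinClustering`, and `ClusterCountSketch` are hypothetical stand-ins invented for illustration; they are not the library's `Clustering` interface or the `WaitingWordsi` source, and the round-robin assignment is a placeholder rather than a real clustering algorithm.

```java
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical stand-in for a clustering algorithm (an assumption, not the real Clustering API). */
interface ToyClustering {
    /** Assigns a cluster label to each point, choosing the number of clusters itself. */
    int[] cluster(List<double[]> points);

    /** Assigns a cluster label to each point using exactly numClusters labels. */
    int[] cluster(List<double[]> points, int numClusters);
}

/** Placeholder algorithm: labels points round-robin so the example stays self-contained. */
final class RoundRobinClustering implements ToyClustering {
    @Override
    public int[] cluster(List<double[]> points) {
        // Toy heuristic for the "unset" case: about sqrt(n) clusters, at least one.
        int k = Math.max(1, (int) Math.sqrt(points.size()));
        return cluster(points, k);
    }

    @Override
    public int[] cluster(List<double[]> points, int numClusters) {
        int[] labels = new int[points.size()];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = i % numClusters;
        }
        return labels;
    }
}

/** Illustrative holder that mirrors the two constructor shapes. */
final class ClusterCountSketch {
    private final ToyClustering clustering;
    private final OptionalInt numClusters;

    /** Cluster count left unset: the algorithm must decide on one. */
    ClusterCountSketch(ToyClustering clustering) {
        this.clustering = clustering;
        this.numClusters = OptionalInt.empty();
    }

    /** Cluster count fixed by the caller. */
    ClusterCountSketch(ToyClustering clustering, int numClusters) {
        if (numClusters < 1) {
            throw new IllegalArgumentException("numClusters must be positive");
        }
        this.clustering = clustering;
        this.numClusters = OptionalInt.of(numClusters);
    }

    /** Dispatches to the matching overload depending on whether a count was given. */
    int[] assign(List<double[]> points) {
        return numClusters.isPresent()
            ? clustering.cluster(points, numClusters.getAsInt())
            : clustering.cluster(points);
    }
}
```

Keeping the count in an `OptionalInt` makes the "unset" case explicit instead of relying on a sentinel such as 0 or -1; whether the actual class does this is not stated in this page.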
Modifier and Type | Method and Description |
---|---|
SparseDoubleVector | getVector(String term) Returns the semantic vector for the provided word. |
Set<String> | getWords() Returns the set of words that are represented in this semantic space. |
void | handleContextVector(String focusKey, String secondaryKey, SparseDoubleVector context) Adds the context vector to the end of the list of context vectors associated with focusKey. |
void | processSpace(Properties props) Once all the documents have been processed, performs any post-processing steps on the data. |
Methods inherited from class BaseWordsi: acceptWord, getSpaceName, getVectorLength, processDocument
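The batch behaviour described above (store every context vector first, cluster only after all documents have been seen) can be sketched as follows. `BatchContextStore` is a hypothetical class written for illustration, not the `WaitingWordsi` source. It uses plain `double[]` arrays instead of `SparseDoubleVector`, omits the secondary key, and replaces the clustering step with a single centroid per word, purely as a placeholder. The guard that rejects vectors after processing is also an assumption; this page only states the general contract that `processDocument` is not called again.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative batch store: accumulate first, post-process once at the end. */
final class BatchContextStore {
    // Context vectors grouped by focus key, kept in arrival order.
    private final Map<String, List<double[]>> contexts = new LinkedHashMap<>();
    private boolean processed = false;

    /** Appends the context vector to the end of the list kept for focusKey. */
    void handleContextVector(String focusKey, double[] context) {
        if (processed) {
            // Assumed guard: no new contexts once post-processing has run.
            throw new IllegalStateException("space has already been processed");
        }
        contexts.computeIfAbsent(focusKey, k -> new ArrayList<>()).add(context);
    }

    /**
     * Post-processing step. A real implementation would hand each word's
     * vectors to a clustering algorithm; this sketch averages them instead.
     * Assumes all vectors for a word have the same length.
     */
    Map<String, double[]> processSpace() {
        processed = true;
        Map<String, double[]> centroids = new LinkedHashMap<>();
        for (Map.Entry<String, List<double[]>> entry : contexts.entrySet()) {
            List<double[]> vectors = entry.getValue();
            double[] mean = new double[vectors.get(0).length];
            for (double[] v : vectors) {
                for (int i = 0; i < mean.length; i++) {
                    mean[i] += v[i];
                }
            }
            for (int i = 0; i < mean.length; i++) {
                mean[i] /= vectors.size();
            }
            centroids.put(entry.getKey(), mean);
        }
        return centroids;
    }
}
```

Because nothing is computed until `processSpace` runs, memory use grows with the number of stored contexts, which is the usual trade-off of a batch approach compared with clustering each context as it arrives.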
public WaitingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Clustering clustering, AssignmentReporter reporter)
Creates a new WaitingWordsi. The number of clusters is left unset, which requires that the Clustering algorithm be able to decide on an appropriate number of clusters.
acceptedWords - The set of words that Wordsi should represent. This may be null or empty.
extractor - The ContextExtractor used to parse documents.
trackSecondaryKeys - If true, cluster assignments and secondary keys will be tracked. If this is false, the AssignmentReporter will not be used.
clustering - The Clustering algorithm to use on each data set.
reporter - The AssignmentReporter responsible for generating a report that details the cluster assignments. This may be null. If trackSecondaryKeys is false, this is not used.
public WaitingWordsi(Set<String> acceptedWords, ContextExtractor extractor, Clustering clustering, AssignmentReporter reporter, int numClusters)
Creates a new WaitingWordsi.
acceptedWords - The set of words that Wordsi should represent. This may be null or empty.
extractor - The ContextExtractor used to parse documents.
clustering - The Clustering algorithm to use on each data set.
reporter - The AssignmentReporter responsible for generating a report that details the cluster assignments. This may be null. If trackSecondaryKeys is false, this is not used.
numClusters - Specifies the number of clusters to generate for each term.
public Set<String> getWords()
Returns the set of words that are represented in this semantic space.
public SparseDoubleVector getVector(String term)
Returns the semantic vector for the provided word, or null if the word was not in the space.
term - a word that may be in the semantic space
public void handleContextVector(String focusKey, String secondaryKey, SparseDoubleVector context)
Adds the context vector to the end of the list of context vectors associated with focusKey.
focusKey - The primary key for the context vector
context - a SparseDoubleVector that represents a single context for a word
public void processSpace(Properties props)
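Once all the documents have been processed, performs any post-processing steps on the data. Exposed parameters may be configured through the properties argument. By general contract, once this method has been called, processDocument will not be called again.
props - a set of properties and values that may be used to configure any exposed parameters of the algorithm.
Copyright © 2012. All Rights Reserved.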