public class EvaluationWordsi extends BaseWordsi
A Wordsi implementation to be used for evaluations. The word senses are not updated during processing. Instead, each generated context vector is compared to the existing word senses and labeled with the id of the most similar sense. This sense labeling is passed directly to the AssignmentReporter for each context vector generated.
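The labeling step above amounts to a nearest-neighbour search over the existing senses. Below is a minimal sketch. It assumes cosine similarity as the measure of "most similar", since the documentation does not name one, and it uses plain double[] arrays in place of SparseDoubleVector. NearestSenseLabeler and its methods are hypothetical names and are not part of the library.

```java
import java.util.Map;

/**
 * Hypothetical sketch: label a context vector with the most similar existing
 * sense. Cosine similarity is an assumption; the documentation only says
 * "most similar". Dense double[] arrays stand in for SparseDoubleVector.
 */
final class NearestSenseLabeler {

    /** Cosine similarity of two equal-length vectors; 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Returns the key of the sense most similar to the context, or null if no
     * senses are given. Ties go to the first sense in iteration order.
     * The sense vectors are only read, never modified.
     */
    static String label(Map<String, double[]> senses, double[] context) {
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : senses.entrySet()) {
            double sim = cosine(e.getValue(), context);
            if (sim > bestSim) {
                bestSim = sim;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Because the sense map is only read and never written, the sketch keeps the stated property that word senses are not updated during processing.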

Word senses must be provided by a SemanticSpace. For any polysemous word, the first sense must be keyed by the raw word, and every other sense must be keyed by the raw word plus "-senseNumber", where senseNumber is an integer starting at 1 for the second sense and going up to N-1 for the last of the word's N senses. For example, a word with three senses is keyed as bank, bank-1, and bank-2.

Constructor and Description |
---|
EvaluationWordsi(Set<String> acceptedWords, ContextExtractor extractor, SemanticSpace sspace, AssignmentReporter reporter): Creates a new EvaluationWordsi. |
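The sense-keying convention from the class description can be sketched as a small helper. It maps a sense index to its key and recovers a word's sense keys from a space's word set, such as the set returned by getWords(). SenseKeys, keyFor, and sensesOf are hypothetical names. The sketch assumes that sense numbers are contiguous, as the convention implies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper illustrating the sense-keying convention: sense 0 is
 * keyed by the raw word, and sense i (1 <= i <= N-1) by "word-i".
 */
final class SenseKeys {

    /** Key for the sense at the given zero-based index. */
    static String keyFor(String word, int senseIndex) {
        return senseIndex == 0 ? word : word + "-" + senseIndex;
    }

    /**
     * Collects the sense keys of a word that are present in a space's word
     * set, stopping at the first missing index (assumes contiguous numbering).
     */
    static List<String> sensesOf(String word, Set<String> spaceWords) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; spaceWords.contains(keyFor(word, i)); i++) {
            keys.add(keyFor(word, i));
        }
        return keys;
    }
}
```

A monosemous word yields a single key (the raw word), and a word absent from the space yields an empty list.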

Modifier and Type | Method and Description |
---|---|
Vector | getVector(String term): Returns the semantic vector for the provided word. |
Set<String> | getWords(): Returns the set of words that are represented in this semantic space. |
void | handleContextVector(String focusKey, String secondaryKey, SparseDoubleVector context): Performs some operation with context, which can be indexed by focusKey, secondaryKey, or both. |
void | processSpace(Properties props): Once all the documents have been processed, performs any post-processing steps on the data. |

Methods inherited from class BaseWordsi: acceptWord, getSpaceName, getVectorLength, processDocument

public EvaluationWordsi(Set<String> acceptedWords, ContextExtractor extractor, SemanticSpace sspace, AssignmentReporter reporter)
Creates a new EvaluationWordsi.
Parameters:
acceptedWords - The set of accepted words. Only these words will have context vectors generated.
extractor - The ContextExtractor responsible for generating context vectors.
sspace - The SemanticSpace responsible for providing the existing word senses.
reporter - The AssignmentReporter responsible for reporting sense labelings.

public Set<String> getWords()
Returns the set of words that are represented in this semantic space.

public Vector getVector(String term)
Returns the semantic vector for the provided word.
Parameters:
term - a word that may be in the semantic space
Returns:
a Vector for the provided word, or null if the word was not in the space

public void handleContextVector(String focusKey, String secondaryKey, SparseDoubleVector context)
Performs some operation with context, which can be indexed by focusKey, secondaryKey, or both. This operation will likely assign context to some cluster immediately, or store it so that it can be clustered with all other context vectors generated for focusKey.
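For EvaluationWordsi, the class description points to the "assign immediately" path. Each context is labeled with the most similar existing sense of its focus word and reported straight away, with no stored state and no sense updates. Below is a minimal sketch of that flow. It assumes dense double[] vectors and an injected similarity function. It also uses a hypothetical LabelSink interface in place of the reporter. LabelSink is not the AssignmentReporter API, and ImmediateLabeler is likewise a hypothetical name.

```java
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical sketch of an evaluation-style handleContextVector: each context
 * is labeled immediately with its focus word's most similar existing sense and
 * reported; the senses themselves are never updated.
 */
final class ImmediateLabeler {

    /** Hypothetical stand-in for a reporter; not the AssignmentReporter API. */
    interface LabelSink {
        void report(String focusKey, String secondaryKey, String senseKey);
    }

    private final Map<String, Map<String, double[]>> sensesByWord; // word -> (sense key -> vector)
    private final ToDoubleBiFunction<double[], double[]> similarity;
    private final LabelSink sink;

    ImmediateLabeler(Map<String, Map<String, double[]>> sensesByWord,
                     ToDoubleBiFunction<double[], double[]> similarity,
                     LabelSink sink) {
        this.sensesByWord = sensesByWord;
        this.similarity = similarity;
        this.sink = sink;
    }

    void handleContextVector(String focusKey, String secondaryKey, double[] context) {
        Map<String, double[]> senses = sensesByWord.get(focusKey);
        if (senses == null || senses.isEmpty()) {
            return; // no known senses for this word, so nothing to label
        }
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : senses.entrySet()) {
            double sim = similarity.applyAsDouble(e.getValue(), context);
            if (sim > bestSim) {
                bestSim = sim;
                best = e.getKey();
            }
        }
        // Report right away; secondaryKey (e.g. a context id) is passed through unchanged.
        sink.report(focusKey, secondaryKey, best);
    }
}
```

Passing secondaryKey through unchanged is what lets a SenseEval-style reporter tie each label back to its context id.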
The secondaryKey does not need to be used, but some experiments may require it, such as the SenseEval/SemEval evaluation or pseudo-word disambiguation. For SenseEval/SemEval evaluations, a SenseEvalContextExtractor should be used, which will provide the context id as the secondaryKey, and reporting should be done with a SenseEvalReporter. For pseudo-word disambiguation/discrimination, a PseudoWordContextExtractor should be used, which will create pseudo-words for some set of tokens. This extractor will use the pseudo-word as the focusKey and the original token as the secondaryKey.
Parameters:
focusKey - The primary key for context
context - a SparseDoubleVector that represents a single context for a word

public void processSpace(Properties props)
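Once all the documents have been processed, performs any post-processing steps on the data. Any exposed parameters of the algorithm may be configured through the props argument.
By general contract, once this method has been called, processDocument will not be called again.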
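An implementation could defensively enforce the contract above. Below is a minimal sketch. It assumes a String document in place of whatever reader type processDocument actually takes, and it reads a made-up property named example.threshold. LifecycleGuard and its accessors are hypothetical names.

```java
import java.util.Properties;

/**
 * Hypothetical lifecycle guard illustrating the processSpace contract:
 * documents may be processed any number of times, processSpace marks the end,
 * and any later processDocument call is rejected.
 */
final class LifecycleGuard {
    private boolean spaceProcessed = false;
    private int documentCount = 0;
    private String threshold; // hypothetical exposed parameter, set from props

    void processDocument(String document) {
        if (spaceProcessed) {
            throw new IllegalStateException("processDocument called after processSpace");
        }
        documentCount++;
    }

    void processSpace(Properties props) {
        // "example.threshold" is a made-up property name used only for illustration.
        threshold = props.getProperty("example.threshold", "0.0");
        spaceProcessed = true;
    }

    int documentCount() {
        return documentCount;
    }

    String threshold() {
        return threshold;
    }
}
```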
Parameters:
props - a set of properties and values that may be used to configure any exposed parameters of the algorithm.

Copyright © 2012. All Rights Reserved.