Wordsi (S-Space Package 2.0.1 API)

All Known Implementing Classes:

BaseWordsi, EvaluationWordsi, LexSubWordsiMain.LexSubWordsi, StreamingWordsi, WaitingWordsi
```
public interface Wordsi
```
An interface for all Wordsi implementations. A complete Wordsi implementation will likely contain four parts: a ContextExtractor, a clustering method, a ContextAssignmentMap, and an AssignmentReporter.
The ContextExtractor generates context vectors for a set of words within a given BufferedReader and calls handleContextVector for each context vector it generates. Each context vector can be indexed by two keys: the primary key, which is generally the focus word for the context vector, and the secondary key, which is either the same as the focus word or an additional value such as a SenseEval/SemEval instance identifier.
The ContextAssignmentMap is responsible for recording which secondary keys and context ids are assigned to each focus term. In many cases this is not necessary, but if the exact clustering for each context is required, one should use a ContextAssignmentMap. The clustering method assigns each context vector to some cluster, either immediately or by storing the context vectors and performing a batch clustering. The AssignmentReporter is responsible for reporting which context vectors were assigned to which clusters.
These components are kept separate so that various context extraction algorithms can be combined with various clustering algorithms and reporting methods.
Implementations are encouraged to subclass BaseWordsi, since it provides some methods for accepting and rejecting terms and for dispatching the ContextExtractor.
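The division of labor described above can be illustrated with a minimal sketch. It does not use the S-Space classes: `WordsiSketch` is a hypothetical stand-in that mirrors the two methods of this interface, and a `Map<Integer, Double>` (dimension to value) is assumed in place of `SparseDoubleVector` so that the example compiles on its own. `BatchingWordsiSketch` and `contextCount` are likewise illustrative names, not part of the API. The sketch takes the "store now, cluster later" route.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in mirroring the two Wordsi methods. The real interface
// takes a SparseDoubleVector; a dimension -> value map is assumed here.
interface WordsiSketch {
    boolean acceptWord(String word);

    void handleContextVector(String primaryKey,
                             String secondaryKey,
                             Map<Integer, Double> contextVector);
}

// Stores every context vector under its focus word so that a batch
// clustering step (not shown) could later run once per focus word.
class BatchingWordsiSketch implements WordsiSketch {
    private final Set<String> acceptedWords;
    private final Map<String, List<Map<Integer, Double>>> contexts =
        new HashMap<>();

    BatchingWordsiSketch(Set<String> acceptedWords) {
        this.acceptedWords = acceptedWords;
    }

    @Override
    public boolean acceptWord(String word) {
        // Only words in the accepted set get context vectors.
        return acceptedWords.contains(word);
    }

    @Override
    public void handleContextVector(String primaryKey,
                                    String secondaryKey,
                                    Map<Integer, Double> contextVector) {
        // The secondary key is ignored in this sketch.
        contexts.computeIfAbsent(primaryKey, k -> new ArrayList<>())
                .add(contextVector);
    }

    // Number of contexts stored so far for a focus word.
    int contextCount(String primaryKey) {
        return contexts.getOrDefault(primaryKey, List.of()).size();
    }
}
```

An extractor would typically check `acceptWord` for each candidate focus word and call `handleContextVector` only for accepted ones; a real implementation would more likely extend BaseWordsi than implement the interface directly.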

Author:

Keith Stevens

See Also:
ContextExtractor, AssignmentReporter

Method Summary

Methods
Modifier and Type	Method and Description
`boolean`	`acceptWord(String word)` Returns true if this `Wordsi` implementation should generate a semantic vector for `word`.
`void`	`handleContextVector(String primaryKey, String secondaryKey, SparseDoubleVector contextVector)` Performs some operation with `contextVector`, which can be indexed by either `primaryKey`, `secondaryKey`, or both.

- Method Detail
  - acceptWord
```
boolean acceptWord(String word)
```
    Returns true if this Wordsi implementation should generate a semantic vector for word.
  - handleContextVector
```
void handleContextVector(String primaryKey,
                       String secondaryKey,
                       SparseDoubleVector contextVector)
```
    Performs some operation with contextVector, which can be indexed by either primaryKey, secondaryKey, or both. This operation will likely assign the contextVector to some cluster immediately or store the contextVector so that it may be clustered with all other context vectors generated for primaryKey.
    The secondaryKey does not need to be used, but some experiments may require it, such as the SenseEval/SemEval evaluation or pseudo-word disambiguation. For SenseEval/SemEval evaluations, a SenseEvalContextExtractor should be used, which will provide the context id as the secondaryKey; reporting should be done with a SenseEvalReporter. For pseudo-word disambiguation/discrimination, a PseudoWordContextExtractor should be used, which will create pseudo-words for some set of tokens. This extractor will use the pseudo-word for the primaryKey and the original token as the secondaryKey.
    
    Parameters:
    primaryKey - The primary key for contextVector
    secondaryKey - A secondary key for contextVector
    contextVector - a SparseDoubleVector that represents a single context for a word
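To make the role of the two keys concrete, here is a hedged sketch of the "assign immediately" option, recording the chosen cluster under the secondary key. None of it is S-Space code: `StreamingAssignmentSketch`, `clusterOf`, the cosine threshold rule, and the `Map<Integer, Double>` stand-in for `SparseDoubleVector` are all assumptions made for illustration, and the simple threshold clustering is a swapped-in technique rather than any algorithm the package is known to use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of the S-Space API.
class StreamingAssignmentSketch {
    private final double threshold;
    // primaryKey -> summed vectors of each cluster (dimension -> value)
    private final Map<String, List<Map<Integer, Double>>> centroids =
        new HashMap<>();
    // primaryKey -> (secondaryKey -> cluster index)
    private final Map<String, Map<String, Integer>> assignments =
        new HashMap<>();

    StreamingAssignmentSketch(double threshold) {
        this.threshold = threshold;
    }

    public void handleContextVector(String primaryKey,
                                    String secondaryKey,
                                    Map<Integer, Double> contextVector) {
        List<Map<Integer, Double>> clusters =
            centroids.computeIfAbsent(primaryKey, k -> new ArrayList<>());

        // Find the most similar existing cluster at or above the threshold.
        int best = -1;
        double bestSim = threshold;
        for (int i = 0; i < clusters.size(); i++) {
            double sim = cosine(clusters.get(i), contextVector);
            if (sim >= bestSim) {
                best = i;
                bestSim = sim;
            }
        }

        if (best < 0) {
            // No cluster is close enough: start a new one.
            clusters.add(new HashMap<>(contextVector));
            best = clusters.size() - 1;
        } else {
            // Fold the context into the cluster's running sum.
            Map<Integer, Double> sum = clusters.get(best);
            contextVector.forEach((dim, v) -> sum.merge(dim, v, Double::sum));
        }

        // Record the assignment under the secondary key.
        assignments.computeIfAbsent(primaryKey, k -> new HashMap<>())
                   .put(secondaryKey, best);
    }

    // Cluster index recorded for a context, or null if none was recorded.
    Integer clusterOf(String primaryKey, String secondaryKey) {
        Map<String, Integer> m = assignments.get(primaryKey);
        return m == null ? null : m.get(secondaryKey);
    }

    private static double cosine(Map<Integer, Double> a,
                                 Map<Integer, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }
}
```

With a secondary key such as an instance identifier, `clusterOf` can answer which cluster a specific context landed in, which is the kind of lookup an evaluation reporter needs. When the secondary key simply repeats the focus word, later contexts overwrite earlier ones in this sketch, so only the most recent assignment is retained.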
