public class SemEvalContextExtractor extends Object implements ContextExtractor
A ContextExtractor for handling SemEval or SenseEval corpora. Each document should contain an instance identifier, which uniquely identifies the context, and a marker, e.g., "|||", that marks where the focus word is in the document. Only one context vector is generated for each document. This class depends on a ContextGenerator for generating the context vectors.

Constructor and Description |
---|
SemEvalContextExtractor(ContextGenerator generator, int windowSize) Creates a new SemEvalContextExtractor. |
SemEvalContextExtractor(ContextGenerator generator, int windowSize, String separator) Creates a new SemEvalContextExtractor. |

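The document layout described in the class description can be illustrated with a small standalone sketch. It does not use the S-Space classes. It assumes, hypothetically, that the instance identifier is the first whitespace-delimited token and that the marker token immediately precedes the focus word; the class name `SemEvalWindowSketch` and its `extract` method are invented for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration; not part of the S-Space API.
final class SemEvalWindowSketch {

    /** What a single document yields: an id, a focus word, and its window. */
    static final class Context {
        final String instanceId;
        final String focusWord;
        final List<String> before;
        final List<String> after;

        Context(String instanceId, String focusWord,
                List<String> before, List<String> after) {
            this.instanceId = instanceId;
            this.focusWord = focusWord;
            this.before = before;
            this.after = after;
        }
    }

    /**
     * Returns the single context found in text, or null when the text is
     * empty or no focus word follows the separator.
     * Assumption: token 0 is the instance id, and the separator token
     * directly precedes the focus word.
     */
    static Context extract(String text, int windowSize, String separator) {
        String trimmed = text.trim();
        if (trimmed.isEmpty())
            return null;
        String[] tokens = trimmed.split("\\s+");
        String instanceId = tokens[0];

        // Keep only the last windowSize words seen before the separator.
        Deque<String> before = new ArrayDeque<>();
        int i = 1;
        for (; i < tokens.length && !tokens[i].equals(separator); i++) {
            before.addLast(tokens[i]);
            if (before.size() > windowSize)
                before.removeFirst();
        }

        // i is at the separator, or past the end if it never appeared.
        if (i + 1 >= tokens.length)
            return null;
        String focusWord = tokens[i + 1];

        // Collect at most windowSize words after the focus word.
        List<String> after = new ArrayList<>();
        for (int j = i + 2; j < tokens.length && after.size() < windowSize; j++)
            after.add(tokens[j]);

        return new Context(instanceId, focusWord,
                           new ArrayList<>(before), after);
    }

    public static void main(String[] args) {
        Context c = extract("art.40001 the old ||| bank by the river", 2, "|||");
        // prints: art.40001 bank [the, old] [by, the]
        System.out.println(c.instanceId + " " + c.focusWord + " "
                           + c.before + " " + c.after);
    }
}
```

With a window size of 2, only the two words on each side of the focus word survive, which matches the description of windowSize as the number of words before and after the focus word.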
Modifier and Type | Method and Description |
---|---|
int | getVectorLength() Returns the maximum number of dimensions used to represent any given context. |
void | processDocument(BufferedReader document, Wordsi wordsi) Processes the content of document and calls Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector) for each context vector that can be extracted from document. |

public SemEvalContextExtractor(ContextGenerator generator, int windowSize)
Creates a new SemEvalContextExtractor.
Parameters:
generator - The ContextGenerator responsible for creating context vectors
windowSize - the number of words before and after a focus word which compose the context.

public SemEvalContextExtractor(ContextGenerator generator, int windowSize, String separator)
Creates a new SemEvalContextExtractor.
Parameters:
generator - The ContextGenerator responsible for creating context vectors
windowSize - the number of words before and after a focus word which compose the context.

public int getVectorLength()
Returns the maximum number of dimensions used to represent any given context.
Specified by: getVectorLength in interface ContextExtractor

public void processDocument(BufferedReader document, Wordsi wordsi)
Processes the content of document and calls Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector) for each context vector that can be extracted from document.
Specified by: processDocument in interface ContextExtractor

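To make the relationship between the generated vector, getVectorLength(), and the handleContextVector callback more concrete, here is a standalone sketch. It uses stand-ins only: `VectorHandler` and `CountingGenerator` are hypothetical types invented for this example, a `Map<Integer, Double>` plays the role of a sparse vector, the word-count scheme is one possible generator among many, and the argument order of the callback (focus word, then instance id) is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; none of these types belong to S-Space.
final class ContextVectorSketch {

    /** Stand-in for the callback that receives each context vector. */
    interface VectorHandler {
        void handleContextVector(String focusWord, String instanceId,
                                 Map<Integer, Double> vector);
    }

    /** Stand-in generator: one dimension per vocabulary word, counting hits. */
    static final class CountingGenerator {
        private final Map<String, Integer> wordToIndex = new LinkedHashMap<>();

        CountingGenerator(List<String> vocabulary) {
            // Assign dimensions in order of first appearance.
            for (String word : vocabulary)
                wordToIndex.putIfAbsent(word, wordToIndex.size());
        }

        /** Builds a sparse vector from the words on both sides of the focus. */
        Map<Integer, Double> generateContext(List<String> before,
                                             List<String> after) {
            Map<Integer, Double> vector = new TreeMap<>();
            for (List<String> side : Arrays.asList(before, after)) {
                for (String word : side) {
                    Integer index = wordToIndex.get(word);
                    if (index != null)          // skip out-of-vocabulary words
                        vector.merge(index, 1.0, Double::sum);
                }
            }
            return vector;
        }

        /** Upper bound on the number of dimensions any vector can use. */
        int getVectorLength() {
            return wordToIndex.size();
        }
    }

    public static void main(String[] args) {
        CountingGenerator generator = new CountingGenerator(
                Arrays.asList("the", "old", "by", "river"));

        VectorHandler handler = (focusWord, instanceId, vector) ->
                System.out.println(instanceId + " " + focusWord + " " + vector);

        // One document yields exactly one callback.
        Map<Integer, Double> vector = generator.generateContext(
                Arrays.asList("the", "old"), Arrays.asList("by", "the"));
        handler.handleContextVector("bank", "art.40001", vector);
        // prints: art.40001 bank {0=2.0, 1=1.0, 2=1.0}

        System.out.println(generator.getVectorLength()); // prints: 4
    }
}
```

In this sketch the vector touches only three of the four available dimensions, which is why the reported length is a maximum and not the number of non-zero entries.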
Copyright © 2012. All Rights Reserved.