public class PseudoWordContextExtractor extends Object implements ContextExtractor
A ContextExtractor based on pseudo words. A mapping from real tokens to
pseudo words is used to automatically replace tokens in a corpus while Wordsi
processes contexts. Only pseudo words are represented in the Wordsi
space. When a token that has a pseudo word mapping is encountered,
that instance is replaced with its pseudo word. A context vector
is then generated for the context surrounding that word instance; the
pseudo word serves as the primary key for the reporter and
the raw token serves as the secondary key. The pseudo word also
replaces the raw token in the contexts of all other words, and thus serves as a
feature in place of the real token.

Constructor and Description |
---|
PseudoWordContextExtractor(ContextGenerator generator, int windowSize, Map<String,String> pseudoWordMap) Creates a new PseudoWordContextExtractor. |
Modifier and Type | Method and Description |
---|---|
int | getVectorLength() Returns the maximum number of dimensions used to represent any given context. |
void | processDocument(BufferedReader document, Wordsi wordsi) Processes the content of document and calls Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector) for each context vector that can be extracted from document. |
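The substitution described in the class overview can be illustrated with plain Java collections. The sketch below is hypothetical and is not the S-Space implementation: the class `PseudoWordSubstitution`, its methods `substitute` and `keys`, and the pseudo word `banana_door` are all assumptions made for illustration. It shows only two ideas from the overview: every mapped token is replaced in the stream, and each replaced instance carries the pseudo word as its primary key and the raw token as its secondary key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not S-Space code: shows only the token
// substitution and the (primary, secondary) key assignment.
public class PseudoWordSubstitution {

    // Returns a copy of the token stream in which every mapped token is
    // replaced by its pseudo word; unmapped tokens pass through unchanged.
    public static List<String> substitute(List<String> tokens,
                                          Map<String, String> pseudoWordMap) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            out.add(pseudoWordMap.getOrDefault(token, token));
        }
        return out;
    }

    // Returns one (pseudo word, raw token) pair per mapped token instance:
    // the entry key is the primary key, the entry value the secondary key.
    public static List<Map.Entry<String, String>> keys(List<String> tokens,
                                                        Map<String, String> pseudoWordMap) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String token : tokens) {
            String pseudoWord = pseudoWordMap.get(token);
            if (pseudoWord != null) {
                out.add(Map.entry(pseudoWord, token));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "banana" and "door" are conflated into one hypothetical pseudo word.
        Map<String, String> map = Map.of("banana", "banana_door", "door", "banana_door");
        List<String> tokens = List.of("open", "the", "door", "and", "eat", "a", "banana");
        System.out.println(substitute(tokens, map));
        // prints [open, the, banana_door, and, eat, a, banana_door]
        System.out.println(keys(tokens, map));
        // prints [banana_door=door, banana_door=banana]
    }
}
```

Because both "door" and "banana" become `banana_door` in the substituted stream, a neighbouring word sees the same feature regardless of which real token occurred, while the secondary key still records which real token was replaced.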
public PseudoWordContextExtractor(ContextGenerator generator, int windowSize, Map<String,String> pseudoWordMap)
Creates a new PseudoWordContextExtractor.
Parameters:
generator - The ContextGenerator responsible for creating context vectors
windowSize - The number of words before and after the focus word that compose a context
pseudoWordMap - The mapping from real words to their pseudo word replacements

public int getVectorLength()
Returns the maximum number of dimensions used to represent any given context.
Specified by:
getVectorLength in interface ContextExtractor
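The pseudoWordMap constructor parameter is an ordinary Map<String,String> from real words to pseudo words. One common way to build such a map is to conflate a group of real words into a single pseudo word. The helper below is a hypothetical sketch, not part of S-Space; the class name `PseudoWordMaps`, the method `conflate`, and the convention of naming a pseudo word by joining its real words with `_` are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of S-Space: builds a real-word -> pseudo-word
// map by conflating each group of real words into one pseudo word.
public class PseudoWordMaps {

    // The pseudo word name is the group's words joined with '_'
    // (an assumed naming convention, not one required by the class).
    public static Map<String, String> conflate(List<List<String>> groups) {
        Map<String, String> pseudoWordMap = new HashMap<>();
        for (List<String> group : groups) {
            String pseudoWord = String.join("_", group);
            for (String realWord : group) {
                // If groups overlap, keep the first assignment so that each
                // real word maps to exactly one pseudo word.
                pseudoWordMap.putIfAbsent(realWord, pseudoWord);
            }
        }
        return pseudoWordMap;
    }

    public static void main(String[] args) {
        Map<String, String> map = conflate(List.of(
                List.of("banana", "door"),
                List.of("kiss", "ache")));
        System.out.println(map.get("door")); // prints banana_door
        System.out.println(map.get("ache")); // prints kiss_ache
    }
}
```

A map built this way would be passed as the third argument, for example new PseudoWordContextExtractor(generator, windowSize, map), where generator is whatever ContextGenerator the application already uses.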
public void processDocument(BufferedReader document, Wordsi wordsi)
Processes the content of document and calls Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector) for each context vector that can be extracted from document.
Specified by:
processDocument in interface ContextExtractor
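To make the role of windowSize concrete, the sketch below reads a BufferedReader, substitutes pseudo words, and collects up to windowSize words on each side of every mapped token. It is a hypothetical illustration, not the S-Space implementation: the class `PseudoWordWindows` and its `Context` type are assumptions. Whitespace tokenization and letting windows cross line breaks are also assumptions. Plain word lists stand in for the context vectors that the real extractor builds with its ContextGenerator and reports through Wordsi.handleContextVector.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not S-Space code: word lists stand in for context vectors.
public class PseudoWordWindows {

    // One extracted context with its primary and secondary keys.
    public static final class Context {
        public final String primaryKey;   // the pseudo word
        public final String secondaryKey; // the raw token it replaced
        public final List<String> words;  // up to windowSize words per side

        Context(String primaryKey, String secondaryKey, List<String> words) {
            this.primaryKey = primaryKey;
            this.secondaryKey = secondaryKey;
            this.words = words;
        }
    }

    public static List<Context> extract(BufferedReader document, int windowSize,
                                        Map<String, String> pseudoWordMap) {
        // Read every token; whitespace tokenization is an assumption.
        List<String> raw = new ArrayList<>();
        try {
            String line;
            while ((line = document.readLine()) != null) {
                for (String tok : line.trim().split("\\s+")) {
                    if (!tok.isEmpty()) {
                        raw.add(tok);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Pseudo words replace raw tokens in every other word's context.
        List<String> replaced = new ArrayList<>(raw.size());
        for (String tok : raw) {
            replaced.add(pseudoWordMap.getOrDefault(tok, tok));
        }
        List<Context> contexts = new ArrayList<>();
        for (int i = 0; i < raw.size(); i++) {
            String pseudoWord = pseudoWordMap.get(raw.get(i));
            if (pseudoWord == null) {
                continue; // only pseudo words are represented
            }
            int start = Math.max(0, i - windowSize);
            int end = Math.min(raw.size(), i + windowSize + 1);
            List<String> window = new ArrayList<>(replaced.subList(start, i));
            window.addAll(replaced.subList(i + 1, end));
            contexts.add(new Context(pseudoWord, raw.get(i), window));
        }
        return contexts;
    }

    public static void main(String[] args) {
        BufferedReader doc = new BufferedReader(
                new StringReader("open the door\nand eat a banana"));
        Map<String, String> map = Map.of("banana", "banana_door", "door", "banana_door");
        for (Context c : extract(doc, 2, map)) {
            System.out.println(c.primaryKey + " / " + c.secondaryKey + " -> " + c.words);
        }
        // prints banana_door / door -> [open, the, and, eat]
        // prints banana_door / banana -> [eat, a]
    }
}
```

Both contexts share the primary key banana_door, so they land on the same pseudo word in the space, while the secondary keys door and banana record which real token each instance came from.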
Copyright © 2012. All Rights Reserved.