public class PseudoWordDependencyContextExtractor extends DependencyContextExtractor
A DependencyContextExtractor that supports pseudo words. Given a mapping from raw tokens to pseudo words, this extractor automatically changes the text of any dependency node that has a valid pseudo word mapping. The pseudo word serves as the primary key for assignments, and the original token serves as the secondary key.

Fields inherited from class DependencyContextExtractor: extractor, generator, readHeader
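The key assignment described above can be modeled with plain Java collections. This is a hypothetical sketch, not the S-Space implementation: the class PseudoWordKeys and its methods primaryKey and secondaryKey are illustrative names, and returning null for an unmapped token is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the primary/secondary key split; not the S-Space code. */
final class PseudoWordKeys {
    private final Map<String, String> pseudoWordMap;

    PseudoWordKeys(Map<String, String> pseudoWordMap) {
        // Copy the mapping so later changes to the caller's map have no effect.
        this.pseudoWordMap = new HashMap<>(pseudoWordMap);
    }

    /** Primary key: the pseudo word mapped to this token, or null if there is none. */
    String primaryKey(String token) {
        return pseudoWordMap.get(token);
    }

    /** Secondary key: always the original token. */
    String secondaryKey(String token) {
        return token;
    }
}
```

With the mapping "bank" to "bank_river", primaryKey("bank") is "bank_river" and secondaryKey("bank") is "bank", so assignments are grouped by pseudo word while the original token stays recoverable.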
Constructor and Description |
---|
PseudoWordDependencyContextExtractor(DependencyExtractor extractor, DependencyContextGenerator generator, Map<String,String> pseudoWordMap): Creates a new PseudoWordDependencyContextExtractor. |
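The pseudoWordMap constructor argument maps each raw token to the pseudo word that replaces it. A common way to build such a map, assumed here, is to conflate several real words into one pseudo word. The helper PseudoWordMaps.conflate and the convention of joining the group's words with "_" are hypothetical and not part of this API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for building a raw-token-to-pseudo-word map. */
final class PseudoWordMaps {
    /**
     * Maps every word in each group to one pseudo word made by joining the
     * group's words with '_'. If a word appears in several groups, the last
     * group wins because later puts overwrite earlier ones.
     */
    static Map<String, String> conflate(List<List<String>> groups) {
        Map<String, String> map = new HashMap<>();
        for (List<String> group : groups) {
            String pseudo = String.join("_", group);
            for (String word : group) {
                map.put(word, pseudo);
            }
        }
        return map;
    }
}
```

The resulting map could then be passed as the pseudoWordMap argument; every occurrence of "banana" or "door" would be keyed under the single pseudo word "banana_door".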
Modifier and Type | Method and Description |
---|---|
protected boolean | acceptWord(DependencyTreeNode focusNode, String contextHeader, Wordsi wordsi): Returns true if the word of focusNode is a known pseudo word. |
void | processDocument(BufferedReader document, Wordsi wordsi): Processes the content of document and calls Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector) for each context vector that can be extracted from document. |
Methods inherited from class DependencyContextExtractor: getPrimaryKey, getSecondaryKey, getVectorLength, handleContextHeader
public PseudoWordDependencyContextExtractor(DependencyExtractor extractor, DependencyContextGenerator generator, Map<String,String> pseudoWordMap)
Creates a new PseudoWordDependencyContextExtractor.
Parameters:
extractor - The DependencyExtractor that parses the document and returns a valid dependency tree
basisMapping - A mapping from dependency paths to feature indices
weighter - A weighting function for dependency paths
acceptor - An accepting function that validates dependency paths which may serve as features
pseudoWordMap - A mapping from raw tokens to pseudo words

public void processDocument(BufferedReader document, Wordsi wordsi)
Processes the content of document and calls Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector) for each context vector that can be extracted from document.
Specified by: processDocument in interface ContextExtractor
Overrides: processDocument in class DependencyContextExtractor
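The control flow of processDocument can be sketched for a single tokenized sentence: every token with a pseudo word mapping becomes a focus word, and its context vector is handed off keyed by (pseudo word, original token). This is a hypothetical sketch under several assumptions: ContextHandler only mirrors the shape of Wordsi.handleContextVector, a Map<String, Double> stands in for SparseDoubleVector, and a sentence-wide bag of words replaces the dependency-path features the real class builds through its DependencyContextGenerator.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for Wordsi.handleContextVector(primaryKey, secondaryKey, vector). */
interface ContextHandler {
    void handleContextVector(String primaryKey, String secondaryKey, Map<String, Double> vector);
}

final class PseudoWordDocumentSketch {
    /**
     * For each token that has a pseudo word mapping, counts the other tokens in
     * the sentence (a bag-of-words assumption, not dependency paths) and passes
     * the counts to the handler keyed by (pseudo word, original token).
     */
    static void processSentence(List<String> tokens,
                                Map<String, String> pseudoWordMap,
                                ContextHandler handler) {
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            String pseudo = pseudoWordMap.get(token);
            if (pseudo == null) {
                continue; // No mapping: this token is not a focus word.
            }
            Map<String, Double> vector = new HashMap<>();
            for (int j = 0; j < tokens.size(); j++) {
                if (j != i) {
                    // Increment the count for each co-occurring token.
                    vector.merge(tokens.get(j), 1.0, Double::sum);
                }
            }
            handler.handleContextVector(pseudo, token, vector);
        }
    }
}
```

For the sentence "the bank the" with the mapping "bank" to "bank_river", the handler is called once with primary key "bank_river", secondary key "bank", and a vector counting "the" twice.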
protected boolean acceptWord(DependencyTreeNode focusNode, String contextHeader, Wordsi wordsi)
Returns true if the word of focusNode is a known pseudo word.
Overrides: acceptWord in class DependencyContextExtractor
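The effect of acceptWord on a parsed sentence can be sketched as a filter that picks out which nodes become focus words. This hypothetical sketch assumes acceptance means the node's raw token has an entry in pseudoWordMap; the actual class may instead test the already-replaced pseudo word text. TreeNodeStub is an illustrative stand-in for DependencyTreeNode that keeps only the token text, and the contextHeader and wordsi parameters are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal stand-in for DependencyTreeNode: only the token text. */
final class TreeNodeStub {
    private final String word;

    TreeNodeStub(String word) {
        this.word = word;
    }

    String word() {
        return word;
    }
}

final class PseudoWordAcceptor {
    private final Map<String, String> pseudoWordMap;

    PseudoWordAcceptor(Map<String, String> pseudoWordMap) {
        this.pseudoWordMap = pseudoWordMap;
    }

    /** Assumed rule: accept a node only if its word has a pseudo word mapping. */
    boolean acceptWord(TreeNodeStub focusNode) {
        return pseudoWordMap.containsKey(focusNode.word());
    }

    /** Returns the indices of all nodes that would be treated as focus words. */
    List<Integer> focusIndices(TreeNodeStub[] tree) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < tree.length; i++) {
            if (acceptWord(tree[i])) {
                indices.add(i);
            }
        }
        return indices;
    }
}
```

Filtering at acceptance time means context vectors are only generated for tokens that take part in the pseudo word evaluation; all other nodes still contribute as context but never as focus words.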
Copyright © 2012. All Rights Reserved.