public class DependencyContextExtractor extends Object implements ContextExtractor

A ContextExtractor that reads in documents that have been dependency parsed. Contexts are defined by a FilteredDependencyIterator, which is used to traverse all possible dependency paths rooted at each word of interest in a document. Each reachable and valid DependencyPath forms a feature and is weighted by a DependencyPathWeight.

Modifier and Type | Field and Description |
---|---|
protected DependencyExtractor | extractor: The DependencyExtractor used to extract parse trees from the already parsed documents. |
protected DependencyContextGenerator | generator: The DependencyContextGenerator responsible for processing a DependencyTreeNode and turning it into a context vector. |
protected boolean | readHeader: If true, the first line in a dependency document will be treated as the header of the document, and not part of the parse tree. |
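The class description (paths rooted at each word of interest, each path a weighted feature) can be illustrated with a small, self-contained sketch. It does not use the library's FilteredDependencyIterator, DependencyPath or DependencyPathWeight. Instead it assumes a hypothetical graph of labeled arcs, enumerates every acyclic path of up to maxLength arcs rooted at a focus word, turns each path into a feature string, and weights it by 1 / path length as a stand-in for a path weight. All names here (PathFeatureSketch, Edge, addArc, pathFeatures) are hypothetical, and arcs are traversable in both directions for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PathFeatureSketch {

    /** One traversable arc; arcs are stored in both directions (hypothetical simplification). */
    static final class Edge {
        final int to;
        final String relation;
        Edge(int to, String relation) { this.to = to; this.relation = relation; }
    }

    static List<List<Edge>> emptyGraph(int size) {
        List<List<Edge>> adjacency = new ArrayList<>();
        for (int i = 0; i < size; i++) adjacency.add(new ArrayList<>());
        return adjacency;
    }

    /** Adds a head -> dependent arc that can be walked from either end. */
    static void addArc(List<List<Edge>> adjacency, int head, int dependent, String relation) {
        adjacency.get(head).add(new Edge(dependent, relation));
        adjacency.get(dependent).add(new Edge(head, relation));
    }

    /** Enumerates acyclic paths of length 1..maxLength rooted at focus. */
    static Map<String, Double> pathFeatures(String[] words, List<List<Edge>> adjacency,
                                            int focus, int maxLength) {
        Map<String, Double> features = new HashMap<>();
        boolean[] visited = new boolean[words.length];
        visited[focus] = true;
        walk(words, adjacency, focus, new ArrayList<>(), visited, maxLength, features);
        return features;
    }

    private static void walk(String[] words, List<List<Edge>> adjacency, int node,
                             List<String> relations, boolean[] visited, int maxLength,
                             Map<String, Double> features) {
        if (relations.size() == maxLength) return;          // path length limit reached
        for (Edge e : adjacency.get(node)) {
            if (visited[e.to]) continue;                     // keep paths acyclic
            visited[e.to] = true;
            relations.add(e.relation);
            // Feature = relation sequence plus the word at the end of the path.
            String feature = String.join("/", relations) + ":" + words[e.to];
            // Hypothetical weight: shorter paths count more (1 / path length).
            features.merge(feature, 1.0 / relations.size(), Double::sum);
            walk(words, adjacency, e.to, relations, visited, maxLength, features);
            relations.remove(relations.size() - 1);         // backtrack
            visited[e.to] = false;
        }
    }

    public static void main(String[] args) {
        String[] words = {"the", "dog", "chased", "a", "cat"};
        List<List<Edge>> adjacency = emptyGraph(words.length);
        addArc(adjacency, 1, 0, "det");
        addArc(adjacency, 2, 1, "nsubj");
        addArc(adjacency, 2, 4, "dobj");
        addArc(adjacency, 4, 3, "det");
        System.out.println(pathFeatures(words, adjacency, 2, 2));
    }
}
```

For the focus word "chased" with maxLength 2, this yields nsubj:dog and dobj:cat with weight 1.0, and nsubj/det:the and dobj/det:a with weight 0.5. A real path filter would also decide which relations are valid to traverse; this sketch accepts every arc.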
Constructor and Description |
---|
DependencyContextExtractor(DependencyExtractor extractor, DependencyContextGenerator generator): Creates a new DependencyContextExtractor. |
DependencyContextExtractor(DependencyExtractor extractor, DependencyContextGenerator generator, boolean readHeader): Creates a new DependencyContextExtractor. |
Modifier and Type | Method and Description |
---|---|
protected boolean | acceptWord(DependencyTreeNode focusNode, String contextHeader, Wordsi wordsi): Returns true if Wordsi should generate a context vector for the word represented by focusNode. |
protected String | getPrimaryKey(DependencyTreeNode focusNode): Returns the token for the primary key, i.e. the word for focusNode. |
protected String | getSecondaryKey(DependencyTreeNode focusNode, String contextHeader): Returns the token for the secondary key. |
int | getVectorLength(): Returns the maximum number of dimensions used to represent any given context. |
protected String | handleContextHeader(BufferedReader document): Returns the string for the context header. |
void | processDocument(BufferedReader document, Wordsi wordsi): Processes the content of document and calls Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector) for each context vector that can be extracted from document. |
protected final DependencyExtractor extractor
The DependencyExtractor used to extract parse trees from the already parsed documents.

protected final DependencyContextGenerator generator
The DependencyContextGenerator responsible for processing a DependencyTreeNode and turning it into a context vector.

protected final boolean readHeader
If true, the first line in a dependency document will be treated as the header of the document, and not part of the parse tree.
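The extractor field is described only as reading already parsed documents. To show what that input might look like, the sketch below assumes a CoNLL-X-style layout: one token per line, tab-separated, with the word form in column 2, the head index in column 7, the relation label in column 8, and a blank line between sentences. ConllSketch, ParsedSentence and readSentence are hypothetical. They illustrate the shape of the data a DependencyExtractor would pass to the generator, not the library's actual API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ConllSketch {

    /** Hypothetical stand-in for a parsed sentence; not the library's DependencyTreeNode. */
    static final class ParsedSentence {
        final List<String> words = new ArrayList<>();
        final List<Integer> heads = new ArrayList<>();      // 1-based head index, 0 = root
        final List<String> relations = new ArrayList<>();   // label of the arc to the head
    }

    /**
     * Reads the next sentence: one token per line with tab-separated columns
     * (assumed CoNLL-X layout), ended by a blank line or end of input.
     * Returns null when no tokens remain.
     */
    static ParsedSentence readSentence(BufferedReader reader) throws IOException {
        ParsedSentence sentence = new ParsedSentence();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                if (sentence.words.isEmpty()) continue;  // tolerate extra blank lines
                break;                                   // end of this sentence
            }
            String[] cols = line.split("\t");
            if (cols.length < 8)
                throw new IOException("expected at least 8 tab-separated columns: " + line);
            sentence.words.add(cols[1]);                     // FORM
            sentence.heads.add(Integer.parseInt(cols[6]));   // HEAD
            sentence.relations.add(cols[7]);                 // DEPREL
        }
        return sentence.words.isEmpty() ? null : sentence;
    }

    public static void main(String[] args) throws IOException {
        String doc = "1\tthe\t_\tDT\tDT\t_\t2\tdet\t_\t_\n"
                   + "2\tdog\t_\tNN\tNN\t_\t3\tnsubj\t_\t_\n"
                   + "3\tbarked\t_\tVB\tVBD\t_\t0\troot\t_\t_\n";
        BufferedReader reader = new BufferedReader(new StringReader(doc));
        ParsedSentence s;
        while ((s = readSentence(reader)) != null)
            for (int i = 0; i < s.words.size(); i++)
                System.out.println(s.words.get(i) + "\t" + s.relations.get(i) + "\t" + s.heads.get(i));
    }
}
```

Each head index points back into the same sentence. Following a token to its head, or a head to the tokens that name it, therefore recovers the tree that path enumeration walks over.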
public DependencyContextExtractor(DependencyExtractor extractor, DependencyContextGenerator generator)
Creates a new DependencyContextExtractor.
Parameters:
extractor - The DependencyExtractor that parses the document and returns a valid dependency tree.
generator - The DependencyContextGenerator used to create context vectors based on a DependencyTreeNode.

public DependencyContextExtractor(DependencyExtractor extractor, DependencyContextGenerator generator, boolean readHeader)
Creates a new DependencyContextExtractor.
Parameters:
extractor - The DependencyExtractor that parses the document and returns a valid dependency tree.
generator - The DependencyContextGenerator used to create context vectors based on a DependencyTreeNode.
readHeader - If true, the first line in a dependency tree document will be discarded from the tree and used as a header.

public int getVectorLength()
Returns the maximum number of dimensions used to represent any given context.
Specified by:
getVectorLength in interface ContextExtractor
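This page does not say how getVectorLength is computed. One common way to map string-valued path features onto vector dimensions is a growing feature index. Under that design, the maximum length is simply the number of features assigned so far. The sketch below assumes that design. FeatureIndexSketch and its methods are hypothetical and not part of edu.ucla.sspace, and an implementation with an open-ended feature space might report a fixed upper bound instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class FeatureIndexSketch {

    // Hypothetical feature-name -> dimension table; grows as new features appear.
    private final Map<String, Integer> dimensionOf = new HashMap<>();

    /** Returns the dimension for a feature, assigning the next unused one if it is new. */
    int dimension(String feature) {
        Integer dim = dimensionOf.get(feature);
        if (dim == null) {
            dim = dimensionOf.size();
            dimensionOf.put(feature, dim);
        }
        return dim;
    }

    /** Converts weighted string features into a sparse vector keyed by dimension. */
    Map<Integer, Double> toVector(Map<String, Double> features) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (Map.Entry<String, Double> e : features.entrySet())
            vector.merge(dimension(e.getKey()), e.getValue(), Double::sum);
        return vector;
    }

    /** In this sketch, every vector produced so far fits within the dimensions assigned so far. */
    int getVectorLength() {
        return dimensionOf.size();
    }

    public static void main(String[] args) {
        FeatureIndexSketch index = new FeatureIndexSketch();
        Map<String, Double> features = new HashMap<>();
        features.put("nsubj:dog", 1.0);
        features.put("nsubj/det:the", 0.5);
        System.out.println(index.toVector(features) + " length=" + index.getVectorLength());
    }
}
```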
public void processDocument(BufferedReader document, Wordsi wordsi)
Processes the content of document and calls Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector) for each context vector that can be extracted from document.
Specified by:
processDocument in interface ContextExtractor
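The control flow of processDocument can be outlined from the methods documented on this page. The method reads the optional header, extracts the parse, skips words that are not accepted, builds a context vector for each remaining word, and passes the vector on with a primary and a secondary key. The interfaces below (TreeReader, ContextGenerator, ContextSink) are hypothetical stand-ins for DependencyExtractor, DependencyContextGenerator and Wordsi, and a Map<String, Double> replaces SparseDoubleVector. This is an assumed outline, not the library's source.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProcessDocumentSketch {

    /** Hypothetical stand-in for DependencyExtractor; tree nodes are reduced to words. */
    interface TreeReader {
        List<String> readTree(BufferedReader document) throws IOException;
    }

    /** Hypothetical stand-in for DependencyContextGenerator. */
    interface ContextGenerator {
        Map<String, Double> generateContext(List<String> nodes, int focusIndex);
    }

    /** Hypothetical stand-in for the two Wordsi calls the extractor relies on. */
    interface ContextSink {
        boolean acceptWord(String word);
        void handleContextVector(String primaryKey, String secondaryKey,
                                 Map<String, Double> vector);
    }

    static void processDocument(BufferedReader document, boolean readHeader,
                                TreeReader extractor, ContextGenerator generator,
                                ContextSink sink) throws IOException {
        // Like handleContextHeader: the first line is the header only when readHeader is set.
        String contextHeader = readHeader ? document.readLine() : null;
        List<String> nodes = extractor.readTree(document);
        for (int i = 0; i < nodes.size(); i++) {
            String primaryKey = nodes.get(i);                  // the focus word
            if (!sink.acceptWord(primaryKey)) continue;        // like acceptWord
            // Like getSecondaryKey: the header if present, otherwise the word itself.
            String secondaryKey = (contextHeader != null) ? contextHeader : primaryKey;
            sink.handleContextVector(primaryKey, secondaryKey,
                                     generator.generateContext(nodes, i));
        }
    }

    /** Toy extractor: each non-blank remaining line holds one word. */
    static final TreeReader ONE_WORD_PER_LINE = document -> {
        List<String> words = new ArrayList<>();
        String line;
        while ((line = document.readLine()) != null)
            if (!line.trim().isEmpty()) words.add(line.trim());
        return words;
    };

    /** Toy generator: counts the words immediately before and after the focus word. */
    static final ContextGenerator ADJACENT_WORDS = (nodes, i) -> {
        Map<String, Double> vector = new HashMap<>();
        if (i > 0) vector.merge(nodes.get(i - 1), 1.0, Double::sum);
        if (i + 1 < nodes.size()) vector.merge(nodes.get(i + 1), 1.0, Double::sum);
        return vector;
    };

    public static void main(String[] args) throws IOException {
        ContextSink printer = new ContextSink() {
            public boolean acceptWord(String word) { return word.equals("bank"); }
            public void handleContextVector(String p, String s, Map<String, Double> v) {
                System.out.println(p + " " + s + " " + v);
            }
        };
        BufferedReader doc = new BufferedReader(new StringReader("doc-17\nthe\nriver\nbank\n"));
        processDocument(doc, true, ONE_WORD_PER_LINE, ADJACENT_WORDS, printer);
        // prints: bank doc-17 {river=1.0}
    }
}
```

In the documented class, acceptWord, getPrimaryKey and getSecondaryKey are separate protected methods. A subclass can therefore presumably change how contexts are filtered or labeled without re-implementing the document traversal. The sketch inlines them for brevity.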
protected boolean acceptWord(DependencyTreeNode focusNode, String contextHeader, Wordsi wordsi)
Returns true if Wordsi should generate a context vector for the word represented by focusNode.

protected String getPrimaryKey(DependencyTreeNode focusNode)
Returns the token for the primary key, i.e. the word for focusNode.

protected String getSecondaryKey(DependencyTreeNode focusNode, String contextHeader)
Returns the token for the secondary key. If a contextHeader is provided, this is the contextHeader; otherwise it is the word for the focusNode.

protected String handleContextHeader(BufferedReader document) throws IOException
Returns the string for the context header. If readHeader is true, this returns the first line; otherwise it returns null.
Throws:
IOException
Copyright © 2012. All Rights Reserved.