DependencyContextExtractor (S-Space Package 2.0.1 API)

java.lang.Object
- edu.ucla.sspace.wordsi.DependencyContextExtractor

All Implemented Interfaces:

ContextExtractor

Direct Known Subclasses:

PseudoWordDependencyContextExtractor, SemEvalDependencyContextExtractor
```
public class DependencyContextExtractor
extends Object
implements ContextExtractor
```
This ContextExtractor reads in documents that have been dependency parsed. Contexts are defined by a FilteredDependencyIterator, which is used to traverse all possible dependency paths rooted at each word of interest in a document. Each reachable and valid DependencyPath forms a feature and is weighted by a DependencyPathWeight.

Author:

Keith Stevens

Field Summary

Fields
Modifier and Type	Field and Description
`protected DependencyExtractor`	`extractor` The `DependencyExtractor` used to extract parse trees from the already parsed documents.
`protected DependencyContextGenerator`	`generator` The `DependencyContextGenerator` responsible for processing a `DependencyTreeNode` and turning it into a context vector.
`protected boolean`	`readHeader` If true, the first line in a dependency document will be treated as the header of the document, and not part of the parse tree.

Constructor Summary

Constructors
Constructor and Description
`DependencyContextExtractor(DependencyExtractor extractor, DependencyContextGenerator generator)` Creates a new `DependencyContextExtractor`.
`DependencyContextExtractor(DependencyExtractor extractor, DependencyContextGenerator generator, boolean readHeader)` Creates a new `DependencyContextExtractor`.

Method Summary

Methods
Modifier and Type	Method and Description
`protected boolean`	`acceptWord(DependencyTreeNode focusNode, String contextHeader, Wordsi wordsi)` Returns true if `Wordsi` should generate a context vector for `focusNode`.
`protected String`	`getPrimaryKey(DependencyTreeNode focusNode)` Returns the token for the primary key, i.e. the focus word.
`protected String`	`getSecondaryKey(DependencyTreeNode focusNode, String contextHeader)` Returns the token for the secondary key.
`int`	`getVectorLength()` Returns the maximum number of dimensions used to represent any given context.
`protected String`	`handleContextHeader(BufferedReader document)` Returns the string for the context header.
`void`	`processDocument(BufferedReader document, Wordsi wordsi)` Processes the content of `document` and calls `Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector)` for each context vector that can be extracted from `document`.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - extractor
```
protected final DependencyExtractor extractor
```
    The DependencyExtractor used to extract parse trees from the already parsed documents.
  - generator
```
protected final DependencyContextGenerator generator
```
    The DependencyContextGenerator responsible for processing a DependencyTreeNode and turning it into a context vector.
  - readHeader
```
protected final boolean readHeader
```
    If true, the first line in a dependency document will be treated as the header of the document, and not part of the parse tree.
- Constructor Detail
  - DependencyContextExtractor
```
public DependencyContextExtractor(DependencyExtractor extractor,
                          DependencyContextGenerator generator)
```
    Creates a new DependencyContextExtractor.
    
    Parameters:
    extractor - The DependencyExtractor that parses the document and returns a valid dependency tree
    generator - The DependencyContextGenerator used to create context vectors based on a DependencyTreeNode.
  - DependencyContextExtractor
```
public DependencyContextExtractor(DependencyExtractor extractor,
                          DependencyContextGenerator generator,
                          boolean readHeader)
```
    Creates a new DependencyContextExtractor.
    
    Parameters:
    extractor - The DependencyExtractor that parses the document and returns a valid dependency tree
    generator - The DependencyContextGenerator used to create context vectors based on a DependencyTreeNode.
    readHeader - If true, the first line in a dependency tree document will be discarded from the tree and used as a header.
- Method Detail
  - getVectorLength
```
public int getVectorLength()
```
    Returns the maximum number of dimensions used to represent any given context.
    
    Specified by:
    
    getVectorLength in interface ContextExtractor
  - processDocument
```
public void processDocument(BufferedReader document,
                   Wordsi wordsi)
```
    Processes the content of document and calls Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector) for each context vector that can be extracted from document.
    
    Specified by:
    
    processDocument in interface ContextExtractor
  - acceptWord
```
protected boolean acceptWord(DependencyTreeNode focusNode,
                 String contextHeader,
                 Wordsi wordsi)
```
    Returns true if Wordsi should generate a context vector for focusNode.
  - getPrimaryKey
```
protected String getPrimaryKey(DependencyTreeNode focusNode)
```
    Returns the token for the primary key, i.e. the focus word. This is just the text of the focusNode.
  - getSecondaryKey
```
protected String getSecondaryKey(DependencyTreeNode focusNode,
                     String contextHeader)
```
    Returns the token for the secondary key. If a contextHeader is provided, this is the contextHeader, otherwise it is the word for the focusNode.
  - handleContextHeader
```
protected String handleContextHeader(BufferedReader document)
                              throws IOException
```
    Returns the string for the context header. If readHeader is true, this returns the first line, otherwise it returns null.
    
    Throws:
    
    IOException
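
The header and key rules described for handleContextHeader, getPrimaryKey, and getSecondaryKey can be illustrated with a minimal, self-contained sketch. It does not use the S-Space classes: `HeaderKeySketch` and its static methods are hypothetical stand-ins, the focus word is passed as a plain `String` instead of a DependencyTreeNode, and "a contextHeader is provided" is assumed to mean a non-null header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical stand-in; not part of the S-Space API.
public class HeaderKeySketch {

    // Mirrors the documented handleContextHeader rule: when readHeader is
    // true, consume and return the first line; otherwise return null and
    // leave the reader untouched.
    static String readContextHeader(BufferedReader document, boolean readHeader)
            throws IOException {
        return readHeader ? document.readLine() : null;
    }

    // Mirrors getPrimaryKey: the primary key is the focus word itself.
    static String primaryKey(String focusWord) {
        return focusWord;
    }

    // Mirrors getSecondaryKey: use the header when one is present
    // (assumed here to mean non-null), else fall back to the focus word.
    static String secondaryKey(String focusWord, String contextHeader) {
        return contextHeader != null ? contextHeader : focusWord;
    }

    public static void main(String[] args) throws IOException {
        String text = "doc-17\nfirst parse line\n";

        // With readHeader = true, the first line becomes the header and the
        // reader is left positioned at the start of the parse tree.
        BufferedReader withHeader = new BufferedReader(new StringReader(text));
        String header = readContextHeader(withHeader, true);
        System.out.println("primary:   " + primaryKey("cat"));
        System.out.println("secondary: " + secondaryKey("cat", header));
        System.out.println("next line: " + withHeader.readLine());

        // With readHeader = false, no line is consumed and the secondary key
        // falls back to the focus word.
        BufferedReader noHeader = new BufferedReader(new StringReader(text));
        String none = readContextHeader(noHeader, false);
        System.out.println("secondary: " + secondaryKey("cat", none));
        System.out.println("next line: " + noHeader.readLine());
    }
}
```

In this sketch the header line never reaches the parse tree when readHeader is true, which matches the description of the readHeader field; how the real class passes the two keys on to Wordsi.handleContextVector is not shown here.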
