PseudoWordContextExtractor (S-Space Package 2.0.1 API)

java.lang.Object
- edu.ucla.sspace.wordsi.psd.PseudoWordContextExtractor

All Implemented Interfaces:

ContextExtractor
```
public class PseudoWordContextExtractor
extends Object
implements ContextExtractor
```
A pseudo-word-based ContextExtractor. A mapping from real tokens to pseudo words is used to replace tokens in a corpus automatically while Wordsi processes contexts, so only pseudo words are represented in the Wordsi space. When a token that has a pseudo word mapping is encountered, that instance is replaced with its pseudo word. A context vector is generated for the context surrounding that word instance; the pseudo word serves as the primary key for the reporter and the raw token serves as the secondary key. The pseudo word also replaces the raw token in the contexts of all other words, and thus serves as a feature in place of the real token.
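
The replacement and keying behavior described above can be illustrated with a small standalone sketch. It does not use the S-Space classes: `PseudoWordReplacementSketch`, `replaceTokens`, `reportKeys`, and the `cat`/`dog` mapping are hypothetical names chosen for illustration, and the `primary|secondary` string is only a stand-in for the two keys passed to the reporter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not part of the S-Space API.
class PseudoWordReplacementSketch {

    // Replace each token that has a pseudo word mapping; leave other tokens unchanged.
    static List<String> replaceTokens(List<String> tokens,
                                      Map<String, String> pseudoWordMap) {
        List<String> replaced = new ArrayList<>();
        for (String token : tokens)
            replaced.add(pseudoWordMap.getOrDefault(token, token));
        return replaced;
    }

    // For each mapped token, build "primaryKey|secondaryKey": the pseudo word
    // followed by the raw token it replaced. Unmapped tokens yield no key.
    static List<String> reportKeys(List<String> tokens,
                                   Map<String, String> pseudoWordMap) {
        List<String> keys = new ArrayList<>();
        for (String token : tokens) {
            String pseudoWord = pseudoWordMap.get(token);
            if (pseudoWord != null)
                keys.add(pseudoWord + "|" + token);
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, String> pseudoWordMap = new HashMap<>();
        pseudoWordMap.put("cat", "catdog");
        pseudoWordMap.put("dog", "catdog");

        List<String> tokens = Arrays.asList("the", "cat", "chased", "the", "dog");
        System.out.println(replaceTokens(tokens, pseudoWordMap));
        // prints [the, catdog, chased, the, catdog]
        System.out.println(reportKeys(tokens, pseudoWordMap));
        // prints [catdog|cat, catdog|dog]
    }
}
```

Both raw tokens share the primary key `catdog`, while the secondary key records which real token each instance came from.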

Author:

Keith Stevens

Constructor Summary

Constructors
Constructor and Description
`PseudoWordContextExtractor(ContextGenerator generator, int windowSize, Map<String,String> pseudoWordMap)` Creates a new `PseudoWordContextExtractor`.

Method Summary

Methods
Modifier and Type	Method and Description
`int`	`getVectorLength()` Returns the maximum number of dimensions used to represent any given context.
`void`	`processDocument(BufferedReader document, Wordsi wordsi)` Processes the content of `document` and calls `Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector)` for each context vector that can be extracted from `document`.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - PseudoWordContextExtractor
```
public PseudoWordContextExtractor(ContextGenerator generator,
                          int windowSize,
                          Map<String,String> pseudoWordMap)
```
    Creates a new PseudoWordContextExtractor.
    
    Parameters:
    generator - The ContextGenerator responsible for creating context vectors
    windowSize - The number of words before and after the focus word which compose a context
    pseudoWordMap - The mapping from real words to their pseudo word replacements
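
    One way a `pseudoWordMap` argument might be assembled is sketched below. The grouping scheme (concatenating the words of each group to name the pseudo word) is an assumption for illustration, not something this page specifies, and `PseudoWordMapSketch` and `buildPseudoWordMap` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not part of the S-Space API.
class PseudoWordMapSketch {

    // Map every word in a group to one pseudo word formed by concatenating
    // the group's words (an assumed naming scheme).
    static Map<String, String> buildPseudoWordMap(List<List<String>> groups) {
        Map<String, String> pseudoWordMap = new LinkedHashMap<>();
        for (List<String> group : groups) {
            String pseudoWord = String.join("", group);
            for (String word : group)
                pseudoWordMap.put(word, pseudoWord);
        }
        return pseudoWordMap;
    }

    public static void main(String[] args) {
        Map<String, String> pseudoWordMap = buildPseudoWordMap(Arrays.asList(
            Arrays.asList("banana", "door"),
            Arrays.asList("river", "piano")));
        System.out.println(pseudoWordMap);
        // prints {banana=bananadoor, door=bananadoor, river=riverpiano, piano=riverpiano}
    }
}
```

    A map built this way has the `Map<String,String>` type the constructor expects; the `generator` and `windowSize` arguments would still need to be supplied separately.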
- Method Detail
  - getVectorLength
```
public int getVectorLength()
```
    Returns the maximum number of dimensions used to represent any given context.
    
    Specified by:
    
    getVectorLength in interface ContextExtractor
  - processDocument
```
public void processDocument(BufferedReader document,
                   Wordsi wordsi)
```
    Processes the content of document and calls Wordsi.handleContextVector(java.lang.String, java.lang.String, edu.ucla.sspace.vector.SparseDoubleVector) for each context vector that can be extracted from document.
    
    Specified by:
    
    processDocument in interface ContextExtractor
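
    The overall flow of this method can be approximated by the following standalone sketch. It is not the library's implementation: whitespace tokenization is assumed, the hypothetical `ContextHandler` interface stands in for `Wordsi.handleContextVector`, and a list of context words stands in for the `SparseDoubleVector` that a `ContextGenerator` would produce.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not part of the S-Space API.
class ProcessDocumentSketch {

    // Hypothetical stand-in for Wordsi.handleContextVector: receives both keys
    // plus the context words instead of a SparseDoubleVector.
    interface ContextHandler {
        void handle(String primaryKey, String secondaryKey, List<String> context);
    }

    static void processDocument(BufferedReader document, int windowSize,
                                Map<String, String> pseudoWordMap,
                                ContextHandler handler) throws IOException {
        // Read whitespace-separated tokens (an assumed tokenization).
        List<String> raw = new ArrayList<>();
        for (String line; (line = document.readLine()) != null; )
            for (String token : line.trim().split("\\s+"))
                if (!token.isEmpty())
                    raw.add(token);

        // Pseudo words stand in for raw tokens when they act as features.
        List<String> replaced = new ArrayList<>();
        for (String token : raw)
            replaced.add(pseudoWordMap.getOrDefault(token, token));

        for (int i = 0; i < raw.size(); i++) {
            String pseudoWord = pseudoWordMap.get(raw.get(i));
            if (pseudoWord == null)
                continue; // only mapped tokens produce a context here
            // Up to windowSize words before and after the focus word.
            int start = Math.max(0, i - windowSize);
            int end = Math.min(raw.size(), i + windowSize + 1);
            List<String> context = new ArrayList<>(replaced.subList(start, i));
            context.addAll(replaced.subList(i + 1, end));
            handler.handle(pseudoWord, raw.get(i), context);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> pseudoWordMap = new HashMap<>();
        pseudoWordMap.put("cat", "catdog");
        pseudoWordMap.put("dog", "catdog");
        BufferedReader document = new BufferedReader(
            new StringReader("the cat chased the dog home"));
        processDocument(document, 3, pseudoWordMap,
            (primary, secondary, context) ->
                System.out.println(primary + " " + secondary + " " + context));
        // prints:
        // catdog cat [the, chased, the, catdog]
        // catdog dog [catdog, chased, the, home]
    }
}
```

    With a window of 3, each focus word's context contains the other instance as the feature `catdog` rather than as the raw token.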

Copyright © 2012. All Rights Reserved.