public class StructuredVectorSpace extends Object implements SemanticSpace, Serializable
This model requires a dependency-parsed corpus. While processing, it builds three types of vectors: word, which represents the co-occurrences the word has with all other tokens via a dependency chain; REL|word, which records the set of tokens that govern word in the REL relationship; and word|REL, which records the set of tokens that word governs in the REL relationship. The first is referred to as a lemma vector and the latter two are called selectional preference vectors. In all cases REL is a dependency relationship.
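The three vector types can be illustrated with a small sketch. It assumes the parser output has already been reduced to direct head/relation/dependent arcs (only one-step dependency paths, whereas the class itself follows dependency chains), and it uses plain maps instead of the library's vector types. `DependencyVectorsSketch`, `Arc`, `add` and `get` are hypothetical names, not part of the S-Space API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: lemma and selectional preference vectors from direct dependency arcs. */
final class DependencyVectorsSketch {

    /** One dependency arc, head --rel--> dependent (hypothetical type). */
    static final class Arc {
        final String head, rel, dependent;

        Arc(String head, String rel, String dependent) {
            this.head = head;
            this.rel = rel;
            this.dependent = dependent;
        }
    }

    // vector name ("word", "REL|word" or "word|REL") -> (feature word -> count)
    final Map<String, Map<String, Double>> vectors = new HashMap<>();

    private void increment(String vectorName, String feature) {
        vectors.computeIfAbsent(vectorName, k -> new HashMap<>())
               .merge(feature, 1.0, Double::sum);
    }

    void add(Arc a) {
        // Lemma vectors: head and dependent co-occur via a one-step dependency path.
        increment(a.head, a.dependent);
        increment(a.dependent, a.head);
        // REL|word: tokens that govern the dependent in this relation.
        increment(a.rel + "|" + a.dependent, a.head);
        // word|REL: tokens that the head governs in this relation.
        increment(a.head + "|" + a.rel, a.dependent);
    }

    /** Returns the named vector, or an empty map if it has never been updated. */
    Map<String, Double> get(String name) {
        return vectors.getOrDefault(name, new HashMap<>());
    }
}
```

After adding the arcs catch-obj-ball and throw-obj-ball, the vector `obj|ball` holds the governors `catch` and `throw`, while `catch|obj` holds only `ball`.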
This class implements Filterable, which allows for fine-grained control of which semantics are retained. The setSemanticFilter(Set) method can be used to specify which words should have their semantics retained. Note that the words that are filtered out will still be used in computing the semantics of other words. This behavior is intended for use with large corpora where retaining the semantics of all words in memory is infeasible.
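The filtering rule, where filtered-out words still contribute features but get no stored vector of their own, can be sketched as follows. This is a minimal illustration, not the class's implementation: `SemanticFilterSketch` and `cooccur` are hypothetical, and treating an empty filter as "retain everything" is an assumption made only for the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of filtered co-occurrence counting. */
final class SemanticFilterSketch {
    // Words whose vectors are kept; an empty set means "keep all" (assumption of this sketch).
    private final Set<String> retained;
    final Map<String, Map<String, Integer>> vectors = new HashMap<>();

    SemanticFilterSketch(Set<String> retained) {
        this.retained = retained;
    }

    private boolean keep(String word) {
        return retained.isEmpty() || retained.contains(word);
    }

    /** Records that w1 and w2 co-occur; only retained words get a stored vector. */
    void cooccur(String w1, String w2) {
        if (keep(w1)) {
            // w2 is used as a feature of w1 even if w2 itself is filtered out.
            vectors.computeIfAbsent(w1, k -> new HashMap<>()).merge(w2, 1, Integer::sum);
        }
        if (keep(w2)) {
            vectors.computeIfAbsent(w2, k -> new HashMap<>()).merge(w1, 1, Integer::sum);
        }
    }
}
```

With a filter of `{ball}`, calling `cooccur("catch", "ball")` stores `catch` as a feature of `ball` but creates no vector for `catch`, which is what keeps memory bounded on large corpora.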
… processDocument. At any given point in processing, the getVector method may be used to access the current semantics of a word. This allows callers to track incremental changes to the semantics as the corpus is processed.
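Tracking incremental changes amounts to reading a word's vector between calls to processDocument. The sketch below shows that pattern under stated assumptions: the input format (one whitespace-separated "head rel dependent" triple per line) is hypothetical and is not the parsed format the class actually reads, `IncrementalSpaceSketch` is a hypothetical name, and the `synchronized` methods and defensive copies are design choices of the sketch, not documented guarantees of StructuredVectorSpace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: incremental document processing with snapshot reads. */
final class IncrementalSpaceSketch {
    private final Map<String, Map<String, Double>> vectors = new HashMap<>();

    /** Assumed input format: one "head rel dependent" triple per line. */
    synchronized void processDocument(BufferedReader document) throws IOException {
        String line;
        while ((line = document.readLine()) != null) {
            String[] t = line.trim().split("\\s+");
            if (t.length != 3) {
                continue; // skip blank or malformed lines
            }
            add(t[0], t[2]); // head co-occurs with dependent
            add(t[2], t[0]); // and vice versa
        }
    }

    private void add(String word, String feature) {
        vectors.computeIfAbsent(word, k -> new HashMap<>()).merge(feature, 1.0, Double::sum);
    }

    /** Returns a copy of the current vector, or null if the word has not been seen yet. */
    synchronized Map<String, Double> getVector(String word) {
        Map<String, Double> v = vectors.get(word);
        return v == null ? null : new HashMap<>(v);
    }
}
```

Returning a copy means a snapshot taken after one document does not change when later documents are processed, so callers can compare successive snapshots directly.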
The processSpace method does nothing other than print out the feature indexes in the space to standard out.

Modifier and Type | Field and Description |
---|---|
static String | EMPTY_STRING: A static variable for the empty string. |
static String | SSPACE_NAME: The Semantic Space name for StructuredVectorSpace. |
Constructor and Description |
---|
StructuredVectorSpace(DependencyExtractor extractor, DependencyPathAcceptor acceptor, VectorCombinor combinor): Create a new instance of StructuredVectorSpace. |
StructuredVectorSpace(DependencyExtractor extractor, DependencyPathAcceptor acceptor, VectorCombinor combinor, StringBasisMapping termBasis, Set<String> semanticFilter): Create a new instance of StructuredVectorSpace. |
Modifier and Type | Method and Description |
---|---|
SparseDoubleVector | contextualize(String focusWord, String relation, String secondWord, boolean isFocusHeadWord) |
String | getSpaceName(): Returns a unique string describing the name and configuration of this algorithm. |
Vector | getVector(String term): Returns the semantic vector for the provided word. |
int | getVectorLength(): Returns the length of vectors in this semantic space. |
Set<String> | getWords(): Returns the set of words that are represented in this semantic space. |
void | processDocument(BufferedReader document): Processes the contents of the provided file as a document. |
void | processSpace(Properties properties): Once all the documents have been processed, performs any post-processing steps on the data. |
void | setSemanticFilter(Set<String> semanticsToRetain): Specifies which words should have their semantics retained. |
public static final String SSPACE_NAME
The Semantic Space name for StructuredVectorSpace.
public static final String EMPTY_STRING
A static variable for the empty string.
public StructuredVectorSpace(DependencyExtractor extractor, DependencyPathAcceptor acceptor, VectorCombinor combinor)
Create a new instance of StructuredVectorSpace.
public StructuredVectorSpace(DependencyExtractor extractor, DependencyPathAcceptor acceptor, VectorCombinor combinor, StringBasisMapping termBasis, Set<String> semanticFilter)
Create a new instance of StructuredVectorSpace.
public Set<String> getWords()
Returns the set of words that are represented in this semantic space.
Specified by: getWords in interface SemanticSpace
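public Vector getVector(String term)
Returns the semantic vector for the provided word.
Specified by: getVector in interface SemanticSpace
Parameters: term - a word that may be in the semantic space
Returns: the Vector for the provided word, or null if the word was not in the space.
public String getSpaceName()
Returns a unique string describing the name and configuration of this algorithm.
Specified by: getSpaceName in interface SemanticSpace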
public int getVectorLength()
Returns the length of vectors in this semantic space. … processSpace is called.
Specified by: getVectorLength in interface SemanticSpace
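Since processSpace only prints the feature indexes, and the second constructor accepts a StringBasisMapping, it may help to see what a string-to-index basis looks like. This is a minimal sketch assuming each new feature string receives the next free index; `FeatureIndexSketch`, `getDimension`, `numDimensions` and `printFeatureIndexes` are hypothetical names rather than the StringBasisMapping API, the tab-separated output format is invented for illustration, and equating the vector length with the number of assigned dimensions is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical basis mapping: each new feature string gets the next free index. */
final class FeatureIndexSketch {
    private final Map<String, Integer> index = new HashMap<>();
    private final List<String> features = new ArrayList<>();

    /** Returns the index of the feature, assigning a new one on first sight. */
    int getDimension(String feature) {
        Integer i = index.get(feature);
        if (i == null) {
            i = features.size();
            index.put(feature, i);
            features.add(feature);
        }
        return i;
    }

    /** Number of dimensions assigned so far (assumed to equal the vector length). */
    int numDimensions() {
        return features.size();
    }

    /** Prints "index<TAB>feature" for every feature, in index order. */
    void printFeatureIndexes() {
        for (int i = 0; i < features.size(); i++) {
            System.out.println(i + "\t" + features.get(i));
        }
    }
}
```

Because indexes are handed out on first sight, the number of dimensions can keep growing while documents are processed, which is one reason a vector length is only meaningful once processing has finished.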
public void processDocument(BufferedReader document) throws IOException
Processes the contents of the provided file as a document.
Specified by: processDocument in interface SemanticSpace
Parameters: document - a reader that allows access to the text of the document
Throws: IOException - if any error occurs while reading the document
public void processSpace(Properties properties)
Once all the documents have been processed, performs any post-processing steps on the data. … properties argument.
By general contract, once this method has been called, processDocument will not be called again.
Specified by: processSpace in interface SemanticSpace
Parameters: properties - a set of properties and values that may be used to configure any exposed parameters of the algorithm.
public SparseDoubleVector contextualize(String focusWord, String relation, String secondWord, boolean isFocusHeadWord)
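contextualize carries no description here. One plausible reading, in the style of Erk and Padó's structured vector space model, is sketched below; whether StructuredVectorSpace does exactly this is an assumption. The focus word's lemma vector is combined with the selectional preference implied by secondWord: if the focus word is the head, the words that govern secondWord in the relation (REL|secondWord) are used, otherwise the words secondWord governs (secondWord|REL). The preference is the count-weighted sum of those words' lemma vectors, and the combination is a component-wise product standing in for the VectorCombinor. `ContextualizeSketch`, `PRODUCT` and the map-based vectors are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Hypothetical sketch of contextualize(focusWord, relation, secondWord, isFocusHeadWord). */
final class ContextualizeSketch {
    // Vector name ("word", "REL|word" or "word|REL") -> sparse vector (feature -> weight).
    final Map<String, Map<String, Double>> vectors = new HashMap<>();

    /** Assumed combinor: component-wise product of two sparse vectors. */
    static final BinaryOperator<Map<String, Double>> PRODUCT = (a, b) -> {
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                out.put(e.getKey(), e.getValue() * other);
            }
        }
        return out;
    };

    private Map<String, Double> get(String name) {
        return vectors.getOrDefault(name, Collections.<String, Double>emptyMap());
    }

    Map<String, Double> contextualize(String focusWord, String relation,
                                      String secondWord, boolean isFocusHeadWord) {
        // Governors of secondWord (REL|secondWord) if the focus word is the head,
        // otherwise the words secondWord governs (secondWord|REL).
        String prefKey = isFocusHeadWord ? relation + "|" + secondWord
                                         : secondWord + "|" + relation;
        // Selectional preference: count-weighted sum of those words' lemma vectors.
        Map<String, Double> preference = new HashMap<>();
        for (Map.Entry<String, Double> e : get(prefKey).entrySet()) {
            for (Map.Entry<String, Double> f : get(e.getKey()).entrySet()) {
                preference.merge(f.getKey(), e.getValue() * f.getValue(), Double::sum);
            }
        }
        return PRODUCT.apply(get(focusWord), preference);
    }
}
```

The product keeps only the features of the focus word that the preference also supports, so senses of the focus word that do not fit the syntactic context drop out.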
Copyright © 2012. All Rights Reserved.