public class DependencyRandomIndexing extends Object implements SemanticSpace
RandomIndexing, which is based on three papers:
Dependency Random Indexing (DRI) extends Random Indexing by restricting a word's context to the set of words with which it has a syntactic relationship. Full word co-occurrence models have shown that this restricted interpretation of a context can improve the semantic representations. DRI uses the same approximation technique as Random Indexing to project this full co-occurrence space into a space with significantly fewer dimensions. This projection is done through the use of index vectors, each of which is sparse and mostly orthogonal to all other index vectors. The summation of a word's index vectors corresponds directly to that word's occurrence in a context.
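The projection described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the class name `IndexVectorSketch`, its helper methods, the vector length of 1000, and the choice of 8 non-zero entries are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Illustrative sketch only; none of these names come from the library. */
final class IndexVectorSketch {
    /** Builds a sparse ternary vector: nonZero entries are +1 or -1, the rest 0. */
    static int[] randomIndexVector(int length, int nonZero, Random rng) {
        if (nonZero > length) {
            throw new IllegalArgumentException("nonZero must not exceed length");
        }
        int[] v = new int[length];
        int placed = 0;
        while (placed < nonZero) {
            int i = rng.nextInt(length);
            if (v[i] == 0) {
                v[i] = rng.nextBoolean() ? 1 : -1;
                placed++;
            }
        }
        return v;
    }

    /** Adds an index vector into a word's running semantic vector. */
    static void add(int[] semantics, int[] indexVector) {
        for (int i = 0; i < semantics.length; i++) {
            semantics[i] += indexVector[i];
        }
    }

    /** Counts the non-zero dimensions of a vector. */
    static int countNonZero(int[] v) {
        int n = 0;
        for (int x : v) {
            if (x != 0) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int length = 1000; // assumed size, chosen only for the example
        Map<String, int[]> index = new HashMap<>();
        for (String w : new String[] {"dog", "barks", "loudly"}) {
            index.put(w, randomIndexVector(length, 8, rng));
        }
        // "barks" is syntactically related to "dog" and "loudly",
        // so its semantics are the sum of their index vectors.
        int[] barks = new int[length];
        add(barks, index.get("dog"));
        add(barks, index.get("loudly"));
        System.out.println("non-zero dimensions: " + countNonZero(barks));
    }
}
```

Because each index vector has only a handful of non-zero entries, two randomly drawn vectors rarely overlap, which is what makes them "mostly orthogonal".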
While Random Indexing uses permutations of these index vectors to encode lexical position, a shallow form of syntactic structure, DRI extends the notion of permutations to allow for the encoding of dependency relationships. Through this modification, the set of relationships between any two co-occurring words in a sentence can be encoded, as can the distance between the two words. Under this model, each possible dependency relationship could have its own permutation function, as could each possible distance between co-occurring words.
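One way to picture relation-specific and distance-specific permutations is as rotations of the index vector by different amounts. The sketch below is only an illustration under that assumption: `RelationPermutationSketch`, `STRIDE`, and the rotation scheme are hypothetical, and the library's DependencyPermutationFunction implementations may work differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; the rotation scheme is an assumption. */
final class RelationPermutationSketch {
    /** Assumed upper bound on distinct relations, so distance shifts never overlap. */
    static final int STRIDE = 64;

    private final Map<String, Integer> relationShift = new HashMap<>();

    /** Rotates v to the right by shift positions; a rotation is a permutation. */
    static int[] rotate(int[] v, int shift) {
        int n = v.length;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[Math.floorMod(i + shift, n)] = v[i];
        }
        return out;
    }

    /** Gives each distinct relation label its own rotation amount on first use. */
    int shiftFor(String relation) {
        Integer shift = relationShift.get(relation);
        if (shift == null) {
            shift = relationShift.size() + 1;
            relationShift.put(relation, shift);
        }
        return shift;
    }

    /** Encodes both the relation and the word distance into the index vector. */
    int[] permute(int[] indexVector, String relation, int distance) {
        return rotate(indexVector, shiftFor(relation) + distance * STRIDE);
    }

    public static void main(String[] args) {
        RelationPermutationSketch perms = new RelationPermutationSketch();
        int[] index = new int[1000];
        index[0] = 1;
        index[10] = -1;
        int[] asSubject = perms.permute(index, "nsubj", 1);
        int[] asObject = perms.permute(index, "dobj", 1);
        // The same index vector lands in different positions per relation.
        System.out.println(Arrays.equals(asSubject, asObject));
    }
}
```

The sketch only keeps permutations distinct while the vector is longer than the largest combined shift; a real permutation function would need a scheme that holds for any vector length.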
This class defines the following configurable properties that may be set using either the System properties or the DependencyRandomIndexing(DependencyExtractor, DependencyPermutationFunction, Properties) constructor.
Property: "edu.ucla.sspace.dri.DependencyRandomIndexing.dependencyAcceptor"
Default: UniversalRelationAcceptor
Specifies the DependencyRelationAcceptor to use for validating dependency paths. If a path is rejected, it will not influence either the lemma vector or the selectional preference vectors.
Property: "edu.ucla.sspace.dri.DependencyRandomIndexing.dependencyPathLength"
Specifies the maximal length of any DependencyPath.
Property: "edu.ucla.sspace.dri.DependencyRandomIndexing.indexVectorLength"
Default: DEFAULT_VECTOR_LENGTH
Specifies the number of dimensions in the word space.
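As a rough illustration of setting these properties, the sketch below builds a java.util.Properties object using the property names listed above. The class `DriPropertiesSketch` and the values 2048 and 3 are assumptions chosen for the example, and the call that would hand the properties to this class is left as a comment because it requires the library and a DependencyPermutationFunction.

```java
import java.util.Properties;

/** Illustrative sketch only; the values used here are arbitrary. */
final class DriPropertiesSketch {
    static final String PREFIX = "edu.ucla.sspace.dri.DependencyRandomIndexing";

    /** Builds a Properties object holding the two numeric settings. */
    static Properties buildProperties(int vectorLength, int maxPathLength) {
        Properties props = new Properties();
        props.setProperty(PREFIX + ".indexVectorLength",
                          Integer.toString(vectorLength));
        props.setProperty(PREFIX + ".dependencyPathLength",
                          Integer.toString(maxPathLength));
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProperties(2048, 3);
        // With the library on the classpath, these could be passed to the
        // two-argument constructor listed on this page:
        //   new DependencyRandomIndexing(permFunc, props);
        // Alternatively, copy them into the System properties so that the
        // one-argument constructor can read them.
        for (String name : props.stringPropertyNames()) {
            System.setProperty(name, props.getProperty(name));
        }
        props.list(System.out);
    }
}
```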
This class supports filtering (see Filterable), which allows for fine-grained control of which semantics are retained. The setSemanticFilter(Set) method can be used to specify which words should have their semantics retained. Note that the words that are filtered out will still be used in computing the semantics of other words. This behavior is intended for use with large corpora where retaining the semantics of all words in memory is infeasible.
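The filtering behavior can be sketched as follows. This is a simplified stand-in, not the library's code: `SemanticFilterSketch` uses plain co-occurrence counts in place of vectors, treats every other word in the sentence as context, and assumes that an empty filter means "retain everything".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; counts stand in for semantic vectors. */
final class SemanticFilterSketch {
    private final Map<String, Map<String, Integer>> semantics = new HashMap<>();
    private final Set<String> retain = new HashSet<>();

    /** Restricts which words have their semantics stored. */
    void setSemanticFilter(Set<String> semanticsToRetain) {
        retain.clear();
        retain.addAll(semanticsToRetain);
    }

    /** Assumption: an empty filter retains every word. */
    private boolean isRetained(String word) {
        return retain.isEmpty() || retain.contains(word);
    }

    /** Updates retained words using all other words in the sentence as context. */
    void process(List<String> sentence) {
        for (String focus : sentence) {
            if (!isRetained(focus)) {
                continue; // no storage for filtered-out focus words
            }
            Map<String, Integer> counts =
                semantics.computeIfAbsent(focus, k -> new HashMap<>());
            for (String other : sentence) {
                // Filtered-out words still contribute as context here.
                if (!other.equals(focus)) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
    }

    Set<String> getWords() {
        return semantics.keySet();
    }

    Map<String, Integer> get(String word) {
        return semantics.get(word);
    }

    public static void main(String[] args) {
        SemanticFilterSketch space = new SemanticFilterSketch();
        space.setSemanticFilter(new HashSet<>(Arrays.asList("dog")));
        space.process(Arrays.asList("dog", "chases", "cat"));
        System.out.println(space.getWords());
        System.out.println(space.get("dog"));
    }
}
```

Only "dog" is stored, yet "chases" and "cat" still appear in its context counts, which mirrors the note that filtered-out words continue to shape the semantics of retained words.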
This class is thread-safe for concurrent calls of processDocument. At any given point in processing, the getVector method may be used to access the current semantics of a word. This allows callers to track incremental changes to the semantics as the corpus is processed.
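The usage pattern this implies, several threads feeding documents while vectors remain readable, might look like the sketch below. It is a toy model under stated assumptions: `ConcurrentUpdateSketch` and its one-dimension-per-token update are hypothetical and say nothing about how the library achieves its thread safety.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Illustrative sketch only; not how the library stores its vectors. */
final class ConcurrentUpdateSketch {
    private final ConcurrentMap<String, AtomicIntegerArray> vectors =
        new ConcurrentHashMap<>();
    private final int length;

    ConcurrentUpdateSketch(int length) {
        this.length = length;
    }

    /** Toy update: bumps one dimension per token; safe to call from many threads. */
    void processDocument(String[] tokens) {
        for (String token : tokens) {
            AtomicIntegerArray v =
                vectors.computeIfAbsent(token, k -> new AtomicIntegerArray(length));
            v.incrementAndGet(Math.floorMod(token.hashCode(), length));
        }
    }

    /** Returns a snapshot of the current vector, or null if the word is unseen. */
    int[] getVector(String term) {
        AtomicIntegerArray v = vectors.get(term);
        if (v == null) {
            return null;
        }
        int[] copy = new int[length];
        for (int i = 0; i < length; i++) {
            copy[i] = v.get(i);
        }
        return copy;
    }

    /** Processes the same toy document from several threads, then waits. */
    static ConcurrentUpdateSketch run(int threads, int docsPerThread) {
        ConcurrentUpdateSketch space = new ConcurrentUpdateSketch(16);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int d = 0; d < docsPerThread; d++) {
                    space.processDocument(new String[] {"dog", "barks"});
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return space;
    }

    public static void main(String[] args) {
        ConcurrentUpdateSketch space = run(4, 1000);
        int total = 0;
        for (int x : space.getVector("dog")) {
            total += x;
        }
        System.out.println("updates recorded for dog: " + total);
    }
}
```

Calling `getVector` while workers are still running would return a partial snapshot, which corresponds to tracking incremental changes as the corpus is processed.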
The processSpace method does nothing for this class, and calls to it will not affect the results of getVector.
See Also: RandomIndexing, DependencyPermutationFunction
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_DEPENDENCY_PATH_LENGTH: The default length a dependency path may have. |
static int | DEFAULT_VECTOR_LENGTH: The default vector length. |
static String | DEPENDENCY_ACCEPTOR_PROPERTY: The property for setting the DependencyRelationAcceptor. |
static String | DEPENDENCY_PATH_LENGTH_PROPERTY: The property for setting the maximal length of any DependencyPath. |
static String | PROPERTY_PREFIX: The base prefix for all DependencyRandomIndexing properties. |
static String | SSPACE_NAME: The Semantic Space name for DependencyRandomIndexing. |
static String | VECTOR_LENGTH_PROPERTY: The property for setting the number of dimensions in the word space. |
Constructor and Description |
---|
DependencyRandomIndexing(DependencyPermutationFunction<TernaryVector> permFunc): Creates a new instance of DependencyRandomIndexing that takes ownership of a DependencyExtractor and uses the System provided properties to specify other class objects. |
DependencyRandomIndexing(DependencyPermutationFunction<TernaryVector> permFunc, Properties properties): Creates a new instance of DependencyRandomIndexing which takes ownership |
Modifier and Type | Method and Description |
---|---|
DependencyPermutationFunction<TernaryVector> | getPermutations() |
String | getSpaceName(): Returns a unique string describing the name and configuration of this algorithm. |
Vector | getVector(String term): Returns the semantic vector for the provided word. |
int | getVectorLength(): Returns the length of vectors in this semantic space. |
Set<String> | getWords(): Returns the set of words that are represented in this semantic space. |
Map<String,TernaryVector> | getWordToVectorMap() |
void | processDocument(BufferedReader document): Processes the contents of the provided file as a document. |
void | processSpace(Properties properties): Does nothing. |
void | setSemanticFilter(Set<String> semanticsToRetain) |
void | setWordToIndexVector(Map<String,TernaryVector> m): Assigns the word to IntegerVector mapping to be used by this instance. |
void | setWordToVectorMap(Map<String,TernaryVector> vectorMap) |
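public static final String PROPERTY_PREFIX
The base prefix for all DependencyRandomIndexing properties.
public static final String VECTOR_LENGTH_PROPERTY
The property for setting the number of dimensions in the word space.
public static final String DEPENDENCY_ACCEPTOR_PROPERTY
The property for setting the DependencyRelationAcceptor.
public static final String DEPENDENCY_PATH_LENGTH_PROPERTY
The property for setting the maximal length of any DependencyPath.
public static final int DEFAULT_VECTOR_LENGTH
The default vector length.
public static final int DEFAULT_DEPENDENCY_PATH_LENGTH
The default length a dependency path may have.
public static final String SSPACE_NAME
The Semantic Space name for DependencyRandomIndexing.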
public DependencyRandomIndexing(DependencyPermutationFunction<TernaryVector> permFunc)
Creates a new instance of DependencyRandomIndexing that takes ownership of a DependencyExtractor and uses the System provided properties to specify other class objects.
public DependencyRandomIndexing(DependencyPermutationFunction<TernaryVector> permFunc, Properties properties)
Creates a new instance of DependencyRandomIndexing which takes ownership
public Set<String> getWords()
Returns the set of words that are represented in this semantic space.
Specified by: getWords in interface SemanticSpace
public Vector getVector(String term)
Returns the semantic vector for the provided word.
Specified by: getVector in interface SemanticSpace
Parameters: term - a word that may be in the semantic space
Returns: the Vector for the provided word, or null if the word was not in the space.
public DependencyPermutationFunction<TernaryVector> getPermutations()
public Map<String,TernaryVector> getWordToVectorMap()
public void setWordToVectorMap(Map<String,TernaryVector> vectorMap)
public String getSpaceName()
Returns a unique string describing the name and configuration of this algorithm.
Specified by: getSpaceName in interface SemanticSpace
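public int getVectorLength()
Returns the length of vectors in this semantic space. Implementations are left free to define whether the returned value is valid before processSpace is called.
Specified by: getVectorLength in interface SemanticSpace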
public void processDocument(BufferedReader document) throws IOException
Processes the contents of the provided file as a document.
Specified by: processDocument in interface SemanticSpace
Parameters: document - a reader that allows access to the text of the document
Throws: IOException - if any error occurs while reading the document
public void processSpace(Properties properties)
Does nothing.
Specified by: processSpace in interface SemanticSpace
Parameters: properties - a set of properties and values that may be used to configure any exposed parameters of the algorithm.
public void setWordToIndexVector(Map<String,TernaryVector> m)
Assigns the word to IntegerVector mapping to be used by this instance. This instance takes ownership of the passed in map.
Parameters: m - a mapping from token to the IntegerVector that should be used to represent it when calculating other words' semantics
Copyright © 2012. All Rights Reserved.