public class ReflectiveRandomIndexing extends Object implements SemanticSpace, Filterable
This class defines the following configurable properties, which may be set
either through the System properties or through the
ReflectiveRandomIndexing(Properties) constructor.
"edu.ucla.sspace.ri.ReflectiveRandomIndexing.vectorLength"
"edu.ucla.sspace.ri.ReflectiveRandomIndexing.sparseSemantics"
Default: true
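The two configuration routes described above can be sketched with the standard java.util.Properties API alone. This is a minimal sketch, not code from the library: the class name RriConfigSketch, the helper buildConfig, and the value 2048 are hypothetical, and only the two property keys are taken from this page.

```java
import java.util.Properties;

final class RriConfigSketch {
    // Property keys copied verbatim from the documentation above.
    static final String VECTOR_LENGTH_KEY =
        "edu.ucla.sspace.ri.ReflectiveRandomIndexing.vectorLength";
    static final String SPARSE_SEMANTICS_KEY =
        "edu.ucla.sspace.ri.ReflectiveRandomIndexing.sparseSemantics";

    // Hypothetical helper: builds a Properties object that could be handed
    // to the ReflectiveRandomIndexing(Properties) constructor.
    static Properties buildConfig(int vectorLength, boolean sparseSemantics) {
        Properties props = new Properties();
        props.setProperty(VECTOR_LENGTH_KEY, Integer.toString(vectorLength));
        props.setProperty(SPARSE_SEMANTICS_KEY, Boolean.toString(sparseSemantics));
        return props;
    }

    public static void main(String[] args) {
        // Route 1: an explicit Properties object (2048 is an arbitrary example value).
        Properties props = buildConfig(2048, true);
        props.list(System.out);

        // Route 2: System properties, which the no-argument constructor
        // is documented to read.
        System.setProperty(VECTOR_LENGTH_KEY, "2048");
        System.setProperty(SPARSE_SEMANTICS_KEY, "true");
    }
}
```

With the library on the classpath, the object returned by buildConfig would be passed to the ReflectiveRandomIndexing(Properties) constructor; the same keys can also be supplied on the command line as -D options.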
This class implements Filterable, which allows for fine-grained control over
which semantics are retained. The setSemanticFilter(Set) method can be used to
specify which words should have their semantics retained. Note that words that
are filtered out will still be used in computing the semantics of other words.
This behavior is intended for use with large corpora, where retaining the
semantics of all words in memory is infeasible.
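The filtering behavior described above can be illustrated with a toy model. The following is a simplified stand-in, not the library's implementation: SemanticFilterSketch, processTokens, and getCounts are hypothetical names, plain co-occurrence counts stand in for semantic vectors, and treating an empty filter as "retain everything" is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SemanticFilterSketch {
    private final Set<String> retain = new HashSet<>();
    // Toy "semantics": neighbor counts per retained word.
    private final Map<String, Map<String, Integer>> semantics = new HashMap<>();

    void setSemanticFilter(Set<String> semanticsToRetain) {
        retain.clear();
        retain.addAll(semanticsToRetain);
    }

    // Processes one tokenized document with a context window of one word per side.
    void processTokens(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            String focus = tokens.get(i);
            // Words outside the filter get no semantics of their own
            // (assumption: an empty filter retains every word).
            if (!retain.isEmpty() && !retain.contains(focus)) {
                continue;
            }
            Map<String, Integer> counts =
                semantics.computeIfAbsent(focus, k -> new HashMap<>());
            int lo = Math.max(0, i - 1);
            int hi = Math.min(tokens.size() - 1, i + 1);
            for (int j = lo; j <= hi; j++) {
                // Filtered-out neighbors still contribute as context.
                if (j != i) {
                    counts.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
    }

    Set<String> getWords() {
        return Collections.unmodifiableSet(semantics.keySet());
    }

    Map<String, Integer> getCounts(String word) {
        return semantics.get(word);
    }
}
```

In this sketch, filtering on "cat" and processing the tokens "the cat sat" stores an entry only for "cat", yet that entry still records "the" and "sat" as context, which mirrors the memory-saving intent described above.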
This class is thread-safe for concurrent calls of processDocument. The
getVector method will only return valid reflective vectors after the call to
processSpace.
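The calling pattern implied above (process documents concurrently, then call processSpace once before reading any vectors) might look like the sketch below. ToySpace is a hypothetical stand-in that merely counts tokens; it is not reflective random indexing and is not part of the library. Only the method names processDocument and processSpace and their parameter types come from this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class TwoPhaseSketch {
    /** Hypothetical stand-in for a semantic space; it only counts tokens. */
    static final class ToySpace {
        private final ConcurrentHashMap<String, AtomicInteger> counts =
            new ConcurrentHashMap<>();
        private volatile boolean processed = false;

        // Safe to call from several threads at once.
        void processDocument(BufferedReader document) throws IOException {
            String line;
            while ((line = document.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        counts.computeIfAbsent(token, k -> new AtomicInteger())
                              .incrementAndGet();
                    }
                }
            }
        }

        // Marks the end of document processing.
        void processSpace(Properties properties) {
            processed = true;
        }

        // Results are only handed out once processSpace has been called.
        Integer getCount(String word) {
            if (!processed) {
                throw new IllegalStateException("call processSpace first");
            }
            AtomicInteger count = counts.get(word);
            return count == null ? null : count.get();
        }
    }

    static ToySpace run(List<String> documents) {
        ToySpace space = new ToySpace();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String text : documents) {
            pool.execute(() -> {
                try (BufferedReader reader =
                         new BufferedReader(new StringReader(text))) {
                    space.processDocument(reader);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for every document before finalizing the space.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("document processing timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        space.processSpace(new Properties());
        return space;
    }
}
```

The point of the sketch is the ordering: all processDocument calls finish before the single processSpace call, and lookups happen only afterwards.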
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_VECTOR_LENGTH: The default number of dimensions to be used by the index and semantic vectors. |
static String | USE_SPARSE_SEMANTICS_PROPERTY: Specifies whether to use a sparse encoding for each word's semantics, which saves space but requires more computation. |
static String | VECTOR_LENGTH_PROPERTY: The property to specify the number of dimensions to be used by the index and semantic vectors. |
Constructor and Description |
---|
ReflectiveRandomIndexing(): Creates a new ReflectiveRandomIndexing instance using the current System properties for configuration. |
ReflectiveRandomIndexing(Properties properties): Creates a new ReflectiveRandomIndexing instance using the provided properties for configuration. |
Modifier and Type | Method and Description |
---|---|
String | getSpaceName(): Returns a unique string describing the name and configuration of this algorithm. |
IntegerVector | getVector(String word): Returns the semantic vector for the provided word. |
int | getVectorLength(): Returns the length of vectors in this semantic space. |
Set<String> | getWords(): Returns the set of words that are represented in this semantic space. |
void | processDocument(BufferedReader document): Updates the semantic vectors based on the words in the document. |
void | processSpace(Properties properties): Computes the reflective semantic vectors for word meanings. |
void | setSemanticFilter(Set<String> semanticsToRetain): Specifies the set of words whose semantics should be retained; the semantics of all other words are not retained. |
public static final String VECTOR_LENGTH_PROPERTY
public static final String USE_SPARSE_SEMANTICS_PROPERTY
public static final int DEFAULT_VECTOR_LENGTH
public ReflectiveRandomIndexing()
Creates a new ReflectiveRandomIndexing instance using the current System properties for configuration.
public ReflectiveRandomIndexing(Properties properties)
Creates a new ReflectiveRandomIndexing instance using the provided properties for configuration.
public IntegerVector getVector(String word)
Specified by: getVector in interface SemanticSpace
Parameters: word - a word that may be in the semantic space
Returns: the Vector for the provided word, or null if the word was not in the space.
public String getSpaceName()
Specified by: getSpaceName in interface SemanticSpace
public int getVectorLength()
processSpace is called.
Specified by: getVectorLength in interface SemanticSpace
public Set<String> getWords()
Specified by: getWords in interface SemanticSpace
public void processDocument(BufferedReader document) throws IOException
Specified by: processDocument in interface SemanticSpace
Parameters: document - a reader that allows access to the text of the document
Throws: IOException - if any error occurs while reading the document
public void processSpace(Properties properties)
Specified by: processSpace in interface SemanticSpace
Parameters: properties - a set of properties and values that may be used to configure any exposed parameters of the algorithm.
public void setSemanticFilter(Set<String> semanticsToRetain)
Specified by: setSemanticFilter in interface Filterable
Parameters: semanticsToRetain - the set of words for which semantics should be computed.
Copyright © 2012. All Rights Reserved.