public class Coals extends Object implements SemanticSpace
Rohde, D. L. T., Gonnerman, L. M., Plaut, D. C. (2005). An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence. Cognitive Science (submitted).
COALS first computes a term-by-term co-occurrence matrix using a ramped 4-word window. Once all documents have been processed, the co-occurrence matrix is re-ordered so that only the N most frequent terms retain their semantic vectors and only the M most frequent terms are used as co-occurrence features. These values can be set with the "edu.ucla.sspace.coals.Coals.maxWords" and "edu.ucla.sspace.coals.Coals.maxDimensions" properties, respectively.

After the semantic vectors and features are re-ordered, CorrelationTransform is used to re-weight all co-occurrence scores. As part of this transform, all negative correlations are replaced with 0. Finally, and optionally, SVD is used to reduce the dimensionality of the semantic space. To set the number of dimensions retained by SVD, set the "edu.ucla.sspace.coals.Coals.dimension" property.

Modifier and Type | Field and Description |
---|---|
static String | COALS_SSPACE_NAME: The name of this SemanticSpace. |
static String | DO_NOT_NORMALIZE_PROPERTY: Specifies whether Coals should skip normalizing the co-occurrence matrix. |
static String | MAX_DIMENSIONS_PROPERTY: Specifies the number of dimensions (feature columns) of the raw co-occurrence matrix to retain. |
static String | MAX_WORDS_PROPERTY: Specifies the number of words to build semantic vectors for. |
static String | PROPERTY_PREFIX: The property prefix for other settings. |
static String | REDUCE_DIMENSION_PROPERTY: Specifies the number of dimensions to which the co-occurrence matrix should be reduced. |
static String | REDUCE_MATRIX_PROPERTY: Specifies whether or not the co-occurrence matrix should be reduced. |
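The class description says that, after counting, only the N most frequent words keep semantic vectors (MAX_WORDS_PROPERTY) and only the M most frequent words serve as feature columns (MAX_DIMENSIONS_PROPERTY). The Java sketch below illustrates that filtering step under stated assumptions: term frequencies arrive as a plain Map, ties are broken alphabetically, and the names FrequencyFilter, mostFrequent, and toMatrix are hypothetical, not part of S-Space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of S-Space: illustrates keeping only the
// most frequent terms as rows (maxWords) and as feature columns (maxDimensions).
final class FrequencyFilter {

    // Returns up to n terms ordered by descending frequency; ties are broken
    // alphabetically (an assumption made here so the output is deterministic).
    static List<String> mostFrequent(Map<String, Integer> termCounts, int n) {
        List<String> terms = new ArrayList<>(termCounts.keySet());
        terms.sort((a, b) -> {
            int byCount = Integer.compare(termCounts.get(b), termCounts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    // Copies the retained rows and columns of a sparse co-occurrence table
    // into a dense matrix; entries that were never observed become 0.
    static double[][] toMatrix(Map<String, Map<String, Double>> cooc,
                               List<String> rows, List<String> cols) {
        double[][] m = new double[rows.size()][cols.size()];
        for (int r = 0; r < rows.size(); r++) {
            Map<String, Double> row = cooc.getOrDefault(rows.get(r), Map.of());
            for (int c = 0; c < cols.size(); c++) {
                m[r][c] = row.getOrDefault(cols.get(c), 0.0);
            }
        }
        return m;
    }
}
```

In this sketch, mostFrequent(counts, maxWords) would give the row vocabulary and mostFrequent(counts, maxDimensions) the feature vocabulary, and toMatrix then lays out the retained counts in that order.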
Constructor and Description |
---|
Coals(Transform transform, MatrixFactorization reducer) |
Coals(Transform transform, MatrixFactorization reducer, int reducedDimensions, int maxWords, int maxDimensions): Creates a Coals instance. |
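According to the class description, the re-weighting step is performed by CorrelationTransform, and these constructors accept a Transform. As a rough illustration only, the sketch below applies a correlation of the kind described by Rohde et al., corr(a, b) = (T*w[a][b] - r_a*c_b) / sqrt(r_a*(T - r_a)*c_b*(T - c_b)), where T is the total count, r_a the row sum, and c_b the column sum, and then replaces negative values with 0 as the description states. CorrelationSketch is a hypothetical name, and the code may differ in detail from CorrelationTransform; it assumes a rectangular matrix of non-negative counts.

```java
// Hypothetical stand-in for a correlation re-weighting step; not the
// S-Space CorrelationTransform. Assumes a rectangular, non-negative matrix.
final class CorrelationSketch {

    static double[][] correlate(double[][] w) {
        if (w.length == 0) {
            return new double[0][0];
        }
        int rows = w.length, cols = w[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0;
        // First pass: row sums, column sums, and the grand total T.
        for (int a = 0; a < rows; a++) {
            for (int b = 0; b < cols; b++) {
                rowSum[a] += w[a][b];
                colSum[b] += w[a][b];
                total += w[a][b];
            }
        }
        double[][] out = new double[rows][cols];
        // Second pass: correlation for each cell, negatives clamped to 0.
        for (int a = 0; a < rows; a++) {
            for (int b = 0; b < cols; b++) {
                double denom = Math.sqrt(rowSum[a] * (total - rowSum[a])
                                         * colSum[b] * (total - colSum[b]));
                if (denom == 0) {
                    continue; // correlation undefined here; leave the cell at 0
                }
                double corr = (total * w[a][b] - rowSum[a] * colSum[b]) / denom;
                out[a][b] = Math.max(0.0, corr); // drop negative correlations
            }
        }
        return out;
    }
}
```

For a perfectly diagonal matrix such as {{2, 0}, {0, 2}}, the diagonal cells get correlation 1 and the off-diagonal cells, whose correlation is -1, are clamped to 0.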
Modifier and Type | Method and Description |
---|---|
String | getSpaceName(): Returns a unique string describing the name and configuration of this algorithm. |
Vector | getVector(String term): Returns the semantic vector for the provided word. |
int | getVectorLength(): Returns the length of vectors in this semantic space. |
Set<String> | getWords(): Returns the set of words that are represented in this semantic space. |
void | processDocument(BufferedReader document): Processes the contents of the provided reader as a document. |
void | processSpace(Properties props): Once all the documents have been processed, performs any post-processing steps on the data. |
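processDocument is presumably where the first step in the class description, counting co-occurrences with a ramped 4-word window, takes place. The following is a minimal sketch, assuming a ramp in which an adjacent word contributes weight 4 and a word four positions away contributes 1, and assuming the document has already been split into tokens (tokenization is not described on this page). RampedWindowCounter and countDocument are hypothetical names, not S-Space APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the S-Space implementation: counts
// co-occurrences within 4 words on each side, weighting a neighbor at
// distance d by (5 - d), i.e. 4, 3, 2, 1 (an assumed ramp).
final class RampedWindowCounter {

    static final int WINDOW = 4;

    // Adds the weighted counts for one tokenized document to counts,
    // where counts.get(focus).get(neighbor) is the accumulated weight.
    static void countDocument(List<String> tokens,
                              Map<String, Map<String, Double>> counts) {
        for (int i = 0; i < tokens.size(); i++) {
            Map<String, Double> row =
                counts.computeIfAbsent(tokens.get(i), k -> new HashMap<>());
            for (int d = 1; d <= WINDOW; d++) {
                double weight = WINDOW + 1 - d;
                if (i - d >= 0) {
                    row.merge(tokens.get(i - d), weight, Double::sum);
                }
                if (i + d < tokens.size()) {
                    row.merge(tokens.get(i + d), weight, Double::sum);
                }
            }
        }
    }
}
```

Calling countDocument once per document accumulates counts across the whole corpus, matching the description's ordering in which filtering and re-weighting happen only after all documents have been processed.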
public static final String PROPERTY_PREFIX
public static final String REDUCE_MATRIX_PROPERTY
public static final String REDUCE_DIMENSION_PROPERTY
public static final String MAX_DIMENSIONS_PROPERTY
public static final String MAX_WORDS_PROPERTY
public static final String DO_NOT_NORMALIZE_PROPERTY
public static final String COALS_SSPACE_NAME
The name of this SemanticSpace.
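The property constants above can be supplied through a java.util.Properties object; since processSpace takes a Properties argument, this sketch assumes that is where Coals reads them. The key strings are the ones spelled out in the class description; with S-Space on the classpath one would more likely use Coals.MAX_WORDS_PROPERTY and the related constants, assuming they hold the same strings. The numeric values are placeholders, CoalsConfigExample is a hypothetical name, and this page gives no key string for REDUCE_MATRIX_PROPERTY, so whether it must also be set to enable SVD is left open.

```java
import java.util.Properties;

// Hypothetical example: assembles the settings described for Coals into a
// Properties object. Key strings come from the class description; the values
// passed in are placeholders, not recommended settings.
final class CoalsConfigExample {

    // Common prefix of the documented keys (presumably PROPERTY_PREFIX).
    static final String PREFIX = "edu.ucla.sspace.coals.Coals";

    static Properties coalsProperties(int maxWords, int maxDimensions,
                                      int svdDimensions) {
        Properties props = new Properties();
        // N: number of words that keep semantic vectors.
        props.setProperty(PREFIX + ".maxWords", Integer.toString(maxWords));
        // M: number of most frequent words used as co-occurrence features.
        props.setProperty(PREFIX + ".maxDimensions", Integer.toString(maxDimensions));
        // Number of dimensions retained by the optional SVD step.
        props.setProperty(PREFIX + ".dimension", Integer.toString(svdDimensions));
        return props;
    }
}
```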
public Coals(Transform transform, MatrixFactorization reducer)
public Coals(Transform transform, MatrixFactorization reducer, int reducedDimensions, int maxWords, int maxDimensions)
Creates a Coals instance.
public Set<String> getWords()
Returns the set of words that are represented in this semantic space.
Specified by: getWords in interface SemanticSpace
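public Vector getVector(String term)
Returns the semantic vector for the provided word.
Specified by: getVector in interface SemanticSpace
Parameters: term - a word that may be in the semantic space
Returns: the Vector for the provided word, or null if the word was not in the space.
public String getSpaceName()
Returns a unique string describing the name and configuration of this algorithm.
Specified by: getSpaceName in interface SemanticSpace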
public int getVectorLength()
Returns the length of vectors in this semantic space. … processSpace is called.
Specified by: getVectorLength in interface SemanticSpace
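A common use of the finished space is comparing two words. Since getVector returns null for words outside the space, and every vector in the space has getVectorLength() dimensions, a comparison should handle a missing word explicitly. This sketch uses plain double[] arrays as a stand-in for the library's Vector type and cosine similarity as the measure; both choices, and the VectorSimilarity name, are illustrative assumptions rather than S-Space APIs.

```java
import java.util.OptionalDouble;

// Hypothetical helper: double[] stands in for the library's Vector type.
final class VectorSimilarity {

    // Cosine similarity of two vectors from the same space. Returns empty
    // when either word is missing (getVector returned null) or a vector is
    // all zeros, since the cosine is undefined in those cases.
    static OptionalDouble cosine(double[] x, double[] y) {
        if (x == null || y == null) {
            return OptionalDouble.empty();
        }
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normX = 0, normY = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        if (normX == 0 || normY == 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(dot / (Math.sqrt(normX) * Math.sqrt(normY)));
    }
}
```

Returning OptionalDouble rather than a sentinel such as 0 keeps "word not in the space" distinguishable from "words are unrelated".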
public void processDocument(BufferedReader document) throws IOException
Processes the contents of the provided reader as a document.
Specified by: processDocument in interface SemanticSpace
Parameters: document - a reader that allows access to the text of the document
Throws: IOException - if any error occurs while reading the document
public void processSpace(Properties props)
Once all the documents have been processed, performs any post-processing steps on the data. … properties argument.
By general contract, once this method has been called, processDocument will not be called again.
Specified by: processSpace in interface SemanticSpace
Parameters: props - a set of properties and values that may be used to configure any exposed parameters of the algorithm.

Copyright © 2012. All Rights Reserved.