public class Coals extends Object implements SemanticSpace
Rohde, D. L. T., Gonnerman, L. M., Plaut, D. C. (2005). An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence. Cognitive Science (submitted).
COALS first computes term-by-term co-occurrence counts using a ramped 4-word window. Once all documents have been processed, the co-occurrence matrix is reordered so that only the N most frequent terms have their
semantic vectors retained and only the M most frequent terms are used
as co-occurrence features. These values can be set with the "edu.ucla.sspace.coals.Coals.maxWords" and "edu.ucla.sspace.coals.Coals.maxDimensions" properties,
respectively. After the semantic vectors and features are reordered, CorrelationTransform is used to rescore all co-occurrence counts. As part of
this transform, all negative correlations are dropped and replaced with 0.
Finally, and optionally, the SVD is used to reduce the semantic
space. To set the number of retained dimensions via SVD, set the
"edu.ucla.sspace.coals.Coals.dimension" property.

| Modifier and Type | Field and Description |
|---|---|
| `static String` | `COALS_SSPACE_NAME` The name of this `SemanticSpace`. |
| `static String` | `DO_NOT_NORMALIZE_PROPERTY` Specifies whether Coals should skip normalizing the co-occurrence matrix. |
| `static String` | `MAX_DIMENSIONS_PROPERTY` Specifies the number of dimensions in the raw co-occurrence matrix to maintain. |
| `static String` | `MAX_WORDS_PROPERTY` Specifies the number of words to build semantics for. |
| `static String` | `PROPERTY_PREFIX` The property prefix for other settings. |
| `static String` | `REDUCE_DIMENSION_PROPERTY` Specifies the number of dimensions the co-occurrence matrix should be reduced to. |
| `static String` | `REDUCE_MATRIX_PROPERTY` Specifies whether or not the co-occurrence matrix should be reduced. |
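
The property keys quoted in the class description can be collected in a standard `java.util.Properties` object. The following is a minimal sketch, not taken from the library: the class name `CoalsPropertiesSketch`, the helper `build`, and the numeric values are hypothetical, and it assumes the settings are supplied as integer strings.

```java
import java.util.Properties;

public class CoalsPropertiesSketch {
    // Property keys exactly as quoted in the class description.
    static final String MAX_WORDS = "edu.ucla.sspace.coals.Coals.maxWords";
    static final String MAX_DIMENSIONS = "edu.ucla.sspace.coals.Coals.maxDimensions";
    static final String REDUCED_DIMENSIONS = "edu.ucla.sspace.coals.Coals.dimension";

    /**
     * Builds a Properties object holding the three settings.
     * Assumption: values are passed as decimal integer strings.
     */
    static Properties build(int maxWords, int maxDimensions, int reducedDimensions) {
        Properties props = new Properties();
        props.setProperty(MAX_WORDS, Integer.toString(maxWords));
        props.setProperty(MAX_DIMENSIONS, Integer.toString(maxDimensions));
        props.setProperty(REDUCED_DIMENSIONS, Integer.toString(reducedDimensions));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only; choose sizes that suit the corpus.
        Properties props = build(15000, 14000, 800);
        props.list(System.out);
    }
}
```

A `Properties` object built this way is the kind of argument that `processSpace(Properties props)` accepts.
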
| Constructor and Description |
|---|
| `Coals(Transform transform, MatrixFactorization reducer)` |
| `Coals(Transform transform, MatrixFactorization reducer, int reducedDimensions, int maxWords, int maxDimensions)` Creates a `Coals` instance. |
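
The `transform` argument corresponds to the rescoring step in the class description, where correlations are computed and negative values are replaced with 0. The sketch below illustrates that idea on a small dense matrix. It is not the source of `CorrelationTransform`: the class name `CorrelationSketch` is hypothetical, and the formula is the Pearson-style correlation over row and column totals described by Rohde et al. (2005), which is assumed here.

```java
public class CorrelationSketch {
    /**
     * Converts raw co-occurrence counts to correlations and clamps
     * negative correlations to 0. Assumes a rectangular, dense matrix.
     */
    static double[][] correlate(double[][] counts) {
        int rows = counts.length;
        int cols = rows == 0 ? 0 : counts[0].length;
        double[] rowSums = new double[rows];
        double[] colSums = new double[cols];
        double total = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSums[i] += counts[i][j];
                colSums[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        double[][] result = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double numerator = total * counts[i][j] - rowSums[i] * colSums[j];
                double denominator = Math.sqrt(rowSums[i] * (total - rowSums[i])
                        * colSums[j] * (total - colSums[j]));
                // Guard against empty rows or columns.
                double correlation = denominator == 0 ? 0 : numerator / denominator;
                // Negative correlations are dropped and replaced with 0.
                result[i][j] = Math.max(0, correlation);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] counts = {{4, 0}, {0, 4}};
        double[][] scored = correlate(counts);
        System.out.println(scored[0][0] + " " + scored[0][1]);
    }
}
```

In the 2×2 example, the diagonal cells are perfectly correlated and the off-diagonal cells, which would be negative, are clamped to 0.
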
| Modifier and Type | Method and Description |
|---|---|
| `String` | `getSpaceName()` Returns a unique string describing the name and configuration of this algorithm. |
| `Vector` | `getVector(String term)` Returns the semantic vector for the provided word. |
| `int` | `getVectorLength()` Returns the length of vectors in this semantic space. |
| `Set<String>` | `getWords()` Returns the set of words that are represented in this semantic space. |
| `void` | `processDocument(BufferedReader document)` Processes the contents of the provided file as a document. |
| `void` | `processSpace(Properties props)` Once all the documents have been processed, performs any post-processing steps on the data. |
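
The class description says co-occurrence counts are gathered with a ramped 4-word window, which is the work done while documents are processed. The sketch below shows one way such counting could look. It is not the library's implementation: the class name `RampedWindowSketch` is hypothetical, tokenization is assumed to have happened already, and the ramp is assumed to weight a neighbor at distance d by 5 - d (4 for adjacent words, down to 1 at distance 4).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RampedWindowSketch {
    static final int WINDOW = 4;

    /** Returns weighted co-occurrence counts: focus word -> neighbor -> weight sum. */
    static Map<String, Map<String, Integer>> count(List<String> tokens) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            Map<String, Integer> row =
                    counts.computeIfAbsent(tokens.get(i), k -> new HashMap<>());
            for (int offset = -WINDOW; offset <= WINDOW; offset++) {
                int j = i + offset;
                if (offset == 0 || j < 0 || j >= tokens.size()) {
                    continue; // skip the focus word itself and out-of-range positions
                }
                // Ramp: closer neighbors contribute more (4, 3, 2, 1).
                int weight = WINDOW + 1 - Math.abs(offset);
                row.merge(tokens.get(j), weight, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        System.out.println(count(tokens).get("cat"));
    }
}
```

The window is symmetric here, so each pair is counted from both sides; whether the library does the same is not stated in this page.
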
public static final String PROPERTY_PREFIX
public static final String REDUCE_MATRIX_PROPERTY
public static final String REDUCE_DIMENSION_PROPERTY
public static final String MAX_DIMENSIONS_PROPERTY
public static final String MAX_WORDS_PROPERTY
public static final String DO_NOT_NORMALIZE_PROPERTY
public static final String COALS_SSPACE_NAME
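The name of this SemanticSpace.

public Coals(Transform transform, MatrixFactorization reducer)

public Coals(Transform transform, MatrixFactorization reducer, int reducedDimensions, int maxWords, int maxDimensions)
Creates a Coals instance.

public Set<String> getWords()
Returns the set of words that are represented in this semantic space.
Specified by: getWords in interface SemanticSpace

public Vector getVector(String term)
Returns the semantic vector for the provided word.
Specified by: getVector in interface SemanticSpace
Parameters: term - a word that may be in the semantic space
Returns: the Vector for the provided word, or null if the word was not in the space.

public String getSpaceName()
Description copied from interface: SemanticSpace
Returns a unique string describing the name and configuration of this algorithm.
Specified by: getSpaceName in interface SemanticSpace

public int getVectorLength()
Description copied from interface: SemanticSpace
Returns the length of vectors in this semantic space. Implementations are free to define whether the returned value is valid before processSpace is called.
Specified by: getVectorLength in interface SemanticSpace

public void processDocument(BufferedReader document) throws IOException
Processes the contents of the provided file as a document.
Specified by: processDocument in interface SemanticSpace
Parameters: document - a reader that allows access to the text of the document
Throws: IOException - if any error occurs while reading the document

public void processSpace(Properties props)
Once all the documents have been processed, performs any post-processing steps on the data. Values for any exposed parameters may be specified using the properties argument.
By general contract, once this method has been called, processDocument will not be called again.
Specified by: processSpace in interface SemanticSpace
Parameters: props - a set of properties and values that may be used to configure any exposed parameters of the algorithm.

Copyright © 2012. All Rights Reserved.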