public class LocalityPreservingCooccurrenceSpace extends Object implements SemanticSpace
See Also: SemanticSpace, LocalityPreservingSemanticAnalysis, AffinityMatrixCreator, LocalityPreservingProjection
Modifier and Type | Field and Description |
---|---|
static String | DEFAULT_WEIGHTING: The default WeightingFunction to use. |
static int | DEFAULT_WINDOW_SIZE: The default number of words before and after the focus word to include. |
static String | ENTROPY_THRESHOLD_PROPERTY: The property to specify the minimum entropy threshold a word should have to be included in the vector space after processing. |
static String | LPCS_DIMENSIONS_PROPERTY: The property to set the number of dimensions to which the space should be reduced using the SVD. |
static String | WEIGHTING_FUNCTION_PROPERTY: The property to set the WeightingFunction to be used when weighting the co-occurrence of neighboring words based on their distance. |
static String | WINDOW_SIZE_PROPERTY: The property to specify the number of words to view before and after each word in focus. |
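The window size and distance-based weighting described in the field summary can be illustrated with a small, self-contained sketch. It does not use this class or the library's WeightingFunction; the class name `WindowCooccurrenceSketch`, the window size of 2, and the 1/distance weighting are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowCooccurrenceSketch {

    // Hypothetical window size; the actual DEFAULT_WINDOW_SIZE value is not stated here.
    static final int WINDOW_SIZE = 2;

    /**
     * For each focus word, accumulates a weight for every word found within
     * windowSize positions before or after it. The weight is 1/distance,
     * an assumed weighting used only for this example.
     */
    static Map<String, Map<String, Double>> count(List<String> tokens, int windowSize) {
        Map<String, Map<String, Double>> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            Map<String, Double> row =
                counts.computeIfAbsent(tokens.get(i), k -> new HashMap<>());
            int start = Math.max(0, i - windowSize);
            int end = Math.min(tokens.size() - 1, i + windowSize);
            for (int j = start; j <= end; j++) {
                if (j == i) {
                    continue; // skip the focus word itself
                }
                double weight = 1.0 / Math.abs(i - j);
                row.merge(tokens.get(j), weight, Double::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        Map<String, Map<String, Double>> counts = count(tokens, WINDOW_SIZE);
        // Words within two positions of "sat", with their accumulated weights.
        System.out.println(counts.get("sat"));
    }
}
```

A larger window captures more distant context at the cost of denser co-occurrence rows; the weighting controls how quickly the contribution of distant neighbors falls off.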
Constructor and Description |
---|
LocalityPreservingCooccurrenceSpace(AffinityMatrixCreator creator): Constructs a new instance using the system properties for configuration. |
LocalityPreservingCooccurrenceSpace(AffinityMatrixCreator creator, Properties properties): Constructs a new instance using the provided properties for configuration. |
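The two constructors differ only in where the configuration comes from: the JVM system properties or a caller-supplied Properties object. The sketch below shows one way a setting with a default fallback might be resolved from either source. The key string `example.lpcs.windowSize` and the fallback value are hypothetical stand-ins for WINDOW_SIZE_PROPERTY and DEFAULT_WINDOW_SIZE, whose actual values are not given here, and how the class itself parses its properties is not documented on this page.

```java
import java.util.Properties;

public class LpcsConfigSketch {

    // Hypothetical key and fallback; stand-ins for WINDOW_SIZE_PROPERTY
    // and DEFAULT_WINDOW_SIZE, not the library's real values.
    static final String WINDOW_SIZE_KEY = "example.lpcs.windowSize";
    static final int FALLBACK_WINDOW_SIZE = 2;

    /** Reads the window size from props, falling back to the default when unset. */
    static int windowSize(Properties props) {
        String value = props.getProperty(WINDOW_SIZE_KEY);
        return (value == null) ? FALLBACK_WINDOW_SIZE : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        // Caller-supplied properties, as with the two-argument constructor.
        Properties explicit = new Properties();
        explicit.setProperty(WINDOW_SIZE_KEY, "5");
        System.out.println(windowSize(explicit));

        // System properties, as with the one-argument constructor.
        System.out.println(windowSize(System.getProperties()));
    }
}
```

Passing an explicit Properties object keeps the configuration local to one instance, whereas system properties affect every instance created in the same JVM.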
Modifier and Type | Method and Description |
---|---|
String | getSpaceName(): Returns a unique string describing the name and configuration of this algorithm. |
Vector | getVector(String word): Returns the semantic vector for the provided word. |
int | getVectorLength(): Returns the length of vectors in this semantic space. |
Set<String> | getWords(): Returns the set of words that are represented in this semantic space. |
void | processDocument(BufferedReader document): Processes the contents of the provided file as a document. |
void | processSpace(Properties properties): Once all the documents have been processed, performs any post-processing steps on the data. |
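The methods above imply a fixed calling order: processDocument once per document, then processSpace once, then queries through getWords and getVector. The following sketch illustrates that order with a toy stand-in. `MiniSpace` and `CountingSpace` are hypothetical types written for this example, not the real SemanticSpace interface, and `double[]` replaces the library's Vector type so the code depends only on the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class LifecycleSketch {

    /** Hypothetical stand-in mirroring part of the method summary. */
    interface MiniSpace {
        void processDocument(BufferedReader document) throws IOException;
        void processSpace(Properties properties);
        Set<String> getWords();
        double[] getVector(String word); // null if the word is not in the space
    }

    /** Toy implementation: each word's "vector" is just its frequency. */
    static class CountingSpace implements MiniSpace {
        private final Map<String, Integer> counts = new HashMap<>();
        private boolean processed = false;

        public void processDocument(BufferedReader document) throws IOException {
            if (processed) {
                throw new IllegalStateException("processSpace was already called");
            }
            String line;
            while ((line = document.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        counts.merge(token, 1, Integer::sum);
                    }
                }
            }
        }

        public void processSpace(Properties properties) {
            processed = true; // no post-processing in this toy version
        }

        public Set<String> getWords() {
            return counts.keySet();
        }

        public double[] getVector(String word) {
            Integer c = counts.get(word);
            return (c == null) ? null : new double[] { c };
        }
    }

    /** Feeds every document, then finalizes the space exactly once. */
    static MiniSpace build(String... documents) {
        MiniSpace space = new CountingSpace();
        try {
            for (String doc : documents) {
                space.processDocument(new BufferedReader(new StringReader(doc)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        space.processSpace(new Properties());
        return space;
    }

    public static void main(String[] args) {
        MiniSpace space = build("the cat sat", "the dog ran");
        double[] v = space.getVector("the");
        // getVector may return null, so check before dereferencing.
        System.out.println(v == null ? "absent" : String.valueOf(v[0]));
        System.out.println(space.getVector("bird") == null);
    }
}
```

Because getVector returns null for unknown words, callers should check the result before use rather than assume every query word was seen during processing.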
public static final String ENTROPY_THRESHOLD_PROPERTY
public static final String WINDOW_SIZE_PROPERTY
public static final String WEIGHTING_FUNCTION_PROPERTY
The property to set the WeightingFunction to be used when weighting the co-occurrence of neighboring words based on their distance.
public static final String LPCS_DIMENSIONS_PROPERTY
public static final int DEFAULT_WINDOW_SIZE
public static final String DEFAULT_WEIGHTING
The default WeightingFunction to use.
public LocalityPreservingCooccurrenceSpace(AffinityMatrixCreator creator)
public LocalityPreservingCooccurrenceSpace(AffinityMatrixCreator creator, Properties properties)
public void processDocument(BufferedReader document) throws IOException
Specified by: processDocument in interface SemanticSpace
Parameters: document - a reader that allows access to the text of the document
Throws: IOException - if any error occurs while reading the document
public Set<String> getWords()
Specified by: getWords in interface SemanticSpace
public Vector getVector(String word)
Specified by: getVector in interface SemanticSpace
Parameters: word - a word that may be in the semantic space
Returns: the Vector for the provided word, or null if the word was not in the space.
public int getVectorLength()
... processSpace is called.
Specified by: getVectorLength in interface SemanticSpace
public void processSpace(Properties properties)
Exposed parameters of the algorithm may be configured through the properties argument. By general contract, once this method has been called, processDocument will not be called again.
Specified by: processSpace in interface SemanticSpace
Parameters: properties - a set of properties and values that may be used to configure any exposed parameters of the algorithm.
public String getSpaceName()
Specified by: getSpaceName in interface SemanticSpace
Copyright © 2012. All Rights Reserved.