public interface SemanticSpace
processDocument will be called one or more times with the text of the corpus.
processSpace will be called after all the documents have been processed. Once this method has been called, no further calls to processDocument should be made.
getVector may be called after the space has been processed. Implementations may optionally support calls to this method before processSpace, but this is not required.
getWords() may be called at any time to determine which words are currently represented in the space.

Implementations should specify in their class documentation which parameters are available as properties for the processSpace method, and what the default values of those parameters are.

Modifier and Type | Method and Description |
---|---|
String | getSpaceName() Returns a unique string describing the name and configuration of this algorithm. |
Vector | getVector(String word) Returns the semantic vector for the provided word. |
int | getVectorLength() Returns the length of vectors in this semantic space. |
Set<String> | getWords() Returns the set of words that are represented in this semantic space. |
void | processDocument(BufferedReader document) Processes the contents of the provided file as a document. |
void | processSpace(Properties properties) Once all the documents have been processed, performs any post-processing steps on the data. |
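The call sequence described in the interface overview can be sketched with a toy implementation. This is an illustration under stated assumptions, not the library's code: SimpleSemanticSpace is a hypothetical stand-in that mirrors the six methods but returns double[] instead of Vector, DocumentCountSpace is an invented word-by-document count model, and the property name "sketch.binary" (default "false") is made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

class SemanticSpaceSketch {

    /** Hypothetical stand-in for SemanticSpace; double[] replaces Vector. */
    interface SimpleSemanticSpace {
        void processDocument(BufferedReader document) throws IOException;
        Set<String> getWords();
        double[] getVector(String word);
        void processSpace(Properties properties);
        String getSpaceName();
        int getVectorLength();
    }

    /** Toy space: dimension i of a word's vector is its count in document i. */
    static class DocumentCountSpace implements SimpleSemanticSpace {
        /** Hypothetical property; "true" stores 1.0 instead of raw counts. */
        static final String BINARY_PROPERTY = "sketch.binary";

        private final List<Map<String, Integer>> docCounts = new ArrayList<>();
        private final Map<String, double[]> vectors = new HashMap<>();
        private boolean binary = false;
        private boolean processed = false;

        @Override
        public void processDocument(BufferedReader document) throws IOException {
            if (processed) { // enforce the "no further calls" contract
                throw new IllegalStateException("processSpace was already called");
            }
            Map<String, Integer> counts = new HashMap<>();
            String line;
            while ((line = document.readLine()) != null) {
                for (String token : line.toLowerCase().split("\\s+")) {
                    if (!token.isEmpty()) {
                        counts.merge(token, 1, Integer::sum);
                    }
                }
            }
            docCounts.add(counts);
        }

        @Override
        public void processSpace(Properties properties) {
            // Read the exposed parameter, falling back to its default.
            binary = Boolean.parseBoolean(
                    properties.getProperty(BINARY_PROPERTY, "false"));
            for (int doc = 0; doc < docCounts.size(); doc++) {
                for (Map.Entry<String, Integer> e : docCounts.get(doc).entrySet()) {
                    double[] v = vectors.computeIfAbsent(
                            e.getKey(), w -> new double[docCounts.size()]);
                    v[doc] = binary ? 1.0 : e.getValue();
                }
            }
            processed = true;
        }

        @Override
        public Set<String> getWords() {
            // Usable at any time: collects words from the documents seen so far.
            Set<String> words = new TreeSet<>();
            for (Map<String, Integer> counts : docCounts) {
                words.addAll(counts.keySet());
            }
            return words;
        }

        @Override
        public double[] getVector(String word) {
            // null for unknown words; no vectors exist before processSpace.
            double[] v = vectors.get(word);
            return v == null ? null : v.clone();
        }

        @Override
        public String getSpaceName() { return "document-count-space-binary=" + binary; }

        @Override
        public int getVectorLength() { return docCounts.size(); }
    }

    /** Runs the full sequence: processDocument per text, then processSpace. */
    static DocumentCountSpace build(Properties properties, String... documents) {
        DocumentCountSpace space = new DocumentCountSpace();
        try {
            for (String text : documents) {
                space.processDocument(new BufferedReader(new StringReader(text)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        space.processSpace(properties);
        return space;
    }

    public static void main(String[] args) {
        DocumentCountSpace space =
                build(new Properties(), "the cat sat", "the dog sat down");
        System.out.println(space.getWords());
        System.out.println(Arrays.toString(space.getVector("cat")));
        System.out.println(space.getVector("bird")); // word not in the space
    }
}
```

In this sketch, a call to processDocument after processSpace throws IllegalStateException, and getVector returns null both for unknown words and for any lookup made before processSpace. Both are choices of the example; the interface only says such calls should not be made or need not be supported.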
void processDocument(BufferedReader document) throws IOException
document - a reader that allows access to the text of the document
IOException - if any error occurs while reading the document

Set<String> getWords()

Vector getVector(String word)
word - a word that may be in the semantic space
Returns the Vector for the provided word, or null if the word was not in the space.

void processSpace(Properties properties)
Exposed parameters of the algorithm may be configured through the properties argument. By general contract, once this method has been called, processDocument will not be called again.
properties - a set of properties and values that may be used to configure any exposed parameters of the algorithm.

String getSpaceName()

int getVectorLength()
[...] processSpace is called.

Copyright © 2012. All Rights Reserved.