public class CachingOnDiskSemanticSpace extends Object implements SemanticSpace

A SemanticSpace where most vector data is kept on disk, but frequently accessed data is kept in memory. This class is designed for large semantic spaces whose data will not fit in memory and whose usage pattern will frequently access a vector multiple times.
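The description above says only that frequently accessed vectors stay in memory; it does not name a caching policy. The following is a minimal sketch of the general idea, assuming a bounded least-recently-used (LRU) policy built on `LinkedHashMap` in access order. `LruVectorCache`, its capacity, and the loader function standing in for a disk read are all hypothetical and are not this class's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a bounded LRU cache in front of a slow vector loader.
// The LRU policy and the capacity are assumptions; the class docs only say that
// frequently accessed data is kept in memory.
class LruVectorCache {
    private final Function<String, double[]> loader; // stands in for a disk read
    private final LinkedHashMap<String, double[]> cache;
    private int diskReads = 0;

    LruVectorCache(int capacity, Function<String, double[]> loader) {
        this.loader = loader;
        final int maxEntries = capacity;
        // accessOrder = true: iteration order runs from least to most recently used
        this.cache = new LinkedHashMap<String, double[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, double[]> eldest) {
                return size() > maxEntries; // evict the least recently used vector
            }
        };
    }

    // synchronized so one instance can be shared safely across threads
    synchronized double[] get(String word) {
        double[] v = cache.get(word);      // a hit also refreshes recency
        if (v == null) {
            v = loader.apply(word);        // miss: fall back to the "disk"
            diskReads++;
            if (v != null) cache.put(word, v);
        }
        return v;
    }

    synchronized int diskReads() { return diskReads; }

    public static void main(String[] args) {
        LruVectorCache c = new LruVectorCache(2, w -> new double[] { w.length() });
        for (String w : new String[] {"cat", "cat", "dog", "eel", "cat"}) c.get(w);
        System.out.println(c.diskReads()); // 4: only the second "cat" is a hit
    }
}
```

With a capacity of two, reading "eel" evicts "cat", so the final "cat" lookup goes back to the loader; this is the repeated-access pattern the class is designed to make cheap.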
The performance of this class depends on the format of the backing vector data; .sspace files in binary or sparse binary format will likely be faster to access because the data is already stored in its native format.
The getWords method returns words in the order they are stored on disk. Accessing the words in this order gives a significant performance improvement over random access. Furthermore, random access to text and sparse text formatted matrices will have particularly poor performance for large semantic spaces, as the internal cursor to the data will have to restart from the beginning of the file.
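To see why disk order matters for text formats, here is a small cost model. It assumes a forward-only cursor that scans line by line and, on reaching the end of the file, restarts from the beginning. `ForwardCursor` is hypothetical: it counts lines read rather than performing real I/O and is not the class's internal reader.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical cost model of a forward-only cursor over a text-format matrix file.
// Each entry in `rows` stands in for one word's line, in on-disk order; the
// counter records how many lines are read, not real I/O time.
class ForwardCursor {
    private final List<String> rows;
    private int position = 0;      // index of the next line the cursor will read
    private long linesScanned = 0;

    ForwardCursor(List<String> rows) { this.rows = rows; }

    // Scans forward for `word`; on reaching end of file, restarts from the beginning once.
    int seek(String word) {
        for (int pass = 0; pass < 2; pass++) {
            while (position < rows.size()) {
                String row = rows.get(position++);
                linesScanned++;
                if (row.equals(word)) return position - 1;
            }
            position = 0; // cursor must restart from the beginning of the file
        }
        return -1; // word not present
    }

    long linesScanned() { return linesScanned; }

    // Total lines read when the words are requested in the given order.
    static long cost(List<String> rows, List<String> accessOrder) {
        ForwardCursor cursor = new ForwardCursor(rows);
        for (String w : accessOrder) cursor.seek(w);
        return cursor.linesScanned();
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) rows.add("w" + i);
        List<String> reversed = new ArrayList<>(rows);
        Collections.reverse(reversed);
        System.out.println("disk order: " + cost(rows, rows));        // 10 lines
        System.out.println("reverse order: " + cost(rows, reversed)); // 91 lines
    }
}
```

For ten words, reading in disk order touches each line once (10 lines), while reading in reverse order forces a restart on almost every lookup (91 lines). That gap widens quadratically as the space grows.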
This class is thread-safe.
See Also: SemanticSpaceIO, OnDiskSemanticSpace
Constructor | Description |
---|---|
CachingOnDiskSemanticSpace(File file) | Creates a new instance of CachingOnDiskSemanticSpace from the data in the specified file. |
CachingOnDiskSemanticSpace(String filename) | Creates a new instance of CachingOnDiskSemanticSpace from the data in the file with the specified name. |
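The two constructors differ only in how the backing file is named. A common way to provide such a pair is to have the String overload resolve the name to a File and delegate. The sketch below assumes that pattern; `FileBackedSpace` and its readability check are hypothetical and say nothing about how this class actually parses the file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical stand-in for a file-backed space with a String/File constructor
// pair; the real class's loading logic is not described on this page.
class FileBackedSpace {
    private final File file;

    // Convenience overload: resolve the name to a File and delegate.
    FileBackedSpace(String filename) throws IOException {
        this(new File(filename));
    }

    // Fails early with an IOException if the backing file cannot be read.
    FileBackedSpace(File file) throws IOException {
        if (!file.isFile() || !file.canRead()) {
            throw new FileNotFoundException("cannot read semantic space file: " + file);
        }
        this.file = file;
    }

    File file() { return file; }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("space", ".sspace");
        tmp.deleteOnExit();
        System.out.println(new FileBackedSpace(tmp.getPath()).file().equals(tmp)); // true
        try {
            new FileBackedSpace("no-such-file.sspace");
        } catch (IOException e) {
            System.out.println("IOException: " + e.getMessage());
        }
    }
}
```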
Modifier and Type | Method | Description |
---|---|---|
String | getSpaceName() | Returns a unique string describing the name and configuration of this algorithm. |
Vector | getVector(String word) | Returns the semantic vector for the provided word. |
int | getVectorLength() | Returns the length of vectors in this semantic space. |
Set<String> | getWords() | Returns the set of words that are represented in this semantic space. |
void | processDocument(BufferedReader document) | Not supported; throws an UnsupportedOperationException if called. |
void | processSpace(Properties props) | Not supported; throws an UnsupportedOperationException if called. |
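The table shows a read-only space: the query methods work, but the two methods that build a space are rejected with UnsupportedOperationException. The sketch below illustrates that pattern. `MiniSpace` is a hypothetical, trimmed-down interface (it is not the real SemanticSpace interface), and it uses `double[]` in place of the library's Vector type.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

class ReadOnlySpaceDemo {
    // Hypothetical, trimmed-down interface; NOT the real SemanticSpace interface.
    interface MiniSpace {
        Set<String> getWords();
        double[] getVector(String word);
        void processDocument(BufferedReader document);
        void processSpace(Properties props);
    }

    // Vectors were computed elsewhere, so queries work but training calls do not.
    static final class PrecomputedSpace implements MiniSpace {
        private final Map<String, double[]> vectors;

        PrecomputedSpace(Map<String, double[]> vectors) {
            this.vectors = new LinkedHashMap<>(vectors); // defensive copy
        }

        public Set<String> getWords() { return Collections.unmodifiableSet(vectors.keySet()); }

        public double[] getVector(String word) { return vectors.get(word); } // null if absent

        public void processDocument(BufferedReader document) {
            throw new UnsupportedOperationException("read-only semantic space");
        }

        public void processSpace(Properties props) {
            throw new UnsupportedOperationException("read-only semantic space");
        }
    }

    public static void main(String[] args) {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("cat", new double[] {1, 0});
        MiniSpace space = new PrecomputedSpace(m);
        System.out.println(space.getWords()); // [cat]
        try {
            space.processDocument(new BufferedReader(new StringReader("some text")));
        } catch (UnsupportedOperationException e) {
            System.out.println("processDocument: " + e.getMessage());
        }
    }
}
```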
public CachingOnDiskSemanticSpace(String filename) throws IOException
Creates a new instance of CachingOnDiskSemanticSpace from the data in the file with the specified name.
Parameters:
filename - the name of a file containing a semantic space
Throws:
IOException - if any I/O exception occurs when reading the semantic space data from the file

public CachingOnDiskSemanticSpace(File file) throws IOException
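Creates a new instance of CachingOnDiskSemanticSpace from the data in the specified file.
Parameters:
file - a file containing a semantic space
Throws:
IOException - if any I/O exception occurs when reading the semantic space data from the file

public String getSpaceName()
Returns a unique string describing the name and configuration of this algorithm.
Specified by:
getSpaceName in interface SemanticSpace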

public Set<String> getWords()
Returns the set of words that are represented in this semantic space.
Specified by:
getWords in interface SemanticSpace

public Vector getVector(String word)
Returns the semantic vector for the provided word.
Specified by:
getVector in interface SemanticSpace
Parameters:
word - a word that may be in the semantic space
Returns:
the Vector for the provided word, or null if the word was not in the space
Throws:
IOError - if any IOException occurs when reading the data from the underlying semantic space file

public int getVectorLength()
Returns the length of vectors in this semantic space. ... processSpace is called.
Specified by:
getVectorLength in interface SemanticSpace
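The getVector contract documented above returns null for an unknown word and reports read failures as IOError rather than as a checked IOException. One way to meet such a contract is to catch the IOException and wrap it in the unchecked `java.io.IOError`, which lets the method keep a signature with no `throws` clause. The sketch below shows this with a hypothetical one-line-per-word text format ("word v1 v2 ..."). `VectorLookup` is illustrative only and is not the class's reader.

```java
import java.io.BufferedReader;
import java.io.IOError;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class VectorLookup {
    // Hypothetical text format: one "word v1 v2 ..." line per word.
    static double[] getVector(Reader source, String word) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts[0].equals(word)) {
                    double[] v = new double[parts.length - 1];
                    for (int i = 1; i < parts.length; i++) v[i - 1] = Double.parseDouble(parts[i]);
                    return v;
                }
            }
            return null; // word not in the space
        } catch (IOException e) {
            // IOError is unchecked, so callers of getVector need not handle IOException
            throw new IOError(e);
        }
    }

    public static void main(String[] args) {
        String data = "cat 1.0 2.0\ndog 3.0 4.0\n";
        System.out.println(getVector(new StringReader(data), "dog")[1]); // 4.0
        System.out.println(getVector(new StringReader(data), "eel"));    // null
    }
}
```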

public void processDocument(BufferedReader document)
Not supported; throws an UnsupportedOperationException if called.
Specified by:
processDocument in interface SemanticSpace
Parameters:
document - a reader that allows access to the text of the document
Throws:
UnsupportedOperationException - if called

public void processSpace(Properties props)
Not supported; throws an UnsupportedOperationException if called.
Specified by:
processSpace in interface SemanticSpace
Parameters:
props - a set of properties and values that may be used to configure any exposed parameters of the algorithm
Throws:
UnsupportedOperationException - if called

Copyright © 2012. All Rights Reserved.