public class OnDiskSemanticSpace extends Object implements SemanticSpace
A SemanticSpace where all vector data is kept on disk. This class is designed for large semantic spaces whose data, even in sparse format, will not fit into memory.
The performance of this class depends on the format of the backing vector data. .sspace files in binary or sparse binary format will likely be faster to access, because the data is already in its native format.
The getWords method returns words in the order they are stored on disk. Accessing the words in this order gives a significant performance improvement over random access. Furthermore, random access to text and sparse text formatted matrices will have particularly poor performance for large semantic spaces, as the internal cursor to the data will have to restart from the beginning of the file.
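The cost difference between in-order and random access can be illustrated with a small stand-alone sketch. It does not use this class: `ForwardCursorSketch`, its `lookup` and `cost` methods, and the `word|values` line layout are all hypothetical stand-ins, chosen only to model a cursor that can move forward and must restart from the beginning to reach an earlier entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a forward-only cursor over a text-format file. */
class ForwardCursorSketch {
    private final List<String> lines;                 // one "word|v1 v2 ..." entry per line (assumed layout)
    private final List<String> words = new ArrayList<>();
    private Iterator<String> cursor;                  // can only move forward
    private int nextIndex;                            // index of the entry the cursor reads next
    long linesRead;                                   // total entries scanned so far

    ForwardCursorSketch(List<String> lines) {
        this.lines = lines;
        for (String line : lines) {
            words.add(line.substring(0, line.indexOf('|')));
        }
        restart();
    }

    /** Moves the cursor back to the first entry. */
    private void restart() {
        cursor = lines.iterator();
        nextIndex = 0;
    }

    /** Returns the raw vector text for the word, or null if it is absent. */
    String lookup(String word) {
        int target = words.indexOf(word);
        if (target < 0) {
            return null;
        }
        if (target < nextIndex) {
            restart();                                // the entry is behind the cursor
        }
        String line = null;
        while (nextIndex <= target) {                 // scan forward up to the target entry
            line = cursor.next();
            nextIndex++;
            linesRead++;
        }
        return line.substring(line.indexOf('|') + 1);
    }

    /** Counts the entries scanned when words are requested in the given order. */
    static long cost(List<String> lines, List<String> order) {
        ForwardCursorSketch space = new ForwardCursorSketch(lines);
        for (String w : order) {
            space.lookup(w);
        }
        return space.linesRead;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a|1 0", "b|0 1", "c|1 1", "d|2 0");
        System.out.println(cost(lines, Arrays.asList("a", "b", "c", "d"))); // prints 4
        System.out.println(cost(lines, Arrays.asList("d", "c", "b", "a"))); // prints 10
    }
}
```

In this model, stored order scans each entry once, while reverse order forces a restart on nearly every lookup. For real spaces the gap grows with the number of words, which is why iterating in getWords order is preferable.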
This class is thread-safe.
See Also: SemanticSpaceIO, StaticSemanticSpace
Constructor | Description |
---|---|
OnDiskSemanticSpace(File file) | Creates the OnDiskSemanticSpace from the provided file. |
OnDiskSemanticSpace(File file, SemanticSpaceIO.SSpaceFormat format) | Deprecated. |
OnDiskSemanticSpace(String filename) | Creates the OnDiskSemanticSpace from the file. |
Modifier and Type | Method | Description |
---|---|---|
String | getSpaceName() | Returns a unique string describing the name and configuration of this algorithm. |
Vector | getVector(String word) | Returns the semantic vector for the provided word. |
int | getVectorLength() | Returns the length of vectors in this semantic space. |
Set<String> | getWords() | Returns the set of words that are represented in this semantic space. |
void | processDocument(BufferedReader document) | Not supported; throws an UnsupportedOperationException if called. |
void | processSpace(Properties props) | Not supported; throws an UnsupportedOperationException if called. |
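The method summary describes a read-only lookup API: getVector may return null for an unknown word, and the processing methods always throw. The sketch below shows one way calling code might cope with both. It is not this library's code: `ReadOnlySpaceSketch` is a hypothetical in-memory stand-in, it uses `double[]` in place of the library's Vector type, and the `dot` helper is an illustrative assumption.

```java
import java.io.BufferedReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in that mirrors the documented method contracts. */
class ReadOnlySpaceSketch {
    private final Map<String, double[]> vectors = new LinkedHashMap<>();

    ReadOnlySpaceSketch() {
        vectors.put("cat", new double[] {1.0, 0.0});
        vectors.put("dog", new double[] {0.5, 0.5});
    }

    /** Returns the words in insertion order, standing in for on-disk order. */
    Set<String> getWords() {
        return vectors.keySet();
    }

    /** Returns the vector for the word, or null if the word is not in the space. */
    double[] getVector(String word) {
        return vectors.get(word);
    }

    int getVectorLength() {
        return 2;
    }

    /** Not supported: the space is loaded from stored data, not built from documents. */
    void processDocument(BufferedReader document) {
        throw new UnsupportedOperationException("space is read-only");
    }

    /** Caller-side helper: dot product that returns null when either word is missing. */
    static Double dot(ReadOnlySpaceSketch space, String a, String b) {
        double[] va = space.getVector(a);
        double[] vb = space.getVector(b);
        if (va == null || vb == null) {
            return null;                      // at least one word is not in the space
        }
        double sum = 0.0;
        for (int i = 0; i < va.length; i++) {
            sum += va[i] * vb[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        ReadOnlySpaceSketch space = new ReadOnlySpaceSketch();
        System.out.println(dot(space, "cat", "dog"));   // prints 0.5
        System.out.println(dot(space, "cat", "bird"));  // prints null
        try {
            space.processDocument(null);
        } catch (UnsupportedOperationException e) {
            System.out.println("processing is not supported");
        }
    }
}
```

Checking for null before using a vector keeps a missing word from surfacing later as a NullPointerException. With the real class, note that getVector is documented to throw an IOError, which is unchecked, if reading the underlying file fails.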
public OnDiskSemanticSpace(String filename) throws IOException
Creates the OnDiskSemanticSpace from the file.
Parameters:
filename - the name of a semantic space file
Throws:
IOException - if any I/O exception occurs when reading the semantic space data from the file
Error - if the 4-byte header for the file contains an unrecognized semantic space format
public OnDiskSemanticSpace(File file) throws IOException
Creates the OnDiskSemanticSpace from the provided file.
Parameters:
file - a file containing a stored semantic space
Throws:
IOException - if any I/O exception occurs when reading the semantic space data from the file
Error - if the 4-byte header for the file contains an unrecognized semantic space format
@Deprecated public OnDiskSemanticSpace(File file, SemanticSpaceIO.SSpaceFormat format) throws IOException
Creates the OnDiskSemanticSpace from the provided file in the specified format. This constructor should only be used for loading semantic space files that do not have the 4-byte header indicating their format.
Parameters:
file - a file containing a semantic space
format - the format of the semantic space
Throws:
IOException - if any I/O exception occurs when reading the semantic space data from the file
public Set<String> getWords()
Specified by:
getWords in interface SemanticSpace
public Vector getVector(String word)
Specified by:
getVector in interface SemanticSpace
Parameters:
word - a word that may be in the semantic space
Returns:
the Vector for the provided word, or null if the word was not in the space
Throws:
IOError - if any IOException occurs when reading the data from the underlying semantic space file
public String getSpaceName()
Specified by:
getSpaceName in interface SemanticSpace
public int getVectorLength()
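Returns the length of vectors in this semantic space. Implementations are left free to define whether the returned value is valid before processSpace is called.
Specified by:
getVectorLength in interface SemanticSpace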
public void processDocument(BufferedReader document)
Not supported; throws an UnsupportedOperationException if called.
Specified by:
processDocument in interface SemanticSpace
Parameters:
document - a reader that allows access to the text of the document
public void processSpace(Properties props)
Not supported; throws an UnsupportedOperationException if called.
Specified by:
processSpace in interface SemanticSpace
Parameters:
props - a set of properties and values that may be used to configure any exposed parameters of the algorithm
Copyright © 2012. All Rights Reserved.