public class ExplicitSemanticAnalysis extends GenericTermDocumentVectorSpace
Modifier and Type | Field and Description |
---|---|
static String | ESA_SSPACE_NAME |

Fields inherited from class GenericTermDocumentVectorSpace: documentCounter, LOG, wordSpace
Constructor and Description |
---|
ExplicitSemanticAnalysis() Constructs a new ExplicitSemanticAnalysis instance. |
ExplicitSemanticAnalysis(BasisMapping<String,String> termToIndex, MatrixBuilder termDocumentMatrixBuilder) Constructs a new ExplicitSemanticAnalysis that uses the provided term-to-index mapping and term-document matrix builder. |
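The second constructor's termToIndex parameter is a BasisMapping used to map strings to indices. As a rough illustration of that role only, the sketch below assigns each previously unseen term the next free row index of a term-document matrix. The class ToyTermIndexMapping and its methods getDimension and numDimensions are hypothetical names chosen for this example; they are not taken from this page and may not match the real BasisMapping interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the role a BasisMapping<String,String> plays:
// each distinct term gets a stable row index. NOT the real BasisMapping API.
class ToyTermIndexMapping {
    private final Map<String, Integer> termToIndex = new LinkedHashMap<>();

    // Returns the existing index for a term, or assigns the next free index.
    int getDimension(String term) {
        Integer index = termToIndex.get(term);
        if (index == null) {
            index = termToIndex.size();
            termToIndex.put(term, index);
        }
        return index;
    }

    // Number of distinct terms seen so far, i.e. the number of matrix rows.
    int numDimensions() {
        return termToIndex.size();
    }

    public static void main(String[] args) {
        ToyTermIndexMapping mapping = new ToyTermIndexMapping();
        for (String term : "the cat sat on the mat".split(" ")) {
            System.out.println(term + " -> " + mapping.getDimension(term));
        }
        System.out.println("rows: " + mapping.numDimensions());
    }
}
```

A repeated term such as "the" keeps the index it received on first sight, which is what lets every occurrence of a term accumulate into the same matrix row.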
Modifier and Type | Method and Description |
---|---|
SparseArray<String> | getDocumentDescriptors(Vector documentVector) Returns a SparseArray containing the document labels for every non-zero value in the given Vector. |
String | getSpaceName() Returns a unique string describing the name and configuration of this algorithm. |
protected void | handleDocumentHeader(int docIndex, String header) Stores header at index docIndex. |
void | processSpace(Properties properties) Once all the documents have been processed, performs any post-processing steps on the data. |

Methods inherited from class GenericTermDocumentVectorSpace: getVector, getVectorLength, getWords, processDocument, processSpace
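According to the summary, handleDocumentHeader stores each document's header at its index, and getDocumentDescriptors returns the labels for every non-zero value of a vector. The following self-contained sketch shows how those two pieces could fit together, assuming term vectors with one dimension per document and a fragment vector formed by summing the term vectors of its words. ToyEsa, addOccurrence and fragmentVector are hypothetical, a SortedMap<Integer,String> stands in for SparseArray<String>, and raw counts are used with no weighting, so this is not the class's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy, not this class's implementation: one dimension per document;
// SortedMap<Integer, String> stands in for SparseArray<String>.
class ToyEsa {
    private final int numDocs;
    // headers.get(i) is the label (e.g. an article title) of document i.
    private final List<String> headers;
    // term -> raw occurrence counts, one entry per document (no weighting).
    private final Map<String, double[]> termVectors = new HashMap<>();

    ToyEsa(int numDocs) {
        this.numDocs = numDocs;
        this.headers = new ArrayList<>(Collections.nCopies(numDocs, (String) null));
    }

    // Mirrors the documented behaviour: store header at index docIndex.
    void handleDocumentHeader(int docIndex, String header) {
        headers.set(docIndex, header);
    }

    // Records `count` occurrences of `term` in document `docIndex`.
    void addOccurrence(String term, int docIndex, double count) {
        termVectors.computeIfAbsent(term, t -> new double[numDocs])[docIndex] += count;
    }

    // Combines a fragment's term vectors by summing them; unknown terms add nothing.
    double[] fragmentVector(String fragment) {
        double[] sum = new double[numDocs];
        for (String term : fragment.toLowerCase().split("\\s+")) {
            double[] vector = termVectors.get(term);
            if (vector == null) {
                continue;
            }
            for (int d = 0; d < numDocs; d++) {
                sum[d] += vector[d];
            }
        }
        return sum;
    }

    // Returns the header of every document whose dimension is non-zero.
    SortedMap<Integer, String> getDocumentDescriptors(double[] documentVector) {
        if (documentVector.length != numDocs) {
            throw new IllegalArgumentException(
                "expected " + numDocs + " dimensions, got " + documentVector.length);
        }
        SortedMap<Integer, String> labels = new TreeMap<>();
        for (int d = 0; d < numDocs; d++) {
            if (documentVector[d] != 0.0) {
                labels.put(d, headers.get(d));
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        ToyEsa esa = new ToyEsa(3);
        esa.handleDocumentHeader(0, "Cat");
        esa.handleDocumentHeader(1, "Dog");
        esa.handleDocumentHeader(2, "Jaguar (car)");
        esa.addOccurrence("whiskers", 0, 2);
        esa.addOccurrence("purr", 0, 1);
        esa.addOccurrence("bark", 1, 3);
        esa.addOccurrence("engine", 2, 4);
        // prints {0=Cat, 1=Dog}
        System.out.println(esa.getDocumentDescriptors(esa.fragmentVector("purr whiskers bark")));
    }
}
```

The fragment touches only the first two documents, so only their headers come back. Rejecting vectors of the wrong length is a design choice of the sketch, reflecting the documented expectation that vectors share this space's dimensionality.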
public static final String ESA_SSPACE_NAME
public ExplicitSemanticAnalysis() throws IOException
Constructs a new ExplicitSemanticAnalysis instance.
Throws:
IOException
public ExplicitSemanticAnalysis(BasisMapping<String,String> termToIndex, MatrixBuilder termDocumentMatrixBuilder) throws IOException
Constructs a new ExplicitSemanticAnalysis that uses the provided term-to-index mapping and term-document matrix builder.
Parameters:
termToIndex - The BasisMapping used to map strings to indices.
termDocumentMatrixBuilder - The MatrixBuilder used to write document vectors to disk; these vectors are later processed in processSpace.
Throws:
IOException - if this instance encounters any errors when creating the backing array files required for processing.

protected void handleDocumentHeader(int docIndex, String header)
Stores header at index docIndex.
Declared in: handleDocumentHeader in class GenericTermDocumentVectorSpace
Parameters:
docIndex - The document id assigned to the current document

public SparseArray<String> getDocumentDescriptors(Vector documentVector)
Returns a SparseArray containing the document labels for every non-zero value in the given Vector. The given Vectors are expected to have the same dimensionality as this ExplicitSemanticAnalysis word space. Under ESA, the returned document labels can be considered the Wikipedia articles that best describe the vector created by combining the term vectors in a fragment of text.

public void processSpace(Properties properties)
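Once all the documents have been processed, performs any post-processing steps on the data. These steps are configured through the properties argument.
By general contract, once this method has been called, processDocument will not be called again.
Parameters:
properties - a set of properties and values that may be used to configure any exposed parameters of the algorithm.

public String getSpaceName()
Returns a unique string describing the name and configuration of this algorithm.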
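The processSpace contract (processDocument is not called again once processSpace has run) implies a two-phase lifecycle. The hypothetical ToyTwoPhaseSpace below sketches it with plain term counts: processDocument gives each document the next index, processSpace ends the build phase, and a late processDocument call fails fast. Throwing IllegalStateException is an assumption of the sketch, since the contract only states what callers should not do. The method names echo the inherited ones listed in the method summary, but the signatures are simplified (processDocument takes a String here) and should not be read as this class's API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical two-phase term-document space; simplified signatures, not the real API.
class ToyTwoPhaseSpace {
    // term -> (document index -> occurrence count)
    private final Map<String, Map<Integer, Integer>> counts = new HashMap<>();
    private int documentCounter = 0;
    private boolean spaceProcessed = false;

    // Build phase: each call is one document and receives the next document index.
    void processDocument(String documentText) {
        if (spaceProcessed) {
            // Assumption: enforce the "no processDocument after processSpace" contract.
            throw new IllegalStateException("processDocument called after processSpace");
        }
        int docIndex = documentCounter++;
        for (String token : documentText.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty document splits into a single "" token
            }
            counts.computeIfAbsent(token, t -> new HashMap<>()).merge(docIndex, 1, Integer::sum);
        }
    }

    // Ends the build phase. A real implementation may read settings from
    // properties; this toy ignores them.
    void processSpace(Properties properties) {
        spaceProcessed = true;
    }

    // One dimension per processed document.
    int getVectorLength() {
        return documentCounter;
    }

    Set<String> getWords() {
        return Collections.unmodifiableSet(counts.keySet());
    }

    // Dense count vector for a word; all zeros if the word was never seen.
    int[] getVector(String word) {
        int[] vector = new int[documentCounter];
        Map<Integer, Integer> perDoc = counts.getOrDefault(word, Collections.emptyMap());
        for (Map.Entry<Integer, Integer> e : perDoc.entrySet()) {
            vector[e.getKey()] = e.getValue();
        }
        return vector;
    }

    public static void main(String[] args) {
        ToyTwoPhaseSpace space = new ToyTwoPhaseSpace();
        space.processDocument("cats chase mice");
        space.processDocument("dogs chase cats");
        space.processSpace(new Properties());
        System.out.println(Arrays.toString(space.getVector("chase"))); // [1, 1]
        try {
            space.processDocument("too late");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```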
Copyright © 2012. All Rights Reserved.