public class ExplicitSemanticAnalysis extends GenericTermDocumentVectorSpace
| Modifier and Type | Field and Description |
|---|---|
| static String | ESA_SSPACE_NAME |

Fields inherited from class GenericTermDocumentVectorSpace: documentCounter, LOG, wordSpace

| Constructor and Description |
|---|
| ExplicitSemanticAnalysis() - Constructs a new ExplicitSemanticAnalysis instance. |
| ExplicitSemanticAnalysis(BasisMapping<String,String> termToIndex, MatrixBuilder termDocumentMatrixBuilder) - Constructs a new ExplicitSemanticAnalysis using the provided objects for processing. |
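
The second constructor takes a BasisMapping<String,String> that maps terms (strings) to indices. The following is a minimal, self-contained sketch of that role. It assumes a simple first-come, first-served numbering. TermIndexSketch and its methods are hypothetical names, and the class does not implement the real BasisMapping interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a term-to-index mapping: each previously unseen
 * term gets the next free index, and the reverse mapping is kept so an index
 * can be turned back into its term. Not the S-Space BasisMapping interface.
 */
public class TermIndexSketch {
    private final Map<String, Integer> termToIndex = new HashMap<>();
    private final List<String> indexToTerm = new ArrayList<>();

    /** Returns the index for term, assigning a new one if it has not been seen. */
    public int getDimension(String term) {
        Integer index = termToIndex.get(term);
        if (index == null) {
            index = indexToTerm.size();
            termToIndex.put(term, index);
            indexToTerm.add(term);
        }
        return index;
    }

    /** Returns the term that was assigned the given index. */
    public String getDimensionDescription(int index) {
        return indexToTerm.get(index);
    }

    /** Returns the number of distinct terms mapped so far. */
    public int numDimensions() {
        return indexToTerm.size();
    }

    public static void main(String[] args) {
        TermIndexSketch mapping = new TermIndexSketch();
        for (String token : "the cat saw the dog".split(" ")) {
            System.out.println(token + " -> " + mapping.getDimension(token));
        }
    }
}
```

Running main prints the -> 0, cat -> 1, saw -> 2, the -> 0 and dog -> 3, one per line. A repeated term keeps the index it was first given. Keeping the reverse list is a design choice that makes turning an index back into its term a constant-time lookup.
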
| Modifier and Type | Method and Description |
|---|---|
| SparseArray<String> | getDocumentDescriptors(Vector documentVector) - Returns a SparseArray containing document labels for any non-zero value in the given Vector. |
| String | getSpaceName() - Returns a unique string describing the name and configuration of this algorithm. |
| protected void | handleDocumentHeader(int docIndex, String header) - Stores header at index docIndex. |
| void | processSpace(Properties properties) - Once all the documents have been processed, performs any post-processing steps on the data. |
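
Together, handleDocumentHeader and getDocumentDescriptors let a vector over the document dimensions be read back as document labels. The sketch below is a hypothetical, simplified version of that bookkeeping. It assumes plain double[] and java.util.SortedMap in place of the S-Space Vector and SparseArray types. DocumentLabelSketch is an illustrative name. The dimensionality check reflects the documented expectation, not the library's actual behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: headers are stored by document index, and a vector
 * with one entry per document is mapped back to the headers of its non-zero
 * entries. double[] and SortedMap stand in for Vector and SparseArray.
 */
public class DocumentLabelSketch {
    private final List<String> headers = new ArrayList<>();

    /** Stores header at index docIndex, growing the list as needed. */
    public void handleDocumentHeader(int docIndex, String header) {
        while (headers.size() <= docIndex) {
            headers.add(null);
        }
        headers.set(docIndex, header);
    }

    /**
     * Returns the stored header for every non-zero entry of documentVector,
     * keyed by document index. Assumes one vector entry per stored header.
     */
    public SortedMap<Integer, String> getDocumentDescriptors(double[] documentVector) {
        if (documentVector.length != headers.size()) {
            throw new IllegalArgumentException(
                "expected " + headers.size() + " dimensions, got " + documentVector.length);
        }
        SortedMap<Integer, String> labels = new TreeMap<>();
        for (int i = 0; i < documentVector.length; i++) {
            if (documentVector[i] != 0.0) {
                labels.put(i, headers.get(i));
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        DocumentLabelSketch sketch = new DocumentLabelSketch();
        sketch.handleDocumentHeader(0, "Cat");
        sketch.handleDocumentHeader(1, "Dog");
        sketch.handleDocumentHeader(2, "Jazz");
        System.out.println(sketch.getDocumentDescriptors(new double[] {0.4, 0.0, 1.2}));
    }
}
```

With the headers Cat, Dog and Jazz stored at indices 0 to 2, main prints {0=Cat, 2=Jazz}, because only entries 0 and 2 are non-zero.
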
Methods inherited from class GenericTermDocumentVectorSpace: getVector, getVectorLength, getWords, processDocument, processSpace

public static final String ESA_SSPACE_NAME

public ExplicitSemanticAnalysis() throws IOException
Constructs a new ExplicitSemanticAnalysis instance.
Throws: IOException

public ExplicitSemanticAnalysis(BasisMapping<String,String> termToIndex, MatrixBuilder termDocumentMatrixBuilder) throws IOException
Constructs a new ExplicitSemanticAnalysis using the provided objects for processing.
Parameters:
termToIndex - The BasisMapping used to map strings to indices.
termDocumentMatrixBuilder - The MatrixBuilder used to write document vectors to disk, which are later processed in processSpace.
Throws:
IOException - if this instance encounters any errors when creating the backing array files required for processing.

protected void handleDocumentHeader(int docIndex, String header)
Stores header at index docIndex.
Overrides: handleDocumentHeader in class GenericTermDocumentVectorSpace
Parameters:
docIndex - The document id assigned to the current document

public SparseArray<String> getDocumentDescriptors(Vector documentVector)
Returns a SparseArray containing document labels for any non-zero value in the given Vector. The given Vector is expected to have the same dimensionality as this ExplicitSemanticAnalysis word space. Under ESA, the returned document labels can be read as the Wikipedia articles that best describe the vector created by combining the term vectors of a fragment of text.

public void processSpace(Properties properties)
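Once all the documents have been processed, performs any post-processing steps on the data. … properties argument.
By general contract, once this method has been called, processDocument will not be called again.
Parameters:
properties - a set of properties and values that may be used to configure any exposed parameters of the algorithm.

public String getSpaceName()
Returns a unique string describing the name and configuration of this algorithm.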
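
The getDocumentDescriptors description outlines how ESA interprets a text fragment. The term vectors of the fragment are combined, and the article labels of the strongest dimensions describe the fragment. Below is a minimal sketch of that idea under three assumptions: "combining" means element-wise summation, tokenization is lowercase and whitespace-based, and articles are ranked by weight. EsaFragmentSketch, its methods, and the toy weights are hypothetical and not part of this class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the ESA interpretation step: each term vector holds
 * one weight per article, a fragment is represented by the sum of its terms'
 * vectors (summation is an assumption; the class documentation only says the
 * vectors are "combined"), and the highest-weighted articles describe it.
 */
public class EsaFragmentSketch {

    /** Sums the vectors of all known terms in fragment; unknown terms are skipped. */
    public static double[] fragmentVector(String fragment, Map<String, double[]> termVectors, int numDocs) {
        double[] sum = new double[numDocs];
        for (String token : fragment.toLowerCase().split("\\s+")) {
            double[] v = termVectors.get(token);
            if (v == null) {
                continue;
            }
            for (int d = 0; d < numDocs; d++) {
                sum[d] += v[d];
            }
        }
        return sum;
    }

    /** Returns the titles of up to k articles with the highest positive weight. */
    public static List<String> topArticles(double[] vector, String[] articleTitles, int k) {
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < vector.length; d++) {
            if (vector[d] > 0) {
                docs.add(d);
            }
        }
        docs.sort(Comparator.comparingDouble((Integer d) -> vector[d]).reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, docs.size()); i++) {
            result.add(articleTitles[docs.get(i)]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] titles = {"Cat", "Dog", "Jazz"};
        Map<String, double[]> termVectors = Map.of(
            "purr", new double[] {0.9, 0.0, 0.0},
            "bark", new double[] {0.0, 0.8, 0.0},
            "pet", new double[] {0.5, 0.6, 0.0});
        double[] v = fragmentVector("Pet cats purr", termVectors, titles.length);
        System.out.println(topArticles(v, titles, 2));
    }
}
```

For "Pet cats purr", the unknown token "cats" is skipped and the summed vector is roughly [1.4, 0.6, 0.0], so main prints [Cat, Dog].
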
Copyright © 2012. All Rights Reserved.