public class DocumentVectorBuilder extends Object
DocumentVectorBuilder generates Vector representations of a
document, based on semantic Vectors provided for a SemanticSpace. This can be considered as projecting the document into the
semantic space.
Documents will be tokenized using the current tokenizing
method, and the vectors in the SemanticSpace corresponding to the
words found in the document will be combined.
Options for combining term Vectors include summation, average, and
term frequency weighting.

| Modifier and Type | Field and Description |
|---|---|
| static String | USE_TERM_FREQUENCIES_PROPERTY: The property to specify if term frequencies should be used when combining term vectors. |
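
The difference between plain summation and term frequency weighting can be illustrated with a minimal, self-contained sketch. It does not use the library: the class `TermVectorCombiner`, the `combine` method, and the `Map<String, double[]>` stand-in for a SemanticSpace are all hypothetical names introduced here. It also assumes one particular reading of the option, namely that without term frequencies each distinct term contributes its vector once, and with them each term's vector is scaled by its count in the document. Averaging is not shown.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the DocumentVectorBuilder implementation.
final class TermVectorCombiner {

    /**
     * Combines the vectors of all known terms in {@code tokens}.
     * ASSUMPTION: with useTermFrequencies == false each distinct term is
     * added once; with true, its vector is scaled by its occurrence count.
     */
    static double[] combine(String[] tokens, Map<String, double[]> space,
                            int dimensions, boolean useTermFrequencies) {
        // Count how often each term occurs in the document.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }

        double[] documentVector = new double[dimensions];
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            double[] termVector = space.get(entry.getKey());
            if (termVector == null) {
                continue; // the word has no vector in the space; skip it
            }
            int factor = useTermFrequencies ? entry.getValue() : 1;
            for (int i = 0; i < dimensions; i++) {
                documentVector[i] += factor * termVector[i];
            }
        }
        return documentVector;
    }

    public static void main(String[] args) {
        Map<String, double[]> space = new HashMap<>();
        space.put("cat", new double[] {1.0, 0.0});
        space.put("sat", new double[] {0.0, 2.0});
        String[] tokens = {"cat", "sat", "cat", "unknown"};

        // prints "[1.0, 2.0]"
        System.out.println(Arrays.toString(combine(tokens, space, 2, false)));
        // prints "[2.0, 2.0]"
        System.out.println(Arrays.toString(combine(tokens, space, 2, true)));
    }
}
```

In this sketch the repeated word "cat" only changes the result when term frequencies are enabled, and the word with no vector in the space is ignored in both modes.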

| Constructor and Description |
|---|
| DocumentVectorBuilder(SemanticSpace baseSpace): Creates a DocumentVectorBuilder from a SemanticSpace and extracts options from the system wide Properties. |
| DocumentVectorBuilder(SemanticSpace baseSpace, Properties props): Creates a DocumentVectorBuilder from a SemanticSpace and extracts options from the given Properties. |

| Modifier and Type | Method and Description |
|---|---|
| void | add(DoubleVector dest, Vector src, int factor) |
| DoubleVector | buildVector(BufferedReader document, DoubleVector documentVector): Represent a document as the summation of term Vectors. |

public static final String USE_TERM_FREQUENCIES_PROPERTY

public DocumentVectorBuilder(SemanticSpace baseSpace)
Creates a DocumentVectorBuilder from a SemanticSpace and
extracts options from the system wide Properties.

public DocumentVectorBuilder(SemanticSpace baseSpace, Properties props)
Creates a DocumentVectorBuilder from a SemanticSpace and
extracts options from the given Properties.

public DoubleVector buildVector(BufferedReader document, DoubleVector documentVector)
Parameters:
document - A BufferedReader for a document to project into a
SemanticSpace.
documentVector - A Vector which has been pre-allocated to
store the document's representation. This is
pre-allocated so that users of DocumentVectorBuilder can decide what type of
Vector should be used to represent a document.
Returns:
documentVector after it has been modified to represent
the terms in document.

public void add(DoubleVector dest, Vector src, int factor)
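
The calling convention described for buildVector, where the caller pre-allocates the destination and receives the same object back, can be sketched without the library as follows. Everything here is a stand-in: `PreallocatedVectorSketch`, the `double[]` vectors, the `Map` used as the space, and whitespace tokenization are assumptions made for illustration. The `add` helper assumes that `factor` scales `src` before it is accumulated into `dest`, which the page above does not document. For brevity every token occurrence is added with a factor of 1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the pre-allocated destination pattern.
final class PreallocatedVectorSketch {

    /** ASSUMED semantics: dest[i] += factor * src[i], modifying dest in place. */
    static void add(double[] dest, double[] src, int factor) {
        for (int i = 0; i < dest.length; i++) {
            dest[i] += factor * src[i];
        }
    }

    /** Accumulates term vectors into the caller's array and returns that same array. */
    static double[] buildVector(BufferedReader document, double[] documentVector,
                                Map<String, double[]> space) {
        try {
            String line;
            while ((line = document.readLine()) != null) {
                // Whitespace splitting stands in for the real tokenizer.
                for (String token : line.trim().split("\\s+")) {
                    double[] termVector = space.get(token);
                    if (termVector != null) {
                        add(documentVector, termVector, 1);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return documentVector;
    }

    public static void main(String[] args) {
        Map<String, double[]> space = new HashMap<>();
        space.put("cat", new double[] {1.0, 0.0});
        space.put("sat", new double[] {0.0, 2.0});

        double[] dest = new double[2]; // the caller chooses the destination
        BufferedReader doc = new BufferedReader(new StringReader("cat sat\ncat"));
        double[] result = buildVector(doc, dest, space);

        System.out.println(result == dest);        // prints "true"
        System.out.println(Arrays.toString(dest)); // prints "[2.0, 2.0]"
    }
}
```

Passing the destination in lets the caller decide how the document vector is stored, which is the reason the parameter description gives for pre-allocation.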
Copyright © 2012. All Rights Reserved.