public class DocumentVectorBuilder extends Object

DocumentVectorBuilder generates Vector representations of a document, based on the semantic Vectors provided by a SemanticSpace. This can be considered as projecting the document into the semantic space. Documents are tokenized using the current tokenizing method, and the vectors in the SemanticSpace corresponding to each word found in the document are combined. Options for combining term Vectors include summation, averaging, and term frequency weighting.

Modifier and Type | Field and Description |
---|---|
static String | USE_TERM_FREQUENCIES_PROPERTY The property to specify if term frequencies should be used when combining term vectors. |
Constructor and Description |
---|
DocumentVectorBuilder(SemanticSpace baseSpace) Creates a DocumentVectorBuilder from a SemanticSpace and extracts options from the system-wide Properties. |
DocumentVectorBuilder(SemanticSpace baseSpace, Properties props) Creates a DocumentVectorBuilder from a SemanticSpace and extracts options from the given Properties. |
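
The difference between the two constructors can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the library class: the `OptionReader` name, the `example.useTermFrequencies` key (this page does not show the actual value of USE_TERM_FREQUENCIES_PROPERTY), and the assumption that the option is enabled by the value `"true"`.

```java
import java.util.Properties;

/** Hypothetical stand-in for the two-constructor pattern; not the library class. */
final class OptionReader {
    // Hypothetical key: the real value of USE_TERM_FREQUENCIES_PROPERTY
    // is not given in this page.
    static final String USE_TERM_FREQUENCIES_KEY = "example.useTermFrequencies";

    final boolean useTermFrequencies;

    /** Reads options from the system-wide properties. */
    OptionReader() {
        this(System.getProperties());
    }

    /** Reads options from the properties supplied by the caller. */
    OptionReader(Properties props) {
        // Assumption: the option is on only when the key maps to "true".
        this.useTermFrequencies = Boolean.parseBoolean(
                props.getProperty(USE_TERM_FREQUENCIES_KEY, "false"));
    }
}
```

Passing an explicit Properties object keeps the configuration local to one builder, whereas the no-argument form picks up whatever was set for the whole JVM, for example with `-D` on the command line.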
Modifier and Type | Method and Description |
---|---|
void | add(DoubleVector dest, Vector src, int factor) |
DoubleVector | buildVector(BufferedReader document, DoubleVector documentVector) Represent a document as the summation of term Vectors. |
public static final String USE_TERM_FREQUENCIES_PROPERTY
public DocumentVectorBuilder(SemanticSpace baseSpace)
Creates a DocumentVectorBuilder from a SemanticSpace and extracts options from the system-wide Properties.

public DocumentVectorBuilder(SemanticSpace baseSpace, Properties props)
Creates a DocumentVectorBuilder from a SemanticSpace and extracts options from the given Properties.

public DoubleVector buildVector(BufferedReader document, DoubleVector documentVector)
Represents a document as the summation of term Vectors.
Parameters:
document - A BufferedReader for a document to project into a SemanticSpace.
documentVector - A Vector which has been pre-allocated to store the document's representation. This is pre-allocated so that users of DocumentVectorBuilder can decide what type of Vector should be used to represent a document.
Returns:
documentVector after it has been modified to represent the terms in document.

public void add(DoubleVector dest, Vector src, int factor)
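
The combination step described for buildVector and add can be sketched with plain arrays. This is a rough illustration under stated assumptions, not the library's implementation: `TermVectorCombiner` is a hypothetical class, a `Map<String, double[]>` stands in for the SemanticSpace, tokenization is a simple whitespace split, and term frequency weighting is assumed to mean that each distinct term's vector is scaled by its count (without it, each distinct term contributes once).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of summing term vectors; not the library class. */
final class TermVectorCombiner {

    /** Adds src, scaled by factor, into dest in place. */
    static void add(double[] dest, double[] src, int factor) {
        for (int i = 0; i < dest.length; i++) {
            dest[i] += factor * src[i];
        }
    }

    /**
     * Sums the vectors of all known terms into the pre-allocated
     * documentVector and returns that same array.
     */
    static double[] buildVector(String document, Map<String, double[]> space,
                                double[] documentVector, boolean useTermFrequencies) {
        // Count each term; without term frequencies every count stays at 1.
        Map<String, Integer> termCounts = new HashMap<>();
        for (String term : document.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            Integer count = termCounts.get(term);
            termCounts.put(term, (count == null || !useTermFrequencies) ? 1 : count + 1);
        }
        // Add each term's vector, skipping words the space does not contain.
        for (Map.Entry<String, Integer> entry : termCounts.entrySet()) {
            double[] termVector = space.get(entry.getKey());
            if (termVector != null) {
                add(documentVector, termVector, entry.getValue());
            }
        }
        return documentVector;
    }

    public static void main(String[] args) {
        Map<String, double[]> space = new HashMap<>();
        space.put("cat", new double[] {1.0, 0.0});
        space.put("dog", new double[] {0.0, 1.0});
        double[] doc = buildVector("cat dog cat bird", space, new double[2], true);
        System.out.println(Arrays.toString(doc)); // prints [2.0, 1.0]
    }
}
```

Because the caller allocates the destination, the same routine works whether the document representation is a fresh array or one that already holds a partial sum.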
Copyright © 2012. All Rights Reserved.