public class RandomIndexingMain extends GenericMain
An executable class for running RandomIndexing from the command line.
This class provides several options:
-d, --docFile=FILE[,FILE...]: a file where each line is a document
-f, --fileList=FILE[,FILE...]: a list of document files
-l, --vectorLength=INT: length of semantic vectors
-p, --usePermutations=BOOL: whether to permute index vectors based on word order
-s, --windowSize=INT: how many words to consider in each direction
-L, --loadVectors=FILE: specifies a file containing word-to-index vector mappings that should be used by the RandomIndexing instance. This allows multiple invocations of this program to reuse the same semantic space.
-S, --saveVectors=FILE: specifies a file in which the word-to-index vector mappings will be saved after the RandomIndexing instance has finished processing all the documents. When used in conjunction with the --loadVectors option, this allows later invocations of this program to reuse this invocation's semantic space.
-F, --tokenFilter=FILE[include|exclude][,FILE...]: specifies a list of one or more files to use for filtering the documents. An optional flag may be added to each file to specify how the words in the filter file should be used: include if only the words in the filter file should be retained in the document; exclude if only the words not in the filter file should be retained in the document.
-n, --permutationFunction=CLASSNAME: the PermutationFunction class to use for permuting TernaryVectors, if permutation is enabled.
-o, --outputFormat={text|binary}: specifies the output formatting to use when generating the semantic space (.sspace) file. See SemanticSpaceUtils for format details.
-t, --threads=INT: the number of threads to use
-v, --verbose: prints verbose output
-w, --overwrite=BOOL: specifies whether to overwrite the existing output
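As a rough illustration of how the options above might be combined, the following sketch assembles an argument array. The helper class RandomIndexingArgs and its build method are hypothetical and exist only for this example. It also assumes that the output directory is passed as a trailing positional argument, which this page implies but does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented API: it only assembles
// option strings using the long option names listed above.
final class RandomIndexingArgs {

    /**
     * Builds an argument array from a few of the documented options.
     * Assumption: the output directory is the last, positional argument.
     */
    static String[] build(String docFile, int vectorLength, int windowSize,
                          boolean usePermutations, String outputDir) {
        List<String> args = new ArrayList<>();
        args.add("--docFile=" + docFile);
        args.add("--vectorLength=" + vectorLength);
        args.add("--windowSize=" + windowSize);
        args.add("--usePermutations=" + usePermutations);
        args.add(outputDir);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] args = build("corpus.txt", 2048, 2, true, "output");
        System.out.println(String.join(" ", args));
        // With the library on the classpath, the array could then be handed to
        // the documented entry point: RandomIndexingMain.main(args);
    }
}
```

The file name corpus.txt, the directory output, and the numeric values are placeholders chosen for the example, not recommended settings.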
An invocation produces one output file, random-indexing.sspace. If overwrite is set to true, this file is replaced for each new semantic space. Otherwise, a new output file named random-indexing<number>.sspace is created, where <number> is a unique identifier for that invocation of the program. The output file is placed in the directory specified on the command line.
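The naming rule described above can be sketched as follows. This is an illustration only, not the class's actual implementation: the OutputFileSketch class is hypothetical, and using File.createTempFile to obtain the unique <number> is an assumption made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of the output naming rule; not the library's code.
final class OutputFileSketch {

    /**
     * Returns the fixed file name when overwriting, otherwise a new file
     * whose name carries a unique number between the prefix and extension.
     */
    static File chooseOutput(File dir, boolean overwrite) {
        if (overwrite) {
            return new File(dir, "random-indexing.sspace");
        }
        try {
            // Assumption: createTempFile supplies the unique <number>.
            return File.createTempFile("random-indexing", ".sspace", dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(chooseOutput(dir, true).getName());
        File unique = chooseOutput(dir, false);
        unique.deleteOnExit();
        System.out.println(unique.getName());
    }
}
```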
This class is designed to run multi-threaded and performs well with one thread per core, which is the default setting.
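A minimal sketch of the one-thread-per-core default, assuming the core count is read from the standard Runtime API. The ThreadCountSketch class and its resolveThreads method are hypothetical; how this class actually resolves the --threads option is not shown on this page.

```java
// Hypothetical sketch of a "one thread per core" default; not the library's code.
final class ThreadCountSketch {

    /**
     * Returns the requested thread count, or the number of available cores
     * when no positive value was requested.
     */
    static int resolveThreads(Integer requested) {
        if (requested != null && requested > 0) {
            return requested;
        }
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("default threads: " + resolveThreads(null));
        System.out.println("explicit threads: " + resolveThreads(4));
    }
}
```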
See also: RandomIndexing

Fields inherited from class GenericMain: argOptions, EXT, isMultiThreaded, verbose

| Modifier and Type | Method and Description |
|---|---|
| protected void | addExtraOptions(ArgOptions options): Adds all of the options to the ArgOptions. |
| protected SemanticSpace | getSpace(): Returns an instance of RandomIndexing. |
| protected SemanticSpaceIO.SSpaceFormat | getSpaceFormat(): Returns the default format of a RandomIndexing space. |
| static void | main(String[] args) |
| protected void | postProcessing(): If --saveVectors was specified, writes the accumulated word-to-index vector mapping to file. |
| protected Properties | setupProperties(): Returns the Properties object that will be used when calling SemanticSpace.processSpace(Properties). |

Methods inherited from class GenericMain: addCorpusReaderIterators, addDocIterators, addFileIterators, getAlgorithmSpecifics, getDocumentIterator, handleExtraOptions, loadValidTermSet, parseDocumentsMultiThreaded, parseDocumentsSingleThreaded, processDocumentsAndSpace, run, saveSSpace, setupOptions, usage, verbose, verbose

protected void addExtraOptions(ArgOptions options)
Adds all of the options to the ArgOptions.
addExtraOptions in class GenericMain
Parameters: options - the ArgOptions object to which more main-specific options can be added.
See also: GenericMain.handleExtraOptions()

public static void main(String[] args)

protected Properties setupProperties()
Returns the Properties object that will be used when calling SemanticSpace.processSpace(Properties). Subclasses should override this method if they need to specify additional properties for the space. This method will be called once before GenericMain.getSpace().
setupProperties in class GenericMain
Returns: the Properties used for processing the semantic space.

protected SemanticSpace getSpace()
Returns an instance of RandomIndexing. If loadVectors is specified in the command line options, this method will also initialize the word-to-TernaryVector mapping.
getSpace in class GenericMain

protected SemanticSpaceIO.SSpaceFormat getSpaceFormat()
Returns the default format of a RandomIndexing space.
getSpaceFormat in class GenericMain

protected void postProcessing()
If --saveVectors was specified, writes the accumulated word-to-index vector mapping to file.
postProcessing in class GenericMain

Copyright © 2012. All Rights Reserved.