public class RandomIndexingMain extends GenericMain

An executable class for running RandomIndexing from the command line.
This class provides several options:

-d, --docFile=FILE[,FILE...]: a file where each line is a document
-f, --fileList=FILE[,FILE...]: a list of document files
-l, --vectorLength=INT: length of semantic vectors
-p, --usePermutations=BOOL: whether to permute index vectors based on word order
-s, --windowSize=INT: how many words to consider in each direction
-L, --loadVectors=FILE: specifies a file containing word-to-index vector mappings that should be used by the RandomIndexing instance. This allows multiple invocations of this program to reuse the same semantic space.
-S, --saveVectors=FILE: specifies a file in which the word-to-index vector mappings will be saved after the RandomIndexing instance has finished processing all the documents. When used in conjunction with the --loadVectors option, this allows later invocations of this program to reuse this invocation's semantic space.
-F, --tokenFilter=FILE[include|exclude][,FILE...]: specifies a list of one or more files to use for filtering the documents. An option flag may be added to each file to specify how the words in the filter file should be used: include if only the words in the filter file should be retained in the document; exclude if only the words not in the filter file should be retained in the document.
-n, --permutationFunction=CLASSNAME: the PermutationFunction class to use for permuting TernaryVectors, if permutation is enabled.
-o, --outputFormat={text|binary}: specifies the output formatting to use when generating the semantic space (.sspace) file. See SemanticSpaceUtils for format details.
-t, --threads=INT: the number of threads to use
-v, --verbose: prints verbose output
-w, --overwrite=BOOL: specifies whether to overwrite the existing output
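To make the -l, -s and -p options more concrete, here is a minimal sketch of the random-indexing idea they control. It is a hypothetical illustration, not the RandomIndexing class itself: the class name RandomIndexingSketch, the number of non-zero entries per index vector, the seed, and the use of a cyclic shift by word distance (standing in for a configurable PermutationFunction) are all assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of random indexing; NOT the S-Space RandomIndexing class.
class RandomIndexingSketch {
    private final int vectorLength;        // plays the role of --vectorLength
    private final int windowSize;          // plays the role of --windowSize (words on each side)
    private final boolean usePermutations; // plays the role of --usePermutations
    private final int nonZeros;            // assumption: number of +1/-1 entries per index vector
    private final Random random;
    private final Map<String, int[]> indexVectors = new HashMap<>();
    private final Map<String, int[]> semanticVectors = new HashMap<>();

    RandomIndexingSketch(int vectorLength, int windowSize, boolean usePermutations,
                         int nonZeros, long seed) {
        if (nonZeros > vectorLength) {
            throw new IllegalArgumentException("nonZeros must not exceed vectorLength");
        }
        this.vectorLength = vectorLength;
        this.windowSize = windowSize;
        this.usePermutations = usePermutations;
        this.nonZeros = nonZeros;
        this.random = new Random(seed);
    }

    /** Returns (creating on first use) a sparse ternary index vector for the word. */
    int[] indexVector(String word) {
        return indexVectors.computeIfAbsent(word, w -> {
            int[] v = new int[vectorLength];
            int placed = 0;
            while (placed < nonZeros) {
                int pos = random.nextInt(vectorLength);
                if (v[pos] == 0) {               // only fill empty slots
                    v[pos] = random.nextBoolean() ? 1 : -1;
                    placed++;
                }
            }
            return v;
        });
    }

    /** Cyclically shifts v by `shift` positions; negative values shift the other way. */
    static int[] rotate(int[] v, int shift) {
        int n = v.length;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[Math.floorMod(i + shift, n)] = v[i];
        }
        return out;
    }

    /** Adds each neighbor's index vector (optionally shifted by distance) to the focus word. */
    void processDocument(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            int[] focus = semanticVectors.computeIfAbsent(tokens.get(i), w -> new int[vectorLength]);
            int lo = Math.max(0, i - windowSize);
            int hi = Math.min(tokens.size() - 1, i + windowSize);
            for (int j = lo; j <= hi; j++) {
                if (j == i) continue;
                int[] context = indexVector(tokens.get(j));
                if (usePermutations) context = rotate(context, j - i); // encode word order
                for (int k = 0; k < vectorLength; k++) focus[k] += context[k];
            }
        }
    }

    int[] semanticVector(String word) {
        return semanticVectors.get(word);
    }
}
```

For example, new RandomIndexingSketch(2048, 2, true, 8, 42L) would build 2048-dimensional vectors from a window of two words on each side, with order-sensitive shifts. Because every word's vector is a sum of fixed-length sparse random vectors, the dimensionality stays constant however large the vocabulary grows.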
An invocation will produce one file as output: random-indexing.sspace. If overwrite was set to true, this file will be replaced for each new semantic space. Otherwise, a new output file of the format random-indexing<number>.sspace will be created, where <number> is a unique identifier for that program's invocation. The output file will be placed in the directory specified on the command line.
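The naming rule above can be sketched as follows. This is a hypothetical helper, not code from GenericMain: how the real program picks <number> is not documented here, so the sketch simply assumes the smallest integer whose file does not exist yet.

```java
import java.io.File;
import java.util.function.Predicate;

// Hypothetical helper illustrating the overwrite / numbered-file rule.
class OutputFileNaming {
    /** Picks an output name; nameTaken reports whether a candidate name already exists. */
    static String chooseOutputName(boolean overwrite, Predicate<String> nameTaken) {
        if (overwrite) {
            return "random-indexing.sspace";             // same file replaced each run
        }
        for (int n = 0; ; n++) {                          // assumption: first unused number
            String candidate = "random-indexing" + n + ".sspace";
            if (!nameTaken.test(candidate)) {
                return candidate;
            }
        }
    }

    /** Convenience overload that checks the output directory on disk. */
    static File chooseOutputFile(File outputDir, boolean overwrite) {
        return new File(outputDir,
                chooseOutputName(overwrite, name -> new File(outputDir, name).exists()));
    }
}
```

Passing the existence check in as a Predicate keeps the naming logic independent of the file system, which makes it easy to test.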
This class is designed to run multi-threaded and performs well with one thread per core, which is the default setting.
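The one-thread-per-core default can be illustrated with a fixed thread pool sized to the number of available processors. This is a hypothetical sketch, not GenericMain's parseDocumentsMultiThreaded: to keep it short, each task only counts tokens in one document, standing in for the real per-document vector updates, and results go into a thread-safe map.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of one-thread-per-core document processing.
class ParallelDocumentCounter {
    /** Default pool size: one worker per available core. */
    static int defaultThreadCount() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Counts whitespace-separated tokens across documents using `threads` workers. */
    static ConcurrentHashMap<String, Integer> countTokens(List<String> documents, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String doc : documents) {
            pool.execute(() -> {                          // one task per document
                for (String token : doc.split("\\s+")) {
                    if (!token.isEmpty()) {
                        counts.merge(token, 1, Integer::sum); // atomic per-key update
                    }
                }
            });
        }
        pool.shutdown();                                   // accept no new tasks
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("documents did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counts;
    }
}
```

A call such as countTokens(docs, defaultThreadCount()) mirrors the default; passing a smaller number corresponds to lowering --threads.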
See Also: RandomIndexing

Fields inherited from class GenericMain: argOptions, EXT, isMultiThreaded, verbose
Modifier and Type | Method and Description |
---|---|
protected void | addExtraOptions(ArgOptions options): Adds all of the options to the ArgOptions. |
protected SemanticSpace | getSpace(): Returns an instance of RandomIndexing. |
protected SemanticSpaceIO.SSpaceFormat | getSpaceFormat(): Returns the default format of a RandomIndexing space. |
static void | main(String[] args) |
protected void | postProcessing(): If --saveVectors was specified, writes the accumulated word-to-index vector mapping to file. |
protected Properties | setupProperties(): Returns the Properties object that will be used when calling SemanticSpace.processSpace(Properties). |
Methods inherited from class GenericMain: addCorpusReaderIterators, addDocIterators, addFileIterators, getAlgorithmSpecifics, getDocumentIterator, handleExtraOptions, loadValidTermSet, parseDocumentsMultiThreaded, parseDocumentsSingleThreaded, processDocumentsAndSpace, run, saveSSpace, setupOptions, usage, verbose, verbose
protected void addExtraOptions(ArgOptions options)
Adds all of the options to the ArgOptions.
Overrides: addExtraOptions in class GenericMain
Parameters: options - the ArgOptions object to which more main-specific options can be added.
See Also: GenericMain.handleExtraOptions()
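The registration pattern behind addExtraOptions can be sketched with a small options registry. SimpleOptions and its methods are hypothetical and do not reproduce the ArgOptions API; the sketch only shows how a subclass might register short and long names for main-specific options so that both "-s 5" and "--windowSize=5" forms parse to the same value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical options registry; NOT S-Space's ArgOptions.
class SimpleOptions {
    private final Map<String, Character> longToShort = new HashMap<>();
    private final Map<Character, Boolean> takesValue = new HashMap<>();
    private final Map<Character, String> values = new HashMap<>();

    /** Registers an option under a one-letter and a long name. */
    void addOption(char shortName, String longName, boolean hasValue) {
        longToShort.put(longName, shortName);
        takesValue.put(shortName, hasValue);
    }

    /** Parses "-x VALUE", "--long=VALUE" and bare flags; returns the positional arguments. */
    List<String> parse(String[] args) {
        List<String> positional = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            Character key;
            String value = null;
            if (arg.startsWith("--")) {
                int eq = arg.indexOf('=');
                String name = eq < 0 ? arg.substring(2) : arg.substring(2, eq);
                key = longToShort.get(name);
                if (eq >= 0) value = arg.substring(eq + 1);
            } else if (arg.startsWith("-") && arg.length() == 2) {
                key = arg.charAt(1);
                if (!takesValue.containsKey(key)) key = null;
            } else {
                positional.add(arg);
                continue;
            }
            if (key == null) throw new IllegalArgumentException("Unknown option: " + arg);
            if (takesValue.get(key) && value == null) {
                if (i + 1 >= args.length) throw new IllegalArgumentException("Missing value for " + arg);
                value = args[++i];                        // value is the next argument
            }
            values.put(key, value == null ? "true" : value); // bare flags become "true"
        }
        return positional;
    }

    boolean has(char shortName) { return values.containsKey(shortName); }

    String get(char shortName) { return values.get(shortName); }
}

// Hypothetical analogue of a subclass adding its main-specific options.
class RandomIndexingOptionsDemo {
    static SimpleOptions build() {
        SimpleOptions o = new SimpleOptions();
        o.addOption('l', "vectorLength", true);
        o.addOption('s', "windowSize", true);
        o.addOption('p', "usePermutations", true);
        o.addOption('v', "verbose", false);
        return o;
    }
}
```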
public static void main(String[] args)
protected Properties setupProperties()
Returns the Properties object that will be used when calling SemanticSpace.processSpace(Properties). Subclasses should override this method if they need to specify additional properties for the space. This method will be called once before GenericMain.getSpace().
Overrides: setupProperties in class GenericMain
Returns: the Properties used for processing the semantic space.

protected SemanticSpace getSpace()
Returns an instance of RandomIndexing. If loadVectors is specified in the command line options, this method will also initialize the word-to-TernaryVector mapping.
Overrides: getSpace in class GenericMain
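The hook order described above (setupProperties() once before getSpace(), the returned Properties passed to processSpace, and postProcessing() after all documents are processed) follows the template-method pattern. The skeleton below is a hypothetical sketch: MainSkeleton, RandomIndexingLikeMain, the callLog list and the "vectorLength" property key are illustrative assumptions, not GenericMain's actual run() method, whose complete sequence is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical template-method skeleton; NOT GenericMain.run().
abstract class MainSkeleton {
    final List<String> callLog = new ArrayList<>(); // records hook order for inspection

    /** Default properties hook; subclasses may add entries. */
    protected Properties setupProperties() {
        callLog.add("setupProperties");
        return new Properties();
    }

    /** Subclasses create their semantic space; called after setupProperties(). */
    protected abstract Object getSpace();

    /** Hook run once all documents are processed; does nothing beyond logging here. */
    protected void postProcessing() {
        callLog.add("postProcessing");
    }

    /** Fixed driver: the order of hooks is decided here, not by subclasses. */
    final void run(List<String> documents) {
        Properties props = setupProperties();
        getSpace();
        for (String doc : documents) {
            callLog.add("process:" + doc);            // stands in for per-document processing
        }
        callLog.add("processSpace(" + props.size() + " properties)");
        postProcessing();
    }
}

class RandomIndexingLikeMain extends MainSkeleton {
    @Override
    protected Properties setupProperties() {
        Properties props = super.setupProperties();
        props.setProperty("vectorLength", "2048");    // hypothetical key and value
        return props;
    }

    @Override
    protected Object getSpace() {
        callLog.add("getSpace");
        return new Object();                           // placeholder for a real space
    }
}
```

Keeping run() final means subclasses customize behavior only through the hooks, so the ordering guarantee stated in the documentation cannot be broken by an override.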
protected SemanticSpaceIO.SSpaceFormat getSpaceFormat()
Returns the default format of a RandomIndexing space.
Overrides: getSpaceFormat in class GenericMain
protected void postProcessing()
If --saveVectors was specified, writes the accumulated word-to-index vector mapping to file.
Overrides: postProcessing in class GenericMain
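The save/load round trip behind --saveVectors and --loadVectors can be sketched with standard Java object serialization. This is an assumption made for illustration: the file format RandomIndexing actually uses is not described on this page, and IndexVectorStore is a hypothetical name. The sketch stores each index vector as a plain int[] of +1/0/-1 entries.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical persistence of a word-to-index-vector mapping via Java serialization.
class IndexVectorStore {
    /** Writes a copy of the mapping to the stream (closes the stream when done). */
    static void save(Map<String, int[]> vectors, OutputStream out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(new HashMap<>(vectors)); // HashMap and int[] are Serializable
        }
    }

    /** Reads back a mapping previously written by save(). */
    @SuppressWarnings("unchecked")
    static Map<String, int[]> load(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return (Map<String, int[]>) ois.readObject();
        }
    }
}
```

Reusing the same index vectors across runs is what lets later invocations add documents to a compatible space: a word keeps the same random signature, so its contributions from different runs line up dimension by dimension.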
Copyright © 2012. All Rights Reserved.