public class IsaMain extends GenericMain
Runs IncrementalSemanticAnalysis (ISA)
from the command line. This class takes in several command line arguments:
-d, --docFile=FILE[,FILE...] a file where each line is
a document. This is the preferred input format for large corpora.
-f, --fileList=FILE[,FILE...] a list of document files
where each file is specified on its own line.
-l, --vectorLength=INT length of semantic vectors
-p, --usePermutations=BOOL whether to permute index
vectors based on word order
-s, --windowSize=INT how many words to consider in each
direction
-L, --loadVectors=FILE specifies a file containing
word-to-index vector mappings that should be used by the IncrementalSemanticAnalysis instance. This allows multiple invocations of this
program to reuse the same semantic space.
-S, --saveVectors=FILE specifies a file in which the
word-to-index vector mappings will be saved after the IncrementalSemanticAnalysis instance has finished processing all the documents.
When used in conjunction with the --loadVectors option, this
allows later invocations of this program to reuse this
invocation's semantic space.
-F, --tokenFilter=FILE[include|exclude][,FILE...]
specifies a list of one or more files to use for filtering the documents. An option
flag may be added to each file to specify how the words in the filter
file should be used: include if only the words in the filter
file should be retained in the document; exclude if only the
words not in the filter file should be retained in the
document.
-n, --permutationFunction=CLASSNAME the PermutationFunction class
to use for permuting TernaryVectors, if permutation is enabled.
-o, --outputFormat={text|binary|sparse_text|sparse_binary}
specifies the output format to use when
generating the semantic space (.sspace) file. See SemanticSpaceUtils for
format details.
-w, --overwrite=BOOL specifies whether to overwrite
the existing output files. The default is true. If set to
false, a unique integer is inserted into the file name.
-v, --verbose specifies whether to print runtime
information to standard out.
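A minimal sketch of the include/exclude semantics described for --tokenFilter. It assumes that when several filters are given, a token must pass all of them, and it takes each filter's word set directly instead of reading it from a file. TokenFilterSketch, Filter, Mode, and apply are hypothetical names; this is not the S-Space implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the --tokenFilter include/exclude semantics.
 * Not the S-Space implementation: word sets are passed in directly
 * instead of being read from filter files.
 */
class TokenFilterSketch {

    /** INCLUDE: keep only listed words. EXCLUDE: keep only unlisted words. */
    enum Mode { INCLUDE, EXCLUDE }

    /** One filter "file": its mode plus the words it lists. */
    static final class Filter {
        final Mode mode;
        final Set<String> words;

        Filter(Mode mode, Set<String> words) {
            this.mode = mode;
            this.words = words;
        }

        /** Returns true if this filter lets the token through. */
        boolean accepts(String token) {
            boolean listed = words.contains(token);
            return (mode == Mode.INCLUDE) ? listed : !listed;
        }
    }

    /**
     * Returns the tokens that pass every filter, in document order.
     * Requiring all filters to accept is an assumption about how
     * multiple FILE entries combine.
     */
    static List<String> apply(List<String> tokens, List<Filter> filters) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            boolean passesAll = true;
            for (Filter f : filters) {
                if (!f.accepts(token)) {
                    passesAll = false;
                    break;
                }
            }
            if (passesAll) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

For example, an exclude filter listing "the" and "on" turns the tokens [the, cat, sat, on, the, mat] into [cat, sat, mat], while an include filter listing "cat", "sat", and "the" keeps [the, cat, sat, the].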
An invocation will produce one output file, isa-semantic-space.sspace. If overwrite was set to true,
this file will be replaced for each new semantic space. Otherwise, a new
output file of the format isa-semantic-space<number>.sspace will be
created, where <number> is a unique identifier for that program's
invocation. The output file will be placed in the directory specified on the
command line.
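A minimal sketch of the naming rule above. It assumes the unique identifier comes from java.io.File.createTempFile, which creates a new, empty file whose name is the prefix, some internally generated characters, and the suffix. OutputNameSketch and chooseOutputFile are hypothetical names, and whether GenericMain uses exactly this mechanism is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of choosing the .sspace output file name. */
class OutputNameSketch {

    static final String EXT = ".sspace";

    /**
     * overwrite == true: always reuse baseName + EXT, replacing any old file.
     * overwrite == false: create a fresh file baseName + <generated id> + EXT.
     */
    static File chooseOutputFile(File outputDir, String baseName, boolean overwrite) {
        if (overwrite) {
            return new File(outputDir, baseName + EXT);
        }
        try {
            // createTempFile atomically creates a new empty file named
            // prefix + generated characters + suffix, so earlier outputs
            // from other invocations are never clobbered.
            return File.createTempFile(baseName, EXT, outputDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Creating the file at naming time, rather than only picking an unused name, avoids a race when several invocations write to the same directory at once.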
See Also: IncrementalSemanticAnalysis

Fields inherited from class GenericMain: argOptions, EXT, isMultiThreaded, verbose

| Modifier and Type | Method and Description |
|---|---|
| protected void | addExtraOptions(ArgOptions options): Adds all of the options to the ArgOptions. |
| protected String | getAlgorithmSpecifics(): Prints the instructions on how to execute this program to standard out. |
| protected SemanticSpace | getSpace(): Returns the SemanticSpace that will be used for processing. |
| protected SemanticSpaceIO.SSpaceFormat | getSpaceFormat(): Returns the format in which the finished SemanticSpace should be saved. |
| static void | main(String[] args) |
| protected void | postProcessing(): If --saveVectors was specified, writes the accumulated word-to-index vector mapping to file. |
| protected Properties | setupProperties(): Returns the Properties object that will be used when calling SemanticSpace.processSpace(Properties). |
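The table lists the hooks IsaMain overrides. The following is a minimal, hypothetical sketch of how a GenericMain-style driver might call them, based on the ordering stated in the method details: setupProperties() is called once before getSpace(), its Properties are passed to processSpace, postProcessing() runs after all documents are processed, and getSpaceFormat() applies only when the user gave no -o value. MainSketch, SpaceSketch, IsaMainSketch, and the windowSize property key are invented for illustration, and calling postProcessing() before the save step is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical stand-in for a semantic space; not the S-Space SemanticSpace interface. */
interface SpaceSketch {
    void processDocument(String doc);
    void processSpace(Properties props);
}

/** Hypothetical template-method driver; records each step in trace. */
abstract class MainSketch {
    final List<String> trace = new ArrayList<>();

    protected Properties setupProperties() { return new Properties(); }
    protected abstract SpaceSketch getSpace();
    protected void postProcessing() { }
    protected String getSpaceFormat() { return "text"; }

    /** formatOption is the user's -o value, or null if none was given. */
    final void run(List<String> docs, String formatOption) {
        Properties props = setupProperties();   // called once, before getSpace()
        SpaceSketch space = getSpace();
        for (String doc : docs) {
            space.processDocument(doc);
        }
        space.processSpace(props);
        postProcessing();                       // e.g. write the --saveVectors file
        // A user-specified format wins; otherwise the subclass default applies.
        String format = (formatOption != null) ? formatOption : getSpaceFormat();
        trace.add("save as " + format);
    }
}

/** Hypothetical subclass standing in for IsaMain's overrides. */
class IsaMainSketch extends MainSketch {
    @Override protected Properties setupProperties() {
        trace.add("setupProperties");
        Properties p = new Properties();
        p.setProperty("windowSize", "5");       // hypothetical property key
        return p;
    }

    @Override protected SpaceSketch getSpace() {
        trace.add("getSpace");
        return new SpaceSketch() {
            public void processDocument(String doc) { trace.add("doc:" + doc); }
            public void processSpace(Properties props) {
                trace.add("processSpace windowSize=" + props.getProperty("windowSize"));
            }
        };
    }

    @Override protected void postProcessing() { trace.add("postProcessing"); }

    @Override protected String getSpaceFormat() { return "sparse_binary"; }
}
```

Running new IsaMainSketch().run(Arrays.asList("a", "b"), null) records setupProperties, getSpace, doc:a, doc:b, processSpace windowSize=5, postProcessing, and finally "save as sparse_binary"; passing "text" as the format option changes only the last step.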
Methods inherited from class GenericMain: addCorpusReaderIterators, addDocIterators, addFileIterators, getDocumentIterator, handleExtraOptions, loadValidTermSet, parseDocumentsMultiThreaded, parseDocumentsSingleThreaded, processDocumentsAndSpace, run, saveSSpace, setupOptions, usage, verbose, verbose

protected void addExtraOptions(ArgOptions options)
Adds all of the options to the ArgOptions.
Overrides: addExtraOptions in class GenericMain
Parameters: options - the ArgOptions object to which more main-specific options can be added.
See Also: GenericMain.handleExtraOptions()

public static void main(String[] args)

protected SemanticSpace getSpace()
Returns the SemanticSpace that will be used for processing. This
method is guaranteed to be called after the command line arguments have
been parsed, so the contents of GenericMain.argOptions are valid.
Overrides: getSpace in class GenericMain

protected Properties setupProperties()
Returns the Properties object that will be used when calling
SemanticSpace.processSpace(Properties). Subclasses should
override this method if they need to specify additional properties for
the space. This method will be called once before GenericMain.getSpace().
Overrides: setupProperties in class GenericMain
Returns: the Properties used for processing the semantic space.

protected void postProcessing()
If --saveVectors was specified, writes the accumulated
word-to-index vector mapping to file.
Overrides: postProcessing in class GenericMain

protected SemanticSpaceIO.SSpaceFormat getSpaceFormat()
Returns the format in which the
finished SemanticSpace should be saved. Subclasses should
override this method if they want to specify the format best
suited to their space when one is not manually specified by the
user.
Overrides: getSpaceFormat in class GenericMain

protected String getAlgorithmSpecifics()
Prints the instructions on how to execute this program to standard out.
Overrides: getAlgorithmSpecifics in class GenericMain

Copyright © 2012. All Rights Reserved.