public class StructuredVectorSpaceMain extends DependencyGenericMain
An executable class for running StructuredVectorSpace from the command line. This class takes in several command line arguments:
-d, --docFile=FILE[,FILE...]: a file where each line is a document. This is the preferred input format for large corpora.
-f, --fileList=FILE[,FILE...]: a list of document files, where each file is specified on its own line.
-a, --pathAcceptor=CLASSNAME: specifies the DependencyPathAcceptor to use while accepting or rejecting DependencyPaths.
-W, --pathWeighter=CLASSNAME: specifies the DependencyPathWeighter to use while scoring DependencyPaths.
-G, --configFile=config.xml: specifies a configuration file describing the ordering of the Malt-style dependency parse trees.
-o, --outputFormat={text|binary}: specifies the output format to use when generating the semantic space (.sspace) file. See SemanticSpaceUtils for format details.
-t, --threads=INT: how many threads to use when processing the documents. The default is one per core.
-w, --overwrite=BOOL: specifies whether to overwrite the existing output files. The default is true. If set to false, a unique integer is inserted into the file name.
-v, --verbose: specifies whether to print runtime information to standard out.
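The -d and -f options above differ only in how documents are located. As a minimal sketch, assuming UTF-8 text and assuming that blank lines carry no document (neither is stated here), the two formats could be read as follows. `CorpusInputSketch` and its methods are hypothetical and are not part of the S-Space API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not S-Space code) illustrating the two input formats. */
final class CorpusInputSketch {

    private CorpusInputSketch() {}

    /** --docFile style: each line of the file is one document (assumption: blank lines skipped). */
    static List<String> readDocFile(Path docFile) throws IOException {
        List<String> docs = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(docFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    docs.add(line);
                }
            }
        }
        return docs;
    }

    /** --fileList style: each line names a file whose whole contents form one document. */
    static List<String> readFileList(Path fileList) throws IOException {
        List<String> docs = new ArrayList<>();
        for (String name : Files.readAllLines(fileList, StandardCharsets.UTF_8)) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                byte[] bytes = Files.readAllBytes(Paths.get(trimmed));
                docs.add(new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return docs;
    }

    /** Both options accept a comma-separated list of files: FILE[,FILE...]. */
    static List<Path> splitFileArgument(String value) {
        List<Path> paths = new ArrayList<>();
        for (String part : value.split(",")) {
            if (!part.isEmpty()) {
                paths.add(Paths.get(part));
            }
        }
        return paths;
    }
}
```

The docFile format streams line by line, which is one reason it suits large corpora: no per-document file handle is needed.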
An invocation will produce one output file, structued-vector-space.sspace. If overwrite was set to true, this file will be replaced for each new semantic space. Otherwise, a new output file of the format structued-vector-space<number>.sspace will be created, where <number> is a unique identifier for that program's invocation. The output file will be placed in the directory specified on the command line.
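A minimal sketch of the naming rule above, assuming the `.sspace` extension and using `java.io.File.createTempFile` to obtain a unique numeric suffix. `OutputFileSketch` is hypothetical, and the real class may generate the unique identifier differently.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of overwrite vs. unique output naming. */
final class OutputFileSketch {
    /** Assumed extension, matching the ".sspace" files described above. */
    static final String EXT = ".sspace";

    private OutputFileSketch() {}

    static File chooseOutputFile(File outputDir, String baseName, boolean overwrite) {
        if (overwrite) {
            // Same name on every run, so a previous result is replaced.
            return new File(outputDir, baseName + EXT);
        }
        try {
            // createTempFile atomically creates a new, empty file named
            // baseName + <generated number> + EXT inside outputDir.
            return File.createTempFile(baseName, EXT, outputDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `createTempFile` creates the file atomically, two concurrent invocations writing to the same directory cannot pick the same name.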
This class is designed to run multi-threaded and performs well with one thread per core, which is the default setting.
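The one-thread-per-core default can be illustrated with a fixed thread pool sized by `Runtime.availableProcessors()`. In this hypothetical sketch, `ThreadedProcessingSketch` counts tokens as a stand-in for real document processing; it is not how StructuredVectorSpace itself builds its space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of multi-threaded document processing. */
final class ThreadedProcessingSketch {

    private ThreadedProcessingSketch() {}

    /** Mirrors the documented --threads default: one worker per available core. */
    static int defaultThreadCount() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Processes each document on a fixed pool; token counting stands in for real work. */
    static Map<String, Integer> countTokens(List<String> documents, int threads)
            throws InterruptedException, ExecutionException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String doc : documents) {
                pending.add(pool.submit(() -> {
                    for (String token : doc.split("\\s+")) {
                        if (!token.isEmpty()) {
                            counts.merge(token, 1, Integer::sum); // atomic per key
                        }
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits, and rethrows any worker failure
            }
        } finally {
            pool.shutdown();
        }
        return counts;
    }
}
```

The shared state is a `ConcurrentHashMap` so that workers can update it without external locking; a real semantic space would need its own thread-safe update path.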
See also: StructuredVectorSpace
Fields inherited from class GenericMain: argOptions, EXT, isMultiThreaded, verbose
Modifier and Type | Method and Description |
---|---|
void | addExtraOptions(ArgOptions options) - Adds options to the provided ArgOptions instance, which will be used to parse the command line. |
protected SemanticSpace | getSpace() - Returns the SemanticSpace that will be used for processing. |
protected SemanticSpaceIO.SSpaceFormat | getSpaceFormat() - Returns the format in which the finished SemanticSpace should be saved. |
static void | main(String[] args) |
protected Properties | setupProperties() - Returns the Properties object that will be used when calling SemanticSpace.processSpace(Properties). |
Methods inherited from class DependencyGenericMain: addDocIterators, addFileIterators, setupDependencyExtractor, usage
Methods inherited from class GenericMain: addCorpusReaderIterators, getAlgorithmSpecifics, getDocumentIterator, handleExtraOptions, loadValidTermSet, parseDocumentsMultiThreaded, parseDocumentsSingleThreaded, postProcessing, processDocumentsAndSpace, run, saveSSpace, setupOptions, verbose, verbose
public void addExtraOptions(ArgOptions options)
Adds options to the provided ArgOptions instance, which will be used to parse the command line. This method allows subclasses to add extra command line options.
Overrides: addExtraOptions in class DependencyGenericMain
Parameters: options - the ArgOptions object to which more main-specific options can be added.
See also: GenericMain.handleExtraOptions()
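The override pattern described for addExtraOptions, where a subclass contributes its own flags on top of those registered by its parents, can be sketched with hypothetical stand-ins: `OptionSet` plays the role of ArgOptions, and `BaseMainSketch` and `DependencyMainSketch` play the roles of the main classes. Which class actually registers which flag is not stated here, so the split below is illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for ArgOptions: short flag -> "--long: description". */
final class OptionSet {
    private final Map<Character, String> descriptions = new LinkedHashMap<>();

    void addOption(char shortName, String longName, String description) {
        if (descriptions.containsKey(shortName)) {
            throw new IllegalArgumentException("duplicate option -" + shortName);
        }
        descriptions.put(shortName, "--" + longName + ": " + description);
    }

    boolean has(char shortName) {
        return descriptions.containsKey(shortName);
    }

    int size() {
        return descriptions.size();
    }
}

/** Hypothetical base class: registers shared options, then calls the hook. */
abstract class BaseMainSketch {
    final OptionSet setupOptions() {
        OptionSet options = new OptionSet();
        options.addOption('d', "docFile", "a file where each line is a document");
        options.addOption('t', "threads", "number of worker threads");
        addExtraOptions(options); // subclasses contribute their own flags here
        return options;
    }

    /** Default hook adds nothing. */
    protected void addExtraOptions(OptionSet options) {}
}

/** Hypothetical subclass: keeps any parent extras, then adds path-related flags. */
class DependencyMainSketch extends BaseMainSketch {
    @Override
    protected void addExtraOptions(OptionSet options) {
        super.addExtraOptions(options);
        options.addOption('a', "pathAcceptor", "DependencyPathAcceptor class name");
        options.addOption('W', "pathWeighter", "DependencyPathWeighter class name");
    }
}
```

Calling `super.addExtraOptions` first keeps the chain intact when a class several levels down the hierarchy also overrides the hook.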
protected SemanticSpace getSpace()
Returns the SemanticSpace that will be used for processing. This method is guaranteed to be called after the command line arguments have been parsed, so the contents of GenericMain.argOptions are valid.
Specified by: getSpace in class GenericMain
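The ordering guarantees in this class's method documentation (getSpace() runs only after the arguments are parsed, and setupProperties() runs once before getSpace()) amount to a template method. The hypothetical `LifecycleSketch` below records that order; it omits real document processing and saving, and the property key is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical template-method sketch of the documented call order. */
abstract class LifecycleSketch {
    final List<String> calls = new ArrayList<>();
    private boolean argsParsed = false;

    final void run(String[] args) {
        parseArgs(args);
        Properties props = setupProperties(); // called once, before getSpace()
        Object space = getSpace();            // arguments are already valid here
        calls.add("process(" + space + ", " + props.size() + " props)");
    }

    private void parseArgs(String[] args) {
        calls.add("parseArgs(" + args.length + ")");
        argsParsed = true;
    }

    protected boolean argsParsed() {
        return argsParsed;
    }

    protected Properties setupProperties() {
        calls.add("setupProperties");
        return new Properties();
    }

    protected abstract Object getSpace();
}

/** Hypothetical subclass that adds a property and checks the ordering guarantee. */
class StructuredLifecycleSketch extends LifecycleSketch {
    @Override
    protected Properties setupProperties() {
        Properties props = super.setupProperties();
        props.setProperty("example.property", "value"); // made-up key
        return props;
    }

    @Override
    protected Object getSpace() {
        if (!argsParsed()) {
            throw new IllegalStateException("getSpace() called before parsing");
        }
        calls.add("getSpace");
        return "space";
    }
}
```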
protected Properties setupProperties()
Returns the Properties object that will be used when calling SemanticSpace.processSpace(Properties). Subclasses should override this method if they need to specify additional properties for the space. This method will be called once before GenericMain.getSpace().
Overrides: setupProperties in class GenericMain
Returns: the Properties used for processing the semantic space.
protected SemanticSpaceIO.SSpaceFormat getSpaceFormat()
Returns the format in which the finished SemanticSpace should be saved. Subclasses should override this method if they want to specify the format best suited to their space when the user does not specify one manually.
Overrides: getSpaceFormat in class GenericMain
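The fallback behavior described for getSpaceFormat() can be sketched as a small resolution function: an explicit --outputFormat value wins, otherwise the subclass default applies. `FormatChoiceSketch` and its `OutputFormat` enum are hypothetical; the real SemanticSpaceIO.SSpaceFormat enum may define more values than the two shown in the option list.

```java
import java.util.Locale;

/** Hypothetical sketch of choosing the output format. */
final class FormatChoiceSketch {
    /** Hypothetical mirror of the two values accepted by --outputFormat. */
    enum OutputFormat { TEXT, BINARY }

    private FormatChoiceSketch() {}

    /**
     * A user-supplied value ("text" or "binary") takes precedence; otherwise
     * the subclass default, as getSpaceFormat() would supply it, is used.
     * Unknown values are rejected with IllegalArgumentException.
     */
    static OutputFormat resolve(String userValue, OutputFormat subclassDefault) {
        if (userValue == null || userValue.isEmpty()) {
            return subclassDefault;
        }
        return OutputFormat.valueOf(userValue.toUpperCase(Locale.ROOT));
    }
}
```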
Copyright © 2012. All Rights Reserved.