GenericWordsiMain (S-Space Package 2.0.1 API)

java.lang.Object
- edu.ucla.sspace.mains.GenericMain
- - edu.ucla.sspace.mains.GenericWordsiMain

Direct Known Subclasses:

DVWordsiMain, PreComputedWordsiMain, RIWordsiMain, TopicWordsiMain, WCWordsiMain
```
public abstract class GenericWordsiMain
extends GenericMain
```
A base implementation for Wordsi executables. This class provides base arguments that nearly all Wordsi executables will require, along with basic processing for those arguments.
This class provides access to three different word sense modes: online clustering, offline clustering, and an evaluation mode. For the two clustering modes, word senses are generated by clustering individual context vectors. The online mode uses StreamingWordsi and the offline mode uses WaitingWordsi. The evaluation mode assumes that the word senses have already been learned and are fixed; individual contexts are labeled with the most similar word sense.
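The evaluation mode's labeling step can be illustrated with a minimal sketch. It assumes dense `double[]` vectors and cosine similarity; the class `SenseLabelSketch`, its method names, and the sense labels are hypothetical and not part of the S-Space API, and the similarity measure actually used by Wordsi may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of the S-Space API. */
final class SenseLabelSketch {

    /** Cosine similarity of two equal-length vectors; 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    /**
     * Returns the label of the fixed sense vector most similar to the
     * context vector, or null if no senses are given.
     */
    static String label(double[] context, Map<String, double[]> senses) {
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : senses.entrySet()) {
            double sim = cosine(context, e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two previously learned (fixed) senses; the labels are made up.
        Map<String, double[]> senses = new LinkedHashMap<>();
        senses.put("bank-0", new double[] {1, 0, 0});
        senses.put("bank-1", new double[] {0, 1, 1});
        System.out.println(label(new double[] {0.1, 0.9, 0.7}, senses));
    }
}
```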
This class provides access to two evaluation modes: Pseudo Word Discrimination and the SenseEval/SemEval evaluation. When training a Wordsi model for a pseudo word task, the -e option should be set with the pseudoWord argument. The -P option should be set so that Wordsi knows which words form pseudo words. Wordsi will generate a report that specifies how many times each core word in a pseudo word was assigned to a word sense for the pseudo word. When running in evaluation mode, the -e option must be set.
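As a rough illustration of what such a pseudo word report tallies, the sketch below counts, for each sense label, how often each core word was assigned to it. The class `PseudoWordReportSketch`, its methods, and the text layout produced by `render()` are assumptions made for this example; the format of the report that Wordsi actually writes is not specified here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of the S-Space API. */
final class PseudoWordReportSketch {

    // sense label -> (core word -> number of assignments)
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    /** Records that one occurrence of coreWord was assigned to senseLabel. */
    void record(String senseLabel, String coreWord) {
        Map<String, Integer> perWord = counts.get(senseLabel);
        if (perWord == null) {
            perWord = new TreeMap<>();
            counts.put(senseLabel, perWord);
        }
        Integer old = perWord.get(coreWord);
        perWord.put(coreWord, old == null ? 1 : old + 1);
    }

    /** Returns how often coreWord was assigned to senseLabel (0 if never). */
    int count(String senseLabel, String coreWord) {
        Map<String, Integer> perWord = counts.get(senseLabel);
        if (perWord == null) {
            return 0;
        }
        Integer c = perWord.get(coreWord);
        return c == null ? 0 : c;
    }

    /** Renders one line per (sense, core word) pair; the layout is made up. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<String, Integer>> sense : counts.entrySet()) {
            for (Map.Entry<String, Integer> word : sense.getValue().entrySet()) {
                sb.append(sense.getKey()).append(' ')
                  .append(word.getKey()).append(' ')
                  .append(word.getValue()).append('\n');
            }
        }
        return sb.toString();
    }
}
```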
Since Wordsi instances will need to reuse features during training and testing, the --Save and --Load options are provided. --Save will store any data structures that are required for generating context vectors. --Load will load these same data structures from disk and re-use them. In general, --Save should be used during training and --Load should be used during testing. Different Wordsi executables will serialize different data structures, but these will generally be a mapping from strings to some other data type.
GenericMain provides the core options used by this base executable. This class provides the following additional options:
- Required (one of):
- Evaluation Type
- Optional
- Serialization
Author:

Keith Stevens

Field Summary
- Fields inherited from class edu.ucla.sspace.mains.GenericMain
  argOptions, EXT, isMultiThreaded, verbose

Constructor Summary

Constructors
Constructor and Description

GenericWordsiMain()

Method Summary

Methods
Modifier and Type	Method and Description
`protected void`	`addExtraOptions(ArgOptions options)` Adds options to the provided `ArgOptions` instance, which will be used to parse the command line.
`protected ContextExtractor`	`contextExtractorFromGenerator(ContextGenerator generator)` Returns a `ContextExtractor` that uses the given `ContextGenerator` which will process the corpus in the format specified by the command line.
`protected Set<String>`	`getAcceptedWords()` Returns a set of strings that the `Wordsi` implementations should represent, or `null`, which signifies that all words should be represented.
`protected Iterator<Document>`	`getDocumentIterator()` Returns the iterator for all of the documents specified on the command line or throws an `Error` if no documents are specified.
`protected abstract ContextExtractor`	`getExtractor()` Returns a `ContextExtractor`, which will be responsible for creating context vectors for documents.
`protected Map<String,String>`	`getPseudoWordMap()` Returns a mapping from real tokens to their pseudo word tokens, or `null` if the `-P` option is not specified.
`protected SemanticSpace`	`getSpace()` Returns the `SemanticSpace` that will be used for processing.
`protected <T> T`	`loadObject(ObjectInputStream inStream)` Returns an object of type `T` from the provided `ObjectInputStream`.
`protected ObjectInputStream`	`openLoadFile()` Returns an `ObjectInputStream` for the file referred to by the `--Load` option or `null` if the option was not used.
`protected ObjectOutputStream`	`openSaveFile()` Returns an `ObjectOutputStream` for the file referred to by the `--Save` option or `null` if the option was not used.
`protected void`	`saveObject(ObjectOutputStream outStream, Object obj)` Writes the `obj` to the given `ObjectOutputStream`.
`protected int`	`windowSize()` Returns the window size used in a sliding context window.

Methods inherited from class edu.ucla.sspace.mains.GenericMain
addCorpusReaderIterators, addDocIterators, addFileIterators, getAlgorithmSpecifics, getSpaceFormat, handleExtraOptions, loadValidTermSet, parseDocumentsMultiThreaded, parseDocumentsSingleThreaded, postProcessing, processDocumentsAndSpace, run, saveSSpace, setupOptions, setupProperties, usage, verbose, verbose

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - GenericWordsiMain
```
public GenericWordsiMain()
```
- Method Detail
  - addExtraOptions
```
protected void addExtraOptions(ArgOptions options)
```
    Adds options to the provided ArgOptions instance, which will be used to parse the command line. This method lets subclasses add extra command line options.
    
    Overrides:
    
    addExtraOptions in class GenericMain
    
    Parameters:
    options - the ArgOptions object to which main-specific options can be added.
    See Also:
    GenericMain.handleExtraOptions()
  - getExtractor
```
protected abstract ContextExtractor getExtractor()
```
    Returns a ContextExtractor, which will be responsible for creating context vectors for documents.
  - getAcceptedWords
```
protected Set<String> getAcceptedWords()
```
    Returns a set of strings that the Wordsi implementations should represent, or null, which signifies that all words should be represented.
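    The `null` convention can be handled with a single check. The sketch below is an assumption about how a caller might interpret the returned set; `AcceptedWordsSketch` and `shouldRepresent` are hypothetical names, not S-Space API.

```java
import java.util.Set;

/** Hypothetical illustration only; not part of the S-Space API. */
final class AcceptedWordsSketch {

    /**
     * Returns true if the word should be represented: either no filter
     * was given (accepted is null) or the filter contains the word.
     */
    static boolean shouldRepresent(String word, Set<String> accepted) {
        return accepted == null || accepted.contains(word);
    }
}
```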
  - getPseudoWordMap
```
protected Map<String,String> getPseudoWordMap()
```
    Returns a mapping from real tokens to their pseudo word tokens, or null if the -P option is not specified.
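    A minimal sketch of how such a mapping might be applied to a token sequence follows. It assumes that every real token found in the map is replaced by its pseudo word and that a `null` map leaves tokens unchanged; `PseudoWordMapSketch`, `replaceTokens`, and the example words are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the S-Space API. */
final class PseudoWordMapSketch {

    /**
     * Returns a copy of tokens in which each token that has an entry in
     * pseudoWordMap is replaced by its pseudo word. A null map means that
     * no pseudo words are in use, so tokens are copied unchanged.
     */
    static List<String> replaceTokens(List<String> tokens,
                                      Map<String, String> pseudoWordMap) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            String pseudo =
                (pseudoWordMap == null) ? null : pseudoWordMap.get(token);
            out.add(pseudo == null ? token : pseudo);
        }
        return out;
    }

    public static void main(String[] args) {
        // "cat" and "dog" are the core words of the pseudo word "catdog".
        Map<String, String> map = new HashMap<>();
        map.put("cat", "catdog");
        map.put("dog", "catdog");
        List<String> tokens = Arrays.asList("the", "cat", "chased", "the", "dog");
        System.out.println(replaceTokens(tokens, map));
    }
}
```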
  - contextExtractorFromGenerator
```
protected ContextExtractor contextExtractorFromGenerator(ContextGenerator generator)
```
    Returns a ContextExtractor that uses the given ContextGenerator which will process the corpus in the format specified by the command line. This is just a helper function for sub-classes implementing getExtractor().
  - windowSize
```
protected int windowSize()
```
    Returns the window size used in a sliding context window.
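    For readers unfamiliar with sliding context windows, here is a minimal sketch. It assumes the window size counts tokens on each side of the focus word and that the focus word itself is excluded; the actual Wordsi extractors may define the window differently. `WindowSketch` and `window` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of the S-Space API. */
final class WindowSketch {

    /**
     * Returns up to windowSize tokens on each side of the token at index
     * focus, in document order, excluding the focus token itself. The
     * window is clipped at the start and end of the token list.
     */
    static List<String> window(List<String> tokens, int focus, int windowSize) {
        int start = Math.max(0, focus - windowSize);
        int end = Math.min(tokens.size(), focus + windowSize + 1);
        List<String> context = new ArrayList<>();
        for (int i = start; i < end; i++) {
            if (i != focus) {
                context.add(tokens.get(i));
            }
        }
        return context;
    }
}
```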
  - getDocumentIterator
```
protected Iterator<Document> getDocumentIterator()
                                          throws IOException
```
    Description copied from class: GenericMain
    
    Returns the iterator for all of the documents specified on the command line or throws an Error if no documents are specified. Subclasses should override either GenericMain.addFileIterators(java.util.Collection<java.util.Iterator<edu.ucla.sspace.text.Document>>, java.lang.String[]) or GenericMain.addDocIterators(java.util.Collection<java.util.Iterator<edu.ucla.sspace.text.Document>>, java.lang.String[]) if they use a different file format. Alternatively, one can implement a CorpusReader and use the -R option.
    
    Overrides:
    
    getDocumentIterator in class GenericMain
    
    Throws:
    
    IOException
  - getSpace
```
protected SemanticSpace getSpace()
```
    Returns the SemanticSpace that will be used for processing. This method is guaranteed to be called after the command line arguments have been parsed, so the contents of GenericMain.argOptions are valid.
    
    Specified by:
    
    getSpace in class GenericMain
  - openSaveFile
```
protected ObjectOutputStream openSaveFile()
```
    Returns an ObjectOutputStream for the file referred to by the --Save option or null if the option was not used.
  - openLoadFile
```
protected ObjectInputStream openLoadFile()
```
    Returns an ObjectInputStream for the file referred to by the --Load option or null if the option was not used.
  - saveObject
```
protected void saveObject(ObjectOutputStream outStream,
              Object obj)
```
    Writes the obj to the given ObjectOutputStream.
  - loadObject
```
protected <T> T loadObject(ObjectInputStream inStream)
```
    Returns an object of type T from the provided ObjectInputStream. This method does the casting, so the result should be assigned directly to a variable and not through a ternary operator; otherwise the cast will need to be done a second time.
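    The save/load pattern and the casting note can be illustrated with stand-in helpers. The sketch below is not the S-Space implementation: `SerializationSketch` and its methods only mirror the signatures above, the exception handling is an assumption, and an in-memory byte array replaces the files named by `--Save` and `--Load`. The loaded value is assigned directly to a typed variable so that `T` is inferred from the assignment; on compilers that predate Java 8's target typing, a call placed inside a ternary expression would likely infer `Object` and need an explicit cast.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Map;

/** Hypothetical stand-ins; not the S-Space implementation. */
final class SerializationSketch {

    /** Writes obj to the stream, rethrowing I/O failures unchecked. */
    static void saveObject(ObjectOutputStream outStream, Object obj) {
        try {
            outStream.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads one object and casts it to the type the caller expects. */
    @SuppressWarnings("unchecked")
    static <T> T loadObject(ObjectInputStream inStream) {
        try {
            return (T) inStream.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Saves a string-keyed map to memory and loads a copy back. */
    static Map<String, Integer> roundTrip(Map<String, Integer> features) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                saveObject(out, features);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                // Direct assignment: T is inferred as Map<String, Integer>.
                Map<String, Integer> loaded = loadObject(in);
                return loaded;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    Because the cast inside `loadObject` is unchecked, a mismatch between the saved type and the expected type surfaces only when the loaded object is used, so training and testing runs need to agree on what was serialized.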
