edu.ucla.sspace.text (S-Space Package 2.0.1 API)

Interface Summary
Interface	Description
AnnotatedDocument	An abstraction for a document that allows document processors to access text in a uniform manner.
CorpusReader<D extends Document>	A basic interface for setting up a `CorpusReader`, which reads uncleaned text from corpus files and transforms it into an appropriately cleaned `Document` instance.
Document	An abstraction for a document that allows document processors to access text in a uniform manner.
LabeledDocument	An abstraction for a document that has an accompanying label or name.
LabeledParsedDocument	A union interface for a document that has been (or will be) dependency parsed to generate an accompanying parse tree of its contents and that has an accompanying label about its source or contents.
ParsedDocument	An abstraction for a document that has been (or will be) dependency parsed to generate an accompanying parse tree of its contents.
Stemmer	An interface for classes that stem tokens.
TemporalDocument	An abstraction for a document that allows document processors to access time-annotated text in a uniform manner.
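The interfaces above separate how a document's text is stored from how document processors read it. A minimal, self-contained sketch of that idea follows. It is a hypothetical illustration, not S-Space code: `SimpleDocument`, `SimpleLabeledDocument`, `SimpleLabeledStringDocument`, and `DocumentDemo` are invented for this example, and the method names `reader()` and `label()` are assumptions that may not match the real `Document` and `LabeledDocument` signatures.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Demo entry point; declared first so single-file launchers pick it up.
class DocumentDemo {
    /** A "document processor" that depends only on the interface, not on the backing storage. */
    static int countLines(SimpleDocument doc) {
        int lines = 0;
        try (BufferedReader br = doc.reader()) {
            while (br.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        SimpleLabeledDocument doc =
                new SimpleLabeledStringDocument("post-1", "first line\nsecond line");
        System.out.println(doc.label() + ": " + countLines(doc) + " lines"); // prints "post-1: 2 lines"
    }
}

// Hypothetical, simplified stand-in for Document; the real S-Space signature may differ.
interface SimpleDocument {
    /** Returns a reader over the document's text. */
    BufferedReader reader();
}

// Hypothetical stand-in for LabeledDocument: a document plus a label or name.
interface SimpleLabeledDocument extends SimpleDocument {
    String label();
}

// String-backed implementation, loosely modeled on the LabeledStringDocument description.
final class SimpleLabeledStringDocument implements SimpleLabeledDocument {
    private final String label;
    private final String text;

    SimpleLabeledStringDocument(String label, String text) {
        this.label = label;
        this.text = text;
    }

    @Override
    public BufferedReader reader() {
        // Wraps the stored String in a new reader on every call, so the text can be re-read.
        return new BufferedReader(new StringReader(text));
    }

    @Override
    public String label() {
        return label;
    }
}
```

Because `countLines` accepts the interface type, the same processor would work unchanged on a file-backed implementation (the role `FileDocument` plays in the class list below); only the body of `reader()` would differ.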

Class Summary
Class	Description
BloglinesCorpusReader	A `DirectoryCorpusReader` for the bloglines corpus.
BufferedFileListDocumentIterator	An iterator implementation that returns `Document` instances given a file that contains a list of files, buffering their contents as necessary.
BufferedIterator	An iterator over all the tokens in a stream, which supports arbitrary look-ahead into the stream by buffering the tokens.
ChildesCorpusReader	A corpus reader for the Childes corpus.
CompoundWordIterator	An iterator over all the tokens in a stream, which supports tokenizing predetermined n-grams as single tokens.
DependencyFileDocumentIterator	An iterator implementation that returns `Document` instances, each containing a single dependency-parsed sentence, given a file in the CoNLL format.
DirectoryCorpusReader<D extends Document>	An abstract base class for corpus reading iterators that need to traverse through a large nested directory structure to find files containing text.
DocumentPreprocessor	A class for preprocessing all types of documents.
EnglishStemmer	A wrapper for the English Snowball Stemmer.
FileDocument	A `Document` implementation backed by a `File` whose contents are used for the document text.
FileListDocumentIterator	An iterator implementation that returns `Document` instances given a file that contains a list of files.
FileListTemporalDocumentIterator	An iterator implementation that returns `TemporalDocument` instances given a file that contains a list of files and their creation timestamps, each on a separate line.
FilteredIterator	An iterator over all the tokens in a stream that uses a `TokenFilter` to remove invalid tokens.
GermanStemmer	A wrapper for the German Snowball Stemmer.
ItalianStemmer	A wrapper for the Italian Snowball Stemmer.
IteratorFactory	A factory class for generating `Iterator<String>` tokenizers for streams of tokens such as `BufferedReader` instances.
LabeledParsedStringDocument	An abstraction for a document that has been (or will be) dependency parsed to generate an accompanying parse tree of its contents.
LabeledStringDocument	A `LabeledDocument` implementation backed by a `String` whose contents are used for the document text.
LimitedOneLinePerDocumentIterator	An iterator decorator that returns `Document` instances given a file that contains a list of files.
OneLinePerDocumentIterator	An iterator implementation that returns `Document` instances given a file that contains a list of files.
OneLinePerTemporalDocumentIterator	An iterator implementation that returns `TemporalDocument` instances given a file where each line of text is treated as a separate document.
OrderPreservingFilteredIterator	An iterator over all the tokens in a stream that uses a `TokenFilter` to remove invalid tokens and replaces them with the `IteratorFactory.EMPTY_TOKEN` string to signify their position.
PatPho	An implementation of the PatPho phonological representation system.
PorterStemmer	An implementation of the Porter stemmer in Java.
PukWaCDocumentIterator	An iterator implementation that returns `Document` instances, each containing a single dependency-parsed sentence, given a file in the CoNLL format embedded in the XML format provided in the WaCkypedia corpus.
SenseEvalDependencyCorpusReader	A corpus reader for the SenseEvalDependency corpus.
SnowballPorterStemmer	A wrapper for the Porter Snowball Stemmer.
StemmingIterator	An iterator that stems all of the tokens that it returns.
StringDocument	A `Document` implementation backed by a `String` whose contents are used for the document text.
StringUtils	A collection of static methods for processing text.
TemporalBloglinesCorpusReader	A subclass of `BloglinesCorpusReader` that always includes timestamps.
TemporalFileDocument	A `TemporalDocument` implementation backed by a `File` whose contents are used for the document text.
TemporalStringDocument	A `TemporalDocument` implementation backed by a `String` whose contents are used for the document text.
TemporalUsenetCorpusReader	A subclass of `UsenetCorpusReader` that always includes timestamps.
TermAssociationFinder
TokenFilter	A utility for asserting what tokens are valid and invalid within a stream of tokens.
UkWacDependencyFileIterator	An iterator implementation that returns `Document` instances, each containing a single dependency-parsed sentence, given a file in the CoNLL format.
UkWaCDocumentIterator	An iterator implementation that returns `Document` instances labeled with the source URL from which their text was obtained, as specified in the ukWaC corpus.
UsenetCorpusReader	A `DirectoryCorpusReader` for the Usenet corpus provided by the Westbury Lab.
WaCkypediaDocumentIterator	An iterator implementation that returns `Document` instances, each containing a single dependency-parsed sentence, given a file in the CoNLL format embedded in the XML format provided in the WaCkypedia corpus.
WordIterator	An iterator over all of the tokens present in a `BufferedReader` that are separated by any amount of white space.
WordReplacementIterator	An iterator over all tokens in a stream that replaces tokens if they have a known replacement value.
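Several of the iterator classes above (`WordIterator`, `FilteredIterator`, `OrderPreservingFilteredIterator`) describe decorators over an `Iterator<String>` token stream. The sketch below contrasts dropping invalid tokens with replacing them by a placeholder so that token positions survive. It is a hypothetical illustration, not S-Space code: the nested classes `Words`, `Filtered`, and `OrderPreserving` are invented for this example, a `java.util.function.Predicate` stands in for `TokenFilter`, and the placeholder `""` is an assumption rather than the actual value of `IteratorFactory.EMPTY_TOKEN`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Predicate;

class TokenPipelineDemo {
    // Hypothetical placeholder; not necessarily the value of IteratorFactory.EMPTY_TOKEN.
    static final String EMPTY_TOKEN = "";

    /** Lazily splits each line of a reader on runs of whitespace (cf. WordIterator). */
    static final class Words implements Iterator<String> {
        private final BufferedReader reader;
        private final ArrayDeque<String> pending = new ArrayDeque<>();

        Words(BufferedReader reader) { this.reader = reader; }

        @Override public boolean hasNext() {
            try {
                // Read lines until at least one token is buffered or the input runs out.
                while (pending.isEmpty()) {
                    String line = reader.readLine();
                    if (line == null) return false;
                    for (String tok : line.trim().split("\\s+")) {
                        if (!tok.isEmpty()) pending.add(tok);
                    }
                }
                return true;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            return pending.poll();
        }
    }

    /** Skips tokens the predicate rejects (cf. FilteredIterator). */
    static final class Filtered implements Iterator<String> {
        private final Iterator<String> base;
        private final Predicate<String> isValid;
        private String lookahead; // next accepted token, or null if none is buffered

        Filtered(Iterator<String> base, Predicate<String> isValid) { this.base = base; this.isValid = isValid; }

        @Override public boolean hasNext() {
            while (lookahead == null && base.hasNext()) {
                String tok = base.next();
                if (isValid.test(tok)) lookahead = tok;
            }
            return lookahead != null;
        }

        @Override public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            String tok = lookahead;
            lookahead = null;
            return tok;
        }
    }

    /** Replaces rejected tokens with EMPTY_TOKEN so positions are kept (cf. OrderPreservingFilteredIterator). */
    static final class OrderPreserving implements Iterator<String> {
        private final Iterator<String> base;
        private final Predicate<String> isValid;

        OrderPreserving(Iterator<String> base, Predicate<String> isValid) { this.base = base; this.isValid = isValid; }

        @Override public boolean hasNext() { return base.hasNext(); }

        @Override public String next() {
            String tok = base.next(); // throws NoSuchElementException when exhausted
            return isValid.test(tok) ? tok : EMPTY_TOKEN;
        }
    }

    static BufferedReader reader(String text) { return new BufferedReader(new StringReader(text)); }

    static List<String> toList(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("the", "a", "of");
        Predicate<String> valid = tok -> !stopWords.contains(tok);
        String text = "the cat  sat\non a mat";
        System.out.println(toList(new Filtered(new Words(reader(text)), valid)));        // [cat, sat, on, mat]
        System.out.println(toList(new OrderPreserving(new Words(reader(text)), valid))); // [, cat, sat, on, , mat]
    }
}
```

Keeping a placeholder matters when a downstream consumer relies on token positions, such as a sliding context window whose distances should not shrink when stop words are removed; when positions are irrelevant, the plain filtered stream is simpler.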

Package edu.ucla.sspace.text