| Interface | Description |
|---|---|
| AnnotatedDocument | An abstraction for a document that allows document processors to access text in a uniform manner. |
| CorpusReader<D extends Document> | A basic interface for setting up a CorpusReader, which reads uncleaned text from corpus files and transforms it into an appropriately cleaned Document instance. |
| Document | An abstraction for a document that allows document processors to access text in a uniform manner. |
| LabeledDocument | An abstraction for a document that has an accompanying label or name. |
| LabeledParsedDocument | A union interface for a document that has been (or will be) dependency parsed to generate an accompanying parse tree of its contents and that has an accompanying label about its source or contents. |
| ParsedDocument | An abstraction for a document that has been (or will be) dependency parsed to generate an accompanying parse tree of its contents. |
| Stemmer | An interface for classes that stem tokens. |
| TemporalDocument | An abstraction for a document that allows document processors to access time-annotated text in a uniform manner. |
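
The document abstractions above can be illustrated with a minimal sketch. It assumes that a document exposes its text through a reader and that a labeled document adds a label accessor. Every name in it (SketchDocument, SketchLabeledDocument, SketchLabeledStringDocument, DocumentSketchDemo, countTokens) is hypothetical and is not taken from this package; the real interfaces may declare different methods.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a document abstraction: processors see the text
// only through a reader, regardless of where the text is stored.
interface SketchDocument {
    BufferedReader reader();
}

// Hypothetical extension that attaches a label or name to the document.
interface SketchLabeledDocument extends SketchDocument {
    String label();
}

// A labeled document backed by an in-memory String (assumed design).
final class SketchLabeledStringDocument implements SketchLabeledDocument {
    private final String label;
    private final String text;

    SketchLabeledStringDocument(String label, String text) {
        this.label = label;
        this.text = text;
    }

    @Override
    public BufferedReader reader() {
        // A fresh reader is created on each call so the text can be re-read.
        return new BufferedReader(new StringReader(text));
    }

    @Override
    public String label() {
        return label;
    }
}

final class DocumentSketchDemo {
    // Counts whitespace-separated tokens using only the SketchDocument interface.
    static int countTokens(SketchDocument doc) {
        int count = 0;
        try (BufferedReader br = doc.reader()) {
            for (String line; (line = br.readLine()) != null; ) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    count += trimmed.split("\\s+").length;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        SketchLabeledDocument doc = new SketchLabeledStringDocument(
                "example", "the quick brown fox\njumps over");
        // Prints "example: 6 tokens"
        System.out.println(doc.label() + ": " + countTokens(doc) + " tokens");
    }
}
```

Because countTokens depends only on the reader-returning interface, the same processor would work unchanged on a file-backed or string-backed implementation, which is the uniform access the descriptions refer to.
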
| Class | Description |
|---|---|
| BloglinesCorpusReader | A DirectoryCorpusReader for the Bloglines corpus. |
| BufferedFileListDocumentIterator | An iterator implementation that returns Document instances given a file that contains a list of files, buffering their contents as necessary. |
| BufferedIterator | An iterator over all the tokens in a stream, which supports arbitrary look-ahead into the stream by buffering the tokens. |
| ChildesCorpusReader | A corpus reader for the Childes corpus. |
| CompoundWordIterator | An iterator over all the tokens in a stream, which supports tokenizing predetermined n-grams as single tokens. |
| DependencyFileDocumentIterator | An iterator implementation that returns Document instances, each containing a single dependency-parsed sentence, given a file in the CoNLL format. |
| DirectoryCorpusReader<D extends Document> | An abstract base class for corpus reading iterators that need to traverse a large nested directory structure to find files containing text. |
| DocumentPreprocessor | A class for preprocessing all types of documents. |
| EnglishStemmer | A wrapper for the English Snowball Stemmer. |
| FileDocument | A Document implementation backed by a File whose contents are used for the document text. |
| FileListDocumentIterator | An iterator implementation that returns Document instances given a file that contains a list of files. |
| FileListTemporalDocumentIterator | An iterator implementation that returns TemporalDocument instances given a file that contains a list of files and their creation time stamps, each on a separate line. |
| FilteredIterator | An iterator over all the tokens in a stream that uses a TokenFilter to remove invalid tokens. |
| GermanStemmer | A wrapper for the German Snowball Stemmer. |
| ItalianStemmer | A wrapper for the Italian Snowball Stemmer. |
| IteratorFactory | A factory class for generating Iterator<String> tokenizers for streams of tokens such as BufferedReader instances. |
| LabeledParsedStringDocument | An abstraction for a document that has been (or will be) dependency parsed to generate an accompanying parse tree of its contents. |
| LabeledStringDocument | A LabeledDocument implementation backed by a String whose contents are used for the document text. |
| LimitedOneLinePerDocumentIterator | An iterator decorator that returns Document instances given a file that contains a list of files. |
| OneLinePerDocumentIterator | An iterator implementation that returns Document instances given a file that contains a list of files. |
| OneLinePerTemporalDocumentIterator | An iterator implementation that returns TemporalDocument instances given a file where each line of text is treated as a separate document. |
| OrderPreservingFilteredIterator | An iterator over all the tokens in a stream that uses a TokenFilter to remove invalid tokens and replaces them with the IteratorFactory.EMPTY_TOKEN string to signify their position. |
| PatPho | An implementation of the PatPho phonological representation system. |
| PorterStemmer | An implementation of the Porter stemmer in Java. |
| PukWaCDocumentIterator | An iterator implementation that returns Document instances, each containing a single dependency-parsed sentence, given a file in the CoNLL format that is contained in the XML format provided in the WaCkypedia corpus. |
| SenseEvalDependencyCorpusReader | A corpus reader for the SenseEvalDependency corpus. |
| SnowballPorterStemmer | A wrapper for the Porter Snowball Stemmer. |
| StemmingIterator | An iterator that stems all of the tokens that it returns. |
| StringDocument | A Document implementation backed by a String whose contents are used for the document text. |
| StringUtils | A collection of static methods for processing text. |
| TemporalBloglinesCorpusReader | A subclass of BloglinesCorpusReader that always includes timestamps. |
| TemporalFileDocument | A TemporalDocument implementation backed by a File whose contents are used for the document text. |
| TemporalStringDocument | A TemporalDocument implementation backed by a String whose contents are used for the document text. |
| TemporalUsenetCorpusReader | A subclass of UsenetCorpusReader that always includes timestamps. |
| TermAssociationFinder | |
| TokenFilter | A utility for asserting which tokens are valid and invalid within a stream of tokens. |
| UkWacDependencyFileIterator | An iterator implementation that returns Document instances, each containing a single dependency-parsed sentence, given a file in the CoNLL format. |
| UkWaCDocumentIterator | An iterator implementation that returns Document instances labeled with the source URL from which their text was obtained, as specified in the ukWaC corpus. |
| UsenetCorpusReader | |
| WaCkypediaDocumentIterator | An iterator implementation that returns Document instances, each containing a single dependency-parsed sentence, given a file in the CoNLL format that is contained in the XML format provided in the WaCkypedia corpus. |
| WordIterator | An iterator over all of the tokens present in a BufferedReader that are separated by any amount of white space. |
| WordReplacementIterator | An iterator over all tokens in a stream that replaces tokens if they have a known replacement value. |
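
The difference between the two filtering iterators described above (dropping invalid tokens versus replacing them with a placeholder so positions are preserved) can be sketched as follows. This is a hedged illustration, not the package's implementation: the class names, the Predicate-based filter, and the use of the empty string as the placeholder are all assumptions and say nothing about the actual value of IteratorFactory.EMPTY_TOKEN or the TokenFilter API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical filtering iterator: rejected tokens are dropped entirely.
// Tokens from the base iterator are assumed to be non-null.
final class SketchFilteredIterator implements Iterator<String> {
    private final Iterator<String> base;
    private final Predicate<String> accept;
    private String next; // next accepted token, or null when exhausted

    SketchFilteredIterator(Iterator<String> base, Predicate<String> accept) {
        this.base = base;
        this.accept = accept;
        advance();
    }

    // Skips ahead to the next accepted token, if any.
    private void advance() {
        next = null;
        while (base.hasNext()) {
            String token = base.next();
            if (accept.test(token)) {
                next = token;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String token = next;
        advance();
        return token;
    }
}

// Hypothetical order-preserving variant: rejected tokens are replaced by a
// placeholder so the positions of the remaining tokens do not shift.
final class SketchOrderPreservingIterator implements Iterator<String> {
    static final String PLACEHOLDER = ""; // assumed placeholder value

    private final Iterator<String> base;
    private final Predicate<String> accept;

    SketchOrderPreservingIterator(Iterator<String> base, Predicate<String> accept) {
        this.base = base;
        this.accept = accept;
    }

    @Override
    public boolean hasNext() {
        return base.hasNext();
    }

    @Override
    public String next() {
        String token = base.next();
        return accept.test(token) ? token : PLACEHOLDER;
    }
}

final class FilterSketchDemo {
    // Collects every remaining element of an iterator into a list.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "on"));
        Predicate<String> accept = t -> !stopWords.contains(t);

        // Three tokens remain: cat, sat, mat.
        System.out.println(drain(new SketchFilteredIterator(tokens.iterator(), accept)));
        // Six positions remain, with placeholders at indices 0, 3 and 4.
        System.out.println(drain(new SketchOrderPreservingIterator(tokens.iterator(), accept)));
    }
}
```

Keeping a placeholder matters for consumers that rely on token distance, such as a fixed-size context window: with plain filtering, "cat" and "mat" would appear two positions apart rather than four.
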
Copyright © 2012. All Rights Reserved.