Interface | Description |
---|---|
AnnotatedDocument | An abstraction for a document that allows document processors to access text in a uniform manner. |
CorpusReader<D extends Document> | A basic interface for setting up a CorpusReader, which reads uncleaned text from corpus files and transforms it into an appropriately cleaned Document instance. |
Document | An abstraction for a document that allows document processors to access text in a uniform manner. |
LabeledDocument | An abstraction for a document that has an accompanying label or name. |
LabeledParsedDocument | A union interface for a document that has been (or will be) dependency parsed to generate an accompanying parse tree of its contents and that has an accompanying label about its source or contents. |
ParsedDocument | An abstraction for a document that has been (or will be) dependency parsed to generate an accompanying parse tree of its contents. |
Stemmer | An interface for classes that stem tokens. |
TemporalDocument | An abstraction for a document that allows document processors to access time-annotated text in a uniform manner. |
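
The interfaces above layer extra metadata (a label, a timestamp, a parse tree) on top of a basic document that exposes its text uniformly. A minimal sketch of that layering follows. The names `SketchDocument`, `SketchLabeledDocument`, `StringBacked`, `reader()`, `label()`, and `countTokens` are hypothetical and chosen for illustration; they are not taken from this package's actual signatures.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class DocumentSketch {

    /** Hypothetical stand-in for a document: uniform access to its text. */
    interface SketchDocument {
        BufferedReader reader();
    }

    /** Hypothetical stand-in for a labeled document: text plus a label. */
    interface SketchLabeledDocument extends SketchDocument {
        String label();
    }

    /** String-backed implementation, assumed here for illustration only. */
    static final class StringBacked implements SketchLabeledDocument {
        private final String label;
        private final String text;

        StringBacked(String label, String text) {
            this.label = label;
            this.text = text;
        }

        @Override
        public BufferedReader reader() {
            return new BufferedReader(new StringReader(text));
        }

        @Override
        public String label() {
            return label;
        }
    }

    /** Counts whitespace-separated tokens using only the basic document view. */
    static int countTokens(SketchDocument doc) {
        int count = 0;
        try (BufferedReader br = doc.reader()) {
            for (String line; (line = br.readLine()) != null; ) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    count += trimmed.split("\\s+").length;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        SketchLabeledDocument doc =
            new StringBacked("blog-1", "the quick brown fox\njumps over");
        System.out.println(doc.label() + ": " + countTokens(doc) + " tokens");
    }
}
```

Because `countTokens` depends only on the basic interface, the same processor would work unchanged on a file-backed or timestamped document in this sketch.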
Class | Description |
---|---|
BloglinesCorpusReader | A DirectoryCorpusReader for the Bloglines corpus. |
BufferedFileListDocumentIterator | An iterator implementation that returns Document instances given a file that contains a list of files, buffering their contents as necessary. |
BufferedIterator | An iterator over all the tokens in a stream, which supports arbitrary look-ahead into the stream by buffering the tokens. |
ChildesCorpusReader | A corpus reader for the CHILDES corpus. |
CompoundWordIterator | An iterator over all the tokens in a stream, which supports tokenizing predetermined n-grams as single tokens. |
DependencyFileDocumentIterator | An iterator implementation that returns a Document containing a single dependency-parsed sentence given a file in the CoNLL format. |
DirectoryCorpusReader<D extends Document> | An abstract base class for corpus reading iterators that need to traverse through a large nested directory structure to find files containing text. |
DocumentPreprocessor | A class for preprocessing all types of documents. |
EnglishStemmer | A wrapper for the English Snowball Stemmer. |
FileDocument | A Document implementation backed by a File whose contents are used for the document text. |
FileListDocumentIterator | An iterator implementation that returns Document instances given a file that contains a list of files. |
FileListTemporalDocumentIterator | An iterator implementation that returns TemporalDocument instances given a file that contains a list of files and their creation time stamps, each on a separate line. |
FilteredIterator | An iterator over all the tokens in a stream that uses a TokenFilter to remove invalid tokens. |
GermanStemmer | A wrapper for the German Snowball Stemmer. |
ItalianStemmer | A wrapper for the Italian Snowball Stemmer. |
IteratorFactory | A factory class for generating Iterator<String> tokenizers for streams of tokens such as BufferedReader instances. |
LabeledParsedStringDocument | An abstraction for a document that has been (or will be) dependency parsed to generate an accompanying parse tree of its contents. |
LabeledStringDocument | A LabeledDocument implementation backed by a String whose contents are used for the document text. |
LimitedOneLinePerDocumentIterator | An iterator decorator that returns Document instances given a file that contains a list of files. |
OneLinePerDocumentIterator | An iterator implementation that returns Document instances given a file where each line of text is treated as a separate document. |
OneLinePerTemporalDocumentIterator | An iterator implementation that returns TemporalDocument instances given a file where each line of text is treated as a separate document. |
OrderPreservingFilteredIterator | An iterator over all the tokens in a stream that uses a TokenFilter to remove invalid tokens and replaces them with the IteratorFactory.EMPTY_TOKEN string to signify their position. |
PatPho | An implementation of the PatPho phonological representation system. |
PorterStemmer | An implementation of the Porter stemmer in Java. |
PukWaCDocumentIterator | An iterator implementation that returns a Document containing a single dependency-parsed sentence given a file in the CoNLL format that is contained in the XML format provided in the WaCkypedia corpus. |
SenseEvalDependencyCorpusReader | A corpus reader for the SenseEvalDependency corpus. |
SnowballPorterStemmer | A wrapper for the Porter Snowball Stemmer. |
StemmingIterator | An iterator that stems all of the tokens that it returns. |
StringDocument | A Document implementation backed by a String whose contents are used for the document text. |
StringUtils | A collection of static methods for processing text. |
TemporalBloglinesCorpusReader | A subclass of BloglinesCorpusReader that always includes timestamps. |
TemporalFileDocument | A TemporalDocument implementation backed by a File whose contents are used for the document text. |
TemporalStringDocument | A TemporalDocument implementation backed by a String whose contents are used for the document text. |
TemporalUsenetCorpusReader | A subclass of UsenetCorpusReader that always includes timestamps. |
TermAssociationFinder | |
TokenFilter | A utility for asserting which tokens are valid and invalid within a stream of tokens. |
UkWacDependencyFileIterator | An iterator implementation that returns a Document containing a single dependency-parsed sentence given a file in the CoNLL format. |
UkWaCDocumentIterator | An iterator implementation that returns Document instances labeled with the source URL from which their text was obtained, as specified in the ukWaC corpus. |
UsenetCorpusReader | |
WaCkypediaDocumentIterator | An iterator implementation that returns a Document containing a single dependency-parsed sentence given a file in the CoNLL format that is contained in the XML format provided in the WaCkypedia corpus. |
WordIterator | An iterator over all of the tokens present in a BufferedReader that are separated by any amount of white space. |
WordReplacementIterator | An iterator over all tokens in a stream that replaces tokens if they have a known replacement value. |
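
Several of the classes above are iterators that decorate a token stream: one tokenizes on whitespace, one drops invalid tokens, and one swaps invalid tokens for a placeholder so that positions are preserved. The sketch below illustrates that decorator pattern with plain JDK types. The names `TokenStreamSketch`, `words`, `Filtered`, `orderPreserving`, and `PLACEHOLDER` are hypothetical, and the placeholder value is an assumption, not the actual value of `IteratorFactory.EMPTY_TOKEN`.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Scanner;
import java.util.Set;
import java.util.function.Predicate;

final class TokenStreamSketch {

    /** Assumed placeholder for rejected tokens; not the library's constant. */
    static final String PLACEHOLDER = "";

    /** Whitespace tokenizer: Scanner is itself an Iterator<String>. */
    static Iterator<String> words(BufferedReader reader) {
        return new Scanner(reader);
    }

    /** Decorator that skips every token the predicate rejects. */
    static final class Filtered implements Iterator<String> {
        private final Iterator<String> tokens;
        private final Predicate<String> accept;
        private String next;

        Filtered(Iterator<String> tokens, Predicate<String> accept) {
            this.tokens = tokens;
            this.accept = accept;
            advance();
        }

        /** Moves to the next accepted token, or null when exhausted. */
        private void advance() {
            next = null;
            while (next == null && tokens.hasNext()) {
                String candidate = tokens.next();
                if (accept.test(candidate)) {
                    next = candidate;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public String next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            String result = next;
            advance();
            return result;
        }
    }

    /** Decorator that keeps positions by substituting the placeholder. */
    static Iterator<String> orderPreserving(Iterator<String> tokens,
                                            Predicate<String> accept) {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return tokens.hasNext();
            }

            @Override
            public String next() {
                String token = tokens.next();
                return accept.test(token) ? token : PLACEHOLDER;
            }
        };
    }

    /** Drains an iterator into a list, for inspection. */
    static List<String> toList(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("the", "on");
        Predicate<String> accept = token -> !stopWords.contains(token);
        String text = "the cat sat on the mat";
        System.out.println(toList(new Filtered(
            words(new BufferedReader(new StringReader(text))), accept)));
        System.out.println(toList(orderPreserving(
            words(new BufferedReader(new StringReader(text))), accept)));
    }
}
```

The order-preserving variant matters when a downstream consumer relies on token distance, for example a co-occurrence window, because dropping tokens outright would pull distant words closer together.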
Copyright © 2012. All Rights Reserved.