Package | Description |
---|---|
edu.ucla.sspace.dependency | |
edu.ucla.sspace.mains | |
edu.ucla.sspace.text | |
edu.ucla.sspace.text.corpora | |
Class and Description |
---|
Stemmer
An interface for classes that stem tokens.
|
TokenFilter
A utility for asserting what tokens are valid and invalid within a stream of
tokens.
|
Class and Description |
---|
Document
An abstraction for a document that allows document processors to access text
in a uniform manner.
|
TemporalDocument
An abstraction for a document that allows document processors to access
time-annotated text in a uniform manner.
|
Class and Description |
---|
BloglinesCorpusReader
A DirectoryCorpusReader for the Bloglines corpus.
|
CorpusReader
A basic interface for setting up a CorpusReader, which reads uncleaned text
from corpus files and transforms it into an appropriately cleaned Document
instance.
|
DirectoryCorpusReader
An abstract base class for corpus reading iterators that need to traverse
through a large nested directory structure to find files containing text.
|
DirectoryCorpusReader.BaseFileIterator |
Document
An abstraction for a document that allows document processors to access text
in a uniform manner.
|
DocumentPreprocessor
A class for preprocessing all types of documents.
|
LabeledDocument
An abstraction for a document that has an accompanying label or name.
|
LabeledParsedDocument
A union interface for a document that has been (or will be) dependency parsed
to generate an accompanying parse tree of its contents and that has an
accompanying label about its source or contents.
|
LabeledStringDocument
A LabeledDocument implementation backed by a String whose contents are used
for the document text.
|
ParsedDocument
An abstraction for a document that has been (or will be) dependency parsed to
generate an accompanying parse tree of its contents.
|
Stemmer
An interface for classes that stem tokens.
|
StringDocument
A Document implementation backed by a String whose contents are used for the
document text.
|
TemporalDocument
An abstraction for a document that allows document processors to access
time-annotated text in a uniform manner.
|
TokenFilter
A utility for asserting what tokens are valid and invalid within a stream of
tokens.
|
UsenetCorpusReader |
Class and Description |
---|
CorpusReader
A basic interface for setting up a CorpusReader, which reads uncleaned text
from corpus files and transforms it into an appropriately cleaned Document
instance.
|
DirectoryCorpusReader
An abstract base class for corpus reading iterators that need to traverse
through a large nested directory structure to find files containing text.
|
DirectoryCorpusReader.BaseFileIterator |
Document
An abstraction for a document that allows document processors to access text
in a uniform manner.
|
DocumentPreprocessor
A class for preprocessing all types of documents.
|
TemporalDocument
An abstraction for a document that allows document processors to access
time-annotated text in a uniform manner.
|
Copyright © 2012. All Rights Reserved.