public class UsenetCorpusReader extends DirectoryCorpusReader<Document>
DirectoryCorpusReader for the Usenet
corpus provided by the Westbury Lab.
The corpus filenames are expected to be unchanged from the names the corpus
was distributed with, i.e. to follow the numeric timestamp naming convention.

| Modifier and Type | Class and Description |
|---|---|
| class | UsenetCorpusReader.InnerIterator |

Nested classes inherited from class DirectoryCorpusReader: DirectoryCorpusReader.BaseFileIterator

| Constructor and Description |
|---|
| UsenetCorpusReader() Creates a reader for USENET documents, starting with the specified file. |
| UsenetCorpusReader(boolean includeTimestamps) Creates a reader for USENET documents, starting with the specified file and, if includeTimestamps is true, prepending the creation date of each document as the first token in the document. |
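
The effect of the includeTimestamps flag described above can be illustrated with a small stand-alone sketch. It does not use the library itself: the class TimestampPrependSketch and the helper prependTimestamp are hypothetical names introduced here, and the sketch assumes the creation date is available as a numeric timestamp and is joined to the text with a single space. How UsenetCorpusReader actually obtains and formats the date is not stated in this page.

```java
public class TimestampPrependSketch {

    // Hypothetical helper, not part of the documented API. Returns the
    // document text unchanged, or with the timestamp as its first token
    // when includeTimestamps is true.
    public static String prependTimestamp(String text, long timestamp,
                                          boolean includeTimestamps) {
        // Assumption: timestamp and text are separated by a single space.
        return includeTimestamps ? timestamp + " " + text : text;
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox";
        long created = 1104537600L; // illustrative value only

        // Without timestamps the text is passed through as-is.
        System.out.println(prependTimestamp(doc, created, false));

        // With timestamps the first whitespace-delimited token is the date.
        String stamped = prependTimestamp(doc, created, true);
        System.out.println(stamped);
        System.out.println(stamped.split("\\s+")[0]);
    }
}
```

A consumer that tokenizes on whitespace can then treat the first token as metadata and the remaining tokens as the document body.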
| Modifier and Type | Method and Description |
|---|---|
| protected Iterator<Document> | corpusIterator(Iterator<File> fileIter) |
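
The corpusIterator method listed above turns an iterator over files into an iterator over documents. The following is a minimal sketch of that general pattern, not the library's implementation: FlatteningIteratorSketch, DocIterator, and collect are hypothetical names, in-memory strings stand in for files, and the sketch assumes one document per non-empty line, which is not necessarily how the USENET corpus files are laid out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

public class FlatteningIteratorSketch {

    // Hypothetical iterator: walks each source in turn and yields the
    // documents that the opener function extracts from it.
    public static class DocIterator<S, D> implements Iterator<D> {
        private final Iterator<S> sources;
        private final Function<S, Iterator<D>> opener;
        private Iterator<D> current = Collections.emptyIterator();

        public DocIterator(Iterator<S> sources,
                           Function<S, Iterator<D>> opener) {
            this.sources = sources;
            this.opener = opener;
        }

        @Override
        public boolean hasNext() {
            // Advance past exhausted or empty sources.
            while (!current.hasNext() && sources.hasNext()) {
                current = opener.apply(sources.next());
            }
            return current.hasNext();
        }

        @Override
        public D next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    // Collects every document from the given stand-in "files".
    // Assumption: each non-empty line of a source is one document.
    public static List<String> collect(Iterator<String> fileContents) {
        Iterator<String> docs = new DocIterator<String, String>(
            fileContents,
            s -> Arrays.stream(s.split("\n"))
                       .filter(line -> !line.isEmpty())
                       .iterator());
        List<String> result = new ArrayList<>();
        while (docs.hasNext()) {
            result.add(docs.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("doc one\ndoc two", "", "doc three");
        for (String doc : collect(files.iterator())) {
            System.out.println(doc);
        }
    }
}
```

Keeping the per-source parsing in a separate function means the outer iterator only handles moving from one source to the next, including skipping sources that contain no documents.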
Methods inherited from class DirectoryCorpusReader: initialize, read, read

public UsenetCorpusReader()

public UsenetCorpusReader(boolean includeTimestamps)
Creates a reader for USENET documents, starting with the specified file and,
if includeTimestamps is true, prepending the creation date of each document
as the first token in the document.

protected Iterator<Document> corpusIterator(Iterator<File> fileIter)
Description from DirectoryCorpusReader: returns an Iterator over the documents
contained in the Files traversed by fileIter. Sub-classes are encouraged to
sub-class BaseFileIterator for the return value of this method.
This method implements or overrides corpusIterator in class DirectoryCorpusReader<Document>.

Copyright © 2012. All Rights Reserved.