public class UsenetCorpusReader extends DirectoryCorpusReader<Document>

A DirectoryCorpusReader for the Usenet corpus provided by the Westbury Lab. The corpus filenames are expected to remain unchanged from their original names, i.e. they must keep the numeric timestamp naming convention.

Modifier and Type | Class and Description |
---|---|
class | UsenetCorpusReader.UseNetIterator |

Nested classes/interfaces inherited from class DirectoryCorpusReader: DirectoryCorpusReader.BaseFileIterator

Constructor and Description |
---|
UsenetCorpusReader() |
UsenetCorpusReader(DocumentPreprocessor preprocessor) |

Modifier and Type | Method and Description |
---|---|
protected Iterator<Document> | corpusIterator(Iterator<File> files) |

Methods inherited from class DirectoryCorpusReader: initialize, read, read

public UsenetCorpusReader()
public UsenetCorpusReader(DocumentPreprocessor preprocessor)
protected Iterator<Document> corpusIterator(Iterator<File> files)
Description copied from class: DirectoryCorpusReader
Returns an Iterator over the documents contained in the Files traversed by fileIter (here, the files argument). Sub-classes are encouraged to sub-class BaseFileIterator for the return value of this method.
This method implements or overrides corpusIterator in class DirectoryCorpusReader<Document>.
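
The shape of the iterator that corpusIterator is expected to return can be sketched in plain Java. This is only an illustration of the pattern (walk an Iterator<File>, emit one document at a time), not the library's UseNetIterator: the class name DelimitedFileIterator, the document delimiter string, the UTF-8 encoding, and the use of String in place of Document are all assumptions made for the example. A real implementation would more likely extend BaseFileIterator, whose API is not shown on this page.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a file-backed document iterator. */
class DelimitedFileIterator implements Iterator<String> {
    // Assumed end-of-document marker; adjust to match the actual corpus files.
    static final String END_OF_DOCUMENT = "---END.OF.DOCUMENT---";

    private final Iterator<File> files;
    private BufferedReader reader; // reader for the current file, or null if none is open
    private String next;           // next document to hand out, or null when exhausted

    DelimitedFileIterator(Iterator<File> files) {
        this.files = files;
        this.next = advance();
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String doc = next;
        next = advance();
        return doc;
    }

    /** Reads lazily up to the next delimiter, moving on to the next file when one ends. */
    private String advance() {
        try {
            while (true) {
                if (reader == null) {
                    if (!files.hasNext()) {
                        return null; // no more files, so no more documents
                    }
                    reader = Files.newBufferedReader(
                            files.next().toPath(), StandardCharsets.UTF_8);
                }
                StringBuilder doc = new StringBuilder();
                String line;
                while ((line = reader.readLine()) != null
                        && !line.contains(END_OF_DOCUMENT)) {
                    doc.append(line).append('\n');
                }
                if (line == null) { // end of the current file
                    reader.close();
                    reader = null;
                }
                String text = doc.toString().trim();
                if (!text.isEmpty()) {
                    return text; // skip empty documents, return the first non-empty one
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading one document ahead keeps hasNext() cheap and free of side effects, and only one file is open at a time, which matters when the corpus files are large.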
Copyright © 2012. All Rights Reserved.