public class UsenetCorpusReader extends DirectoryCorpusReader<Document>
A DirectoryCorpusReader for the Usenet corpus provided by the Westbury Lab. The corpus filenames are expected to remain unchanged from how they were specified, i.e. to keep the numeric timestamp naming convention.
Modifier and Type | Class and Description |
---|---|
class | UsenetCorpusReader.InnerIterator |
Nested classes inherited from class DirectoryCorpusReader: DirectoryCorpusReader.BaseFileIterator
Constructor and Description |
---|
UsenetCorpusReader() Creates a reader for USENET documents. |
UsenetCorpusReader(boolean includeTimestamps) Creates a reader for USENET documents that, if includeTimestamps is true, prepends the creation date of each document as the first token in the document. |
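The effect of includeTimestamps can be pictured with a small self-contained sketch. It does not call the reader itself: TimestampSketch and tokens are hypothetical names, the date string "20050101" is a made-up value, and whitespace tokenization is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TimestampSketch {

    /**
     * Splits a document's text into whitespace-delimited tokens and, when
     * includeTimestamps is true, places the creation date first.
     * (Hypothetical helper; not part of UsenetCorpusReader.)
     */
    static List<String> tokens(String creationDate, String text,
                               boolean includeTimestamps) {
        List<String> result = new ArrayList<>();
        if (includeTimestamps) {
            // The creation date becomes the first token of the document.
            result.add(creationDate);
        }
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            result.addAll(Arrays.asList(trimmed.split("\\s+")));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [20050101, hello, usenet, world]
        System.out.println(tokens("20050101", "hello usenet world", true));
        // prints [hello, usenet, world]
        System.out.println(tokens("20050101", "hello usenet world", false));
    }
}
```

With the flag off, the token sequence is just the document text, so downstream consumers only need to special-case the first token when timestamps were requested.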
Modifier and Type | Method and Description |
---|---|
protected Iterator<Document> | corpusIterator(Iterator<File> fileIter) |
Methods inherited from class DirectoryCorpusReader: initialize, read, read
public UsenetCorpusReader()
public UsenetCorpusReader(boolean includeTimestamps)
Creates a reader for USENET documents that, if includeTimestamps is true, prepends the creation date of each document as the first token in the document.
protected Iterator<Document> corpusIterator(Iterator<File> fileIter)
Description copied from class: DirectoryCorpusReader
Returns an Iterator over the documents contained in the Files traversed by fileIter. Sub-classes are encouraged to sub-class BaseFileIterator for the return value of this method.
corpusIterator in class DirectoryCorpusReader<Document>
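The contract of corpusIterator, one iterator over every document in every file, is a flattening iterator. The sketch below shows that pattern in isolation and is not the library's implementation: FlatteningIteratorSketch, FlatteningIterator, and collect are hypothetical names, each "file" is modeled as an in-memory list of document strings, and BaseFileIterator is not used because its API is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

public class FlatteningIteratorSketch {

    /** Iterates over every document produced by each source, in order. */
    static final class FlatteningIterator<S, D> implements Iterator<D> {
        private final Iterator<S> sources;
        private final Function<S, Iterator<D>> opener;
        private Iterator<D> current = Collections.emptyIterator();

        FlatteningIterator(Iterator<S> sources,
                           Function<S, Iterator<D>> opener) {
            this.sources = sources;
            this.opener = opener;
        }

        @Override
        public boolean hasNext() {
            // Advance past exhausted or empty sources until a document
            // is available or no sources remain.
            while (!current.hasNext() && sources.hasNext()) {
                current = opener.apply(sources.next());
            }
            return current.hasNext();
        }

        @Override
        public D next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    /** Drains a flattening iterator into a list (hypothetical helper). */
    static <S, D> List<D> collect(Iterator<S> sources,
                                  Function<S, Iterator<D>> opener) {
        List<D> out = new ArrayList<>();
        Iterator<D> it = new FlatteningIterator<S, D>(sources, opener);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Three stand-in "files"; the second one contains no documents.
        List<List<String>> files = Arrays.asList(
            Arrays.asList("doc 1", "doc 2"),
            Collections.<String>emptyList(),
            Arrays.asList("doc 3"));
        // prints [doc 1, doc 2, doc 3]
        System.out.println(
            FlatteningIteratorSketch.<List<String>, String>collect(
                files.iterator(), List::iterator));
    }
}
```

Doing the source-skipping inside hasNext keeps next trivial and means empty files never surface to the caller as a premature end of the corpus.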
Copyright © 2012. All Rights Reserved.