public abstract class DirectoryCorpusReader<D extends Document> extends Object implements CorpusReader<D>
initialize(String) treats its argument as a directory name, not as the name of a
text file to be processed. This CorpusReader uses that directory as the root of a
nested directory structure. This class also does not implement initialize(Reader).

| Modifier and Type | Class and Description |
|---|---|
| class | DirectoryCorpusReader.BaseFileIterator |
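
The class description treats its argument as the root of a nested directory structure. This page does not show how BaseFileIterator is implemented, so the following is only a minimal sketch of that kind of traversal. NestedFileIterator is a hypothetical name, not part of this API, and the visiting order is whatever File.listFiles() happens to return.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical helper, not part of this API: lazily walks a directory tree. */
class NestedFileIterator implements Iterator<File> {
    // Directories and files still waiting to be visited.
    private final Deque<File> pending = new ArrayDeque<>();
    // The next regular file to return, or null when the walk is finished.
    private File next;

    NestedFileIterator(File root) {
        if (!root.isDirectory()) {
            throw new IllegalArgumentException(root + " is not a directory");
        }
        pending.push(root);
        advance();
    }

    // Pops entries until a regular file is found, expanding directories on the way.
    private void advance() {
        next = null;
        while (!pending.isEmpty() && next == null) {
            File f = pending.pop();
            if (f.isDirectory()) {
                File[] children = f.listFiles(); // null if the directory is unreadable
                if (children != null) {
                    for (File child : children) {
                        pending.push(child);
                    }
                }
            } else if (f.isFile()) {
                next = f;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public File next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        File result = next;
        advance();
        return result;
    }
}
```

Because files are discovered one at a time, a large corpus directory does not need to be listed in full before the first document is processed.
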
| Constructor and Description |
|---|
| DirectoryCorpusReader(): Constructs a new DirectoryCorpusReader that uses no DocumentPreprocessor. |
| DirectoryCorpusReader(DocumentPreprocessor processor): Constructs a new DirectoryCorpusReader that uses processor to pre-process any raw text extracted from a corpus file. |
| Modifier and Type | Method and Description |
|---|---|
| protected abstract Iterator<D> | corpusIterator(Iterator<File> fileIter) |
| void | initialize(Reader baseReader): Unsupported. |
| Iterator<D> | read(File dir): Initializes the DirectoryCorpusReader to start processing all files accessible under the directory specified by dir. |
| Iterator<D> | read(Reader reader): Unsupported. |
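
The method summary suggests that a subclass supplies corpusIterator(Iterator<File>) and that the Reader-based methods are unsupported. The sketch below illustrates that shape with stand-in types so it compiles on its own. Document, DirectoryCorpusReaderSketch, FileNameDocument, and FileNameCorpusReader are hypothetical names, and the way read(File) gathers files and delegates to corpusIterator is an assumption, not the library's confirmed implementation. Only initialize(Reader) is documented as throwing UnsupportedOperationException; the same behaviour for read(Reader) is assumed here.

```java
import java.io.File;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for the library's Document type (assumption: details omitted).
interface Document { }

// Stand-in that mirrors the method summary; not the real DirectoryCorpusReader.
abstract class DirectoryCorpusReaderSketch<D extends Document> {
    // Subclasses turn a stream of files into a stream of documents.
    protected abstract Iterator<D> corpusIterator(Iterator<File> fileIter);

    // Assumed behaviour: collect every file under dir, then delegate.
    public Iterator<D> read(File dir) {
        List<File> files = new ArrayList<>();
        collect(dir, files);
        return corpusIterator(files.iterator());
    }

    // Assumed to behave like initialize(Reader); the page only says "Unsupported".
    public Iterator<D> read(Reader reader) {
        throw new UnsupportedOperationException("read(Reader) is not supported");
    }

    // Documented as throwing UnsupportedOperationException when called.
    public void initialize(Reader baseReader) {
        throw new UnsupportedOperationException("initialize(Reader) is not supported");
    }

    // Recursively adds regular files beneath f to out.
    private static void collect(File f, List<File> out) {
        File[] children = f.listFiles(); // null for regular files or unreadable dirs
        if (children == null) {
            if (f.isFile()) {
                out.add(f);
            }
            return;
        }
        for (File child : children) {
            collect(child, out);
        }
    }
}

// Hypothetical document that records only the name of its source file.
final class FileNameDocument implements Document {
    final String name;

    FileNameDocument(File f) {
        this.name = f.getName();
    }
}

// Hypothetical subclass: one document per file found under the root directory.
class FileNameCorpusReader extends DirectoryCorpusReaderSketch<FileNameDocument> {
    @Override
    protected Iterator<FileNameDocument> corpusIterator(final Iterator<File> fileIter) {
        return new Iterator<FileNameDocument>() {
            @Override
            public boolean hasNext() {
                return fileIter.hasNext();
            }

            @Override
            public FileNameDocument next() {
                return new FileNameDocument(fileIter.next());
            }
        };
    }
}
```

A caller would then write something like `new FileNameCorpusReader().read(new File("corpus-root"))`, where the path is a placeholder, and consume the returned iterator one document at a time.
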

public DirectoryCorpusReader()
Constructs a new DirectoryCorpusReader that uses no DocumentPreprocessor.

public DirectoryCorpusReader(DocumentPreprocessor processor)
Constructs a new DirectoryCorpusReader that uses processor to pre-process any raw text extracted from a corpus file.

public Iterator<D> read(File dir)
Initializes the DirectoryCorpusReader to start processing all files accessible under the directory specified by dir.
Specified by: read in interface CorpusReader<D extends Document>
Parameters: dir - A directory path containing a large directory structure that contains numerous text files that can be processed by a subclass of DirectoryCorpusReader.

public Iterator<D> read(Reader reader)
Unsupported.
Specified by: read in interface CorpusReader<D extends Document>
Parameters: reader - A Reader that will extract text from a data source, such as a URL, a File, a data stream, or any other source accessible via the Reader interface. Each CorpusReader should specify the expected text format, be it an XML schema or some other unique format.

public void initialize(Reader baseReader)
Unsupported.
Throws: UnsupportedOperationException - when called

Copyright © 2012. All Rights Reserved.