public abstract class DirectoryCorpusReader<D extends Document> extends Object implements CorpusReader<D>
initialize(String) treats its argument as a directory name, not as the name
of a text file to be processed. This CorpusReader uses that directory name
as the root directory of the nested directory structure. This class also
does not implement initialize(Reader).

Modifier and Type | Class and Description |
---|---|
class |
DirectoryCorpusReader.BaseFileIterator |
Constructor and Description |
---|
DirectoryCorpusReader()
Constructs a new
DirectoryCorpusReader that uses no DocumentPreprocessor. |
DirectoryCorpusReader(DocumentPreprocessor processor)
Constructs a new
DirectoryCorpusReader that uses processor to pre-process any raw text extracted from a corpus file. |
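
The two constructors differ only in whether raw text is pre-processed before it is turned into a document. The following is a minimal sketch of that optional step. It does not use the real DocumentPreprocessor type, whose methods are not shown on this page: `PreprocessSketch` and `clean` are hypothetical names, a `UnaryOperator<String>` stands in for the preprocessor, and representing "no preprocessor" as `null` is an assumption.

```java
import java.util.function.UnaryOperator;

/** Hypothetical stand-in; not part of the documented API. */
class PreprocessSketch {

    // Stand-in for a DocumentPreprocessor; null is assumed to mean "none".
    private final UnaryOperator<String> processor;

    /** Mirrors the no-argument constructor: raw text is left untouched. */
    PreprocessSketch() {
        this(null);
    }

    /** Mirrors the one-argument constructor: raw text passes through processor. */
    PreprocessSketch(UnaryOperator<String> processor) {
        this.processor = processor;
    }

    /** Returns the text a subclass would wrap in a document. */
    String clean(String rawText) {
        return processor == null ? rawText : processor.apply(rawText);
    }
}
```

A subclass built this way can call `clean` on each file's contents without checking which constructor was used.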
Modifier and Type | Method and Description |
---|---|
protected abstract Iterator<D> |
corpusIterator(Iterator<File> fileIter)
|
void |
initialize(Reader baseReader)
Unsupported.
|
Iterator<D> |
read(File dir)
Initializes the
DirectoryCorpusReader to start processing all
files accessible under the directory specified by dir. |
Iterator<D> |
read(Reader reader)
Unsupported.
|
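
To make the relationship between read(File) and corpusIterator(Iterator<File>) concrete, here is a rough sketch of the two steps involved: gathering every file under a root directory, then adapting the resulting file iterator into an iterator over content. It uses only the JDK. `DirectoryWalkSketch`, `filesUnder`, and `textIterator` are hypothetical names, plain `String` stands in for the document type D, and UTF-8 is an assumed encoding. Nothing here is taken from the library's own implementation or from BaseFileIterator.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper; not part of the documented API. */
class DirectoryWalkSketch {

    /** Collects every regular file under root, descending into subdirectories. */
    static List<File> filesUnder(File root) {
        List<File> found = new ArrayList<>();
        File[] children = root.listFiles();
        if (children == null) {
            return found; // root is not a directory, or cannot be read
        }
        Arrays.sort(children); // stable ordering for the example
        for (File child : children) {
            if (child.isDirectory()) {
                found.addAll(filesUnder(child));
            } else if (child.isFile()) {
                found.add(child);
            }
        }
        return found;
    }

    /** Adapts a file iterator into an iterator over each file's text (UTF-8 assumed). */
    static Iterator<String> textIterator(Iterator<File> fileIter) {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return fileIter.hasNext();
            }

            @Override
            public String next() {
                File f = fileIter.next();
                try {
                    return new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
                } catch (IOException e) {
                    // Iterator.next() cannot throw checked exceptions.
                    throw new UncheckedIOException(e);
                }
            }
        };
    }
}
```

A real subclass would return documents of type D rather than strings, and might split one file into several documents; the adapter shape stays the same.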
public DirectoryCorpusReader()
Constructs a new DirectoryCorpusReader that uses no DocumentPreprocessor.

public DirectoryCorpusReader(DocumentPreprocessor processor)
Constructs a new DirectoryCorpusReader that uses processor to pre-process any raw text extracted from a corpus file.

public Iterator<D> read(File dir)
Initializes the DirectoryCorpusReader to start processing all files accessible under the directory specified by dir.
Specified by: read in interface CorpusReader<D extends Document>
Parameters: dir - A directory path containing a large directory structure that contains numerous text files that can be processed by a subclass of DirectoryCorpusReader.

public Iterator<D> read(Reader reader)
Specified by: read in interface CorpusReader<D extends Document>
Parameters: reader - A Reader that will extract text from a data source, such as a URL, a File, a data stream, or any other source accessible via the Reader interface. Each CorpusReader should specify the expected text format, be it an XML schema or some other unique format.

public void initialize(Reader baseReader)
Throws: UnsupportedOperationException - when called

Copyright © 2012. All Rights Reserved.