gov.llnl.ontology.text.corpora
Class WackypediaDocumentReader
java.lang.Object
gov.llnl.ontology.text.corpora.UkWacDocumentReader
gov.llnl.ontology.text.corpora.WackypediaDocumentReader
- All Implemented Interfaces:
- DocumentReader
public class WackypediaDocumentReader
- extends UkWacDocumentReader
A DocumentReader
for the parsed Wackypedia corpus. The Wacky corpus
should use the default XML formatting with the CoNLL sentence format. This
DocumentReader
discards all non-token information. The URL in
the id
attribute of text
is the key and text, and its hash
value is the id for each document.
This class is not thread safe.
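The behavior described above can be illustrated with a minimal standalone sketch. It is not this class's implementation and does not use the library: the class name WackySketch, its two helper methods, and the sample document are hypothetical, and the input layout (a text element with an id attribute, sentence markup on its own lines, and tab-separated CoNLL columns with the token first) is an assumption about the corpus format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of gov.llnl.ontology.
public class WackySketch {

    // Matches the id attribute on an opening <text> tag.
    private static final Pattern TEXT_ID =
        Pattern.compile("<text\\s+id=\"([^\"]*)\"");

    /** Returns the value of the id attribute of text, or null if absent. */
    public static String extractKey(String doc) {
        Matcher m = TEXT_ID.matcher(doc);
        return m.find() ? m.group(1) : null;
    }

    /**
     * Keeps only the first column (assumed to be the token) of each
     * CoNLL line and drops everything else. Lines starting with '<' are
     * treated as markup, so a literal "<" token would also be skipped
     * in this simplified sketch.
     */
    public static List<String> extractTokens(String doc) {
        List<String> tokens = new ArrayList<>();
        for (String line : doc.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("<")) {
                continue;
            }
            tokens.add(trimmed.split("\\s+")[0]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up sample document in the assumed format.
        String doc = String.join("\n",
            "<text id=\"http://example.org/a\">",
            "<s>",
            "The\tthe\tDT\t1\t2\tNMOD",
            "cat\tcat\tNN\t2\t0\tROOT",
            "</s>",
            "</text>");

        String key = extractKey(doc);
        // String.hashCode() stands in for "its hash value"; the hash
        // actually used by the reader is not stated in this page.
        long id = key.hashCode();
        System.out.println("key=" + key + " id=" + id
            + " tokens=" + extractTokens(doc));
    }
}
```

The sketch keeps no state between calls, so unlike the reader documented here it has no thread-safety concern; a real reader that reuses internal buffers would need one instance per thread.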
- Author:
- Keith Stevens
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CORPUS_NAME
public static final String CORPUS_NAME
- See Also:
- Constant Field Values
WackypediaDocumentReader
public WackypediaDocumentReader()
corpusName
public String corpusName()
- Returns
CORPUS_NAME
- Overrides:
corpusName
in class UkWacDocumentReader
Copyright © 2010-2011. All Rights Reserved.