gov.llnl.ontology.text.corpora
Class WackypediaDocumentReader

java.lang.Object
  extended by gov.llnl.ontology.text.corpora.UkWacDocumentReader
      extended by gov.llnl.ontology.text.corpora.WackypediaDocumentReader
All Implemented Interfaces:
DocumentReader

public class WackypediaDocumentReader
extends UkWacDocumentReader

A DocumentReader for the parsed WaCkypedia corpus. The corpus is expected to use the default WaCky XML formatting, with sentences in the CoNLL format. This DocumentReader discards all non-token information. The URL in the id attribute of each text element is used as the document's key and text, and its hash value is used as the document's id.
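The parsing rules in the paragraph above can be sketched as follows. This is a minimal, hypothetical illustration, not the library's readDocument implementation. It assumes that each document arrives as one <text id="..."> ... </text> block, that lines starting with '<' are markup, that every other line is a tab-separated CoNLL row whose first column is the surface token, and that the id is Java's String.hashCode() of the URL. The names WackyTextBlockSketch, parse, and ParsedDoc are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT the library's implementation: turns one WaCky-style
// <text id="..."> block with CoNLL rows into (key, id, token text).
public class WackyTextBlockSketch {

    // Assumption: the header looks like <text id="URL" ...>.
    private static final Pattern TEXT_ID =
            Pattern.compile("<text[^>]*\\bid=\"([^\"]*)\"");

    /** Result holder: key = URL, id = hash of the URL, text = tokens only. */
    public static final class ParsedDoc {
        public final String key;
        public final int id;
        public final String text;

        ParsedDoc(String key, int id, String text) {
            this.key = key;
            this.id = id;
            this.text = text;
        }
    }

    public static ParsedDoc parse(String block) {
        String key = null;
        List<String> tokens = new ArrayList<>();
        for (String line : block.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (trimmed.startsWith("<")) {
                // Markup (<text>, <s>, </s>, ...): keep only the id attribute.
                Matcher m = TEXT_ID.matcher(trimmed);
                if (m.find()) {
                    key = m.group(1);
                }
                continue;
            }
            // Assumption: the first tab-separated column is the surface token;
            // all remaining columns (lemma, POS tag, etc.) are discarded.
            tokens.add(trimmed.split("\t")[0]);
        }
        if (key == null) {
            throw new IllegalArgumentException("missing <text id=\"...\"> header");
        }
        // Assumption: String.hashCode() stands in for the document id hash.
        return new ParsedDoc(key, key.hashCode(), String.join(" ", tokens));
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "<text id=\"http://en.wikipedia.org/wiki/Example\">",
                "<s>",
                "The\tthe\tDT\t1\t2\tNMOD",
                "example\texample\tNN\t2\t0\tROOT",
                "</s>",
                "</text>");
        ParsedDoc doc = parse(sample);
        System.out.println(doc.key);   // prints http://en.wikipedia.org/wiki/Example
        System.out.println(doc.text);  // prints The example
    }
}
```

Keeping only the token column mirrors the statement that non-token information is discarded; a real reader would also need to handle multiple text blocks per file, which this sketch leaves out.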

This is not thread safe.

Author:
Keith Stevens

Field Summary
static String CORPUS_NAME
           
 
Constructor Summary
WackypediaDocumentReader()
           
 
Method Summary
 String corpusName()
          Returns CORPUS_NAME
 
Methods inherited from class gov.llnl.ontology.text.corpora.UkWacDocumentReader
readDocument, readDocument
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CORPUS_NAME

public static final String CORPUS_NAME
See Also:
Constant Field Values
Constructor Detail

WackypediaDocumentReader

public WackypediaDocumentReader()
Method Detail

corpusName

public String corpusName()
Returns CORPUS_NAME

Overrides:
corpusName in class UkWacDocumentReader
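The class hierarchy and the Overrides note indicate that this reader inherits both readDocument methods from UkWacDocumentReader and changes only corpusName(). The hypothetical sketch below mirrors that pattern with stand-in classes. UkWacLikeReader, WackypediaLikeReader, and both constant values are placeholders; the real CORPUS_NAME value is listed only on the Constant Field Values page.

```java
// Hypothetical sketch of the override pattern; names and values are placeholders,
// not the real library classes or constants.
public class CorpusNameOverrideSketch {

    /** Stand-in for UkWacDocumentReader (hypothetical). */
    public static class UkWacLikeReader {
        public static final String CORPUS_NAME = "ukwac-placeholder";

        public String corpusName() {
            return CORPUS_NAME; // bound to UkWacLikeReader.CORPUS_NAME
        }
        // The readDocument(...) overloads would live here and be inherited unchanged.
    }

    /** Stand-in for WackypediaDocumentReader (hypothetical). */
    public static class WackypediaLikeReader extends UkWacLikeReader {
        // Static fields are hidden, not overridden: this shadows the parent's constant.
        public static final String CORPUS_NAME = "wackypedia-placeholder";

        @Override
        public String corpusName() {
            return CORPUS_NAME; // resolves to WackypediaLikeReader.CORPUS_NAME
        }
    }

    public static void main(String[] args) {
        UkWacLikeReader reader = new WackypediaLikeReader();
        // Dynamic dispatch selects the subclass override through a parent-typed reference.
        System.out.println(reader.corpusName()); // prints wackypedia-placeholder
    }
}
```

In this sketch the parent's corpusName() is bound to the parent's CORPUS_NAME, so declaring a new constant in the subclass would not by itself change what corpusName() reports; overriding the method is what makes the subclass's name visible.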


Copyright © 2010-2011. All Rights Reserved.