gov.llnl.ontology.text.corpora
Class SenseEval2007DocumentReader
java.lang.Object
gov.llnl.ontology.text.corpora.SenseEval2007DocumentReader
- All Implemented Interfaces:
- DocumentReader
public class SenseEval2007DocumentReader
- extends Object
- implements DocumentReader
A DocumentReader for the SenseEval 2007 corpus. This reader automatically
removes the head tags from the document. It uses the instance name as the
key and the key term as the title. The id is the token index of the word
that matches the title when both are stemmed. It does not generate any
labels for a document.
This class is not thread safe.
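The head-tag removal and stemmed matching described above can be illustrated with a minimal sketch. This is not the library's implementation: the class HeadTagSketch, its methods, the toy suffix-stripping stemmer, and the sample context string are all hypothetical, and the sketch assumes that the focus word is wrapped in <head>...</head> and that the word itself is kept when the tags are removed.

```java
import java.util.Locale;

/**
 * Hypothetical sketch, not the library's code: strips head tags and finds
 * the index of the first token whose stem matches the stemmed key term.
 */
class HeadTagSketch {

    /** Removes <head> and </head> markers but keeps the word between them. */
    static String removeHeadTags(String text) {
        return text.replaceAll("</?head>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Toy stemmer (assumption): lowercases and strips one common suffix. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (w.endsWith(suffix) && w.length() > suffix.length() + 2) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Returns the index of the first token whose stem equals the stemmed key term, or -1. */
    static int focusIndex(String cleanedText, String keyTerm) {
        String target = stem(keyTerm);
        String[] tokens = cleanedText.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (stem(tokens[i]).equals(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical context text; the real corpus format may differ.
        String context = "The teacher <head>explained</head> the rule twice";
        String cleaned = removeHeadTags(context);
        System.out.println(cleaned);                        // The teacher explained the rule twice
        System.out.println(focusIndex(cleaned, "explain")); // 2
    }
}
```

In this sketch, stemming both sides lets the inflected token "explained" match the key term "explain". A real reader would presumably use a proper stemmer rather than this suffix list.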
- Author:
- Keith Stevens
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CORPUS_NAME
public static final String CORPUS_NAME
- See Also:
- Constant Field Values
SenseEval2007DocumentReader
public SenseEval2007DocumentReader()
corpusName
public String corpusName()
- Returns CORPUS_NAME.
readDocument
public Document readDocument(String doc)
- Returns a Document represented by the given string.
- Specified by:
readDocument in interface DocumentReader
readDocument
public Document readDocument(String doc,
String corpusName)
- Returns a Document represented by the given string, using corpusName as the corpus name of the returned Document.
- Specified by:
readDocument in interface DocumentReader
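The relationship between the two readDocument overloads can be sketched with stand-in types. Everything here is hypothetical: SketchDocument and SketchReader are not library classes, the CORPUS_NAME value is a placeholder because this page does not give the real constant, and the delegation from the one-argument overload to the two-argument one is an assumption, not something this page confirms.

```java
/** Hypothetical stand-in for the library's Document type; only two fields are modeled. */
final class SketchDocument {
    final String corpusName;
    final String text;

    SketchDocument(String corpusName, String text) {
        this.corpusName = corpusName;
        this.text = text;
    }
}

/** Hypothetical reader mirroring the two readDocument overloads documented above. */
class SketchReader {

    /** Placeholder value; the real constant's value is not given on this page. */
    static final String CORPUS_NAME = "placeholder-corpus";

    String corpusName() {
        return CORPUS_NAME;
    }

    /** Assumed behavior: falls back to the reader's default corpus name. */
    SketchDocument readDocument(String doc) {
        return readDocument(doc, corpusName());
    }

    /** Uses the caller-supplied corpus name for the returned document. */
    SketchDocument readDocument(String doc, String corpusName) {
        return new SketchDocument(corpusName, doc);
    }

    public static void main(String[] args) {
        SketchReader reader = new SketchReader();
        System.out.println(reader.readDocument("some text").corpusName);
        System.out.println(reader.readDocument("some text", "my-corpus").corpusName);
    }
}
```

Because the class is documented as not thread safe, a caller working with the real reader would likely want one reader instance per thread rather than a shared one.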
Copyright © 2010-2011. All Rights Reserved.