gov.llnl.ontology.text.corpora
Class SenseEval2007DocumentReader
java.lang.Object
gov.llnl.ontology.text.corpora.SenseEval2007DocumentReader
- All Implemented Interfaces:
- DocumentReader
public class SenseEval2007DocumentReader
- extends Object
- implements DocumentReader
A DocumentReader for the SenseEval 2007 corpus. It automatically removes the head tags from each document. The instance name is used as the key, and the title is the key term. The id is the token index of the word that matches the title when both are stemmed. No labels are generated for a document.
This class is not thread safe.
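The head-tag removal and stemmed token lookup described above can be illustrated with a small standalone sketch. This is not the class's actual implementation: the class name HeadTagSketch, its methods, the sample sentence, and the crude suffix-stripping "stemmer" are all hypothetical stand-ins, and the real reader may tokenize and stem differently.

```java
import java.util.Locale;

public class HeadTagSketch {

    /**
     * Hypothetical stand-in for a real stemmer: lowercases the word and
     * strips one of a few common English suffixes.
     */
    public static String crudeStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Removes <head> and </head> markers while keeping the enclosed word. */
    public static String stripHeadTags(String text) {
        return text.replaceAll("</?head>", " ").trim().replaceAll("\\s+", " ");
    }

    /**
     * Returns the index of the first whitespace-delimited token whose stem
     * equals the stem of keyTerm, or -1 if no token matches.
     */
    public static int focusIndex(String cleanedText, String keyTerm) {
        String target = crudeStem(keyTerm);
        String[] tokens = cleanedText.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (crudeStem(tokens[i]).equals(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed input shape: instance text with the focus word in <head> tags.
        String raw = "The teacher <head>explained</head> the rule";
        String cleaned = stripHeadTags(raw);
        System.out.println(cleaned);
        System.out.println(focusIndex(cleaned, "explain"));
    }
}
```

In this sketch the key term "explain" and the token "explained" reduce to the same stem, so the lookup finds the focus word even though the surface forms differ. That is the reason the description compares stemmed forms rather than raw strings.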
- Author:
- Keith Stevens
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CORPUS_NAME
public static final String CORPUS_NAME
- See Also:
- Constant Field Values
SenseEval2007DocumentReader
public SenseEval2007DocumentReader()
corpusName
public String corpusName()
- Returns CORPUS_NAME.
readDocument
public Document readDocument(String doc)
- Returns a Document represented by the given string.
- Specified by:
- readDocument in interface DocumentReader
readDocument
public Document readDocument(String doc, String corpusName)
- Returns a Document represented by the given string, using corpusName as the corpus name for the returned Document.
- Specified by:
- readDocument in interface DocumentReader
Copyright © 2010-2011. All Rights Reserved.