gov.llnl.ontology.text.corpora
Class SenseEval2007DocumentReader

java.lang.Object
  extended by gov.llnl.ontology.text.corpora.SenseEval2007DocumentReader
All Implemented Interfaces:
DocumentReader

public class SenseEval2007DocumentReader
extends Object
implements DocumentReader

A DocumentReader for the SenseEval 2007 corpus. The reader automatically removes the head tags from each document. The instance name is used as the key, and the title is the key term alone. The id is the token index of the word that matches the title when both are stemmed. No labels are generated for a document.
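
The head-tag removal and the stemmed token lookup described above can be illustrated with a small standalone sketch. This is not the reader's actual implementation: the class HeadTagSketch, its methods, and the crude suffix-stripping stemmer are all hypothetical, and the sketch assumes head tags of the plain form <head>...</head> with whitespace-delimited tokens.

```java
import java.util.Locale;

public class HeadTagSketch {

    // Hypothetical, very crude stemmer used only for this illustration;
    // the stemmer used by the real reader is not specified in this page.
    public static String naiveStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (w.endsWith(suffix) && w.length() > suffix.length() + 2) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Removes <head> and </head> markers while keeping the enclosed word.
    public static String stripHeadTags(String text) {
        return text.replaceAll("</?head>", "");
    }

    // Returns the index of the first whitespace-delimited token whose stem
    // equals the stem of keyTerm, or -1 if no token matches.
    public static int focusIndex(String text, String keyTerm) {
        String[] tokens = stripHeadTags(text).trim().split("\\s+");
        String target = naiveStem(keyTerm);
        for (int i = 0; i < tokens.length; i++) {
            if (naiveStem(tokens[i]).equals(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String context = "The company <head>explained</head> its decision";
        System.out.println(stripHeadTags(context));
        System.out.println(focusIndex(context, "explain"));
    }
}
```

Matching on stems rather than surface forms lets an inflected occurrence such as "explained" still be located from the key term "explain".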

This class is not thread safe.
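
Because the reader is not thread safe, one common approach is to give each worker thread its own instance instead of sharing one. The sketch below shows that pattern with ThreadLocal. StandInReader is a hypothetical placeholder with made-up internal state, used so the example compiles on its own; in real code, a SenseEval2007DocumentReader would presumably take its place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadReaderSketch {

    // Hypothetical stand-in for a reader that keeps mutable state between
    // calls, which is what makes sharing one instance across threads unsafe.
    static class StandInReader {
        private final StringBuilder scratch = new StringBuilder();

        String read(String doc) {
            scratch.setLength(0);
            scratch.append(doc.trim());
            return scratch.toString();
        }
    }

    // One reader per thread, created lazily on first use by that thread.
    private static final ThreadLocal<StandInReader> READER =
            ThreadLocal.withInitial(StandInReader::new);

    // Reads every document on a small pool; results keep the input order.
    public static List<String> readAll(List<String> docs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String doc : docs) {
                pending.add(pool.submit(() -> READER.get().read(doc)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("reading failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll(Arrays.asList(" first ", " second ")));
    }
}
```

Per-thread instances avoid locking around every call, at the cost of one reader object per worker thread.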

Author:
Keith Stevens

Field Summary
static String CORPUS_NAME
           
 
Constructor Summary
SenseEval2007DocumentReader()
           
 
Method Summary
 String corpusName()
          Returns CORPUS_NAME.
 Document readDocument(String doc)
          Returns a Document represented by the given string.
 Document readDocument(String doc, String corpusName)
          Returns a Document represented by the given string and uses corpusName as the corpus name for the returned Document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CORPUS_NAME

public static final String CORPUS_NAME
See Also:
Constant Field Values
Constructor Detail

SenseEval2007DocumentReader

public SenseEval2007DocumentReader()
Method Detail

corpusName

public String corpusName()
Returns CORPUS_NAME.


readDocument

public Document readDocument(String doc)
Returns a Document represented by the given string.

Specified by:
readDocument in interface DocumentReader

readDocument

public Document readDocument(String doc,
                             String corpusName)
Returns a Document represented by the given string and uses corpusName as the corpus name for the returned Document.

Specified by:
readDocument in interface DocumentReader


Copyright © 2010-2011. All Rights Reserved.