SemEvalLexSubReader (S-Space Package 2.0.1 API)

java.lang.Object
- org.xml.sax.helpers.DefaultHandler
- - edu.ucla.sspace.text.corpora.SemEvalLexSubReader

All Implemented Interfaces:

CorpusReader<Document>, ContentHandler, DTDHandler, EntityResolver, ErrorHandler
```
public class SemEvalLexSubReader
extends org.xml.sax.helpers.DefaultHandler
implements CorpusReader<Document>
```
Reads the XML corpus files for the SemEval 2010 Lexical Substitution task. Each file contains all of the contexts for a single word. The XML files should be unchanged from their original format.
This CorpusReader returns documents in the following format:
`word_instance_id text ... ||| *focus_word* text ...`
Note that this is implemented as a DefaultHandler for a SAXParser.
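To illustrate how a SAX `DefaultHandler` can turn context elements into the document format above, here is a minimal sketch that uses only JDK classes. It is not the S-Space implementation: the class name `LexSubSaxSketch`, the element names (`instance`, `context`, `head`), the `id` attribute, and the exact placement of the `|||` separator are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the S-Space source: element and attribute
// names below are assumed, not taken from the corpus specification.
public class LexSubSaxSketch extends DefaultHandler {
    private final List<String> documents = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String instanceId;
    private boolean inContext;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if (qName.equals("instance")) {
            // Remember the id so it can prefix the emitted document.
            instanceId = atts.getValue("id");
        } else if (qName.equals("context")) {
            inContext = true;
            text.setLength(0);
        } else if (qName.equals("head") && inContext) {
            // Mark where the focus word begins.
            text.append("||| *");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver text in several chunks; append each one.
        if (inContext) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("head") && inContext) {
            text.append("*");
        } else if (qName.equals("context")) {
            inContext = false;
            documents.add(instanceId + " " + text.toString().trim());
        }
    }

    // Parses an XML string and returns one formatted line per context.
    public static List<String> parse(String xml) throws Exception {
        LexSubSaxSketch handler = new LexSubSaxSketch();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.documents;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<corpus><lexelt item=\"bright.a\"><instance id=\"1\">"
            + "<context>The sun was <head>bright</head> today.</context>"
            + "</instance></lexelt></corpus>";
        for (String doc : parse(xml)) {
            System.out.println(doc); // 1 The sun was ||| *bright* today.
        }
    }
}
```

The handler accumulates character data between the start and end of each context, which is the usual pattern for SAX because `characters` can be called more than once per text node.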

Author:

Keith Stevens

Nested Class Summary

Nested Classes
Modifier and Type Class and Description

class SemEvalLexSubReader.SemEvalHandler

Constructor Summary

Constructors
Constructor and Description

SemEvalLexSubReader()

Method Summary

Methods
Modifier and Type	Method and Description
`Iterator<Document>`	`read(File file)` Returns an `Iterator` that traverses the documents contained in the given `file`.
`Iterator<Document>`	`read(Reader reader)` Returns an `Iterator` that traverses the documents contained in the given `reader`.

Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - SemEvalLexSubReader
```
public SemEvalLexSubReader()
```
- Method Detail
  - read
```
public Iterator<Document> read(Reader reader)
```
    Returns an Iterator that traverses the documents contained in the given reader.
    
    Specified by:
    
    read in interface CorpusReader<Document>
    
    Parameters:
    reader - A Reader that will extract text from a data source, such as a URL, a File, a data stream, or any other source accessible via the Reader interface. Each CorpusReader should specify the expected text format, be it an XML schema or some other unique format.
  - read
```
public Iterator<Document> read(File file)
```
    Returns an Iterator that traverses the documents contained in the given file.
    
    Specified by:
    
    read in interface CorpusReader<Document>
    
    Parameters:
    file - A text file holding documents in a format that is readable by a particular CorpusReader. This text file may have its own unique text structure or an XML format. Each CorpusReader should specify the expected text format.
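
The two `read` overloads differ only in how the input is supplied. As a rough illustration of what "any source accessible via the Reader interface" can mean in practice, the following JDK-only sketch exposes an in-memory string and a file as `Reader` objects. It does not call any S-Space classes; the class name `ReaderSourcesSketch` and its helper methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helpers showing two ways to obtain a Reader.
public class ReaderSourcesSketch {

    // Wraps in-memory XML text as a Reader.
    static Reader fromString(String xml) {
        return new StringReader(xml);
    }

    // Opens a file as a Reader; UTF-8 is an assumption about the corpus encoding.
    static Reader fromFile(File file) throws IOException {
        return Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8);
    }

    // Reads everything from the Reader and closes it.
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<corpus></corpus>";
        File tmp = File.createTempFile("lexsub", ".xml");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), xml.getBytes(StandardCharsets.UTF_8));

        // Both sources yield the same characters once wrapped as a Reader.
        System.out.println(drain(fromString(xml)).equals(drain(fromFile(tmp))));
    }
}
```

A `Reader` obtained either way could then be handed to a method with the `read(Reader reader)` signature, while a `File` can be passed to `read(File file)` directly.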

Copyright © 2012. All Rights Reserved.