gov.llnl.ontology.text.corpora
Class PubMedDocumentReader
java.lang.Object
org.xml.sax.helpers.DefaultHandler
gov.llnl.ontology.text.corpora.PubMedDocumentReader
- All Implemented Interfaces:
- DocumentReader, ContentHandler, DTDHandler, EntityResolver, ErrorHandler
public class PubMedDocumentReader
- extends DefaultHandler
- implements DocumentReader
A DocumentReader for the PubMed corpus. PubMed is formatted as a
series of documents in a single XML file. This DocumentReader works
as a DefaultHandler for the SAXParser and reads one full
document per call to readDocument(java.lang.String, java.lang.String). Text in NameOfSubstance
tags supplies the document labels, text in ArticleTitle is the title, text
in PMID serves as the id and key value, and text in Abstract
is the raw document text.
This class is not thread safe.
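The tag-to-field mapping described above can be illustrated with a minimal, standalone SAX handler. This is only a sketch under stated assumptions, not the source of PubMedDocumentReader: the class name PubMedSaxSketch, its public fields, and the static parse helper are hypothetical, and it assumes the parser is not namespace-aware, so tag names arrive in the qualified-name argument.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of gov.llnl.ontology.
final class PubMedSaxSketch extends DefaultHandler {
    // Values collected while parsing one document.
    public String id;
    public String title;
    public String text;
    public final List<String> labels = new ArrayList<>();

    // Text of the element currently being read.
    private final StringBuilder buffer = new StringBuilder();
    // Text of everything nested inside <Abstract>.
    private final StringBuilder abstractText = new StringBuilder();
    private boolean inAbstract = false;

    @Override
    public void startElement(String uri, String localName, String name,
                             Attributes atts) {
        buffer.setLength(0);
        if ("Abstract".equals(name)) {
            inAbstract = true;
            abstractText.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append.
        buffer.append(ch, start, length);
        if (inAbstract) {
            abstractText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String name) {
        String value = buffer.toString().trim();
        if ("PMID".equals(name) && id == null) {
            id = value;                  // first PMID is the id and key
        } else if ("ArticleTitle".equals(name)) {
            title = value;
        } else if ("NameOfSubstance".equals(name)) {
            labels.add(value);           // one label per substance tag
        } else if ("Abstract".equals(name)) {
            text = abstractText.toString().trim();
            inAbstract = false;
        }
        buffer.setLength(0);
    }

    // Parses one document from a string using a fresh handler.
    static PubMedSaxSketch parse(String xml) {
        try {
            PubMedSaxSketch handler = new PubMedSaxSketch();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }
}
```

Because the handler keeps its parse state in instance fields, a single instance should not be shared between threads. The sketch sidesteps this by creating a new handler for every call to parse.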
- Author:
- Keith Stevens
- Methods inherited from class org.xml.sax.helpers.DefaultHandler:
- endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
- Methods inherited from class java.lang.Object:
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PubMedDocumentReader
public PubMedDocumentReader()
- Creates a new PubMedDocumentReader.
readDocument
public Document readDocument(String originalText,
String corpusName)
- Returns the Document represented by the given string, using
corpusName as the corpus name of the returned Document.
- Specified by:
readDocument in interface DocumentReader
readDocument
public Document readDocument(String originalText)
- Returns the Document represented by the given string.
- Specified by:
readDocument in interface DocumentReader
startElement
public void startElement(String uri,
String localName,
String name,
Attributes atts)
throws SAXException
- Specified by:
startElement in interface ContentHandler
- Overrides:
startElement in class DefaultHandler
- Throws:
SAXException
characters
public void characters(char[] ch,
int start,
int length)
throws SAXException
- Specified by:
characters in interface ContentHandler
- Overrides:
characters in class DefaultHandler
- Throws:
SAXException
endElement
public void endElement(String uri,
String localName,
String name)
throws SAXException
- Specified by:
endElement in interface ContentHandler
- Overrides:
endElement in class DefaultHandler
- Throws:
SAXException
Copyright © 2010-2011. All Rights Reserved.