gov.llnl.ontology.text.corpora
Class PubMedDocumentReader
java.lang.Object
org.xml.sax.helpers.DefaultHandler
gov.llnl.ontology.text.corpora.PubMedDocumentReader
- All Implemented Interfaces:
- DocumentReader, ContentHandler, DTDHandler, EntityResolver, ErrorHandler
public class PubMedDocumentReader
- extends DefaultHandler
- implements DocumentReader
A DocumentReader for the PubMed corpus. PubMed is formatted as a series of documents in a single XML file. This DocumentReader works as a DefaultHandler for the SAXParser and reads one full document per call to readDocument(java.lang.String, java.lang.String). Text in NameOfSubstance tags supplies the document labels, text in ArticleTitle is the title, text in PMID serves as the id and key value, and text in Abstract is the raw document text.
This class is not thread safe.
- Author:
- Keith Stevens
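The field mapping described above can be sketched with the JDK's own SAX API. This is a minimal, hypothetical illustration, not the library's implementation: the class PubMedFieldSketch, its fields id, title, text and labels, and the sample record are invented for the example. It assumes, as in PubMed XML, that Abstract may wrap its text in child elements such as AbstractText, so text is buffered from the start of each tracked element until its end; keeping only the first PMID is also an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; not the library's PubMedDocumentReader.
public class PubMedFieldSketch extends DefaultHandler {
    // Elements whose text is captured (taken from the class description).
    private static final Set<String> TRACKED =
            Set.of("PMID", "ArticleTitle", "Abstract", "NameOfSubstance");

    String id;                                     // first PMID seen
    String title;                                  // ArticleTitle text
    String text;                                   // Abstract text
    final List<String> labels = new ArrayList<>(); // NameOfSubstance texts

    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Reset only for tracked elements, so text inside untracked
        // children (e.g. AbstractText) keeps accumulating.
        if (TRACKED.contains(qName)) {
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!TRACKED.contains(qName)) {
            return;
        }
        String value = buffer.toString().trim();
        switch (qName) {
            case "PMID":
                if (id == null) {
                    id = value; // later PMIDs (e.g. in cited articles) are ignored
                }
                break;
            case "ArticleTitle":
                title = value;
                break;
            case "Abstract":
                text = value;
                break;
            default: // NameOfSubstance
                labels.add(value);
                break;
        }
        buffer.setLength(0);
    }

    // Parses one record. qName is used because the default factory is not namespace aware.
    static PubMedFieldSketch parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            PubMedFieldSketch handler = new PubMedFieldSketch();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse record", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<PubmedArticle><MedlineCitation><PMID>12345</PMID>"
                + "<Article><ArticleTitle>A title</ArticleTitle>"
                + "<Abstract><AbstractText>Some text.</AbstractText></Abstract></Article>"
                + "<ChemicalList><Chemical><NameOfSubstance>Aspirin</NameOfSubstance>"
                + "</Chemical></ChemicalList></MedlineCitation></PubmedArticle>";
        PubMedFieldSketch doc = parse(xml);
        System.out.println(doc.id + " | " + doc.title + " | " + doc.text + " | " + doc.labels);
        // prints: 12345 | A title | Some text. | [Aspirin]
    }
}
```

The handler keeps per-document state in instance fields, which is one reason a SAX handler of this kind is not safe to share between threads. Multiple AbstractText sections would be concatenated without a separator in this sketch.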
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PubMedDocumentReader
public PubMedDocumentReader()
- Creates a new PubMedDocumentReader.
readDocument
public Document readDocument(String originalText,
String corpusName)
- Returns a Document represented by the given string and uses corpusName as the corpus name for the returned Document.
- Specified by:
readDocument in interface DocumentReader
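Because a PubMed file holds many records while readDocument(String, String) handles one document per call, a caller has to cut the file into per-record strings first. A rough sketch, assuming each record is a <PubmedArticle> element; the class PubMedRecordSplitter is hypothetical, and a regular expression is a shortcut here, not a real XML parser:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a multi-record PubMed file into one string per record.
public class PubMedRecordSplitter {
    // Assumes each record is a <PubmedArticle> element. \b keeps
    // <PubmedArticleSet> from matching; DOTALL lets a record span lines.
    private static final Pattern RECORD =
            Pattern.compile("<PubmedArticle\\b.*?</PubmedArticle>", Pattern.DOTALL);

    static List<String> split(String fileContents) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(fileContents);
        while (m.find()) {
            records.add(m.group()); // one record's XML, for a single readDocument call
        }
        return records;
    }

    public static void main(String[] args) {
        String file = "<PubmedArticleSet>\n"
                + "<PubmedArticle><PMID>1</PMID></PubmedArticle>\n"
                + "<PubmedArticle><PMID>2</PMID></PubmedArticle>\n"
                + "</PubmedArticleSet>";
        for (String record : split(file)) {
            System.out.println(record);
        }
    }
}
```

Each returned string could then be passed to readDocument(record, corpusName). For very large files, a streaming parser would avoid holding the whole file in memory.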
readDocument
public Document readDocument(String originalText)
- Returns a Document represented by the given string.
- Specified by:
readDocument in interface DocumentReader
startElement
public void startElement(String uri,
String localName,
String name,
Attributes atts)
throws SAXException
- Specified by:
startElement in interface ContentHandler
- Overrides:
startElement in class DefaultHandler
- Throws:
SAXException
characters
public void characters(char[] ch,
int start,
int length)
throws SAXException
- Specified by:
characters in interface ContentHandler
- Overrides:
characters in class DefaultHandler
- Throws:
SAXException
endElement
public void endElement(String uri,
String localName,
String name)
throws SAXException
- Specified by:
endElement in interface ContentHandler
- Overrides:
endElement in class DefaultHandler
- Throws:
SAXException
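startElement, characters and endElement typically cooperate through a text buffer, because a SAX parser may deliver the text of one element in several characters calls, each described by an offset and length into a shared array. A minimal sketch of that pattern; ChunkedTextSketch and its lastText field are hypothetical, and the callbacks are invoked by hand instead of by a parser:

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of buffering element text across characters() calls.
public class ChunkedTextSketch extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    String lastText; // text of the most recently closed element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // new element: discard earlier text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start .. start + length - 1] belongs to this chunk.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lastText = buffer.toString(); // all chunks joined
    }

    public static void main(String[] args) {
        ChunkedTextSketch h = new ChunkedTextSketch();
        h.startElement("", "ArticleTitle", "ArticleTitle", new AttributesImpl());
        char[] data = "xxHello, world".toCharArray();
        h.characters(data, 2, 5); // "Hello"
        h.characters(data, 7, 7); // ", world"
        h.endElement("", "ArticleTitle", "ArticleTitle");
        System.out.println(h.lastText); // prints: Hello, world
    }
}
```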
Copyright © 2010-2011. All Rights Reserved.