gov.llnl.ontology.text.corpora
Class PubMedDocumentReader
java.lang.Object
   org.xml.sax.helpers.DefaultHandler
       gov.llnl.ontology.text.corpora.PubMedDocumentReader
- All Implemented Interfaces: 
- DocumentReader, ContentHandler, DTDHandler, EntityResolver, ErrorHandler
- public class PubMedDocumentReader
- extends DefaultHandler implements DocumentReader
A DocumentReader for the PubMed corpus.  PubMed is formatted as a
 series of documents in a single XML file.  This DocumentReader works
 as a DefaultHandler for the SAXParser and reads one full document
 per call to readDocument(java.lang.String, java.lang.String).  Text in
 NameOfSubstance tags is used as the document labels, text in
 ArticleTitle as the title, text in PMID as the id and key value, and
 text in Abstract as the raw document text.
 
 
 This class is not thread-safe.
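The tag-to-field mapping above can be pictured as a small SAX handler. The following is a minimal sketch, not the library's code: the class `PubMedHandlerSketch`, its `ParsedCitation` holder (standing in for `Document`), and the `read` method are hypothetical. It assumes that text inside nested children such as `AbstractText` should count toward `Abstract`, and that the first `PMID` in a record is the record's own id. The buffer it keeps between callbacks also illustrates why one instance should not be shared across threads.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the tag-to-field mapping; not the library's implementation.
class PubMedHandlerSketch extends DefaultHandler {

    /** Hypothetical holder for the fields extracted from one record (stands in for Document). */
    static final class ParsedCitation {
        String id = "";
        String title = "";
        String text = "";
        final List<String> labels = new ArrayList<>();
    }

    // Elements whose text is captured; all other elements are ignored.
    private static final Set<String> TRACKED =
            Set.of("PMID", "ArticleTitle", "Abstract", "NameOfSubstance");

    // Mutable state shared across callbacks: one instance must not be used by two threads at once.
    private final StringBuilder buffer = new StringBuilder();
    private ParsedCitation current;

    /** Parses one record held entirely in {@code xml}. */
    ParsedCitation read(String xml) {
        current = new ParsedCitation();
        buffer.setLength(0);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable PubMed record", e);
        }
        return current;
    }

    @Override
    public void startElement(String uri, String localName, String name, Attributes atts) {
        // Reset the buffer only for tracked elements, so text of nested
        // children (e.g. AbstractText inside Abstract) is kept.
        if (TRACKED.contains(name)) {
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String name) {
        if (!TRACKED.contains(name)) {
            return;
        }
        String value = buffer.toString().trim();
        switch (name) {
            case "PMID":
                // Keep the first PMID; later ones may belong to cited records.
                if (current.id.isEmpty()) current.id = value;
                break;
            case "ArticleTitle": current.title = value; break;
            case "Abstract": current.text = value; break;
            default: current.labels.add(value); break; // NameOfSubstance
        }
        buffer.setLength(0);
    }
}
```

Because `read` resets all state before parsing, the same instance can be reused for many records one after another, just not concurrently.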
- Author:
- Keith Stevens
 
 
| Methods inherited from class org.xml.sax.helpers.DefaultHandler | 
| endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning | 
 
| Methods inherited from class java.lang.Object | 
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
 
PubMedDocumentReader
public PubMedDocumentReader()
- Creates a new PubMedDocumentReader.
 
readDocument
public Document readDocument(String originalText,
                             String corpusName)
- Returns a Document represented by the given string and uses corpusName as the corpus name for the returned Document.

- Specified by:
- readDocument in interface DocumentReader
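Because each call reads one full document, a caller holding a whole PubMed file has to cut it into per-record strings first. A minimal sketch, assuming records are wrapped in attribute-free, non-nested elements such as `<PubmedArticle>`; the helper `PubMedRecordSplitter` and its `split` method are hypothetical and not part of this library:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a multi-record XML string into one string per record.
// Assumes each record is an attribute-free <tag>...</tag> element and records do not nest.
final class PubMedRecordSplitter {
    static List<String> split(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(open, from);
            if (start < 0) break;            // no more records
            int end = xml.indexOf(close, start);
            if (end < 0) break;              // truncated final record: ignore it
            end += close.length();
            records.add(xml.substring(start, end));
            from = end;
        }
        return records;
    }
}
```

Each returned string could then be passed to `readDocument(record, corpusName)`. A streaming SAX pass over the whole file would avoid holding it in memory; the plain string search here only keeps the sketch short.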
 
readDocument
public Document readDocument(String originalText)
- Returns a Document represented by the given string.

- Specified by:
- readDocument in interface DocumentReader
 
startElement
public void startElement(String uri,
                         String localName,
                         String name,
                         Attributes atts)
                  throws SAXException
- Specified by:
- startElement in interface ContentHandler
- Overrides:
- startElement in class DefaultHandler
- Throws:
- SAXException
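The `name` parameter above is the element's qualified name. The parser that `SAXParserFactory.newInstance()` creates by default is not namespace-aware, and without namespace processing SAX may pass an empty `localName`, so a handler that matches on tag names is safer reading `name`. A minimal illustration; the class `ElementNameRecorder` and its `record` method are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the name of every element it sees.
class ElementNameRecorder extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String name, Attributes atts) {
        // Without namespace processing, localName may be empty;
        // fall back to the qualified name in that case.
        boolean noLocal = localName == null || localName.isEmpty();
        names.add(noLocal ? name : localName);
    }

    static List<String> record(String xml) {
        ElementNameRecorder handler = new ElementNameRecorder();
        try {
            // The default factory is not namespace-aware.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
        return handler.names;
    }
}
```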
 
characters
public void characters(char[] ch,
                       int start,
                       int length)
                throws SAXException
- Specified by:
- characters in interface ContentHandler
- Overrides:
- characters in class DefaultHandler
- Throws:
- SAXException
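SAX allows a parser to deliver one run of text through several `characters` calls, and only `ch[start]` through `ch[start + length - 1]` are meaningful in each call. A handler should therefore append each slice to a buffer and read the text only once the element ends. A minimal sketch; `TextCollector` and `collect` are hypothetical names:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that concatenates all character data in a document.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy only this call's slice; the parser may reuse the array afterwards.
        text.append(ch, start, length);
    }

    static String collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
        return handler.text.toString();
    }
}
```

The sample input in a call such as `collect("<AbstractText>a &amp; b</AbstractText>")` includes `&amp;`, a spot where parsers may split the text; the buffered result is `a & b` however the chunks fall.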
 
endElement
public void endElement(String uri,
                       String localName,
                       String name)
                throws SAXException
- Specified by:
- endElement in interface ContentHandler
- Overrides:
- endElement in class DefaultHandler
- Throws:
- SAXException
 
Copyright © 2010-2011. All Rights Reserved.