java.lang.Object
  gov.llnl.ontology.text.corpora.NYTDocumentReader
public class NYTDocumentReader
NYTDocumentReader
Created: Jun 17, 2008
Author: Evan Sandhaus (sandhes@nytimes.com)
Class for parsing New York Times articles from NITF files.
Field Summary | |
---|---|
static String | DATE_PUBLICATION_ATTRIBUTE: NITF Constant |
Constructor Summary | |
---|---|
NYTDocumentReader()
|
Method Summary | |
---|---|
static NYTCorpusDocument | parseNYTCorpusDocumentFromDOMDocument(Document document) |
static NYTCorpusDocument | parseNYTCorpusDocumentFromDOMDocument(File file, Document document) |
static NYTCorpusDocument | parseNYTCorpusDocumentFromFile(File file, boolean validating): Parse a New York Times document from a file. |
static NYTCorpusDocument | parseNYTCorpusDocumentFromString(String str, boolean validating): Parse a New York Times document from a string. |
NYTCorpusDocument | readDocument(String doc): Returns a Document represented by the given string. |
NYTCorpusDocument | readDocument(String doc, String corpusName): Returns a Document represented by the given string and uses corpusName as the corpus name for the returned Document. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String DATE_PUBLICATION_ATTRIBUTE
Constructor Detail |
---|
public NYTDocumentReader()
Method Detail |
---|
public NYTCorpusDocument readDocument(String doc)
Returns a Document represented by the given string.
Specified by: readDocument in interface DocumentReader
public NYTCorpusDocument readDocument(String doc, String corpusName)
Returns a Document represented by the given string and uses corpusName as the corpus name for the returned Document.
Specified by: readDocument in interface DocumentReader
public static NYTCorpusDocument parseNYTCorpusDocumentFromFile(File file, boolean validating)
Parse a New York Times document from a file.
Parameters:
file - The file from which to parse the document.
validating - True if the file is to be validated against the NITF DTD and false if it is not. It is recommended that validation be disabled, as all documents in the corpus have previously been validated against the NITF DTD.
public static NYTCorpusDocument parseNYTCorpusDocumentFromString(String str, boolean validating)
Parse a New York Times document from a string.
Parameters:
str - The string from which to parse the document.
validating - True if the document is to be validated against the NITF DTD and false if it is not. It is recommended that validation be disabled, as all documents in the corpus have previously been validated against the NITF DTD.
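This page does not show how the validating flag is applied internally. As a rough illustration only, the following sketch shows how a DTD-validation switch is commonly wired up with the standard JAXP API when parsing an XML string into a DOM Document. The class name NitfDomSketch and its parse method are hypothetical and are not part of NYTDocumentReader. The load-external-dtd feature string is specific to Xerces-based parsers, including the JDK's built-in one, and other parser implementations may reject it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the documented class.
final class NitfDomSketch {

    // Parses an XML string into a DOM Document, optionally validating
    // against the DTD named in the document's DOCTYPE declaration.
    static Document parse(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            if (!validating) {
                // Assumption: a Xerces-based parser. This feature stops the
                // parser from fetching the external DTD when not validating.
                factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                    false);
            }
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap checked exceptions so callers of the sketch stay simple.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Under these assumptions, passing false skips both validation and the DTD download, which matches the recommendation above for documents that have already been validated.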
public static NYTCorpusDocument parseNYTCorpusDocumentFromDOMDocument(File file, Document document)
public static NYTCorpusDocument parseNYTCorpusDocumentFromDOMDocument(Document document)
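The two DOM-based methods are undocumented on this page, so the following is only a guess at the kind of work they involve: reading a field out of an already-parsed org.w3c.dom.Document. The sketch assumes the publication date sits in a date.publication attribute on a pubdata element, as defined by the NITF DTD. The actual value of DATE_PUBLICATION_ATTRIBUTE is not shown here, and PubDateSketch with its methods is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the documented class.
final class PubDateSketch {

    // Assumed NITF names; the library's own constants may differ.
    static final String PUBDATA_TAG = "pubdata";
    static final String DATE_PUBLICATION = "date.publication";

    // Returns the first date.publication attribute value found on a
    // pubdata element, or null if the document has none.
    static String publicationDate(Document document) {
        NodeList nodes = document.getElementsByTagName(PUBDATA_TAG);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            if (element.hasAttribute(DATE_PUBLICATION)) {
                return element.getAttribute(DATE_PUBLICATION);
            }
        }
        return null;
    }

    // Small convenience parser so the sketch can be tried on a string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

The returned string is left unparsed because the date format used in the corpus is not stated on this page.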