java.lang.Object
  gov.llnl.ontology.text.corpora.NYTCorpusDocument
public class NYTCorpusDocument
extends Object
implements Document
NYTimesLDCDocument
Created: Jun 17, 2008
Author: Evan Sandhaus (sandhes@nytimes.com)
This class represents a New York Times Corpus document. See the field comments for descriptions of individual fields.
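The class is a plain data holder: each field listed below is exposed through a getter/setter pair, and several optional numeric fields use boxed types such as `Integer`, which can hold `null`. The sketch below mirrors that pattern with a hypothetical, trimmed-down class `NytDocumentSketch` (not part of the library) that keeps only a few fields. Whether the real class leaves absent fields `null` or fills in defaults is an assumption to check against its source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down stand-in for NYTCorpusDocument; only a few
// fields are mirrored to show the getter/setter and nullability pattern.
class NytDocumentSketch {
    private int guid;               // primitive: always holds a value
    private String headline;        // may be null when absent (assumption)
    private Integer wordCount;      // boxed: null can signal "not present" (assumption)
    private List<String> descriptors = new ArrayList<>();

    public int getGuid() { return guid; }
    public void setGuid(int guid) { this.guid = guid; }

    public String getHeadline() { return headline; }
    public void setHeadline(String headline) { this.headline = headline; }

    public Integer getWordCount() { return wordCount; }
    public void setWordCount(Integer wordCount) { this.wordCount = wordCount; }

    public List<String> getDescriptors() { return descriptors; }
    public void setDescriptors(List<String> descriptors) { this.descriptors = descriptors; }

    // Null-safe read of a boxed field: falls back to 0 when the value is absent.
    public int wordCountOrZero() {
        return wordCount == null ? 0 : wordCount;
    }
}
```

Unboxing `getWordCount()` directly into an `int` would throw a `NullPointerException` when the value is absent, which is why the sketch adds an explicit fallback.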
Field Summary | |
---|---|
protected URL |
alternateURL
This field specifies the location on nytimes.com of the article. |
protected String |
articleAbstract
This field is a summary of the article written by the New York Times Indexing Service. |
protected String |
authorBiography
This field specifies the biography of the author of the article. |
protected String |
banner
The banner field indicates whether additional information has been appended to the article since its publication. |
protected List<String> |
biographicalCategories
When present, the biographical category field generally indicates that a document focuses on a particular individual. |
protected String |
body
The body field is the text content of the article. |
protected String |
byline
This field specifies the byline of the article as it appeared in the print edition of the New York Times. |
protected String |
columnName
If the article is part of a regular column, this field specifies the name of that column. |
protected Integer |
columnNumber
This field specifies the column in which the article starts in the print paper. |
static String |
CORPUS_NAME
The corpus name for any Document returned by this class. |
protected Date |
correctionDate
This field specifies the date on which a correction was made to the article. |
protected String |
correctionText
For articles corrected following publication, this field specifies the correction. |
protected String |
credit
This field indicates the entity that produced the editorial content of this document. |
protected String |
dateline
The "dateline" field is the dateline of the article. |
protected String |
dayOfWeek
This field specifies the day of week on which the article was published. |
protected List<String> |
descriptors
The "descriptors" field specifies a list of descriptive terms drawn from a normalized controlled vocabulary corresponding to subjects mentioned in the article. |
protected String |
featurePage
The |
protected List<String> |
generalOnlineDescriptors
The "general online descriptors" field specifies a list of descriptors that are at a higher level of generality than the other tags associated with the article. |
protected int |
guid
The GUID field specifies an integer that is guaranteed to be unique for every document in the corpus. |
protected String |
headline
This field specifies the headline of the article as it appeared in the print edition of the New York Times. |
protected String |
kicker
The kicker is an additional piece of information printed as an accompaniment to a news headline. |
protected String |
leadParagraph
The "lead paragraph" field is the lead paragraph of the article. |
protected List<String> |
locations
The "locations" field specifies a list of geographic descriptors drawn from a normalized controlled vocabulary that correspond to places mentioned in the article. |
protected List<String> |
names
The "names" field specifies a list of names mentioned in the article. |
protected String |
newsDesk
This field specifies the desk in the New York Times newsroom that produced the article. |
protected String |
normalizedByline
The Normalized Byline field is the byline normalized to the form (last name, first name). |
protected List<String> |
onlineDescriptors
This field specifies a list of descriptors from a normalized controlled vocabulary that correspond to topics mentioned in the article. |
protected String |
onlineHeadline
This field specifies the headline displayed with the article on nytimes.com. |
protected String |
onlineLeadParagraph
This field specifies the lead paragraph as defined by the producers at nytimes.com. |
protected List<String> |
onlineLocations
This field specifies a list of place names that correspond to geographic locations mentioned in the article. |
protected List<String> |
onlineOrganizations
This field specifies a list of organizations that correspond to organizations mentioned in the article. |
protected List<String> |
onlinePeople
This field specifies a list of people that correspond to individuals mentioned in the article. |
protected String |
onlineSection
This field specifies the section(s) on nytimes.com in which the article is placed. |
protected List<String> |
onlineTitles
This field specifies a list of authored works mentioned in the article. |
protected List<String> |
organizations
This field specifies a list of organization names drawn from a normalized controlled vocabulary that correspond to organizations mentioned in the article. |
protected String |
originalText
The original XML text for this Document. |
protected Integer |
page
This field specifies the page of the section in the paper in which the article appears. |
protected List<String> |
people
This field specifies a list of people from a normalized controlled vocabulary that correspond to individuals mentioned in the article. |
protected Date |
publicationDate
This field specifies the date of the article's publication. |
protected Integer |
publicationDayOfMonth
This field specifies the day of the month on which the article was published, always in the range 1-31. |
protected Integer |
publicationMonth
This field specifies the month in which the article was published, in the range 1-12, where 1 is January, 2 is February, and so on. |
protected Integer |
publicationYear
This field specifies the year in which the article was published. |
protected String |
section
This field specifies the section of the paper in which the article appears. |
protected String |
seriesName
If the article is part of a regular series, this field specifies the name of that series. |
protected String |
slug
The slug is a short string that uniquely identifies an article from all other articles published on the same day. |
protected File |
sourceFile
The file from which this object was read. |
protected List<String> |
taxonomicClassifiers
This field specifies a list of taxonomic classifiers that place this article into a hierarchy of articles. |
protected List<String> |
titles
This field specifies a list of authored works that correspond to works mentioned in the article. |
protected List<String> |
typesOfMaterial
This field specifies a normalized list of terms describing the general editorial category of the article. |
protected URL |
url
This field specifies the location on nytimes.com of the article. |
protected Integer |
wordCount
This field specifies the number of words in the body of the article, including the lead paragraph. |
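The `wordCount` description says the count covers the body, including the lead paragraph, but it does not say how words are tokenized. The sketch below assumes simple whitespace splitting; `WordCountSketch.countWords` is a hypothetical helper and may disagree with the stored `wordCount` on punctuation, hyphenation, or markup.

```java
// Hypothetical helper: approximates a word count by splitting on whitespace.
// The corpus's own tokenization rules are not documented here.
final class WordCountSketch {
    private WordCountSketch() {}

    static int countWords(String body) {
        if (body == null) {
            return 0;                       // treat a missing body as empty
        }
        String trimmed = body.trim();
        if (trimmed.isEmpty()) {
            return 0;                       // split("") would otherwise return one empty token
        }
        // "\\s+" treats any run of spaces, tabs, or newlines as one separator.
        return trimmed.split("\\s+").length;
    }
}
```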
Constructor Summary | |
---|---|
NYTCorpusDocument()
|
Method Summary | |
---|---|
Set<String> |
categories()
Returns the set of categories that this document has, if any. |
URL |
getAlternateURL()
Accessor for the alternateURL property. |
String |
getArticleAbstract()
Accessor for the articleAbstract property. |
String |
getAuthorBiography()
Accessor for the authorBiography property. |
String |
getBanner()
Accessor for the banner property. |
List<String> |
getBiographicalCategories()
Accessor for the biographicalCategories property. |
String |
getBody()
Accessor for the body property. |
String |
getByline()
Accessor for the byline property. |
String |
getColumnName()
Accessor for the columnName property. |
Integer |
getColumnNumber()
Accessor for the columnNumber property. |
Date |
getCorrectionDate()
Accessor for the correctionDate property. |
String |
getCorrectionText()
Accessor for the correctionText property. |
String |
getCredit()
Accessor for the credit property. |
String |
getDateline()
Accessor for the dateline property. |
String |
getDayOfWeek()
Accessor for the dayOfWeek property. |
List<String> |
getDescriptors()
Accessor for the descriptors property. |
String |
getFeaturePage()
Accessor for the featurePage property. |
List<String> |
getGeneralOnlineDescriptors()
Accessor for the generalOnlineDescriptors property. |
int |
getGuid()
Accessor for the guid property. |
String |
getHeadline()
Accessor for the headline property. |
String |
getKicker()
Accessor for the kicker property. |
String |
getLeadParagraph()
Accessor for the leadParagraph property. |
List<String> |
getLocations()
Accessor for the locations property. |
List<String> |
getNames()
Accessor for the names property. |
String |
getNewsDesk()
Accessor for the newsDesk property. |
String |
getNormalizedByline()
Accessor for the normalizedByline property. |
List<String> |
getOnlineDescriptors()
Accessor for the onlineDescriptors property. |
String |
getOnlineHeadline()
Accessor for the onlineHeadline property. |
String |
getOnlineLeadParagraph()
Accessor for the onlineLeadParagraph property. |
List<String> |
getOnlineLocations()
Accessor for the onlineLocations property. |
List<String> |
getOnlineOrganizations()
Accessor for the onlineOrganizations property. |
List<String> |
getOnlinePeople()
Accessor for the onlinePeople property. |
String |
getOnlineSection()
Accessor for the onlineSection property. |
List<String> |
getOnlineTitles()
Accessor for the onlineTitles property. |
List<String> |
getOrganizations()
Accessor for the organizations property. |
Integer |
getPage()
Accessor for the page property. |
List<String> |
getPeople()
Accessor for the people property. |
Date |
getPublicationDate()
Accessor for the publicationDate property. |
Integer |
getPublicationDayOfMonth()
Accessor for the publicationDayOfMonth property. |
Integer |
getPublicationMonth()
Accessor for the publicationMonth property. |
Integer |
getPublicationYear()
Accessor for the publicationYear property. |
String |
getSection()
Accessor for the section property. |
String |
getSeriesName()
Accessor for the seriesName property. |
String |
getSlug()
Accessor for the slug property. |
File |
getSourceFile()
Accessor for the sourceFile property. |
List<String> |
getTaxonomicClassifiers()
Accessor for the taxonomicClassifiers property. |
List<String> |
getTitles()
Accessor for the titles property. |
List<String> |
getTypesOfMaterial()
Accessor for the typesOfMaterial property. |
URL |
getUrl()
Accessor for the url property. |
Integer |
getWordCount()
Accessor for the wordCount property. |
long |
id()
Returns a unique identifier for this document. |
String |
key()
Returns a string name of this document. |
String |
originalText()
Returns the original, uncleaned text. |
String |
rawText()
Returns the raw text of the corpus. |
void |
setAlternateURL(URL alternateURL)
Setter for the alternateURL property. |
void |
setArticleAbstract(String articleAbstract)
Setter for the articleAbstract property. |
void |
setAuthorBiography(String authorBiography)
Setter for the authorBiography property. |
void |
setBanner(String banner)
Setter for the banner property. |
void |
setBiographicalCategories(List<String> biographicalCategories)
Setter for the biographicalCategories property. |
void |
setBody(String body)
Setter for the body property. |
void |
setByline(String byline)
Setter for the byline property. |
void |
setColumnName(String columnName)
Setter for the columnName property. |
void |
setColumnNumber(Integer columnNumber)
Setter for the columnNumber property. |
void |
setCorrectionDate(Date correctionDate)
Setter for the correctionDate property. |
void |
setCorrectionText(String correctionText)
Setter for the correctionText property. |
void |
setCredit(String credit)
Setter for the credit property. |
void |
setDateline(String dateline)
Setter for the dateline property. |
void |
setDayOfWeek(String dayOfWeek)
Setter for the dayOfWeek property. |
void |
setDescriptors(List<String> descriptors)
Setter for the descriptors property. |
void |
setFeaturePage(String featurePage)
Setter for the featurePage property. |
void |
setGeneralOnlineDescriptors(List<String> generalOnlineDescriptors)
Setter for the generalOnlineDescriptors property. |
void |
setGuid(int guid)
Setter for the guid property. |
void |
setHeadline(String headline)
Setter for the headline property. |
void |
setKicker(String kicker)
Setter for the kicker property. |
void |
setLeadParagraph(String leadParagraph)
Setter for the leadParagraph property. |
void |
setLocations(List<String> locations)
Setter for the locations property. |
void |
setNames(List<String> names)
Setter for the names property. |
void |
setNewsDesk(String newsDesk)
Setter for the newsDesk property. |
void |
setNormalizedByline(String normalizedByline)
Setter for the normalizedByline property. |
void |
setOnlineDescriptors(List<String> onlineDescriptors)
Setter for the onlineDescriptors property. |
void |
setOnlineHeadline(String onlineHeadline)
Setter for the onlineHeadline property. |
void |
setOnlineLeadParagraph(String onlineLeadParagraph)
Setter for the onlineLeadParagraph property. |
void |
setOnlineLocations(List<String> onlineLocations)
Setter for the onlineLocations property. |
void |
setOnlineOrganizations(List<String> onlineOrganizations)
Setter for the onlineOrganizations property. |
void |
setOnlinePeople(List<String> onlinePeople)
Setter for the onlinePeople property. |
void |
setOnlineSection(String onlineSection)
Setter for the onlineSection property. |
void |
setOnlineTitles(List<String> onlineTitles)
Setter for the onlineTitles property. |
void |
setOrganizations(List<String> organizations)
Setter for the organizations property. |
void |
setOriginalText(String text)
Setter for the original text. |
void |
setPage(Integer page)
Setter for the page property. |
void |
setPeople(List<String> people)
Setter for the people property. |
void |
setPublicationDate(Date publicationDate)
Setter for the publicationDate property. |
void |
setPublicationDayOfMonth(Integer publicationDayOfMonth)
Setter for the publicationDayOfMonth property. |
void |
setPublicationMonth(Integer publicationMonth)
Setter for the publicationMonth property. |
void |
setPublicationYear(Integer publicationYear)
Setter for the publicationYear property. |
void |
setSection(String section)
Setter for the section property. |
void |
setSeriesName(String seriesName)
Setter for the seriesName property. |
void |
setSlug(String slug)
Setter for the slug property. |
void |
setSourceFile(File sourceFile)
Setter for the sourceFile property. |
void |
setTaxonomicClassifiers(List<String> taxonomicClassifiers)
Setter for the taxonomicClassifiers property. |
void |
setTitles(List<String> titles)
Setter for the titles property. |
void |
setTypesOfMaterial(List<String> typesOfMaterial)
Setter for the typesOfMaterial property. |
void |
setUrl(URL url)
Setter for the url property. |
void |
setWordCount(Integer wordCount)
Setter for the wordCount property. |
String |
sourceCorpus()
Returns the name of the source corpus. |
String |
title()
Returns the title of this document, if any exists. |
String |
toString()
|
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final String CORPUS_NAME
The corpus name for any Document returned by this class.
protected URL alternateURL
protected String articleAbstract
protected String authorBiography
protected String banner
protected List<String> biographicalCategories
protected String body
protected String byline
Sample byline:
protected String columnName
Sample Column Names:
protected Integer columnNumber
protected Date correctionDate
protected String correctionText
protected String credit
protected String dateline
Sample datelines:
protected String dayOfWeek
protected List<String> descriptors
Examples Include:
protected String featurePage
protected List<String> generalOnlineDescriptors
Examples Include:
protected int guid
protected String headline
protected String kicker
protected String leadParagraph
protected List<String> locations
Examples Include:
protected List<String> names
Examples Include:
protected String newsDesk
protected String normalizedByline
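Because the normalized byline has the form (last name, first name), the two parts can be recovered by splitting on the first comma. `BylineSketch.split` is a hypothetical helper; it assumes a single author and returns the whole trimmed input as the last name when no comma is present.

```java
// Hypothetical helper for a byline normalized as "Last, First".
// Assumes one author; multi-author bylines would need different handling.
final class BylineSketch {
    private BylineSketch() {}

    /** Returns {lastName, firstName}; firstName is "" when there is no comma. */
    static String[] split(String normalizedByline) {
        int comma = normalizedByline.indexOf(',');
        if (comma < 0) {
            return new String[] {normalizedByline.trim(), ""};
        }
        String last = normalizedByline.substring(0, comma).trim();
        String first = normalizedByline.substring(comma + 1).trim();
        return new String[] {last, first};
    }
}
```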
protected List<String> onlineDescriptors
Examples Include:
protected String onlineHeadline
protected String onlineLeadParagraph
protected List<String> onlineLocations
Examples Include:
protected List<String> onlineOrganizations
Examples Include:
protected List<String> onlinePeople
Examples Include:
protected String onlineSection
protected List<String> onlineTitles
Examples Include:
protected List<String> organizations
Examples Include:
protected Integer page
protected List<String> people
Examples Include:
protected Date publicationDate
protected Integer publicationDayOfMonth
protected Integer publicationMonth
protected Integer publicationYear
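The separate year, month, and day-of-month fields carry the same information as `publicationDate`. When deriving them from a `java.util.Date`, note that `Calendar.MONTH` is zero-based (January is 0) while `publicationMonth` uses 1-12, so one must be added. The sketch assumes dates are read in UTC, which may not match the corpus's own convention; `PublicationDateSketch` is a hypothetical name.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper: splits a publication Date into the 1-based values
// used by publicationYear, publicationMonth, and publicationDayOfMonth.
final class PublicationDateSketch {
    private PublicationDateSketch() {}

    /** Returns {year, month (1-12), dayOfMonth (1-31)}, read in UTC (assumed). */
    static int[] decompose(Date publicationDate) {
        // GregorianCalendar avoids locale-specific calendars (e.g. Buddhist years).
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.setTime(publicationDate);
        int year = cal.get(Calendar.YEAR);
        int month = cal.get(Calendar.MONTH) + 1;  // Calendar.MONTH is 0-based
        int day = cal.get(Calendar.DAY_OF_MONTH); // already 1-based
        return new int[] {year, month, day};
    }
}
```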
protected String section
protected String seriesName
protected String slug
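Since a slug is unique only among articles published on the same day, a key that is unique across the corpus needs both the publication date and the slug. A minimal sketch using a hypothetical Java record `SlugKey` (Java 16+), whose generated `equals` and `hashCode` cover both components; `SlugIndexSketch.register` is also hypothetical.

```java
import java.time.LocalDate;
import java.util.Map;

// Hypothetical composite key: (publication day, slug). A record gets
// value-based equals/hashCode, so it works directly as a map key.
record SlugKey(LocalDate publicationDay, String slug) {}

final class SlugIndexSketch {
    private SlugIndexSketch() {}

    // Stores guid under (day, slug) if the key is new and returns null;
    // otherwise leaves the map unchanged and returns the GUID already stored.
    static Integer register(Map<SlugKey, Integer> index, LocalDate day, String slug, int guid) {
        return index.putIfAbsent(new SlugKey(day, slug), guid);
    }
}
```

The same slug on two different days yields two distinct keys, while a repeat on the same day is reported as a collision.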
protected File sourceFile
protected List<String> taxonomicClassifiers
Examples Include:
protected List<String> titles
Examples Include:
protected List<String> typesOfMaterial
Examples Include:
protected URL url
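The `url` and `alternateURL` properties are typed as `java.net.URL`, so a caller holding a string must convert it first. Since JDK 20 the `URL(String)` constructors are deprecated; the sketch below goes through `java.net.URI`, which checks the syntax, and then calls `toURL()`. `UrlSketch.parse` is a hypothetical helper that returns `null` for unusable input instead of throwing.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper for producing values to pass to setUrl/setAlternateURL.
final class UrlSketch {
    private UrlSketch() {}

    static URL parse(String spec) {
        try {
            // URI.create throws IllegalArgumentException on bad syntax;
            // toURL throws IllegalArgumentException if the URI is not absolute
            // and MalformedURLException if no handler exists for the scheme.
            return URI.create(spec).toURL();
        } catch (IllegalArgumentException | MalformedURLException e) {
            return null;
        }
    }
}
```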
protected Integer wordCount
protected String originalText
The original XML text for this Document.
Constructor Detail |
---|
public NYTCorpusDocument()
Method Detail |
---|
public String sourceCorpus()
Specified by: sourceCorpus in interface Document
public String rawText()
Specified by: rawText in interface Document
public String originalText()
Specified by: originalText in interface Document
public String key()
Specified by: key in interface Document
public long id()
Specified by: id in interface Document
public String title()
Specified by: title in interface Document
public Set<String> categories()
Specified by: categories in interface Document
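The seven methods above are the ones this class implements from the `Document` interface. The sketch below restates that contract as a hypothetical interface `DocumentSketch`, with names and return types taken from the method summary, plus a toy implementation. It is not the library's source: the toy's mappings (for example, `id()` returning the GUID and `title()` returning the headline) and the `CORPUS_NAME` value are assumptions for illustration only.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical restatement of the Document contract: method names and
// return types come from the method summary; nothing else is implied.
interface DocumentSketch {
    String sourceCorpus();
    String rawText();
    String originalText();
    String key();
    long id();
    String title();
    Set<String> categories();
}

// Toy implementation; each mapping is an illustrative assumption,
// not the behavior of the real NYTCorpusDocument.
final class ToyNytDocument implements DocumentSketch {
    static final String CORPUS_NAME = "nyt"; // assumed placeholder value
    private final int guid;
    private final String headline;
    private final String body;
    private final String xml;
    private final List<String> classifiers;

    ToyNytDocument(int guid, String headline, String body,
                   String xml, List<String> classifiers) {
        this.guid = guid;
        this.headline = headline;
        this.body = body;
        this.xml = xml;
        this.classifiers = classifiers;
    }

    @Override public String sourceCorpus() { return CORPUS_NAME; }
    @Override public String rawText() { return body; }
    @Override public String originalText() { return xml; }
    @Override public String key() { return Integer.toString(guid); }
    @Override public long id() { return guid; }        // int widens to long
    @Override public String title() { return headline; }

    // Duplicates collapse; insertion order is preserved.
    @Override public Set<String> categories() {
        return Collections.unmodifiableSet(new LinkedHashSet<>(classifiers));
    }
}
```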
public URL getAlternateURL()
public String getArticleAbstract()
public String getAuthorBiography()
public String getBanner()
public List<String> getBiographicalCategories()
public String getBody()
public String getByline()
public String getColumnName()
public Integer getColumnNumber()
public Date getCorrectionDate()
public String getCorrectionText()
public String getCredit()
public String getDateline()
public String getDayOfWeek()
public List<String> getDescriptors()
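Controlled-vocabulary fields such as `descriptors` lend themselves to corpus-level statistics. A minimal sketch, assuming each document's list has already been fetched with `getDescriptors()`; a `null` list is treated as empty, which is an assumption about how absent values appear. `DescriptorStats.countAll` is a hypothetical helper.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tallies descriptor occurrences across documents.
final class DescriptorStats {
    private DescriptorStats() {}

    static Map<String, Integer> countAll(List<List<String>> descriptorLists) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        for (List<String> descriptors : descriptorLists) {
            if (descriptors == null) {
                continue;                              // treat an absent list as empty
            }
            for (String d : descriptors) {
                counts.merge(d, 1, Integer::sum);      // insert 1 or add 1
            }
        }
        return counts;
    }
}
```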
public String getFeaturePage()
public List<String> getGeneralOnlineDescriptors()
public int getGuid()
public String getHeadline()
public String getKicker()
public String getLeadParagraph()
public List<String> getLocations()
public List<String> getNames()
public String getNewsDesk()
public String getNormalizedByline()
public List<String> getOnlineDescriptors()
public String getOnlineHeadline()
public String getOnlineLeadParagraph()
public List<String> getOnlineLocations()
public List<String> getOnlineOrganizations()
public List<String> getOnlinePeople()
public String getOnlineSection()
public List<String> getOnlineTitles()
public List<String> getOrganizations()
public Integer getPage()
public List<String> getPeople()
public Date getPublicationDate()
public Integer getPublicationDayOfMonth()
public Integer getPublicationMonth()
public Integer getPublicationYear()
public String getSection()
public String getSeriesName()
public String getSlug()
public File getSourceFile()
public List<String> getTaxonomicClassifiers()
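Taxonomic classifiers place an article in a hierarchy. Assuming each classifier is a slash-delimited path from the root (an assumption; check the values in your copy of the corpus), every prefix of the path names an ancestor category. `TaxonomySketch.ancestors` is a hypothetical helper that expands one path into all of its levels.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, assuming "/" separates hierarchy levels.
final class TaxonomySketch {
    private TaxonomySketch() {}

    // Expands "A/B/C" into ["A", "A/B", "A/B/C"]: each ancestor plus the leaf.
    static List<String> ancestors(String classifier) {
        List<String> levels = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String part : classifier.split("/")) {
            if (part.isEmpty()) {
                continue;                    // skip empty segments from "//" or a leading "/"
            }
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(part);
            levels.add(prefix.toString());
        }
        return levels;
    }
}
```

Indexing each article under every level lets a query for a parent category also match articles filed under its descendants.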
public List<String> getTitles()
public List<String> getTypesOfMaterial()
public URL getUrl()
public Integer getWordCount()
public void setAlternateURL(URL alternateURL)
Parameters: alternateURL - the alternateURL to set
public void setArticleAbstract(String articleAbstract)
Parameters: articleAbstract - the articleAbstract to set
public void setAuthorBiography(String authorBiography)
Parameters: authorBiography - the authorBiography to set
public void setBanner(String banner)
Parameters: banner - the banner to set
public void setBiographicalCategories(List<String> biographicalCategories)
Parameters: biographicalCategories - the biographicalCategories to set
public void setOriginalText(String text)
Parameters: text - the original text to set
public void setBody(String body)
Parameters: body - the body to set
public void setByline(String byline)
Parameters: byline - the byline to set
public void setColumnName(String columnName)
Parameters: columnName - the columnName to set
public void setColumnNumber(Integer columnNumber)
Parameters: columnNumber - the columnNumber to set
public void setCorrectionDate(Date correctionDate)
Parameters: correctionDate - the correctionDate to set
public void setCorrectionText(String correctionText)
Parameters: correctionText - the correctionText to set
public void setCredit(String credit)
Parameters: credit - the credit to set
public void setDateline(String dateline)
Parameters: dateline - the dateline to set
public void setDayOfWeek(String dayOfWeek)
Parameters: dayOfWeek - the dayOfWeek to set
public void setDescriptors(List<String> descriptors)
Parameters: descriptors - the descriptors to set
public void setFeaturePage(String featurePage)
Parameters: featurePage - the featurePage to set
public void setGeneralOnlineDescriptors(List<String> generalOnlineDescriptors)
Parameters: generalOnlineDescriptors - the generalOnlineDescriptors to set
public void setGuid(int guid)
Parameters: guid - the guid to set
public void setHeadline(String headline)
Parameters: headline - the headline to set
public void setKicker(String kicker)
Parameters: kicker - the kicker to set
public void setLeadParagraph(String leadParagraph)
Parameters: leadParagraph - the leadParagraph to set
public void setLocations(List<String> locations)
Parameters: locations - the locations to set
public void setNames(List<String> names)
Parameters: names - the names to set
public void setNewsDesk(String newsDesk)
Parameters: newsDesk - the newsDesk to set
public void setNormalizedByline(String normalizedByline)
Parameters: normalizedByline - the normalizedByline to set
public void setOnlineDescriptors(List<String> onlineDescriptors)
Parameters: onlineDescriptors - the onlineDescriptors to set
public void setOnlineHeadline(String onlineHeadline)
Parameters: onlineHeadline - the onlineHeadline to set
public void setOnlineLeadParagraph(String onlineLeadParagraph)
Parameters: onlineLeadParagraph - the onlineLeadParagraph to set
public void setOnlineLocations(List<String> onlineLocations)
Parameters: onlineLocations - the onlineLocations to set
public void setOnlineOrganizations(List<String> onlineOrganizations)
Parameters: onlineOrganizations - the onlineOrganizations to set
public void setOnlinePeople(List<String> onlinePeople)
Parameters: onlinePeople - the onlinePeople to set
public void setOnlineSection(String onlineSection)
Parameters: onlineSection - the onlineSection to set
public void setOnlineTitles(List<String> onlineTitles)
Parameters: onlineTitles - the onlineTitles to set
public void setOrganizations(List<String> organizations)
Parameters: organizations - the organizations to set
public void setPage(Integer page)
Parameters: page - the page to set
public void setPeople(List<String> people)
Parameters: people - the people to set
public void setPublicationDate(Date publicationDate)
Parameters: publicationDate - the publicationDate to set
public void setPublicationDayOfMonth(Integer publicationDayOfMonth)
Parameters: publicationDayOfMonth - the publicationDayOfMonth to set
public void setPublicationMonth(Integer publicationMonth)
Parameters: publicationMonth - the publicationMonth to set
public void setPublicationYear(Integer publicationYear)
Parameters: publicationYear - the publicationYear to set
public void setSection(String section)
Parameters: section - the section to set
public void setSeriesName(String seriesName)
Parameters: seriesName - the seriesName to set
public void setSlug(String slug)
Parameters: slug - the slug to set
public void setSourceFile(File sourceFile)
Parameters: sourceFile - the sourceFile to set
public void setTaxonomicClassifiers(List<String> taxonomicClassifiers)
Parameters: taxonomicClassifiers - the taxonomicClassifiers to set
public void setTitles(List<String> titles)
Parameters: titles - the titles to set
public void setTypesOfMaterial(List<String> typesOfMaterial)
Parameters: typesOfMaterial - the typesOfMaterial to set
public void setUrl(URL url)
Parameters: url - the url to set
public void setWordCount(Integer wordCount)
Parameters: wordCount - the wordCount to set
public String toString()
Overrides: toString in class Object