java.lang.Object
  gov.llnl.ontology.mapreduce.table.TrinidadTable
public class TrinidadTable
Field Summary | |
---|---|
static String | ALL_CORPORA: A marker to request all corpora types when scanning. |
static String | ANNOTATION_CF: The column family for the document annotations. |
static String | ANNOTATION_SENTENCE: The column qualifier for the sentence-level document annotations. |
static String | ANNOTATION_TOKEN: The column qualifier for the token-level document annotations. |
static String | CATEGORY_COLUMN: The column name for categories that a document may fall under, if any. |
static String | DOC_ID: The column name for the document id. |
static String | DOC_KEY: The column name for the document key. |
static String | LABEL_CF: The column family for word list labels associated with each document. |
static String | META_CF: The column family for word list labels associated with each document. |
static String | SENSE_SENTENCE_PREFIX: The column qualifier prefix for sentence-level word sense annotations. |
static String | SENSE_TOKEN_PREFIX: The column qualifier prefix for token-level word sense annotations. |
static String | SOURCE_CF: The column family for source-related columns. |
static String | SOURCE_ID: The column qualifier for the corpus id. |
static String | SOURCE_IDCOL: The full column qualifier for the corpus id. |
static String | SOURCE_NAME: The column qualifier for the corpus source name. |
static String | SOURCE_NAMECOL: The full column qualifier for the corpus source name. |
static String | TABLE_NAME: The official table name. |
static String | TEXT_CF: The column family for the text columns. |
static String | TEXT_ORIGINAL: The column qualifier for the original document text. |
static String | TEXT_ORIGINAL_COL: The full column qualifier for the original document text. |
static String | TEXT_RAW: The column qualifier for the cleaned document text. |
static String | TEXT_RAW_COL: The full column qualifier for the cleaned document text. |
static String | TEXT_TITLE: The column qualifier for the document title. |
static String | TEXT_TITLE_COL: The full column qualifier for the document title. |
static String | TEXT_TYPE: The column qualifier for the text type. |
static String | TEXT_TYPE_COL: The full column qualifier for the text type. |
static String | XML_MIME_TYPE: Stores the text type of any document. |
Constructor Summary | |
---|---|
TrinidadTable() | Creates a new TrinidadTable that uses the default . |
Method Summary | |
---|---|
void | close(): Closes the connection to the document reader. |
void | createTable(): Creates a new instance of the HTable represented by this GenericTable. |
void | createTable(org.apache.hadoop.hbase.client.HConnection connector): Creates a new instance of the HTable represented by this GenericTable. |
Document | document(org.apache.hadoop.hbase.client.Result row): Returns the Document associated with this row. |
Set<String> | getCategories(org.apache.hadoop.hbase.client.Result row): Returns the set of categories associated with the document in row. |
String | getLabel(org.apache.hadoop.hbase.client.Result row, String labelName): Returns the label associated with column labelName inside of row. |
Iterator<org.apache.hadoop.hbase.client.Result> | iterator(org.apache.hadoop.hbase.client.Scan scan): Returns an iterator over all of the rows accessible from this GenericTable. |
void | markRowAsProcessed(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, org.apache.hadoop.hbase.client.Result row): Marks the row indexed by key as having been processed. |
void | put(Document document): Stores the text of Document in this CorpusTable. |
void | put(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences): Stores the List of Sentences in this table. |
void | putCategories(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, Set<String> categories): Stores the categories associated with the document indexed by key. |
void | putLabel(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, String labelName, String labelValue): Stores labelValue in the column specified by labelName in the row indexed by key. |
void | putSenses(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences, String senseLabel): Stores the List of Sentences containing only word senses in this table. |
List<Sentence> | sentences(org.apache.hadoop.hbase.client.Result row): Returns the List of Sentences stored in row. |
void | setupScan(org.apache.hadoop.hbase.client.Scan scan): Initializes a Scan such that it will request whatever columns and column families are necessary for processing, as determined by the table type. |
void | setupScan(org.apache.hadoop.hbase.client.Scan scan, String corpusName): Initializes a Scan such that it will request the columns and column families necessary for extracting the raw document text, dependency trees, and document source information from the specified corpusName. |
boolean | shouldProcessRow(org.apache.hadoop.hbase.client.Result row): Returns true if the given row should be processed. |
String | sourceCorpus(org.apache.hadoop.hbase.client.Result row): Returns the source corpus that this row contains. |
org.apache.hadoop.hbase.client.HTable | table(): Returns the HTable instance attached to this GenericTable. |
String | tableName(): Returns the name of the HBase table that this GenericTable represents. |
String | text(org.apache.hadoop.hbase.client.Result row): Returns the cleaned text stored by the given row. |
String | textSource(org.apache.hadoop.hbase.client.Result row): Returns the raw document text stored in row. |
String | title(org.apache.hadoop.hbase.client.Result row): Returns the title of the document stored in row. |
List<Sentence> | wordSenses(org.apache.hadoop.hbase.client.Result row, String senseLabel): Returns the List of Sentences stored in row that correspond to the word senses created with senseLabel. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String XML_MIME_TYPE
public static final String ALL_CORPORA
public static final String TABLE_NAME
public static final String SOURCE_CF
public static final String SOURCE_NAME
public static final String SOURCE_NAMECOL
public static final String SOURCE_ID
public static final String SOURCE_IDCOL
public static final String TEXT_CF
public static final String TEXT_ORIGINAL
public static final String TEXT_ORIGINAL_COL
public static final String TEXT_TYPE
public static final String TEXT_TYPE_COL
public static final String TEXT_RAW
public static final String TEXT_RAW_COL
public static final String TEXT_TITLE
public static final String TEXT_TITLE_COL
public static final String ANNOTATION_CF
public static final String ANNOTATION_SENTENCE
public static final String ANNOTATION_TOKEN
public static final String SENSE_SENTENCE_PREFIX
public static final String SENSE_TOKEN_PREFIX
public static final String LABEL_CF
public static final String META_CF
public static final String CATEGORY_COLUMN
public static final String DOC_KEY
public static final String DOC_ID
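The fields above pair column families (the _CF constants) with bare column qualifiers and "full" column qualifiers (the _COL constants). This page does not show the constant values, so the following is only an illustrative sketch under two assumptions: the literal strings are made-up placeholders, and a full qualifier is taken to be the conventional HBase family:qualifier notation.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: how a "full column qualifier" such as SOURCE_NAMECOL
 * might be composed from a column family and a qualifier. The literal values
 * are placeholders, NOT the real TrinidadTable constant values.
 */
final class ColumnNames {
    // Placeholder values (assumptions for illustration only).
    static final String SOURCE_CF = "src";
    static final String SOURCE_NAME = "name";
    static final String TEXT_CF = "text";
    static final String TEXT_RAW = "raw";

    // Assumes the HBase "family:qualifier" notation for full column names.
    static String fullColumn(String family, String qualifier) {
        Objects.requireNonNull(family, "family");
        Objects.requireNonNull(qualifier, "qualifier");
        return family + ":" + qualifier;
    }

    // Splits a full column name back into {family, qualifier}.
    static String[] split(String fullColumn) {
        int idx = fullColumn.indexOf(':');
        if (idx < 0) {
            throw new IllegalArgumentException("Not a family:qualifier name: " + fullColumn);
        }
        return new String[] { fullColumn.substring(0, idx), fullColumn.substring(idx + 1) };
    }

    public static void main(String[] args) {
        System.out.println(fullColumn(SOURCE_CF, SOURCE_NAME)); // src:name
        System.out.println(fullColumn(TEXT_CF, TEXT_RAW));      // text:raw
    }
}
```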
Constructor Detail |
---|
public TrinidadTable()
Creates a new TrinidadTable that uses the default .
Method Detail |
---|
public void createTable()
Creates a new instance of the HTable represented by this GenericTable.
Specified by: createTable in interface GenericTable
public void createTable(org.apache.hadoop.hbase.client.HConnection connector)
Creates a new instance of the HTable represented by this GenericTable.
Specified by: createTable in interface GenericTable
public void setupScan(org.apache.hadoop.hbase.client.Scan scan)
Initializes a Scan such that it will request whatever columns and column families are necessary for processing, as determined by the table type. This method will only be called once per job.
Specified by: setupScan in interface GenericTable
public void setupScan(org.apache.hadoop.hbase.client.Scan scan, String corpusName)
Initializes a Scan such that it will request the columns and column families necessary for extracting the raw document text, dependency trees, and document source information from the specified corpusName.
Specified by: setupScan in interface GenericTable
public Iterator<org.apache.hadoop.hbase.client.Result> iterator(org.apache.hadoop.hbase.client.Scan scan)
Returns an iterator over all of the rows accessible from this GenericTable.
Specified by: iterator in interface GenericTable
public String tableName()
Returns the name of the HBase table that this GenericTable represents.
Specified by: tableName in interface GenericTable
public org.apache.hadoop.hbase.client.HTable table()
Returns the HTable instance attached to this GenericTable.
Specified by: table in interface GenericTable
public String text(org.apache.hadoop.hbase.client.Result row)
Returns the cleaned text stored by the given row.
Specified by: text in interface CorpusTable
public String textSource(org.apache.hadoop.hbase.client.Result row)
Returns the raw document text stored in row.
Specified by: textSource in interface CorpusTable
public String sourceCorpus(org.apache.hadoop.hbase.client.Result row)
Returns the source corpus that this row contains.
Specified by: sourceCorpus in interface CorpusTable
public String title(org.apache.hadoop.hbase.client.Result row)
Returns the title of the document stored in row.
Specified by: title in interface CorpusTable
public List<Sentence> sentences(org.apache.hadoop.hbase.client.Result row)
Returns the List of Sentences stored in row. This call will include all annotations requested in the setup call to GenericTable.setupScan(org.apache.hadoop.hbase.client.Scan).
Specified by: sentences in interface CorpusTable
public List<Sentence> wordSenses(org.apache.hadoop.hbase.client.Result row, String senseLabel)
Returns the List of Sentences stored in row that correspond to the word senses created with senseLabel.
Specified by: wordSenses in interface CorpusTable
public Document document(org.apache.hadoop.hbase.client.Result row)
Returns the Document associated with this row.
Specified by: document in interface CorpusTable
public void put(Document document)
Stores the text of Document in this CorpusTable.
Specified by: put in interface CorpusTable
public void put(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences)
Stores the List of Sentences in this table. Implementations are welcome to store this List as a complete object or as a separate set of smaller Annotations.
Specified by: put in interface CorpusTable
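The storage layout for a sentence list is left to the implementation. The sketch below contrasts the two options named above, using plain strings as hypothetical stand-ins for Sentence objects and made-up qualifier names; it does not describe how TrinidadTable itself lays out the data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical comparison of two ways to store a document's sentences in one
 * row. Strings stand in for Sentence objects; qualifier names are made up.
 */
final class SentenceLayouts {
    // Option 1: the whole list as one "complete object" cell.
    static Map<String, String> asSingleCell(List<String> sentences) {
        Map<String, String> cells = new TreeMap<>();
        cells.put("sentences", String.join("\n", sentences));
        return cells;
    }

    // Option 2: one smaller cell per sentence, keyed by zero-padded index.
    static Map<String, String> asPerSentenceCells(List<String> sentences) {
        Map<String, String> cells = new TreeMap<>();
        for (int i = 0; i < sentences.size(); i++) {
            cells.put(String.format("sentence_%06d", i), sentences.get(i));
        }
        return cells;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("First.", "Second.", "Third.");
        System.out.println(asSingleCell(doc).size());         // 1
        System.out.println(asPerSentenceCells(doc).keySet()); // [sentence_000000, sentence_000001, sentence_000002]
    }
}
```

Zero-padding the index keeps the per-sentence cells in document order, since HBase sorts qualifiers lexicographically by byte. The single-cell layout is simpler to read back in one piece, while the per-sentence layout lets a reader fetch individual sentences.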
public void putSenses(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences, String senseLabel)
Stores the List of Sentences containing only word senses in this table.
Specified by: putSenses in interface CorpusTable
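The SENSE_SENTENCE_PREFIX and SENSE_TOKEN_PREFIX fields suggest that word senses are stored under qualifiers formed from a prefix plus the senseLabel, which would let several sense inventories coexist in one row. The sketch below models that idea with an in-memory map; the prefix values and the prefix-plus-label rule are assumptions, and strings stand in for the serialized sense annotations.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: word-sense annotations kept under qualifiers built from
 * a prefix plus the caller's senseLabel. Prefix values and the "prefix + label"
 * rule are assumptions, not the documented TrinidadTable encoding.
 */
final class SenseColumns {
    static final String SENSE_SENTENCE_PREFIX = "sense_sent_"; // placeholder
    static final String SENSE_TOKEN_PREFIX = "sense_tok_";     // placeholder

    // qualifier -> serialized annotation, standing in for one row's annotation family
    private final Map<String, String> annotationFamily = new HashMap<>();

    void putSenses(String senseLabel, String sentenceLevel, String tokenLevel) {
        annotationFamily.put(SENSE_SENTENCE_PREFIX + senseLabel, sentenceLevel);
        annotationFamily.put(SENSE_TOKEN_PREFIX + senseLabel, tokenLevel);
    }

    // Returns null if no senses were stored under this label.
    String tokenSenses(String senseLabel) {
        return annotationFamily.get(SENSE_TOKEN_PREFIX + senseLabel);
    }

    public static void main(String[] args) {
        SenseColumns row = new SenseColumns();
        row.putSenses("wordnet", "s0", "bank#2");
        row.putSenses("clusters", "s0", "bank#c17");
        System.out.println(row.tokenSenses("wordnet"));  // bank#2
        System.out.println(row.tokenSenses("clusters")); // bank#c17
    }
}
```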
public void putCategories(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, Set<String> categories)
Stores the categories associated with the document indexed by key.
Specified by: putCategories in interface CorpusTable
public Set<String> getCategories(org.apache.hadoop.hbase.client.Result row)
Returns the set of categories associated with the document in row.
Specified by: getCategories in interface CorpusTable
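putCategories and getCategories move a Set<String> into and out of a single row. One possible approach, sketched below under stated assumptions (newline-separated UTF-8 text in one cell, with categories sorted for stable output), is a small codec; the encoding TrinidadTable actually uses is not shown on this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical codec for storing a document's categories in one cell.
 * The newline-separated UTF-8 format is an assumption for illustration only.
 */
final class CategoryCodec {
    private static final String SEP = "\n";

    // Encodes the set as sorted, newline-separated UTF-8 text.
    static byte[] encode(Set<String> categories) {
        for (String c : categories) {
            if (c.isEmpty() || c.contains(SEP)) {
                throw new IllegalArgumentException("Unencodable category: '" + c + "'");
            }
        }
        // Sorting makes the same set always produce the same cell bytes.
        return String.join(SEP, new TreeSet<>(categories)).getBytes(StandardCharsets.UTF_8);
    }

    // Decodes a cell; a missing or empty cell means "no categories".
    static Set<String> decode(byte[] cell) {
        Set<String> result = new LinkedHashSet<>();
        if (cell == null || cell.length == 0) {
            return result;
        }
        result.addAll(Arrays.asList(new String(cell, StandardCharsets.UTF_8).split(SEP)));
        return result;
    }

    public static void main(String[] args) {
        Set<String> in = new TreeSet<>(Arrays.asList("science", "news"));
        System.out.println(decode(encode(in))); // [news, science]
    }
}
```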
public void putLabel(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, String labelName, String labelValue)
Stores labelValue in the column specified by labelName in the row indexed by key.
Specified by: putLabel in interface CorpusTable
public String getLabel(org.apache.hadoop.hbase.client.Result row, String labelName)
Returns the label associated with column labelName inside of row.
Specified by: getLabel in interface CorpusTable
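putLabel and getLabel address one cell per label: the family is presumably LABEL_CF and the qualifier is labelName. The sketch below models a single row as nested maps to show that addressing; the LABEL_CF value and the null-for-missing behavior are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of one row, showing how a label could be
 * addressed as (LABEL_CF, labelName). Values are placeholders (assumptions).
 */
final class LabelRow {
    static final String LABEL_CF = "label"; // placeholder for TrinidadTable.LABEL_CF

    // family -> (qualifier -> value): a minimal model of one HBase row
    private final Map<String, Map<String, String>> cells = new HashMap<>();

    void putLabel(String labelName, String labelValue) {
        cells.computeIfAbsent(LABEL_CF, f -> new HashMap<>()).put(labelName, labelValue);
    }

    // Returns null when the label was never written (assumed behavior).
    String getLabel(String labelName) {
        Map<String, String> family = cells.get(LABEL_CF);
        return family == null ? null : family.get(labelName);
    }

    public static void main(String[] args) {
        LabelRow row = new LabelRow();
        row.putLabel("topic", "finance");
        System.out.println(row.getLabel("topic"));   // finance
        System.out.println(row.getLabel("missing")); // null
    }
}
```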
public boolean shouldProcessRow(org.apache.hadoop.hbase.client.Result row)
Returns true if the given row should be processed.
Specified by: shouldProcessRow in interface CorpusTable
public void markRowAsProcessed(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, org.apache.hadoop.hbase.client.Result row)
Marks the row indexed by key as having been processed.
Specified by: markRowAsProcessed in interface CorpusTable
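shouldProcessRow and markRowAsProcessed together let a job skip rows it has already handled. The sketch below shows that pattern with a hypothetical in-memory processed flag per row key; how TrinidadTable actually records the processed state is not documented here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a process-once pass. The per-key boolean stands in
 * for whatever marker TrinidadTable really writes (an assumption).
 */
final class ProcessOnce {
    // rowKey -> processed flag, in insertion order
    private final Map<String, Boolean> processed = new LinkedHashMap<>();

    void addRow(String key) {
        processed.putIfAbsent(key, false);
    }

    boolean shouldProcessRow(String key) {
        return !processed.getOrDefault(key, false);
    }

    void markRowAsProcessed(String key) {
        processed.put(key, true);
    }

    // One pass over all rows: handle only unprocessed rows, then mark them.
    List<String> runPass() {
        List<String> handled = new ArrayList<>();
        for (String key : new ArrayList<>(processed.keySet())) {
            if (shouldProcessRow(key)) {
                handled.add(key); // a real job would read, annotate and store the row here
                markRowAsProcessed(key);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        ProcessOnce table = new ProcessOnce();
        table.addRow("doc-1");
        table.addRow("doc-2");
        System.out.println(table.runPass()); // [doc-1, doc-2]
        System.out.println(table.runPass()); // []
    }
}
```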
public void close()
Closes the connection to the document reader.
Specified by: close in interface GenericTable