public interface CorpusTable
An interface for interacting with a document-based HBase table. The HBase
table should have at least three key values for each row: the raw document
text, the name of the corpus from which the text came, and a dependency parse tree.
This interface gives all extraction code a fixed way to access these
data values. Each piece of data must be extractable from a Result
instance. Each Result must also refer to only one document, from a
single source.
DocumentReaders are often instantiated through reflection. Implementations
of all methods, except for setupScan, should also be stateless and
thread-safe. The accessor methods will be called from multiple threads in no
particular order.
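The statelessness requirement can be illustrated with a minimal sketch. It does not use the real HBase classes: a plain Map<String, String> stands in for a Result, and the class name, column names, and readConcurrently helper are all hypothetical. The point is only that accessors which read from their argument and keep no fields can be called from several threads in any order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a CorpusTable implementation. A Map<String, String>
// plays the role of an HBase Result so the sketch runs without HBase.
final class StatelessAccessorSketch {

    // Hypothetical column names, not taken from the real table schema.
    static final String TEXT_COLUMN = "text";
    static final String CORPUS_COLUMN = "corpus";

    // Accessors read only from their argument and touch no shared state,
    // so concurrent calls cannot interfere with each other.
    static String text(Map<String, String> row) {
        return row.get(TEXT_COLUMN);
    }

    static String sourceCorpus(Map<String, String> row) {
        return row.get(CORPUS_COLUMN);
    }

    // Calls the accessors from a small thread pool and returns the results
    // in the same order as the input rows.
    static List<String> readConcurrently(List<Map<String, String>> rows) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Map<String, String> row : rows) {
                Callable<String> task = () -> sourceCorpus(row) + ":" + text(row);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

An implementation that cached the "current row" in a field would break under this calling pattern, which is why the accessors here take everything they need as parameters.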
| Method Summary | |
|---|---|
| Document | document(org.apache.hadoop.hbase.client.Result row): Returns the Document associated with this row. |
| Set<String> | getCategories(org.apache.hadoop.hbase.client.Result row): Returns the set of categories associated with the document in row. |
| String | getLabel(org.apache.hadoop.hbase.client.Result row, String labelName): Returns the label associated with column labelName inside of row. |
| void | markRowAsProcessed(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, org.apache.hadoop.hbase.client.Result row): Marks the row indexed by key as having been processed. |
| void | put(Document document): Stores the text of Document in this CorpusTable. |
| void | put(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences): Stores the List of Sentences in this table. |
| void | putCategories(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, Set<String> categories): Stores the categories associated with the document indexed by key. |
| void | putLabel(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, String labelName, String labelValue): Stores the labelValue in the column specified by labelName in the row indexed by key. |
| void | putSenses(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> senses, String senseLabel): Stores the List of Sentences containing only word senses in this table. |
| List<Sentence> | sentences(org.apache.hadoop.hbase.client.Result row): Returns the List of Sentences stored in row. |
| boolean | shouldProcessRow(org.apache.hadoop.hbase.client.Result row): Returns true if the given row should be processed. |
| String | sourceCorpus(org.apache.hadoop.hbase.client.Result row): Returns the source corpus that this row contains. |
| String | text(org.apache.hadoop.hbase.client.Result row): Returns the cleaned text stored by the given row. |
| String | textSource(org.apache.hadoop.hbase.client.Result row): Returns the raw document text stored in row. |
| String | title(org.apache.hadoop.hbase.client.Result row): Returns the title of the document stored in row. |
| List<Sentence> | wordSenses(org.apache.hadoop.hbase.client.Result row, String labelName): Returns the List of Sentences stored in row that correspond to the word senses created with labelName. |
| Methods inherited from interface gov.llnl.ontology.mapreduce.table.GenericTable |
|---|
| close, createTable, createTable, iterator, setupScan, setupScan, table, tableName |
| Method Detail |
|---|
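String text(org.apache.hadoop.hbase.client.Result row)
Returns the cleaned text stored by the given row.
String textSource(org.apache.hadoop.hbase.client.Result row)
Returns the raw document text stored in row.
String title(org.apache.hadoop.hbase.client.Result row)
Returns the title of the document stored in row.
String sourceCorpus(org.apache.hadoop.hbase.client.Result row)
Returns the source corpus that this row contains.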
List<Sentence> sentences(org.apache.hadoop.hbase.client.Result row)
Returns the List of Sentences stored in row.
This call will include all annotations requested in the setup call to
GenericTable.setupScan(org.apache.hadoop.hbase.client.Scan).
List<Sentence> wordSenses(org.apache.hadoop.hbase.client.Result row,
String labelName)
Returns the List of Sentences stored in row that
correspond to the word senses created with labelName.
Document document(org.apache.hadoop.hbase.client.Result row)
Returns the Document associated with this row.
void put(Document document)
Stores the text of Document in this CorpusTable.
void put(org.apache.hadoop.hbase.io.ImmutableBytesWritable key,
List<Sentence> sentences)
Stores the List of Sentences in this table.
Implementations are welcome to store this List as a complete
object or as a separate set of smaller Annotations.
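The two storage choices mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: Strings stand in for Sentence objects, a Map of column name to cell value stands in for one table row, and the column names and the newline-joined encoding are hypothetical rather than anything the interface prescribes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Strings stand in for Sentence objects and a Map of
// column name to cell value stands in for one HBase row.
final class SentenceLayoutSketch {

    // Layout A: the whole list is stored in a single cell.
    // Assumes a non-empty list whose sentences contain no newline characters.
    static Map<String, String> asSingleCell(List<String> sentences) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("sentences", String.join("\n", sentences));
        return row;
    }

    // Layout B: each sentence gets its own cell, keyed by position.
    static Map<String, String> asSeparateCells(List<String> sentences) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < sentences.size(); i++) {
            row.put("sentence:" + i, sentences.get(i));
        }
        return row;
    }

    // Reads Layout A back into a list.
    static List<String> readSingleCell(Map<String, String> row) {
        List<String> out = new ArrayList<>();
        for (String sentence : row.get("sentences").split("\n")) {
            out.add(sentence);
        }
        return out;
    }

    // Reads Layout B back into a list by walking the positional keys.
    static List<String> readSeparateCells(Map<String, String> row) {
        List<String> out = new ArrayList<>();
        for (int i = 0; row.containsKey("sentence:" + i); i++) {
            out.add(row.get("sentence:" + i));
        }
        return out;
    }
}
```

Either layout is invisible to callers as long as the matching reader returns the same list, which is the freedom the interface leaves to implementations.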
void putSenses(org.apache.hadoop.hbase.io.ImmutableBytesWritable key,
List<Sentence> senses,
String senseLabel)
Stores the List of Sentences containing only word senses
in this table.
void putLabel(org.apache.hadoop.hbase.io.ImmutableBytesWritable key,
String labelName,
String labelValue)
Stores the labelValue in the column specified by labelName in the row indexed by key.
String getLabel(org.apache.hadoop.hbase.client.Result row,
String labelName)
Returns the label associated with column labelName inside of
row.
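The relationship between putLabel and getLabel can be sketched as a simple round trip. The class below is a hypothetical in-memory stand-in, not the real implementation: String row keys replace ImmutableBytesWritable, a Map replaces Result, and fetchRow is an invented helper that plays the part of reading a row back from the table. Returning null for a missing label is a choice of this sketch, not documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the label methods of a CorpusTable.
final class LabelRoundTripSketch {

    // Row key -> (column name -> value); stands in for the HBase table.
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Writes labelValue into the column named labelName of the row at key.
    void putLabel(String key, String labelName, String labelValue) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).put(labelName, labelValue);
    }

    // Invented helper: stands in for fetching a Result from the table.
    Map<String, String> fetchRow(String key) {
        return rows.getOrDefault(key, new HashMap<>());
    }

    // Reads the label back from an already-fetched row. Like the accessors
    // of the interface, it depends only on its arguments.
    static String getLabel(Map<String, String> row, String labelName) {
        return row.get(labelName); // null when the column is absent (sketch only)
    }
}
```

Note the asymmetry that the interface also has: writes are addressed by row key, while reads operate on a row that has already been retrieved.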
void putCategories(org.apache.hadoop.hbase.io.ImmutableBytesWritable key,
Set<String> categories)
Stores the categories associated with the document indexed by
key.
Set<String> getCategories(org.apache.hadoop.hbase.client.Result row)
Returns the set of categories associated with the document in
row.
boolean shouldProcessRow(org.apache.hadoop.hbase.client.Result row)
Returns true if the given row should be processed.
void markRowAsProcessed(org.apache.hadoop.hbase.io.ImmutableBytesWritable key,
org.apache.hadoop.hbase.client.Result row)
Marks the row indexed by key as having been processed.
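One plausible way to combine shouldProcessRow and markRowAsProcessed is a check-then-mark loop, sketched below. The source does not say how the two methods are connected, so the "processed" flag column, the String row keys, the Map-based rows, and the processPending helper are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch of a check-then-mark loop over table rows. Maps stand
// in for the HBase table and its Result rows.
final class ProcessedFlagSketch {

    // Hypothetical flag column; the real schema is not given in the source.
    static final String PROCESSED_COLUMN = "processed";

    // A row is pending unless its flag column holds "true".
    static boolean shouldProcessRow(Map<String, String> row) {
        return !"true".equals(row.get(PROCESSED_COLUMN));
    }

    // Sets the flag on the row stored under key.
    static void markRowAsProcessed(Map<String, Map<String, String>> table, String key) {
        table.get(key).put(PROCESSED_COLUMN, "true");
    }

    // Visits every row, handles the pending ones, and marks each as done.
    // Returns the number of rows handled in this pass.
    static int processPending(Map<String, Map<String, String>> table) {
        int handled = 0;
        for (Map.Entry<String, Map<String, String>> entry : table.entrySet()) {
            if (shouldProcessRow(entry.getValue())) {
                // Extraction work on the row would go here.
                markRowAsProcessed(table, entry.getKey());
                handled++;
            }
        }
        return handled;
    }
}
```

Under this reading, running the loop a second time over the same table handles nothing, which makes a repeated job cheap to rerun.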