public interface CorpusTable
An interface for interacting with a document-based HBase table. The HBase
table should have at least three key values for each row: the raw document
text, the name of the corpus the text came from, and a dependency parse tree.
This interface gives all extraction code a fixed way to access these data
values. Each piece of data must be extractable from a Result instance, and
each Result must refer to only one document, from a single source.
DocumentReaders are often instantiated through reflection. Implementations
of all methods except setupScan should also be stateless and thread-safe:
the accessor methods will be called from multiple threads in no particular
order.
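The contract above (one document per row, stateless and thread-safe accessors) can be illustrated with a small sketch. It does not use the HBase API: a row is modeled as a plain Map from a hypothetical "family:qualifier" column name to cell bytes, standing in for Result, and the column names text:raw and meta:corpus are assumptions, not this table's actual schema. Because the reader holds no mutable fields, a single shared instance can serve many threads at once.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// A row is modeled as column name ("family:qualifier") -> cell bytes.
// This is a hypothetical stand-in for org.apache.hadoop.hbase.client.Result.
final class StatelessRowReader {
    // Hypothetical column names; a real CorpusTable defines its own schema.
    static final String TEXT_SOURCE_COL = "text:raw";
    static final String CORPUS_COL = "meta:corpus";

    // No mutable fields: each accessor depends only on its argument, so one
    // instance can be shared by any number of threads.
    String textSource(Map<String, byte[]> row) {
        return decode(row.get(TEXT_SOURCE_COL));
    }

    String sourceCorpus(Map<String, byte[]> row) {
        return decode(row.get(CORPUS_COL));
    }

    // Returns null when the column is missing from the row.
    private static String decode(byte[] cell) {
        return cell == null ? null : new String(cell, StandardCharsets.UTF_8);
    }

    // Builds a stand-in row holding one document from one source corpus.
    static Map<String, byte[]> row(String text, String corpus) {
        Map<String, byte[]> r = new HashMap<>();
        r.put(TEXT_SOURCE_COL, text.getBytes(StandardCharsets.UTF_8));
        r.put(CORPUS_COL, corpus.getBytes(StandardCharsets.UTF_8));
        return r;
    }

    // Calls the shared reader from a parallel stream: rows are visited in no
    // fixed order, but collect() returns results in input order.
    static List<String> corporaInParallel(StatelessRowReader reader,
                                          List<Map<String, byte[]>> rows) {
        return rows.parallelStream()
                   .map(reader::sourceCorpus)
                   .collect(Collectors.toList());
    }
}
```

The parallel call is safe only because the accessors keep no per-call state; an implementation that cached the last decoded row in a field would break under this usage.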
Method Summary
Modifier and Type | Method | Description |
---|---|---|
Document | document(org.apache.hadoop.hbase.client.Result row) | Returns the Document associated with this row. |
Set<String> | getCategories(org.apache.hadoop.hbase.client.Result row) | Returns the set of categories associated with the document in row. |
String | getLabel(org.apache.hadoop.hbase.client.Result row, String labelName) | Returns the label associated with column labelName in row. |
void | markRowAsProcessed(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, org.apache.hadoop.hbase.client.Result row) | Marks the row indexed by key as having been processed. |
void | put(Document document) | Stores the text of Document in this CorpusTable. |
void | put(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences) | Stores the List of Sentences in this table. |
void | putCategories(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, Set<String> categories) | Stores the categories associated with the document indexed by key. |
void | putLabel(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, String labelName, String labelValue) | Stores labelValue in the column specified by labelName in the row indexed by key. |
void | putSenses(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> senses, String senseLabel) | Stores the List of Sentences containing only word senses in this table. |
List<Sentence> | sentences(org.apache.hadoop.hbase.client.Result row) | Returns the List of Sentences stored in row. |
boolean | shouldProcessRow(org.apache.hadoop.hbase.client.Result row) | Returns true if the given row should be processed. |
String | sourceCorpus(org.apache.hadoop.hbase.client.Result row) | Returns the source corpus that this row contains. |
String | text(org.apache.hadoop.hbase.client.Result row) | Returns the cleaned text stored by the given row. |
String | textSource(org.apache.hadoop.hbase.client.Result row) | Returns the raw document text stored in row. |
String | title(org.apache.hadoop.hbase.client.Result row) | Returns the title of the document stored in row. |
List<Sentence> | wordSenses(org.apache.hadoop.hbase.client.Result row, String labelName) | Returns the List of Sentences stored in row that correspond to the word senses created with labelName. |
Methods inherited from interface gov.llnl.ontology.mapreduce.table.GenericTable |
---|
close, createTable, createTable, iterator, setupScan, setupScan, table, tableName |
Method Detail |
---|
String text(org.apache.hadoop.hbase.client.Result row)
Returns the cleaned text stored by the given row.
String textSource(org.apache.hadoop.hbase.client.Result row)
Returns the raw document text stored in row.
String title(org.apache.hadoop.hbase.client.Result row)
Returns the title of the document stored in row.
String sourceCorpus(org.apache.hadoop.hbase.client.Result row)
Returns the source corpus that this row contains.
List<Sentence> sentences(org.apache.hadoop.hbase.client.Result row)
Returns the List of Sentences stored in row. This call includes all annotations requested in the setup call to GenericTable.setupScan(org.apache.hadoop.hbase.client.Scan).
List<Sentence> wordSenses(org.apache.hadoop.hbase.client.Result row, String labelName)
Returns the List of Sentences stored in row that correspond to the word senses created with labelName.
Document document(org.apache.hadoop.hbase.client.Result row)
Returns the Document associated with this row.
void put(Document document)
Stores the text of Document in this CorpusTable.
void put(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences)
Stores the List of Sentences in this table. Implementations may store this List as a complete object or as a separate set of smaller Annotations.
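The two storage options mentioned above can be made concrete with a sketch under simplifying assumptions: a row is a Map from hypothetical column names to cell bytes (not the HBase API), and each sentence is reduced to a plain String rather than this project's Sentence and Annotation types. putWhole serializes the entire list into one cell, while putSplit writes one cell per sentence.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Row stand-in: hypothetical column name -> cell bytes (not the HBase API).
// Sentences are simplified to Strings for illustration.
final class SentenceLayouts {
    static final String WHOLE_COL = "sentences:all";  // hypothetical column
    static final String SPLIT_PREFIX = "sentence:";   // hypothetical family

    // Option 1: serialize the complete list into a single cell.
    static void putWhole(Map<String, byte[]> row, List<String> sentences) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(sentences));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        row.put(WHOLE_COL, bytes.toByteArray());
    }

    @SuppressWarnings("unchecked")
    static List<String> getWhole(Map<String, byte[]> row) {
        byte[] cell = row.get(WHOLE_COL);
        if (cell == null) {
            return Collections.emptyList();
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(cell))) {
            return (List<String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Option 2: one cell per sentence, keyed by its position in the list.
    // Assumes no cells from an earlier, longer list remain in the row.
    static void putSplit(Map<String, byte[]> row, List<String> sentences) {
        for (int i = 0; i < sentences.size(); i++) {
            row.put(SPLIT_PREFIX + i, sentences.get(i).getBytes(StandardCharsets.UTF_8));
        }
    }

    // Reads cells sentence:0, sentence:1, ... until the first missing index.
    static List<String> getSplit(Map<String, byte[]> row) {
        List<String> sentences = new ArrayList<>();
        for (int i = 0; row.containsKey(SPLIT_PREFIX + i); i++) {
            sentences.add(new String(row.get(SPLIT_PREFIX + i), StandardCharsets.UTF_8));
        }
        return sentences;
    }
}
```

The single-cell layout reads everything in one fetch but must be rewritten whole on any change; the per-sentence layout lets a reader fetch or update individual sentences at the cost of more cells per row.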
void putSenses(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> senses, String senseLabel)
Stores the List of Sentences containing only word senses in this table.
void putLabel(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, String labelName, String labelValue)
Stores labelValue in the column specified by labelName in the row indexed by key.
String getLabel(org.apache.hadoop.hbase.client.Result row, String labelName)
Returns the label associated with column labelName in row.
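The asymmetry between putLabel (which writes by row key) and getLabel (which reads from a row that has already been fetched) can be sketched with a hypothetical in-memory table. The label column family, the "family:qualifier" naming scheme, and the row(key) helper that stands in for a scan returning a Result are all assumptions for illustration, not the behavior of any real CorpusTable implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory table: row key -> (column "family:qualifier" -> cell bytes).
final class LabelTable {
    static final String LABEL_FAMILY = "label"; // hypothetical column family

    private final Map<String, Map<String, byte[]>> rows = new ConcurrentHashMap<>();

    // Each label name becomes its own qualifier under the label family.
    static String column(String labelName) {
        return LABEL_FAMILY + ":" + labelName;
    }

    // Mirrors putLabel(key, labelName, labelValue): writes by row key.
    void putLabel(String key, String labelName, String labelValue) {
        rows.computeIfAbsent(key, k -> new ConcurrentHashMap<>())
            .put(column(labelName), labelValue.getBytes(StandardCharsets.UTF_8));
    }

    // Stand-in for fetching a row, the way a scan hands back a Result.
    Map<String, byte[]> row(String key) {
        return rows.getOrDefault(key, Collections.emptyMap());
    }

    // Mirrors getLabel(row, labelName): reads from an already-fetched row,
    // returning null when the label was never stored.
    static String getLabel(Map<String, byte[]> row, String labelName) {
        byte[] cell = row.get(column(labelName));
        return cell == null ? null : new String(cell, StandardCharsets.UTF_8);
    }
}
```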
void putCategories(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, Set<String> categories)
Stores the categories associated with the document indexed by key.
Set<String> getCategories(org.apache.hadoop.hbase.client.Result row)
Returns the set of categories associated with the document in row.
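One possible layout for categories, sketched here as an assumption rather than this table's actual schema, stores each category as its own column qualifier with an empty value under a hypothetical category family. Reading the set back then amounts to collecting those qualifiers, and no delimiter escaping is needed. As in the other sketches, a row is a plain Map standing in for an HBase Result.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical layout: one empty cell per category, with the category name
// stored as the column qualifier under a "category" family.
final class CategoryColumns {
    static final String PREFIX = "category:"; // hypothetical family + separator
    private static final byte[] EMPTY = new byte[0];

    // Mirrors putCategories: adds one column per category to the row.
    static void putCategories(Map<String, byte[]> row, Set<String> categories) {
        for (String category : categories) {
            row.put(PREFIX + category, EMPTY);
        }
    }

    // Mirrors getCategories: collects every qualifier under the family and
    // ignores columns from other families.
    static Set<String> getCategories(Map<String, byte[]> row) {
        Set<String> categories = new HashSet<>();
        for (String column : row.keySet()) {
            if (column.startsWith(PREFIX)) {
                categories.add(column.substring(PREFIX.length()));
            }
        }
        return categories;
    }
}
```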
boolean shouldProcessRow(org.apache.hadoop.hbase.client.Result row)
Returns true if the given row should be processed.
void markRowAsProcessed(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, org.apache.hadoop.hbase.client.Result row)
Marks the row indexed by key as having been processed.
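A common use of shouldProcessRow and markRowAsProcessed is to make a job safe to rerun: rows that already carry a marker are skipped. The sketch below assumes a hypothetical meta:processed marker column and an in-memory table in place of HBase; for brevity its markRowAsProcessed takes only the key, whereas the interface method also receives the Result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory table: row key -> (column -> cell bytes).
final class ProcessedFlagTable {
    static final String PROCESSED_COL = "meta:processed"; // hypothetical marker column

    private final Map<String, Map<String, byte[]>> rows = new ConcurrentHashMap<>();

    void addRow(String key) {
        rows.put(key, new ConcurrentHashMap<>());
    }

    // Mirrors shouldProcessRow(row): true while the marker is absent.
    boolean shouldProcessRow(Map<String, byte[]> row) {
        return !row.containsKey(PROCESSED_COL);
    }

    // Simplified markRowAsProcessed: writes the marker under key.
    // Assumes the key was added earlier.
    void markRowAsProcessed(String key) {
        rows.get(key).put(PROCESSED_COL, new byte[] {1});
    }

    // One pass over the table; returns the keys handled in this pass.
    List<String> runPass() {
        List<String> handled = new ArrayList<>();
        for (Map.Entry<String, Map<String, byte[]>> e : rows.entrySet()) {
            if (shouldProcessRow(e.getValue())) {
                // Extraction work on e.getValue() would happen here.
                markRowAsProcessed(e.getKey());
                handled.add(e.getKey());
            }
        }
        return handled;
    }
}
```

Running runPass twice handles every row on the first pass and none on the second, which is the behavior a restartable extraction job relies on.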