java.lang.Object
  gov.llnl.ontology.mapreduce.table.TrinidadTable
public class TrinidadTable
| Field Summary | |
|---|---|
| static String | ALL_CORPORA: A marker to request all corpora types when scanning. |
| static String | ANNOTATION_CF: The column family for the document annotations. |
| static String | ANNOTATION_SENTENCE: The column qualifier for the sentence level document annotations. |
| static String | ANNOTATION_TOKEN: The column qualifier for the token level document annotations. |
| static String | CATEGORY_COLUMN: The column name for categories that a document may fall under, if any. |
| static String | DOC_ID: The column name for the document id. |
| static String | DOC_KEY: The column name for the document key. |
| static String | LABEL_CF: The column family for word list labels associated with each document. |
| static String | META_CF: The column family for word list labels associated with each document. |
| static String | SENSE_SENTENCE_PREFIX: The column qualifier prefix for sentence level word sense annotations. |
| static String | SENSE_TOKEN_PREFIX: The column qualifier prefix for token level word sense annotations. |
| static String | SOURCE_CF: The column family for source related columns. |
| static String | SOURCE_ID: The column qualifier for the corpus id. |
| static String | SOURCE_IDCOL: The full column qualifier for the corpus id. |
| static String | SOURCE_NAME: The column qualifier for the corpus source name. |
| static String | SOURCE_NAMECOL: The full column qualifier for the corpus source name. |
| static String | TABLE_NAME: The official table name. |
| static String | TEXT_CF: The column family for the text columns. |
| static String | TEXT_ORIGINAL: The column qualifier for the original document text. |
| static String | TEXT_ORIGINAL_COL: The full column qualifier for the original document text. |
| static String | TEXT_RAW: The column qualifier for the cleaned document text. |
| static String | TEXT_RAW_COL: The full column qualifier for the cleaned document text. |
| static String | TEXT_TITLE: The column qualifier for the document title. |
| static String | TEXT_TITLE_COL: The full column qualifier for the document title. |
| static String | TEXT_TYPE: The column qualifier for the text type. |
| static String | TEXT_TYPE_COL: The full column qualifier for the text type. |
| static String | XML_MIME_TYPE: Stores the text type of any document. |
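
Several of the fields above come in pairs: a bare column qualifier (for example SOURCE_NAME) and a "full" column qualifier (SOURCE_NAMECOL). The following is a minimal sketch of how such a pair could relate, assuming the full form follows the conventional HBase "family:qualifier" notation. The ColumnNames helper and the string values in it are hypothetical placeholders; the actual constant values are not given in this document.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; it is not part of TrinidadTable.
final class ColumnNames {
    // Placeholder values standing in for SOURCE_CF and SOURCE_NAME.
    // The real constant values are not stated in this document.
    static final String SOURCE_CF = "src";
    static final String SOURCE_NAME = "name";

    private ColumnNames() {}

    // Joins a column family and a qualifier as "family:qualifier".
    static String fullColumn(String family, String qualifier) {
        return family + ":" + qualifier;
    }

    // Splits a full column name at the first ':' into {family, qualifier}.
    static String[] split(String fullColumn) {
        int idx = fullColumn.indexOf(':');
        if (idx < 0) {
            throw new IllegalArgumentException("not a family:qualifier name: " + fullColumn);
        }
        return new String[] {fullColumn.substring(0, idx), fullColumn.substring(idx + 1)};
    }

    // HBase addresses families and qualifiers as byte arrays; UTF-8 is assumed here.
    static byte[] toBytes(String name) {
        return name.getBytes(StandardCharsets.UTF_8);
    }
}
```

Under this assumption, `ColumnNames.fullColumn(SOURCE_CF, SOURCE_NAME)` would play the role of SOURCE_NAMECOL, while the bare family and qualifier are what a scan or put would pass as separate byte arrays.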
| Constructor Summary | |
|---|---|
| TrinidadTable() | Creates a new TrinidadTable that uses the default . |
| Method Summary | |
|---|---|
| void | close(): Closes the connection to the document reader. |
| void | createTable(): Creates a new instance of the HTable represented by this GenericTable. |
| void | createTable(org.apache.hadoop.hbase.client.HConnection connector): Creates a new instance of the HTable represented by this GenericTable. |
| Document | document(org.apache.hadoop.hbase.client.Result row): Returns the Document associated with this row. |
| Set<String> | getCategories(org.apache.hadoop.hbase.client.Result row): Returns the set of categories associated with the document in row. |
| String | getLabel(org.apache.hadoop.hbase.client.Result row, String labelName): Returns the label associated with column labelName inside of row. |
| Iterator<org.apache.hadoop.hbase.client.Result> | iterator(org.apache.hadoop.hbase.client.Scan scan): Returns an iterator over all of the rows accessible from this GenericTable. |
| void | markRowAsProcessed(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, org.apache.hadoop.hbase.client.Result row): Marks the row indexed by key as having been processed. |
| void | put(Document document): Stores the text of Document in this CorpusTable. |
| void | put(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences): Stores the List of Sentences in this table. |
| void | putCategories(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, Set<String> categories): Stores the categories associated with the document indexed by key. |
| void | putLabel(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, String labelName, String labelValue): Stores the labelValue in the column specified by labelName in the row indexed by key. |
| void | putSenses(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences, String senseLabel): Stores the List of Sentences containing only word senses in this table. |
| List<Sentence> | sentences(org.apache.hadoop.hbase.client.Result row): Returns the List of Sentences stored in row. |
| void | setupScan(org.apache.hadoop.hbase.client.Scan scan): Initializes a Scan such that it will request whatever columns and column families are necessary for processing as determined by the table type. |
| void | setupScan(org.apache.hadoop.hbase.client.Scan scan, String corpusName): Initializes a Scan such that it will request the columns and column families that are necessary for extracting the raw document text, dependency trees, and document source information from the specified corpusName. |
| boolean | shouldProcessRow(org.apache.hadoop.hbase.client.Result row): Returns true if the given row should be processed. |
| String | sourceCorpus(org.apache.hadoop.hbase.client.Result row): Returns the source corpus that this row contains. |
| org.apache.hadoop.hbase.client.HTable | table(): Returns the HTable instance attached to this GenericTable. |
| String | tableName(): Returns the name of the HBase Table that this GenericTable represents. |
| String | text(org.apache.hadoop.hbase.client.Result row): Returns the cleaned text stored by the given row. |
| String | textSource(org.apache.hadoop.hbase.client.Result row): Returns the raw document text stored in row. |
| String | title(org.apache.hadoop.hbase.client.Result row): Returns the title of the document stored in row. |
| List<Sentence> | wordSenses(org.apache.hadoop.hbase.client.Result row, String senseLabel): Returns the List of Sentences stored in row that correspond to the word senses created with senseLabel. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String XML_MIME_TYPE
public static final String ALL_CORPORA
public static final String TABLE_NAME
public static final String SOURCE_CF
public static final String SOURCE_NAME
public static final String SOURCE_NAMECOL
public static final String SOURCE_ID
public static final String SOURCE_IDCOL
public static final String TEXT_CF
public static final String TEXT_ORIGINAL
public static final String TEXT_ORIGINAL_COL
public static final String TEXT_TYPE
public static final String TEXT_TYPE_COL
public static final String TEXT_RAW
public static final String TEXT_RAW_COL
public static final String TEXT_TITLE
public static final String TEXT_TITLE_COL
public static final String ANNOTATION_CF
public static final String ANNOTATION_SENTENCE
public static final String ANNOTATION_TOKEN
public static final String SENSE_SENTENCE_PREFIX
public static final String SENSE_TOKEN_PREFIX
public static final String LABEL_CF
public static final String META_CF
public static final String CATEGORY_COLUMN
public static final String DOC_KEY
public static final String DOC_ID
| Constructor Detail |
|---|
public TrinidadTable()
Creates a new TrinidadTable that uses the default .
| Method Detail |
|---|
public void createTable()
Creates a new instance of the HTable represented by this GenericTable.
Specified by: createTable in interface GenericTable
public void createTable(org.apache.hadoop.hbase.client.HConnection connector)
Creates a new instance of the HTable represented by this GenericTable.
Specified by: createTable in interface GenericTable
public void setupScan(org.apache.hadoop.hbase.client.Scan scan)
Initializes a Scan such that it will request whatever columns and column families are necessary for processing as determined by the table type. This method will only be called once per job.
Specified by: setupScan in interface GenericTable
public void setupScan(org.apache.hadoop.hbase.client.Scan scan, String corpusName)
Initializes a Scan such that it will request the columns and column families that are necessary for extracting the raw document text, dependency trees, and document source information from the specified corpusName.
Specified by: setupScan in interface GenericTable
public Iterator<org.apache.hadoop.hbase.client.Result> iterator(org.apache.hadoop.hbase.client.Scan scan)
Returns an iterator over all of the rows accessible from this GenericTable.
Specified by: iterator in interface GenericTable
public String tableName()
Returns the name of the HBase Table that this GenericTable represents.
Specified by: tableName in interface GenericTable
public org.apache.hadoop.hbase.client.HTable table()
Returns the HTable instance attached to this GenericTable.
Specified by: table in interface GenericTable
public String text(org.apache.hadoop.hbase.client.Result row)
Returns the cleaned text stored by the given row.
Specified by: text in interface CorpusTable
public String textSource(org.apache.hadoop.hbase.client.Result row)
Returns the raw document text stored in row.
Specified by: textSource in interface CorpusTable
public String sourceCorpus(org.apache.hadoop.hbase.client.Result row)
Returns the source corpus that this row contains.
Specified by: sourceCorpus in interface CorpusTable
public String title(org.apache.hadoop.hbase.client.Result row)
Returns the title of the document stored in row.
Specified by: title in interface CorpusTable
public List<Sentence> sentences(org.apache.hadoop.hbase.client.Result row)
Returns the List of Sentences stored in row. This call will include all annotations requested in the setup call to GenericTable.setupScan(org.apache.hadoop.hbase.client.Scan).
Specified by: sentences in interface CorpusTable
public List<Sentence> wordSenses(org.apache.hadoop.hbase.client.Result row, String senseLabel)
Returns the List of Sentences stored in row that correspond to the word senses created with senseLabel.
Specified by: wordSenses in interface CorpusTable
public Document document(org.apache.hadoop.hbase.client.Result row)
Returns the Document associated with this row.
Specified by: document in interface CorpusTable
public void put(Document document)
Stores the text of Document in this CorpusTable.
Specified by: put in interface CorpusTable
public void put(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences)
Stores the List of Sentences in this table. Implementations may store this List as a complete object or as a separate set of smaller Annotations.
Specified by: put in interface CorpusTable
public void putSenses(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, List<Sentence> sentences, String senseLabel)
Stores the List of Sentences containing only word senses in this table.
Specified by: putSenses in interface CorpusTable
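
The senseLabel parameter, together with the SENSE_SENTENCE_PREFIX and SENSE_TOKEN_PREFIX fields, suggests that several sets of word sense annotations can coexist in one row, each stored under its own column qualifier. A minimal sketch of one plausible scheme follows, assuming the qualifier is simply the prefix followed by the sense label. The SenseQualifiers class, the prefix value, and the concatenation rule are all assumptions made for illustration; the document does not state how TrinidadTable builds these qualifiers.

```java
// Hypothetical illustration; not part of TrinidadTable.
final class SenseQualifiers {
    // Placeholder standing in for SENSE_SENTENCE_PREFIX (actual value unknown).
    static final String SENSE_SENTENCE_PREFIX = "sense.sentence.";

    private SenseQualifiers() {}

    // Builds a per-label qualifier, e.g. prefix + "wordnet" (assumed scheme).
    static String qualifierFor(String prefix, String senseLabel) {
        return prefix + senseLabel;
    }

    // Recovers the sense label from a qualifier, or returns null when the
    // qualifier does not start with the given prefix.
    static String labelOf(String prefix, String qualifier) {
        return qualifier.startsWith(prefix) ? qualifier.substring(prefix.length()) : null;
    }
}
```

With a scheme like this, a call such as putSenses(key, sentences, "wordnet") and a later wordSenses(row, "wordnet") would agree on the column as long as both pass the same senseLabel.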
public void putCategories(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, Set<String> categories)
Stores the categories associated with the document indexed by key.
Specified by: putCategories in interface CorpusTable
public Set<String> getCategories(org.apache.hadoop.hbase.client.Result row)
Returns the set of categories associated with the document in row.
Specified by: getCategories in interface CorpusTable
public void putLabel(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, String labelName, String labelValue)
Stores the labelValue in the column specified by labelName in the row indexed by key.
Specified by: putLabel in interface CorpusTable
public String getLabel(org.apache.hadoop.hbase.client.Result row, String labelName)
Returns the label associated with column labelName inside of row.
Specified by: getLabel in interface CorpusTable
public boolean shouldProcessRow(org.apache.hadoop.hbase.client.Result row)
Returns true if the given row should be processed.
Specified by: shouldProcessRow in interface CorpusTable
public void markRowAsProcessed(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, org.apache.hadoop.hbase.client.Result row)
Marks the row indexed by key as having been processed.
Specified by: markRowAsProcessed in interface CorpusTable
public void close()
Closes the connection to the document reader.
Specified by: close in interface GenericTable
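
The shouldProcessRow and markRowAsProcessed methods together imply a process-once pattern: a job checks each row before working on it and flags the row afterwards, so a later pass skips it. The sketch below models that control flow with an in-memory map. The ProcessOnceSketch class is hypothetical, String keys stand in for ImmutableBytesWritable and Result, and how TrinidadTable actually records the processed state is not described in this document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the row-tracking methods; hypothetical, for illustration.
final class ProcessOnceSketch {
    // Row key -> whether the row has already been processed.
    private final Map<String, Boolean> processed = new LinkedHashMap<>();

    // Registers a row that has not been processed yet (no-op if already known).
    void addRow(String key) {
        processed.putIfAbsent(key, Boolean.FALSE);
    }

    // Mirrors shouldProcessRow: true only for known rows that are still pending.
    boolean shouldProcessRow(String key) {
        return Boolean.FALSE.equals(processed.get(key));
    }

    // Mirrors markRowAsProcessed: flags the row so later passes skip it.
    void markRowAsProcessed(String key) {
        processed.put(key, Boolean.TRUE);
    }

    // Runs one pass over all rows and returns the keys handled in this pass.
    List<String> runPass() {
        List<String> handled = new ArrayList<>();
        for (String key : new ArrayList<>(processed.keySet())) {
            if (!shouldProcessRow(key)) {
                continue; // already handled by an earlier pass
            }
            // Per-row work (annotation, labeling, and so on) would go here.
            markRowAsProcessed(key);
            handled.add(key);
        }
        return handled;
    }
}
```

Running runPass() twice in a row handles each row once: the second pass returns an empty list until new rows are added.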