gov.llnl.ontology.mapreduce.table
Class WordNetEvidenceTable

java.lang.Object
  extended by gov.llnl.ontology.mapreduce.table.WordNetEvidenceTable
All Implemented Interfaces:
EvidenceTable, GenericTable

public class WordNetEvidenceTable
extends Object
implements EvidenceTable

This class documents the schema of the WordNet Evidence table. Only word pairs in which both terms exist in WordNet should be entered into the table.

Author:
Keith Stevens

Field Summary
static String ALL_CORPORA
          A marker to request all corpora types when scanning.
static String CLASS_CF
          The column family name for the class family.
static String CLUSTER_SIMILARITY
          The column family name for the cluster based similarity column family.
static String COSINE_SIMILARITY
          The column family name for the cosine based similarity column family.
static String COUSIN_EVIDENCE
          The column name for the coordinate evidence class.
static String DEPENDENCY_FEATURE_CF
          The column family name for the dependency features.
static String DEPENDENCY_PATH_ANNOTATION_NAME
          The annotation name for dependency path counts.
static String EUCLIDEAN_SIMILARITY
          The column family name for the euclidean based similarity column family.
static String HYPERNYM_EVIDENCE
          The column name for the hypernym evidence class.
static String KL_SIMILARITY
          The column family name for the kl-divergence based similarity column family.
static String LIN_SIMILARITY
          The column family name for the Lin based similarity column family.
static String LSH_CLUSTER_SIMILARITY
          The column name for clusters of similarity lists generated via Locality Sensitive Hashing.
static String NOUN_PAIR_CF
          The column family name for the noun pair for each row.
static String NOUN_PAIR_COLUMN
          The column name for the noun pair.
static String SIMILARITY_CF
          The column family name for any similarity measurements between two noun pairs.
static String TABLE_NAME
          The table name for this schema.
 
Constructor Summary
WordNetEvidenceTable()
           
 
Method Summary
 String classColumnFamily()
          Returns the string name of the class column family.
 byte[] classColumnFamilyBytes()
          Returns the name of the class column family as a byte array.
 void close()
          Closes the connection to the document reader.
 String cousinColumn()
          Returns the column name for cousin class labels.
 byte[] cousinColumnBytes()
          Returns the column name for cousin class labels as a byte array.
 void createTable()
          Creates a new instance of the HTable represented by this GenericTable
 void createTable(org.apache.hadoop.hbase.client.HConnection connector)
          Creates a new instance of the HTable represented by this GenericTable
 String dependencyColumnFamily()
          Returns the string name of the dependency path column family.
 byte[] dependencyColumnFamilyBytes()
          Returns the name of the dependency path column family as a byte array.
 Counter<String> getDependencyPaths(org.apache.hadoop.hbase.client.Result row)
          Returns a new map that contains all of the dependency path counts, regardless of their source.
 Counter<String> getDependencyPaths(org.apache.hadoop.hbase.client.Result row, String source)
          Returns a map that contains all of the dependency paths associated with a single noun pair.
 SynsetRelations.HypernymStatus getHypernymStatus(org.apache.hadoop.hbase.client.Result row)
          Retrieves the SynsetRelations.HypernymStatus for the given Result.
 String hypernymColumn()
          Returns the column name for hypernym class labels.
 byte[] hypernymColumnBytes()
          Returns the column name for hypernym class labels as a byte array.
 Iterator<org.apache.hadoop.hbase.client.Result> iterator(org.apache.hadoop.hbase.client.Scan scan)
          Returns an iterator over all of the rows accessible from this GenericTable.
 StringPair nounPair(org.apache.hadoop.hbase.client.Result row)
          Returns a StringPair for the noun pair held in the given Result.
 void putDependencyPaths(String word1, String word2, String source, Counter<String> pathCounts)
          Stores the dependency path counts gathered from the source corpus using the provided Put object.
 void putHypernymStatus(org.apache.hadoop.hbase.io.ImmutableBytesWritable key, SynsetRelations.HypernymStatus status)
          Stores the SynsetRelations.HypernymStatus using the given key.
 void setupScan(org.apache.hadoop.hbase.client.Scan scan)
          Initializes a Scan such that it will request whatever columns and column families are necessary for processing as determined by the table type.
 void setupScan(org.apache.hadoop.hbase.client.Scan scan, String corpusName)
          Initializes a Scan such that it will request the columns and column families necessary for extracting the raw document text, dependency trees, and document source information from the specified corpusName.
 org.apache.hadoop.hbase.client.HTable table()
          Returns the HTable instance attached to this GenericTable.
 String tableName()
          Returns the name of the HBase Table that this GenericTable represents.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ALL_CORPORA

public static final String ALL_CORPORA
A marker to request all corpora types when scanning.

See Also:
Constant Field Values

TABLE_NAME

public static final String TABLE_NAME
The table name for this schema.

See Also:
Constant Field Values

NOUN_PAIR_CF

public static final String NOUN_PAIR_CF
The column family name for the noun pair for each row.

See Also:
Constant Field Values

NOUN_PAIR_COLUMN

public static final String NOUN_PAIR_COLUMN
The column name for the noun pair.

See Also:
Constant Field Values

DEPENDENCY_FEATURE_CF

public static final String DEPENDENCY_FEATURE_CF
The column family name for the dependency features. Column names will be based on the source of the corpus for each dependency feature.

See Also:
Constant Field Values

CLASS_CF

public static final String CLASS_CF
The column family name for the class family.

See Also:
Constant Field Values

HYPERNYM_EVIDENCE

public static final String HYPERNYM_EVIDENCE
The column name for the hypernym evidence class. Positive values are marked as "KNOWN_HYPERNYM" and negative values are marked as "KNOWN_NON_HYPERNYM". Pairs that serve as potential additions to WordNet are marked as NOVEL_{HYPERNYM|HYPONYM}. Use WordNetEvidence#HypernymStatus to convert values in this column to the appropriate enum.

See Also:
Constant Field Values
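
As a rough illustration of how the labels stored in this column could be mapped back to an enum, the sketch below uses a hypothetical HypernymLabel enum. It is a stand-in and not the project's actual status enum; only the four label strings are taken from the description above, and the real enum may define different or additional constants.

```java
import java.util.Optional;

/** Hypothetical stand-in for the project's status enum; not the real class. */
enum HypernymLabel {
    KNOWN_HYPERNYM, KNOWN_NON_HYPERNYM, NOVEL_HYPERNYM, NOVEL_HYPONYM;

    /** Converts a stored cell value to a label, or empty if it is unrecognized. */
    static Optional<HypernymLabel> fromCell(String cellValue) {
        if (cellValue == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(valueOf(cellValue.trim()));
        } catch (IllegalArgumentException e) {
            // The string does not name any of the four labels.
            return Optional.empty();
        }
    }

    /** True for labels describing relations that are already settled by WordNet. */
    boolean isKnown() {
        return this == KNOWN_HYPERNYM || this == KNOWN_NON_HYPERNYM;
    }
}
```

Returning an Optional rather than throwing keeps a scan over many rows from failing on a single malformed cell; whether the actual class behaves this way is not stated in this documentation.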

COUSIN_EVIDENCE

public static final String COUSIN_EVIDENCE
The column name for the coordinate evidence class. Values are stored as a pair of integers separated by a hyphen, such as "m-n". Cousins share some ancestor in the WordNet hierarchy, where m specifies the distance between the first term and the common ancestor and n specifies the distance between the second term and the common ancestor. A distance of Integer.MAX_VALUE signifies that the common ancestor is beyond a particular depth, most likely 7.

See Also:
Constant Field Values
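
The "m-n" format described for COUSIN_EVIDENCE can be decoded with a few lines of plain Java. The following is a minimal sketch, assuming both distances are non-negative decimal integers; the CousinDistance class and its method names are hypothetical and are not part of the documented API.

```java
/** Hypothetical holder for a parsed "m-n" cousin value; not part of the documented API. */
final class CousinDistance {
    final int first;   // distance from the first term to the common ancestor
    final int second;  // distance from the second term to the common ancestor

    CousinDistance(int first, int second) {
        this.first = first;
        this.second = second;
    }

    /** Parses a value such as "2-3"; assumes both distances are non-negative. */
    static CousinDistance parse(String value) {
        int hyphen = value.indexOf('-');
        if (hyphen <= 0 || hyphen == value.length() - 1) {
            throw new IllegalArgumentException("Expected m-n but got: " + value);
        }
        int m = Integer.parseInt(value.substring(0, hyphen));
        int n = Integer.parseInt(value.substring(hyphen + 1));
        if (n < 0) {
            throw new IllegalArgumentException("Negative distance in: " + value);
        }
        return new CousinDistance(m, n);
    }

    /** True when either distance is the Integer.MAX_VALUE sentinel. */
    boolean beyondDepthLimit() {
        return first == Integer.MAX_VALUE || second == Integer.MAX_VALUE;
    }
}
```

A caller might treat pairs for which beyondDepthLimit() returns true as having no usable cousin relation, though how the project itself interprets the sentinel is not specified here.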

SIMILARITY_CF

public static final String SIMILARITY_CF
The column family name for any similarity measurements between two noun pairs.

See Also:
Constant Field Values

CLUSTER_SIMILARITY

public static final String CLUSTER_SIMILARITY
The column family name for the cluster based similarity column family. All values are stored as doubles.

See Also:
Constant Field Values

LSH_CLUSTER_SIMILARITY

public static final String LSH_CLUSTER_SIMILARITY
The column name for clusters of similarity lists generated via Locality Sensitive Hashing.

See Also:
Constant Field Values

COSINE_SIMILARITY

public static final String COSINE_SIMILARITY
The column family name for the cosine based similarity column family. All values are stored as doubles.

See Also:
Constant Field Values
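
For readers unfamiliar with the measure, the sketch below computes a cosine similarity between two feature vectors and round-trips the result through an 8-byte array. The byte layout (big-endian IEEE 754, the ByteBuffer default) is an assumption about how a double might be stored in a cell; this documentation does not say which encoding the table uses, and the CosineSketch class is hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper; illustrates the measure, not the project's implementation. */
final class CosineSketch {
    /** Cosine of the angle between a and b; returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have equal length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Encodes a double as 8 big-endian bytes (assumed storage format). */
    static byte[] encode(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    /** Decodes 8 big-endian bytes back into a double. */
    static double decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble();
    }
}
```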

EUCLIDEAN_SIMILARITY

public static final String EUCLIDEAN_SIMILARITY
The column family name for the euclidean based similarity column family. All values are stored as doubles.

See Also:
Constant Field Values

KL_SIMILARITY

public static final String KL_SIMILARITY
The column family name for the kl-divergence based similarity column family. All values are stored as doubles. Note that this metric is not symmetric.

See Also:
Constant Field Values
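
The asymmetry noted for KL_SIMILARITY is easy to see numerically. The sketch below computes the KL divergence in both directions for two small distributions; the KlSketch class is hypothetical, and how the project turns a divergence into a stored similarity value is not described in this documentation.

```java
/** Hypothetical helper showing that KL divergence depends on argument order. */
final class KlSketch {
    /**
     * Computes D(p || q) in nats. Assumes p and q are probability
     * distributions over the same events and q[i] > 0 wherever p[i] > 0.
     */
    static double klDivergence(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("Distributions must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                sum += p[i] * Math.log(p[i] / q[i]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.9, 0.1};
        double[] q = {0.5, 0.5};
        // The two directions give different values for these inputs.
        System.out.println("D(p||q) = " + klDivergence(p, q));
        System.out.println("D(q||p) = " + klDivergence(q, p));
    }
}
```

Because the two directions differ, the order of the terms in a noun pair matters when reading or writing values under this column.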

LIN_SIMILARITY

public static final String LIN_SIMILARITY
The column family name for the Lin based similarity column family. All values are stored as doubles.

See Also:
Constant Field Values

DEPENDENCY_PATH_ANNOTATION_NAME

public static final String DEPENDENCY_PATH_ANNOTATION_NAME
The annotation name for dependency path counts.

See Also:
Constant Field Values
Constructor Detail

WordNetEvidenceTable

public WordNetEvidenceTable()
Method Detail

tableName

public String tableName()
Returns the name of the HBase Table that this GenericTable represents.

Specified by:
tableName in interface GenericTable

classColumnFamily

public String classColumnFamily()
Returns the string name of the class column family.

Specified by:
classColumnFamily in interface EvidenceTable

classColumnFamilyBytes

public byte[] classColumnFamilyBytes()
Returns the name of the class column family as a byte array.

Specified by:
classColumnFamilyBytes in interface EvidenceTable

dependencyColumnFamily

public String dependencyColumnFamily()
Returns the string name of the dependency path column family.

Specified by:
dependencyColumnFamily in interface EvidenceTable

dependencyColumnFamilyBytes

public byte[] dependencyColumnFamilyBytes()
Returns the name of the dependency path column family as a byte array.

Specified by:
dependencyColumnFamilyBytes in interface EvidenceTable

hypernymColumn

public String hypernymColumn()
Returns the column name for hypernym class labels.

Specified by:
hypernymColumn in interface EvidenceTable

hypernymColumnBytes

public byte[] hypernymColumnBytes()
Returns the column name for hypernym class labels as a byte array.

Specified by:
hypernymColumnBytes in interface EvidenceTable

cousinColumn

public String cousinColumn()
Returns the column name for cousin class labels.

Specified by:
cousinColumn in interface EvidenceTable

cousinColumnBytes

public byte[] cousinColumnBytes()
Returns the column name for cousin class labels as a byte array.

Specified by:
cousinColumnBytes in interface EvidenceTable

createTable

public void createTable()
Creates a new instance of the HTable represented by this GenericTable

Specified by:
createTable in interface GenericTable

createTable

public void createTable(org.apache.hadoop.hbase.client.HConnection connector)
Creates a new instance of the HTable represented by this GenericTable

Specified by:
createTable in interface GenericTable

setupScan

public void setupScan(org.apache.hadoop.hbase.client.Scan scan)
Initializes a Scan such that it will request whatever columns and column families are necessary for processing as determined by the table type. This method will only be called once per job.

Specified by:
setupScan in interface GenericTable

setupScan

public void setupScan(org.apache.hadoop.hbase.client.Scan scan,
                      String corpusName)
Initializes a Scan such that it will request the columns and column families necessary for extracting the raw document text, dependency trees, and document source information from the specified corpusName.

Specified by:
setupScan in interface GenericTable

iterator

public Iterator<org.apache.hadoop.hbase.client.Result> iterator(org.apache.hadoop.hbase.client.Scan scan)
Returns an iterator over all of the rows accessible from this GenericTable.

Specified by:
iterator in interface GenericTable

table

public org.apache.hadoop.hbase.client.HTable table()
Returns the HTable instance attached to this GenericTable.

Specified by:
table in interface GenericTable

nounPair

public StringPair nounPair(org.apache.hadoop.hbase.client.Result row)
Returns a StringPair for the noun pair held in the given Result.

Specified by:
nounPair in interface EvidenceTable

getDependencyPaths

public Counter<String> getDependencyPaths(org.apache.hadoop.hbase.client.Result row)
Returns a new map that contains all of the dependency path counts, regardless of their source.

Specified by:
getDependencyPaths in interface EvidenceTable
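
To make the "regardless of their source" behavior concrete, the sketch below merges per-source path counts into a single map. It is an illustration only: it uses plain java.util maps in place of Counter<String> and an in-memory structure in place of an HBase Result, and it assumes that counts for the same path are summed across sources, which this documentation does not confirm. The PathCountMerge class and the example path strings are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; stands in for reading every source column of one row. */
final class PathCountMerge {
    /**
     * Merges counts keyed by source name (outer map) and dependency path
     * (inner map) into one map keyed by path, summing counts for paths
     * that appear under more than one source.
     */
    static Map<String, Integer> mergeAllSources(
            Map<String, Map<String, Integer>> countsBySource) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> counts : countsBySource.values()) {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return merged;
    }
}
```

The two-argument overload documented next would correspond to looking up a single inner map by its source name rather than merging all of them.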

getDependencyPaths

public Counter<String> getDependencyPaths(org.apache.hadoop.hbase.client.Result row,
                                          String source)
Returns a map that contains all of the dependency paths associated with a single noun pair.

Specified by:
getDependencyPaths in interface EvidenceTable

putDependencyPaths

public void putDependencyPaths(String word1,
                               String word2,
                               String source,
                               Counter<String> pathCounts)
Stores the dependency path counts gathered from the source corpus using the provided Put object.

Specified by:
putDependencyPaths in interface EvidenceTable

getHypernymStatus

public SynsetRelations.HypernymStatus getHypernymStatus(org.apache.hadoop.hbase.client.Result row)
Retrieves the SynsetRelations.HypernymStatus for the given Result. The status will be the same across all corpora.

Specified by:
getHypernymStatus in interface EvidenceTable

putHypernymStatus

public void putHypernymStatus(org.apache.hadoop.hbase.io.ImmutableBytesWritable key,
                              SynsetRelations.HypernymStatus status)
Stores the SynsetRelations.HypernymStatus using the given key. The status will be the same across all corpora.

Specified by:
putHypernymStatus in interface EvidenceTable

close

public void close()
Description copied from interface: GenericTable
Closes the connection to the document reader.

Specified by:
close in interface GenericTable


Copyright © 2010-2011. All Rights Reserved.