gov.llnl.ontology.mapreduce
Class CorpusTableMR

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by gov.llnl.ontology.mapreduce.CorpusTableMR
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
Direct Known Subclasses:
DependencyOccurrenceCountMR, DisambiguateMR, ExtractKeysMR, ExtractNounPairsMR, IngestCorpusMR, OneLinePerDocExtractorMR, ParsedDocExtractorMR, ParseMR, POSCountMR, SemEvalPrinter, TagDocumentMR, TagNetworkMR, TagOccurrenceMR, TagWordStatsMR, TermDocOccurrenceCountMR, TermDocumentCountMR, TokenCountMR, WordOccurrenceCountMR, WordsiMR

public abstract class CorpusTableMR
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

Author:
Keith Stevens

Nested Class Summary
static class CorpusTableMR.CorpusTableMapper<K,V>
          A simple base class for any CorpusTableMR job.
 
Field Summary
static String CONF_PREFIX
          The configuration key prefix.
static String TABLE
          The configuration key for setting the CorpusTable.
 
Constructor Summary
CorpusTableMR()
           
 
Method Summary
protected  void addOptions(MRArgOptions options)
          Add more command line arguments.
protected  String jobName()
          Returns a descriptive job name for this map reduce task.
protected abstract  Class mapperClass()
          Returns the Class object for the Mapper task.
protected  Class mapperKeyClass()
          Returns the Class object for the Mapper Key of this task.
protected  Class mapperValueClass()
          Returns the Class object for the Mapper Value of this task.
 int run(String[] args)
          
protected  void setupConfiguration(MRArgOptions options, org.apache.hadoop.conf.Configuration conf)
          Copies command line arguments to a Configuration so that Map/Reduce jobs can utilize the values set.
protected  void setupReducer(String tableName, org.apache.hadoop.mapreduce.Job job, MRArgOptions options)
          Sets up the Reducer for this job.
protected  void validateOptions(MRArgOptions options)
          Validates that the MRArgOptions contains a valid value for each required option.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

CONF_PREFIX

public static String CONF_PREFIX
The configuration key prefix.


TABLE

public static String TABLE
The configuration key for setting the CorpusTable.

Constructor Detail

CorpusTableMR

public CorpusTableMR()
Method Detail

addOptions

protected void addOptions(MRArgOptions options)
Add more command line arguments. By default, this adds no options.


jobName

protected String jobName()
Returns a descriptive job name for this map reduce task.


validateOptions

protected void validateOptions(MRArgOptions options)
Validates that the MRArgOptions contains a valid value for each required option. By default, this does no validation.


setupConfiguration

protected void setupConfiguration(MRArgOptions options,
                                  org.apache.hadoop.conf.Configuration conf)
Copies command line arguments to a Configuration so that Map/Reduce jobs can utilize the values set. By default, this does no configuration.


setupReducer

protected void setupReducer(String tableName,
                            org.apache.hadoop.mapreduce.Job job,
                            MRArgOptions options)
                     throws IOException
Sets up the Reducer for this job. By default, it is an IdentityTableReducer.

Throws:
IOException

mapperClass

protected abstract Class mapperClass()
Returns the Class object for the Mapper task.


mapperKeyClass

protected Class mapperKeyClass()
Returns the Class object for the Mapper Key of this task. By default, this returns ImmutableBytesWritable.


mapperValueClass

protected Class mapperValueClass()
Returns the Class object for the Mapper Value of this task. By default, this returns Put.


run

public int run(String[] args)
        throws IOException,
               InterruptedException,
               ClassNotFoundException

Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
IOException
InterruptedException
ClassNotFoundException
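
The protected methods above read as hooks that a subclass overrides, with run driving them. The following is a minimal sketch of that pattern, not the library's implementation. SketchTableMR and MinCountSketchMR are hypothetical stand-ins, plain maps replace MRArgOptions and Configuration so the code compiles without Hadoop, and the order in which run calls the hooks is an assumption, since this page does not document it. The option name minCount and the key sketch.minCount are likewise made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Stand-in for CorpusTableMR (NOT the real class). MRArgOptions and the
 * Hadoop Configuration are replaced by plain maps for illustration.
 */
abstract class SketchTableMR {
    /** Configuration built by the most recent run, kept for inspection. */
    Map<String, String> lastConf = new LinkedHashMap<>();

    // Hooks named after the documented protected methods; all no-ops by default.
    protected void addOptions(Map<String, String> options) { }
    protected void validateOptions(Map<String, String> options) { }
    protected void setupConfiguration(Map<String, String> options,
                                      Map<String, String> conf) { }
    // Placeholder default; the real default job name is not documented here.
    protected String jobName() { return getClass().getSimpleName(); }
    protected abstract Class<?> mapperClass();

    /** ASSUMED hook order; the real run(String[]) may differ. */
    public int run(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        addOptions(options);                    // subclass registers defaults
        for (String arg : args) {               // parse "name=value" pairs
            int eq = arg.indexOf('=');
            if (eq > 0) {
                options.put(arg.substring(0, eq), arg.substring(eq + 1));
            }
        }
        validateOptions(options);               // may throw on bad input
        Map<String, String> conf = new LinkedHashMap<>();
        setupConfiguration(options, conf);      // copy options into the conf
        lastConf = conf;
        return 0;                               // 0 signals success, as with Tool
    }
}

/** Hypothetical subclass; option and key names are invented for the sketch. */
class MinCountSketchMR extends SketchTableMR {
    static final String MIN_COUNT = "sketch.minCount"; // hypothetical conf key

    @Override
    protected void addOptions(Map<String, String> options) {
        options.put("minCount", "1");           // default when not supplied
    }

    @Override
    protected void validateOptions(Map<String, String> options) {
        // The hook returns void, so failure is signalled with an exception here.
        if (Integer.parseInt(options.get("minCount")) < 1) {
            throw new IllegalArgumentException("minCount must be >= 1");
        }
    }

    @Override
    protected void setupConfiguration(Map<String, String> options,
                                      Map<String, String> conf) {
        conf.put(MIN_COUNT, options.get("minCount"));
    }

    @Override
    protected String jobName() { return "min count sketch"; }

    @Override
    protected Class<?> mapperClass() { return Object.class; } // placeholder mapper
}
```

With the real class, a subclass would return its Mapper from mapperClass and would typically be launched through Hadoop's ToolRunner, since CorpusTableMR implements org.apache.hadoop.util.Tool.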


Copyright © 2010-2011. All Rights Reserved.