gov.llnl.ontology.mapreduce.ingest
Class ParseMR

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by gov.llnl.ontology.mapreduce.CorpusTableMR
          extended by gov.llnl.ontology.mapreduce.ingest.ParseMR
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class ParseMR
extends CorpusTableMR

This Map Reduce job iterates over rows in a CorpusTable and produces a dependency parse tree for each sentence in each document. These parse trees are then stored as an annotation in the CorpusTable.

This class requires that the following types of objects be specified by the command line:

Author:
Keith Stevens

Nested Class Summary
static class ParseMR.ParseMapper
          This TableMapper does all of the work.
 
Nested classes/interfaces inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR
CorpusTableMR.CorpusTableMapper<K,V>
 
Field Summary
static String PARSER
          The configuration key for setting the Parser.
 
Fields inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR
CONF_PREFIX, TABLE
 
Constructor Summary
ParseMR()
           
 
Method Summary
protected  void addOptions(MRArgOptions options)
          Add more command line arguments.
protected  String jobName()
          Returns a descriptive job name for this map reduce task.
static void main(String[] args)
          Runs the ParseMR.
protected  Class mapperClass()
          Returns the Class object for the Mapper task.
protected  void setupConfiguration(MRArgOptions options, org.apache.hadoop.conf.Configuration conf)
          Copies command line arguments to a Configuration so that Map/Reduce jobs can utilize the values set.
protected  void validateOptions(MRArgOptions options)
          Validates that the MRArgOptions contains a valid value for each required option.
 
Methods inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR
mapperKeyClass, mapperValueClass, run, setupReducer
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

PARSER

public static String PARSER
The configuration key for setting the Parser.

Constructor Detail

ParseMR

public ParseMR()
Method Detail

main

public static void main(String[] args)
                 throws Exception
Runs the ParseMR.

Throws:
Exception

addOptions

protected void addOptions(MRArgOptions options)
Add more command line arguments. By default, this adds no options.

Overrides:
addOptions in class CorpusTableMR

validateOptions

protected void validateOptions(MRArgOptions options)
Validates that the MRArgOptions contains a valid value for each required option. By default, this does no validation.

Overrides:
validateOptions in class CorpusTableMR
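
Example:
A minimal sketch of what an overriding validation step might look like. It does not use the real MRArgOptions API: a plain Set of option names stands in for the parsed command line, the required option name "parser" is hypothetical, and throwing IllegalArgumentException is an assumed way to signal a missing option, since the method is declared void.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in; the real method receives an MRArgOptions instance.
class ValidateOptionsSketch {
    // Assumed list of required option names (not taken from ParseMR).
    static final List<String> REQUIRED = Arrays.asList("parser");

    // Throws if any required option is absent from the provided option names.
    static void validateOptions(Set<String> provided) {
        for (String name : REQUIRED) {
            if (!provided.contains(name)) {
                throw new IllegalArgumentException("Missing required option: " + name);
            }
        }
    }

    public static void main(String[] args) {
        // Passes silently because the assumed required option is present.
        validateOptions(new HashSet<>(Arrays.asList("parser", "table")));
        System.out.println("options are valid");
    }
}
```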

jobName

protected String jobName()
Returns a descriptive job name for this map reduce task.

Overrides:
jobName in class CorpusTableMR

setupConfiguration

protected void setupConfiguration(MRArgOptions options,
                                  org.apache.hadoop.conf.Configuration conf)
Copies command line arguments to a Configuration so that Map/Reduce jobs can utilize the values set. By default, this does no configuration.

Overrides:
setupConfiguration in class CorpusTableMR
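
Example:
A minimal sketch of copying a command line value into a configuration so that map tasks can read it later. Plain Maps stand in for MRArgOptions and org.apache.hadoop.conf.Configuration, and both the option name "parser" and the key string assigned to PARSER here are assumptions made for illustration; the actual value of ParseMR.PARSER is not documented on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; the real job works with MRArgOptions and a Hadoop Configuration.
class SetupConfigurationSketch {
    // Assumed key string, used only for this illustration.
    static final String PARSER = "example.parseMR.parser";

    // Copies the parser option, when present, from the parsed options into the configuration.
    static void setupConfiguration(Map<String, String> options, Map<String, String> conf) {
        String parser = options.get("parser");
        if (parser != null) {
            conf.put(PARSER, parser);
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("parser", "com.example.MyParser"); // hypothetical class name
        Map<String, String> conf = new HashMap<>();
        setupConfiguration(options, conf);
        // A map task would later look the value up under the same key.
        System.out.println(conf.get(PARSER));
    }
}
```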

mapperClass

protected Class mapperClass()
Returns the Class object for the Mapper task.

Specified by:
mapperClass in class CorpusTableMR
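
Example:
The methods above follow a template-method layout: the base class supplies no-op hooks and one abstract method, and a subclass such as ParseMR overrides them. The sketch below illustrates that layout with hypothetical stand-in classes only. The hook order inside run() is an assumption, since the behavior of CorpusTableMR.run is not described on this page, and the job name string is invented.

```java
// Hypothetical subclass standing in for ParseMR.
class ParseJobSketch extends JobTemplateSketch {
    // Stand-in for ParseMR.ParseMapper.
    static class ParseMapperSketch { }

    @Override
    protected String jobName() {
        return "parse job"; // invented name, for illustration only
    }

    @Override
    protected Class<?> mapperClass() {
        return ParseMapperSketch.class;
    }

    public static void main(String[] args) {
        System.out.println(new ParseJobSketch().run());
    }
}

// Hypothetical stand-in for CorpusTableMR.
abstract class JobTemplateSketch {
    // Hooks that do nothing by default, mirroring the "By default" notes above.
    protected void addOptions() { }
    protected void validateOptions() { }
    protected void setupConfiguration() { }

    protected String jobName() {
        return "generic job";
    }

    // The one method every subclass must supply.
    protected abstract Class<?> mapperClass();

    // Assumed ordering of the hooks; the real run method may differ.
    final String run() {
        addOptions();
        validateOptions();
        setupConfiguration();
        return jobName() + " using " + mapperClass().getSimpleName();
    }
}
```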


Copyright © 2010-2011. All Rights Reserved.