gov.llnl.ontology.mapreduce.ingest
Class IngestCorpusMR

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by gov.llnl.ontology.mapreduce.CorpusTableMR
          extended by gov.llnl.ontology.mapreduce.ingest.IngestCorpusMR
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class IngestCorpusMR
extends CorpusTableMR

This Map Reduce job iterates over rows in a CorpusTable and applies sentence spans, token spans, and part of speech tags to every element in the raw text document.
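To make the idea of sentence spans and token spans concrete, here is a minimal sketch that computes character offsets over a raw text string. It is not the mapper's implementation: it assumes `java.text.BreakIterator` as a stand-in for whichever SentenceDetector and Tokenizer are configured, the `SpanSketch` and `Span` names are hypothetical, and part of speech tagging is left out because it needs a trained model.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SpanSketch {
    /** Half-open character span [start, end) into the raw text (hypothetical type). */
    public static final class Span {
        public final int start;
        public final int end;

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Walks the iterator's boundaries over text, keeping non-whitespace pieces. */
    private static List<Span> spans(BreakIterator it, String text, int offset) {
        List<Span> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (!text.substring(start, end).trim().isEmpty()) {
                // Shift by offset so spans always index into the full document.
                out.add(new Span(offset + start, offset + end));
            }
        }
        return out;
    }

    /** Sentence spans over the whole document (stand-in for a SentenceDetector). */
    public static List<Span> sentenceSpans(String text) {
        return spans(BreakIterator.getSentenceInstance(Locale.US), text, 0);
    }

    /** Token spans inside one sentence (stand-in for a Tokenizer). */
    public static List<Span> tokenSpans(String text, Span sentence) {
        String piece = text.substring(sentence.start, sentence.end);
        return spans(BreakIterator.getWordInstance(Locale.US), piece, sentence.start);
    }

    public static void main(String[] args) {
        String text = "The cat sat. It slept.";
        for (Span sentence : sentenceSpans(text)) {
            System.out.println("sentence " + sentence);
            for (Span token : tokenSpans(text, sentence)) {
                System.out.println("  token " + token + " "
                        + text.substring(token.start, token.end));
            }
        }
    }
}
```

Storing offsets rather than copied substrings keeps every annotation tied to the same raw text, so a tagger could later attach its labels to the token spans without re-tokenizing.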

This class requires that the following types of objects be specified by the command line:

Author:
Keith Stevens

Nested Class Summary
static class IngestCorpusMR.IngestCorpusMapper
          This TableMapper iterates over rows in a CorpusTable and applies sentence spans, token spans, and part of speech tags to every element in the raw text document.
 
Nested classes/interfaces inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR
CorpusTableMR.CorpusTableMapper<K,V>
 
Field Summary
static String DEFAULT_SPLITTER
           
static String DEFAULT_TAGGER
           
static String DEFAULT_TOKENIZER
           
static String SENTENCE_DETECTOR
          The configuration key for setting the SentenceDetector.
static String TAGGER
          The configuration key for setting the part of speech tagger.
static String TOKENIZER
          The configuration key for setting the Tokenizer.
 
Fields inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR
CONF_PREFIX, TABLE
 
Constructor Summary
IngestCorpusMR()
           
 
Method Summary
protected  void addOptions(MRArgOptions options)
          Add more command line arguments.
protected  String jobName()
          Returns a descriptive job name for this map reduce task.
static void main(String[] args)
          Runs the IngestCorpusMR.
protected  Class mapperClass()
          Returns the Class object for the Mapper task.
protected  void setupConfiguration(MRArgOptions options, org.apache.hadoop.conf.Configuration conf)
          Copies command line arguments to a Configuration so that Map/Reduce jobs can utilize the values set.
protected  void validateOptions(MRArgOptions options)
          Checks that the MRArgOptions contains a valid value for each required option.
 
Methods inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR
mapperKeyClass, mapperValueClass, run, setupReducer
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

TAGGER

public static String TAGGER
The configuration key for setting the part of speech tagger.


SENTENCE_DETECTOR

public static String SENTENCE_DETECTOR
The configuration key for setting the SentenceDetector.


TOKENIZER

public static String TOKENIZER
The configuration key for setting the Tokenizer.


DEFAULT_SPLITTER

public static final String DEFAULT_SPLITTER
See Also:
Constant Field Values

DEFAULT_TOKENIZER

public static final String DEFAULT_TOKENIZER
See Also:
Constant Field Values

DEFAULT_TAGGER

public static final String DEFAULT_TAGGER
See Also:
Constant Field Values
Constructor Detail

IngestCorpusMR

public IngestCorpusMR()
Method Detail

main

public static void main(String[] args)
                 throws Exception
Runs the IngestCorpusMR.

Throws:
Exception

addOptions

protected void addOptions(MRArgOptions options)
Add more command line arguments. By default, this adds no options.

Overrides:
addOptions in class CorpusTableMR

validateOptions

protected void validateOptions(MRArgOptions options)
Checks that the MRArgOptions contains a valid value for each required option. By default, this does no validation.

Overrides:
validateOptions in class CorpusTableMR

jobName

protected String jobName()
Returns a descriptive job name for this map reduce task.

Overrides:
jobName in class CorpusTableMR

setupConfiguration

protected void setupConfiguration(MRArgOptions options,
                                  org.apache.hadoop.conf.Configuration conf)
Copies command line arguments to a Configuration so that Map/Reduce jobs can utilize the values set. By default, this does no configuration.

Overrides:
setupConfiguration in class CorpusTableMR
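The copy step described for setupConfiguration can be sketched without Hadoop on the classpath. Everything here is an assumption made for illustration: plain maps stand in for MRArgOptions and Configuration, the key strings, default class names, and option names are hypothetical (this page does not show the constant values), and falling back to the DEFAULT_* values when an option is absent is a guess about the behavior, not something the page confirms.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigCopySketch {
    // Hypothetical key strings; the real values of these constants are not listed here.
    static final String SENTENCE_DETECTOR = "example.ingest.sentenceDetector";
    static final String TOKENIZER = "example.ingest.tokenizer";
    static final String TAGGER = "example.ingest.tagger";

    // Hypothetical defaults standing in for DEFAULT_SPLITTER, DEFAULT_TOKENIZER, DEFAULT_TAGGER.
    static final String DEFAULT_SPLITTER = "example.DefaultSentenceDetector";
    static final String DEFAULT_TOKENIZER = "example.DefaultTokenizer";
    static final String DEFAULT_TAGGER = "example.DefaultTagger";

    /**
     * Copies parsed command line values into the job configuration so that
     * mapper tasks, which only see the configuration, can read them.
     * Missing options fall back to the defaults (an assumed behavior).
     */
    static void setupConfiguration(Map<String, String> options, Map<String, String> conf) {
        conf.put(SENTENCE_DETECTOR, options.getOrDefault("sentenceDetector", DEFAULT_SPLITTER));
        conf.put(TOKENIZER, options.getOrDefault("tokenizer", DEFAULT_TOKENIZER));
        conf.put(TAGGER, options.getOrDefault("tagger", DEFAULT_TAGGER));
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("tokenizer", "example.CustomTokenizer"); // only one option supplied

        Map<String, String> conf = new HashMap<>();
        setupConfiguration(options, conf);

        // A mapper would read these keys back during its setup phase.
        System.out.println(conf.get(TOKENIZER));
        System.out.println(conf.get(TAGGER));
    }
}
```

The reason for copying is that command line arguments exist only in the process that submits the job, while the configuration is what Hadoop ships to each task.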

mapperClass

protected Class mapperClass()
Returns the Class object for the Mapper task.

Specified by:
mapperClass in class CorpusTableMR


Copyright © 2010-2011. All Rights Reserved.