java.lang.Object
  org.apache.hadoop.conf.Configured
    gov.llnl.ontology.mapreduce.CorpusTableMR
      gov.llnl.ontology.mapreduce.ingest.IngestCorpusMR
public class IngestCorpusMR
This Map Reduce job iterates over rows in a CorpusTable and applies sentence spans, token spans, and part of speech tags to every element in the raw text document. The components involved are:
- CorpusTable: Controls access to the document table.
- SentenceDetector: Splits documents up into a series of sentences.
- Tokenizer: Tokenizes words in a sentence.
- POSTagger: Applies Part of Speech tags to words in a sentence.
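The three annotation steps described above can be pictured with a small, self-contained sketch. It does not use the library: `IngestSketch`, `Span`, `sentenceSpans`, `tokenSpans`, and `tag` are hypothetical names, and the splitting and tagging rules are deliberately naive stand-ins for whatever SentenceDetector, Tokenizer, and POSTagger implementations the job is configured with. The tags `NUM`, `PUNCT`, and `WORD` are toy labels, not a real tag set.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins only: none of these names come from the library.
class IngestSketch {

    /** Half-open character span [start, end) into the raw document text. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        String text(String doc) {
            return doc.substring(start, end);
        }
    }

    /** Toy sentence detector: a sentence ends at '.', '!' or '?'. */
    static List<Span> sentenceSpans(String doc) {
        List<Span> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a sentence"
        for (int i = 0; i < doc.length(); i++) {
            char c = doc.charAt(i);
            if (start < 0 && !Character.isWhitespace(c)) {
                start = i;
            }
            if (start >= 0 && (c == '.' || c == '!' || c == '?')) {
                spans.add(new Span(start, i + 1));
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new Span(start, doc.length())); // unterminated final sentence
        }
        return spans;
    }

    /** Toy tokenizer: runs of letters/digits are tokens, punctuation is one token per character. */
    static List<Span> tokenSpans(String doc, Span sentence) {
        List<Span> spans = new ArrayList<>();
        int i = sentence.start;
        while (i < sentence.end) {
            char c = doc.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                int j = i;
                while (j < sentence.end && Character.isLetterOrDigit(doc.charAt(j))) {
                    j++;
                }
                spans.add(new Span(i, j));
                i = j;
            } else {
                if (!Character.isWhitespace(c)) {
                    spans.add(new Span(i, i + 1));
                }
                i++;
            }
        }
        return spans;
    }

    /** Toy tagger with three made-up labels. */
    static String tag(String token) {
        if (token.chars().allMatch(Character::isDigit)) {
            return "NUM";
        }
        if (!Character.isLetterOrDigit(token.charAt(0))) {
            return "PUNCT";
        }
        return "WORD";
    }

    public static void main(String[] args) {
        String doc = "The cat sat. It has 2 toys!";
        for (Span s : sentenceSpans(doc)) {
            System.out.println("sentence [" + s.start + "," + s.end + ")");
            for (Span t : tokenSpans(doc, s)) {
                System.out.println("  " + t.text(doc) + "/" + tag(t.text(doc)));
            }
        }
    }
}
```

Storing spans as character offsets rather than copied substrings keeps every annotation layer aligned with the same raw text, which is presumably why the job speaks of "spans" rather than strings.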
Nested Class Summary | |
---|---|
static class | IngestCorpusMR.IngestCorpusMapper: This TableMapper iterates over rows in a CorpusTable and applies sentence spans, token spans, and part of speech tags to every element in the raw text document. |
Nested classes/interfaces inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR |
---|
CorpusTableMR.CorpusTableMapper<K,V> |
Field Summary | |
---|---|
static String | DEFAULT_SPLITTER |
static String | DEFAULT_TAGGER |
static String | DEFAULT_TOKENIZER |
static String | SENTENCE_DETECTOR: The configuration key for setting the SentenceDetector. |
static String | TAGGER: The configuration key for setting the POSTagger. |
static String | TOKENIZER: The configuration key for setting the Tokenizer. |
Fields inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR |
---|
CONF_PREFIX, TABLE |
Constructor Summary | |
---|---|
IngestCorpusMR() |
Method Summary | |
---|---|
protected void | addOptions(MRArgOptions options): Adds more command line arguments. |
protected String | jobName(): Returns a descriptive job name for this map reduce task. |
static void | main(String[] args): Runs the IngestCorpusMR. |
protected Class | mapperClass(): Returns the Class object for the Mapper task. |
protected void | setupConfiguration(MRArgOptions options, org.apache.hadoop.conf.Configuration conf): Copies command line arguments to a Configuration so that Map/Reduce jobs can utilize the values set. |
protected void | validateOptions(MRArgOptions options): Validates that the MRArgOptions contains a valid value for each required option. |
Methods inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR |
---|
mapperKeyClass, mapperValueClass, run, setupReducer |
Methods inherited from class org.apache.hadoop.conf.Configured |
---|
getConf, setConf |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.hadoop.conf.Configurable |
---|
getConf, setConf |
Field Detail |
---|
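public static String TAGGER
The configuration key for setting the POSTagger.
public static String SENTENCE_DETECTOR
The configuration key for setting the SentenceDetector.
public static String TOKENIZER
The configuration key for setting the Tokenizer.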
public static final String DEFAULT_SPLITTER
public static final String DEFAULT_TOKENIZER
public static final String DEFAULT_TAGGER
Constructor Detail |
---|
public IngestCorpusMR()
Method Detail |
---|
public static void main(String[] args) throws Exception
Runs the IngestCorpusMR.
Throws: Exception
protected void addOptions(MRArgOptions options)
Adds more command line arguments.
addOptions in class CorpusTableMR
protected void validateOptions(MRArgOptions options)
Validates that the MRArgOptions contains a valid value for each required option. By default, this does no validation.
validateOptions in class CorpusTableMR
protected String jobName()
Returns a descriptive job name for this map reduce task.
jobName in class CorpusTableMR
protected void setupConfiguration(MRArgOptions options, org.apache.hadoop.conf.Configuration conf)
Copies command line arguments to a Configuration so that Map/Reduce jobs can utilize the values set. By default, this does no configuration.
setupConfiguration in class CorpusTableMR
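As a rough illustration of what copying command line arguments into a configuration can look like, the sketch below uses plain `Map<String, String>` objects in place of both MRArgOptions and org.apache.hadoop.conf.Configuration, since neither API is shown on this page. Everything here is an assumption made for illustration: the class name `ConfigCopySketch`, the flags `-s`, `-t`, and `-p`, and all key and default strings. The real values of SENTENCE_DETECTOR, TOKENIZER, TAGGER, and the DEFAULT_* constants are not documented above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; not the library's implementation.
class ConfigCopySketch {

    // Made-up key strings; the real constant values are not shown in this page.
    static final String SENTENCE_DETECTOR = "example.ingest.sentenceDetector";
    static final String TOKENIZER = "example.ingest.tokenizer";
    static final String TAGGER = "example.ingest.tagger";

    // Made-up defaults, used when a flag is absent.
    static final String DEFAULT_SPLITTER = "example.DefaultSplitter";
    static final String DEFAULT_TOKENIZER = "example.DefaultTokenizer";
    static final String DEFAULT_TAGGER = "example.DefaultTagger";

    /** Parses "-flag value" pairs; a trailing flag with no value is ignored. */
    static Map<String, String> parseOptions(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            options.put(args[i], args[i + 1]);
        }
        return options;
    }

    /** Copies each option into the configuration, falling back to a default. */
    static void setupConfiguration(Map<String, String> options, Map<String, String> conf) {
        conf.put(SENTENCE_DETECTOR, options.getOrDefault("-s", DEFAULT_SPLITTER));
        conf.put(TOKENIZER, options.getOrDefault("-t", DEFAULT_TOKENIZER));
        conf.put(TAGGER, options.getOrDefault("-p", DEFAULT_TAGGER));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setupConfiguration(parseOptions(args), conf);
        // A mapper would later read these keys to decide which classes to load.
        conf.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With a real Hadoop Configuration the same idea would use its `set(String, String)` and `get(String)` methods. The point of the copy is that mapper tasks run in separate processes and only see what the job configuration carries.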
protected Class mapperClass()
Returns the Class object for the Mapper task.
mapperClass in class CorpusTableMR