java.lang.Object
org.apache.hadoop.conf.Configured
gov.llnl.ontology.mapreduce.CorpusTableMR
gov.llnl.ontology.mapreduce.ingest.IngestCorpusMR
public class IngestCorpusMR
This Map Reduce job iterates over rows in a CorpusTable and
applies sentence spans, token spans, and part of speech tags to every
element in the raw text document.
- CorpusTable: Controls access to the document table.
- SentenceDetector: Splits documents into a series of sentences.
- Tokenizer: Tokenizes words in a sentence.
- POSTagger: Applies part of speech tags to words in a sentence.
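The three annotation steps described above can be illustrated with a minimal, self-contained sketch. It does not use the real SentenceDetector, Tokenizer, or POSTagger implementations: the class `IngestSketch`, the `Span` type, and the placeholder `tag` rule are all hypothetical, and `java.text.BreakIterator` and whitespace splitting stand in for whatever tools the job is actually configured with.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the ingest pipeline; not the project's API.
final class IngestSketch {
    /** A half-open character span [start, end) into the raw document text. */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Sentence spans, using the JDK's BreakIterator as a stand-in detector. */
    static List<Span> sentenceSpans(String text) {
        List<Span> spans = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            spans.add(new Span(start, end));
        }
        return spans;
    }

    /** Token spans inside one sentence, splitting on whitespace only. */
    static List<Span> tokenSpans(String text, Span sentence) {
        List<Span> spans = new ArrayList<>();
        int i = sentence.start;
        while (i < sentence.end) {
            while (i < sentence.end && Character.isWhitespace(text.charAt(i))) i++;
            int begin = i;
            while (i < sentence.end && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > begin) spans.add(new Span(begin, i));
        }
        return spans;
    }

    /** Placeholder tagger: a real POSTagger would use a trained model. */
    static String tag(String token) {
        boolean numeric = !token.isEmpty() && token.chars().allMatch(Character::isDigit);
        return numeric ? "NUM" : "WORD";
    }

    public static void main(String[] args) {
        String doc = "Dogs bark. Cats meow.";
        for (Span s : sentenceSpans(doc)) {
            for (Span t : tokenSpans(doc, s)) {
                String token = doc.substring(t.start, t.end);
                System.out.println(t.start + "-" + t.end + " " + token + " " + tag(token));
            }
        }
    }
}
```

Storing spans as character offsets rather than copied substrings keeps the annotations tied to the raw text, which is one plausible reading of "applies sentence spans, token spans" in the description above.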
| Nested Class Summary | |
|---|---|
static class |
IngestCorpusMR.IngestCorpusMapper
This TableMapper iterates over rows in a CorpusTable and
applies sentence spans, token spans, and part of speech tags to every
element in the raw text document. |
| Nested classes/interfaces inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR |
|---|
CorpusTableMR.CorpusTableMapper<K,V> |
| Field Summary | |
|---|---|
static String |
DEFAULT_SPLITTER
|
static String |
DEFAULT_TAGGER
|
static String |
DEFAULT_TOKENIZER
|
static String |
SENTENCE_DETECTOR
The configuration key for setting the SentenceDetector. |
static String |
TAGGER
The configuration key for setting the POSTagger. |
static String |
TOKENIZER
The configuration key for setting the Tokenizer. |
| Fields inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR |
|---|
CONF_PREFIX, TABLE |
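The pairing of key constants (TOKENIZER, TAGGER, SENTENCE_DETECTOR) with DEFAULT_* constants suggests a lookup-with-fallback pattern. The sketch below illustrates that pattern only. It assumes the defaults are used when a key is unset, which this page does not state, and it uses `java.util.Properties` as a stand-in for a Hadoop `Configuration`. The key string and class name are hypothetical, since the actual constant values are not given here.

```java
import java.util.Properties;

// Hypothetical illustration of a configuration key with a default value.
final class ConfigKeySketch {
    // Assumed values; the real TOKENIZER and DEFAULT_TOKENIZER strings are not documented on this page.
    static final String TOKENIZER = "example.ingest.tokenizer";
    static final String DEFAULT_TOKENIZER = "example.DefaultTokenizer";

    /** Returns the configured tokenizer class name, or the default if none is set. */
    static String tokenizerClassName(Properties conf) {
        return conf.getProperty(TOKENIZER, DEFAULT_TOKENIZER);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(tokenizerClassName(conf));   // falls back to the default
        conf.setProperty(TOKENIZER, "my.CustomTokenizer");
        System.out.println(tokenizerClassName(conf));   // uses the explicit setting
    }
}
```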
| Constructor Summary | |
|---|---|
IngestCorpusMR()
|
|
| Method Summary | |
|---|---|
protected void |
addOptions(MRArgOptions options)
Add more command line arguments. |
protected String |
jobName()
Returns a descriptive job name for this map reduce task. |
static void |
main(String[] args)
Runs the IngestCorpusMR. |
protected Class |
mapperClass()
Returns the Class object for the Mapper task. |
protected void |
setupConfiguration(MRArgOptions options,
org.apache.hadoop.conf.Configuration conf)
Copies command line arguments to a Configuration so that
Map/Reduce jobs can utilize the values set. |
protected void |
validateOptions(MRArgOptions options)
Validates that the MRArgOptions contains a valid value for each
required option. |
| Methods inherited from class gov.llnl.ontology.mapreduce.CorpusTableMR |
|---|
mapperKeyClass, mapperValueClass, run, setupReducer |
| Methods inherited from class org.apache.hadoop.conf.Configured |
|---|
getConf, setConf |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface org.apache.hadoop.conf.Configurable |
|---|
getConf, setConf |
| Field Detail |
|---|
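public static String TAGGER
The configuration key for setting the POSTagger.
public static String SENTENCE_DETECTOR
The configuration key for setting the SentenceDetector.
public static String TOKENIZER
The configuration key for setting the Tokenizer.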
public static final String DEFAULT_SPLITTER
public static final String DEFAULT_TOKENIZER
public static final String DEFAULT_TAGGER
| Constructor Detail |
|---|
public IngestCorpusMR()
| Method Detail |
|---|
public static void main(String[] args)
throws Exception
Runs the IngestCorpusMR.
Throws: Exception
protected void addOptions(MRArgOptions options)
Add more command line arguments.
Overrides: addOptions in class CorpusTableMR
protected void validateOptions(MRArgOptions options)
Validates that the MRArgOptions contains a valid value for each
required option. By default, this does no validation.
Overrides: validateOptions in class CorpusTableMR
protected String jobName()
Returns a descriptive job name for this map reduce task.
Overrides: jobName in class CorpusTableMR
protected void setupConfiguration(MRArgOptions options,
org.apache.hadoop.conf.Configuration conf)
Copies command line arguments to a Configuration so that
Map/Reduce jobs can utilize the values set. By default, this does no
configuration.
Overrides: setupConfiguration in class CorpusTableMR
protected Class mapperClass()
Returns the Class object for the Mapper task.
Overrides: mapperClass in class CorpusTableMR
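The overridable methods above follow a template-method style: a base class drives the job and subclasses fill in hooks. The sketch below shows that shape with hypothetical stand-ins only. `JobSketch`, `IngestJobSketch`, the `prepare` method, the option name `tokenizer`, and the key strings are all assumptions, and plain `Map<String, String>` objects replace both MRArgOptions and Hadoop's Configuration. The order in which the real CorpusTableMR calls its hooks is not documented on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class standing in for CorpusTableMR.
abstract class JobSketch {
    /** Hook: check required options. Does no validation by default. */
    protected void validateOptions(Map<String, String> options) {}

    /** Hook: copy option values into the job configuration. Does nothing by default. */
    protected void setupConfiguration(Map<String, String> options, Map<String, String> conf) {}

    /** Hook: a descriptive name for the job. */
    protected abstract String jobName();

    /** Assumed driver: validates options, then builds the configuration. */
    final Map<String, String> prepare(Map<String, String> options) {
        validateOptions(options);
        Map<String, String> conf = new HashMap<>();
        setupConfiguration(options, conf);
        conf.put("job.name", jobName());
        return conf;
    }
}

// Hypothetical subclass standing in for IngestCorpusMR.
final class IngestJobSketch extends JobSketch {
    static final String TOKENIZER = "example.ingest.tokenizer";         // assumed key
    static final String DEFAULT_TOKENIZER = "example.DefaultTokenizer"; // assumed default

    @Override
    protected void validateOptions(Map<String, String> options) {
        String value = options.get("tokenizer");
        if (value != null && value.trim().isEmpty()) {
            throw new IllegalArgumentException("tokenizer option must not be blank");
        }
    }

    @Override
    protected void setupConfiguration(Map<String, String> options, Map<String, String> conf) {
        // Copy the command line value so that map tasks can read it later.
        conf.put(TOKENIZER, options.getOrDefault("tokenizer", DEFAULT_TOKENIZER));
    }

    @Override
    protected String jobName() {
        return "Ingest Corpus";
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("tokenizer", "my.Tokenizer");
        System.out.println(new IngestJobSketch().prepare(options));
    }
}
```

Copying options into the configuration matters because map tasks run in separate processes and can only see values carried by the job configuration, not the driver's command line.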