java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
      gov.llnl.ontology.text.hbase.XMLInputFormat
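public class XMLInputFormat
extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>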
A FileInputFormat for gzipped tarball files in which each
internal file contains the data for a single document.
| Constructor Summary |
|---|
| XMLInputFormat() |
| Method Summary | | |
|---|---|---|
| org.apache.hadoop.mapreduce.RecordReader | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) | Returns an XMLRecordReader. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext job) | Generate the list of files and make them into FileSplits. |
| static void | setXMLTags(org.apache.hadoop.mapreduce.Job job, String delimiter) | |
| Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
|---|
| addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public XMLInputFormat()
| Method Detail |
|---|
public static void setXMLTags(org.apache.hadoop.mapreduce.Job job,
String delimiter)
public org.apache.hadoop.mapreduce.RecordReader createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext context)
throws IOException,
InterruptedException
Returns an XMLRecordReader. The record reader will return
each tarred file.
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException
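The record reader is described as returning each tarred file as one record. The source of XMLRecordReader is not shown here, so the following is only a minimal sketch of that idea using the Java standard library: it walks the 512-byte tar headers inside a gzip stream and yields one document per regular file. The class name `TarGzDocuments` and the method `readAll` are hypothetical, and the sketch assumes plain ustar headers with octal sizes and documents small enough to hold in memory.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper, not part of XMLInputFormat: one record per file in a .tar.gz. */
final class TarGzDocuments {
    private static final int BLOCK = 512; // tar headers and padding use 512-byte blocks

    /** Returns entry name -> entry text, in archive order. */
    static Map<String, String> readAll(InputStream gzippedTar) throws IOException {
        Map<String, String> documents = new LinkedHashMap<>();
        DataInputStream in = new DataInputStream(new GZIPInputStream(gzippedTar));
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                in.readFully(header);
            } catch (EOFException e) {
                break; // stream ended without the usual trailing zero blocks
            }
            if (isAllZero(header)) {
                break; // an all-zero block marks the end of the archive
            }
            String name = field(header, 0, 100);
            long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal size field
            byte type = header[156];
            byte[] body = new byte[(int) size]; // assumption: each document fits in memory
            in.readFully(body);
            int padding = (int) ((BLOCK - size % BLOCK) % BLOCK);
            in.readFully(new byte[padding]); // entry data is padded to a block boundary
            if (type == '0' || type == 0) { // keep regular files only
                documents.put(name, new String(body, StandardCharsets.UTF_8));
            }
        }
        return documents;
    }

    /** Reads a NUL-terminated ASCII field from the header. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    private static boolean isAllZero(byte[] block) {
        for (byte b : block) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: TarGzDocuments <archive.tar.gz>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            readAll(in).forEach((name, text) ->
                System.out.println(name + ": " + text.length() + " chars"));
        }
    }
}
```

In a real RecordReader the entries would be handed out one at a time through nextKeyValue() rather than collected into a map; the map keeps the sketch short.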
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job)
throws IOException
Generate the list of files and make them into FileSplits.
Overrides:
getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
Throws:
IOException
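getSplits is summarized as generating the list of files and turning them into FileSplits. A gzip stream cannot be read from an arbitrary offset, so a plausible reading is that each input file becomes one whole-file split. That is an assumption, not something this page confirms. The sketch below illustrates it with the standard library only: `WholeFileSplits` and its nested `Split` class are hypothetical stand-ins for the Hadoop types, where a real FileSplit also carries host locations.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration, not the XMLInputFormat source: one split per input file. */
final class WholeFileSplits {

    /** Stand-in for a Hadoop FileSplit: a file plus the byte range to read. */
    static final class Split {
        final Path path;
        final long start;
        final long length;

        Split(Path path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Lists the regular files in inputDir and covers each with a single split. */
    static List<Split> splitsFor(Path inputDir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(inputDir)) {
            files = listing.sorted().collect(Collectors.toList());
        }
        List<Split> splits = new ArrayList<>();
        for (Path file : files) {
            if (Files.isRegularFile(file)) {
                // Assumption: the split starts at byte 0 and spans the whole file,
                // because a gzipped tarball must be decompressed from the beginning.
                splits.add(new Split(file, 0L, Files.size(file)));
            }
        }
        return splits;
    }
}
```

With this layout the number of map tasks equals the number of input archives, so many moderately sized tarballs parallelize better than one large one.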