gov.llnl.ontology.text.hbase
Class XMLRecordReader

java.lang.Object
  extended by org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
      extended by gov.llnl.ontology.text.hbase.XMLRecordReader
All Implemented Interfaces:
Closeable

public class XMLRecordReader
extends org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

A RecordReader for processing gzipped tarballs of document files. Each file in the tarball is assumed to be a single document, or to be processed further by later stages.

Author:
Keith Stevens

Field Summary
static String CONF_PREFIX
           
static String DELIMITER_TAG
           
 
Constructor Summary
XMLRecordReader()
          Creates a new XMLRecordReader that reads files that are not gzipped.
XMLRecordReader(boolean useGzip)
          Creates a new XMLRecordReader; useGzip should be true if the files are in gzip format.
 
Method Summary
 void close()
          
 org.apache.hadoop.hbase.io.ImmutableBytesWritable getCurrentKey()
          
 org.apache.hadoop.io.Text getCurrentValue()
          
 float getProgress()
          
 void initialize(org.apache.hadoop.mapreduce.InputSplit isplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Extracts the Path for the file to be processed by this XMLRecordReader.
 boolean nextKeyValue()
          Advances the reader one step to point to the next file in the tarball.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CONF_PREFIX

public static final String CONF_PREFIX
See Also:
Constant Field Values

DELIMITER_TAG

public static final String DELIMITER_TAG
See Also:
Constant Field Values
Constructor Detail

XMLRecordReader

public XMLRecordReader()
Creates a new XMLRecordReader that reads files that are not gzipped.


XMLRecordReader

public XMLRecordReader(boolean useGzip)
Creates a new XMLRecordReader; useGzip should be true if the files are in gzip format.
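
A minimal sketch of how a useGzip flag like this one might select the decompression path, using only java.util.zip. The class GzipToggleSketch and its helpers open, readAll, and gzip are hypothetical and are not part of XMLRecordReader; the actual implementation may handle the stream differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipToggleSketch {

    // Hypothetical helper: wraps the raw stream in a GZIPInputStream only
    // when the caller says the input is gzip-compressed.
    public static InputStream open(InputStream raw, boolean useGzip) throws IOException {
        return useGzip ? new GZIPInputStream(raw) : raw;
    }

    // Reads the whole stream and decodes it as UTF-8 text.
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Compresses text with gzip so the example has something to read back.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String doc = "<doc>hello</doc>";
        // Compressed input: useGzip = true.
        System.out.println(readAll(open(new ByteArrayInputStream(gzip(doc)), true)));
        // Plain input: useGzip = false.
        byte[] plain = doc.getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(open(new ByteArrayInputStream(plain), false)));
    }
}
```

Passing the flag at construction time keeps the choice in one place, so the rest of the reading code can treat both kinds of input as an ordinary InputStream.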

Method Detail

initialize

public void initialize(org.apache.hadoop.mapreduce.InputSplit isplit,
                       org.apache.hadoop.mapreduce.TaskAttemptContext context)
                throws IOException,
                       InterruptedException
Extracts the Path for the file to be processed by this XMLRecordReader.

Specified by:
initialize in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

nextKeyValue

public boolean nextKeyValue()
                     throws IOException
Advances the reader one step to point to the next file in the tarball. It returns false when there are no more files in the tarball.

Specified by:
nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
Throws:
IOException
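
Because nextKeyValue() is declared to return boolean, a caller would typically loop until it returns false and read the current key and value inside the loop. The sketch below illustrates that calling pattern with a hypothetical in-memory stand-in, InMemoryReader, which uses plain String keys and values. It is not Hadoop's RecordReader and makes no claim about how XMLRecordReader is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReaderLoopSketch {

    // Hypothetical stand-in that mimics the nextKeyValue / getCurrentKey /
    // getCurrentValue shape. Each entry is a {name, contents} pair.
    public static class InMemoryReader {
        private final Iterator<String[]> entries;
        private String[] current;

        public InMemoryReader(List<String[]> entries) {
            this.entries = entries.iterator();
        }

        // Advances to the next entry; returns false once none are left.
        public boolean nextKeyValue() {
            if (!entries.hasNext()) {
                current = null;
                return false;
            }
            current = entries.next();
            return true;
        }

        public String getCurrentKey() {
            return current == null ? null : current[0];
        }

        public String getCurrentValue() {
            return current == null ? null : current[1];
        }
    }

    // Drains the reader with a while-loop and collects "key=value" strings.
    public static List<String> drain(InMemoryReader reader) {
        List<String> seen = new ArrayList<>();
        while (reader.nextKeyValue()) {
            seen.add(reader.getCurrentKey() + "=" + reader.getCurrentValue());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String[]> entries = Arrays.asList(
            new String[] {"a.xml", "<doc>A</doc>"},
            new String[] {"b.xml", "<doc>B</doc>"});
        System.out.println(drain(new InMemoryReader(entries)));
    }
}
```

In a real MapReduce job the framework normally drives this loop itself, so user code rarely calls nextKeyValue() directly.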

getCurrentKey

public org.apache.hadoop.hbase.io.ImmutableBytesWritable getCurrentKey()

Specified by:
getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

getCurrentValue

public org.apache.hadoop.io.Text getCurrentValue()

Specified by:
getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

getProgress

public float getProgress()
                  throws IOException,
                         InterruptedException

Specified by:
getProgress in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

close

public void close()
           throws IOException

Specified by:
close in interface Closeable
Specified by:
close in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
Throws:
IOException


Copyright © 2010-2011. All Rights Reserved.