gov.llnl.ontology.text.hbase
Class GzipTarInputFormat.GzipTarRecordReader

java.lang.Object
  extended by org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
      extended by gov.llnl.ontology.text.hbase.GzipTarInputFormat.GzipTarRecordReader
All Implemented Interfaces:
Closeable
Enclosing class:
GzipTarInputFormat

public class GzipTarInputFormat.GzipTarRecordReader
extends org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

A RecordReader for processing gzipped tarballs of document files. Each file in the tarball is assumed to be a single document, or to be content that later stages will process further.


Constructor Summary
GzipTarInputFormat.GzipTarRecordReader()
           
 
Method Summary
 void close()
          
 org.apache.hadoop.hbase.io.ImmutableBytesWritable getCurrentKey()
          
 org.apache.hadoop.io.Text getCurrentValue()
          
 float getProgress()
          
 void initialize(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Extracts the Path of the file to be processed by this GzipTarInputFormat.GzipTarRecordReader.
 boolean nextKeyValue()
          Advances the reader one step to point to the next file in the tarball.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GzipTarInputFormat.GzipTarRecordReader

public GzipTarInputFormat.GzipTarRecordReader()
Method Detail

initialize

public void initialize(org.apache.hadoop.mapreduce.InputSplit split,
                       org.apache.hadoop.mapreduce.TaskAttemptContext context)
                throws IOException,
                       InterruptedException
Extracts the Path of the file to be processed by this GzipTarInputFormat.GzipTarRecordReader.

Specified by:
initialize in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

nextKeyValue

public boolean nextKeyValue()
                     throws IOException
Advances the reader one step to point to the next file in the tarball. It returns false when there are no more files in the tarball.

Specified by:
nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
Throws:
IOException
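
The advance-until-false contract of nextKeyValue can be illustrated with a minimal, JDK-only sketch. This is not the actual implementation of GzipTarRecordReader, whose source is not shown on this page. The class TarGzWalker, its methods next(), name() and text(), and the helper tarGz() are all hypothetical names introduced for illustration. The sketch assumes plain ustar-style headers (a 100-byte name at offset 0, an octal size at offset 124, a type flag at offset 156) and does not handle long-name extensions or base-256 sizes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for illustration; not the real GzipTarRecordReader. */
public class TarGzWalker implements AutoCloseable {
    private static final int BLOCK = 512;  // tar archives are sequences of 512-byte blocks
    private final DataInputStream in;
    private String currentName;  // plays the role of the current key
    private String currentText;  // plays the role of the current value

    public TarGzWalker(InputStream gzipped) throws IOException {
        this.in = new DataInputStream(new GZIPInputStream(gzipped));
    }

    /** Advances to the next regular file; returns false once the archive is exhausted. */
    public boolean next() throws IOException {
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                in.readFully(header);
            } catch (EOFException e) {
                return false;  // stream ended without an end-of-archive marker
            }
            if (header[0] == 0) {
                return false;  // a header whose name starts with NUL is treated as the end marker
            }
            String sizeField = field(header, 124, 12).trim();
            int size = sizeField.isEmpty() ? 0 : Math.toIntExact(Long.parseLong(sizeField, 8));
            byte[] body = new byte[size];
            in.readFully(body);
            in.readFully(new byte[(BLOCK - size % BLOCK) % BLOCK]);  // consume block padding
            byte type = header[156];
            if (type == '0' || type == 0) {  // regular file; directories and links are skipped
                currentName = field(header, 0, 100);
                currentText = new String(body, StandardCharsets.UTF_8);
                return true;
            }
        }
    }

    public String name() { return currentName; }

    public String text() { return currentText; }

    @Override
    public void close() throws IOException { in.close(); }

    /** Reads a NUL-terminated header field. */
    private static String field(byte[] block, int offset, int length) {
        int end = offset;
        while (end < offset + length && block[end] != 0) end++;
        return new String(block, offset, end - offset, StandardCharsets.UTF_8);
    }

    /** Builds a tiny gzipped tarball in memory from {name, text} pairs; demo helper only. */
    public static byte[] tarGz(String[][] files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            for (String[] file : files) {
                byte[] body = file[1].getBytes(StandardCharsets.UTF_8);
                byte[] header = new byte[BLOCK];
                put(header, 0, file[0]);
                put(header, 100, "0000644");
                put(header, 124, String.format("%011o", body.length));
                put(header, 148, "        ");  // checksum field counts as spaces while summing
                header[156] = '0';
                int sum = 0;
                for (byte b : header) sum += b & 0xff;
                put(header, 148, String.format("%06o", sum) + "\0 ");
                out.write(header);
                out.write(body);
                out.write(new byte[(BLOCK - body.length % BLOCK) % BLOCK]);
            }
            out.write(new byte[2 * BLOCK]);  // end-of-archive marker: two zero blocks
        }
        return bytes.toByteArray();
    }

    private static void put(byte[] block, int offset, String text) {
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(raw, 0, block, offset, raw.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = tarGz(new String[][] {{"a.txt", "first document"}, {"b.txt", "second"}});
        try (TarGzWalker walker = new TarGzWalker(new ByteArrayInputStream(archive))) {
            while (walker.next()) {  // same loop shape a caller uses with nextKeyValue()
                System.out.println(walker.name() + " -> " + walker.text());
            }
        }
    }
}
```

In this sketch the entry name stands in for the key and the decoded file body for the value. How the real reader derives its ImmutableBytesWritable key is not stated on this page, so that mapping is an assumption.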

getCurrentKey

public org.apache.hadoop.hbase.io.ImmutableBytesWritable getCurrentKey()

Specified by:
getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

getCurrentValue

public org.apache.hadoop.io.Text getCurrentValue()

Specified by:
getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

getProgress

public float getProgress()
                  throws IOException,
                         InterruptedException

Specified by:
getProgress in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

close

public void close()
           throws IOException

Specified by:
close in interface Closeable
Specified by:
close in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
Throws:
IOException


Copyright © 2010-2011. All Rights Reserved.