XMLRecordReader (C-Cat 1.0 API)

gov.llnl.ontology.text.hbase
Class XMLRecordReader

java.lang.Object
  org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>
      gov.llnl.ontology.text.hbase.XMLRecordReader

All Implemented Interfaces: Closeable

public class XMLRecordReader
extends org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

A RecordReader for processing gzipped tarballs of document files. Each file in the tarball is assumed to be a single document, or to be processed further by later stages.

Author: Keith Stevens

Field Summary
`static String`	`CONF_PREFIX`
`static String`	`DELIMITER_TAG`

Constructor Summary
`XMLRecordReader()` Creates a new `XMLRecordReader` for files that are not gzipped.
`XMLRecordReader(boolean useGzip)` Creates a new `XMLRecordReader` with `useGzip` set to true if the files are in a gzip format.

Method Summary
`void`	`close()`
`org.apache.hadoop.hbase.io.ImmutableBytesWritable`	`getCurrentKey()`
`org.apache.hadoop.io.Text`	`getCurrentValue()`
`float`	`getProgress()`
`void`	`initialize(org.apache.hadoop.mapreduce.InputSplit isplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)` Extract the `Path` for the file to be processed by this `XMLRecordReader`.
`boolean`	`nextKeyValue()` Advances the reader one step to point to the next tarball file.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

CONF_PREFIX

public static final String CONF_PREFIX

See Also: Constant Field Values

DELIMITER_TAG

public static final String DELIMITER_TAG

See Also: Constant Field Values

Constructor Detail

XMLRecordReader

public XMLRecordReader()

Creates a new XMLRecordReader for files that are not gzipped.

XMLRecordReader

public XMLRecordReader(boolean useGzip)

Creates a new XMLRecordReader with useGzip set to true if the files are in a gzip format.
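A flag like useGzip is commonly used to decide whether the raw input stream is wrapped in a decompressing stream before reading. The sketch below illustrates that pattern with the standard library only. It is not taken from the C-Cat source: the class GzipFlagSketch and its methods open, gzip, and read are hypothetical names, and how XMLRecordReader actually applies the flag is an assumption here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of the C-Cat API.
class GzipFlagSketch {

    /** Wraps the raw stream in a GZIPInputStream only when useGzip is true. */
    static InputStream open(InputStream raw, boolean useGzip) throws IOException {
        return useGzip ? new GZIPInputStream(raw) : raw;
    }

    /** Compresses text so the example has gzip data to read back. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    /** Opens the bytes according to the flag and returns them as UTF-8 text. */
    static String read(byte[] data, boolean useGzip) throws IOException {
        try (InputStream in = open(new ByteArrayInputStream(data), useGzip)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String doc = "<doc>hello</doc>";
        // Compressed input read with the flag set.
        System.out.println(read(gzip(doc), true));
        // Plain input read with the flag cleared.
        System.out.println(read(doc.getBytes(StandardCharsets.UTF_8), false));
    }
}
```

Passing gzipped bytes with the flag cleared would return the compressed bytes undecoded, and passing plain bytes with the flag set would fail with an IOException from GZIPInputStream, so the flag has to match the actual file format.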

Method Detail

initialize

public void initialize(org.apache.hadoop.mapreduce.InputSplit isplit,
                       org.apache.hadoop.mapreduce.TaskAttemptContext context)
                throws IOException,
                       InterruptedException

Extract the Path for the file to be processed by this XMLRecordReader.

Specified by: initialize in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

Throws: IOException; InterruptedException

nextKeyValue

public boolean nextKeyValue()
                     throws IOException

Advances the reader one step to point to the next tarball file. It returns false when there are no more files in the tarball.

Specified by: nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

Throws: IOException
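Under the Hadoop RecordReader contract, nextKeyValue returns true while a key/value pair is available and false once the input is exhausted, and callers read the pair through getCurrentKey and getCurrentValue after each successful advance. The sketch below shows that calling pattern without any Hadoop dependency. ReadLoopSketch and its nested InMemoryDocumentReader are hypothetical stand-ins that use String keys and values over an in-memory map. They are not the real XMLRecordReader, which extends org.apache.hadoop.mapreduce.RecordReader and reads from a tarball.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of the C-Cat or Hadoop APIs.
class ReadLoopSketch {

    /** Mimics the nextKeyValue / getCurrentKey / getCurrentValue shape. */
    static class InMemoryDocumentReader implements AutoCloseable {
        private final Iterator<Map.Entry<String, String>> entries;
        private Map.Entry<String, String> current;

        InMemoryDocumentReader(Map<String, String> documents) {
            this.entries = documents.entrySet().iterator();
        }

        /** Moves to the next document; returns false once none are left. */
        boolean nextKeyValue() {
            if (!entries.hasNext()) {
                current = null;
                return false;
            }
            current = entries.next();
            return true;
        }

        String getCurrentKey() {
            return current == null ? null : current.getKey();
        }

        String getCurrentValue() {
            return current == null ? null : current.getValue();
        }

        @Override
        public void close() {
            // Nothing to release for in-memory data.
        }
    }

    /** Drains a reader with the usual while (nextKeyValue()) loop, then closes it. */
    static List<String> drain(InMemoryDocumentReader reader) {
        List<String> seen = new ArrayList<>();
        try (InMemoryDocumentReader r = reader) {
            while (r.nextKeyValue()) {
                seen.add(r.getCurrentKey() + "=" + r.getCurrentValue());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.xml", "<doc>first</doc>");
        docs.put("b.xml", "<doc>second</doc>");
        System.out.println(drain(new InMemoryDocumentReader(docs)));
    }
}
```

In a MapReduce job the framework drives this loop itself, so user code rarely calls nextKeyValue directly. The pattern mainly matters when testing a reader in isolation.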

getCurrentKey

public org.apache.hadoop.hbase.io.ImmutableBytesWritable getCurrentKey()

Specified by: getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

getCurrentValue

public org.apache.hadoop.io.Text getCurrentValue()

Specified by: getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

getProgress

public float getProgress()
                  throws IOException,
                         InterruptedException

Specified by: getProgress in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

Throws: IOException; InterruptedException

close

public void close()
           throws IOException

Specified by: close in interface Closeable
Specified by: close in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.hbase.io.ImmutableBytesWritable,org.apache.hadoop.io.Text>

Throws: IOException
