Read from Hadoop Sequence File

The Read from Hadoop Sequence File stage reads data from a sequence file as input to a dataflow. A sequence file is a flat file consisting of binary key/value pairs. For more information, go to wiki.apache.org/hadoop/SequenceFile.

Note: The Read from Hadoop Sequence File stage supports only delimited, uncompressed sequence files located on the Hadoop Distributed File System (HDFS).

File Properties Tab

Server

Indicates that the file you select in the File name field is located on the Hadoop system. You must create a connection to the Hadoop file server in the Management Console before you can use it in the stage. When you select a file on the Hadoop system, the server name shown is the name you specified in the Management Console when you created the file server.

File name

Specifies the path to the file. Click the ellipsis button (...) to browse to the file you want.

Field separator

Specifies the character used to separate fields in a delimited file.

For example, this record uses a pipe (|) as a field separator:

7200 13TH ST|MIAMI|FL|33144

The characters available to define as field separators are:

  • Space
  • Tab
  • Comma
  • Period
  • Semicolon
  • Pipe

If the file uses a different character as a field separator, click the ellipsis button to select another character as a delimiter.
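As a rough illustration of what the field separator setting controls, the following Python sketch splits the sample record shown earlier on the pipe character. It uses Python's standard csv module rather than the stage itself, and the function name split_record is hypothetical.

```python
import csv

def split_record(line, separator="|"):
    """Split one delimited record into a list of field values."""
    # csv.reader accepts any iterable of lines, so a one-item list is enough.
    reader = csv.reader([line], delimiter=separator)
    return next(reader)

fields = split_record("7200 13TH ST|MIAMI|FL|33144")
print(fields)  # ['7200 13TH ST', 'MIAMI', 'FL', '33144']
```

Passing a different separator argument, such as a semicolon or a tab, corresponds to choosing a different character in the Field separator setting.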

Text qualifier

Specifies the character used to surround text values in a delimited file.

For example, this record uses double quotes (") as a text qualifier:

"7200 13TH ST"|"MIAMI"|"FL"|"33144"

The characters available to define as text qualifiers are:

  • Single quote (')
  • Double quote (")

If the file uses a different text qualifier, click the ellipsis button to select another character as a text qualifier.
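To show why a text qualifier matters, the following Python sketch parses the qualified sample record with the standard csv module. The function name parse_qualified_record and the second sample record are hypothetical. The handling of a separator inside a qualified value is the csv module's behavior; that the stage treats such values the same way is an assumption.

```python
import csv

def parse_qualified_record(line, separator="|", qualifier='"'):
    """Split one delimited record, removing the text qualifier from each value."""
    # quotechar tells csv.reader which character surrounds text values.
    reader = csv.reader([line], delimiter=separator, quotechar=qualifier)
    return next(reader)

# The sample record from this section: qualifiers are stripped from each value.
print(parse_qualified_record('"7200 13TH ST"|"MIAMI"|"FL"|"33144"'))

# Hypothetical record: a separator inside a qualified value stays part of the value.
print(parse_qualified_record('"7200 13TH ST|APT 2"|"MIAMI"'))
```

The first call returns the four values without their surrounding quotes. The second returns two values, because the pipe inside the qualified text is not treated as a field separator.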

Fields Tab

The Fields tab defines the names, positions, and types of fields in the file. For more information, see Defining Fields In an Input Sequence File.

Sort Fields Tab

The Sort Fields tab defines fields by which to sort the input records before they are sent into the dataflow. Sorting is optional. For more information, see Sorting Input Records.

Filter Tab

The Filter tab defines fields by which to filter the input records before they are sent into the dataflow. For more information, see Filtering Input Records.