Write to Hadoop Sequence File

The Write to Hadoop Sequence File stage writes the output of a dataflow to a sequence file. A sequence file is a flat file consisting of binary key/value pairs. For more information, go to wiki.apache.org/hadoop/SequenceFile.

Note: The Write to Hadoop Sequence File stage supports only delimited, uncompressed sequence files located on the Hadoop Distributed File System (HDFS).
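To make the key/value idea concrete, the following minimal sketch models a sequence file as an ordered list of binary (key, value) pairs, where each value is one delimited, uncompressed text record. It is a conceptual model only: it does not reproduce the on-disk Hadoop SequenceFile layout (header, sync markers, Writable serialization), and the key used here (the record's 0-based position) is a hypothetical choice, not necessarily the key the stage writes. The function name `to_key_value_pairs` and the second sample address are made up for illustration.

```python
from typing import List, Tuple

# Conceptual model only: a sequence file as an ordered list of binary
# (key, value) pairs. The real Hadoop format adds a header, sync markers,
# and Writable serialization, none of which is modeled here.
KeyValue = Tuple[bytes, bytes]


def to_key_value_pairs(lines: List[str]) -> List[KeyValue]:
    """Pair each delimited text line with a hypothetical key (its 0-based position)."""
    pairs: List[KeyValue] = []
    for index, line in enumerate(lines):
        key = str(index).encode("utf-8")  # hypothetical key choice
        value = line.encode("utf-8")      # the delimited record, stored uncompressed
        pairs.append((key, value))
    return pairs


# The second address is a made-up sample record.
records = ["7200 13TH ST|MIAMI|FL|33144", "100 MAIN ST|TAMPA|FL|33602"]
pairs = to_key_value_pairs(records)
print(pairs[0])  # (b'0', b'7200 13TH ST|MIAMI|FL|33144')
```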

Related tasks:

Connecting to Hadoop: To use the Write to Hadoop Sequence File stage, you must first create a connection to the Hadoop file server. The name under which you save the connection is then displayed as the server name.

File Properties Tab


| Field | Description |
| --- | --- |
| Server | Indicates that the file you select in the File name field is located on the Hadoop system. Note: You must create a connection to the Hadoop file server before you can use it here. For details on creating a connection, see Connecting to Hadoop. If you select a file on the Hadoop system, the server name is the name you specified when you created the connection. |
| File name | Specifies the path to the file. Click the ellipsis button (...) to browse to the file you want. |
| Field separator | Specifies the character used to separate fields in a delimited file. For example, this record uses a pipe (\|) as a field separator: `7200 13TH ST\|MIAMI\|FL\|33144` The characters available as field separators are Space, Tab, Comma, Period, Semicolon, and Pipe. To use a different character as the field separator, click the ellipsis button and select another character as the delimiter. |
| Text qualifier | Specifies the character used to surround text values in a delimited file. For example, this record uses double quotes (") as a text qualifier: `"7200 13TH ST"\|"MIAMI"\|"FL"\|"33144"` The characters available as text qualifiers are Single quote (') and Double quote ("). To use a different text qualifier, click the ellipsis button and select another character as the text qualifier. |
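The Field separator and Text qualifier settings together determine how each record is laid out as a line of text. The minimal sketch below reproduces the two example records from the table, assuming every value is wrapped in the qualifier when one is set (as in the example above). The function name `format_delimited` and the `SEPARATORS` mapping are hypothetical illustrations, not part of the product; the sketch also assumes values never contain the qualifier character, so it does no escaping.

```python
from typing import Optional, Sequence

# Hypothetical mapping of the separator choices listed above to characters.
SEPARATORS = {
    "Space": " ",
    "Tab": "\t",
    "Comma": ",",
    "Period": ".",
    "Semicolon": ";",
    "Pipe": "|",
}


def format_delimited(values: Sequence[str], separator: str = "|",
                     qualifier: Optional[str] = None) -> str:
    """Join values into one delimited line, optionally wrapping each in a text qualifier."""
    if qualifier is not None:
        # Surround every value with the qualifier; embedded qualifiers are not escaped.
        values = [f"{qualifier}{value}{qualifier}" for value in values]
    return separator.join(values)


address = ["7200 13TH ST", "MIAMI", "FL", "33144"]
print(format_delimited(address))                 # 7200 13TH ST|MIAMI|FL|33144
print(format_delimited(address, qualifier='"'))  # "7200 13TH ST"|"MIAMI"|"FL"|"33144"
```

Keeping the qualifier optional mirrors the table: a field separator is always needed, while a text qualifier is applied only when one is selected.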

Fields Tab

The Fields tab defines the names, positions, and types of fields in the file. For more information, see Defining Fields In an Output Sequence File.
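As a rough illustration of how field positions govern output order, the sketch below models each row of the Fields tab as a name, a 1-based position, and a type, then orders a record's values by position. The `FieldDefinition` class, the `order_values` function, the field names, and the type names are all hypothetical; the type is recorded but not used in this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical model of one row on the Fields tab."""
    name: str
    position: int    # 1-based output column
    data_type: str   # illustrative type name; not used by this sketch


def order_values(record: Dict[str, Any], fields: List[FieldDefinition]) -> List[str]:
    """Return the record's values as strings, ordered by each field's position."""
    ordered = sorted(fields, key=lambda f: f.position)
    return [str(record[f.name]) for f in ordered]


# Field definitions listed out of order on purpose; position decides the output order.
fields = [
    FieldDefinition("City", 2, "string"),
    FieldDefinition("AddressLine1", 1, "string"),
    FieldDefinition("PostalCode", 4, "string"),
    FieldDefinition("StateProvince", 3, "string"),
]
record = {"AddressLine1": "7200 13TH ST", "City": "MIAMI",
          "StateProvince": "FL", "PostalCode": "33144"}
print("|".join(order_values(record, fields)))  # 7200 13TH ST|MIAMI|FL|33144
```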