Read from Hadoop Sequence File
The Read from Hadoop Sequence File stage reads data from a sequence file as input to a dataflow. A sequence file is a flat file consisting of binary key/value pairs. For more information, refer to https://knpcode.com/hadoop/hadoop-io/how-to-read-and-write-sequencefile-in-hadoop/.
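To make the "binary key/value" description more concrete, the sketch below checks whether a file begins with the header that Hadoop writes at the start of every sequence file: the three magic bytes `SEQ` followed by one version byte. It is only an illustration and is not part of the stage. The function name `read_sequence_file_version` and the temporary sample file are assumptions made for this example, and the sample contains just the four header bytes rather than a complete sequence file.

```python
import os
import tempfile

SEQ_MAGIC = b"SEQ"  # first three bytes of a Hadoop SequenceFile header


def read_sequence_file_version(path):
    """Return the format version byte, or None if the magic bytes are missing."""
    with open(path, "rb") as handle:
        header = handle.read(4)
    if len(header) < 4 or header[:3] != SEQ_MAGIC:
        return None
    return header[3]


# Hypothetical stand-in file: only the 4-byte header, not a full sequence file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".seq") as tmp:
    tmp.write(SEQ_MAGIC + bytes([6]))
    sample_path = tmp.name

version = read_sequence_file_version(sample_path)
os.remove(sample_path)
print(version)  # prints 6
```

A check like this can help confirm that a file selected for the stage is a sequence file rather than plain text before you troubleshoot field definitions.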
The stage supports the following:
- Connectivity to HDFS from Spectrum on Windows
- Support and connectivity to Hadoop 3.x from Spectrum with high availability
- Kerberos-enabled HDFS connectivity through Windows
Also see Configuring HDFS Connection for HA Cluster and Best Practices for connecting to HDFS 3.x and Hive 2.1.1.
Connecting to Hadoop: To read a file located on the Hadoop system or to write a file to it, you must first create a connection to the Hadoop file server. The name under which you save the connection is then displayed as the server name.
File Properties Tab
Fields | Description |
---|---|
Server | Indicates that the file you select in the File name field is located on the Hadoop system. Note: You must create a connection to the Hadoop file server before using it here. For details on creating a connection, see Connecting to Hadoop. If you select a file on the Hadoop system, the server name is the name you specified when you created the file server connection. |
File name | Specifies the path to the file. Click the ellipses button (...) to go to the file you want. |
Field separator | Specifies the character used to separate fields in a delimited file. For example, a record can use a pipe (\|) as a field separator. The stage offers a predefined set of field separator characters. If the file uses a different character as a field separator, click the ellipses button (...) to select another character as the delimiter. |
Text qualifier | Specifies the character used to surround text values in a delimited file. For example, a record can use double quotes (") as a text qualifier. The stage offers a predefined set of text qualifier characters. If the file uses a different text qualifier, click the ellipses button (...) to select another character as the text qualifier. |
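
The interplay between the Field separator and Text qualifier settings can be sketched with Python's standard `csv` module. This is an illustration of the general delimited-text convention only and is not how the stage itself parses records. The sample record and the helper name `split_record` are hypothetical.

```python
import csv


def split_record(line, field_separator="|", text_qualifier='"'):
    """Split one delimited record into fields, honoring the text qualifier."""
    reader = csv.reader([line], delimiter=field_separator, quotechar=text_qualifier)
    return next(reader)


# Hypothetical record: the second value contains the separator character,
# so it is wrapped in the text qualifier to keep it as a single field.
fields = split_record('1|"Smith | Sons"|Boston')
print(fields)  # prints ['1', 'Smith | Sons', 'Boston']
```

Without the text qualifier, the pipe inside the second value would be read as a field boundary and the record would split into four fields instead of three.
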
Fields Tab
The Fields tab defines the names, positions, and types of fields in the file. For more information, see Defining Fields In an Input Sequence File.
Sort Fields Tab
The Sort Fields tab defines fields by which to sort the input records before they are sent into the dataflow. Sorting is optional. For more information, see Sorting Input Records.
Filter Tab
The Filter tab defines fields by which to filter the input records before they are sent into the dataflow. Filtering is optional. For more information, see Filtering Input Records.