In the Write to Hadoop Sequence File stage, the Fields tab
defines the names, positions, and types of fields in the file. After you define an
input file on the File Properties tab, you can define the
fields.
- To select the desired fields from the input data or an existing file, click
Quick Add.
- Select the specific fields from the input data.
- Click OK.
- To add new fields, click Add.
- Enter the Name of the field.
- Select the Type of the field.
The stage supports the following data types:
- double
- A numeric data type that contains both negative and positive double
precision numbers between 2⁻¹⁰⁷⁴ and
(2-2⁻⁵²)×2¹⁰²³. In E notation, the range of values is
-1.79769313486232E+308 to 1.79769313486232E+308.
- float
- A numeric data type that contains both negative and positive single
precision numbers between 2⁻¹⁴⁹ and
(2-2⁻²³)×2¹²⁷. In E notation, the range of values is
-3.402823E+38 to 3.402823E+38.
- integer
- A numeric data type that contains both negative and positive whole numbers
between -2³¹ (-2,147,483,648) and 2³¹-1
(2,147,483,647).
- long
- A numeric data type that contains both negative and positive whole numbers
between -2⁶³ (-9,223,372,036,854,775,808) and 2⁶³-1
(9,223,372,036,854,775,807).
- string
- A sequence of characters.
- If you're overwriting an existing file, click Regenerate
to pick up the schema from the existing file, and then modify it as needed.
Regenerate builds the schema from the first 50 records in the input
data to this stage.
- To remove excess space characters from the beginning and
end of a field's character string, select the Trim Spaces
check box.
- Specify an option to generate the key:
- Auto Generate
- The Hadoop cluster automatically generates the key. With auto generation,
all the fields in the grid are treated as value fields. The data
type of the key is long.
- User Defined
- By default, the first field in the grid is selected as the key
field. An icon is displayed to indicate that the field is the
key field. You can select any other field as the key field.
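
The type ranges, the Trim Spaces option, and the two key options described above can be illustrated with a minimal Python sketch. It is not the stage's implementation: the names `fits` and `split_record` are hypothetical, the `auto_key` counter only stands in for whatever key the Hadoop cluster generates, and dropping the key field from the value list under User Defined is an assumption.

```python
# Illustrative sketch only; these names are hypothetical, not a product API.

# Ranges taken from the data type list above.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1
FLOAT_MAX = (2 - 2**-23) * 2.0**127      # about 3.402823E+38
DOUBLE_MAX = (2 - 2**-52) * 2.0**1023    # about 1.79769313486232E+308


def fits(type_name, value):
    """Return True if value is within the range of the named field type."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name in ("integer", "long"):
        low, high = (INT_MIN, INT_MAX) if type_name == "integer" else (LONG_MIN, LONG_MAX)
        return isinstance(value, int) and low <= value <= high
    if type_name in ("float", "double"):
        limit = FLOAT_MAX if type_name == "float" else DOUBLE_MAX
        try:
            return abs(float(value)) <= limit
        except (OverflowError, TypeError, ValueError):
            return False
    raise ValueError(f"unsupported field type: {type_name}")


def split_record(row, field_names, key_field=None, trim_spaces=False, auto_key=0):
    """Split one row into (key, values).

    key_field=None models Auto Generate: every field is a value and the key
    is a long. Passing the key in as auto_key is an assumption; the source
    only says that the Hadoop cluster generates it.
    """
    if trim_spaces:
        # Remove space characters from both ends of string fields only.
        row = [v.strip(" ") if isinstance(v, str) else v for v in row]
    else:
        row = list(row)
    if key_field is None:
        return auto_key, row
    i = field_names.index(key_field)
    # Assumption: the key field is not repeated among the value fields.
    return row[i], row[:i] + row[i + 1:]


names = ["id", "name", "score"]
row = [42, "  Ada  ", 9.5]
print(split_record(row, names, key_field="id", trim_spaces=True))  # User Defined
print(split_record(row, names, auto_key=0))                        # Auto Generate
```

A check such as `fits("integer", 2**31)` returns `False`, which is one way to see why a field that may exceed 2,147,483,647 should be typed as long.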
After defining the fields in your output file, you can edit its contents and layout.
| Option Name | Description |
|---|---|
| Add | Adds a field to the output. You can append a field to the end of the existing layout, or you can insert a field into an existing position and the position of the remaining fields will be adjusted accordingly. |
| Modify | Modifies the field's name and type. |
| Remove | Removes the selected field from the output. |
| Move Up/Move Down | Reorders the selected field. |
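
The effect of these options on the field layout can be modeled as operations on an ordered list. The following Python sketch is only an illustration: the function names are hypothetical, and the layout is assumed to be a list of (name, type) pairs rather than the product's internal representation.

```python
# Illustrative sketch; the function names are hypothetical, not a product API.
# A layout is an ordered list of (name, type) pairs.

def add_field(fields, name, type_name, position=None):
    """Append a field, or insert it at position and shift later fields down."""
    if position is None:
        fields.append((name, type_name))
    else:
        fields.insert(position, (name, type_name))


def modify_field(fields, index, name=None, type_name=None):
    """Change the name and/or type of the field at index."""
    old_name, old_type = fields[index]
    fields[index] = (name or old_name, type_name or old_type)


def remove_field(fields, index):
    """Remove the field at index; later fields move up one position."""
    del fields[index]


def move_up(fields, index):
    """Swap the field with the one before it, if there is one."""
    if index > 0:
        fields[index - 1], fields[index] = fields[index], fields[index - 1]


def move_down(fields, index):
    """Swap the field with the one after it, if there is one."""
    if index < len(fields) - 1:
        fields[index + 1], fields[index] = fields[index], fields[index + 1]


layout = []
add_field(layout, "id", "long")
add_field(layout, "name", "string")
add_field(layout, "score", "double", position=1)  # order: id, score, name
move_up(layout, 2)                                # order: id, name, score
modify_field(layout, 2, type_name="float")
remove_field(layout, 1)
print(layout)
```

Inserting at a position shifts the fields that follow it, which matches the description of Add: the remaining fields keep their relative order.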