In the Write to Hadoop Sequence File stage, the Fields tab defines
the names, positions, and types of fields in the file. After you define an input
file on the File Properties tab, you can define the fields.
-
To select the desired fields from the input data or an existing file, click
Quick Add.
-
Select the specific fields from the input data.
-
Click OK.
-
To add new fields, click Add.
-
Enter the Name of the field.
-
Select the Type of the field. The stage supports
the following data types:
- boolean
- A logical type with two values: true and false.
- date
- A data type that contains a month, day, and year. For example, 2012-01-30 or January 30,
2012. You can specify a default date format in Management Console.
- datetime
A data type that contains a month, day, and year, as well as hours,
minutes, and seconds. For example, 2012/01/30 6:15:00 PM.
Note: In Parquet files, the datetime and time data types are mapped as
String. In RC files, the datetime data type is mapped as timestamp.
- double
A numeric data type that contains both negative and positive double
precision numbers between 2^-1074 and
(2-2^-52)×2^1023. In E notation, the range of values is
-1.79769313486232E+308 to 1.79769313486232E+308.
- float
A numeric data type that contains both negative and positive single
precision numbers between 2^-149 and
(2-2^-23)×2^127. In E notation, the range of values is
-3.402823E+38 to 3.402823E+38.
- integer
A numeric data type that contains both negative and positive whole numbers
between -2^31 (-2,147,483,648) and 2^31-1
(2,147,483,647).
- bigdecimal
- A numeric data type that supports 38 decimal points of
precision. Use this data type for data that will be used in
mathematical calculations requiring a high degree of
precision, especially those involving financial data. The
bigdecimal data type supports more precise calculations than
the double data type.
Note: For RC, Avro, and Parquet Hive
files, the bigdecimal data type is
converted to a decimal data type with
precision 38 and scale 10.
- long
A numeric data type that contains both negative and positive
whole numbers between -2^63
(-9,223,372,036,854,775,808) and 2^63-1
(9,223,372,036,854,775,807).
Note: In RC files, the long data type is mapped as the
bigint data type.
- string
- A sequence of characters.
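The type conversions above can be sketched as a mapping from the stage's type names to parsers. This is an illustrative sketch only; the `PARSERS` table and `parse_field` helper are hypothetical names, not part of the product, and the date/datetime formats assume the example patterns shown above.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping from the stage's field types to Python converters.
# Formats mirror the examples above (2012-01-30 and 2012/01/30 6:15:00 PM);
# the actual default formats are configurable in Management Console.
PARSERS = {
    "boolean": lambda s: s.lower() == "true",
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
    "datetime": lambda s: datetime.strptime(s, "%Y/%m/%d %I:%M:%S %p"),
    "double": float,
    "float": float,
    "integer": int,
    "bigdecimal": Decimal,  # arbitrary-precision, unlike double
    "long": int,
    "string": str,
}

def parse_field(value, field_type):
    """Convert a raw string value to the declared field type."""
    return PARSERS[field_type](value)
```

Note that `bigdecimal` maps to Python's `Decimal`, which, like the stage's bigdecimal type, avoids the rounding error inherent in double precision arithmetic.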
-
In the Position field, enter the position of
this field within the record.
For example, in this input file, AddressLine1 is in position 1, City
is in position 2, StateProvince is in position 3, and PostalCode is
in position 4.
"AddressLine1"|"City"|"StateProvince"|"PostalCode"
"7200 13TH ST"|"MIAMI"|"FL"|"33144"
"One Global View"|"Troy"|"NY"|12180
-
If you are overwriting an existing file, click Regenerate
to derive the schema from the existing file, and then modify it.
This generates the schema based on the first 50 records in the input
data to this stage.
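The schema regeneration step can be sketched as sampling the first 50 records and inferring a type per position. The inference rules below are illustrative assumptions for the sketch, not the product's actual algorithm, and `infer_type` and `regenerate_schema` are hypothetical names.

```python
def infer_type(values):
    """Guess a field type from a column of sample string values."""
    def is_long(v):
        try:
            int(v)
            return True
        except ValueError:
            return False

    def is_double(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    if all(v.lower() in ("true", "false") for v in values):
        return "boolean"
    if all(is_long(v) for v in values):
        return "long"
    if all(is_double(v) for v in values):
        return "double"
    return "string"  # fall back to the most general type

def regenerate_schema(records, sample_size=50):
    """Infer one type per position from the first sample_size records."""
    sample = records[:sample_size]
    return [infer_type([row[i] for row in sample])
            for i in range(len(sample[0]))]
```

Limiting the sample to the first 50 records keeps regeneration fast, but means a type chosen from the sample may not hold for every later record.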
-
To remove excess space characters from the beginning and
end of a field's character string, select the Trim Spaces
check box.
-
Specify one of the following options to generate the key:
- Auto Generate
-
The Hadoop cluster automatically generates the key. For auto generation,
all the fields in the grid are considered value fields. The data
type of the key is long.
- User Defined
-
By default, the first field in the grid is selected as the key
field. An icon is displayed to indicate that the field is the
key field. You can select any other field as the key field.
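The two key options above can be sketched as follows. This is an illustrative sketch, not a product API; `auto_generate_keys` and `user_defined_keys` are hypothetical names, and the field names come from the earlier example.

```python
from itertools import count

def auto_generate_keys(records):
    """Auto Generate: every grid field is a value field; the key is a
    generated long sequence number."""
    counter = count()
    return [(next(counter), rec) for rec in records]

def user_defined_keys(records, key_field):
    """User Defined: the chosen field becomes the key; the remaining
    fields stay in the value."""
    return [
        (rec[key_field], {k: v for k, v in rec.items() if k != key_field})
        for rec in records
    ]
```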
After defining the fields in your output file, you can edit its contents and layout.
Option Name | Description
Add | Adds a field to the output. You can append a field to the end of the existing layout, or you can insert a field into an existing position; the positions of the remaining fields are adjusted accordingly.
Modify | Modifies the field's name and type.
Remove | Removes the selected field from the output.
Move Up/Move Down | Reorders the selected field.