The Fields tab of the Write to Hive File stage lists the names and
data types of the fields in the input data to the stage.
-
To select fields from the input data or from an existing file, click
Quick Add.
-
Select the specific fields from the input data.
-
Click OK.
-
To add new fields, click Add.
-
Enter the Name of the field.
-
Select the Type of the field. The stage supports
these data types:
- boolean
- A logical type with two values: true and false.
- date
- A data type that contains a month, day, and year. For example, 2012-01-30 or January 30,
2012. You can specify a default date format in Spectrum Management Console.
- datetime
- A data type that contains a month, day, and year, along with hours,
minutes, and seconds. For example, 2012/01/30 6:15:00 PM.
Note: In Parquet files, datetime and time datatypes are mapped as String.
- double
- A numeric data type that contains both negative and positive double
precision numbers between 2^-1074 and
(2-2^-52)×2^1023. In E notation, the range of values is
-1.79769313486232E+308 to 1.79769313486232E+308.
- float
- A numeric data type that contains both negative and positive single
precision numbers between 2^-149 and
(2-2^-23)×2^127. In E notation, the range of values is
-3.402823E+38 to 3.402823E+38.
- integer
- A numeric data type that contains both negative and positive whole numbers
between -2^31 (-2,147,483,648) and 2^31-1
(2,147,483,647).
- bigdecimal
- A numeric data type that supports 38 decimal points of
precision. Use this data type for data that will be used in
mathematical calculations requiring a high degree of
precision, especially those involving financial data. The
bigdecimal data type supports more precise calculations than
the double data type.
Note: For Avro and Parquet Hive files, the bigdecimal datatype is
converted to a decimal datatype with precision 38 and scale 10.
- long
- A numeric data type that contains both negative and positive
whole numbers between -2^63
(-9,223,372,036,854,775,808) and 2^63-1
(9,223,372,036,854,775,807).
- string
- A sequence of characters.
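The numeric bounds quoted in the type descriptions can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the product. The names (DOUBLE_MAX, fits_decimal_38_10, and so on) are hypothetical. The decimal check assumes the usual reading of "precision 38 and scale 10": at most 38 significant digits in total, exactly 10 of them after the decimal point.

```python
import sys
from decimal import Context, Decimal, Inexact, InvalidOperation

# Bounds taken from the type descriptions above.
DOUBLE_MIN_POSITIVE = 2.0 ** -1074               # smallest positive double
DOUBLE_MAX = (2 - 2.0 ** -52) * 2.0 ** 1023      # largest finite double
FLOAT_MIN_POSITIVE = 2.0 ** -149                 # smallest positive float
FLOAT_MAX = (2 - 2.0 ** -23) * 2.0 ** 127        # largest finite float
INTEGER_MIN, INTEGER_MAX = -2 ** 31, 2 ** 31 - 1
LONG_MIN, LONG_MAX = -2 ** 63, 2 ** 63 - 1


def fits_integer(value: int) -> bool:
    """True if value lies in the integer range listed above."""
    return INTEGER_MIN <= value <= INTEGER_MAX


def fits_long(value: int) -> bool:
    """True if value lies in the long range listed above."""
    return LONG_MIN <= value <= LONG_MAX


def fits_decimal_38_10(value: Decimal) -> bool:
    """True if value can be stored without rounding at precision 38, scale 10.

    Assumption: standard decimal(precision, scale) semantics.
    """
    if not value.is_finite():
        return False
    ctx = Context(prec=38, traps=[Inexact, InvalidOperation])
    try:
        # Rescale to exactly 10 fractional digits; any rounding or
        # overflow of the 38-digit budget raises instead of passing silently.
        value.quantize(Decimal("1E-10"), context=ctx)
        return True
    except (Inexact, InvalidOperation):
        return False


# The double bound matches the platform's IEEE 754 maximum.
print(DOUBLE_MAX == sys.float_info.max)
print(fits_integer(2_147_483_648), fits_long(2_147_483_648))
```

Under these assumptions, a value with 11 fractional digits, or with more than 28 digits before the decimal point, would not survive the conversion to decimal(38, 10) unchanged.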
-
In the Position field, enter the position of
this field within the record.
For example, in this input file, AddressLine1 is in position 1, City
is in position 2, StateProvince is in position 3, and PostalCode is
in position 4.
"AddressLine1"|"City"|"StateProvince"|"PostalCode"
"7200 13TH ST"|"MIAMI"|"FL"|"33144"
"One Global View"|"Troy"|"NY"|12180
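As an illustration of how positions relate to the sample above, the following Python sketch reads the same pipe-delimited text and looks up values by 1-based position. It is a standalone example using the standard csv module, not something the stage runs. The helper names read_positions and value_at are hypothetical.

```python
import csv
import io

SAMPLE = '''"AddressLine1"|"City"|"StateProvince"|"PostalCode"
"7200 13TH ST"|"MIAMI"|"FL"|"33144"
"One Global View"|"Troy"|"NY"|12180
'''


def read_rows(text):
    """Parse pipe-delimited text with optional double-quoted values."""
    return list(csv.reader(io.StringIO(text), delimiter="|", quotechar='"'))


def read_positions(text):
    """Return {field name: 1-based position} taken from the header row."""
    header = read_rows(text)[0]
    return {name: index for index, name in enumerate(header, start=1)}


def value_at(record, position):
    """Return the value at a 1-based position within one record."""
    return record[position - 1]


positions = read_positions(SAMPLE)
records = read_rows(SAMPLE)[1:]

for record in records:
    print(value_at(record, positions["City"]))
```

Positions here start at 1, matching the example in the text, so the lookup subtracts 1 before indexing the parsed record.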
-
If you're overwriting an existing file, click Regenerate
to pick the schema from the existing file, then modify it.
For ORC and Parquet output files, this generates the schema from the
metadata of the existing file.
The Name column lists the names of the columns of the input data. The
Type column lists the datatype of each field of the input data.
Note: For the Parquet file type, an additional Nullable column
indicates whether each field is nullable. Select the check box for a
field to make it nullable, or clear it otherwise.
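One way to picture a row of the Fields tab is as a small record holding the name, type, position, and nullable flag. The Python sketch below is a hypothetical model and not the product's API. The type overrides are only the ones stated in the notes in this topic (datetime and time to String in Parquet files, bigdecimal to decimal with precision 38 and scale 10). Passing every other type through unchanged, and defaulting nullable to False, are assumptions made for illustration.

```python
from dataclasses import dataclass

# Only these conversions are stated in the notes in this topic.
PARQUET_OVERRIDES = {
    "datetime": "string",
    "time": "string",
    "bigdecimal": "decimal(38,10)",
}


@dataclass
class Field:
    """Hypothetical model of one row in the Fields tab."""
    name: str
    type: str
    position: int
    nullable: bool = False  # assumption: the real default is not documented here


def parquet_type(field: Field) -> str:
    """Return the documented Parquet mapping, or the type unchanged (assumption)."""
    return PARQUET_OVERRIDES.get(field.type, field.type)


schema = [
    Field("AddressLine1", "string", 1),
    Field("Created", "datetime", 2, nullable=True),
    Field("Balance", "bigdecimal", 3),
]

for field in schema:
    print(field.position, field.name, parquet_type(field), field.nullable)
```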
-
You can modify the names, datatypes and sequence of the selected columns in the
output using these buttons:
| Option Name | Description |
|---|---|
| Add | Adds a field to the output. |
| Modify | Modifies the selected field's name and datatype. |
| Remove | Removes the selected field from the output. |
| Move Up/Move Down | Reorders the position of the selected field in the output. |
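The effect of these buttons on the output field list can be sketched as operations on an ordered list. The following Python example is a hypothetical model, not the product's implementation. The function names are invented, and treating Move Up on the first field (or Move Down on the last) as a no-op is an assumption.

```python
def add_field(fields, name, data_type):
    """Add: append a field to the end of the output."""
    fields.append((name, data_type))


def modify_field(fields, index, name, data_type):
    """Modify: change the selected field's name and datatype."""
    fields[index] = (name, data_type)


def remove_field(fields, index):
    """Remove: drop the selected field from the output."""
    del fields[index]


def move_up(fields, index):
    """Move Up: swap with the previous field (no-op at the top, by assumption)."""
    if index > 0:
        fields[index - 1], fields[index] = fields[index], fields[index - 1]


def move_down(fields, index):
    """Move Down: swap with the next field (no-op at the bottom, by assumption)."""
    if index < len(fields) - 1:
        fields[index + 1], fields[index] = fields[index], fields[index + 1]


fields = []
add_field(fields, "City", "string")
add_field(fields, "PostalCode", "string")
add_field(fields, "AddressLine1", "string")
move_up(fields, 2)                                 # AddressLine1 moves ahead of PostalCode
move_up(fields, 1)                                 # AddressLine1 moves ahead of City
modify_field(fields, 2, "PostalCode", "integer")   # change the datatype only
print(fields)
```

The order of the list after these calls corresponds to the sequence of the fields in the output.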
-
Click OK.