Defining Fields for Reading from Hive File

The Fields tab of the Read from Hive File stage lists the schema name, data type, position, and stage field name of each field in the file.

  1. Click Regenerate.
    For ORC, Avro, and Parquet files, this generates the schema based on the metadata of the existing file.

    The grid displays the columns Name, Type, Stage Field, and Include.

    The Name column displays the field name, as derived from the header record of the file.

    The Type column lists the data type of each field in the file.

    The stage supports these data types:

    boolean
    A logical type with two values: true and false.
    date
    A data type that contains a month, day, and year. For example, 2012-01-30 or January 30, 2012. You can specify a default date format in Spectrum Management Console.
    datetime
    A data type that contains a month, day, and year, along with hours, minutes, and seconds.

    For example, 2012/01/30 6:15:00 PM.

    Note: The datetime datatype in Spectrum maps to the timestamp datatype of Hive files.
    double
    A numeric data type that contains both negative and positive double precision numbers between 2^-1074 and (2 - 2^-52) × 2^1023. In E notation, the range of values is -1.79769313486232E+308 to 1.79769313486232E+308.
    bigdecimal
    A numeric data type that supports 38 decimal points of precision. Use this data type for data that will be used in mathematical calculations requiring a high degree of precision, especially those involving financial data. The bigdecimal data type supports more precise calculations than the double data type.
    Note: For Avro and Parquet Hive files, fields of the decimal datatype in the input file are converted to bigdecimal datatype.
    long
    A numeric data type that contains both negative and positive whole numbers between -2^63 (-9,223,372,036,854,775,808) and 2^63 - 1 (9,223,372,036,854,775,807).
    Note: The long datatype in Spectrum maps to the bigint datatype of Hive files.
    integer
    A numeric data type that contains both negative and positive whole numbers between -2^31 (-2,147,483,648) and 2^31 - 1 (2,147,483,647).
    float
    A numeric data type that contains both negative and positive single precision numbers between 2^-149 and (2 - 2^-23) × 2^127. In E notation, the range of values is -3.402823E+38 to 3.402823E+38.
    string
    A sequence of characters.
    The Position column displays the starting position of the respective field within a record.
  2. In the Stage Field column, change the name of each field as needed.
    By default, this column displays the field names read from the file.
  3. In the Include column, select the checkbox for each field you want to include in the output of the stage.
    By default, all the fields are selected in this column.
  4. Click OK.
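
The numeric ranges listed for the double, float, long, and integer data types can be checked with a short script. The following Python sketch is illustrative only and is not part of Spectrum. It assumes that double and float follow IEEE 754 double and single precision, and that long and integer are 64-bit and 32-bit signed integers, as the stated ranges suggest. It also models bigdecimal with a 38-digit decimal context, which is an approximation. All names in the sketch (fits, DOUBLE_MAX, and so on) are hypothetical.

```python
import math
import struct
from decimal import Decimal, localcontext

# Assumption: double and float follow IEEE 754 binary64 and binary32.
DOUBLE_MIN = math.ldexp(1.0, -1074)             # 2^-1074, smallest positive value
DOUBLE_MAX = math.ldexp(2 - 2.0 ** -52, 1023)   # (2 - 2^-52) x 2^1023
FLOAT_MIN = math.ldexp(1.0, -149)               # 2^-149, smallest positive value
FLOAT_MAX = math.ldexp(2 - 2.0 ** -23, 127)     # (2 - 2^-23) x 2^127

# Assumption: long and integer are 64-bit and 32-bit signed integers.
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1
INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1


def fits(fmt, value):
    """Return True if value can be packed into the given struct format.

    "<f" is a 32-bit float, "<d" a 64-bit float, "<i" a 32-bit signed
    integer, and "<q" a 64-bit signed integer.
    """
    try:
        struct.pack(fmt, value)
        return True
    except (struct.error, OverflowError):
        return False


# Assumption: bigdecimal is modelled here as a decimal with 38 significant digits.
with localcontext() as ctx:
    ctx.prec = 38
    third = Decimal(1) / Decimal(3)

if __name__ == "__main__":
    print("double max:", format(DOUBLE_MAX, ".14E"))
    print("float max: ", format(FLOAT_MAX, ".6E"))
    print("long range:", LONG_MIN, "to", LONG_MAX)
    print("int range: ", INT_MIN, "to", INT_MAX)
    print("1/3 as a 38-digit decimal:", third)
    print("1/3 as a double:          ", 1 / 3)
```

A value such as LONG_MAX + 1 does not fit the long range, so fits("<q", LONG_MAX + 1) returns False. The last two lines printed show why bigdecimal is the better choice for high-precision calculations: the decimal result carries 38 significant digits, while the double carries far fewer.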