Defining Fields in an Input Sequence File

In the Read from Hadoop Sequence File stage, the Fields tab defines the names, positions, and types of the fields in the file. After you define an input file on the File Properties tab, you can define the fields.

  1. To add a field to the output, click Add.
  2. In the Type field, you can leave the data type as string if you do not intend to perform mathematical operations on the data. If you do, select an appropriate numeric data type. The stage then converts the string data from the file to that type so that the data can be manipulated correctly in the dataflow.
    The stage supports the following data types:
    double
    A numeric data type that contains both negative and positive double precision numbers between 2⁻¹⁰⁷⁴ and (2-2⁻⁵²)×2¹⁰²³. In E notation, the range of values is -1.79769313486232E+308 to 1.79769313486232E+308.
    float
    A numeric data type that contains both negative and positive single precision numbers between 2⁻¹⁴⁹ and (2-2⁻²³)×2¹²⁷. In E notation, the range of values is -3.402823E+38 to 3.402823E+38.
    integer
    A numeric data type that contains both negative and positive whole numbers between -2³¹ (-2,147,483,648) and 2³¹-1 (2,147,483,647).
    long
    A numeric data type that contains both negative and positive whole numbers between -2⁶³ (-9,223,372,036,854,775,808) and 2⁶³-1 (9,223,372,036,854,775,807).
    string
    A sequence of characters.
  3. In the Name field, choose the field you want to add or type the name of the field.
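Choosing a numeric type matters because text and numbers behave differently under the same operation. As a rough illustration only, the following Python sketch shows what converting a field value from its string form involves. The names convert_field and INT_RANGES are hypothetical, and the range checks and exception types are assumptions; this is not how the stage itself is implemented.

```python
import math
import struct

# Hypothetical helper for illustration only; it is not the stage's code and
# does not reproduce how the stage reports conversion errors.
INT_RANGES = {
    "integer": (-2**31, 2**31 - 1),  # 32-bit two's complement
    "long": (-2**63, 2**63 - 1),     # 64-bit two's complement
}


def convert_field(text: str, type_name: str):
    """Convert one field value, read from the file as text, to type_name."""
    if type_name == "string":
        return text  # kept as-is: fine when no arithmetic is needed
    if type_name in INT_RANGES:
        value = int(text)  # raises ValueError for text such as "3.5"
        low, high = INT_RANGES[type_name]
        if not low <= value <= high:
            raise OverflowError(f"{value} is out of range for {type_name}")
        return value
    if type_name in ("double", "float"):
        value = float(text)  # Python floats are double precision
        if math.isinf(value):
            raise OverflowError(f"{text} is out of range for {type_name}")
        if type_name == "float":
            # Round to single precision with a 4-byte round trip; struct.pack
            # raises OverflowError for finite values beyond the float range.
            value = struct.unpack("<f", struct.pack("<f", value))[0]
        return value
    raise ValueError(f"unsupported data type: {type_name}")


# Left as strings, "+" concatenates; converted to integers, it adds.
print(convert_field("12", "string") + convert_field("30", "string"))    # 1230
print(convert_field("12", "integer") + convert_field("30", "integer"))  # 42
print(convert_field("0.1", "float"))  # 0.10000000149011612
```

The last line shows that a float field keeps only single precision, so 0.1 becomes the nearest single precision value. The sketch raises an exception for out-of-range values; the stage's own handling of such values may differ.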
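The limits listed for the double and float types are the IEEE 754 double and single precision limits, and those for integer and long are the 32-bit and 64-bit two's-complement limits. The following Python sketch, which is not part of the stage, checks the formulas and the rounded E-notation values in the data type descriptions. It assumes Python floats are IEEE 754 double precision, as they are on mainstream platforms; the constant names are illustrative.

```python
import math
import struct
import sys

# Build each limit directly from its IEEE 754 bit pattern, so nothing is rounded.
DOUBLE_MIN = struct.unpack("<d", struct.pack("<Q", 0x0000000000000001))[0]  # smallest positive
DOUBLE_MAX = struct.unpack("<d", struct.pack("<Q", 0x7FEFFFFFFFFFFFFF))[0]  # largest finite
FLOAT_MIN = struct.unpack("<f", struct.pack("<I", 0x00000001))[0]
FLOAT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]

# The formulas from the descriptions; math.ldexp(m, e) computes m * 2**e exactly.
assert DOUBLE_MIN == math.ldexp(1.0, -1074)
assert DOUBLE_MAX == math.ldexp(2 - math.ldexp(1.0, -52), 1023) == sys.float_info.max
assert FLOAT_MIN == math.ldexp(1.0, -149)
assert FLOAT_MAX == math.ldexp(2 - math.ldexp(1.0, -23), 127)

# The E-notation values, rounded to the number of digits used in the descriptions.
assert f"{DOUBLE_MAX:.14E}" == "1.79769313486232E+308"
assert f"{FLOAT_MAX:.6E}" == "3.402823E+38"

# Two's-complement limits of the 32-bit integer and 64-bit long types.
assert (-2**31, 2**31 - 1) == (-2_147_483_648, 2_147_483_647)
assert (-2**63, 2**63 - 1) == (-9_223_372_036_854_775_808, 9_223_372_036_854_775_807)
print("all limits match")
```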