Defining Fields in an Input Sequence File

In the Read from Hadoop Sequence File stage, the Fields tab defines the names, positions, and types of fields in the file. After you define an input file on the File Properties tab, you can define the fields.

If the input file does not contain a header record, or if you want to manually define the fields, follow these steps on the Fields tab:

  1. To define the fields already present in the input file, click Regenerate, and then click Detect Type. Detect Type automatically sets the data type of each field based on the first 50 records in the file.
  2. To add more fields to the output, click Add.
  3. In the Name field, choose the field you want to add or type the name of the field.
  4. In the Type field, you can leave the data type as string if you do not intend to perform mathematical operations on the data. If you do, select an appropriate numeric data type. The stage then converts the string data from the file to that type so that the data can be manipulated correctly in the dataflow.
    The stage supports the following data types:
    double
    A numeric data type that contains both negative and positive double-precision numbers with magnitudes between $2^{-1074}$ and $(2-2^{-52})\times 2^{1023}$. In E notation, the range of values is -1.79769313486232E+308 to 1.79769313486232E+308.
    float
    A numeric data type that contains both negative and positive single-precision numbers with magnitudes between $2^{-149}$ and $(2-2^{-23})\times 2^{127}$. In E notation, the range of values is -3.402823E+38 to 3.402823E+38.
    integer
    A numeric data type that contains both negative and positive whole numbers between $-2^{31}$ (-2,147,483,648) and $2^{31}-1$ (2,147,483,647).
    long
    A numeric data type that contains both negative and positive whole numbers between $-2^{63}$ (-9,223,372,036,854,775,808) and $2^{63}-1$ (9,223,372,036,854,775,807).
    string
    A sequence of characters.
  5. In the Position field, enter the position of this field within the record.

    For example, in this input file, AddressLine1 is in position 1, City is in position 2, StateProvince is in position 3, and PostalCode is in position 4.

    "AddressLine1"|"City"|"StateProvince"|"PostalCode"
    "7200 13TH ST"|"MIAMI"|"FL"|"33144"
    "One Global View"|"Troy"|"NY"|12180
  6. To remove leading and trailing space characters from a field's value, select the Trim check box.
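The ranges listed in step 4 match IEEE 754 double precision (double), IEEE 754 single precision (float), and 32-bit and 64-bit two's-complement integers (integer and long). That mapping is an assumption, though it agrees with every bound quoted. The following Python sketch computes each bound from its formula, decoding the float limits from their raw 32-bit patterns with the standard struct module, so that you can compare them with the E-notation values above.

```python
import struct
import sys

# Assumption: the stage's double and float follow IEEE 754 binary64 and binary32,
# and integer and long are 32-bit and 64-bit two's-complement integers.

# double: Python's float is IEEE 754 binary64
DOUBLE_MIN = 2.0 ** -1074                    # smallest positive (subnormal) magnitude
DOUBLE_MAX = (2 - 2.0 ** -52) * 2.0 ** 1023  # largest finite magnitude

# float: decode raw binary32 bit patterns with struct
FLOAT_MIN = struct.unpack(">f", bytes.fromhex("00000001"))[0]  # 2^-149
FLOAT_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]  # (2 - 2^-23) * 2^127

# integer and long: two's-complement bounds
INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1

if __name__ == "__main__":
    # The formulas agree with the platform's own double limit and the decoded float limits
    assert DOUBLE_MAX == sys.float_info.max
    assert FLOAT_MIN == 2.0 ** -149
    assert FLOAT_MAX == (2 - 2.0 ** -23) * 2.0 ** 127
    print(f"double:  {DOUBLE_MIN!r} .. {DOUBLE_MAX:.14E}")
    print(f"float:   {FLOAT_MIN!r} .. {FLOAT_MAX:.6E}")
    print(f"integer: {INT_MIN} .. {INT_MAX}")
    print(f"long:    {LONG_MIN} .. {LONG_MAX}")
```

The E-notation values in step 4 are these maximum magnitudes rounded to 15 and 7 significant digits respectively.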
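Detect Type in step 1 infers each field's type from the first 50 records. This page does not describe the rules it applies, so the following Python sketch is only one plausible approach, not the stage's actual algorithm: it tries integer, then long, then double, and falls back to string. The names detect_type and _fits, the choice to ignore blank values, and the omission of float are all assumptions made for illustration.

```python
import math
from typing import Iterable, List

# Hypothetical sketch: not the stage's documented detection algorithm.
INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1

# Candidate types, narrowest first; float is deliberately left out of this sketch
CANDIDATES = ("integer", "long", "double")


def _fits(value: str, kind: str) -> bool:
    """Return True if the text value converts to the given type without overflow."""
    try:
        if kind == "integer":
            return INT_MIN <= int(value) <= INT_MAX
        if kind == "long":
            return LONG_MIN <= int(value) <= LONG_MAX
        if kind == "double":
            return math.isfinite(float(value))  # rejects values that overflow to inf
    except ValueError:
        return False
    return False


def detect_type(values: Iterable[str], sample_size: int = 50) -> str:
    """Return the narrowest candidate type that every non-blank sampled value fits."""
    sample: List[str] = []
    for i, value in enumerate(values):
        if i >= sample_size:  # only the first `sample_size` records are examined
            break
        if value.strip():  # assumption: blank values are ignored
            sample.append(value.strip())
    for kind in CANDIDATES:
        if sample and all(_fits(v, kind) for v in sample):
            return kind
    return "string"


if __name__ == "__main__":
    print(detect_type(["33144", "12180"]))
    print(detect_type(["FL", "NY"]))
```

With this sketch, a PostalCode column containing 33144 and 12180 would be detected as integer. That is the case step 4 cautions about: postal codes are not used in arithmetic, and converting them to a number would drop leading zeros (02134 would become 2134), so string is usually the better choice for such a field.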
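Steps 5 and 6 select each field by its 1-based position in the record and optionally trim surrounding spaces. Assuming each record is a pipe-delimited line with double-quote text qualifiers, as in the sample in step 5, the following Python sketch reproduces that behavior with the standard csv module. FieldDef and read_fields are hypothetical names, and treating the first line as a header is an assumption that matches the sample.

```python
import csv
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldDef:
    """Hypothetical stand-in for one row on the Fields tab."""
    name: str
    position: int       # 1-based position within the record (step 5)
    trim: bool = False  # strip leading/trailing spaces (step 6)


def read_fields(lines: List[str], fields: List[FieldDef],
                skip_header: bool = True) -> List[Dict[str, str]]:
    """Split pipe-delimited, double-quoted records and pick fields by position."""
    rows = list(csv.reader(lines, delimiter="|", quotechar='"'))
    if skip_header:  # assumption: the first line holds the field names
        rows = rows[1:]
    records = []
    for row in rows:
        record = {}
        for field in fields:
            value = row[field.position - 1]  # convert 1-based position to 0-based index
            record[field.name] = value.strip(" ") if field.trim else value
        records.append(record)
    return records


SAMPLE = [
    '"AddressLine1"|"City"|"StateProvince"|"PostalCode"',
    '"7200 13TH ST"|"MIAMI"|"FL"|"33144"',
    '"One Global View"|"Troy"|"NY"|12180',
]

if __name__ == "__main__":
    layout = [FieldDef("City", 2), FieldDef("PostalCode", 4)]
    for record in read_fields(SAMPLE, layout):
        print(record)
```

The csv reader removes the text qualifiers, so the quoted "33144" and the unquoted 12180 in the sample both come back as plain strings before any type conversion is applied.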