Input Parameters

Parameter Description
Table Lookup Configuration
Standardizes terms against a previously validated form of each term and applies the standardized version to all records.

A rule can be of type Standardize, Categorize, or Identify.

Reference Data Path
Specifies the Reference Data path details.
Job Configurations
The Hadoop configurations for the job.

For a MapReduce job, the instance must be of type MRJobConfig. For a Spark job, the instance must be of type SparkJobConfig.

Input File
For text files:
File Path
The path of the input text file on the Hadoop platform.
Record Separator
The record separator used in the input file.
Field Separator
The separator used between consecutive fields of a record in the input file.
Text Qualifier
The character used to surround text values in a delimited file.
Header Row Fields
An array of the header fields of the input file.
Skip First Row
Flag to indicate whether the first row must be skipped when reading the input file records.

Set this to true if the first row is a header row.

Attention: Invoke the appropriate constructor of FilePath.
For ORC format files:
ORC File Path
The path of the input ORC format file on the Hadoop platform.
Common parameters:
Field Mappings
A map of key-value pairs, with the existing column names as the keys and the desired output column names as the values.
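
The following is a minimal sketch of how the text-file parameters and Field Mappings above relate to one another. It is illustrative only and is not the SDK's implementation: the class InputFileSketch and its methods splitRecord and read are hypothetical names, and the sketch assumes that the text qualifier simply toggles a quoted mode (doubled qualifiers are not treated as escapes).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the SDK: shows what the text-file parameters mean. */
final class InputFileSketch {

    /** Splits one record on fieldSeparator; separators inside a textQualifier pair are data. */
    public static List<String> splitRecord(String record, char fieldSeparator, char textQualifier) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (c == textQualifier) {
                quoted = !quoted;               // qualifier toggles quoted mode and is dropped
            } else if (c == fieldSeparator && !quoted) {
                fields.add(current.toString()); // unquoted separator ends the current field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Reads records, optionally skips the first row, and renames columns via fieldMappings. */
    public static List<Map<String, String>> read(String content, String recordSeparator,
            char fieldSeparator, char textQualifier, String[] headerRowFields,
            boolean skipFirstRow, Map<String, String> fieldMappings) {
        List<Map<String, String>> rows = new ArrayList<>();
        String[] records = content.split(Pattern.quote(recordSeparator));
        for (int r = skipFirstRow ? 1 : 0; r < records.length; r++) {
            if (records[r].isEmpty()) {
                continue;                       // ignore blank records
            }
            List<String> values = splitRecord(records[r], fieldSeparator, textQualifier);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < headerRowFields.length && i < values.size(); i++) {
                String name = headerRowFields[i];
                // Existing column name is the key, desired output name is the value.
                row.put(fieldMappings.getOrDefault(name, name), values.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String content = "Name,City\n\"Smith, John\",NYC\n";
        Map<String, String> mappings = Map.of("City", "Town");
        System.out.println(
            read(content, "\n", ',', '"', new String[] {"Name", "City"}, true, mappings));
    }
}
```

In this example the first row is a header row, so skipFirstRow is true, and the quoted comma in the first field is kept as data rather than treated as a field separator.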
Output File
For text files:
File Path
The path of the output text file on the Hadoop platform.
Field Separator
The separator used between consecutive fields of a record in the output file.
Attention: Invoke the appropriate constructor of FilePath.
For ORC format files:
ORC File Path
The path of the output ORC format file on the Hadoop platform.
Common parameters:
Flag to indicate whether the output file must overwrite any existing file of the same name.
Create Output Header
Flag to indicate whether a header file is to be created on the Hadoop server.
Job Name
The name of the job.
Compress Output
Flag to indicate whether the output must be compressed.

Set this to true to compress the output.
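
To make the purpose of the Table Lookup Configuration parameter more concrete, the sketch below shows the general idea behind a Standardize-type rule: each term is looked up in a table of validated forms and replaced when a match exists. This is an assumption-laden illustration, not the SDK's API or algorithm. The class StandardizeSketch and its standardize method are hypothetical, and the case-insensitive matching and pass-through of unmatched terms are choices made for this sketch only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of a Standardize-style lookup; not the SDK's implementation. */
final class StandardizeSketch {

    /** Replaces each term with its validated form when the lookup table has one. */
    public static List<String> standardize(List<String> terms, Map<String, String> lookupTable) {
        // Normalize keys once so the lookup is case-insensitive (an assumption of this sketch).
        Map<String, String> normalized = new HashMap<>();
        for (Map.Entry<String, String> entry : lookupTable.entrySet()) {
            normalized.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
        }
        List<String> result = new ArrayList<>();
        for (String term : terms) {
            // Terms without a match pass through unchanged (also an assumption).
            result.add(normalized.getOrDefault(term.toLowerCase(Locale.ROOT), term));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = Map.of("intl", "International", "corp", "Corporation");
        System.out.println(standardize(List.of("Intl", "Corp", "Acme"), lookup));
    }
}
```

In the actual job, the lookup table would come from the location given by Reference Data Path rather than from an in-memory map.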