Configuration Files

These tables describe the parameters and the values you need to specify before you run the Joiner job.

Note: The description here assumes there are three input files for the joiner job. However, you can have any number of input files for the job.

Table 1. inputFileConfig
Parameter	Description
pb.bdq.input.type	Input file type. The values can be: `file`, `TEXT`, or `ORC`.
These rows describe the details of the first input file.
pb.bdq.inputfile.path.0	The path where you have placed the input file on HDFS. For example, /home/hduser/input/input0.txt
textinputformat.record.delimiter.0	Record delimiter used in a text-type input file. For example, `LINUX`, `MACINTOSH`, or `WINDOWS`.
pb.bdq.inputformat.field.delimiter.0	Field or column delimiter used in the input file, such as comma (`,`) or tab.
pb.bdq.inputformat.text.qualifier.0	Text qualifiers, if any, in the columns or fields of the input file.
pb.bdq.inputformat.file.header.0	Column headers as comma-separated values. For example, business name, id, and domain.
pb.bdq.inputformat.skip.firstrow.0	Whether to skip the first row during processing. The values can be `True` or `False`, where `True` skips the first row.
These rows describe the details of the second input file.
pb.bdq.inputfile.path.1	The path where you have placed the input file on HDFS. For example, /home/hduser/input/input1.txt
textinputformat.record.delimiter.1	Record delimiter used in a text-type input file. For example, `LINUX`, `MACINTOSH`, or `WINDOWS`.
pb.bdq.inputformat.field.delimiter.1	Field or column delimiter used in the input file, such as comma (`,`) or tab.
pb.bdq.inputformat.text.qualifier.1	Text qualifiers, if any, in the columns or fields of the input file.
pb.bdq.inputformat.file.header.1	Column headers as comma-separated values. For example, business name, id, and domain.
pb.bdq.inputformat.skip.firstrow.1	Whether to skip the first row during processing. The values can be `True` or `False`, where `True` skips the first row.
These rows describe the details of the third input file.
pb.bdq.inputfile.path.2	The path where you have placed the input file on HDFS. For example, /home/hduser/input/input2.txt
textinputformat.record.delimiter.2	Record delimiter used in a text-type input file. For example, `LINUX`, `MACINTOSH`, or `WINDOWS`.
pb.bdq.inputformat.field.delimiter.2	Field or column delimiter used in the input file, such as comma (`,`) or tab.
pb.bdq.inputformat.text.qualifier.2	Text qualifiers, if any, in the columns or fields of the input file.
pb.bdq.inputformat.file.header.2	Column headers as comma-separated values. For example, business name, id, and domain.
pb.bdq.inputformat.skip.firstrow.2	Whether to skip the first row during processing. The values can be `True` or `False`, where `True` skips the first row.
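
Because every input file repeats the same six parameters with a numeric suffix (`.0`, `.1`, `.2`), the entries for any number of files can be generated rather than typed by hand. The Python sketch below only illustrates that naming pattern. The helper name `input_file_params`, the shape of the `files` list, and the default values (`LINUX`, comma, double quote) are assumptions made for illustration and are not part of the product; the sketch also does not write the configuration file itself.

```python
def input_file_params(files, input_type="TEXT"):
    """Build the indexed inputFileConfig parameters for a list of input files.

    `files` is a list of dicts, one per input file (hypothetical shape).
    The defaults below are illustrative assumptions, not product defaults.
    """
    params = {"pb.bdq.input.type": input_type}
    for index, spec in enumerate(files):
        # The suffix is the zero-based position of the file in the list.
        params[f"pb.bdq.inputfile.path.{index}"] = spec["path"]
        params[f"textinputformat.record.delimiter.{index}"] = spec.get("record_delimiter", "LINUX")
        params[f"pb.bdq.inputformat.field.delimiter.{index}"] = spec.get("field_delimiter", ",")
        params[f"pb.bdq.inputformat.text.qualifier.{index}"] = spec.get("text_qualifier", '"')
        params[f"pb.bdq.inputformat.file.header.{index}"] = ",".join(spec["header"])
        params[f"pb.bdq.inputformat.skip.firstrow.{index}"] = (
            "True" if spec.get("skip_first_row") else "False"
        )
    return params


# Three input files, matching the paths used in Table 1.
files = [
    {"path": f"/home/hduser/input/input{i}.txt", "header": ["business name", "id", "domain"]}
    for i in range(3)
]
params = input_file_params(files)
for name, value in params.items():
    print(f"{name}={value}")
```

Adding a fourth file to `files` would produce the same six parameters with the suffix `.3`.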

Table 2. joinerConfig
Parameter	Description
pb.bdq.job.type	This is a constant value that defines the job. The value for this job is: `Joiner`.
pb.bdq.job.name	Name of the job. Default is `JoinerSample`.
com.pb.bdq.dim.join.left.port	JSON string that defines the input file index of the left port.
com.pb.bdq.dim.join.type	Specify the type of join operation to be performed. Options are: `LeftOuter`, `Full`, or `Inner`.
com.pb.bdq.dim.join.col.0	Specify the columns to be joined, in comma-separated format (,).
com.pb.bdq.dim.join.col.1	Specify the columns to be joined, in comma-separated format (,).
com.pb.bdq.dim.join.col.2	Specify the columns to be joined, in comma-separated format (,).
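
The joinerConfig parameters can be assembled and checked in the same way. The sketch below is a minimal illustration, not the product's API: the helper `joiner_params` is hypothetical, it assumes that the suffix of `com.pb.bdq.dim.join.col.N` corresponds to the index of the input file, and it passes the left port value through as an opaque string because the schema of that JSON string is not described here.

```python
JOIN_TYPES = {"LeftOuter", "Full", "Inner"}  # options listed in Table 2


def joiner_params(join_type, join_columns, left_port, job_name="JoinerSample"):
    """Build the joinerConfig parameters (hypothetical helper).

    join_columns: one list of column names per input file, in index order
                  (assumed mapping between suffix and input file).
    left_port:    the JSON string for the left port, passed through unchanged.
    """
    if join_type not in JOIN_TYPES:
        raise ValueError(f"unsupported join type: {join_type!r}")
    params = {
        "pb.bdq.job.type": "Joiner",  # constant for this job
        "pb.bdq.job.name": job_name,
        "com.pb.bdq.dim.join.left.port": left_port,
        "com.pb.bdq.dim.join.type": join_type,
    }
    for index, columns in enumerate(join_columns):
        params[f"com.pb.bdq.dim.join.col.{index}"] = ",".join(columns)
    return params


# "<left-port-json>" is a placeholder, not a valid value.
joiner = joiner_params(
    "Inner",
    [["id"], ["id", "domain"], ["id"]],
    left_port="<left-port-json>",
)
for name, value in joiner.items():
    print(f"{name}={value}")
```

Rejecting an unknown join type before the job is submitted is a design choice of this sketch; it simply surfaces a typo earlier than a failed run would.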

Table 3. mapReduceConfig
Specifies the MapReduce configuration parameters.
Use this file to customize MapReduce parameters, such as mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and mapreduce.map.speculative, as needed for your job.
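
As an illustration of such customization, the sketch below renders the three properties named above in the standard Hadoop configuration XML layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>`). That the mapReduceConfig file uses this layout is an assumption, the helper `to_hadoop_xml` is hypothetical, and the memory values are placeholders rather than recommendations.

```python
import xml.etree.ElementTree as ET


def to_hadoop_xml(settings):
    """Render name/value pairs as Hadoop-style configuration XML (assumed layout)."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Placeholder values; tune them for your cluster and data volume.
map_reduce_settings = {
    "mapreduce.map.memory.mb": "2048",
    "mapreduce.reduce.memory.mb": "4096",
    "mapreduce.map.speculative": "false",
}
xml_text = to_hadoop_xml(map_reduce_settings)
print(xml_text)
```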

Table 4. outputFileConfig
Parameter	Description
pb.bdq.output.type	Output file type. The values can be: `file`, `TEXT`, or `ORC`.
pb.bdq.outputfile.path	The path where you want the output file to be generated on HDFS. For example, /user/hduser/sampledata/joiner/output
pb.bdq.outputformat.field.delimiter	Field or column delimiter in the output file, such as comma (`,`) or tab.
pb.bdq.output.overwrite	If `true`, the output folder is overwritten every time the job is run.
pb.bdq.outputformat.headerfile.create	Specify `true` if the output file needs to have a header.