Configuration Files

These tables describe the parameters and the values you need to specify before you run the Advanced Transformer job.

Table 1. inputFileConfig

pb.bdq.input.type
    Input file type. The values can be: file, TEXT, or ORC.
pb.bdq.inputfile.path
    The path where you have placed the input file on HDFS. For example, /user/hduser/sampledata/advancedtransformer/input/AdvancedTransformer_Input.txt
textinputformat.record.delimiter
    Record delimiter used in the text type input file. For example, LINUX, MACINTOSH, or WINDOWS.
pb.bdq.inputformat.field.delimiter
    Field or column delimiter used in the input file, such as a comma (,) or tab.
pb.bdq.inputformat.text.qualifier
    Text qualifier, if any, used in the columns or fields of the input file.
pb.bdq.inputformat.file.header
    Comma-separated list of the headers used in the input file.
pb.bdq.inputformat.skip.firstrow
    Specifies whether the first row is skipped during processing. The values can be True or False, where True indicates skip.
pb.bdq.inputfile.field.mapping
    Maps the values of the headers used in the input file to their updated values.
    Note: This is an optional parameter.
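
For illustration only, the sketch below shows one way to supply the inputFileConfig values from a Hadoop driver class, assuming the sample input path from Table 1; the TEXT type, LINUX record delimiter, tab field delimiter, and header names are placeholder choices, not required values.

    import org.apache.hadoop.conf.Configuration;

    public class InputFileConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Input type and HDFS location (sample path from Table 1)
            conf.set("pb.bdq.input.type", "TEXT");
            conf.set("pb.bdq.inputfile.path",
                    "/user/hduser/sampledata/advancedtransformer/input/AdvancedTransformer_Input.txt");
            // Record delimiter, field delimiter, and text qualifier (placeholder values)
            conf.set("textinputformat.record.delimiter", "LINUX");
            conf.set("pb.bdq.inputformat.field.delimiter", "\t");
            conf.set("pb.bdq.inputformat.text.qualifier", "\"");
            // Header handling; the header names here are placeholders
            conf.set("pb.bdq.inputformat.file.header", "AddressLine1,City,State,PostalCode");
            conf.set("pb.bdq.inputformat.skip.firstrow", "True");
        }
    }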
Table 2. advancedTransformerConfig

pb.bdq.job.type
    A constant value that defines the job. The value for this job is: AdvTransformer.
pb.bdq.job.name
    Name of the job. Default is AdvanceTransformerSample.
pb.bdq.dnm.advtransformer.configuration
    JSON string that defines the Advanced Transformer configuration. It specifies details such as the source input field to be evaluated for scan and split, the output field in which to place the extracted data, any special characters that you want to tokenize, and the type of extraction to be performed.
pb.bdq.reference.data
    The path where you have placed the reference data. For example, {"referenceDataPathLocation":"LocaltoDataNodes","dataDir":"/home/data/referenceData"}
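
As a minimal sketch, assuming a standard Hadoop driver and the local reference data example from Table 2, the parameters might be set as follows; the Advanced Transformer JSON string is read from the command line here because its exact contents are specific to your job.

    import org.apache.hadoop.conf.Configuration;

    public class AdvancedTransformerConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("pb.bdq.job.type", "AdvTransformer");
            conf.set("pb.bdq.job.name", "AdvanceTransformerSample");
            // Advanced Transformer JSON configuration string, passed in as the
            // first argument rather than hard-coded (its contents are job-specific)
            conf.set("pb.bdq.dnm.advtransformer.configuration", args[0]);
            // Reference data placed locally on the data nodes (example from Table 2)
            conf.set("pb.bdq.reference.data",
                    "{\"referenceDataPathLocation\":\"LocaltoDataNodes\","
                  + "\"dataDir\":\"/home/data/referenceData\"}");
        }
    }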
Table 3. advancedTransformerConfigHDFSRefData(DataDownloader)

pb.bdq.job.type
    A constant value that defines the job. The value for this job is: AdvTransformer.
pb.bdq.job.name
    Name of the job. Default is AdvanceTransformerSample.
pb.bdq.dnm.advtransformer.configuration
    JSON string that defines the Advanced Transformer configuration. It specifies details such as the source input field to be evaluated for scan and split, the output field in which to place the extracted data, any special characters that you want to tokenize, and the type of extraction to be performed.
pb.bdq.reference.data
    Path of the reference data on HDFS and the data downloader path. For example, {"referenceDataPathLocation":"HDFS", "dataDir":"/home/data/dm/referenceData", "dataDownloader":{"dataDownloader":"HDFS", "localFSRepository":"/local/download"}}
Table 4. advancedTransformerConfigDistributedCache

pb.bdq.job.type
    A constant value that defines the job. The value for this job is: AdvTransformer.
pb.bdq.job.name
    Name of the job. Default is AdvanceTransformerSample.
pb.bdq.dnm.advtransformer.configuration
    JSON string that defines the Advanced Transformer configuration. It specifies details such as the source input field to be evaluated for scan and split, the output field in which to place the extracted data, any special characters that you want to tokenize, and the type of extraction to be performed.
pb.bdq.reference.data
    Path of the reference data on HDFS and the type of data downloader. For example, {"referenceDataPathLocation":"HDFS", "dataDir":"/home/data/dm/referenceData", "dataDownloader":{"dataDownloader":"DC"}}
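
Tables 3 and 4 differ only in the pb.bdq.reference.data value, so a single sketch can show both variants; the JSON strings below are the examples from those tables, escaped as Java string literals, and the surrounding driver code is assumed boilerplate.

    import org.apache.hadoop.conf.Configuration;

    public class ReferenceDataVariantsSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Table 3 variant: reference data on HDFS, downloaded to a local repository
            conf.set("pb.bdq.reference.data",
                    "{\"referenceDataPathLocation\":\"HDFS\","
                  + "\"dataDir\":\"/home/data/dm/referenceData\","
                  + "\"dataDownloader\":{\"dataDownloader\":\"HDFS\","
                  + "\"localFSRepository\":\"/local/download\"}}");
            // Table 4 variant: reference data on HDFS, distributed through the distributed cache
            conf.set("pb.bdq.reference.data",
                    "{\"referenceDataPathLocation\":\"HDFS\","
                  + "\"dataDir\":\"/home/data/dm/referenceData\","
                  + "\"dataDownloader\":{\"dataDownloader\":\"DC\"}}");
        }
    }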
Table 5. mapReduceConfig

Specifies the MapReduce configuration parameters.
Use this file to customize MapReduce parameters, such as mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, and mapreduce.map.speculative, as needed for your job.
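
For example, the driver sketch below tunes these MapReduce properties; the memory sizes and the speculative-execution setting are placeholder values chosen only to illustrate the mechanism.

    import org.apache.hadoop.conf.Configuration;

    public class MapReduceConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Container memory, in MB (placeholder sizes)
            conf.set("mapreduce.map.memory.mb", "2048");
            conf.set("mapreduce.reduce.memory.mb", "4096");
            // Disable speculative execution of map tasks (placeholder choice)
            conf.set("mapreduce.map.speculative", "false");
        }
    }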
Table 6. OutputFileConfig

pb.bdq.output.type
    Specifies whether the output is in file, TEXT, or ORC format.
pb.bdq.outputfile.path
    The path where you want the output file to be generated on HDFS. For example, /user/hduser/sampledata/advancedtransformer/output
pb.bdq.outputformat.field.delimiter
    Field or column delimiter in the output file, such as a comma (,) or tab.
pb.bdq.output.overwrite
    If set to true, the output folder is overwritten every time the job is run.
pb.bdq.outputformat.headerfile.create
    Specify true if the output file must include a header.
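
To round out the example, the OutputFileConfig parameters could be set in the same way; the TEXT format, the output path from Table 6, and the comma delimiter are assumed values for illustration.

    import org.apache.hadoop.conf.Configuration;

    public class OutputFileConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Output format and HDFS location (sample path from Table 6)
            conf.set("pb.bdq.output.type", "TEXT");
            conf.set("pb.bdq.outputfile.path", "/user/hduser/sampledata/advancedtransformer/output");
            // Field delimiter for the generated file (placeholder choice)
            conf.set("pb.bdq.outputformat.field.delimiter", ",");
            // Overwrite previous output and write a header row
            conf.set("pb.bdq.output.overwrite", "true");
            conf.set("pb.bdq.outputformat.headerfile.create", "true");
        }
    }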