Configuration Files

These tables describe the parameters and the values you need to specify before you run the Validate Address Global job.

Table 1. inputFileConfig
Parameter	Description
pb.bdq.input.type	Input file type. The values can be: `TEXT`, `ORC` or `PARQUET`.
pb.bdq.inputfile.path	The path where you have placed the input file on HDFS. For example, `/user/hduser/sampledata/addressing/input/global/Global_Address.txt`.
textinputformat.record.delimiter	File record delimiter used in the text type input file. For example, `LINUX`, `MACINTOSH`, or `WINDOWS`.
pb.bdq.inputformat.field.delimiter	Field or column delimiter used in the input file, such as comma (`,`) or tab.
pb.bdq.inputformat.text.qualifier	Text qualifiers, if any, in the columns or fields of the input file.
pb.bdq.inputformat.file.header	Comma-separated value of the headers used in the input file.
pb.bdq.inputformat.skip.firstrow	Specifies whether the first row is skipped during processing. The values can be `True` or `False`, where `True` means the first row is skipped.
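
To illustrate how the field delimiter, text qualifier, header, and skip-first-row settings relate to each other, here is a minimal Python sketch that applies them to a small text sample. It is not the job's own parser: the `read_records` helper, the header names, and the sample address are hypothetical and only mirror the parameters in Table 1.

```python
import csv
import io

# Hypothetical values for the inputFileConfig parameters in Table 1.
input_config = {
    "pb.bdq.input.type": "TEXT",
    "pb.bdq.inputformat.field.delimiter": ",",
    "pb.bdq.inputformat.text.qualifier": '"',
    "pb.bdq.inputformat.file.header": "AddressLine1,City,Country",
    "pb.bdq.inputformat.skip.firstrow": "True",
}


def read_records(text, config):
    """Parse delimited text into dicts keyed by the configured header names."""
    # The header parameter is a comma-separated list of column names.
    header = config["pb.bdq.inputformat.file.header"].split(",")
    reader = csv.reader(
        io.StringIO(text),
        delimiter=config["pb.bdq.inputformat.field.delimiter"],
        quotechar=config["pb.bdq.inputformat.text.qualifier"],
    )
    rows = list(reader)
    # Drop the first row when the file carries its own header line.
    if config["pb.bdq.inputformat.skip.firstrow"].lower() == "true":
        rows = rows[1:]
    return [dict(zip(header, row)) for row in rows]


# The qualifier keeps the embedded comma inside a single field.
sample = 'AddressLine1,City,Country\n"1 Global View, Suite 2",Troy,USA\n'
records = read_records(sample, input_config)
print(records)
```

The qualified first field shows why a text qualifier matters when a column value contains the field delimiter.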

Table 2. globalAddressingConfig
Parameter	Description
pb.bdq.job.type	This is a constant value that defines the job. The value for this job is: `GlobalAddressingValidate`.
pb.bdq.job.name	Name of the job. Default is `GlobalAddressingValidateSample`.
pb.bdq.reference.data	The path where you have placed the reference data. For example, {"dataDir":"/home/hduser/ReferenceData/AddressQuality/Global", "referenceDataPathLocation":"LocaltoDataNodes"}
pb.bdq.uam.global.engine.configurations.preload	Preload type in the global engine configuration. The values can be: `NONE`, `FULL`, or `PARTIAL`.
pb.bdq.uam.global.engine.configurations.database.type	Database type in the global engine configuration. The values can be `BATCH_INTERACTIVE`, `FASTCOMPLETION`, or `CERTIFIED`.
pb.bdq.uam.global.engine.configurations.supported.countries	Supported countries for global address validation job, such as United States Of America, Great Britain, and Canada. Note: You can specify multiple countries as comma-separated values.
pb.bdq.uam.global.input.configuration	JSON string that defines the input configuration, such as Match Mode, Default Country, Maximum Results, Result Casing, State Province Type, and Optimization Level.
pb.bdq.uam.global.general.configuration	JSON string that defines the general configuration, such as Cache Size, Maximum Thread Count, and Maximum Limit of Memory Usage.
pb.bdq.uam.global.unlockCode	Code to unlock data in the database.

Table 3. globalAddressingConfigHDFSRefData(DataDownloader)
Parameter	Description
pb.bdq.job.type	This is a constant value that defines the job. The value for this job is: `GlobalAddressingValidate`.
pb.bdq.job.name	Name of the job. Default is `GlobalAddressingValidateSample`.
pb.bdq.reference.data	Path of reference data on HDFS and the data downloader path. For example, {"referenceDataPathLocation":"HDFS", "dataDir":"/user/root/ReferenceData/Global/Global.zip", "dataDownloader":{"dataDownloader":"HDFS", "localFSRepository":"/opt/PitneyBowes/ReferenceData/GlobalAddress"}}
pb.bdq.uam.input.groupby.region	Specifies whether the input address data is grouped by region, such as APAC, EMEA, and America. A `true` value enables grouping. Note: This parameter applies only if you have placed your reference data on HDFS.
pb.bdq.uam.global.engine.configurations.preload	Preload type in the global engine configuration. The values can be: `NONE`, `FULL`, or `PARTIAL`.
pb.bdq.uam.global.engine.configurations.database.type	Database type in the global engine configuration. The values can be `BATCH_INTERACTIVE`, `FASTCOMPLETION`, or `CERTIFIED`.
pb.bdq.uam.global.engine.configurations.supported.countries	Supported countries for global address validation job, such as United States Of America, Great Britain, and Canada. Note: You can specify multiple countries as comma-separated values.
pb.bdq.uam.global.input.configuration	JSON string that defines the input configuration, such as Match Mode, Default Country, Maximum Results, Result Casing, State Province Type, and Optimization Level.
pb.bdq.uam.global.general.configuration	JSON string that defines the general configuration, such as Cache Size, Maximum Thread Count, and Maximum Limit of Memory Usage.
pb.bdq.uam.global.unlockCode	Code to unlock data in the database.

Table 4. globalAddressingConfigDistributedCache
Parameter	Description
pb.bdq.job.type	This is a constant value that defines the job. The value for this job is: `GlobalAddressingValidate`.
pb.bdq.job.name	Name of the job. Default is `GlobalAddressingValidateSample`.
pb.bdq.reference.data	Path of the reference data on HDFS and the type of data downloader. For example, {"dataDir":"/home/hduser/ReferenceData/AddressQuality/Global", "referenceDataPathLocation":"HDFS", "dataDownloader":{"dataDownloader":"DC"}}
pb.bdq.uam.global.engine.configurations.preload	Preload type in the global engine configuration. The values can be: `NONE`, `FULL`, or `PARTIAL`.
pb.bdq.uam.global.engine.configurations.database.type	Database type in the global engine configuration. The values can be `BATCH_INTERACTIVE`, `FASTCOMPLETION`, or `CERTIFIED`.
pb.bdq.uam.global.engine.configurations.supported.countries	Supported countries for global address validation job, such as United States Of America, Great Britain, and Canada. Note: You can specify multiple countries as comma-separated values.
pb.bdq.uam.global.input.configuration	JSON string that defines the input configuration, such as Match Mode, Default Country, Maximum Results, Result Casing, State Province Type, and Optimization Level.
pb.bdq.uam.global.general.configuration	JSON string that defines the general configuration, such as Cache Size, Maximum Thread Count, and Maximum Limit of Memory Usage.
pb.bdq.uam.global.unlockCode	Code to unlock data in the database.
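
The `pb.bdq.reference.data` value is a JSON string whose shape differs across Tables 2, 3, and 4. The following Python sketch builds the three variants from the examples in those tables, so the quoting and nesting stay valid. The `reference_data` helper is hypothetical and not part of the product; only the JSON keys and example values come from the tables.

```python
import json


def reference_data(data_dir, location, downloader=None, local_repo=None):
    """Build the JSON string for pb.bdq.reference.data (illustrative helper)."""
    ref = {"dataDir": data_dir, "referenceDataPathLocation": location}
    if downloader is not None:
        # The downloader settings are nested under the "dataDownloader" key.
        nested = {"dataDownloader": downloader}
        if local_repo is not None:
            nested["localFSRepository"] = local_repo
        ref["dataDownloader"] = nested
    return json.dumps(ref)


# Table 2: reference data local to the data nodes.
local = reference_data(
    "/home/hduser/ReferenceData/AddressQuality/Global", "LocaltoDataNodes"
)

# Table 3: reference data on HDFS, fetched to a local repository.
hdfs = reference_data(
    "/user/root/ReferenceData/Global/Global.zip",
    "HDFS",
    downloader="HDFS",
    local_repo="/opt/PitneyBowes/ReferenceData/GlobalAddress",
)

# Table 4: reference data on HDFS, delivered through the distributed cache.
dc = reference_data(
    "/home/hduser/ReferenceData/AddressQuality/Global", "HDFS", downloader="DC"
)

for value in (local, hdfs, dc):
    print(value)
```

Generating the string with `json.dumps` rather than typing it by hand avoids unbalanced braces and stray whitespace inside the paths.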

Table 5. mapReduceConfig
Specifies the MapReduce configuration parameters.
Use this file to customize MapReduce parameters, such as mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and mapreduce.map.speculative, as needed for your job.
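
As a rough illustration, the Python sketch below writes the three properties named above in the standard Hadoop `<configuration>`/`<property>` XML layout. Whether the mapReduceConfig file uses exactly this layout is an assumption here, and the memory and speculation values are placeholders, not recommendations.

```python
import xml.etree.ElementTree as ET

# Placeholder values (assumptions); tune them for your cluster and job.
settings = {
    "mapreduce.map.memory.mb": "4096",
    "mapreduce.reduce.memory.mb": "4096",
    "mapreduce.map.speculative": "false",
}


def to_hadoop_xml(props):
    """Render name/value pairs as a Hadoop-style configuration document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_hadoop_xml(settings)
print(xml_text)
```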

Table 6. outputFileConfig
Parameter	Description
pb.bdq.output.type	Specify if the output is in: `TEXT`, `ORC`, or `PARQUET` format.
pb.bdq.outputfile.path	The path where you want the output file to be generated on HDFS. For example, `/user/hduser/sampledata/addressing/output/global`.
pb.bdq.outputformat.field.delimiter	Field or column delimiter in the output file, such as comma (`,`) or tab.
pb.bdq.output.overwrite	If `true`, the output folder is overwritten every time the job is run.
pb.bdq.outputformat.headerfile.create	Specify `true` if the output file needs to have a header.
pb.bdq.job.print.counters.console	Specifies whether the counters are printed on the console or to a file. `True` indicates that the counters are printed on the console.
pb.bdq.job.counter.file.path	Path and name of the file to which the counters are printed. You need to specify this if the value of pb.bdq.job.print.counters.console is `false`.
Properties of Parquet file
parquet.compression	The compression algorithm used to compress pages. It is one of these: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, or `LZO`. Default is `UNCOMPRESSED`.
parquet.block.size	The size of a row group being buffered in memory. Larger values improve I/O when reading but consume more memory when writing. Default size is 134217728 bytes (= 128 * 1024 * 1024).
parquet.page.size	Pages make up a block, and a page is the smallest unit that must be read fully to access a single record. Default size is 1048576 bytes (= 1 * 1024 * 1024). Note: A very small page size degrades compression.
parquet.dictionary.page.size	Default size is 1048576 bytes (= 1 * 1024 * 1024).
parquet.enable.dictionary	The boolean value (`True` or `False`) to enable or disable dictionary encoding. Default is `True`.
parquet.validation	Default boolean value is `False`.
parquet.writer.version	Specifies the version of writer. It should be `PARQUET_1_0` or `PARQUET_2_0`. Default is `PARQUET_1_0`.
parquet.writer.max-padding	Defaults to no padding (0% of the row group size).
parquet.page.size.check.estimate	Default boolean value is `True`.
parquet.page.size.row.check.min	Default is 100.
parquet.page.size.row.check.max	Default is 10000.
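
The byte counts and allowed values listed for the Parquet properties can be checked with a short Python sketch. The `check_parquet` helper is hypothetical; the rule that a page must not be larger than a row group is a sanity check added for illustration, not a documented constraint of the job.

```python
# Allowed values as listed for parquet.compression and parquet.writer.version.
ALLOWED_COMPRESSION = {"UNCOMPRESSED", "SNAPPY", "GZIP", "LZO"}
ALLOWED_WRITER_VERSIONS = {"PARQUET_1_0", "PARQUET_2_0"}

# Defaults from the table, with sizes written out as products.
defaults = {
    "parquet.compression": "UNCOMPRESSED",
    "parquet.block.size": 128 * 1024 * 1024,          # 134217728 bytes
    "parquet.page.size": 1 * 1024 * 1024,             # 1048576 bytes
    "parquet.dictionary.page.size": 1 * 1024 * 1024,  # 1048576 bytes
    "parquet.writer.version": "PARQUET_1_0",
}


def check_parquet(props):
    """Return a list of problems found in a set of Parquet properties."""
    problems = []
    if props["parquet.compression"] not in ALLOWED_COMPRESSION:
        problems.append("unknown compression: " + props["parquet.compression"])
    if props["parquet.writer.version"] not in ALLOWED_WRITER_VERSIONS:
        problems.append("unknown writer version: " + props["parquet.writer.version"])
    # Illustrative sanity check (an assumption): pages live inside row groups.
    if props["parquet.page.size"] > props["parquet.block.size"]:
        problems.append("page size exceeds block size")
    return problems


print(check_parquet(defaults))
# Ratio of the default row group size to the default page size.
print(defaults["parquet.block.size"] // defaults["parquet.page.size"])
```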