Configuration Files

These tables describe the parameters and the values you need to specify before you run the Validate Address job.

Table 1. inputFileConfig
Parameter Description
pb.bdq.input.type Input file type. The values can be: TEXT, ORC or PARQUET.
pb.bdq.inputfile.path The path where you have placed the input file on HDFS. For example, /home/hadoop/uamus.txt
textinputformat.record.delimiter File record delimiter used in the text-type input file. For example, LINUX, MACINTOSH, or WINDOWS.
pb.bdq.inputformat.field.delimiter Field or column delimiter used in the input file, such as comma (,) or tab.
pb.bdq.inputformat.text.qualifier Text qualifiers, if any, in the columns or fields of the input file.
pb.bdq.inputformat.file.header Comma-separated list of the headers used in the input file.
pb.bdq.inputformat.skip.firstrow Specifies whether the first row is skipped during processing. The values can be True or False, where True skips the first row.
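As a rough illustration of the constraints listed in Table 1, the following Python sketch checks an inputFileConfig-style set of parameters before a job is submitted. The helper name validate_input_config and the idea of holding the parameters in a dictionary are assumptions made for this example and are not part of the SDK. The header names and the comma delimiter are sample values only.

```python
ALLOWED_INPUT_TYPES = {"TEXT", "ORC", "PARQUET"}
BOOLEAN_VALUES = {"true", "false"}


def validate_input_config(config):
    """Return a list of problems found in an inputFileConfig-style dict.

    Hypothetical helper: it only encodes the rules stated in Table 1.
    """
    problems = []

    # The input type must be one of the three documented values.
    input_type = config.get("pb.bdq.input.type", "")
    if input_type not in ALLOWED_INPUT_TYPES:
        problems.append("pb.bdq.input.type must be TEXT, ORC or PARQUET")

    # Assumption: a job cannot run without an input path.
    if not config.get("pb.bdq.inputfile.path"):
        problems.append("pb.bdq.inputfile.path is missing")

    # If present, the skip flag must be True or False (case-insensitive here).
    skip = str(config.get("pb.bdq.inputformat.skip.firstrow", "")).lower()
    if skip and skip not in BOOLEAN_VALUES:
        problems.append("pb.bdq.inputformat.skip.firstrow must be True or False")

    return problems


sample = {
    "pb.bdq.input.type": "TEXT",
    "pb.bdq.inputfile.path": "/home/hadoop/uamus.txt",
    "textinputformat.record.delimiter": "LINUX",
    "pb.bdq.inputformat.field.delimiter": ",",
    # Hypothetical header names, for illustration only.
    "pb.bdq.inputformat.file.header": "AddressLine1,City,StateProvince,PostalCode",
    "pb.bdq.inputformat.skip.firstrow": "False",
}

print(validate_input_config(sample))
```

An empty list means that none of the checked rules was violated. The sketch does not verify that the path exists on HDFS.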
Table 2. uamusConfig
Parameter Description
pb.bdq.job.type This is a constant value that defines the job. The value for this job is: UniversalAddressingValidate.
pb.bdq.job.name Name of the job. Default is UAMUniversalAddressingSample.
pb.bdq.reference.data The path where you have placed the reference data. For example, {"dataDir":"/home/hduser/ReferenceData/AddressQuality/UAM-US", "referenceDataPathLocation":"LocaltoDataNodes"}
pb.bdq.uam.universaladdress.input.configuration JSON string that defines input configurations, such as Process Type, Elements of Output Address, and Number of the Report Lists.
pb.bdq.uam.universaladdress.general.configuration JSON string that defines general configurations, such as the File Type, Memory Model, and Suitelink Memory Model.
pb.bdq.uam.universaladdress.cobol.runtime The COBOL runtime directory path. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/runtime
pb.bdq.uam.universaladdress.modules.dir Path where the modules directory resides. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/modules
pb.bdq.uam.universaladdress.dpv.db.path The path where the Delivery Point Validation (DPV) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data
Note: This parameter is optional.
pb.bdq.uam.universaladdress.ews.db.path The path of the Early Warning System (EWS) database. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data
Note: This parameter is optional.
pb.bdq.uam.universaladdress.lacs.db.path The path where the Locatable Address Conversion System (LACS) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data
Note: This parameter is optional.
pb.bdq.uam.universaladdress.rdi.db.path The path where the Residential Delivery Indicator (RDI) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data
Note: This parameter is optional.
pb.bdq.uam.universaladdress.suitelink.db.path The SuiteLink database path. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data
Note: This parameter is optional.
pb.bdq.job.report.create Specify true to generate a report when the job completes successfully.
Table 3. uamusConfigHDFSRefData(DataDownloader)
Parameter Description
pb.bdq.job.type This is a constant value that defines the job. The value for this job is: UniversalAddressingValidate.
pb.bdq.job.name Name of the job. Default is UAMUniversalAddressingSample.
pb.bdq.reference.data Path of the reference data on HDFS and the data downloader path. For example, {"referenceDataPathLocation":"HDFS", "dataDir":"/user/root/ReferenceData/UAM-US", "dataDownloader":{"dataDownloader":"HDFS", "localFSRepository":"/opt/PitneyBowes/ReferenceData/UAM-US"}}
pb.bdq.uam.universaladdress.input.configuration JSON string that defines input configurations, such as Process Type, Elements of Output Address, and Number of the Report Lists.
pb.bdq.uam.universaladdress.general.configuration JSON string that defines general configurations, such as the File Type, Memory Model, and Suitelink Memory Model.
pb.bdq.uam.universaladdress.cobol.runtime The COBOL runtime directory path. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/runtime
pb.bdq.uam.universaladdress.modules.dir Path where the modules directory resides. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/modules
pb.bdq.uam.universaladdress.dpv.db.path The path where the Delivery Point Validation (DPV) database resides. For example, hdfs:///user/hduser/ReferenceData/AddressQuality/UAM/Data.zip
Note: This parameter is optional.
pb.bdq.uam.universaladdress.ews.db.path The path of the Early Warning System (EWS) database. For example, hdfs:///user/hduser/ReferenceData/AddressQuality/UAM/Data.zip
Note: This parameter is optional.
pb.bdq.uam.universaladdress.lacs.db.path The path where the Locatable Address Conversion System (LACS) database resides. For example, hdfs:///user/hduser/ReferenceData/AddressQuality/UAM/Data.zip
Note: This parameter is optional.
pb.bdq.uam.universaladdress.rdi.db.path The path where the Residential Delivery Indicator (RDI) database resides. For example, hdfs:///user/hduser/ReferenceData/AddressQuality/UAM/RDI.zip
Note: This parameter is optional.
pb.bdq.job.report.create Specify true to generate a report when the job completes successfully.
Table 4. uamusConfigDistributedCache
Parameter Description
pb.bdq.job.type This is a constant value that defines the job. The value for this job is: UniversalAddressingValidate.
pb.bdq.job.name Name of the job. Default is UAMUniversalAddressingSample.
pb.bdq.reference.data Path of the reference data on HDFS and the type of data downloader. For example, {"dataDir":"/user/hduser/ReferenceData/AddressQuality/UAM", "referenceDataPathLocation":"HDFS", "dataDownloader":{"dataDownloader":"DC"}}
pb.bdq.uam.universaladdress.input.configuration JSON string that defines input configurations, such as Process Type, Elements of Output Address, and Number of the Report Lists.
pb.bdq.uam.universaladdress.general.configuration JSON string that defines general configurations, such as the File Type, Memory Model, and Suitelink Memory Model.
pb.bdq.uam.universaladdress.acushare.license The path where you have placed the acushare license file. For example, /home/hduser/runcbl.alc
pb.bdq.uam.universaladdress.acushare.service A value of true indicates that the acushare service is running.
pb.bdq.uam.universaladdress.unix.version Specifies the Unix version of your cluster node. For example, REDHAT7.
pb.bdq.uam.universaladdress.cobol.runtime The COBOL runtime directory path. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/runtime
pb.bdq.uam.universaladdress.modules.dir Path where the modules directory resides. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/modules
pb.bdq.uam.universaladdress.dpv.db.path The path where the Delivery Point Validation (DPV) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data
Note: This parameter is optional.
pb.bdq.uam.universaladdress.ews.db.path The path of the Early Warning System (EWS) database. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data
Note: This parameter is optional.
pb.bdq.uam.universaladdress.lacs.db.path The path where the Locatable Address Conversion System (LACS) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data
Note: This parameter is optional.
pb.bdq.uam.universaladdress.rdi.db.path The path where the Residential Delivery Indicator (RDI) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data
Note: This parameter is optional.
pb.bdq.uam.universaladdress.suitelink.db.path The SuiteLink database path. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data
Note: This parameter is optional.
pb.bdq.job.report.create Specify true to generate a report when the job completes successfully.
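The pb.bdq.reference.data value differs across Tables 2, 3 and 4 only in its JSON fields. The Python sketch below assembles the three example values with the standard json module so that the quoting and nesting stay valid. The reference_data helper is hypothetical and not part of the SDK. The field names and paths are taken from the examples in the tables, and reading "DC" as the distributed cache downloader is an assumption based on the name of Table 4.

```python
import json


def reference_data(data_dir, location, downloader=None, local_repo=None):
    """Build the JSON string for pb.bdq.reference.data (hypothetical helper)."""
    value = {"dataDir": data_dir, "referenceDataPathLocation": location}
    if downloader is not None:
        # The downloader settings are nested under a key of the same name.
        inner = {"dataDownloader": downloader}
        if local_repo is not None:
            inner["localFSRepository"] = local_repo
        value["dataDownloader"] = inner
    return json.dumps(value)


# Table 2: referenceDataPathLocation is LocaltoDataNodes, no downloader.
local = reference_data(
    "/home/hduser/ReferenceData/AddressQuality/UAM-US", "LocaltoDataNodes"
)

# Table 3: reference data on HDFS, HDFS downloader with a local repository.
hdfs = reference_data(
    "/user/root/ReferenceData/UAM-US",
    "HDFS",
    downloader="HDFS",
    local_repo="/opt/PitneyBowes/ReferenceData/UAM-US",
)

# Table 4: reference data on HDFS, "DC" downloader.
dc = reference_data(
    "/user/hduser/ReferenceData/AddressQuality/UAM", "HDFS", downloader="DC"
)

for name, value in (("local", local), ("hdfs", hdfs), ("dc", dc)):
    print(name, value)
```

Generating the string rather than typing it by hand avoids unbalanced braces and stray spaces inside paths, which are easy to introduce when copying the examples.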
Table 5. mapReduceConfig
Specifies the MapReduce configuration parameters.
Use this file to customize MapReduce parameters, such as mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and mapreduce.map.speculative, as needed for your job.
Table 6. outputFileConfig
Parameter Description
pb.bdq.output.type Output file type. The values can be: TEXT, ORC, or PARQUET.
pb.bdq.outputfile.path The path where you want the output file to be generated on HDFS. For example, /home/hadoop/output.
pb.bdq.outputformat.field.delimiter Field or column delimiter in the output file, such as comma (,) or tab.
pb.bdq.output.overwrite If true, the output folder is overwritten every time the job is run.
pb.bdq.outputformat.headerfile.create Specify true if the output file needs to have a header.
pb.bdq.job.print.counters.console Specifies whether the counters are printed on the console or to a file. True indicates that the counters are printed on the console.
pb.bdq.job.counter.file.path Path and name of the file to which the counters are printed. Specify this parameter if the value of pb.bdq.job.print.counters.console is false.
Properties of Parquet file  
parquet.compression The compression algorithm used to compress pages. It is one of these: UNCOMPRESSED, SNAPPY, GZIP, or LZO. Default is UNCOMPRESSED.
parquet.block.size The size of a row group being buffered in memory. Larger values improve I/O when reading but consume more memory when writing. Default size is 134217728 bytes (= 128 * 1024 * 1024).
parquet.page.size A page is a constituent of a block and is the smallest unit that must be read fully to access a single record. Default size is 1048576 bytes (= 1 * 1024 * 1024).
Note: A very small page size degrades compression.
parquet.dictionary.page.size Default size is 1048576 bytes (= 1 * 1024 * 1024).
parquet.enable.dictionary The boolean value (True or False) to enable or disable dictionary encoding. Default is True.
parquet.validation Default boolean value is False.
parquet.writer.version Specifies the version of the writer. It should be PARQUET_1_0 or PARQUET_2_0. Default is PARQUET_1_0.
parquet.writer.max-padding Defaults to no padding (0% of the row group size).
parquet.page.size.check.estimate Default boolean value is True.
parquet.page.size.row.check.min Default is 100.
parquet.page.size.row.check.max Default is 10000.
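Two details of Table 6 lend themselves to a quick check: the byte values behind the default Parquet sizes, and the rule that pb.bdq.job.counter.file.path is needed when the counters are not printed on the console. The Python sketch below works through both. The check_output_config helper and the counter file path are hypothetical and used only for illustration.

```python
# Default sizes from the table, written as the products they stand for.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # parquet.block.size, in bytes
DEFAULT_PAGE_SIZE = 1 * 1024 * 1024     # parquet.page.size, in bytes


def check_output_config(config):
    """Return a list of problems in an outputFileConfig-style dict.

    Hypothetical helper: it encodes only rules stated in Table 6.
    """
    problems = []

    if config.get("pb.bdq.output.type") not in {"TEXT", "ORC", "PARQUET"}:
        problems.append("pb.bdq.output.type must be TEXT, ORC or PARQUET")

    # A counter file path is required when counters do not go to the console.
    console = str(config.get("pb.bdq.job.print.counters.console", "")).lower()
    if console == "false" and not config.get("pb.bdq.job.counter.file.path"):
        problems.append(
            "pb.bdq.job.counter.file.path is required when "
            "pb.bdq.job.print.counters.console is false"
        )

    return problems


sample = {
    "pb.bdq.output.type": "PARQUET",
    "pb.bdq.outputfile.path": "/home/hadoop/output",
    "pb.bdq.job.print.counters.console": "false",
    # Hypothetical file name, for illustration only.
    "pb.bdq.job.counter.file.path": "/home/hadoop/counters.txt",
    "parquet.compression": "SNAPPY",
}

print(DEFAULT_BLOCK_SIZE)  # 134217728
print(DEFAULT_PAGE_SIZE)   # 1048576
print(check_output_config(sample))
```

With the default values, the block size is 128 times the page size. Removing pb.bdq.job.counter.file.path from the sample makes the helper report the missing path.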