Configuration Files
These tables describe the parameters and the values you need to specify before you run the Validate Address job.
Parameter | Description |
---|---|
pb.bdq.input.type | Input file type. The values can be: TEXT, ORC or PARQUET. |
pb.bdq.inputfile.path | The path where you have placed the input file on HDFS. For example, /home/hadoop/uamus.txt |
textinputformat.record.delimiter | File record delimiter used in the text type input file. For example, LINUX, MACINTOSH, or WINDOWS. |
pb.bdq.inputformat.field.delimiter | Field or column delimiter used in the input file, such as comma (,) or tab. |
pb.bdq.inputformat.text.qualifier | Text qualifiers, if any, in the columns or fields of the input file. |
pb.bdq.inputformat.file.header | Comma-separated list of the headers used in the input file. |
pb.bdq.inputformat.skip.firstrow | Whether to skip the first row during processing. The values can be True or False, where True skips the first row. |
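
The input file parameters above can be collected in one place and serialized. The following is a minimal sketch, assuming the parameters are supplied as a Hadoop-style `<configuration>` XML file, which this section does not confirm. The helper `to_hadoop_xml` and the header column names are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def to_hadoop_xml(params: dict) -> str:
    """Serialize name/value pairs as Hadoop-style <property> elements (assumed format)."""
    root = ET.Element("configuration")
    for name, value in params.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


input_params = {
    "pb.bdq.input.type": "TEXT",
    "pb.bdq.inputfile.path": "/home/hadoop/uamus.txt",
    "textinputformat.record.delimiter": "LINUX",
    "pb.bdq.inputformat.field.delimiter": ",",
    # Hypothetical column names; use the headers of your own input file.
    "pb.bdq.inputformat.file.header": "AddressLine1,City,StateProvince,PostalCode",
    "pb.bdq.inputformat.skip.firstrow": "True",
}

xml_text = to_hadoop_xml(input_params)
print(xml_text)
```

Keeping the values in a dictionary makes it easy to review them against the table before the job is run.
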
Parameter | Description |
---|---|
pb.bdq.job.type | This is a constant value that defines the job. The value for this job is: UniversalAddressingValidate. |
pb.bdq.job.name | Name of the job. Default is UAMUniversalAddressingSample. |
pb.bdq.reference.data | The path where you have placed the reference data. For example, {"dataDir":"/home/hduser/ReferenceData/AddressQuality/UAM-US", "referenceDataPathLocation":"LocaltoDataNodes"} |
pb.bdq.uam.universaladdress.input.configuration | JSON string that defines input configurations, such as Process Type, Elements of Output Address, and Number of the Report Lists. |
pb.bdq.uam.universaladdress.general.configuration | JSON string that defines general configurations, such as File Type, Memory Model, and Suitelink Memory Model. |
pb.bdq.uam.universaladdress.cobol.runtime | The COBOL runtime directory path. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/runtime |
pb.bdq.uam.universaladdress.modules.dir | Path where the modules directory resides. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/modules |
pb.bdq.uam.universaladdress.dpv.db.path | The path where the Delivery Point Validation (DPV) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data Note: This parameter is optional. |
pb.bdq.uam.universaladdress.ews.db.path | The path of the Early Warning System (EWS) database. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data Note: This parameter is optional. |
pb.bdq.uam.universaladdress.lacs.db.path | The path where the Locatable Address Conversion System (LACS) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data Note: This parameter is optional. |
pb.bdq.uam.universaladdress.rdi.db.path | The path where the Residential Delivery Indicator (RDI) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data Note: This parameter is optional. |
pb.bdq.uam.universaladdress.suitelink.db.path | The SuiteLink database path. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data Note: This parameter is optional. |
pb.bdq.job.report.create | Specify true to generate a report on successful completion. |
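
Because the five database path parameters in the table above are optional, a job definition only needs the ones that apply. The following is a minimal sketch of assembling the parameters for reference data placed locally on the data nodes. The function `build_job_params` and its arguments are hypothetical helpers, not part of the product.

```python
import json

# Short names of the optional databases listed in the table above.
OPTIONAL_DB_KEYS = ("dpv", "ews", "lacs", "rdi", "suitelink")


def build_job_params(data_dir, db_paths=None):
    """Return job parameters, adding only the optional database paths supplied."""
    params = {
        "pb.bdq.job.type": "UniversalAddressingValidate",
        "pb.bdq.job.name": "UAMUniversalAddressingSample",
        # The reference data value is itself a JSON string.
        "pb.bdq.reference.data": json.dumps(
            {"dataDir": data_dir, "referenceDataPathLocation": "LocaltoDataNodes"}
        ),
    }
    for key, path in (db_paths or {}).items():
        if key not in OPTIONAL_DB_KEYS:
            raise KeyError(f"unknown optional database: {key}")
        params[f"pb.bdq.uam.universaladdress.{key}.db.path"] = path
    return params


params = build_job_params(
    "/home/hduser/ReferenceData/AddressQuality/UAM-US",
    {"dpv": "/home/hduser/ReferenceData/AddressQuality/UAM/Data"},
)
for name, value in params.items():
    print(name, "=", value)
```
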
Parameter | Description |
---|---|
pb.bdq.job.type | This is a constant value that defines the job. The value for this job is: UniversalAddressingValidate. |
pb.bdq.job.name | Name of the job. Default is UAMUniversalAddressingSample. |
pb.bdq.reference.data | Path of the reference data on HDFS and the data downloader path. For example, {"referenceDataPathLocation":"HDFS", "dataDir":"/user/root/ReferenceData/UAM-US", "dataDownloader":{"dataDownloader":"HDFS", "localFSRepository":"/opt/PitneyBowes/ReferenceData/UAM-US"}} |
pb.bdq.uam.universaladdress.input.configuration | JSON string that defines input configurations, such as Process Type, Elements of Output Address, and Number of the Report Lists. |
pb.bdq.uam.universaladdress.general.configuration | JSON string that defines general configurations, such as File Type, Memory Model, and Suitelink Memory Model. |
pb.bdq.uam.universaladdress.cobol.runtime | The COBOL runtime directory path. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/runtime |
pb.bdq.uam.universaladdress.modules.dir | Path where the modules directory resides. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/modules |
pb.bdq.uam.universaladdress.dpv.db.path | The path where the Delivery Point Validation (DPV) database resides. For example, hdfs:///user/hduser/ReferenceData/AddressQuality/UAM/Data.zip Note: This parameter is optional. |
pb.bdq.uam.universaladdress.ews.db.path | The path of the Early Warning System (EWS) database. For example, hdfs:///user/hduser/ReferenceData/AddressQuality/UAM/Data.zip Note: This parameter is optional. |
pb.bdq.uam.universaladdress.lacs.db.path | The path where the Locatable Address Conversion System (LACS) database resides. For example, hdfs:///user/hduser/ReferenceData/AddressQuality/UAM/Data.zip Note: This parameter is optional. |
pb.bdq.uam.universaladdress.rdi.db.path | The path where the Residential Delivery Indicator (RDI) database resides. For example, hdfs:///user/hduser/ReferenceData/AddressQuality/UAM/RDI.zip Note: This parameter is optional. |
pb.bdq.uam.universaladdress.suitelink.db.path | The SuiteLink database path. For example, hdfs:///user/hduser/ReferenceData/AddressQuality/UAM/Data.zip Note: This parameter is optional. |
pb.bdq.job.report.create | Specify true to generate a report on successful completion. |
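
When the reference data is on HDFS, the `pb.bdq.reference.data` value in the table above nests a `dataDownloader` object inside the outer JSON. Writing such a string by hand makes quoting mistakes likely, so the sketch below builds it with `json.dumps`. The function name `reference_data_hdfs` is a hypothetical helper; the keys and paths are taken from the example in the table.

```python
import json


def reference_data_hdfs(data_dir, local_repository):
    """Build the nested reference data JSON for data stored on HDFS."""
    return json.dumps(
        {
            "referenceDataPathLocation": "HDFS",
            "dataDir": data_dir,
            "dataDownloader": {
                "dataDownloader": "HDFS",
                "localFSRepository": local_repository,
            },
        }
    )


value = reference_data_hdfs(
    "/user/root/ReferenceData/UAM-US",
    "/opt/PitneyBowes/ReferenceData/UAM-US",
)
print(value)
```
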
Parameter | Description |
---|---|
pb.bdq.job.type | This is a constant value that defines the job. The value for this job is: UniversalAddressingValidate. |
pb.bdq.job.name | Name of the job. Default is UAMUniversalAddressingSample. |
pb.bdq.reference.data | Path of the reference data on HDFS and the type of data downloader. For example, {"dataDir":"/user/hduser/ReferenceData/AddressQuality/UAM", "referenceDataPathLocation":"HDFS", "dataDownloader":{"dataDownloader":"DC"}} |
pb.bdq.uam.universaladdress.input.configuration | JSON string that defines input configurations, such as Process Type, Elements of Output Address, and Number of the Report Lists. |
pb.bdq.uam.universaladdress.general.configuration | JSON string that defines general configurations, such as File Type, Memory Model, and Suitelink Memory Model. |
pb.bdq.uam.universaladdress.acushare.license | The path where you have placed the acushare license file. For example, /home/hduser/runcbl.alc |
pb.bdq.uam.universaladdress.acushare.service | A value of true indicates that the acushare service is running. |
pb.bdq.uam.universaladdress.unix.version | Specifies the Unix version of your cluster node. For example, REDHAT7. |
pb.bdq.uam.universaladdress.cobol.runtime | The COBOL runtime directory path. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/runtime |
pb.bdq.uam.universaladdress.modules.dir | Path where the modules directory resides. For example, /home/hduser/PBSpectrum_BigDataSDK/SDK/modules |
pb.bdq.uam.universaladdress.dpv.db.path | The path where the Delivery Point Validation (DPV) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data Note: This parameter is optional. |
pb.bdq.uam.universaladdress.ews.db.path | The path of the Early Warning System (EWS) database. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data Note: This parameter is optional. |
pb.bdq.uam.universaladdress.lacs.db.path | The path where the Locatable Address Conversion System (LACS) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data Note: This parameter is optional. |
pb.bdq.uam.universaladdress.rdi.db.path | The path where the Residential Delivery Indicator (RDI) database resides. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data Note: This parameter is optional. |
pb.bdq.uam.universaladdress.suitelink.db.path | The SuiteLink database path. For example, /home/hduser/ReferenceData/AddressQuality/UAM/Data Note: This parameter is optional. |
pb.bdq.job.report.create | Specify true to generate a report on successful completion. |
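
The `pb.bdq.reference.data` examples in this section differ in `referenceDataPathLocation` and in the nested `dataDownloader` value. A quick pre-flight check can report which combination a given string describes. The sketch below is an assumption-laden illustration: `describe_reference_data` and its return labels are hypothetical, and it recognizes only the combinations that appear in the examples here.

```python
import json


def describe_reference_data(value):
    """Classify a pb.bdq.reference.data JSON string by location and downloader."""
    cfg = json.loads(value)
    location = cfg.get("referenceDataPathLocation")
    if location == "LocaltoDataNodes":
        return "local to data nodes"
    if location == "HDFS":
        downloader = cfg.get("dataDownloader", {}).get("dataDownloader")
        if downloader in ("HDFS", "DC"):
            return f"HDFS with {downloader} downloader"
    # Anything else is not covered by the examples in this section.
    raise ValueError(f"unrecognized reference data configuration: {value}")


example = (
    '{"dataDir":"/user/hduser/ReferenceData/AddressQuality/UAM", '
    '"referenceDataPathLocation":"HDFS", '
    '"dataDownloader":{"dataDownloader":"DC"}}'
)
print(describe_reference_data(example))
```
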
Specifies the MapReduce configuration parameters |
---|
Use this file to customize MapReduce parameters, such as mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, and mapreduce.map.speculative, as needed for your job. |
Parameter | Description |
---|---|
pb.bdq.output.type | Output file format. The values can be: TEXT, ORC, or PARQUET. |
pb.bdq.outputfile.path | The path where you want the output file to be generated on HDFS. For example, /home/hadoop/output. |
pb.bdq.outputformat.field.delimiter | Field or column delimiter in the output file, such as comma (,) or tab. |
pb.bdq.output.overwrite | If true, the output folder is overwritten every time the job is run. |
pb.bdq.outputformat.headerfile.create | Specify true if the output file needs to have a header. |
pb.bdq.job.print.counters.console | Whether the counters are printed on the console or to a file. True indicates that the counters are printed on the console. |
pb.bdq.job.counter.file.path | Path and name of the file to which the counters are to be printed. Specify this if pb.bdq.job.print.counters.console is false. |
Properties of Parquet file | |
parquet.compression | The compression algorithm used to compress pages. It is one of these: UNCOMPRESSED, SNAPPY, GZIP, or LZO. Default is UNCOMPRESSED. |
parquet.block.size | The size of a row group being buffered in memory. Larger values improve the I/O when reading but consume more memory when writing. Default size is 134217728 bytes (= 128 * 1024 * 1024) |
parquet.page.size | Pages make up a block, and a page is the smallest unit that must be read fully to access a single record. Default size is 1048576 bytes (= 1 * 1024 * 1024). Note: A very small page size degrades compression. |
parquet.dictionary.page.size | Default size is 1048576 bytes (= 1 * 1024 * 1024) |
parquet.enable.dictionary | The boolean value (True or False) to enable or disable dictionary encoding. Default is True |
parquet.validation | Default boolean value is False. |
parquet.writer.version | Specifies the version of writer. It should be PARQUET_1_0 or PARQUET_2_0. Default is PARQUET_1_0. |
parquet.writer.max-padding | Defaults to no padding (0% of the row group size). |
parquet.page.size.check.estimate | Default boolean value is True |
parquet.page.size.row.check.min | Default is 100 |
parquet.page.size.row.check.max | Default is 10000 |
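
Two rules in the output table lend themselves to a quick check: the counter file path is needed when counters are not printed on the console, and the Parquet size defaults are plain byte arithmetic. The following is a minimal sketch; `check_output_config` is a hypothetical helper, and it flags a missing counter file path only when `pb.bdq.job.print.counters.console` is explicitly false, which is an assumption about how an unset value behaves.

```python
# Defaults as listed in the table; sizes are in bytes.
PARQUET_DEFAULTS = {
    "parquet.compression": "UNCOMPRESSED",
    "parquet.block.size": 128 * 1024 * 1024,          # 134217728
    "parquet.page.size": 1 * 1024 * 1024,             # 1048576
    "parquet.dictionary.page.size": 1 * 1024 * 1024,  # 1048576
}
VALID_OUTPUT_TYPES = {"TEXT", "ORC", "PARQUET"}
VALID_COMPRESSION = {"UNCOMPRESSED", "SNAPPY", "GZIP", "LZO"}


def check_output_config(params):
    """Return a list of problems found in the output parameters (empty if none)."""
    problems = []
    if params.get("pb.bdq.output.type") not in VALID_OUTPUT_TYPES:
        problems.append("pb.bdq.output.type must be TEXT, ORC, or PARQUET")
    console = str(params.get("pb.bdq.job.print.counters.console", "")).lower()
    if console == "false" and not params.get("pb.bdq.job.counter.file.path"):
        problems.append("pb.bdq.job.counter.file.path is required when counters are not printed on the console")
    compression = params.get("parquet.compression", PARQUET_DEFAULTS["parquet.compression"])
    if compression not in VALID_COMPRESSION:
        problems.append(f"unsupported parquet.compression: {compression}")
    return problems


output_params = {
    "pb.bdq.output.type": "PARQUET",
    "pb.bdq.outputfile.path": "/home/hadoop/output",
    "pb.bdq.job.print.counters.console": "false",
    "parquet.compression": "SNAPPY",
}
for problem in check_output_config(output_params):
    print(problem)
```

With the sample values above, the check reports the missing counter file path, because the counters are directed away from the console but no file is named.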