Data and Address Quality for Big Data SDK
Reference data strategy
- Download to the current working directory: The reference data is downloaded to your working directory as temporary files. These files are deleted from the working directory every time a job completes, so each job must download the reference data afresh.
- Download to a local path: The reference data is downloaded to a local data path you specify, and it remains available for all jobs until the data is refreshed on HDFS.
Reference data for UAM, GAM on HDFS
- Universal Addressing Module (except Validate Address Loqate)
- Global Addressing Module
Silent extraction of reference data
Use the silentInstalldb_unc.sh script to extract reference data silently. The script accepts its arguments one time and extracts the databases on your machine outside of an interactive process. You can still run sh installdb_unc.sh for interactive extraction, if needed.
Open Parser job
You can now use the Open Parser job of the Data Normalization Module to define a parsing grammar and apply it to parse your input data strings.
For more information about this job, see the section on Open Parser in the Data Normalization Module jobs.
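To show the kind of work a parsing grammar does, here is a minimal, self-contained sketch that splits a personal-name string into fields with a regular expression. This is an illustration only; it is not the Open Parser API, and the field names are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy stand-in for a parsing grammar: split "First [Middle] Last"
// into named fields. The real Open Parser job uses grammars you
// define in the SDK, not this regex.
public class NameParser {
    private static final Pattern NAME =
        Pattern.compile("^(\\S+)\\s+(?:(\\S+)\\s+)?(\\S+)$");

    public static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = NAME.matcher(input.trim());
        if (m.matches()) {
            fields.put("FirstName", m.group(1));
            if (m.group(2) != null) {
                fields.put("MiddleName", m.group(2));
            }
            fields.put("LastName", m.group(3));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("John A. Smith"));
    }
}
```

A real grammar also handles tokens in any order, optional domains, and scoring; this sketch only conveys the string-to-fields idea.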
Data Integration jobs
- Joiner job: Use this Data Integration Module job to perform a SQL-style JOIN operation that combines records from multiple files.
- Custom Groovy script job: Use this Data Integration Module job to transform input fields based on a Groovy script you define. Note: You can create and run this job with Hive UDFs as well.
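The SQL-style join the Joiner job performs can be sketched as a plain in-memory hash join over two record sets sharing a key field. The class and field names below are invented for illustration; the actual job runs distributed on MapReduce or Spark rather than in memory.

```java
import java.util.*;

// Illustrative inner hash join: index the right side by the join key,
// then probe with the left side and merge matching record pairs.
public class HashJoin {
    public static List<Map<String, String>> innerJoin(
            List<Map<String, String>> left,
            List<Map<String, String>> right,
            String key) {
        // Index the right-hand records by join key.
        Map<String, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> r : right) {
            index.computeIfAbsent(r.get(key), k -> new ArrayList<>()).add(r);
        }
        // Probe with the left-hand records and merge each matching pair.
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r
                    : index.getOrDefault(l.get(key), List.of())) {
                Map<String, String> merged = new LinkedHashMap<>(l);
                merged.putAll(r);
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> left = List.of(
            Map.of("id", "1", "name", "Ann"),
            Map.of("id", "2", "name", "Bob"));
        List<Map<String, String>> right = List.of(
            Map.of("id", "1", "city", "Oslo"));
        System.out.println(innerJoin(left, right, "id"));
    }
}
```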
Global Address Validation job
To standardize and validate international addresses outside the United States, you can now use the Global Address Validation job of the Global Addressing Module. You can create the job using the Java API with either MapReduce or Spark, or with Hive UDFs.
Support for the S3 native filesystem
The Amazon S3 native filesystem (s3n) client is now available in Hadoop MapReduce and Spark jobs. You can store and access your input and output files on s3n. Provide the path in the required format as a parameter to the filepath sub-class and use it in your job.
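The s3n path format is the standard Hadoop S3-native URI, s3n://&lt;bucket&gt;/&lt;key&gt;. The small helper below only assembles and checks such a path string; the class and method names are invented for illustration and are not part of the SDK.

```java
import java.net.URI;

// Minimal sketch of the s3n URI format used for input and output files.
public class S3nPath {
    // Assemble an s3n URI from a bucket name and an object key.
    public static String path(String bucket, String key) {
        return "s3n://" + bucket + "/" + key;
    }

    // Check that a given path uses the s3n scheme.
    public static boolean isS3n(String p) {
        return "s3n".equals(URI.create(p).getScheme());
    }

    public static void main(String[] args) {
        System.out.println(path("my-bucket", "input/addresses.csv"));
    }
}
```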
Acushare service setup
You can now perform a silent Acushare service setup. To do this, copy the script from the Spectrum™ Data & Address Quality for Big Data SDK installation path to any location on the node and provide the service installation path in the installer.properties file.
For more information, see the section Running Acushare service in the Data and Address Quality for Big Data SDK Guide.
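As a rough illustration of what such a file could contain, the fragment below sets a service installation path; note that the property name and path shown here are assumptions for the example, not documented keys, so check the guide for the actual names.

```properties
# Hypothetical example only: the property name is an assumption,
# not a documented key. Points Acushare at its installation path.
acushare.service.install.path=/opt/acushare
```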