Using a Hive UDF of Global Addressing Module

To run each Hive UDF job, you can either run these steps individually on your Hive client within a single session, or create an HQL file compiling all the required steps sequentially and run it in one go.

  1. In your Hive client, log in to the required Hive database.
  2. Register the JAR file of Spectrum™ Data & Address Quality for Big Data SDK GAM Module.

    ADD JAR /home/hduser/gam/ gam.globaladdressvalidation.hive.${project.version}.jar

  3. Create an alias for the Hive UDF of the Address Quality job you want to run.
    Note: String in quotes represents the class names needed for this job to run.
    For example:
    CREATE TEMPORARY FUNCTION addressvalidation as 'com.pb.bdq.gam.process.hive.addressvalidation.AddressValidationUDF';
  4. Enable or disable the hive fetch task conversion.
    For Example:
    set hive.fetch.task.conversion=none;
  5. Use hivevar:engineconf to set the engine configurations, such as, Database Path, Country Code, and the Process Type.
    For Example:
    set hivevar:engineconf='{"productDatabaseInfoList":
                            [{"dbPath":"/home/hduser/ReferenceData/AddressQuality/GAM/GGB062017",
                            "countryCode":"GBR","processType":"VALIDATE"}]}';
  6. Specify the settings for the input data using the hivevar:inputoption parameter.
    For example:
    set hivevar:inputoption='{"casing":"Mixed","matchMode":"Relaxed",
                            "defaultCountry":"GBR","maximumResults":2,"returnInputAddress":false,
                            "returnParsedAddress":false,"returnPrecisionCode":false,"mustMatchAddressNumber":false,
                            "mustMatchStreet":false,"mustMatchCity":false,"mustMatchLocality":false,
                            "mustMatchState":false,"mustMatchStateProvince":false,"mustMatchPostCode":false,
                            "keepMultiMatch":true,"preferPostalOverCity":false,"cityFallback":true,
                            "postalFallback":true,"validationLevel":"ADDRESS"}';
  7. Specify the header fields of the input table in comma-separated format, and assign to a variable or configuration property.
    set hivevar:header='inputkeyvalue,AddressLine1,AddressLine2,City,postalcode,
    StateProvince,firmname,Country';
  8. To run the job and display the job output on the console, write the query as indicated in this example:
    Note: This query returns a map of key value pairs containing output fields for each row.
    SELECT tmp2.record["HouseNumber"],tmp2.record["Confidence"],tmp2.record["AddressLine1"],
    tmp2.record["StreetName"],tmp2.record["PostalCode"] FROM (SELECT  addressvalidation(${hivevar:engineconf},${hivevar:inputoption},
    ${hivevar:header},inputkeyvalue,addressline1,AddressLine2,city,postalcode,stateprovince,firmname,
    country)as mygp from address_validation) as addressgroup LATERAL VIEW explode(addressgroup.mygp)tmp2 as record;
    To run the job and dump the job output in a designated file, write the query as indicated in this example:
    INSERT OVERWRITE LOCAL DIRECTORY '/home/hduser/GlobalAddressValidation/'row format delimited 
    FIELDS TERMINATED BY ',' lines terminated by '\n' STORED AS TEXTFILE SELECT * FROM (SELECT addressvalidation(${hivevar:engineconf},
    ${hivevar:inputoption},${hivevar:header},inputkeyvalue,addressline1,AddressLine2,city,postalcode,
    stateprovince,firmname,country)as mygp from address_validation) as addressgroup LATERAL VIEW explode(addressgroup.mygp) tmp2 as record;
    Note: Use the alias defined earlier for the UDF.