Geocode UDF

Description

The Geocode function accepts an address as an argument and returns geocoding attributes such as longitude, latitude, or a point geometry.
Note: In order to use this UDF, you must set the required options either as Hive variables or provided in the optional config map parameter.

Function Registration

create function Geocode as 'com.pb.bigdata.geocoding.hive.Geocode';

Syntax

Geocode(String mainAddressLine, String country, [Map<String, String> config])
Geocode(String mainAddressLine, String addressLastLine, String country, [Map<String, String> config])
Geocode(String addressNumber, String streetName, String unitValue, String areaName3, String areaName1, String postCode1, String country, [Map<String, String> config])
Geocode(Map<String, String> input, [Map<String, String> config])

Parameters

Address Fields

Parameter Type Description
mainAddressLine String the street address or a single line address
addressLastLine String the location level information like city and postal code
addressNumber String the house number information
streetName String the street name
unitType String the unit type (such as Apt. or Suite)
unitValue String the unit value
areaName1 String the largest geographic division in an area, such as a state, province, or region
areaName2 String the secondary geographic division in an area, such as a county or district
areaName3 String the city or town name
areaName4 String the smallest geographic division in an area, such as a city subdivision or locality
postCode1 String the primary postal code
postCode2 String the secondary postal code
placeName String the name of a business, building, or location
country String the ISO-3166 two- or three-character abbreviation for the country, or the country name

Input and Configuration Maps

Parameter Type Description
input Map key-value pairs of address fields and input values. The keys must be constant values from the list of address fields in the table above. The values can be either constants or column references.
config Map can be used for setting both Hive properties and Preferences

Hive Properties

Options that set the Hive Variables.
Note: Any values specified here will override values set as Hive properties.
All values must be constant. If a key is present but has a null value, the property is cleared.

Preferences

settings that control the matching and geocoding criteria, return values and fields
Note: Any values specified here will override values set in the geocode preferences file.

All values must be constant. If a key is present but has a null value, the preference is cleared.

For more information, see Geocoding Preferences.
Note: The function may time out when using a large number of reference datasets that are distributed using HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

Return Values

This function returns struct values described in Hive Output Fields.

Examples

In order to use this UDF, you must set the required options either as Hive variables or provided in the optional config map parameter.

All the examples below are using the following output fields: X, Y, formattedStreetAddress, formattedLocationAddress

Single Geocode returning struct

All these queries return the same response, a struct containing all the requested output fields.

SELECT geocode("600 Maryland Ave SW, Washington, DC 20002", "USA") AS result;
SELECT geocode(map("mainAddressLine", "600 Maryland Ave SW", "addressLastLine", "Washington, DC 20002", "country","USA")) AS result;

Output:

Result
{"x":"-77.020707","y":"38.886115","formattedstreetaddress":"600 MARYLAND AVE SW","formattedlocationaddress":"WASHINGTON, DC 20024-2520","error":null}
Single Geocode returning a single row of columns

Use a built-in, table-generating function called inline to turn the struct into columns.

SELECT inline(array(geocode("600 Maryland Ave SW, Washington, DC 20002", "USA")));

Output:

X Y Street Address Location Address Error
-77.020707 38.886115 600 MARYLAND AVE SW WASHINGTON, DC 20024-2520 null
Single Geocode returning particular fields
SELECT result.x AS x, result.y AS y FROM (SELECT inline(array(geocode("600 Maryland Ave SW, Washington, DC 20002","USA")))) result;

Output:

X Y
-77.020707 38.886115
Table Geocode with struct
SELECT customer_id, customer_name, customer_phone, geocode("", t.address, "", t.city, t.state, t.postcode, t.country) AS geocoded_result FROM customers t;

Output:

ID Name Phone Result
0000000001 John Michael Dorian +(1)-(555)-6647575 {"x":"-79.794777","y":"34.191205","formattedstreetaddress":"1205 W PALMETTO ST","formattedlocationaddress":"FLORENCE, SC 29501-4131","pb_key":"P0000KPW08AT","error":null}
0000000002 Elliot Reid +(1)-(555)-5406548 {"x":"-122.393078","y":"37.753697","formattedstreetaddress":"1107 PENNSYLVANIA AVE","formattedlocationaddress":"SAN FRANCISCO, CA 94107-3413","pb_key":"P00002T1L8H7","error":null}
0000000003 Jan Itor +(1)-(555)-3925210 {"x":"-76.30457","y":"40.045757","formattedstreetaddress":"478 N DUKE ST","formattedlocationaddress":"LANCASTER, PA 17602-4967","pb_key":"P0000JCCUDFI","error":null}
Table Geocode
SELECT customer_id, customer_name, customer_phone, result.* FROM customers LATERAL 
VIEW OUTER inline(array(geocode("", address, "", city, state, postcode, country))) result;

Output:

ID Name Phone X Y Street Address Location Address Error
0000000001 John Michael Dorian (555)-6647575 -79.794777 34.191205 1205 W PALMETTO ST FLORENCE, SC 29501-4131 null
0000000002 Elliot Reid (555)-5406548 -122.393078 37.753697 1107 PENNSYLVANIA AVE SAN FRANCISCO, CA 94107-3413 null
0000000003 Jan Itor (555)-3925210 -76.30457 40.045757 478 N DUKE ST LANCASTER, PA 17602-4967 null
Table Geocode into new table
CREATE TABLE customers_geocoded AS SELECT customer_id, customer_name, customer_phone, result.x, result.y, result.formattedStreetAddress, result.formattedLocationAddress FROM customers LATERAL VIEW OUTER inline(array(geocode("", address, "", city, state, postcode, country))) result;            

Table Geocode using optional config map syntax

SELECT geocode(map('mainAddressLine', t.address,'areaName3', t.city, 'areaName1', 
t.state, 'postCode1', t.zipcode,'country','USA'), 
map('pb.geocoding.resources.location', 'hdfs:///precisely/geocoding/software/resources/', 
'pb.geocoding.output.fields', 'formattedStreetAddress,formattedLocationAddress,precisionCode,PB_KEY',
'matchMode', 'RELAXED',                                                    
'returnAllCandidateInfo', 'true')) FROM myTable t;