Geocode UDF
Description
Geocode function accepts an address as an argument and returns
geocoding attributes such as longitude, latitude, or a point geometry. Function Registration
create function Geocode as 'com.pb.bigdata.geocoding.hive.Geocode';
Syntax
Geocode(String mainAddressLine, String country, [Map<String, String> config])
Geocode(String mainAddressLine, String addressLastLine, String country, [Map<String, String> config])
Geocode(String addressNumber, String streetName, String unitValue, String areaName3, String areaName1, String postCode1, String country, [Map<String, String> config])
Geocode(Map<String, String> input, [Map<String, String> config])
Parameters
Address Fields
| Parameter | Type | Description |
|---|---|---|
| mainAddressLine | String | the street address or a single line address |
| addressLastLine | String | the location level information like city and postal code |
| addressNumber | String | the house number information |
| streetName | String | the street name |
| unitType | String | the unit type (such as Apt. or Suite) |
| unitValue | String | the unit value |
| areaName1 | String | the largest geographic division in an area, such as a state, province, or region |
| areaName2 | String | the secondary geographic division in an area, such as a county or district |
| areaName3 | String | the city or town name |
| areaName4 | String | the smallest geographic division in an area, such as a city subdivision or locality |
| postCode1 | String | the primary postal code |
| postCode2 | String | the secondary postal code |
| placeName | String | the name of a business, building, or location |
| country | String | the ISO-3166 two- or three-character abbreviation for the country, or the country name |
Input and Configuration Maps
| Parameter | Type | Description |
|---|---|---|
| input | Map | key-value pairs of address fields and input values. The keys must be constant values from the list of address fields in the table above. The values can be either constants or column references. |
| config | Map | can be used for setting both Hive properties and Preferences
Hive Properties Options that set the Hive Variables. Note: Any values specified here will override values set
as Hive properties. All values must be constant. If a
key is present but has a null value, the property is
cleared.Preferences settings that control the matching and geocoding criteria, return values and fieldsNote: Any
values specified here will override values set in the geocode
preferences file. All values must be constant. If a key is present but has a null value, the preference is cleared. For more information, see Geocoding Preferences. |
mapreduce.task.timeout property.Return Values
This function returns struct values described in Hive Output Fields.
Examples
In order to use this UDF, you must set the required options either as Hive variables or provided in the optional config map parameter.
All the examples below are using the following output fields: X, Y, formattedStreetAddress, formattedLocationAddress
Single Geocode returning structAll these queries return the same response, a struct containing all the requested output fields.
SELECT geocode("600 Maryland Ave SW, Washington, DC 20002", "USA") AS result;
SELECT geocode(map("mainAddressLine", "600 Maryland Ave SW", "addressLastLine", "Washington, DC 20002", "country","USA")) AS result;
Output:
| Result |
|---|
{"x":"-77.020707","y":"38.886115","formattedstreetaddress":"600
MARYLAND AVE SW","formattedlocationaddress":"WASHINGTON, DC
20024-2520","error":null}
|
Use a built-in, table-generating function called inline to turn the struct into columns.
SELECT inline(array(geocode("600 Maryland Ave SW, Washington, DC 20002", "USA")));
Output:
| X | Y | Street Address | Location Address | Error |
|---|---|---|---|---|
| -77.020707 | 38.886115 | 600 MARYLAND AVE SW | WASHINGTON, DC 20024-2520 | null |
SELECT result.x AS x, result.y AS y FROM (SELECT inline(array(geocode("600 Maryland Ave SW, Washington, DC 20002","USA")))) result;
Output:
| X | Y |
|---|---|
| -77.020707 | 38.886115 |
SELECT customer_id, customer_name, customer_phone, geocode("", t.address, "", t.city, t.state, t.postcode, t.country) AS geocoded_result FROM customers t;
Output:
| ID | Name | Phone | Result |
|---|---|---|---|
| 0000000001 | John Michael Dorian | +(1)-(555)-6647575 | {"x":"-79.794777","y":"34.191205","formattedstreetaddress":"1205 W PALMETTO ST","formattedlocationaddress":"FLORENCE, SC 29501-4131","pb_key":"P0000KPW08AT","error":null} |
| 0000000002 | Elliot Reid | +(1)-(555)-5406548 | {"x":"-122.393078","y":"37.753697","formattedstreetaddress":"1107 PENNSYLVANIA AVE","formattedlocationaddress":"SAN FRANCISCO, CA 94107-3413","pb_key":"P00002T1L8H7","error":null} |
| 0000000003 | Jan Itor | +(1)-(555)-3925210 | {"x":"-76.30457","y":"40.045757","formattedstreetaddress":"478 N DUKE ST","formattedlocationaddress":"LANCASTER, PA 17602-4967","pb_key":"P0000JCCUDFI","error":null} |
SELECT customer_id, customer_name, customer_phone, result.* FROM customers LATERAL
VIEW OUTER inline(array(geocode("", address, "", city, state, postcode, country))) result;
Output:
| ID | Name | Phone | X | Y | Street Address | Location Address | Error |
|---|---|---|---|---|---|---|---|
| 0000000001 | John Michael Dorian | (555)-6647575 | -79.794777 | 34.191205 | 1205 W PALMETTO ST | FLORENCE, SC 29501-4131 | null |
| 0000000002 | Elliot Reid | (555)-5406548 | -122.393078 | 37.753697 | 1107 PENNSYLVANIA AVE | SAN FRANCISCO, CA 94107-3413 | null |
| 0000000003 | Jan Itor | (555)-3925210 | -76.30457 | 40.045757 | 478 N DUKE ST | LANCASTER, PA 17602-4967 | null |
CREATE TABLE customers_geocoded AS SELECT customer_id, customer_name, customer_phone, result.x, result.y, result.formattedStreetAddress, result.formattedLocationAddress FROM customers LATERAL VIEW OUTER inline(array(geocode("", address, "", city, state, postcode, country))) result;
Table Geocode using optional config map syntax
SELECT geocode(map('mainAddressLine', t.address,'areaName3', t.city, 'areaName1',
t.state, 'postCode1', t.zipcode,'country','USA'),
map('pb.geocoding.resources.location', 'hdfs:///precisely/geocoding/software/resources/',
'pb.geocoding.output.fields', 'formattedStreetAddress,formattedLocationAddress,precisionCode,PB_KEY',
'matchMode', 'RELAXED',
'returnAllCandidateInfo', 'true')) FROM myTable t;