Global Geocoding Module

Performance and scalability improvements

With the proper configuration, the Global Geocoding Module (GGM) now performs faster than the Enterprise Geocoding Module.

Enhance GGM performance by setting database resource values poolsize and POOL_MAX_ACTIVE equal to or double the number of CPUs.

poolsize (maxActive for CLI users) = CPU or 2 * CPU
POOL_MAX_ACTIVE = poolsize
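
The sizing rule above can be sketched in Python. This is a minimal illustration only: it assumes the CPU count reported by the operating system is the relevant figure, and the function name recommended_pool_sizes is hypothetical and not part of Spectrum.

```python
import os


def recommended_pool_sizes(cpu_count=None, factor=1):
    """Suggest poolsize and POOL_MAX_ACTIVE values from the CPU count.

    factor is 1 (equal to the CPU count) or 2 (double the CPU count).
    """
    if factor not in (1, 2):
        raise ValueError("factor must be 1 or 2")
    if cpu_count is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        cpu_count = os.cpu_count() or 1
    poolsize = cpu_count * factor
    # Per the rule above, POOL_MAX_ACTIVE is set equal to poolsize.
    return {"poolsize": poolsize, "POOL_MAX_ACTIVE": poolsize}


if __name__ == "__main__":
    print(recommended_pool_sizes(cpu_count=8, factor=2))
```

Which factor performs better depends on the environment, so treat both results as starting points to test against.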

GGM Performance Configuration

Before proceeding, see these topics about tuning Spectrum performance:

GGM dataflow performance depends on multiple factors, including the following:

  • Machine configuration (number of CPUs, available RAM): More CPUs and memory allow GGM to run more geocoder threads, which improves performance.
  • Databases configured: Configure only the databases you need for geocoding. The larger the configured databases, the more memory GGM needs. This can eventually limit the number of threads that can be spawned on a given machine and therefore limit performance.
  • Input addresses: Refer to Optimizing Geocoding Stages.
  • Dataflow runtime settings: Refer to Database Pool Size and Runtime Instances.
  • GGM database configuration: GGM has configuration parameters that control its internal threads and associated pools. To determine the optimal values for your environment, review and adjust these parameters as explained in the Spectrum Dataflow Designer Guide. Key parameters are highlighted below.

GGM Database configuration parameters

An administrator can configure a GGM database by importing SPD files with the Administration Utility commands. Once the files have been imported, users can go to the Management Console and configure a GGM database using the imported SPDs.

While configuring databases in the Add Database screen, an administrator can set the following properties to enhance GGM performance:

  • Pool size: The database poolsize defines the number of concurrent requests GGM can handle. Set it equal to or double the number of CPUs to achieve better performance. (CLI users: see the maxActive property.)
  • Select the Override advanced settings option.
  • MAXIMIZE_BATCH_SIZE: The default is 100. This controls the maximum batch size for the REST API.
  • POOL_MAX_ACTIVE: The default is 16. This controls the internal API pool size.
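
As an illustration of the batch size limit, a REST client could split its input so that no single request exceeds the configured maximum. The sketch below is an assumption about client-side usage, not documented Spectrum behavior; split_into_batches is a hypothetical helper, and what the service does with an oversized batch is not stated here.

```python
def split_into_batches(addresses, max_batch_size=100):
    """Split a list of addresses into chunks of at most max_batch_size.

    The default of 100 mirrors the documented default batch size.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        addresses[start:start + max_batch_size]
        for start in range(0, len(addresses), max_batch_size)
    ]


if __name__ == "__main__":
    sample = ["address %d" % n for n in range(250)]
    print([len(batch) for batch in split_into_batches(sample)])
```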

Create and use multiple geocoding databases

GGM now allows users to create multiple database resources. Each database resource can be configured with a different vintage, which allows for flexibility within dataflows. Users can test a new release by comparing it with an older release before promoting it.

Backwards compatibility: when older flows are imported, they will default to the latest database created on 2019.1.

Notes:

  • Before configuring and using multiple database resources, confirm whether any applications use the GGM REST API. REST API calls must be modified to include the database name so that they do not conflict with any new database resources.
  • If you want to use this new capability and you have existing data flows, review and update any data flow database resources. Otherwise, no changes to data flow database resources are necessary.

MLD Extended Attributes including APN and Elevation

(USA) This new feature provides access to extended attributes associated with an addressable location that has a pbKey. When matching addresses with Master Location Data (MLD), Spectrum now returns additional property information associated with the address, such as Assessor's Parcel Number (APN), Elevation, Address Type, and Lot Size. APN can be used to identify the parcel so the parcel ID can be linked to additional information for the insurance industry, such as property and insurance risk attributes. For more detail, see the full list below.

Requirements

The following are required to return MLD Extended Attributes in the Global Geocoding Module (GGM):

  • Master Location Data dataset.
  • Streets dataset.
  • MLD Extended Attributes dataset.
  • Recommendation: The vintages of the MLD and MLD Extended Attributes datasets should be within 4 months of each other.
  • Within GGM, on the Return Values tab, select the Return all available information setting. In Enterprise Designer, return the values using Custom Fields.
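
The vintage recommendation above can be checked with a short script. This sketch assumes vintages are written as YYYYMM (the format used by the --v option in the productdata delete example in these notes) and that "within 4 months" means a difference of at most 4 calendar months. Both function names are hypothetical.

```python
def months_apart(vintage_a, vintage_b):
    """Return the number of calendar months between two YYYYMM vintages."""
    year_a, month_a = divmod(int(vintage_a), 100)
    year_b, month_b = divmod(int(vintage_b), 100)
    return abs((year_a * 12 + month_a) - (year_b * 12 + month_b))


def vintages_compatible(mld_vintage, ext_vintage, max_months=4):
    """True when the two vintages are no more than max_months apart."""
    return months_apart(mld_vintage, ext_vintage) <= max_months


if __name__ == "__main__":
    print(vintages_compatible("201907", "201911"))
    print(vintages_compatible("201907", "202001"))
```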

MLD Extended Attributes Output Fields (optional)

Field Description
AddressType Address type regarding number of units: S – Single unit; M – Multiple units; P – Post Office box; X – Unknown.
Apn Assessor’s parcel number.
IncorpPlaceInd Incorporated Place Indicator: I – Incorporated place; N – Not an incorporated place; X – Unknown.
LotSize Lot size of the parcel expressed in square feet; 0 if none.
LotSizeMeters Lot size of the parcel expressed in square meters; 0 if none.
MECLatitude Latitude of Minimum Enclosing Circle expressed with an implied 6 digits of decimal precision; 0 if none.
MECLongitude Longitude of Minimum Enclosing Circle expressed with an implied 6 digits of decimal precision; 0 if none.
MECRadius Radius of Minimum Enclosing Circle (in feet) expressed as a whole number. For example: 1234 means 1,234 feet.
MECRadiusMeters Radius of Minimum Enclosing Circle (in meters) expressed with 1 digit of decimal precision.
Elevation Elevation above sea level (in feet) expressed with 1 digit of decimal precision. For example: 12.5 feet.
ResidentialBusiness Usage indicator: R – Residential use; B – Business use; M – Mixed use (residential and business); X – Unknown use.
TigerFaceID TIGER Face Identifier. This field can be used to match to all Census geocodes using external data; 0 if none.
TigerPlace TIGER Place code; 0 if none.
UrbanAreaID TIGER Urban Area Identifier. Defines the urban area if any; 0 if none.
UrbanAreaPop Census population of the urban area; 0 if none.
Urbanicity Urbanicity Indicator. An indicator that defines, according to the Census, the Urbanicity of the Address using TIGER UACE codes for categorization.
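
The coded fields above can be turned into readable values with a small lookup. The following is a sketch under stated assumptions: the record is a plain dictionary keyed by the output field names, and the "implied 6 digits" values arrive as whole numbers (for example 40018912 for 40.018912). The sample values and the function names are hypothetical.

```python
from decimal import Decimal

ADDRESS_TYPE = {"S": "Single unit", "M": "Multiple units",
                "P": "Post Office box", "X": "Unknown"}
RESIDENTIAL_BUSINESS = {"R": "Residential use", "B": "Business use",
                        "M": "Mixed use", "X": "Unknown use"}


def implied_decimal(raw, digits=6):
    """Convert a whole number with implied decimal digits to a Decimal.

    Returns None for 0, which the field table documents as "none".
    """
    value = Decimal(str(raw))
    if value == 0:
        return None
    return value / (Decimal(10) ** digits)


def decode_mld(record):
    """Return readable values for a few MLD Extended Attributes fields."""
    return {
        "address_type": ADDRESS_TYPE.get(record.get("AddressType")),
        "usage": RESIDENTIAL_BUSINESS.get(record.get("ResidentialBusiness")),
        "mec_latitude": implied_decimal(record.get("MECLatitude", 0)),
        "mec_longitude": implied_decimal(record.get("MECLongitude", 0)),
    }


if __name__ == "__main__":
    # Hypothetical sample record, not real output.
    print(decode_mld({"AddressType": "S", "ResidentialBusiness": "R",
                      "MECLatitude": "40018912",
                      "MECLongitude": "-105251234"}))
```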

Added support for RDI

The Residential Delivery Indicator (RDI™) is a United States Postal Service (USPS®) data product that identifies whether a delivery type is classified as residential or business. If you are shipping to residences, you may lower costs by shipping with the Postal Service™ and avoid residential delivery surcharges typically charged by other shipping companies.

Note: To use RDI, Delivery Point Validation (DPV) must also be enabled and a US Streets dataset loaded.

Output Field Description
RDIRetCode USPS Residential Delivery Indicator (RDI) return codes: Y = Residence; N = Business; Blank = Not processed through RDI.

See also: Improved dataset management for USPS® products (USA)

Reverse Geocoding: supports fallback to World geocoder

Reverse geocoding can now fall back to the World Geocoder for countries that are included in the GGM database. For points that do not return a point-level or street-level candidate, it can now return a geographic or postal-level centroid match.

Note: To enable this feature, configure the GEOCODING GEOCODING WORLD PLACES INT WORLD GLOBAL ALL GLB dataset in the database and select the Fallback to world geocoder option on the Reverse Geocoding tab within the Global Reverse Geocode stage.

New Input field: AddressLine2

The Global Geocode stage now has an optional input field, AddressLine2, which is supported for USA, AUS, CAN, and GBR. It is also a new field on the REST service.

New Output field: MatchScore

This new custom field displays a score (0-100) of how well the input compares to the candidate values for certain fields. It is used to better indicate which parts of the input address were changed to make the match. Fields checked include street name, house number, directional, street type, unit number, place name, postal code, and area names 1, 3, and 4. A lower match score indicates that more input fields were changed.

New Output field: Confidence (USA Only)

This new field indicates the confidence in the output provided, from 0 to 100. The higher the score, the higher the probability that the match is correct. If the match is exact, the confidence score is 100. For all other matches, the confidence score is calculated based on which portions of the input address had to be changed to obtain a match.

If you have enabled the option to return centroids, the confidence value indicates the type of centroid returned:

  • 60: street centroid
  • 50: postal code centroid
  • 35: city centroid
  • 30: county centroid
  • 25: state centroid
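
The centroid values listed above can be mapped back to a centroid type. This is a minimal sketch that assumes the candidate is already known to be a centroid match, since other match types may produce the same scores. The function name is hypothetical.

```python
CENTROID_CONFIDENCE = {
    60: "street centroid",
    50: "postal code centroid",
    35: "city centroid",
    30: "county centroid",
    25: "state centroid",
}


def centroid_type(confidence):
    """Return the centroid type for a confidence value, or None.

    None means the value is not one of the documented centroid scores.
    """
    return CENTROID_CONFIDENCE.get(int(confidence))


if __name__ == "__main__":
    print(centroid_type(50))
    print(centroid_type(100))
```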

Example

An address was initially entered without the Areaname. As shown below, when the Areaname is specified, the Confidence level increases. Providing additional address information, such as city, house number, street name, trailing or leading directional, street suffix, and postal code, will increase Confidence.

Figure: confidence comparison

New Output field: CPC record type (CAN Only)

Added a field for the Canadian postal code record type, CPC_RECORD_TYPE, which contains a 2-character code.

Note: This field is returned only when the candidate contains a full postcode of FSA (Forward Sortation Area) + LDU (Local Delivery Unit).

Code Description
<blank> no address match was found
* Unknown
A1 High rise
B1 LVR (Large Volume Receiver) street
C1 Government Street Address
D2 LVR (Large Volume Receiver) Served by Lock Box
E2 Government Served by Lock Box
F2 LVR (Large Volume Receiver) Served by General Delivery
11 Street
21 Street served by route
32 PO Box
42 Route service
52 General Delivery

Return Values: Parsed Address available for all countries

Parsed Address Output Fields

Previously available for both EGM USA and EGM Non USA, the Parsed Address output fields display the components of a matched address which has been parsed and standardized by the geocoder. In this release, additional output fields were added to support international addresses for all countries. (No changes to US fields.)

To enable, go to the Return Values tab and select the Parsed Address option.

Field Name Parsed Input Description
AddressNumberInput House or building number
AreaName1Input State, province or region
AreaName2Input County or district
AreaName3Input City, town or suburb
AreaName4Input Locality
CountryInput Country
FormattedInputStreetInput The formatted main address line
GenericField2Input Reserved for custom use by country level geocoders
GenericField3Input Reserved for custom use by country level geocoders
GenericField4Input Reserved for custom use by country level geocoders
IntersectingStreetInput If your data contains references by intersection, for example, main address “Central Avenue” intersecting street “Pine Lane”, you can enter the intersecting street on this line.
LeadingDirectionalInput Directional information for delivery (i.e., N, S, E, W, NE, NW, SE, SW) before the address
PlaceNameInput Point of interest or a business name
PostCode1Input Series of numbers, letters, and spaces used to sort the mail
PostCode2Input If an address contains a primary postcode and a postal code add-on, the add-on postcode is entered here
StreetNameInput Name of the street
StreetPrefixInput Word before the street such as N., S., E., or W.
StreetSuffixInput Word that follows the StreetName such as "Street" or "Avenue"
TrailingDirectionalInput Directional information for delivery (i.e., N, S, E, W, NE, NW, SE, SW) after the address.
UnitTypeInput Indicates the type of unit such as apartment or suite (APT, STE, etc.).
UnitValueInput The number of the unit
UnparsedInput The full address as entered

New CLI commands

Defining memory size for GGM

ggmglobalgeocodedb memory set

The ggmglobalgeocodedb memory set command defines the memory size for the Global Geocoding Module databases. The fields for defining minimum and maximum memory values can be empty. If a value is empty, that value will not be specified on the command line when starting the component, as if no value were explicitly defined. If no value is specified, or if a value is 0, the property will not be passed to the Command Line Interface.

ggmglobalgeocodedb memory set --name database_name --mn minimum_memory_size --mx maximum_memory_size

Important changes regarding data bundles and databases

In previous releases, archiving data bundles (SPDs) was optional and turned off by default. In this release, the bundles (SPDs) are archived on the machine where they are installed.

Default paths:

  • Archive folder: <SpectrumInstallFolder>\server\archive
  • Extracted data: <SpectrumInstallFolder>\server\ref-data

This release of Spectrum also requires users to use the CLI for all operations related to installing and deleting data bundles. Here are a few example commands:

Changing default data extraction and archive folders:

productdata extract register --p platform --d c:\\<desiredpath>\<SpectrumDataFolder>
productdata archive register --p platform --d c:\\<desiredpath>\<SpectrumArchiveFolder> 

Installing and deleting data bundles:

productdata install --f c:\<pathToSPD>\KNT082019.spd --w
productdata install --f c:\<pathToSPD>\CAN-EGM-PITNEYBOWES-CA8.spd --w
productdata delete --c geocoding-all --p geocoding-all --q USA-HERE --v 201907

where c=component, p=product, q=qualifier, v=vintage; these properties are in the metadata.json file
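
As a sketch of how the delete command could be assembled from bundle metadata, the snippet below builds the command string from a dictionary. The key names component, product, qualifier, and vintage are assumptions for illustration. Check the actual property names in your metadata.json file before relying on them. The helper name is hypothetical.

```python
import json


def build_delete_command(metadata):
    """Build a productdata delete command from bundle metadata.

    The dictionary keys used here are assumed names, not confirmed ones.
    """
    return "productdata delete --c {c} --p {p} --q {q} --v {v}".format(
        c=metadata["component"],
        p=metadata["product"],
        q=metadata["qualifier"],
        v=metadata["vintage"],
    )


if __name__ == "__main__":
    # Hypothetical metadata content mirroring the example command above.
    sample = json.loads(
        '{"component": "geocoding-all", "product": "geocoding-all", '
        '"qualifier": "USA-HERE", "vintage": "201907"}'
    )
    print(build_delete_command(sample))
```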

Importing Databases:

Similar to previous Spectrum versions, you can import databases using the import command.

globalgeocodedb import --f c:\<pathToFile>\GlobalGeocodeDbResource.txt

Improved dataset management for USPS® products (USA)

Overview

Datasets can now be added using the Spectrum Product Data (SPD) format. This applies to USPS products: Delivery Point Validation (DPV™), LACSLink®, SuiteLink®, and Residential Delivery Indicator (RDI™).

Note: These steps assume the Administration Utility has already been installed as part of installing the Client Tools.

Regardless of the USPS product, the overall process is the same:

  1. Download your licensed SPD files from the Software Data Marketplace (SDM), using the link provided in the release announcement or welcome email.
  2. Install them using the Administration Utility command productdata install. For more detail, see Installing a Spectrum Database.
  3. In Management Console, add the data as a Spectrum Database Resource (and any other required datasets).
  4. Go to the Global Geocode stage. On the Return Values tab, select the Return all available information setting.
  5. On the Preview tab, enter an address as Input Record 1 and click Run Preview. Output fields are returned.

Obtaining Data

Data requirements are outlined in the product-specific sections below. On the Software Data Marketplace (SDM), the datasets are listed as:

USPS Product SDM Name
DPV GEOCODING GEOCODING DPV SPLIT AMER UNITED STATES ALL USA
LACSLink GEOCODING GEOCODING LACSLINK DATABASE AMER UNITED STATES ALL USA
RDI GEOCODING GEOCODING RESIDENTIAL INDICATOR AMER UNITED STATES ALL USA
SuiteLink GEOCODING GEOCODING SUITELINK DATABASE AMER UNITED STATES ALL USA

DPV

Used for: Confirming whether a ZIP + 4 address is a USPS delivery point; helpful for identifying potential addressing issues.

Requirements: DPV and Street datasets of the same vintage.

Example

Enter these values...
Input Field Value
placeName DRAKE BEAM
mainAddressLine 2502 N ROCKY POINT DR
areaName3 Tampa
areaName1 FL
postCode1 33607
...to return the output shown below.
Output Field Value
DPVConfirm D
DPVCMRA N
DPVFootNote1 AA
DPVFootNote2 N1
DPVNoSTAT Y
DPVShutdown N
DPVVacant N

LACSLink

Used for: Obtaining new addresses after a 911 emergency system has been implemented.

Requirements: LACS and Street datasets of the same vintage.

Example

Enter these values...

...to return the output shown below.

RDI

Used for: Determining whether an address is classified as a residence or business; helpful for avoiding the surcharges that some shipping companies charge for residential delivery.

Requirements: DPV, RDI and Street datasets of the same vintage.

Example

Enter these values...

Input Field Value
mainAddressLine 1627 S Jasmine St
areaName3 Denver
areaName1 CO
postCode1 80224

...to return the output shown below.

Output Field Value
RDIReturnCode Y

SuiteLink

Used for: Adding known secondary (suite) information to business addresses; helpful for delivery sequencing.

Requirements: SuiteLink and Street datasets of the same vintage.

Example

Enter these values...
...to return the output shown below.

Custom Dataset Builder changes

Custom Dataset Builder (CDB) is a utility to create new datasets for use in the Global Geocoding Module and web service API.

Support for French-administered territories and Monaco

CDB now supports data creation for the French-administered territories of Guadeloupe (GLP), French Guiana (GUF), Martinique (MTQ), Mayotte (MYT), and Réunion (REU), and for the country of Monaco (MCO), using a command similar to the one used for other countries.

When geocoding any address from the territories, provide all the relevant settings as you would for France (including the country code FRA, not the territory code). Matching candidates are returned from those territories along with the country code of their parent (that is, FRA).
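
The country code rule above can be sketched as a small lookup that substitutes FRA for a territory code before a request is built. This assumes input codes are ISO 3166-1 alpha-3 and that Monaco, being a separate country, keeps its own code. The function name is hypothetical.

```python
FRENCH_TERRITORIES = {
    "GLP": "Guadeloupe",
    "GUF": "French Guiana",
    "MTQ": "Martinique",
    "MYT": "Mayotte",
    "REU": "Réunion",
}


def geocoding_country_code(iso3):
    """Return the country code to send when geocoding an address.

    Territory codes are replaced by FRA; other codes pass through.
    """
    code = iso3.strip().upper()
    return "FRA" if code in FRENCH_TERRITORIES else code


if __name__ == "__main__":
    print(geocoding_country_code("glp"))
    print(geocoding_country_code("MCO"))
```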

Notes:

  • Custom Dataset Builder is a command line driven tool. Configuration is done through JSON.
  • Data must be in TAB format (native or nativeX).
  • Data created with Custom Dataset Builder does not support interactive geocoding at this time.
  • The result code for street geocoding contains a "U" for user datasets to distinguish it from "A" when candidates are from the standard address datasets. For example: S5HPNTSCZU instead of S5HPNTSCZA.
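
To illustrate the result code note above, the following sketch reports whether a candidate came from a user dataset or a standard dataset. It assumes the indicator is the final character of the result code, as in the two example codes. The function name is hypothetical.

```python
def candidate_source(result_code):
    """Classify a street result code by its final character.

    "U" marks a user dataset and "A" a standard address dataset.
    """
    last = result_code.strip()[-1:].upper()
    if last == "U":
        return "user dataset"
    if last == "A":
        return "standard address dataset"
    return "unknown"


if __name__ == "__main__":
    print(candidate_source("S5HPNTSCZU"))
    print(candidate_source("S5HPNTSCZA"))
```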

New parameter -usePackagedLib

Added an optional parameter (required for USA) that uses the libraries bundled with the Custom Dataset Builder tool instead of the library from the SPD. It works only with Spectrum version 2019.1 or higher and SPD bundles OCT2019 or later.

For more information, the Custom Dataset Builder Guide is included in the .zip file with the tool. Instructions are also provided in the Global Geocoding User Guide.

Spectrum Databases: new filter for data bundles

In Resources > Spectrum Databases > database name, you can now filter the list of data bundles to Show All, Show Enabled, or Show Disabled.

  • Show All displays all available datasets
  • Show Enabled displays datasets that are selected or configured in the database
  • Show Disabled displays all available datasets that are not selected or configured in the database

New Reset Settings button

From any Global Geocoding stage, you can now click Reset Settings to return all options, across all tabs, to their default values.

Dataset expiration

Added the ability to view when your database resources expire. In Management Console, go to System > Licensing and Expiration > Data tab to review the new Expires On column.

"Version" now identifies PB Release Vintage

Applies to all geocoding modules: In Management Console, when you navigate to System > Version, the Version field now displays the PB Release vintage instead of the vendor's vintage. This makes it easier to keep dataset vintages aligned.