Global Geocoding Module
Performance and scalability improvements
With the proper configuration, the Global Geocoding Module (GGM) now performs faster than the Enterprise Geocoding Module.
Enhance GGM performance by setting database resource values poolsize and POOL_MAX_ACTIVE equal to or double the number of CPUs.
poolsize (maxActive - for CLI users) = CPU or 2 * CPU
POOL_MAX_ACTIVE = poolsize
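The sizing rule above can be sketched in Python. This is a minimal illustration that assumes `os.cpu_count()` reports the cores available to the Spectrum server. The function name `recommended_pool_settings` is hypothetical and is not part of Spectrum.

```python
import os


def recommended_pool_settings(cpus=None, double=False):
    """Suggest poolsize and POOL_MAX_ACTIVE values (hypothetical helper).

    Applies the rule: poolsize = CPU or 2 * CPU, POOL_MAX_ACTIVE = poolsize.
    """
    if cpus is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpus = os.cpu_count() or 1
    poolsize = 2 * cpus if double else cpus
    return {"poolsize": poolsize, "POOL_MAX_ACTIVE": poolsize}


if __name__ == "__main__":
    print(recommended_pool_settings(8))               # {'poolsize': 8, 'POOL_MAX_ACTIVE': 8}
    print(recommended_pool_settings(8, double=True))  # {'poolsize': 16, 'POOL_MAX_ACTIVE': 16}
```

Whether to use the single or doubled value is a tuning choice. Measure both against your own dataflows.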
GGM Performance Configuration
Before proceeding, see these topics about tuning Spectrum performance:
GGM dataflow performance depends on multiple factors, including the following:
- Machine configuration (number of CPUs, available RAM): More CPUs and memory let GGM run more geocoder threads in parallel, leading to higher performance.
- Databases configured: Configure only the databases you need for geocoding. The larger the configured databases, the more memory GGM needs. This can eventually limit the number of threads that can be spawned on a given machine and so limit performance.
- Input addresses: Refer to Optimizing Geocoding Stages.
- Dataflow runtime settings: Refer to Database Pool Size and Runtime Instances.
- GGM database configuration: GGM has configuration parameters that control internal threads and their associated pools. To determine the optimal values for your environment, review and adjust these parameters as explained in the Spectrum Dataflow Designer Guide. Key parameters are highlighted below.
GGM Database configuration parameters
An administrator can configure a GGM database by importing SPD files with the Administration Utility commands. Once the SPDs are imported, users can go to the Management Console and configure a GGM database using them.
While configuring databases in the Add Database screen, an administrator can set the following properties to enhance GGM performance:
- Pool size: The database pool size defines the number of concurrent requests GGM can handle. For better performance, set it equal to, or double, the number of CPUs. (CLI users: refer to the maxActive property.)
- Select the Override advanced settings option to set the following:
- MAXIMIZE_BATCH_SIZE: The default is 100; this controls the maximum batch size for the REST API.
- POOL_MAX_ACTIVE: The default is 16; this controls the internal API pool size.
Create and use multiple geocoding databases
GGM now allows users to create multiple database resources. Each database resource can be configured with a different vintage, which allows for flexibility within dataflows. Users can test a new release by comparing it with an older release before promoting it.
Backwards compatibility: when older flows are imported, they default to the latest database created in Spectrum 2019.1.
Notes:
- Before configuring and using multiple database resources, check whether any applications call the GGM REST API. Those REST API calls must be modified to include the database name so they do not conflict with any new database resources.
- If you want to use this new capability and you have existing data flows, review and update any data flow database resources. Otherwise, no changes to data flow database resources are necessary.
MLD Extended Attributes including APN and Elevation
(USA) This new feature provides access to extended attributes associated with an addressable location that has a pbKey. When matching addresses with Master Location Data (MLD), Spectrum now returns additional property information associated with the address, such as Assessor's Parcel Number (APN), Elevation, Address Type, and Lot Size. APN can be used to identify the parcel so the parcel ID can be linked to additional information for the insurance industry, such as property and insurance risk attributes. For more detail, see the full list below.
Requirements
The following are required to return MLD Extended Attributes in the Global Geocoding Module (GGM):
- Master Location Data dataset.
- Streets dataset.
- MLD Extended Attributes dataset.
- Recommendation: the vintages of the MLD and MLD Extended Attributes datasets should be within 4 months of each other.
- Within GGM, on the Return Values tab, select the Return all available information setting. In Enterprise Designer, these values are returned as Custom Fields.
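The 4-month vintage recommendation can be checked with a short script. This sketch assumes vintages are written as YYYYMM strings, like the `--v 201907` value passed to `productdata delete` elsewhere in these notes. It also treats "within 4 months" as a gap of at most 4 months. Both helper names are hypothetical.

```python
def vintage_to_months(vintage):
    """Convert a YYYYMM vintage string (e.g. "201907") to an absolute month count."""
    if len(vintage) != 6 or not vintage.isdigit():
        raise ValueError(f"expected YYYYMM, got {vintage!r}")
    year, month = int(vintage[:4]), int(vintage[4:])
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {vintage!r}")
    return year * 12 + (month - 1)


def vintages_compatible(mld_vintage, ext_vintage, max_months=4):
    """Return True if the two vintages are at most max_months apart."""
    gap = abs(vintage_to_months(mld_vintage) - vintage_to_months(ext_vintage))
    return gap <= max_months


print(vintages_compatible("201907", "201910"))  # True  (3 months apart)
print(vintages_compatible("201907", "202001"))  # False (6 months apart)
```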
MLD Extended Attributes Output Fields (optional)
Field | Description |
---|---|
AddressType | Address type regarding number of units: S – Single unit; M – Multiple units; P – Post Office box; X – Unknown |
Apn | Assessor’s parcel number. |
IncorpPlaceInd | Incorporated Place Indicator: I – Incorporated place; N – Not an incorporated place; X – Unknown |
LotSize | Lot size of the parcel expressed in square feet; 0 if none. |
LotSizeMeters | Lot size of the parcel expressed in square meters; 0 if none. |
MECLatitude | Latitude of Minimum Enclosing Circle expressed with an implied 6 digits of decimal precision; 0 if none. |
MECLongitude | Longitude of Minimum Enclosing Circle expressed with an implied 6 digits of decimal precision; 0 if none. |
MECRadius | Radius of Minimum Enclosing Circle (in feet) expressed as a whole number. For example: 1234 means 1,234 feet. |
MECRadiusMeters | Radius of Minimum Enclosing Circle (in meters) expressed with 1 digit of decimal precision. |
Elevation | Elevation above sea level (in feet) expressed with 1 digit of decimal precision. For example: 12.5 feet. |
ResidentialBusiness | Usage Indicator: R – Residential use; B – Business use; M – Mixed use (residential and business); X – Unknown use |
TigerFaceID | TIGER Face Identifier. This field can be used to match to all Census geocodes using external data; 0 if none. |
TigerPlace | TIGER Place code; 0 if none. |
UrbanAreaID | TIGER Urban Area Identifier. Defines the urban area, if any; 0 if none. |
UrbanAreaPop | Census population of the urban area; 0 if none. |
Urbanicity | Urbanicity Indicator. Defines, according to the Census, the urbanicity of the address, using TIGER UACE codes for categorization. |
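Several of these fields use compact encodings. The sketch below decodes a few of them in Python. It assumes each field arrives as a string in an output record and that negative longitudes carry their sign in the implied-decimal value. The helper names and sample values are hypothetical. The code letters come from the table above.

```python
ADDRESS_TYPE = {"S": "Single unit", "M": "Multiple units",
                "P": "Post Office box", "X": "Unknown"}
USAGE = {"R": "Residential use", "B": "Business use",
         "M": "Mixed use (residential and business)", "X": "Unknown use"}


def implied_degrees(raw):
    """Convert a coordinate with 6 implied decimal digits to degrees.

    Returns None for the documented "0 if none" placeholder.
    """
    value = int(raw)
    return None if value == 0 else value / 1_000_000


def decode_mld_record(record):
    """Return readable values for selected MLD extended attributes (hypothetical helper)."""
    return {
        "address_type": ADDRESS_TYPE.get(record.get("AddressType", "X"), "Unknown"),
        "usage": USAGE.get(record.get("ResidentialBusiness", "X"), "Unknown use"),
        "mec_latitude": implied_degrees(record.get("MECLatitude", "0")),
        "mec_longitude": implied_degrees(record.get("MECLongitude", "0")),
        "mec_radius_ft": int(record.get("MECRadius", "0")),  # whole feet
    }


# Hypothetical sample values, for illustration only.
sample = {"AddressType": "S", "ResidentialBusiness": "R",
          "MECLatitude": "39682500", "MECLongitude": "-104917000",
          "MECRadius": "1234"}
print(decode_mld_record(sample))
```

If your integration already receives these coordinates as decimal degrees, skip the implied-decimal conversion.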
Added support for RDI
The Residential Delivery Indicator (RDI™) is a United States Postal Service (USPS®) data product that identifies whether a delivery type is classified as residential or business. If you are shipping to residences, you may lower costs by shipping with the Postal Service™ and avoid residential delivery surcharges typically charged by other shipping companies.
Note: To use RDI, Delivery Point Validation (DPV) must also be enabled and a US Streets dataset loaded.
Output Field | Description |
---|---|
RDIRetCode | USPS Residential Delivery Indicator (RDI) return codes: • Y = Residence • N = Business • Blank = Not processed through RDI. |
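A minimal Python sketch for interpreting this return code in downstream processing is shown below. The function name `describe_rdi` is hypothetical. Treating a missing value the same as a blank value, and flagging any other code as unrecognized, are assumptions that go beyond the table.

```python
RDI_CODES = {"Y": "Residence", "N": "Business"}


def describe_rdi(ret_code):
    """Map an RDIRetCode value to a description; blank means not processed."""
    code = (ret_code or "").strip()
    if not code:
        return "Not processed through RDI"
    return RDI_CODES.get(code, f"Unrecognized code {code!r}")


print(describe_rdi("Y"))  # Residence
print(describe_rdi(""))   # Not processed through RDI
```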
See also: Improved dataset management for USPS® products (USA)
Reverse Geocoding: supports fallback to World geocoder
Reverse geocoding can now fall back to the World Geocoder for the countries included in the GGM database. As a result, points that do not return a point-level or street-level candidate can now return a geographic or postal-level centroid match.
New Input field: AddressLine2
The Global Geocode stage now includes an optional input field, AddressLine2, which is supported for USA, AUS, CAN, and GBR. It is also a new field on the REST service.
New Output field: MatchScore
This new custom field displays a score (0-100) indicating how well the input compares to the candidate values for certain fields. It helps indicate which parts of the input address were changed to make the match. Fields checked include street name, house number, directional, street type, unit number, place name, postal code, and area names 1, 3, and 4. A lower match score indicates that more input fields were changed.
New Output field: Confidence (USA Only)
This new field indicates the confidence in the output provided, from 0 to 100. The higher the score, the higher the probability that the match is correct. If the match is exact, the confidence score is 100. For all other matches, the confidence score is calculated based on which portions of the input address had to be changed to obtain a match.
If you have enabled the option to return centroids, the Confidence value indicates the type of centroid returned:
- 60: street centroid
- 50: postal code centroid
- 35: city centroid
- 30: county centroid
- 25: state centroid
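For a record you already know is a centroid match, the mapping above can be expressed as a lookup table. This Python sketch uses a hypothetical function name. It returns None for any value outside the listed centroid scores, such as 100 for an exact match.

```python
CENTROID_CONFIDENCE = {
    60: "street centroid",
    50: "postal code centroid",
    35: "city centroid",
    30: "county centroid",
    25: "state centroid",
}


def centroid_type(confidence):
    """Return the centroid type for a centroid match's Confidence, or None."""
    return CENTROID_CONFIDENCE.get(confidence)


print(centroid_type(50))   # postal code centroid
print(centroid_type(100))  # None
```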
Example
An address was initially entered without the Areaname. When the Areaname was then specified, the Confidence level increased. Adding more address information, such as city, house number, street name, trailing or leading directional, street suffix, and postal code, increases Confidence.
New Output field: CPC record type (CAN Only)
Code | Description |
---|---|
<blank> | no address match was found |
* | Unknown |
A1 | High rise |
B1 | LVR (Large Volume Receiver) street |
C1 | Government Street Address |
D2 | LVR (Large Volume Receiver) Served by Lock Box |
E2 | Government Served by Lock Box |
F2 | LVR (Large Volume Receiver) Served by General Delivery |
11 | Street |
21 | Street served by route |
32 | PO Box |
42 | Route service |
52 | General Delivery |
Return Values: Parsed Address available for all countries
Parsed Address Output Fields
Previously available for both EGM USA and EGM Non USA, the Parsed Address output fields display the components of a matched address which has been parsed and standardized by the geocoder. In this release, additional output fields were added to support international addresses for all countries. (No changes to US fields.)
To enable, go to the Return Values tab and select the Parsed Address option.
Field Name | Parsed Input Description |
---|---|
AddressNumberInput | House or building number |
AreaName1Input | State, province or region |
AreaName2Input | County or district |
AreaName3Input | City, town or suburb |
AreaName4Input | Locality |
CountryInput | Country |
FormattedInputStreetInput | The formatted main address line |
GenericField2Input | Reserved for custom use by country level geocoders |
GenericField3Input | Reserved for custom use by country level geocoders |
GenericField4Input | Reserved for custom use by country level geocoders |
IntersectingStreetInput | If your data contains references by intersection, for example, main address “Central Avenue” intersecting street “Pine Lane”, you can enter the intersecting street on this line. |
LeadingDirectionalInput | Directional information for delivery (i.e., N, S, E, W, NE, NW, SE, SW) before the address |
PlaceNameInput | Point of interest or a business name |
PostCode1Input | Series of numbers, letters, and spaces used to sort the mail |
PostCode2Input | If an address contains a primary postcode and a postal code add-on, the add-on postcode is entered here |
StreetNameInput | Name of the street |
StreetPrefixInput | Word before the street, such as N., S., E., or W. |
StreetSuffixInput | Word that follows the StreetName, such as "Street" or "Avenue" |
TrailingDirectionalInput | Directional information for delivery (i.e., N, S, E, W, NE, NW, SE, SW) after the address. |
UnitTypeInput | Indicates the type of unit, such as apartment or suite (APT, STE, etc.). |
UnitValueInput | The number of the unit |
UnparsedInput | The full address as entered |
New CLI commands
Defining memory size for GGM
ggmglobalgeocodedb memory set --name database_name --mn minimum_memory_size --mx maximum_memory_size
Important changes regarding data bundles and databases
In previous releases, archiving data bundles (SPDs) was optional and turned off by default. In this release, bundles (SPDs) are archived on the machine where they are installed.
Default paths:
- Archive folder: <SpectrumInstallFolder>\server\archive
- Extracted data: <SpectrumInstallFolder>\server\ref-data
Changing default data extraction and archive folders:
productdata extract register --p platform --d c:\\<desiredpath>\<SpectrumDataFolder>
productdata archive register --p platform --d c:\\<desiredpath>\<SpectrumArchiveFolder>
Installing and deleting data bundles:
productdata install --f c:\<pathToSPD>\KNT082019.spd --w
productdata install --f c:\<pathToSPD>\CAN-EGM-PITNEYBOWES-CA8.spd --w
productdata delete --c geocoding-all --p geocoding-all --q USA-HERE --v 201907
where c=component, p=product, q=qualifier, v=vintage; these properties are in the metadata.json file
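If you script bundle cleanup, a small helper can assemble the delete command from the four metadata.json properties. This Python sketch is hypothetical. It only builds the command text and does not invoke the Administration Utility. Rejecting values that contain whitespace is a simplifying assumption to avoid quoting issues.

```python
def productdata_delete_command(component, product, qualifier, vintage):
    """Build a 'productdata delete' command line (hypothetical helper).

    The arguments correspond to the component, product, qualifier and
    vintage properties in the bundle's metadata.json file.
    """
    values = {"--c": component, "--p": product, "--q": qualifier, "--v": vintage}
    parts = ["productdata", "delete"]
    for flag, value in values.items():
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{flag} needs a non-empty value without spaces")
        parts += [flag, value]
    return " ".join(parts)


print(productdata_delete_command("geocoding-all", "geocoding-all", "USA-HERE", "201907"))
# productdata delete --c geocoding-all --p geocoding-all --q USA-HERE --v 201907
```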
Importing Databases:
Similar to previous Spectrum versions, you can import databases using the import command.
globalgeocodedb import --f c:\<pathToFile>\GlobalGeocodeDbResource.txt
Improved dataset management for USPS® products (USA)
Overview
Datasets can now be added using the Spectrum Product Data (SPD) format. This applies to USPS products: Delivery Point Validation (DPV™), LACSLink®, SuiteLink®, and Residential Delivery Indicator (RDI™).
Regardless of the USPS product, the overall process is the same:
- Download your licensed SPD files from the Software Data Marketplace (SDM), using the link provided in the release announcement or welcome email.
- Install them using the Administration Utility command productdata install. For more detail, see Installing a Spectrum Database.
- In Management Console, add the data as a Spectrum Database Resource (and any other required datasets).
- Go to the Global Geocode stage. On the Return Values tab, select the Return all available information setting.
- On the Preview tab, enter an address as Input Record 1 and click Run Preview. Output fields are returned.
Obtaining Data
Data requirements are outlined in the product-specific sections below. On the Software Data Marketplace (SDM), the datasets are listed as:
USPS | SDM Name |
---|---|
DPV | GEOCODING GEOCODING DPV SPLIT AMER UNITED STATES ALL USA |
LACSLink | GEOCODING GEOCODING LACSLINK DATABASE AMER UNITED STATES ALL USA |
RDI | GEOCODING GEOCODING RESIDENTIAL INDICATOR AMER UNITED STATES ALL USA |
SuiteLink | GEOCODING GEOCODING SUITELINK DATABASE AMER UNITED STATES ALL USA |
DPV
Used for: Confirming whether a ZIP + 4 address is a USPS delivery point; helpful for identifying potential addressing issues.
Requirements: DPV and Street datasets of the same vintage.
Example
Input Field | Value |
---|---|
placeName | DRAKE BEAM |
mainAddressLine | 2502 N ROCKY POINT DR |
areaName3 | Tampa |
areaName1 | FL |
postCode1 | 33607 |
Output Field | Value |
---|---|
DPVConfirm | D |
DPVCMRA | N |
DPVFootNote1 | AA |
DPVFootNote2 | N1 |
DPVNoSTAT | Y |
DPVShutdown | N |
DPVVacant | N |
LACSLink
Used for: Obtaining new addresses for locations whose addresses changed after a 911 emergency system was implemented.
Requirements: LACS and Street datasets of the same vintage.
Example
Enter these values...
Input Field | Value |
---|---|
mainAddressLine | 277 CLELAND LN |
areaName3 | BAXLEY |
areaName1 | GA |
postCode1 | 31513 |
country | USA |
Output Field | Value |
---|---|
LACSLinkIND | Y |
LACSLinkRetCode | A |
LACSLinkShutdown | N |
RDI
Used for: Determining whether an address is classified as a residence or a business; helpful for avoiding the surcharges some shipping companies charge for residential delivery.
Requirements: DPV, RDI and Street datasets of the same vintage.
Example
Enter these values...
Input Field | Value |
---|---|
mainAddressLine | 1627 S Jasmine St |
areaName3 | Denver |
areaName1 | CO |
postCode1 | 80224 |
...to return the output shown below.
Output Field | Value |
---|---|
RDIReturnCode | Y |
SuiteLink
Used for: Adding known secondary (suite) information to business addresses; helpful for delivery sequencing.
Requirements: SuiteLink and Street datasets of the same vintage.
Example
Input Field | Value |
---|---|
placeName | DRAKE BEAM |
mainAddressLine | 2502 N ROCKY POINT DR |
areaName3 | Tampa |
areaName1 | FL |
postCode1 | 33607 |
Output Field | Value |
---|---|
SuiteLink_Ret_Code | A |
Custom Dataset Builder changes
Custom Dataset Builder (CDB) is a utility to create new datasets for use in the Global Geocoding Module and web service API.
Support for French-administered territories and Monaco
CDB now supports data creation for the French-administered territories of Guadeloupe (GLP), French Guiana (GUF), Martinique (MTQ), Mayotte (MYT), Réunion (REU), and the country of Monaco (MCO), using a command similar to the one used for other countries.
When geocoding any address from these territories, provide all the relevant settings as you would for France (including the country code FRA, not the territory code). Matching candidates from those territories are returned with the country code of their parent country (FRA).
Notes:
- Custom Dataset Builder is a command line driven tool. Configuration is done through JSON.
- Data must be in TAB format (native or nativeX).
- Data created with Custom Dataset Builder does not support interactive geocoding at this time.
- The result code for street geocoding contains a "U" for user datasets to distinguish it from "A" when candidates are from the standard address datasets. For example: S5HPNTSCZU instead of S5HPNTSCZA.
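When output mixes standard and Custom Dataset Builder results, you can tag each record by its source. This Python sketch rests on one assumption taken from the single example above: that the "U"/"A" marker is the final character of the result code. The function name `dataset_source` is hypothetical.

```python
def dataset_source(result_code):
    """Classify a street-geocoding result code by dataset source.

    Assumes the marker is the last character, as in S5HPNTSCZU vs S5HPNTSCZA.
    """
    code = (result_code or "").strip().upper()
    if code.endswith("U"):
        return "user"
    if code.endswith("A"):
        return "standard"
    return "unknown"


print(dataset_source("S5HPNTSCZU"))  # user
print(dataset_source("S5HPNTSCZA"))  # standard
```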
New parameter -usePackagedLib
This optional parameter (required for USA) makes CDB use the libraries bundled with the Custom Dataset Builder tool instead of the library from the SPD. It works only with Spectrum version 2019.1 or later and SPD bundles from OCT2019 or later.
For more information, the Custom Dataset Builder Guide is included in the .zip file with the tool. Instructions are also provided in the Global Geocoding User Guide.
Spectrum Databases: new filter for data bundles
In
, you can now filter the list of data bundles to Show All, Show Enabled, or Show Disabled.
- Show All displays all available datasets
- Show Enabled displays datasets that are selected or configured in the database
- Show Disabled displays all available datasets that are not selected or configured in the database
New Reset Settings button
From any Global Geocoding stage, you can now click Reset Settings to return all options, across all tabs, to their default values.
Dataset expiration
Added the ability to view when your database resources expire. In Management Console, go to
to review the new Expires On column.
"Version" now identifies PB Release Vintage
Applies to all geocoding modules: In Management Console, when you navigate to System > Version, the Version field now displays the PB Release vintage instead of the vendor's vintage. This makes it easier to keep dataset vintages aligned.