Using enhanced search options
This section contains information on additional address concepts used by GeoStan, and includes the following topics:
Specifying database search order
GeoStan can process addresses using multiple databases at the same time. This allows you to find the best possible match from a variety of data sources and types of data (point data as well as street segment data). GeoStan processes these multiple data sources using a default search order.
When GeoStan matches an address exactly, it stops searching rather than continuing on to search additional databases, which saves processing time. When an exact match is not found, GeoStan continues searching all of the available data sources for candidate address matches. The candidates are then scored, and the highest-scoring match from all of the data sources is returned as the match. If multiple candidates receive an identical score, a multi-match is returned instead.
The default search order for GeoStan is:
Auxiliary files
User Dictionaries
Point GSD files
Street GSD files
You can override the default by using the GS_FIND_DB_ORDER property, which sets the order in which User Dictionary and GSD databases are searched. This property is designed for situations where you consider the data in particular sources to be superior to, or preferable over, other available sources.
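The search behavior described above can be modeled in a few lines. The following Python sketch is an illustration only, not GeoStan code: the source labels, the candidate tuple layout, and the `search` function are all hypothetical, and real scoring is far more involved.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical candidate record: (matched address, score, is_exact).
# Following the text above, a higher score is treated as better.
Candidate = Tuple[str, float, bool]
Lookup = Callable[[str], List[Candidate]]

# Default order as listed above; the source names are illustrative labels.
DEFAULT_ORDER = ("auxiliary", "user_dictionary", "point_gsd", "street_gsd")


def search(address: str, sources: Dict[str, Lookup],
           order: Sequence[str] = DEFAULT_ORDER) -> Tuple[str, object]:
    """Return ("exact", addr), ("match", addr), ("multi", [addrs]) or ("none", None)."""
    pool: List[Candidate] = []
    for name in order:
        lookup = sources.get(name)
        if lookup is None:
            continue  # this data source is not loaded
        for candidate in lookup(address):
            if candidate[2]:
                # An exact match ends the search immediately.
                return ("exact", candidate[0])
            pool.append(candidate)
    if not pool:
        return ("none", None)
    # No exact match: the best score across all sources wins; ties are a multi-match.
    best = max(score for _, score, _ in pool)
    winners = [addr for addr, score, _ in pool if score == best]
    if len(winners) == 1:
        return ("match", winners[0])
    return ("multi", winners)
```

Passing a different `order` sequence plays the role that a custom database search order plays in the text: when two sources both hold an exact match, the source searched first decides the result.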
GeoStan supports the creation and use of User Dictionaries based on your own source data. A User Dictionary can be used independently or as a supplement to the supplied GSD. For more information, see User Dictionary.
You can geocode using:
A single User Dictionary or multiple User Dictionaries
Standard GSD
A combination of GSD and User Dictionaries
Specifying a preference for street name or P.O. Box
When using multi-line or two-line addresses, you can specify which input address GeoStan uses for matching: a P.O. Box or a street address. Use the following to specify your address preference:
C | Use the GS_PREFER_STREET or GS_PREFER_POBOX properties of GsFindWithProps.
COBOL | Use the GS_PREFER_STREET or GS_PREFER_POBOX properties of GSFINDWP.
JAVA | Use the FIND_PREFER_STREET or FIND_PREFER_POBOX properties of FindProps.
.NET | Use the GS_FIND_PREFER_STREET or GS_FIND_PREFER_POBOX properties of FindProps.
If GeoStan cannot match to the preferred address, it tries to match to the alternative address.
The following example uses the C Language.
If you set the GS_FIND_PREFER_POBOX find property used with GsFindWithProps, and the input address is:
123 Main St | (GS_ADDRLINE)
P.O. Box 24 | (GS_ADDR2)
GeoStan first attempts to match to P.O. Box 24. If GeoStan cannot find a match, it then attempts to match to 123 Main St.
If you do not specify a preference for a P.O. Box or street address, GeoStan attempts to match to the first address line it receives as input.
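The preference and fallback rules above can be sketched as a small ordering function. This Python example is a simplified model, not the GeoStan API: the `match_order` function, its `prefer` values, and the P.O. Box pattern are assumptions made for illustration.

```python
import re
from typing import List, Optional

# Simple pattern for lines such as "P.O. Box 24" or "PO BOX 24" (an assumption).
_POBOX = re.compile(r"^\s*P\.?\s*O\.?\s*BOX\b", re.IGNORECASE)


def match_order(addr_line: str, addr2: str, prefer: Optional[str] = None) -> List[str]:
    """Return the input lines in the order a matcher would try them.

    prefer is "pobox", "street", or None (no preference).
    """
    lines = [line for line in (addr_line, addr2) if line.strip()]
    if prefer is None:
        return lines  # no preference: the first line received is tried first
    want_pobox = prefer == "pobox"
    preferred = [l for l in lines if bool(_POBOX.match(l)) == want_pobox]
    fallback = [l for l in lines if bool(_POBOX.match(l)) != want_pobox]
    return preferred + fallback  # the alternative is tried only after the preferred line


print(match_order("123 Main St", "P.O. Box 24", prefer="pobox"))
```

With the two-line input from the example, a P.O. Box preference puts "P.O. Box 24" first and keeps "123 Main St" as the fallback.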
Using building name, firm and Point of Interest matching
GeoStan can enhance standard address matching by matching to building and business names. Two optional POI files are available, Premium POI and POI.
By default, GeoStan can match building names with unit numbers entered in the address line. For example, using the Decatur Airport as input:
Firm:
Address: 1 DECATUR AIRPORT
Last Line: DECATUR IL 62521
The returned information is the address of the Decatur Airport. GeoStan returns a standardized address in place of the building name:
Firm:
Address: 910 S AIRPORT RD STE 1
Last Line: DECATUR IL 62521-4288
As another example, when Radio City Music Hall is entered in the address line, GeoStan returns the street address for Radio City Music Hall in the address field:
Firm:
Address: RADIO CITY MUSIC HALL
Last Line: NEW YORK, NY 10020
GeoStan returns the following address:
Firm:
Address: 1260 AVENUE OF THE AMERICAS
Last Line: NEW YORK, NY 10020-1701
The ability to search by building name entered in the address line is controlled by setting the find property, GS_FIND_BUILDING_SEARCH, to True.
Entering a firm name in the Firm name field returns the address for the input firm in the address field:
Firm: RADIO CITY MUSIC HALL
Address:
Last Line: NEW YORK, NY 10020
GeoStan returns the following address:
Firm: RADIO CITY MUSIC HALL
Address: 1260 AVENUE OF THE AMERICAS
Last Line: NEW YORK, NY 10020-1797
By modifying GS_FIND_ALTERNATE_LOOKUP, you can specify whether GeoStan searches for the following:
GS_STREET_LOOKUP_ONLY (default) - Matches to the address line.
GS_PREFER_STREET_LOOKUP - Matches to the address line. If a match is not made, GeoStan then matches to the Firm name line.
GS_PREFER_FIRM_LOOKUP - Matches to the Firm name line. If a match is not made, GeoStan then matches to the address line.
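The three lookup modes amount to different orderings of the same two lookups. The Python sketch below models that ordering; the string keys simply mirror the documented names (they are not the library's constant values), and `alternate_lookup` and the matcher callables are hypothetical.

```python
from typing import Callable, Optional

# Lookup sequences for the three modes described above. The string keys mirror
# the documented names; they are not the library's actual constant values.
LOOKUP_ORDER = {
    "GS_STREET_LOOKUP_ONLY": ("address",),
    "GS_PREFER_STREET_LOOKUP": ("address", "firm"),
    "GS_PREFER_FIRM_LOOKUP": ("firm", "address"),
}

Matcher = Callable[[str], Optional[str]]


def alternate_lookup(firm: str, address: str, mode: str,
                     match_address: Matcher, match_firm: Matcher) -> Optional[str]:
    """Try each input field in the order the mode prescribes; return the first match."""
    inputs = {"address": (address, match_address), "firm": (firm, match_firm)}
    for field in LOOKUP_ORDER[mode]:
        text, matcher = inputs[field]
        if text:
            result = matcher(text)
            if result is not None:
                return result
    return None
```

For the Radio City Music Hall input with an empty address line, the street-only mode finds nothing in this model, while either of the other two modes falls through to the firm lookup.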
Using the optional POI Index file
The optional Point Of Interest (POI) Index file (poi.gsi) included with the Master Location Data and HERE Point Addresses data sets provides expanded support in alias name matching.
Implementation
Set up your data.
On Windows/UNIX/Linux:
Install the MLD and/or HERE points, and streets data sets and their associated license files. Note down the paths to these folders.
Define the data paths to the geocoding data sets you have installed for your application. Define the paths to the associated license files and passwords.
On z/OS:
Upload the MLD and/or HERE points, and streets data sets.
There are three poi.gsi files. There needs to be a separate DD statement in your JCL for each file using the GSIFILxx DD name, for example:
//GSDFILE DD DSN=&GEOSPFX..US.GSD,DISP=SHR
//GSDFIL01 DD DSN=&GEOSPFX..MPOINTS1.GSD,DISP=SHR <=== MLD file
//GSIFILE DD DSN=&GEOSPFX..MPOINTS1.GSI,DISP=SHR <=== MLD alias file
//GSDFIL02 DD DSN=&GEOSPFX..MPOINTS2.GSD,DISP=SHR <=== MLD file
//GSIFIL01 DD DSN=&GEOSPFX..MPOINTS2.GSI,DISP=SHR <=== MLD alias file
//GSDFIL03 DD DSN=&GEOSPFX..MPOINTS3.GSD,DISP=SHR <=== MLD file
//GSIFIL02 DD DSN=&GEOSPFX..MPOINTS3.GSI,DISP=SHR <=== MLD alias file
//GSIFIL03 DD DSN=&GEOSPFX..MLDPOI1.GSI,DISP=SHR <=== MLD POI file
//GSIFIL04 DD DSN=&GEOSPFX..MLDPOI2.GSI,DISP=SHR <=== MLD POI file
//GSIFIL05 DD DSN=&GEOSPFX..MLDPOI3.GSI,DISP=SHR <=== MLD POI file
Note: It does not matter what you name the files on the mainframe as long as the DD statements point to them.
To confirm that the poi.gsi file loaded successfully, query the Status File POI Index status output property. This property is a Boolean: True = file loaded successfully. Default = False.
C
COBOL
GS-STATUS-FILE-POI-IDX
JAVA
STATUS_FILE_POI_IDX
.NET
GS_STATUS_FILE_POI_IDX
Set the GS_FIND_BUILDING_SEARCH find property to true. The POI Index file will automatically be searched when the GS_FIND_BUILDING_SEARCH find option is enabled and a firm, building or POI name is specified in the address line.
Process the match by calling the Find Properties function.
If an alias match is made to the POI Index file, the return value is as follows:
C | GS_IS_ALIAS returns "A11".
COBOL | GS-IS-ALIAS returns "A11".
JAVA | IS_ALIAS returns "A11".
.NET | GS_IS_ALIAS returns "A11".
Using the optional Premium POI Index file
The optional Premium Point Of Interest (PPOI) Index file (ppoi1.gsi, ppoi2.gsi, ppoi3.gsi) included with the Master Location Data and HERE Point Addresses data sets provides expanded support in alias name matching.
Implementation
Set up your data.
On Windows/UNIX/Linux:
Install the MLD and/or HERE points, and streets data sets and their associated license files. Note the paths to these folders.
Define the data paths to the geocoding data sets you have installed for your application. Define the paths to the associated license files and passwords.
On z/OS:
Upload the MLD and/or HERE points, and streets data sets.
There are three poi.gsi files. There needs to be a separate DD statement in your JCL for each file using the GSIFILxx DD name, for example:
//GSDFILE DD DSN=&GEOSPFX..US.GSD,DISP=SHR
//GSDFIL01 DD DSN=&GEOSPFX..MPOINTS1.GSD,DISP=SHR <=== MLD file
//GSIFILE DD DSN=&GEOSPFX..MPOINTS1.GSI,DISP=SHR <=== MLD alias file
//GSDFIL02 DD DSN=&GEOSPFX..MPOINTS2.GSD,DISP=SHR <=== MLD file
//GSIFIL01 DD DSN=&GEOSPFX..MPOINTS2.GSI,DISP=SHR <=== MLD alias file
//GSDFIL03 DD DSN=&GEOSPFX..MPOINTS3.GSD,DISP=SHR <=== MLD file
//GSIFIL02 DD DSN=&GEOSPFX..MPOINTS3.GSI,DISP=SHR <=== MLD alias file
//GSIFIL03 DD DSN=&GEOSPFX..MLDPOI1.GSI,DISP=SHR <=== MLD POI file
//GSIFIL04 DD DSN=&GEOSPFX..MLDPOI2.GSI,DISP=SHR <=== MLD POI file
//GSIFIL05 DD DSN=&GEOSPFX..MLDPOI3.GSI,DISP=SHR <=== MLD POI file
Note: It does not matter what you name the files on the mainframe as long as the DD statements point to them.
To confirm that the ppoi1.gsi, ppoi2.gsi, and ppoi3.gsi files loaded successfully, query the Status File Premium POI Index status output property. This property is a Boolean: True = file loaded successfully. Default = False.
C
COBOL
GS-STATUS-FILE-PREMIUM_POI-IDX
JAVA
STATUS_FILE_PREMIUM_POI_IDX
.NET
GS_STATUS_FILE_PREMIUM_POI_IDX
Set the GS_FIND_BUILDING_SEARCH find property to true. The Premium POI Index file will automatically be searched when the GS_FIND_BUILDING_SEARCH find option is enabled and a firm, building or POI name is specified in the address line.
Process the match by calling the Find Properties function.
If an alias match is made to the Premium POI Index file, the return value is as follows:
C | GS_IS_ALIAS returns "A15".
COBOL | GS-IS-ALIAS returns "A15".
JAVA | IS_ALIAS returns "A15".
.NET | GS_IS_ALIAS returns "A15".
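An application that post-processes results may want to tell the two index files apart by the alias value. The following Python sketch assumes the alias value has already been retrieved as a string; `alias_source` is a hypothetical helper, and only the two codes documented in these sections are mapped.

```python
from typing import Optional

# Alias codes documented above for matches made to the two index files.
ALIAS_SOURCES = {"A11": "POI Index", "A15": "Premium POI Index"}


def alias_source(is_alias_value: Optional[str]) -> Optional[str]:
    """Name the index file behind an alias code, or None for any other value."""
    if not is_alias_value:
        return None
    return ALIAS_SOURCES.get(is_alias_value.strip().upper())
```

Any other alias value returns None here because this sketch does not attempt to interpret it.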
Using correct lastline
When set to True, GS_FIND_CORRECT_LASTLINE corrects elements of the output lastline, providing a good ZIP Code or a close soundex match even if the address does not match or does not exist.
The feature works when GS_FIND_ADDRCODE is True and the address does not match a candidate, or when GS_FIND_Z_CODE is True and only lastline information is input.
For example, when GS_FIND_ADDRCODE = True:
Address: 0 MAIN
LastLine: BOLDER CA 80301
Returns:
MATCH_CODE=E622
LASTLINE=BOULDER, CO 80301
CITY=BOULDER
STATE=CO
ZIP=80301
For example, when GS_FIND_Z_CODE = True:
Address:
LastLine: BOLDER CA 80301
Returns:
MATCH_CODE=Z6
LASTLINE=BOULDER, CO 80301
CITY=BOULDER
STATE=CO
ZIP=80301
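The soundex comparison mentioned above can be illustrated with the standard American Soundex algorithm. The Python sketch below is only a model of the idea: the one-entry ZIP table, the `correct_lastline` function, and the rule of accepting a correction when the soundex codes agree are assumptions, not GeoStan's actual lastline logic.

```python
from typing import Dict, Tuple

# Standard American Soundex digit groups.
_CODES = {c: d for d, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                  "4": "L", "5": "MN", "6": "R"}.items()
          for c in letters}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code for a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (letters[0] + "".join(digits) + "000")[:4]


def correct_lastline(city: str, state: str, zip_code: str,
                     zip_table: Dict[str, Tuple[str, str]]) -> Tuple[str, str, str]:
    """Replace city and state with the ZIP's preferred values when the city sounds alike."""
    entry = zip_table.get(zip_code)
    if entry is not None and soundex(city) == soundex(entry[0]):
        return (entry[0], entry[1], zip_code)
    return (city.upper(), state.upper(), zip_code)


# Hypothetical one-row lookup table for illustration.
print(correct_lastline("BOLDER", "CA", "80301", {"80301": ("BOULDER", "CO")}))
```

BOLDER and BOULDER share a soundex code, as do HAUDENVILLE and HAYDENVILLE, which is why a sound-alike comparison can recover these misspellings.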
The following elements are corrected:
- City correction - The city correction is based on the input ZIP Code unless a match to city and state exists, in which case both search areas are retained. The state input must be correct or spelled out correctly when no ZIP Code is input, location code, and coordinates based on input ZIP Code.
  - Input city is incorrect:
    HAUDENVILLE MA 01039
    Returns LASTLINE=HAYDENVILLE, MA 01039
    LAT=42396500 LON=-72689100
- State correction - State is abbreviated when spelled out correctly or corrected when a ZIP Code is present. There are some variations of state input which are recognized, ILL, ILLI, CAL, but not MASS. GeoStan does not consider the abbreviation of the variation a change so ILL to IL is not identified as a change in the match code. In addition, the output of the ZIP Code for a single ZIP city is not considered a change.
  - Input city exists:
    Bronx NT, 10451
    Returns LASTLINE=BRONX, NY 10451
    Bronx NT
    Returns LASTLINE=BRONX NT
    No ZIP Code for correction
  - Input city does not exist - preferred city for ZIP Code returned:
    60515
    Returns LASTLINE=DOWNERS GROVE, IL 60515
    MATCH_CODE=E622
    ILLINOIS 60515 (or ILL 60515 or IL 60515 or ILLI 60515)
    Returns LASTLINE=DOWNERS GROVE, IL 60515
    MATCH_CODE=E222
- ZIP Code correction - ZIP Code is corrected only when a valid city/state is identified and has only one ZIP Code.
  - Exists on input:
    HAUDENVILLE MA 01039
    Returns LASTLINE=HAYDENVILLE, MA 01039
  - Incorrect on input - ZIP Code correction is not performed, both search areas are retained:
    HAUDENVILLE MA 01030
    Returns LASTLINE=HAYDENVILLE, MA 01030
    City and ZIP do not correspond
  - Does not exist on input:
    DOWNRS GROVE, IL
    Returns LASTLINE=DOWNERS GROVE, IL
    City with multiple ZIP Codes
    LILSE IL
    Returns LASTLINE=LISLE, IL 60532
    City with a single ZIP Code
    DOWNERS GROVE LL
    Returns LASTLINE=DOWNERS GROVE LL,
    No ZIP Code for correction
    DOWNRS GROVE, LL
    Returns LASTLINE=DOWNRS GROVE, LL
    No ZIP Code for correction
    LILSE ILLINOIS
    Returns LASTLINE=LISLE, IL 60532
    Correct spelled out state
    LISLE ILLINOS
    Returns LASTLINE=LISLE ILLINOS
    Incorrect spelled out state, no ZIP Code for correction
Using predictive lastline
Predictive lastline allows you to match an address when only an input street address and latitude/longitude coordinates are provided, rather than the traditional street address with lastline input. For example, an input of 4750 Walnut with latitude/longitude coordinates located in Boulder returns full address information.
Additional feature information
To use predictive lastline, the GS_INIT_OPTIONS_SPATIAL_QUERY property must be set to True.
Predictive lastline uses the search radius designated for reverse geocoding.
If the input lat/lon falls near the borders of multiple cities, GeoStan processes all cities and returns the results of the best match. If the results are determined as equal, then a multi-match is returned.
This feature does not require a license for reverse geocoding.
This feature will work with any type of data set except USPS-only.
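The idea of deriving candidate cities from an input coordinate and a search radius can be sketched with a great-circle distance test. This Python example is an assumption-laden illustration: it compares against city reference points, uses approximate coordinates chosen for the demo, and the function names are hypothetical; how GeoStan actually resolves cities from a coordinate is not described here.

```python
import math
from typing import Dict, List, Tuple


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    radius = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def cities_in_radius(lat: float, lon: float,
                     cities: Dict[str, Tuple[float, float]],
                     radius_m: float) -> List[str]:
    """Return, in alphabetical order, the cities whose reference point is within radius_m."""
    return sorted(name for name, (clat, clon) in cities.items()
                  if haversine_m(lat, lon, clat, clon) <= radius_m)


# Approximate, illustrative reference points (not taken from any data set).
CITIES = {"BOULDER": (40.015, -105.2705), "LOUISVILLE": (39.978, -105.132)}
print(cities_in_radius(40.02, -105.25, CITIES, 5000))
```

When more than one city falls inside the radius, each would be tried as a lastline for the input street address, which corresponds to the multiple-city case described above.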
Preferring a ZIP Code over a city
The GS_FIND_PREFER_ZIP_OVER_CITY property allows a user to prefer candidates that match to the input ZIP Code over candidates that match to the input city. GeoStan creates multiple search areas when the input city and ZIP Code do not correspond and this feature helps establish how the candidates should be scored.
When there is more than one candidate in the input ZIP Code, GeoStan attempts to avoid a multi-match in which all of the candidates receive the same lastline score. If a candidate also matches the city and/or preferred city, that candidate gets a better score. Matching only the preferred city scores lower than matching both.
Input Address: 24 GLEN HAVEN RD
Input Last Line: NEW HAVEN CT 06513
Found:
24 GLEN HAVEN RD
NEW HAVEN, CT 06513-1105
Possible candidates:
score pref.last line city
2 98 GLEN HAVEN RD 06513-1105 S 0.8100000 NEW HAVEN * best match
24 98 GLEN HAVEN RD 06513-1248 S 2.2500000 EAST HAVEN
16 66 GLEN RD 06511-2825 S 46.3925000 NEW HAVEN
2 86 GLEN PKWY 06517-1415 S 52.1525000 HAMDEN
2 28 GLEN RD 06516-6509 S 52.1525000 WEST HAVEN
2 98 GLENHAM RD 06518-2517 S 75.0100000 HAMDEN
2 72 GLEN VIEW TER 06515-1519 S 97.0900000 NEW HAVEN
When there is more than one candidate, candidates matching the input ZIP Code score better.
Input Address: 301 BRYANT ST
Input Last Line: SAN FRANCISCO CA 94301
Found:
301 BRYANT ST
PALO ALTO, CA 94301-1408
Possible candidates:
score pref.last line city
301 301 BRYANT ST 94301-1408 S 3.2400000 PALO ALTO * ZIP preferred match
301 305 BRYANT CT 94301-1401 S 28.2400000 PALO ALTO
300 306 BRYANT CT 94301-00ND T 35.6600000 PALO ALTO
301 301 BRYANT ST 94107-4167 H 39.6900000 SAN FRANCISCO * default match
301 319 BRYANT ST 94107-1406 S 39.6900000 SAN FRANCISCO
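The effect of the preference on the SAN FRANCISCO / 94301 example can be sketched as a ranking rule. This Python model is an illustration only: the tiers, the dictionary keys, and the `rank` function are assumptions rather than GeoStan's scoring formula, and a lower value is treated as better to mirror the candidate lists above.

```python
from typing import Dict, List


def rank(candidates: List[Dict[str, str]], input_city: str, input_zip: str,
         prefer_zip: bool) -> List[Dict[str, str]]:
    """Order candidates so the preferred kind of lastline agreement comes first."""
    def tier(c: Dict[str, str]) -> int:
        zip_ok = c["zip"] == input_zip
        city_ok = c["city"] == input_city
        if zip_ok and city_ok:
            return 0  # agrees with both inputs
        first, second = (zip_ok, city_ok) if prefer_zip else (city_ok, zip_ok)
        if first:
            return 1
        return 2 if second else 3
    # sorted() is stable, so candidates in the same tier keep their input order.
    return sorted(candidates, key=tier)


candidates = [
    {"address": "301 BRYANT ST", "city": "SAN FRANCISCO", "zip": "94107"},
    {"address": "301 BRYANT ST", "city": "PALO ALTO", "zip": "94301"},
]
best = rank(candidates, "SAN FRANCISCO", "94301", prefer_zip=True)[0]
print(best["city"])
```

With the preference enabled the PALO ALTO candidate ranks first, and with it disabled the SAN FRANCISCO candidate does, matching the "ZIP preferred match" and "default match" annotations in the list above.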
When there is more than one candidate, candidates that match the ZIP Code search area score better. The ZIP Code search area is the finance area for the input ZIP Code.
This example is with GS_FIND_SEARCH_AREA set to FINANCE. With GS_FIND_SEARCH_AREA set to CITY the match is made to EAST AURORA 14052 as there is no candidate in the 14166 input ZIP Code.
Input Address: 100 MAIN ST
Input Last Line: EAST AURORA NY 14166
Found:
100 MAIN ST
DUNKIRK, NY 14048-1844
Possible candidates:
score pref.last line city
100 198 MAIN ST 14048-1844 S 3.2400000 DUNKIRK * same finance as input ZIP 14166
100 168 MAIN ST 14052-1633 S 39.6900000 EAST AURORA
This example is with GS_FIND_SEARCH_AREA set to 0 (CITY).
Input Address: 4200 arapahoe
Input Last Line: denver co 80301
Found:
4200 ARAPAHOE AVE
BOULDER, CO 80303-1164
Possible candidates:
score pref.last line city
4200 4210 ARAPAHOE AVE 80303-1164 S 38.7400000 BOULDER * same city as input ZIP 80301
4200 4210 ARAPAHOE RD 80303-1164 S 40.7000000 BOULDER (A06)
4200 4298 E ARAPAHOE PL 80122-00ND T 62.0900000 LITTLETON
4200 4498 E ARAPAHOE RD 80122-00ND T 62.0900000 LITTLETON
4181 4499 E ARAPAHOE RD 80122-00ND T 68.3400000 LITTLETON
Matching address ranges
Some business locations are identified by address ranges. For example, a shopping plaza could be addressed as 10-12 Front St. This is how business mail is typically addressed to such a business location. These address ranges can be geocoded to the interpolated mid-point of the range.
Address ranges are different from hyphenated (dashed) addresses that occur in some metropolitan areas. For example, a hyphenated address in Queens County (New York City) could be 243-20 147 Ave. This represents a single residence (rather than an address range) and is geocoded as a single address. If a hyphenated address similar to this example returns as an exact match, then there is no attempt to address range match.
Address range matching is disabled by default and is an optional mode. To enable address range matching, use GS_FIND_ADDRESS_RANGE.
The following fields are not returned by address range geocoding:
ZIP+4® (in multiple segment cases)
Delivery Point
Check Digit
Carrier Route
Record Type
Multi-Unit
Default flag
Address Range matching capabilities and guidelines
Address Range matching works within the following guidelines:
There must be two numbers separated by a hyphen.
The first number must be lower than the second number.
Both numbers must be of the same parity (odd or even) unless the address range itself has mixed odd and even addresses.
Numbers can be on the same street segment or can be on two different segments. The segments do not have to be contiguous.
If both numbers are on the same street segment, the geocoded point is interpolated to the approximate mid-point of the range.
If the numbers are on two different segments, the geocoded point is based on the last valid house number of the first segment. The ZIP Code and FIPS Code are based on the first segment.
In all cases, odd/even parity is evaluated to place the point on the correct side of the street.
In the following example, the Address Range match is to a single street segment, with the geocode placed at the mid-point of the range:
Input: 4750-4760 Walnut St, Boulder, CO
Output: 4750-4760 Walnut St, Boulder, CO
A close match to a single address number is preferred over a ranged address match. GeoStan attempts a close match on the recombined address number before making a ranged match, as seen in the following example:
Input: 47-50 Walnut St, Boulder, CO
Output: 4750 Walnut St, Boulder, CO
In the example below, the second number is not larger than the first, so GeoStan treats this as a unit number rather than a ranged address:
Input: 4750-200 Walnut St, Boulder, CO
Output: 4750 Walnut St STE 200, Boulder, CO
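The guidelines and the three Walnut St examples can be summarized as a small decision procedure. The Python sketch below is a simplified model under stated assumptions: `classify`, its return tuples, and the `known_numbers` set (standing in for house numbers that exist on the street) are hypothetical, and the mid-point is computed on the numbers alone rather than on street geometry.

```python
import re
from typing import Set, Tuple


def classify(number_text: str, known_numbers: Set[int],
             mixed_parity: bool = False) -> Tuple[str, object]:
    """Classify a house-number token as a single number, a range, or number plus unit."""
    text = number_text.strip()
    m = re.fullmatch(r"(\d+)-(\d+)", text)
    if m is None:
        return ("single", int(text)) if text.isdigit() else ("other", text)
    first, second = m.group(1), m.group(2)
    # A close match on the recombined number is preferred over a range match.
    recombined = int(first + second)
    if recombined in known_numbers:
        return ("single", recombined)
    low, high = int(first), int(second)
    if low >= high:
        # The second number is not larger than the first: treat it as a unit.
        return ("unit", (low, second))
    if mixed_parity or low % 2 == high % 2:
        return ("range", (low, high, (low + high) / 2))  # third value: numeric mid-point
    return ("no_range", (low, high))  # parity differs on a single-parity range


for token in ("4750-4760", "47-50", "4750-200"):
    print(token, classify(token, known_numbers={4750}))
```

The "no_range" outcome for a parity mismatch is this sketch's own label; the text above states the parity rule but does not say what is returned when it fails.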
Understanding missing and wrong first letter
The missing and wrong first letter feature enables GeoStan to look for the correct first letter of a street address if the first letter is missing or incorrect. GeoStan searches through the alphabet looking for possible, correct first letters to complete the street address.
The feature is disabled by default. To enable this feature, modify GS_FIND_FIRST_LETTER_EXPANDED.
Below are examples of input addresses with a wrong, missing, or duplicate first letter, and the corresponding GeoStan output:
This example includes an incorrect first letter:
Input: 4750 nalnut boulder co 80301
Output: 4750 Walnut St Boulder CO 80301-2532
This example is missing a first letter:
Input: 4750 alnut boulder co 80301
Output: 4750 Walnut St Boulder CO 80301-2532
This example includes an extra first letter:
Input: 4750 wwalnut boulder co 80301
Output: 4750 Walnut St Boulder CO 80301-2532
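One way to picture the first-letter search is to generate every name that differs from the input only at its first letter and keep those that exist on file. The Python sketch below does exactly that; the function names and the `known_streets` set are hypothetical, and GeoStan's real candidate generation and scoring are not described at this level of detail.

```python
import string
from typing import List, Set


def first_letter_variants(name: str) -> List[str]:
    """Generate names that differ from the input only in the first letter."""
    name = name.lower()
    variants = []
    for letter in string.ascii_lowercase:
        variants.append(letter + name)      # input was missing its first letter
        variants.append(letter + name[1:])  # input had a wrong first letter
    if len(name) > 1 and name[0] == name[1]:
        variants.append(name[1:])           # input had a duplicated first letter
    return variants


def find_street(name: str, known_streets: Set[str]) -> List[str]:
    """Return the known street names reachable by a first-letter correction."""
    if name.lower() in known_streets:
        return [name.lower()]  # already a valid name; nothing to correct
    return sorted({v for v in first_letter_variants(name) if v in known_streets})


streets = {"walnut", "pearl", "main"}
for raw in ("nalnut", "alnut", "wwalnut"):
    print(raw, find_street(raw, streets))
```

All three inputs from the examples above resolve to "walnut" in this model; an input such as "ain" could resolve to more than one known name, which is why the helper returns a list.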
Permitting relaxed address number matching
When GeoStan matches an input address, its default behavior is to require a match on the address number. This default behavior corresponds to GS_FIND_MUST_MATCH_ADDRNUM set to True.
If GS_FIND_MUST_MATCH_ADDRNUM is set to False (in GS_MODE_CUSTOM only), then GeoStan no longer must match the address number, therefore permitting relaxed address number matching. By permitting relaxed address number matching, an inexact match can be found. If the input address number is missing, no matches are returned unless GS_FIND_STREET_CENTROID (see Understanding street locator geocoding) is also enabled.
If the input address number is not within a house number range from the street, GeoStan selects the nearest range on the street which has the same parity (even or odd) as the input address number. GeoStan returns one or more of the closest matches inside this range that preserves street parity. This requires GeoStan to change the house number. The new house number is equal to one of the range's endpoints, possibly plus or minus one to preserve street parity.
When GS_FIND_MUST_MATCH_ADDRNUM is set to False and no exact matching house number is found, a match code of either E029 (no matching range, single street segment found), or E030 (no matching range, multiple street segment) is returned by GsDataGet and GeoStan does not change the house number on the output address.
In order to access the inexact address number candidates, the GsMultipleGet function must be called. If there are inexact house number candidates returned by GsMultipleGet, the corresponding match codes begin with the letter 'H' indicating that the house number was not matched. Additionally, even when one or more exact candidates are found, inexact matches to the house number are still on the list of possible candidates, and these can be differentiated from the others by their Hxx match codes. For more information on match codes, see Match codes.
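The nearest-range rule described above, including the plus-or-minus-one parity adjustment, can be sketched as follows. This Python example is a simplified model: the `(low, high, parity)` tuples and the `nearest_house_number` function are assumptions made for illustration and say nothing about how GeoStan stores ranges.

```python
from typing import List, Optional, Tuple

# Hypothetical range record: (low, high, parity), parity in {"odd", "even", "both"}.
Range = Tuple[int, int, str]


def nearest_house_number(number: int, ranges: List[Range]) -> Optional[int]:
    """Return the closest house number with the input's parity on any usable range."""
    want = "even" if number % 2 == 0 else "odd"
    best: Optional[Tuple[int, int]] = None  # (distance, house number)
    for low, high, parity in ranges:
        if parity not in (want, "both"):
            continue  # this range only holds numbers of the other parity
        if low <= number <= high:
            return number  # the number is already inside a usable range
        below = number < low
        endpoint = low if below else high
        if endpoint % 2 != number % 2:
            endpoint += 1 if below else -1  # step inward to preserve parity
        if not (low <= endpoint <= high):
            continue  # a one-number range of the wrong parity cannot be used
        distance = abs(endpoint - number)
        if best is None or distance < best[0]:
            best = (distance, endpoint)
    return None if best is None else best[1]


street = [(100, 198, "even"), (101, 199, "odd"), (300, 398, "even")]
print(nearest_house_number(250, street))
```

In this model, input 250 lands on 300 because the start of the next even range is two numbers closer than the end of the previous one.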
Forcing an exact match on specific address elements
Use the following properties to require an exact match on specific address elements:
GS_FIND_MUST_MATCH_MAINADDR - Matches to an input street name exactly. This match is dependent on source data.
GS_FIND_MUST_MATCH_STATE - Matches to an input state exactly.
GS_FIND_MUST_MATCH_ZIPCODE - Matches to an input 5-digit ZIP Code exactly.
GS_FIND_MUST_MATCH_CITY - Matches to an input city exactly.
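Conceptually, each of these properties removes candidates that disagree with the input on one element. The Python sketch below models that filtering; the mapping from property names to dictionary keys, the `filter_exact` function, and the case-insensitive comparison are assumptions for illustration, not the library's implementation.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from the documented property names to candidate fields.
FIELD_FOR_PROPERTY = {
    "GS_FIND_MUST_MATCH_MAINADDR": "street",
    "GS_FIND_MUST_MATCH_STATE": "state",
    "GS_FIND_MUST_MATCH_ZIPCODE": "zip5",
    "GS_FIND_MUST_MATCH_CITY": "city",
}


def filter_exact(candidates: List[Dict[str, str]], parsed_input: Dict[str, str],
                 enabled: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only candidates that equal the input on every must-match element."""
    fields = [FIELD_FOR_PROPERTY[prop] for prop in enabled]

    def same(a: str, b: str) -> bool:
        return a.strip().upper() == b.strip().upper()

    return [c for c in candidates
            if all(same(c[f], parsed_input[f]) for f in fields)]


parsed = {"street": "BRYANT", "city": "San Francisco", "state": "CA", "zip5": "94301"}
pool = [
    {"street": "BRYANT", "city": "PALO ALTO", "state": "CA", "zip5": "94301"},
    {"street": "BRYANT", "city": "SAN FRANCISCO", "state": "CA", "zip5": "94107"},
]
print(filter_exact(pool, parsed, ["GS_FIND_MUST_MATCH_ZIPCODE"]))
```

Requiring both the ZIP Code and the city leaves no candidates for this input, which illustrates why must-match settings can turn a close match into no match.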