Using enhanced search options

This section contains information on additional address concepts used by GeoStan.

Note: For users of Master Location Data, additional options are available. See Using Master Location Data.

Specifying database search order

GeoStan can process addresses using multiple databases at the same time. This allows you to find the best possible match from a variety of data sources and types of data (point data as well as street segment data).

GeoStan processes these data sources using a default search order. When GeoStan matches an address exactly, it stops searching rather than continuing on to additional databases, which saves processing time. When an exact match is not found, GeoStan continues searching all of the available data sources for candidate address matches. The candidates are then scored, and the highest scoring match from all of the data sources is returned as the match. If multiple candidates share the highest score, a multi-match is returned instead.

The default search order for GeoStan is:

  • Auxiliary files

  • User Dictionaries

  • Point GSD files

  • Street GSD files

You can override the default with the GS_FIND_DB_ORDER property, which sets the order in which User Dictionary and GSD databases are searched. This property is designed for situations where you consider the data in particular data sources superior to, or preferable over, other available sources.

Note: The GS_FIND_DB_ORDER property cannot be used to change the priority of auxiliary files. Auxiliary files always have the highest search priority.
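
The search behavior described above can be sketched as follows. This is a minimal illustration in Python, not GeoStan code: the Candidate type, the search function, the exact flag, and the convention that a higher score is better (taken from the "highest scoring match" wording) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    source: str          # name of the data source that produced the candidate
    address: str
    score: float         # assumption: higher is better
    exact: bool = False  # assumption: the source flags exact matches


# A data source is modeled as a function from an input address to candidates.
Source = Callable[[str], List[Candidate]]


def search(address: str, sources: List[Source]) -> Tuple[str, List[Candidate]]:
    """Return ("match", [c]), ("multi-match", [c1, c2, ...]) or ("no-match", [])."""
    pool: List[Candidate] = []
    for lookup in sources:              # sources are listed in search order
        found = lookup(address)
        for cand in found:
            if cand.exact:
                return "match", [cand]  # exact match: stop searching
        pool.extend(found)
    if not pool:
        return "no-match", []
    best = max(c.score for c in pool)
    top = [c for c in pool if c.score == best]
    return ("match" if len(top) == 1 else "multi-match"), top
```

Listing the sources in a different order changes which exact match is found first, which is the effect a search order setting has in this simplified model.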

GeoStan supports the creation and use of User Dictionaries based on your own source data. A User Dictionary can be used independently or as a supplement to the supplied GSD. For more information, see User Dictionary.

You can geocode using:

  • A single User Dictionary or multiple User Dictionaries

  • Standard GSD

  • A combination of GSD and User Dictionaries

Specifying a preference for street name or P.O. Box

When using multi-line or two-line addresses, you can specify which input address GeoStan uses for matching: a P.O. Box or a street address. Use the following to specify your address preference:

C

Use the GS_PREFER_STREET or GS_PREFER_POBOX properties of GsFindWithProps.

COBOL

Use the GS_PREFER_STREET or GS_PREFER_POBOX properties of GSFINDWP.

JAVA

Use the FIND_PREFER_STREET or FIND_PREFER_POBOX properties of FindProps.

.NET

Use the GS_FIND_PREFER_STREET or GS_FIND_PREFER_POBOX properties of FindProps.

If GeoStan cannot match to the preferred address, it tries to match to the alternative address.

The following example uses the C Language.

If you set the GS_FIND_PREFER_POBOX find property used with GsFindWithProps, and the input address is

123 Main St

(GS_ADDRLINE)

P.O. Box 24

(GS_ADDR2)

GeoStan first attempts to match to P.O. Box 24. If GeoStan cannot find a match, it then attempts to match to 123 Main St.

Note: GeoStan ignores the address preference if processing in CASS mode.

If you do not specify a preference for a P.O. Box or street address, GeoStan attempts to match to the first address line it receives as input.
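
The fallback order described above can be sketched in a few lines. This is an illustrative Python model only: the match_order function, the prefer argument, and the simple regular expression used to recognize a P.O. Box line are hypothetical and are not part of the GeoStan API. Using the input order when the preference is ignored in CASS mode is also an assumption.

```python
import re
from typing import List, Optional

# Hypothetical, simplified P.O. Box detector for the example.
_POBOX = re.compile(r"^\s*P\.?\s*O\.?\s*BOX\b", re.IGNORECASE)


def is_pobox(line: str) -> bool:
    return bool(_POBOX.match(line))


def match_order(lines: List[str], prefer: Optional[str] = None,
                cass_mode: bool = False) -> List[str]:
    """Return the address lines in the order they would be tried.

    prefer is "street", "pobox" or None (no preference).
    """
    if prefer is None or cass_mode:
        return list(lines)                 # first input line is tried first
    want_pobox = (prefer == "pobox")
    preferred = [l for l in lines if is_pobox(l) == want_pobox]
    fallback = [l for l in lines if is_pobox(l) != want_pobox]
    return preferred + fallback            # preferred line, then the alternative


print(match_order(["123 Main St", "P.O. Box 24"], prefer="pobox"))
```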

Using building name, firm and Point of Interest matching

GeoStan can enhance standard address matching by matching to building and business names. Two optional POI files are available, Premium POI and POI.

By default, GeoStan can match building names with unit numbers in the address line. For example, using the Decatur Airport:

Firm:

Address: 1 DECATUR AIRPORT

Last Line: DECATUR IL 62521

The returned information is the address of the Decatur Airport. GeoStan returns a standardized address in place of the building name:

Firm:

Address: 910 S AIRPORT RD STE 1

Last Line: DECATUR IL 62521-4288

As another example, when you enter Radio City Music Hall in the address line, GeoStan returns the address for Radio City Music Hall in the address field:

Firm:

Address: RADIO CITY MUSIC HALL

Last Line: NEW YORK, NY 10020

GeoStan returns the following address:

Firm:

Address: 1260 AVENUE OF THE AMERICAS

Last Line: NEW YORK, NY 10020-1701

The ability to search by building name entered in the address line is controlled by setting the find property, GS_FIND_BUILDING_SEARCH, to True.

Entering a firm name in the Firm name field returns the address for the input firm in the address field:

Firm: RADIO CITY MUSIC HALL

Address:

Last Line: NEW YORK, NY 10020

GeoStan returns the following address:

Firm: RADIO CITY MUSIC HALL

Address: 1260 AVENUE OF THE AMERICAS

Last Line: NEW YORK, NY 10020-1797

Note: GS_FIND_BUILDING_SEARCH is not available in CASS mode.

By modifying GS_FIND_ALTERNATE_LOOKUP, you can specify which input line GeoStan matches to:

  • GS_STREET_LOOKUP_ONLY (default) - Matches to the address line.

  • GS_PREFER_STREET_LOOKUP - Matches to the address line. If a match is not made, GeoStan matches to the Firm name line.

  • GS_PREFER_FIRM_LOOKUP - Matches to the Firm name line. If a match is not made, GeoStan matches to the address line.

Note: Neither building nor firm name searches are available when processing in CASS mode.

Two optional index files are available: POI (poi.gsi) and Premium POI (ppoi1.gsi, ppoi2.gsi, ppoi3.gsi).

Note: If a match occurs across multiple datasets, the match order is first a Premium POI record, then POI, and finally USPS records.
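
The two orderings described above (which input line is tried first, and which dataset wins when several match) can be modeled as simple lookups. The following Python sketch is illustrative only: the strings stand in for the GS_FIND_ALTERNATE_LOOKUP values, whose actual constant values are not shown in this section, and the lines_to_try and pick_by_dataset helpers are hypothetical.

```python
from typing import Dict, List, Optional

# Input lines tried for each GS_FIND_ALTERNATE_LOOKUP value, in order.
LOOKUP_ORDER: Dict[str, List[str]] = {
    "GS_STREET_LOOKUP_ONLY": ["address"],
    "GS_PREFER_STREET_LOOKUP": ["address", "firm"],
    "GS_PREFER_FIRM_LOOKUP": ["firm", "address"],
}

# Priority when a match occurs across multiple datasets.
DATASET_PRIORITY = ["Premium POI", "POI", "USPS"]


def lines_to_try(mode: str = "GS_STREET_LOOKUP_ONLY") -> List[str]:
    """Return the input lines to match against, in order, for a lookup mode."""
    return list(LOOKUP_ORDER[mode])


def pick_by_dataset(matches: Dict[str, str]) -> Optional[str]:
    """Given {dataset name: matched address}, return the highest-priority match."""
    for dataset in DATASET_PRIORITY:
        if dataset in matches:
            return matches[dataset]
    return None
```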

Using the optional POI Index file

The optional Point Of Interest (POI) Index file (poi.gsi), included with the Master Location Data and HERE Point Addresses data sets, provides expanded support for alias name matching.

Implementation

  1. Set up your data.

    • On Windows/UNIX/Linux:

      • Install the MLD and/or HERE points, and streets data sets and their associated license files. Note down the paths to these folders.

      • Define the data paths to the geocoding data sets you have installed for your application. Define the paths to the associated license files and passwords.

    • On z/OS:

      • Upload the MLD and/or HERE points, and streets data sets.

      • There are three poi.gsi files. There needs to be a separate DD statement in your JCL for each file using the GSIFILxx DD name, for example:

    //GSDFILE  DD DSN=&GEOSPFX..US.GSD,DISP=SHR
    //GSDFIL01 DD DSN=&GEOSPFX..MPOINTS1.GSD,DISP=SHR <=== MLD file
    //GSIFILE  DD DSN=&GEOSPFX..MPOINTS1.GSI,DISP=SHR <=== MLD alias file
    //GSDFIL02 DD DSN=&GEOSPFX..MPOINTS2.GSD,DISP=SHR <=== MLD file
    //GSIFIL01 DD DSN=&GEOSPFX..MPOINTS2.GSI,DISP=SHR <=== MLD alias file
    //GSDFIL03 DD DSN=&GEOSPFX..MPOINTS3.GSD,DISP=SHR <=== MLD file
    //GSIFIL02 DD DSN=&GEOSPFX..MPOINTS3.GSI,DISP=SHR <=== MLD alias file
    //GSIFIL03 DD DSN=&GEOSPFX..MLDPOI1.GSI,DISP=SHR <=== MLD POI file
    //GSIFIL04 DD DSN=&GEOSPFX..MLDPOI2.GSI,DISP=SHR <=== MLD POI file
    //GSIFIL05 DD DSN=&GEOSPFX..MLDPOI3.GSI,DISP=SHR <=== MLD POI file

    Note: It does not matter what you name the files on the mainframe as long as the DD statements point to them.

  2. To confirm that poi.gsi loaded successfully, query the Status File POI Index status output property. This Boolean property is True when the file loaded successfully. The default is False.

    C

    GS_STATUS_FILE_POI_IDX

    COBOL

    GS-STATUS-FILE-POI-IDX

    JAVA

    STATUS_FILE_POI_IDX

    .NET

    GS_STATUS_FILE_POI_IDX

  3. Set the GS_FIND_BUILDING_SEARCH find property to true. The POI Index file will automatically be searched when the GS_FIND_BUILDING_SEARCH find option is enabled and a firm, building or POI name is specified in the address line.

  4. Process the match by calling the Find Properties function.

If an alias match is made to the POI Index file, the return value is as follows:

C

GS_IS_ALIAS returns "A11".

COBOL

GS-IS-ALIAS returns "A11".

JAVA

IS_ALIAS returns "A11".

.NET

GS_IS_ALIAS returns "A11".

Using the optional Premium POI Index file

The optional Premium Point Of Interest (PPOI) Index file (ppoi1.gsi, ppoi2.gsi, ppoi3.gsi), included with the Master Location Data and HERE Point Addresses data sets, provides expanded support for alias name matching.

Implementation

  1. Set up your data.

    • On Windows/UNIX/Linux:

      • Install the MLD and/or HERE points, and streets data sets and their associated license files. Note the paths to these folders.

      • Define the data paths to the geocoding data sets you have installed for your application. Define the paths to the associated license files and passwords.

    • On z/OS:

      • Upload the MLD and/or HERE points, and streets data sets.

      • There are three poi.gsi files. There needs to be a separate DD statement in your JCL for each file using the GSIFILxx DD name, for example:

    //GSDFILE  DD DSN=&GEOSPFX..US.GSD,DISP=SHR
    //GSDFIL01 DD DSN=&GEOSPFX..MPOINTS1.GSD,DISP=SHR <=== MLD file
    //GSIFILE  DD DSN=&GEOSPFX..MPOINTS1.GSI,DISP=SHR <=== MLD alias file
    //GSDFIL02 DD DSN=&GEOSPFX..MPOINTS2.GSD,DISP=SHR <=== MLD file
    //GSIFIL01 DD DSN=&GEOSPFX..MPOINTS2.GSI,DISP=SHR <=== MLD alias file
    //GSDFIL03 DD DSN=&GEOSPFX..MPOINTS3.GSD,DISP=SHR <=== MLD file
    //GSIFIL02 DD DSN=&GEOSPFX..MPOINTS3.GSI,DISP=SHR <=== MLD alias file
    //GSIFIL03 DD DSN=&GEOSPFX..MLDPOI1.GSI,DISP=SHR <=== MLD POI file
    //GSIFIL04 DD DSN=&GEOSPFX..MLDPOI2.GSI,DISP=SHR <=== MLD POI file
    //GSIFIL05 DD DSN=&GEOSPFX..MLDPOI3.GSI,DISP=SHR <=== MLD POI file

    Note: It does not matter what you name the files on the mainframe as long as the DD statements point to them.

  2. To confirm that ppoi1.gsi, ppoi2.gsi, and ppoi3.gsi loaded successfully, query the Status File Premium POI Index status output property. This Boolean property is True when the files loaded successfully. The default is False.

    C

    GS_STATUS_FILE_PREMIUM_POI_IDX

    COBOL

    GS-STATUS-FILE-PREMIUM-POI-IDX

    JAVA

    STATUS_FILE_PREMIUM_POI_IDX

    .NET

    GS_STATUS_FILE_PREMIUM_POI_IDX

  3. Set the GS_FIND_BUILDING_SEARCH find property to true. The Premium POI Index file will automatically be searched when the GS_FIND_BUILDING_SEARCH find option is enabled and a firm, building or POI name is specified in the address line.

  4. Process the match by calling the Find Properties function.

If an alias match is made to the Premium POI Index file, the return value is as follows:

C

GS_IS_ALIAS returns "A15".

COBOL

GS-IS-ALIAS returns "A15".

JAVA

IS_ALIAS returns "A15".

.NET

GS_IS_ALIAS returns "A15".
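
Because the POI Index and the Premium POI Index report different alias values, an application can use the returned value to tell which index produced the match. The following Python sketch assumes the application has already retrieved the alias value as a string. The alias_source helper is hypothetical, and only the two values named in this section are covered.

```python
from typing import Optional

# Alias return values documented in this section.
ALIAS_SOURCES = {
    "A11": "POI Index",
    "A15": "Premium POI Index",
}


def alias_source(is_alias: str) -> Optional[str]:
    """Map an alias return value to the index file that produced the match.

    Returns None for any value not described in this section.
    """
    return ALIAS_SOURCES.get(is_alias.strip().upper())


print(alias_source("A15"))
```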

Using correct lastline

GS_FIND_CORRECT_LASTLINE, when set to True, corrects elements of the output lastline, providing a good ZIP Code or a close soundex match even if the address does not match or does not exist.

The feature works when GS_FIND_ADDRCODE is True and the address does not match a candidate or when GS_FIND_Z_CODE is True and only lastline information is input.

For example, when GS_FIND_ADDRCODE = True:

Address: 0 MAIN

LastLine: BOLDER CA 80301

Returns:

MATCH_CODE=E622

LASTLINE=BOULDER, CO 80301

CITY=BOULDER

STATE=CO

ZIP=80301

For example, when GS_FIND_Z_CODE = True:

Address:

LastLine: BOLDER CA 80301

Returns:

MATCH_CODE=Z6

LASTLINE=BOULDER, CO  80301

CITY=BOULDER

STATE=CO

ZIP=80301

The following elements are corrected:

  • City correction - The city correction is based on the input ZIP Code unless a match to city and state exists, in which case both search areas are retained. When no ZIP Code is input, the state must be correct or spelled out correctly. The location code and coordinates are based on the input ZIP Code.

    • Input city is incorrect:

      HAUDENVILLE MA 01039

      Returns LASTLINE=HAYDENVILLE, MA  01039

      LAT=  42396500 LON= -72689100

  • State correction - State is abbreviated when spelled out correctly, or corrected when a ZIP Code is present. Some variations of state input are recognized (ILL, ILLI, CAL), but others, such as MASS, are not. GeoStan does not consider abbreviating a recognized variation a change, so ILL to IL is not identified as a change in the match code. In addition, the output of the ZIP Code for a single ZIP city is not considered a change.

    • Input city exists:

      Bronx NT, 10451

      Returns LASTLINE= BRONX, NY  10451

      Bronx NT

      Returns LASTLINE= BRONX NT

      No ZIP Code for correction

    • Input city does not exist - preferred city for ZIP Code returned:

      60515

      Returns LASTLINE=DOWNERS GROVE, IL  60515

      MATCH_CODE=E622

      ILLINOIS 60515 (or ILL 60515 or IL 60515 or ILLI 60515)

      Returns LASTLINE=DOWNERS GROVE, IL  60515

      MATCH_CODE=E222

  • ZIP Code correction - ZIP Code is corrected only when a valid city/state is identified and has only one ZIP Code.

    • Exists on input:

      HAUDENVILLE MA 01039

      Returns LASTLINE=HAYDENVILLE, MA  01039

    • Incorrect on input - ZIP Code correction is not performed, both search areas are retained:

      HAUDENVILLE MA 01030

      Returns LASTLINE=HAYDENVILLE, MA  01030

      City and ZIP do not correspond

    • Does not exist on input:

      DOWNRS GROVE, IL

      Returns LASTLINE=DOWNERS GROVE, IL

      City with multiple ZIP Codes

      LILSE IL

      Returns LASTLINE=LISLE, IL  60532

      City with a single ZIP Code

      DOWNERS GROVE LL

      Returns LASTLINE=DOWNERS GROVE LL,

      No ZIP Code for correction

      DOWNRS GROVE, LL

      Returns LASTLINE=DOWNRS GROVE, LL

      No ZIP Code for correction

      LILSE ILLINOIS

      Returns LASTLINE= LISLE, IL  60532

      Correct spelled out state

      LISLE ILLINOS

      Returns LASTLINE= LISLE ILLINOS

      Incorrect spelled out state, no ZIP Code for correction

Note: For information on the returned match codes, see Correct lastline match codes.
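
The ZIP Code correction rule above (a ZIP Code is supplied only when a valid city/state has exactly one ZIP Code, and an input ZIP Code is never replaced) can be sketched as follows. This Python model is illustrative only: the CITY_ZIPS table is a tiny hand-made sample, the fill_zip helper is hypothetical, and the city spelling correction (soundex) step is assumed to have already run.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data: (city, state) -> ZIP Codes for that city.
CITY_ZIPS: Dict[Tuple[str, str], List[str]] = {
    ("LISLE", "IL"): ["60532"],                    # single ZIP city
    ("DOWNERS GROVE", "IL"): ["60515", "60516"],   # city with multiple ZIP Codes
}


def fill_zip(city: str, state: str, zip_code: str = "") -> str:
    """Return the ZIP Code to output for a corrected city and state.

    - An input ZIP Code is kept as-is, even if it does not correspond to the city.
    - A missing ZIP Code is filled in only for a known city/state with one ZIP Code.
    - Otherwise the ZIP Code stays empty.
    """
    if zip_code:
        return zip_code
    zips = CITY_ZIPS.get((city.upper(), state.upper()), [])
    return zips[0] if len(zips) == 1 else ""


print(fill_zip("Lisle", "IL"))
```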

Using predictive lastline

Predictive lastline allows you to match an address when only an input street address and latitude/longitude coordinates are provided, rather than the traditional street address with lastline input. For example, an input of 4750 Walnut with latitude/longitude coordinates located in Boulder returns full address information.

Additional feature information

  • To use predictive lastline, the GS_INIT_OPTIONS_SPATIAL_QUERY property must be set to True.

  • Predictive lastline uses the search radius designated for reverse geocoding.

  • If the input lat/lon falls near the borders of multiple cities, GeoStan processes all cities and returns the results of the best match. If the results are determined as equal, then a multi-match is returned.

  • This feature does not require a license for reverse geocoding.

  • This feature will work with any type of data set except USPS-only.

Preferring a ZIP Code over a city

The GS_FIND_PREFER_ZIP_OVER_CITY property lets you prefer candidates that match the input ZIP Code over candidates that match the input city. GeoStan creates multiple search areas when the input city and ZIP Code do not correspond, and this feature helps establish how the candidates should be scored.

Note: GeoStan ignores the ZIP Code over city preference if processing in Interactive or CASS modes.

When there is more than one candidate in the input ZIP Code, GeoStan attempts to avoid a multi-match in which all the candidates get the same lastline score. If a candidate also matches the input city and/or the preferred city, that candidate gets a better score. Matching only the preferred city scores worse than matching both.

Input Address: 24 GLEN HAVEN RD

Input Last Line: NEW HAVEN CT 06513

Found:

24 GLEN HAVEN RD

NEW HAVEN, CT  06513-1105

Possible candidates:

                                             score        pref. last line city
2     98    GLEN HAVEN RD    06513-1105 S    0.8100000    NEW HAVEN    * best match
24    98    GLEN HAVEN RD    06513-1248 S    2.2500000    EAST HAVEN
16    66    GLEN RD          06511-2825 S    46.3925000   NEW HAVEN
2     86    GLEN PKWY        06517-1415 S    52.1525000   HAMDEN
2     28    GLEN RD          06516-6509 S    52.1525000   WEST HAVEN
2     98    GLENHAM RD       06518-2517 S    75.0100000   HAMDEN
2     72    GLEN VIEW TER    06515-1519 S    97.0900000   NEW HAVEN

When there is more than one candidate, candidates matching the input ZIP Code score better.

Input Address: 301 BRYANT ST

Input Last Line: SAN FRANCISCO CA  94301

Found:

301 BRYANT ST

PALO ALTO, CA  94301-1408

Possible candidates:

                                           score        pref. last line city
301    301    BRYANT ST    94301-1408 S    3.2400000    PALO ALTO    * ZIP preferred match
301    305    BRYANT CT    94301-1401 S    28.2400000   PALO ALTO
300    306    BRYANT CT    94301-00ND T    35.6600000   PALO ALTO
301    301    BRYANT ST    94107-4167 H    39.6900000   SAN FRANCISCO    * default match
301    319    BRYANT ST    94107-1406 S    39.6900000   SAN FRANCISCO

When there is more than one candidate, candidates that match the ZIP Code search area score better. The ZIP Code search area is the finance area for the input ZIP Code.

This example uses GS_FIND_SEARCH_AREA set to FINANCE. With GS_FIND_SEARCH_AREA set to CITY, the match is made to EAST AURORA 14052, as there is no candidate in the 14166 input ZIP Code.

Input Address:  100 MAIN ST

Input Last Line:  EAST AURORA NY 14166

Found:

100 MAIN ST

DUNKIRK, NY  14048-1844

Possible candidates:

                                         score        pref. last line city
100    198    MAIN ST    14048-1844 S    3.2400000    DUNKIRK    * same finance as input ZIP 14166
100    168    MAIN ST    14052-1633 S    39.6900000   EAST AURORA

This example is with GS_FIND_SEARCH_AREA set to 0 (CITY).

Input Address:  4200 arapahoe

Input Last Line:  denver co 80301

Found:

4200 ARAPAHOE AVE

BOULDER, CO  80303-1164

Possible candidates:

                                                 score        pref. last line city
4200    4210    ARAPAHOE AVE     80303-1164 S    38.7400000   BOULDER    * same city as input ZIP 80301
4200    4210    ARAPAHOE RD      80303-1164 S    40.7000000   BOULDER (A06)
4200    4298    E ARAPAHOE PL    80122-00ND T    62.0900000   LITTLETON
4200    4498    E ARAPAHOE RD    80122-00ND T    62.0900000   LITTLETON
4181    4499    E ARAPAHOE RD    80122-00ND T    68.3400000   LITTLETON

Matching address ranges

Some business locations are identified by address ranges. For example, a shopping plaza could be addressed as 10-12 Front St. This is how business mail is typically addressed to such a business location. These address ranges can be geocoded to the interpolated mid-point of the range.

Address ranges are different from hyphenated (dashed) addresses that occur in some metropolitan areas. For example, a hyphenated address in Queens County (New York City) could be 243-20 147 Ave. This represents a single residence (rather than an address range) and is geocoded as a single address. If a hyphenated address similar to this example returns as an exact match, GeoStan does not attempt an address range match.

Address range matching is disabled by default and is an optional mode. To enable address range matching, use GS_FIND_ADDRESS_RANGE.

Note: Address range matching is not available in Exact or CASS modes, since an address range is not an actual, mailable USPS® address.

The following fields are not returned by address range geocoding:

  • ZIP+4® (in multiple segment cases)

  • Delivery Point

  • Check Digit

  • Carrier Route

  • Record Type

  • Multi-Unit

  • Default flag

Address Range matching capabilities and guidelines

Address Range matching works within the following guidelines:

  • There must be two numbers separated by a hyphen.

  • The first number must be lower than the second number.

  • Both numbers must be of the same parity (odd or even) unless the address range itself has mixed odd and even addresses.

  • Numbers can be on the same street segment or can be on two different segments. The segments do not have to be contiguous.

  • If both numbers are on the same street segment, the geocoded point is interpolated to the approximate mid-point of the range.

  • If the numbers are on two different segments, the geocoded point is based on the last valid house number of the first segment. The ZIP Code and FIPS Code are based on the first segment.

  • In all cases, odd/even parity is evaluated to place the point on the correct side of the street.

In the following example, the Address Range match is to a single street segment, with the geocode placed on the mid-point of the range:

Input:      4750-4760 Walnut St, Boulder, CO

Output:   4750-4760 Walnut St, Boulder, CO

A close match to a single address number is preferred over a ranged address match. GeoStan attempts a close match on the recombined address number before making a ranged match, as seen in the following example:

Input:     47-50 Walnut St, Boulder, CO

Output:  4750 Walnut St, Boulder, CO

 

In the example below, the second number is not larger than the first, so GeoStan treats this as a unit number rather than a ranged address:

Input:     4750-200 Walnut St, Boulder, CO

Output:  4750 Walnut St STE 200, Boulder, CO
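
The input guidelines above can be expressed as a small check on the house number token. The following Python sketch is illustrative only and does not reproduce GeoStan's parser: parse_range and range_midpoint are hypothetical helpers, the mixed_parity flag stands in for knowledge of the street data, and the recombined close-match step (47-50 becoming 4750) is not modeled.

```python
import re
from typing import Optional, Tuple

_RANGE = re.compile(r"^(\d+)-(\d+)$")


def parse_range(token: str, mixed_parity: bool = False) -> Optional[Tuple[int, int]]:
    """Return (low, high) if the token qualifies as an address range, else None.

    mixed_parity: True when the street range itself mixes odd and even numbers.
    """
    m = _RANGE.match(token.strip())
    if not m:
        return None                      # not two numbers separated by a hyphen
    low, high = int(m.group(1)), int(m.group(2))
    if low >= high:
        return None                      # first number must be lower than the second
    if not mixed_parity and low % 2 != high % 2:
        return None                      # both numbers must share odd/even parity
    return low, high


def range_midpoint(low: int, high: int) -> float:
    """House number at the approximate mid-point of the range."""
    return (low + high) / 2


print(parse_range("4750-4760"), parse_range("4750-200"))
```

A token such as 4750-200 fails the ordering check, which is consistent with it being handled as a house number plus a unit number rather than as a range.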

Understanding missing and wrong first letter

The missing and wrong first letter feature enables GeoStan to look for the correct first letter of a street address if the first letter is missing or incorrect. GeoStan searches through the alphabet looking for possible, correct first letters to complete the street address.

The feature is disabled by default. To enable this feature, modify GS_FIND_FIRST_LETTER_EXPANDED.

Note: The GS_FIND_FIRST_LETTER_EXPANDED property is ignored in Exact mode.

Below are examples of input addresses with a wrong, missing, or duplicate first letter, and the corresponding GeoStan output.

This example includes an incorrect first letter:

Input:     4750 nalnut boulder co 80301

Output:  4750 Walnut St Boulder CO 80301-2532

 

This example is missing a first letter:

Input:     4750 alnut boulder co 80301

Output:  4750 Walnut St Boulder CO 80301-2532

 

This example includes an extra first letter:

Input:     4750 wwalnut boulder co 80301

Output:  4750 Walnut St Boulder CO 80301-2532
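
One way to picture the alphabet search is to generate every street name that differs from the input only in its first letter and keep the ones that exist in the street data. The following Python sketch is a simplified illustration, not GeoStan's algorithm: first_letter_candidates is a hypothetical helper, and the set of known street names stands in for the geocoding data.

```python
import string
from typing import List, Set


def first_letter_candidates(name: str, known_streets: Set[str]) -> List[str]:
    """Return known street names reachable by fixing the first letter of name."""
    name = name.upper()
    tries = set()
    for letter in string.ascii_uppercase:
        tries.add(letter + name[1:])   # wrong first letter: replace it
        tries.add(letter + name)       # missing first letter: prepend one
    tries.add(name[1:])                # extra first letter: drop it
    return sorted(t for t in tries if t in known_streets)


streets = {"WALNUT", "WALTON", "PEARL"}
for typed in ("nalnut", "alnut", "wwalnut"):
    print(typed, "->", first_letter_candidates(typed, streets))
```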

Permitting relaxed address number matching

When GeoStan matches an input address, its default behavior is to require a match to the address number. This default behavior corresponds to GS_FIND_MUST_MATCH_ADDRNUM set to True.

If GS_FIND_MUST_MATCH_ADDRNUM is set to False (in GS_MODE_CUSTOM only), GeoStan is no longer required to match the address number. This permits relaxed address number matching, so an inexact match can be found. If the input address number is missing, no matches are returned unless GS_FIND_STREET_CENTROID (see Understanding street locator geocoding) is also enabled.

If the input address number is not within a house number range from the street, GeoStan selects the nearest range on the street which has the same parity (even or odd) as the input address number. GeoStan returns one or more of the closest matches inside this range that preserves street parity. This requires GeoStan to change the house number. The new house number is equal to one of the range's endpoints, possibly plus or minus one to preserve street parity.

Note: Even when GS_FIND_MUST_MATCH_ADDRNUM is set to False and an inexact match on the house number is found, GeoStan still returns an error code and the inexact house number has to be retrieved as one would retrieve a multi-match.

When GS_FIND_MUST_MATCH_ADDRNUM is set to False and no exact matching house number is found, a match code of either E029 (no matching range, single street segment found), or E030 (no matching range, multiple street segment) is returned by GsDataGet and GeoStan does not change the house number on the output address.

In order to access the inexact address number candidates, the GsMultipleGet function must be called. If there are inexact house number candidates returned by GsMultipleGet, the corresponding match codes begin with the letter 'H' indicating that the house number was not matched. Additionally, even when one or more exact candidates are found, inexact matches to the house number are still on the list of possible candidates, and these can be differentiated from the others by their Hxx match codes. For more information on match codes, see Match codes.
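
The house number substitution described above (nearest range with the same parity, snapped to an endpoint and adjusted by one if needed) can be sketched as follows. This Python model is an interpretation for illustration only: the Range tuple, the "both" parity value for mixed ranges, and the nearest_house_number helper are assumptions, not GeoStan data structures or functions.

```python
from typing import List, Optional, Tuple

# (low, high, parity) where parity is "even", "odd" or "both" (mixed range).
Range = Tuple[int, int, str]


def nearest_house_number(number: int, ranges: List[Range]) -> Optional[int]:
    """Return the closest house number on the street that preserves parity."""
    want = "even" if number % 2 == 0 else "odd"
    usable = [r for r in ranges if r[2] in (want, "both")]
    if not usable:
        return None

    def distance(r: Range) -> int:
        low, high, _ = r
        if low <= number <= high:
            return 0
        return min(abs(number - low), abs(number - high))

    low, high, _ = min(usable, key=distance)
    if low <= number <= high:
        return number                    # already inside a usable range
    snapped, step = (low, 1) if number < low else (high, -1)
    if snapped % 2 != number % 2:
        snapped += step                  # move one number inward to keep parity
    return snapped if low <= snapped <= high else None


street = [(100, 198, "even"), (101, 199, "odd"), (300, 398, "even")]
print(nearest_house_number(250, street))
```

In this model the input 250 is even and lies between two even ranges, so it snaps to the nearer endpoint, 300.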

Forcing an exact match on specific address elements

Listed below are the properties to specify a desired exact match:

Note: The Must Match properties are not valid for singleline matching.