User Dictionary

This section includes information on creating User Dictionaries, source data requirements and required fields, and other information specific to working with User Dictionaries.

Note: User Dictionaries are not for use with CASS geocoding.

Understanding User Dictionary capabilities and requirements

The capabilities of User Dictionaries and the basic requirements for creating them are as follows.

  • All fields supported by normal street geocoding can be included in User Dictionaries.

  • Landmarks and place names are supported in User Dictionaries. Postal and geographic centroid geocoding is not supported in User Dictionaries.

  • User Dictionaries support address browsing using partial street names or landmarks and place names.

  • GSDs are required to create a User Dictionary, because the GSDs contain internal structure that must be available during creation.

The results from a User Dictionary are similar to those from the GSD. For address matches where the first letter of the match code would be 'S', a User Dictionary match has the letter 'J'. The value of GS_REC_TYPE is 'U'. Also, the enum GS_DATATYPE returns a new value for User Dictionary record matches; see User Dictionary (GsFileStatusEx) for more information.

For example: SE9 is a match code for a match that comes from a GSD, while JE9 is for a match that comes from a User Dictionary. See Appendix D: Status codes for a complete description of match codes.
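
The match code convention above can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the GeoStan API; the function name match_source is hypothetical, and it inspects only the first letter of the match code as described above.

```python
def match_source(match_code: str) -> str:
    """Classify a match code by its first letter (hypothetical helper)."""
    first = match_code[:1].upper()
    if first == "J":
        return "user dictionary"  # 'J' replaces 'S' for User Dictionary matches
    if first == "S":
        return "GSD"
    return "other"  # any other leading letter is outside this sketch


print(match_source("SE9"))
print(match_source("JE9"))
```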

Source data requirements

The source data for User Dictionaries includes street data but can also include place names and intersections.

To create a User Dictionary, your source data must conform to the following requirements:

  • Source records must include required fields, and these fields are mapped during the User Dictionary creation process. If a value of a required field is empty for a particular record, then that record will not be imported into the User Dictionary. Required fields may vary for different countries. The MapInfo® table must contain specific fields, which GeoStan then uses to convert the table into the dictionary format. These input fields are described in Required input fields.

  • Source records must be in a MapInfo table (TAB file). The TAB file requirements vary for different countries.

  • Segments must have two or more defined endpoints to be loaded into a User Dictionary. Segments without endpoints are ignored.

  • Segments that make up intersections must have one or more end points in the intersection for GeoStan to recognize it as an intersection. Source records can be either point objects or segments.

  • Each row in the table is equivalent to a street segment.

Required input fields

You must specify the field names in the MapInfo table (TAB file) in order for the table to be translated into a User Dictionary. Certain fields are required and must be present in the MapInfo table. Other fields are optional, but are strongly recommended because there may be negative consequences if they are omitted. These are described in Optional (recommended) input fields. If any of the required fields are missing, a missing field error code is returned.

The following table describes the required input fields.

Required fields        Description                                       Maximum field length

Left start address     Start of address range on left side of street.    10
Right start address    Start of address range on right side of street.   10
Left end address       End of address range on left side of street.      10
Right end address      End of address range on right side of street.     10
Street name            Name of street.                                   30
State abbreviation     Two-character state abbreviation.                 2
Left ZIP Code          ZIP Code for left side of street.                 5
Right ZIP Code         ZIP Code for right side of street.                5
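
Before building a User Dictionary, it can help to pre-check source records against the required fields and maximum lengths listed above. The following is a minimal sketch in Python; the dictionary keys (for example left_start_address) and the function name are hypothetical and do not correspond to GeoStan or MapInfo identifiers. The actual field mapping is done during the User Dictionary creation process.

```python
from typing import Dict, List

# Required fields and maximum lengths from the table above.
# The key names are assumptions made for this sketch.
REQUIRED_FIELDS: Dict[str, int] = {
    "left_start_address": 10,
    "right_start_address": 10,
    "left_end_address": 10,
    "right_end_address": 10,
    "street_name": 30,
    "state_abbreviation": 2,
    "left_zip": 5,
    "right_zip": 5,
}


def check_record(record: dict) -> List[str]:
    """Return a list of problems found in one source record."""
    problems = []
    for name, max_len in REQUIRED_FIELDS.items():
        raw = record.get(name)
        value = "" if raw is None else str(raw).strip()
        if not value:
            # Records with an empty required field are not imported.
            problems.append(f"{name}: empty")
        elif len(value) > max_len:
            problems.append(f"{name}: longer than {max_len} characters")
    return problems
```

An empty list means the record passes these checks; it does not guarantee that the record will be imported.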

Optional (recommended) input fields

The Left and Right Odd/Even Indicator fields are used to specify whether the sides of the street segment contain odd or even address ranges. Although these indicators are not required for creating a User Dictionary, it is important to use the Odd/Even Indicators when your data contains odd/even address numbers.

When the Odd/Even Indicator is specified, but is inconsistent with address numbers, the indicator is set to Both.

When the Odd/Even Indicator is not specified and both Start Address and End Address have values, the indicator is set to Both, unless the start and end address numbers are the same number. In that case, the indicator is set to Odd if the address numbers are odd, and set to Even if the address numbers are even.

When the Odd/Even Indicator is not specified and both Start Address and End Address have no values, the indicator is set to Both (odd and even).

Note: If your table contains Odd/Even indicator information, we strongly recommend that you use the Odd/Even indicator fields. These fields ensure that your geocoded addresses are located on the correct side of the street. Omitting the fields when your data contains Odd/Even information may produce incorrect results.
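
The indicator rules above can be summarized in code. The following is a minimal sketch in Python, not GeoStan's implementation. The function name is hypothetical, and two points are assumptions: an indicator counts as "inconsistent" when any supplied address number has the opposite parity, and a range with a missing address number falls through to Both.

```python
from typing import Optional


def parity(number: int) -> str:
    """Return 'E' for an even number and 'O' for an odd number."""
    return "E" if number % 2 == 0 else "O"


def resolve_indicator(indicator: Optional[str],
                      start: Optional[int],
                      end: Optional[int]) -> str:
    """Return the effective Odd/Even indicator: 'O', 'E', or 'B'."""
    numbers = [n for n in (start, end) if n is not None]
    if indicator in ("O", "E", "B"):
        # Specified: keep it unless an address number contradicts it.
        if indicator != "B" and any(parity(n) != indicator for n in numbers):
            return "B"
        return indicator
    # Not specified: a single-number range takes that number's parity.
    if start is not None and end is not None and start == end:
        return parity(start)
    return "B"
```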

The following table describes the optional input fields.

Optional fields             Description                                                                                Maximum field length

Left Odd/Even indicator*    Left side of the street contains only odd or even address ranges (O=odd, E=even, B=both).  1
Right Odd/Even indicator*   Right side of the street contains only odd or even address ranges (O=odd, E=even, B=both). 1
City*                       City name.                                                                                 28
Left ZIP + 4 Code           4-digit ZIP + 4 add-on for left side of street.                                            4
Right ZIP + 4 Code          4-digit ZIP + 4 add-on for right side of street.                                           4
Left Census Block           Census Block ID for left side of street.                                                   15
Right Census Block          Census Block ID for right side of street.                                                  15
Place Name                  Place name.                                                                                40

* These fields are highly recommended.

User Dictionary file names and formats

GeoStan has some requirements for User Dictionary files that you must be aware of before you create a User Dictionary:

  • Each User Dictionary has a base name of eight characters or fewer.

  • Each User Dictionary resides in its own directory.

  • The maximum length of a path to a User Dictionary is 1024 characters.

  • The ZIP Code range in the MapInfo table for a User Dictionary is unlimited.

Because each User Dictionary resides in its own directory, User Dictionaries may share the same name. However, it is generally good practice to use a unique name for each User Dictionary.
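
The naming limits above are simple to check before creating a dictionary. The following is a minimal sketch in Python; the function name is hypothetical, and it checks only the base name length and path length stated above, not any other rule GeoStan may apply.

```python
from typing import List

MAX_BASE_NAME_LENGTH = 8   # base name of eight characters or fewer
MAX_PATH_LENGTH = 1024     # maximum length of a path to a User Dictionary


def check_dictionary_location(base_name: str, path: str) -> List[str]:
    """Return a list of problems with a planned User Dictionary location."""
    problems = []
    if not 1 <= len(base_name) <= MAX_BASE_NAME_LENGTH:
        problems.append("base name must be 1 to 8 characters long")
    if len(path) > MAX_PATH_LENGTH:
        problems.append("path is longer than 1024 characters")
    return problems
```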

Some of the output files are tied to the base name. The other output files have constant names. For example, the output files for a dictionary called ud1 are the following:

postinfo.jdr
postinfo.jdx
lastline.jdr
post2sac.mmj
geo2sac.mmj
sac2fn_ud.mmj
ud1.jdr
ud1.jdx
ud1.bdx

If your data includes place names, the dictionary contains the following files:

ud1.pdx
ud1.pbx

The dictionary also contains these log files:

ud1.log
ud1.err
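
The file lists above follow a pattern that can be generated from the base name. The following is a minimal sketch in Python that builds the expected file names for a given dictionary; the function name is hypothetical, and the lists are taken only from the ud1 example above.

```python
from typing import List

# Output files whose names do not depend on the base name.
CONSTANT_FILES = [
    "postinfo.jdr", "postinfo.jdx", "lastline.jdr",
    "post2sac.mmj", "geo2sac.mmj", "sac2fn_ud.mmj",
]


def expected_files(base_name: str, has_place_names: bool = False) -> List[str]:
    """Return the file names expected in a User Dictionary directory."""
    files = list(CONSTANT_FILES)
    files += [f"{base_name}.{ext}" for ext in ("jdr", "jdx", "bdx")]
    if has_place_names:
        # Present only when the source data includes place names.
        files += [f"{base_name}.{ext}" for ext in ("pdx", "pbx")]
    # Log files written alongside the dictionary.
    files += [f"{base_name}.{ext}" for ext in ("log", "err")]
    return files
```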

Additional User Dictionary considerations

See the following topics for more information when working with User Dictionaries.

Data Access License

You must still have a valid access license to the data contained in the GSD when you are geocoding against your User Dictionary. For example, if you create a dictionary of New York streets and addresses, you must purchase the New York or entire U.S. GSD.

Use without GSD data files

To use a User Dictionary without GSDs, the following files are required:

  • ctyst.dir - The USPS City State table.

  • parse.dir - The GeoStan dictionary.

To perform postal centroid geocoding, in addition to a GSD or a User Dictionary and the files listed above, the following files are necessary:

  • us.z9 - Postal centroid information.

  • cbsac.dir - Required only if county names or CBSA/CSA data are needed.
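
A quick check for the files listed above can catch a missing file before geocoding starts. The following is a minimal sketch in Python; the function name and its parameters are hypothetical, and it assumes that all of the files reside in a single directory, which may not match your installation.

```python
from pathlib import Path
from typing import List

STANDALONE_FILES = ["ctyst.dir", "parse.dir"]  # needed without GSDs
POSTAL_CENTROID_FILES = ["us.z9"]              # needed for postal centroids


def missing_support_files(data_dir: str,
                          postal_centroid: bool = False,
                          need_county_or_cbsa: bool = False) -> List[str]:
    """Return the names of required files not found in data_dir."""
    wanted = list(STANDALONE_FILES)
    if postal_centroid:
        wanted += POSTAL_CENTROID_FILES
        if need_county_or_cbsa:
            # Required only for county names or CBSA/CSA data.
            wanted.append("cbsac.dir")
    root = Path(data_dir)
    return [name for name in wanted if not (root / name).is_file()]
```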

CASS standards

You cannot geocode to CASS standards using a User Dictionary. This also means that the ParcelPrecision Dictionary cannot be used during CASS geocoding.

Address Range Order

GeoStan determines the order of the address range based on a comparison of the start and end addresses. The comparison produces the following results:

  • If the end is greater than the start, the range is ascending.

  • If the start is greater than the end, the range is descending.

  • If the start is equal to the end, the range is ascending.
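
The comparison above reduces to a single test. The following is a minimal sketch in Python that assumes the start and end addresses are already available as integers; the function name is hypothetical.

```python
def address_range_order(start: int, end: int) -> str:
    """Return 'ascending' or 'descending' for an address range."""
    # Equal start and end addresses are treated as ascending.
    return "descending" if start > end else "ascending"
```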

Street intersections and User Dictionaries

When geocoding to street intersections with a User Dictionary, GeoStan cannot recognize the intersections if one or more of the segments that make up the intersection does not have an end point at the intersection. This can happen when you create the User Dictionary from a customized street table in which some segments that terminate at intersections do not have end points (Example 1).

Example 1: Intersection in User Dictionary does not have end points for all segments. GeoStan does not recognize this as an intersection.

Example 2: Intersection in TIGER-based GSD includes end points for all segments. GeoStan geocodes to this intersection.
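
The end point condition can be tested on source data before the User Dictionary is created. The following is a minimal sketch in Python, not GeoStan's algorithm. It assumes that each segment is a list of (x, y) points whose first and last entries are its end points, and that coordinates are compared for exact equality; the function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = List[Point]


def is_recognizable_intersection(point: Point, segments: List[Segment]) -> bool:
    """Return True if every segment has an end point at the given point."""
    if len(segments) < 2:
        return False  # an intersection needs at least two segments
    return all(point in (seg[0], seg[-1]) for seg in segments)
```

For example, a street that passes through the crossing as an interior vertex, without being split there, fails this check, which corresponds to Example 1.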

City lookup

GeoStan relies on USPS data to determine addresses. Previously, a new address might not be recognized, even though it was valid, if the USPS data did not yet include it. The following is an example of an input address that would not match against a User Dictionary (UD):

1 Second Street

Stickville, NY 11111

In this example, the city is fictitious and the ZIP Code is made up. This input would fail to match even against a UD record with that city and ZIP Code, because neither is found in the USPS data. However, a user may have a UD that contains such a city and ZIP Code.

When matching to a UD record, GeoStan corrects the city name and/or ZIP Code, if necessary, to the values in the UD record. GeoStan can now obtain matches for non-USPS cities and ZIP Codes that previously failed or required temporary workarounds.

Using User Dictionaries with address point interpolation

An important part of the process of creating a User Dictionary is to specify a mapping of fields from your source data. See the MapInfo User Dictionary Utility Product Guide for a complete discussion. There are two main categories of data fields: required and optional.

Of the optional fields, two have an impact on the address point interpolation feature: the "Left Odd/Even" and "Right Odd/Even" fields. If these are not populated, the results from address point interpolation are less accurate.

Please be aware that the aforementioned fields are not populated by source data obtained via MapInfo StreetPro. You must modify the source TAB file by adding the "Left Odd/Even" and "Right Odd/Even" indicator fields, and create queries to populate them. Source data obtained from other products, or your own data, may have similar issues.

To add the "Left Odd/Even" and "Right Odd/Even" indicator fields to a source TAB file, you must add them and then run a series of SQL update queries to populate them. The fields should be filled in with "O" (odd), "E" (even), or "B" (both). Below are the steps for adding these fields:

  1. Add two 1-character columns to your TAB file and name them, for example, Ind_Right and Ind_Left.

  2. Perform the following updates to populate these fields:

  • Update <tablename>
    Set Ind_Left="E", Ind_Right="O"
    Where From_Left mod 2=0 AND To_Left mod 2=0

  • Update <tablename>
    Set Ind_Left="O", Ind_Right="E"
    Where From_Left mod 2=1 AND To_Left mod 2=1

  • Update <tablename>
    Set Ind_Left="B", Ind_Right="B"
    Where From_Left="" AND To_Left=""

Note: These example queries are simplified for illustrative purposes. Your actual queries may need to be more complex.
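
One way the queries can become more complex is to derive each side's indicator from that side's own address range, rather than inferring the right side from the left. The following is a minimal sketch in Python of that logic, operating on rows held as dictionaries. The column names From_Right and To_Right and both function names are hypothetical; setting the indicator to "B" when a value is empty or non-numeric is an assumption.

```python
from typing import Dict, List, Optional


def side_indicator(from_value: Optional[str], to_value: Optional[str]) -> str:
    """Return 'O', 'E', or 'B' for one side of a street segment."""
    try:
        start, end = int(from_value), int(to_value)
    except (TypeError, ValueError):
        return "B"  # empty or non-numeric range: assume Both
    if start % 2 == end % 2:
        return "O" if start % 2 else "E"
    return "B"  # mixed parity: the range holds odd and even numbers


def populate_indicators(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Fill Ind_Left and Ind_Right on each row, in place."""
    for row in rows:
        row["Ind_Left"] = side_indicator(row.get("From_Left"), row.get("To_Left"))
        row["Ind_Right"] = side_indicator(row.get("From_Right"), row.get("To_Right"))
    return rows
```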

Preferring User Dictionary Matches

If you select Prefer User Dictionary Matches, candidates from the User Dictionary are given a higher score than a similar candidate from the GSD. The GS_FIND_DB_ORDER property is designed for situations where you feel that your User Dictionary is superior to the GSD, and therefore you prefer matches from the User Dictionary over matches from the GSD whenever possible.

GeoStan supports the creation and use of User Dictionaries based on your own source data. A User Dictionary can be used independently or as a supplement to the supplied GSD.

You can geocode using:

  • User Dictionary (or multiple User Dictionaries) alone.

  • Standard GSD.

  • A combination of GSD and User Dictionaries.
