How GeocodeUSAddress Processes Addresses
GeocodeUSAddress processes addresses in the following order:
- Parses the address elements.
GeocodeUSAddress parses input address data into single elements. Parsing occurs on data in the order in which you load the data. GeocodeUSAddress can find a match even if a valid address is missing an element, because some elements, such as predirectionals, are not critical to every address. By comparing an input address against all known addresses in a search area, GeocodeUSAddress can usually determine whether any of these elements are missing or incorrect.
- Finds possible matches within the search area.
GeocodeUSAddress uses the last line elements of an address to determine a search area. You can specify if you want the search area based on a finance area or on an area defined by the city, state, and ZIP Code. (A Finance Area is a collection of ZIP Codes within a contiguous geographic region.) If the city and state are not in the ZIP Code, GeocodeUSAddress performs separate searches for the ZIP Code and city.
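The city/state and ZIP Code search area described above can be illustrated with a minimal sketch. The lookup table, the function name `search_areas`, and the returned dictionaries are all hypothetical; the product's reference data and internal logic are not shown in this document.

```python
# Toy lookup table, assumed for illustration only. The real product
# consults its own reference data files.
ZIP_TO_CITY = {
    "80301": ("BOULDER", "CO"),
    "64106": ("KANSAS CITY", "MO"),
}


def search_areas(city: str, state: str, zip_code: str) -> list[dict]:
    """Return the areas to search, based on the last-line elements."""
    key = (city.strip().upper(), state.strip().upper())
    if ZIP_TO_CITY.get(zip_code) == key:
        # City, state, and ZIP Code agree: one combined search area.
        return [{"city": key[0], "state": key[1], "zip": zip_code}]
    # City/state is not in the ZIP Code: search each one separately.
    return [{"zip": zip_code}, {"city": key[0], "state": key[1]}]


print(search_areas("Boulder", "CO", "80301"))
print(search_areas("Boulder", "CO", "64106"))
```

The first call yields a single search area; the second yields two, one for the ZIP Code and one for the city.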
After GeocodeUSAddress has determined the search area, it tries to match the elements from the street address line to the records in the standardized data files and does the following:
- Checks input address ranges for missing or misplaced hyphens, and alpha-numeric ranges for proper sequence.
- Searches for any misspellings and standard abbreviations. For example, GeocodeUSAddress can recognize Mane for Main and KC for Kansas City.
- Searches for any alias matches to the USPS and Spatial data (TIGER and TomTom). For example, GeocodeUSAddress recognizes that in Boulder, CO, Highway 36 is known as 28th Street.
- Searches for any USPS recognized firm names for additional match verification.
- Searches for street intersection matches. Matching to an intersection is extremely useful when you are using address matching to obtain a geocode.
- Searches for address lines that contain a house number and unit number as the same element. For example, GeocodeUSAddress recognizes the input 4750-200 Walnut Street and performs recombination to output 4750 WALNUT ST STE 200.
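The house number and unit number recombination in the last item above can be sketched as follows. This is a simplified illustration, not the product's algorithm: the regular expression, the suffix table, the function name `recombine`, and the default unit type `STE` are all assumptions. A real matcher would confirm the split against reference data, since some areas use hyphenated house numbers that must not be split.

```python
import re

# Hypothetical pattern: house number, hyphen, unit number, then street text.
_COMBINED = re.compile(r"^(?P<house>\d+)-(?P<unit>\d+)\s+(?P<street>.+)$")

# Small illustrative subset of street-suffix abbreviations.
_SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def recombine(address_line: str, unit_type: str = "STE") -> str:
    """Split a combined house/unit number and rebuild the address line."""
    line = address_line.strip()
    m = _COMBINED.match(line)
    if m is None:
        # No combined number found: only normalize the case.
        return line.upper()
    words = m.group("street").upper().split()
    # Abbreviate the trailing suffix if it is in the table.
    words[-1] = _SUFFIXES.get(words[-1], words[-1])
    return f"{m.group('house')} {' '.join(words)} {unit_type} {m.group('unit')}"


print(recombine("4750-200 Walnut Street"))  # prints "4750 WALNUT ST STE 200"
```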
Note: The USPS does not consider intersections valid addresses for postal delivery. Therefore, GeocodeUSAddress does not match intersections when processing in CASS mode.
- Scores each possible match against the parsed input.
GeocodeUSAddress compares each element in the input address to the corresponding element in the match candidates and assigns a confidence level to each comparison. GeocodeUSAddress then weighs the confidence levels of all the elements within a match candidate and sums them into a final score for that candidate.
Note: GeocodeUSAddress uses a penalty scoring system. If an element does not exactly match an element in the match candidate, GeocodeUSAddress adds a penalty to the score of the match candidate. Therefore, lower scores indicate better matches.
- Determines the match.
GeocodeUSAddress prioritizes each match candidate based on the assigned confidence score and returns as a match the candidate that has the lowest score.
The match mode you choose determines the range that GeocodeUSAddress allows for a match. GeocodeUSAddress only returns a match if the score of the target address falls within the range designated by the selected match mode.
In some cases, more than one match candidate may have the lowest score. In this instance, GeocodeUSAddress cannot determine on its own which record is correct, and returns a status indicating multiple matches.
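The penalty scoring and match selection described above can be sketched in a few lines. The penalty weights, element names, status strings, and the `max_score` parameter (standing in for the range allowed by the match mode) are hypothetical, and the sketch treats each element as either an exact match or a full penalty, which is simpler than the weighted confidence levels the product uses.

```python
# Hypothetical per-element penalties; the product's real weights are
# not given in this document.
PENALTIES = {
    "house_number": 40,
    "street_name": 30,
    "suffix": 10,
    "predirectional": 5,
}


def score(parsed: dict, candidate: dict) -> int:
    """Add a penalty for every element that does not match exactly."""
    return sum(
        penalty
        for element, penalty in PENALTIES.items()
        if parsed.get(element) != candidate.get(element)
    )


def choose_match(parsed: dict, candidates: list[dict], max_score: int):
    """Return (status, winners); the lowest score within range wins."""
    scored = [(score(parsed, c), c) for c in candidates]
    in_range = [(s, c) for s, c in scored if s <= max_score]
    if not in_range:
        return "NO_MATCH", []
    best = min(s for s, _ in in_range)
    winners = [c for s, c in in_range if s == best]
    status = "MATCH" if len(winners) == 1 else "MULTIPLE_MATCHES"
    return status, winners


parsed = {"house_number": "100", "street_name": "MAIN",
          "suffix": "ST", "predirectional": None}
north = dict(parsed, predirectional="N")   # penalty 5
south = dict(parsed, predirectional="S")   # penalty 5
avenue = dict(parsed, suffix="AVE")        # penalty 10

print(choose_match(parsed, [north, avenue], max_score=20)[0])  # MATCH
print(choose_match(parsed, [north, south], max_score=20)[0])   # MULTIPLE_MATCHES
print(choose_match(parsed, [avenue], max_score=5)[0])          # NO_MATCH
```

Tightening `max_score` mimics a stricter match mode: the same candidate that matches under a wide range is rejected under a narrow one.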
Note: If you have enabled Delivery Point Validation (DPV) processing, GeocodeUSAddress automatically attempts to resolve multiple matches using DPV.
Along with a standardized address, GeocodeUSAddress also returns the following:
- Geocode—Longitude and latitude for the address
- Match code—Information about the match of the input address to the reference data
- Location code—Precision level of a geocode
- Parity—The side of the street on which the match resides.
GeocodeUSAddress does not return parity when processing in relaxed mode. For more information about GeocodeUSAddress output, see Output.