Options

To define Match Key Generator options, click the Add button. The Match Key Field dialog displays.
Note: The Dataflow Options feature in Enterprise Designer enables Match Key Generator to be exposed for configuration at runtime.
Table 1. Match Key Generator Options

Option Name

Description and Valid Values

Algorithm

Specifies one of these algorithms to use to generate the match key:

Consonant
Returns specified fields with consonants removed.
Double Metaphone
Returns a code based on a phonetic representation of the characters in the selected fields. Double Metaphone is an improved version of the Metaphone algorithm and attempts to account for the many irregularities found in different languages.
Koeln
Indexes names by sound as they are pronounced in German. Allows names with the same pronunciation to be encoded to the same representation so that they can be matched, despite minor differences in spelling. The result is always a sequence of numbers; special characters and white spaces are ignored. This option was developed to respond to limitations of Soundex.
MD5
A message digest algorithm that produces a 128-bit hash value. This algorithm is commonly used to check data integrity.
Metaphone
Returns a Metaphone coded key of selected fields. Metaphone is an algorithm for coding words using their English pronunciation.
SpanishMetaphone
Returns a Metaphone coded key of selected fields for the Spanish language. This metaphone algorithm codes words using their Spanish pronunciation.
Metaphone 3
Improves upon the Metaphone and Double Metaphone algorithms with more exact consonant and internal vowel settings that allow you to produce words or names more or less closely matched to search terms on a phonetic basis. Metaphone 3 increases the accuracy of phonetic encoding to 98%. This option was developed to respond to limitations of Soundex.
Nysiis
A phonetic coding algorithm, developed as part of the New York State Identification and Intelligence System, that matches an approximate pronunciation to an exact spelling and indexes words that are pronounced similarly. For example, suppose you are looking for someone's information in a database of people. You believe that the person's name sounds like "John Smith", but it is in fact spelled "Jon Smyth". If you searched for an exact match on "John Smith", no results would be returned. However, if you index the database using the NYSIIS algorithm and search using the NYSIIS algorithm again, the correct match is returned because the algorithm indexes both "John Smith" and "Jon Smyth" as "JAN SNATH".
Phonix
Preprocesses name strings by applying more than 100 transformation rules to single characters or to sequences of several characters. Of those rules, 19 are applied only if the characters are at the beginning of the string, 12 only if they are in the middle of the string, and 28 only if they are at the end of the string. The transformed name string is then encoded as a starting letter followed by three digits (with zeros and duplicate numbers removed). This option was developed to respond to limitations of Soundex; it is more complex and therefore slower than Soundex.
Sonnex
Determines the similarity between two French-language strings based on the phonetic representation of their characters. Returns a Sonnex coded key of the selected fields.
Soundex
Returns a Soundex code of selected fields. Soundex produces a fixed-length code based on the English pronunciation of a word.
Substring
Returns a specified portion of the selected field.

Field name

Specifies the field to which you want to apply the selected algorithm to generate the match key. For example, if you select a field called LastName and you choose the Soundex algorithm, the Soundex algorithm would be applied to the data in the LastName field to produce a match key.

Start position

Specifies the starting position within the specified field. Not all algorithms allow you to specify a start position.

Length

Specifies the number of characters to include, counting from the start position. Not all algorithms allow you to specify a length.

Remove noise characters

Removes all characters other than letters and digits, such as hyphens, white space, and other special characters, from the input field.

Sort input

Sorts either all characters or all terms in the input field in alphabetical order:

Characters
Sorts the character values of the input field before the match key is created.
Terms
Sorts the terms of the input field before the match key is created.
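Two of the algorithms in the table, Soundex and MD5, are well-known published algorithms, so they can be illustrated independently of the product. The following is a minimal Python sketch, not the stage's implementation. It assumes standard American Soundex (ASCII letters only, zero-padded to four characters) and an MD5 hex digest of the UTF-8 encoded value; the function names `soundex` and `md5_key` are hypothetical.

```python
import hashlib

# Standard American Soundex letter-to-digit classes (assumed; the stage's
# exact Soundex variant is not specified in this table).
_SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(value: str) -> str:
    """Return a four-character American Soundex code for `value`."""
    letters = [ch for ch in value.upper() if "A" <= ch <= "Z"]  # ASCII letters only
    if not letters:
        return ""
    code = letters[0]                       # keep the first letter as-is
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")  # vowels, H, W, Y have no digit
        if digit and digit != prev:         # skip adjacent duplicate codes
            code += digit
            if len(code) == 4:
                break
        if ch not in "HW":                  # H and W do not separate duplicates
            prev = digit
    return code.ljust(4, "0")               # pad short codes with zeros


def md5_key(value: str) -> str:
    """Return the MD5 digest of `value` (UTF-8 encoded) as 32 hex characters."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


print(soundex("Smith"), soundex("Smyth"))  # S530 S530
print(md5_key("abc"))                      # 900150983cd24fb0d6963f7d28e17f72
```

Because "Smith" and "Smyth" produce the same Soundex code, records holding either spelling get the same match key. MD5, by contrast, changes completely with any one-character difference, so it suits exact-duplicate detection rather than fuzzy matching.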

If you add multiple match key generation algorithms, you can use the Move Up and Move Down buttons to change the order in which the algorithms are applied.
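The per-field options (Field name, Start position, Length, Remove noise characters, Sort input) and the ordering set with Move Up and Move Down can be pictured as a list of components whose outputs are joined into one key. The Python sketch below is a hypothetical model, not the product's API. It assumes a 1-based start position, a processing order of sort, then noise removal, then substring, then algorithm, whitespace-delimited terms sorted by code point, and a match key formed by concatenating the component outputs in list order. `KeyComponent`, `apply_component`, and `build_match_key` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


def identity(value: str) -> str:
    """Stand-in for the Substring algorithm: return the selected portion unchanged."""
    return value


@dataclass
class KeyComponent:
    """Hypothetical model of one row added through the Match Key Field dialog."""
    field_name: str
    algorithm: Callable[[str], str] = identity
    start: int = 1                    # assumed 1-based start position
    length: Optional[int] = None      # None means "to the end of the value"
    remove_noise: bool = False
    sort_input: Optional[str] = None  # None, "characters", or "terms"


def apply_component(value: str, c: KeyComponent) -> str:
    # Assumed order: sort, then remove noise, then take the portion, then encode.
    if c.sort_input == "terms":
        value = " ".join(sorted(value.split()))  # whitespace-delimited terms
    elif c.sort_input == "characters":
        value = "".join(sorted(value))
    if c.remove_noise:
        value = "".join(ch for ch in value if ch.isalnum())  # letters and digits only
    begin = c.start - 1
    end = None if c.length is None else begin + c.length
    return c.algorithm(value[begin:end])


def build_match_key(record: Dict[str, str], components: List[KeyComponent]) -> str:
    # Components run in list order (what Move Up / Move Down would change);
    # concatenating their outputs is an assumption about the key's layout.
    return "".join(apply_component(record.get(c.field_name, ""), c) for c in components)


record = {"LastName": "O'Brien-Smith", "FirstName": "Jon"}
components = [
    KeyComponent("LastName", str.upper, length=4, remove_noise=True),
    KeyComponent("FirstName", str.upper, length=1),
]
print(build_match_key(record, components))                  # OBRIJ
print(build_match_key(record, list(reversed(components))))  # JOBRI
```

Swapping the two components changes the key from "OBRIJ" to "JOBRI", which is why the order of the algorithms matters when keys from different records are compared.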

Generating an Express Match Key

Enable the Generate Express Match Key option and click Add to define an express match key to be used later in the dataflow by an Intraflow Match stage or an Interflow Match stage.

If the Generate Express Match Key option is enabled and the Express match key on option is selected in a downstream Interflow Match stage or Intraflow Match stage, the match attempt is first made using the express match key created here. If two records' express match keys match, the records are considered a match and no further processing is attempted. If the records' express match keys do not match, the match rules defined in Interflow Match or Intraflow Match are used to determine whether the records match.
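The express-key short circuit described above can be sketched as a small decision function. This Python sketch is illustrative only: `records_match`, the `match_rules` callable (standing in for the rules configured in Interflow Match or Intraflow Match), and the field name `ExpressMatchKey` are hypothetical, and treating an empty express key as "no express match" is an assumption.

```python
from typing import Callable, Dict

Record = Dict[str, str]


def records_match(
    a: Record,
    b: Record,
    match_rules: Callable[[Record, Record], bool],
    express_field: str = "ExpressMatchKey",  # hypothetical field name
) -> bool:
    key_a, key_b = a.get(express_field, ""), b.get(express_field, "")
    # Express keys agree: accept the pair without evaluating the match rules.
    # Ignoring empty keys here is an assumption, not documented behavior.
    if key_a and key_a == key_b:
        return True
    # Otherwise fall back to the match rules defined in the Match stage.
    return match_rules(a, b)


def same_last_name(a: Record, b: Record) -> bool:
    """Hypothetical stand-in for a configured match rule."""
    return a.get("LastName") == b.get("LastName")


a = {"ExpressMatchKey": "SMITJ", "LastName": "Smith"}
b = {"ExpressMatchKey": "SMITJ", "LastName": "Smyth"}
print(records_match(a, b, same_last_name))  # True (express keys match; rule not evaluated)
```

The design payoff is that the inexpensive string comparison settles the easy cases, so the potentially costly match rules run only for pairs whose express keys differ.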