Using Algorithms to Augment a Unique ID
Unique ID Generator generates a unique ID for each record by either numbering each record sequentially or generating a date-time stamp for each record. You can optionally use algorithms to append additional information to the sequential or date-time unique ID, thereby creating a more complex unique ID and one that is more likely to be truly unique.
- In the Unique ID Generator stage, click Add.
- In the Algorithm field, select the algorithm you want to use to generate additional information in the ID.
- Consonant
- Returns specified fields with consonants removed.
- Double Metaphone
- Returns a code based on a phonetic representation of the characters in the selected fields. Double Metaphone is an improved version of the Metaphone algorithm, and attempts to account for the many irregularities found in different languages.
- Koeln
- Indexes names by sound as they are pronounced in German. Allows names with the same pronunciation to be encoded to the same representation so that they can be matched, despite minor differences in spelling. The result is always a sequence of numbers; special characters and white spaces are ignored. This option was developed to respond to limitations of Soundex.
- MD5
- A message digest algorithm that produces a 128-bit hash value. This algorithm is commonly used to check data integrity.
- Metaphone
- Returns a Metaphone coded key of selected fields. Metaphone is an algorithm for coding words using their English pronunciation.
- SpanishMetaphone
- Returns a Metaphone coded key of selected fields for the Spanish language. This metaphone algorithm codes words using their Spanish pronunciation.
- Metaphone 3
- Improves upon the Metaphone and Double Metaphone algorithms with more exact consonant and internal vowel settings that allow you to produce words or names more or less closely matched to search terms on a phonetic basis. Metaphone 3 increases the accuracy of phonetic encoding to 98%. This option was developed to respond to limitations of Soundex.
- In the Field name field, choose the field to which you want to apply the algorithm. For example, if you chose the Soundex algorithm and a field named City, the ID would be generated by applying the Soundex algorithm to the data in the City field.
- If you selected the substring algorithm, specify the portion of the field you want to use in the substring:
- In the Start position field, specify the position in the field where you want the substring to begin.
- In the Length field, select the number of characters from the start position that you want to include in the substring.
For example, say you have this data in a field named LastName:
Augustine
If you specified 3 as the start position and 6 as the length, the substring would produce:
gustin
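The substring example can be reproduced with a short Python sketch. This is an illustration only, not the product's implementation: the function name `substring` is hypothetical, and it assumes that the start position is counted from 1, which is what the Augustine example implies.

```python
def substring(value: str, start: int, length: int) -> str:
    """Return `length` characters of `value`, beginning at the 1-based `start`.

    Hypothetical helper: assumes positions are counted from 1, as the
    Augustine example suggests.
    """
    if start < 1:
        raise ValueError("start position must be 1 or greater")
    if length < 0:
        raise ValueError("length must not be negative")
    begin = start - 1  # convert the 1-based position to a 0-based index
    return value[begin:begin + length]


# Start position 3, length 6, applied to the LastName value from the example.
print(substring("Augustine", 3, 6))  # gustin
```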
- Check the Remove noise characters box to remove all non-numeric and non-alpha characters such as hyphens, white space, and other special characters from the field before applying the algorithm.
- For consonant and substring algorithms, you can sort the data in the field before applying the algorithm by checking the Sort input box. You can then choose to sort either the characters in the field or terms in the field in alphabetical order.
- Click OK to save your settings.
- Repeat as needed if you want to add additional algorithms to produce a more complex ID.
Note: The unique key definition is always displayed in a different color and cannot be deleted.
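To make the idea of augmenting an ID more concrete, the following Python sketch mimics a few of the options described above: removing noise characters, sorting characters or terms, computing an MD5 hash, and appending the results to a sequential ID. It is a rough approximation under stated assumptions, not the stage's actual behavior. All function names are hypothetical, the hyphen separator is an assumption, and noise removal is assumed to keep only ASCII letters and digits.

```python
import hashlib
import re


def remove_noise(value: str) -> str:
    """Drop every character that is not an ASCII letter or digit (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9]", "", value)


def sort_characters(value: str) -> str:
    """Sort the individual characters of the value in code-point order."""
    return "".join(sorted(value))


def sort_terms(value: str) -> str:
    """Sort the whitespace-separated terms of the value in code-point order."""
    return " ".join(sorted(value.split()))


def md5_hex(value: str) -> str:
    """Return the 128-bit MD5 digest of the value as 32 hexadecimal characters."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def augment_id(base_id: int, parts: list[str], separator: str = "-") -> str:
    """Append each algorithm result to the base ID (separator is an assumption)."""
    return separator.join([str(base_id), *parts])


city = "St. Paul"
cleaned = remove_noise(city)           # "StPaul"
print(sort_terms(city))                # Paul St.
print(sort_characters(cleaned))        # PSaltu
print(augment_id(42, [cleaned]))       # 42-StPaul
print(augment_id(42, [cleaned, md5_hex(cleaned)]))
```

Adding more than one part, as in the last line, corresponds to repeating the Add step to chain several algorithms onto the same ID.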