Defining a Unique ID

By default, the Unique ID Generator stage creates a sequential ID, with the first record having an ID of 0, the second record having an ID of 1, the third record having an ID of 2, and so forth. If you want to change how the unique ID is generated, follow this procedure.
  1. On the Unique ID Generator Options: Unique ID Generator page, modify the Output field name, if required. The default is RecordID.
  2. By default, Sequential Numeric Tag; Starting at 0 is displayed as the method used to generate the unique ID. To use a different method, click the corresponding Modify icon and select one of these options:
    Sequential Numeric tag starting at
    Assigns an incremental numeric value to each record, starting with the number you specify. If you specify 0, the first record will have an ID of 0, the second record will have an ID of 1, and so on.
    Date/Time stamp
    Creates a unique key based on the date and time stamp instead of sequential numbering.
    UUID
    Creates a universally unique identifier key of 32 hexadecimal digits for each record. The digits are displayed in five groups separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters (32 hexadecimal digits and four hyphens). Example: 123e4567-e89b-12d3-a456-432255330000
    Off
    Select this option only if you want to generate a non-unique key using an algorithm.
  3. To create a new rule, click the Add Rule button and specify the Unique ID Field Options, as described below.
    1. Algorithm: Specifies one of these algorithms to generate the key:
      Consonant
      Returns specified fields with consonants removed.
      Double Metaphone
      Returns a code based on a phonetic representation of the characters in the specified fields. Double Metaphone is an improved version of the Metaphone algorithm, and attempts to account for the many irregularities found in different languages.
      Koeln
      Indexes names by sound as they are pronounced in German. Allows names with the same pronunciation to be encoded to the same representation so that they can be matched, despite minor differences in spelling. The result is always a sequence of numbers; special characters and white spaces are ignored. This option was developed to respond to limitations of Soundex.
      MD5
      A message digest algorithm that produces a 128-bit hash value. This algorithm is commonly used to check data integrity.
      Metaphone
      Returns a Metaphone coded key of selected fields. Metaphone is an algorithm for coding words using their English pronunciation.
      Metaphone 3
      Improves upon the Metaphone and Double Metaphone algorithms with more exact consonant and internal vowel settings that allow you to produce words or names more or less closely matched to search terms on a phonetic basis. Metaphone 3 increases the accuracy of phonetic encoding to 98%. This option was developed to respond to limitations of Soundex.
      Nysiis
      Phonetic code algorithm that matches an approximate pronunciation to an exact spelling and indexes words that are pronounced similarly. Part of the New York State Identification and Intelligence System. Say, for example, that you are looking for someone's information in a database of people. You believe that the person's name sounds like "John Smith", but it is in fact spelled "Jon Smyth". If you conducted a search looking for an exact match for "John Smith" no results would be returned. However, if you index the database using the NYSIIS algorithm and search using the NYSIIS algorithm again, the correct match will be returned because both "John Smith" and "Jon Smyth" are indexed as "JAN SNATH" by the algorithm.
      Phonix
      Preprocesses name strings by applying more than 100 transformation rules to single characters or to sequences of several characters. Of those rules, 19 are applied only if the characters are at the beginning of the string, 12 only if they are in the middle of the string, and 28 only if they are at the end of the string. The transformed name string is then encoded as a starting letter followed by three digits (with zeros and duplicate numbers removed). This option was developed to respond to limitations of Soundex; it is more complex and therefore slower than Soundex.
      Soundex
      Returns a Soundex code of selected fields. Soundex produces a fixed-length code based on the English pronunciation of a word.
      SpanishMetaphone
      Returns a Metaphone coded key of selected fields for the Spanish language. This metaphone algorithm codes words using their Spanish pronunciation.
      Substring
      Returns a specified portion of the selected field.
    2. Field name: Choose the field to which you want to apply the algorithm. For example, if you chose the Soundex algorithm and a field named City, the ID would be generated by applying the Soundex algorithm to the data in the City field.
    3. Start Position and Length: These two fields are enabled only if you selected the Substring algorithm. They let you specify the portion of the field you want to use in the substring.
      • In the Start Position field, specify the position in the field where you want the substring to begin.
      • In the Length field, select the number of characters from the start position that you want to include in the substring.
      For example, say the LastName field contains the value Augustine. If you specify 3 as the start position and 6 as the length, the substring produced is gustin (the count starts at 1, so position 3 is the letter g).
    4. Pre-Processing options:
      1. Check the Remove noise character box to remove all non-numeric and non-alpha characters such as hyphens, white space, and other special characters from the field before applying the algorithm.
      2. For consonant and substring algorithms, you can sort the data in the field before applying the algorithm by checking the Sort input box. You can then choose to sort either the Characters in the field or Terms in the field in alphabetical order.
  4. Click OK.
    The rule is added below Sequential Numeric Tag; Starting at 0 in the table.
  5. Repeat steps 3 and 4 to define additional algorithms to produce a more complex ID.
    Note: The unique key definition is always displayed in a different color and cannot be deleted.
  6. Click Apply.
    A preview appears in the Preview section.
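
To make the options in this procedure more concrete, the following Python sketch imitates a few of them outside the product. It is not the stage's implementation, and every function name in it is hypothetical. It assumes that the start position is counted from 1 (as the Augustine example implies), that "noise" means any character other than an ASCII letter or digit, and that the Consonant algorithm drops the vowels a, e, i, o, and u.

```python
import hashlib
import itertools
import re
import uuid


def sequential_ids(start=0):
    """Yield start, start + 1, start + 2, ... like the Sequential Numeric tag."""
    return itertools.count(start)


def uuid_key():
    """Return a random UUID as a 36-character string in the 8-4-4-4-12 layout."""
    return str(uuid.uuid4())


def remove_noise(value):
    """Drop every character that is not an ASCII letter or digit (assumed meaning)."""
    return re.sub(r"[^A-Za-z0-9]", "", value)


def substring_key(value, start_position, length):
    """Return `length` characters beginning at the 1-based `start_position`."""
    begin = start_position - 1  # convert to Python's 0-based indexing
    return value[begin:begin + length]


def consonant_key(value):
    """Remove the vowels a, e, i, o, u (assumed vowel set), keeping the rest."""
    return re.sub(r"[aeiouAEIOU]", "", value)


def md5_key(value):
    """Return the 128-bit MD5 digest of the value as 32 hexadecimal characters."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    ids = sequential_ids(0)
    print([next(ids) for _ in range(3)])         # [0, 1, 2]
    print(substring_key("Augustine", 3, 6))      # gustin
    print(consonant_key("Augustine"))            # gstn
    print(remove_noise("O'Brien-Smith Jr."))     # OBrienSmithJr
    print(len(uuid_key()), len(md5_key("City"))) # 36 32
```

The substring and consonant functions return the same value for records with the same input, which is why keys built from such rules are described above as non-unique.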