Selecting Columns

This section displays the columns of your sample data in a table. Select the columns on which you want to perform matching.

This procedure describes how to select columns for creating groups and generating match criteria:

  1. Click the Detect Semantic Type button. The semantic types detected in the selected records are displayed in the Semantic Type column. By default, NONE is displayed.
    If the semantic type you want is not displayed, select the check box of that column and then choose the semantic type from the drop-down list.
    Note: This step is recommended because it produces better match criteria. The algorithms used to generate match criteria are chosen based on the selected semantic type. For example, phonetic algorithms are used for names but not for phone numbers or ZIP codes.
  2. Set the Smart Sampling slider to On to draw the sample from the entire set of records. When it is Off, the first 20K (20,000) records are used as the sample.
  3. Select the Column Name check box for each column you want to use for generating match criteria.
  4. Use the Handling Nulls column to specify how null values in each column are treated. The options are:
    • Null as match: Treats an empty field as matching the corresponding field of the other record in the pair.
    • Null as non-match: Treats an empty field as not matching the corresponding field of the other record in the pair.
      Note: This is the default value.

      The selection made here is reflected in Enterprise Designer under the missing data option of the match rule. If you select Null as match, Count as 100 is preselected; if you select Null as non-match, Count as 0 is preselected.

    Note: This option applies globally to a field: the same setting is used for every condition that involves that field.
  5. Rank the columns in the order in which you want them sampled. To rank a column, place the cursor at the far left of the column and, when the cursor changes to a hand, drag the column up or down.
  6. Click the icon to save your changes and move to the next stage.
  7. Click the icon to cancel your current task.
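Steps 1 and 4 both affect how a single field of a record pair is compared. The following Python sketch illustrates the idea under stated assumptions: the semantic type labels NAME, PHONE, and ZIP, the 0 to 100 scoring, and the names NullHandling, soundex, digits_only, and compare_field are hypothetical and are not the product's API or its actual algorithms. Here names are compared phonetically with American Soundex, phone numbers and ZIP codes are compared on their digits only, and an empty value scores 100 or 0 depending on the Handling Nulls choice, mirroring Count as 100 and Count as 0.

```python
from enum import Enum
from typing import Optional


class NullHandling(Enum):
    # Hypothetical mirror of the Handling Nulls options; the values follow
    # Count as 100 (Null as match) and Count as 0 (Null as non-match).
    NULL_AS_MATCH = 100
    NULL_AS_NON_MATCH = 0  # default option


# American Soundex digit groups (letters not listed have no code).
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


def digits_only(value: str) -> str:
    """Drop punctuation and spaces, e.g. '(555) 123-4567' -> '5551234567'."""
    return "".join(c for c in value if c.isdigit())


def compare_field(a: Optional[str], b: Optional[str], semantic_type: str,
                  nulls: NullHandling = NullHandling.NULL_AS_NON_MATCH) -> int:
    """Hypothetical 0-100 score for one field of a record pair."""
    if not a or not b:  # None or empty string is treated as null
        return nulls.value
    if semantic_type == "NAME":  # phonetic comparison is used only for names
        return 100 if soundex(a) == soundex(b) else 0
    if semantic_type in ("PHONE", "ZIP"):  # no phonetics; compare digits
        return 100 if digits_only(a) == digits_only(b) else 0
    # Any other type (for example NONE): case-insensitive exact comparison.
    return 100 if a.strip().lower() == b.strip().lower() else 0


print(compare_field("Robert", "Rupert", "NAME"))                       # 100
print(compare_field("(555) 123-4567", "555-123-4567", "PHONE"))         # 100
print(compare_field("", "Smith", "NAME"))                               # 0
print(compare_field("", "Smith", "NAME", NullHandling.NULL_AS_MATCH))   # 100
```

Because Handling Nulls applies globally to a field, a sketch like this would pass the same NullHandling value to every comparison of that field, whatever condition it appears in.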
Using the selected columns, unsupervised machine learning algorithms automatically generate groups of records, which are displayed on the next page for tagging.
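Step 2 does not say how records are chosen when Smart Sampling is On. The Python sketch below contrasts the two modes under an assumption: Off takes the first 20,000 records, as described, while On is modeled here as reservoir sampling (Algorithm R), a standard way to draw a uniform random sample from a stream of unknown length. Reservoir sampling is a stand-in chosen for illustration, not the product's documented method, and the names first_n_sample, reservoir_sample, and SAMPLE_SIZE are hypothetical.

```python
import random
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

SAMPLE_SIZE = 20_000  # the 20K limit described in step 2


def first_n_sample(records: Iterable[T], n: int = SAMPLE_SIZE) -> List[T]:
    """Smart Sampling Off: keep the first n records in source order."""
    return list(islice(records, n))


def reservoir_sample(records: Iterable[T], n: int = SAMPLE_SIZE,
                     seed: Optional[int] = None) -> List[T]:
    """Assumed stand-in for Smart Sampling On: a uniform random sample of up to
    n records drawn from the entire input (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, record in enumerate(records):
        if i < n:
            sample.append(record)  # fill the reservoir with the first n records
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < n:
                sample[j] = record  # record i is kept with probability n/(i+1)
    return sample


records = range(100_000)
off = first_n_sample(records)
on = reservoir_sample(records, seed=42)
print(len(off), off[0], off[-1])  # 20000 0 19999
print(len(on))                    # 20000
```

Reservoir sampling reads every record once and keeps at most n of them in memory, so it can sample across a large source without loading the whole source, which is why it is a common choice when a sample must represent the entire data set rather than just its beginning.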