In this section, columns of your sample data are displayed
in a tabular format. You must select the columns on which you would like to perform
matching.
This procedure describes how to select columns for creating groups and generating match criteria:
Click the Detect Semantic Type button. The semantic types
detected in the selected records are displayed in the Semantic
Type column. By default, NONE is
displayed.
If the desired semantic type is not displayed, select the check-box of that
column and then choose the semantic type from the drop-down.
Note: This step is recommended for generating better match criteria. Based on
the selected semantic type, relevant algorithms are used for generating
match criteria. For example, phonetic algorithms are used for
the name semantic type, but not for phone numbers
or zip codes.
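The effect of the semantic type on matching can be illustrated with a minimal Python sketch. It is an illustration only and not the product's implementation: the function names, the semantic type labels ("name", "phone", "zip"), and the choice of Soundex as the phonetic algorithm are all assumptions made for this example.

```python
def soundex(name: str) -> str:
    """Simplified American Soundex code, used here as a stand-in phonetic algorithm."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate repeated codes
        digit = codes.get(ch, "")
        if digit and digit != prev:
            encoded += digit
        prev = digit  # vowels reset prev, so a repeated code after a vowel is kept
    return (encoded + "000")[:4]


def values_match(semantic_type: str, left: str, right: str) -> bool:
    """Hypothetical comparison that depends on the semantic type of the column."""
    if semantic_type == "name":
        # Names: compare by sound, so spelling variants can still match.
        return soundex(left) == soundex(right)
    if semantic_type in ("phone", "zip"):
        # Phone numbers and zip codes: compare digits only, never phonetically.
        def digits(value: str) -> str:
            return "".join(c for c in value if c.isdigit())
        return digits(left) == digits(right)
    # Any other type (including NONE): plain case-insensitive comparison.
    return left.casefold() == right.casefold()
```

In this sketch, "Smith" and "Smyth" match as names because they sound alike, while two phone numbers match only when their digits are identical.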
Slide Smart Sampling to On to
consider the entire set of records for sampling. When it is
Off, only the first 20K records are taken for
sampling.
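The difference between the two settings can be sketched in Python as follows. This is an illustration under stated assumptions: how Smart Sampling actually chooses records is not described here, so a uniform random sample over the whole data set is used as a stand-in, and the names pick_sample and SAMPLE_LIMIT are hypothetical.

```python
import random
from itertools import islice

SAMPLE_LIMIT = 20_000  # the "first 20K records" used when Smart Sampling is Off


def pick_sample(records, smart_sampling, limit=SAMPLE_LIMIT, seed=0):
    """Return the records used for sampling (hypothetical helper)."""
    if not smart_sampling:
        # Off: take only the leading records and ignore the rest.
        return list(islice(records, limit))
    # On: every record is a candidate. A uniform random draw is an
    # assumption standing in for the product's own strategy.
    pool = list(records)
    if len(pool) <= limit:
        return pool
    return random.Random(seed).sample(pool, limit)
```

With the setting Off, records beyond the limit can never influence the generated match criteria, which matters when the data is sorted or grouped by source.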
Select the Column Name check-box for each column to be
used for generating match criteria.
Use the Handling Nulls column to specify how to treat
the null values in the respective columns. The options are:
Null as match: Treats an empty field as
equal to the corresponding field in the other record of a pair
Null as non-match: Treats an empty field as
not equal to the corresponding field in the other record of a pair
Note: This is
the default value.
The selection made here is reflected in the
Enterprise Designer under the
missing data option of the match rule. If you
select Null as match, Count as
100 is pre-selected, and if you choose
Null as non-match, Count as
0 is pre-selected.
Note: This option is applied globally to a field; it stays the same across
all conditions defined for that field.
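The two Handling Nulls options can be sketched in Python as follows. The 100 and 0 values for empty fields mirror the Count as settings described above. Everything else is an assumption for illustration: the function name field_score is hypothetical, empty strings are treated as nulls, and non-empty values are scored with a plain equality check rather than a real matching algorithm.

```python
NULL_AS_MATCH = "Null as match"
NULL_AS_NON_MATCH = "Null as non-match"  # the default option


def field_score(left, right, null_handling=NULL_AS_NON_MATCH):
    """Score one field of a record pair on a 0-100 scale (hypothetical helper)."""
    def is_null(value):
        return value is None or value == ""

    if is_null(left) or is_null(right):
        # Empty field: count as 100 or as 0, depending on the chosen option.
        return 100 if null_handling == NULL_AS_MATCH else 0
    # Both values present: exact equality stands in for the real algorithm.
    return 100 if left == right else 0
```

For example, a pair in which one record has no zip code scores 100 on that field under Null as match and 0 under Null as non-match.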
Rank your columns in the order in which you want them sampled. To rank a
column, place your cursor at the extreme left of the column and, when the cursor
changes to a hand, drag the column up or down.
Click the icon to save your changes and move to the next stage.
Click the icon to cancel your current task.
Based on the selected columns and unsupervised machine
learning algorithms, groups of records are automatically generated and these are
displayed on the next page for tagging.