Options
- In the Load match rule field, select one of the predefined match rules, which you can use as-is or modify to suit your needs. To create a new match rule without starting from a predefined one, click New. A dataflow can have only one custom rule.
Note: The Dataflow Options feature in Enterprise Designer lets you expose the match rule for configuration at runtime.
- Click Group By to select a field to use for grouping records in the match queue. Intraflow Match only attempts to match records against other records in the same match queue.
- Select the Sort box to perform a pre-match sort of your input based on the field selected in the Group By field.
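To make the match-queue idea concrete, here is a minimal Python sketch of grouping and pairing. It is a conceptual illustration under stated assumptions, not how Intraflow Match is implemented: records are plain dictionaries, and the `zip` field and `id` values in the example are hypothetical. Because `itertools.groupby` only groups adjacent items, the sketch sorts on the Group By field first.

```python
from itertools import combinations, groupby


def match_queues(records, group_by):
    """Sort records on the Group By field and split them into match queues."""
    # Pre-match sort: records that share a Group By value become adjacent.
    ordered = sorted(records, key=lambda r: r[group_by])
    # Each run of identical Group By values forms one match queue.
    return [list(queue) for _, queue in groupby(ordered, key=lambda r: r[group_by])]


def candidate_pairs(records, group_by):
    """Yield only the record pairs that share a match queue."""
    for queue in match_queues(records, group_by):
        # Records are never compared across queues.
        yield from combinations(queue, 2)


# Hypothetical sample records grouped on a "zip" field.
records = [
    {"id": 1, "zip": "12345"},
    {"id": 2, "zip": "54321"},
    {"id": 3, "zip": "12345"},
]
print([(a["id"], b["id"]) for a, b in candidate_pairs(records, "zip")])  # [(1, 3)]
```

Only the two records in the 12345 queue are paired; the 54321 record is never compared with them, which is how grouping cuts down the number of comparisons.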
- Click Advanced to specify additional sort performance options.
- In memory record limit
- Specifies the maximum number of data rows a sorter will hold in memory before it starts paging to disk. By default, a sort of 10,000 records or fewer is done in memory, and a sort of more than 10,000 records is performed as a disk sort. The maximum limit is 100,000 records. An in-memory sort is typically much faster than a disk sort, so set this value high enough that most sorts are in-memory sorts and only large sets are written to disk.
Note: Be careful in environments where jobs run concurrently, because increasing the In memory record limit setting increases the likelihood of running out of memory.
- Maximum number of temporary files
- Specifies the maximum number of temporary files that may be used by a sort process. Using a larger number of temporary files can result in better performance. However, the optimal number is highly dependent on the configuration of the server running Spectrum Technology Platform. You should experiment with different settings, observing the effect on performance of using more or fewer temporary files. To calculate the approximate number of temporary files that may be needed, use this equation:
(NumberOfRecords × 2) ÷ InMemoryRecordLimit = NumberOfTempFiles
Note: The maximum number of temporary files cannot be more than 1,000.
- Enable compression
- Specifies that temporary files are compressed when they are written to disk.
Note: The optimal sort performance settings depend on your server's hardware configuration. As a general guideline, use this equation to produce good sort performance:
(InMemoryRecordLimit × MaxNumberOfTempFiles ÷ 2) >= TotalNumberOfRecords
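The two sizing equations for the sort options can be checked with a small calculator. This is a hypothetical helper, not part of Spectrum Technology Platform; the function names are illustrative, and rounding the temporary-file estimate up to a whole file is an assumption of this sketch.

```python
import math

DEFAULT_IN_MEMORY_LIMIT = 10_000  # default In memory record limit
MAX_IN_MEMORY_LIMIT = 100_000     # maximum In memory record limit
MAX_TEMP_FILES = 1_000            # maximum number of temporary files


def uses_disk_sort(record_count, in_memory_limit=DEFAULT_IN_MEMORY_LIMIT):
    """True when the sort exceeds the in-memory limit and pages to disk."""
    return record_count > in_memory_limit


def estimated_temp_files(record_count, in_memory_limit):
    """(NumberOfRecords x 2) / InMemoryRecordLimit, rounded up to a whole file."""
    return math.ceil(record_count * 2 / in_memory_limit)


def settings_look_adequate(record_count, in_memory_limit, max_temp_files):
    """Guideline: InMemoryRecordLimit x MaxNumberOfTempFiles / 2 >= TotalNumberOfRecords."""
    if not 0 < in_memory_limit <= MAX_IN_MEMORY_LIMIT:
        raise ValueError("In memory record limit must be positive and at most 100,000")
    if not 0 < max_temp_files <= MAX_TEMP_FILES:
        raise ValueError("Maximum number of temporary files must be positive and at most 1,000")
    return in_memory_limit * max_temp_files / 2 >= record_count


# One million records with the default limit of 10,000:
print(estimated_temp_files(1_000_000, 10_000))         # 200
print(settings_look_adequate(1_000_000, 10_000, 200))  # True
```

With the default limit, one million records call for roughly 200 temporary files, and 200 files just satisfy the guideline (10,000 × 200 ÷ 2 = 1,000,000).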
- Click Express Match On to perform an initial comparison of express key values to determine whether two records are considered a match.
You can generate an express key as part of generating a match key with the Match Key Generator stage. See Match Key Generator for more information.
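The express key check acts as a shortcut in front of the full match rule. The sketch below assumes the behavior works as follows: records whose express keys are identical are treated as a match immediately, and all other pairs are compared with the match rule. The field name `ExpressKey` and the `match_rule` callable are hypothetical placeholders, not Spectrum names.

```python
from typing import Callable, Mapping

Record = Mapping[str, str]


def records_match(
    rec_a: Record,
    rec_b: Record,
    match_rule: Callable[[Record, Record], bool],
    express_key_field: str = "ExpressKey",  # hypothetical field name
) -> bool:
    """Compare express keys first; fall back to the full match rule."""
    key_a = rec_a.get(express_key_field)
    key_b = rec_b.get(express_key_field)
    # Identical, non-empty express keys count as a match without running the rule.
    if key_a and key_a == key_b:
        return True
    # Different (or missing) express keys: evaluate the match rule as usual.
    return match_rule(rec_a, rec_b)


def same_name(a: Record, b: Record) -> bool:
    """Stand-in for a real match rule."""
    return a["name"] == b["name"]


print(records_match({"ExpressKey": "SMTJ1", "name": "J Smith"},
                    {"ExpressKey": "SMTJ1", "name": "John Smith"}, same_name))  # True
```

The design point is cost: an exact key comparison is cheap, so skipping the rule for pairs with equal keys can reduce the number of full comparisons.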
- In the Initial Collection Number text box, specify the starting number to assign to the collection number field for duplicate records.
The collection number identifies each duplicate record in a match queue. Unique records are assigned a collection number of 0. Each duplicate record is assigned a collection number starting with the value specified in the Initial Collection Number text box.
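A minimal sketch of how collection numbers relate to the Initial Collection Number setting, assuming unique records keep the default collection number of 0 and each group of duplicates shares one number. The input shape (a list of match groups, where a one-record group is unique) is a hypothetical simplification.

```python
def assign_collection_numbers(groups, initial_collection_number=1):
    """Return (record, collection_number) pairs for a list of match groups.

    A group with one record is unique; a group with two or more records
    holds duplicates of each other.
    """
    numbered = []
    next_number = initial_collection_number
    for group in groups:
        if len(group) == 1:
            # Unique records get collection number 0 in this sketch.
            numbered.append((group[0], 0))
        else:
            # Every record in a duplicate group shares the next number.
            numbered.extend((record, next_number) for record in group)
            next_number += 1
    return numbered


print(assign_collection_numbers([["a"], ["b", "c"], ["d"], ["e", "f"]], 100))
# [('a', 0), ('b', 100), ('c', 100), ('d', 0), ('e', 101), ('f', 101)]
```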
- Click Sliding Window to enable this matching method. For more information, see Sliding Window Matching Method.
- Click Generate Data for Analysis to generate match results. For more information, see Analyzing Match Results.
- Assign collection number 0 to unique records is checked by default, so any unique records found in your dataflow are assigned a collection number of 0. Uncheck this option to assign collection numbers other than zero to unique records; the unique record collection numbers are then in sequence with the other collection numbers. For example, if your matching dataflow finds five records and the first three are unique, the collection numbers are assigned as shown in the first group below. If your matching dataflow finds five records and the first two are duplicates while the last three are unique, the collection numbers are assigned as shown in the second group below.
First three records unique:
Collection Number   Record Type
1                   Unique
2                   Unique
3                   Unique
4                   Duplicate/Suspect
4                   Duplicate/Suspect
Last three records unique:
Collection Number   Record Type
1                   Duplicate/Suspect
1                   Duplicate/Suspect
2                   Unique
3                   Unique
4                   Unique
- Select the Return match rule name option to include the selected match rule name in the stage output.
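When Assign collection number 0 to unique records is unchecked, every group of records, unique or duplicate, takes the next number in the sequence. This hypothetical helper reproduces the two example groups from that option's description; it assumes an Initial Collection Number of 1 and represents each match group as a list.

```python
def sequential_collection_numbers(groups, initial_collection_number=1):
    """Collection numbers when unique records are numbered in sequence."""
    numbers = []
    for offset, group in enumerate(groups):
        # Each group, unique (one record) or duplicate, gets its own number.
        numbers.extend([initial_collection_number + offset] * len(group))
    return numbers


# First example: three unique records, then a pair of duplicates.
print(sequential_collection_numbers([["u1"], ["u2"], ["u3"], ["d1", "d2"]]))  # [1, 2, 3, 4, 4]
# Second example: a pair of duplicates, then three unique records.
print(sequential_collection_numbers([["d1", "d2"], ["u1"], ["u2"], ["u3"]]))  # [1, 1, 2, 3, 4]
```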
- Select Return detailed match information to include detailed match information for your match rule in the output. For more information about the output fields, see Output.
Note: Enabling this option reduces overall stage performance.
- For information about modifying the other options, see Building a Match Rule.
- Click Evaluate to see how a suspect record scored against candidate records. For more information, see Interflow Match.