Options

  1. In the Load match rule field, select one of the predefined match rules which you can either use as-is or modify to suit your needs. If you want to create a new match rule without using one of the predefined match rules as a starting point, click New. You can only have one custom rule in a dataflow.
    Note: The Dataflow Options feature in Enterprise Designer enables the match rule to be exposed for configuration at runtime.
  2. Click Group By to select a field to use for grouping records in the match queue. Intraflow Match only attempts to match records against other records in the same match queue.
  3. Select the Sort box to perform a pre-match sort of your input based on the field selected in the Group By field.
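The effect of Group By can be pictured with a small sketch. It is not the stage's implementation: the function names and the `MatchKey` field are hypothetical, and the sort merely stands in for the optional pre-match sort. It shows only that records are compared within their own match queue and never across queues.

```python
from collections import defaultdict
from itertools import combinations

def build_match_queues(records, group_by):
    """Group records into match queues keyed by the group-by field (illustrative only)."""
    queues = defaultdict(list)
    # Sorting first mirrors the optional pre-match sort on the Group By field.
    for record in sorted(records, key=lambda r: r[group_by]):
        queues[record[group_by]].append(record)
    return dict(queues)

def candidate_pairs(queues):
    """Yield only pairs of records that share a match queue."""
    for key, queue in queues.items():
        for first, second in combinations(queue, 2):
            yield key, first, second

# Hypothetical input: "MatchKey" is an assumed field name.
records = [
    {"Name": "John Smith", "MatchKey": "SMI"},
    {"Name": "Bill Jones", "MatchKey": "JON"},
    {"Name": "Jon Smith", "MatchKey": "SMI"},
    {"Name": "J. Smith", "MatchKey": "SMI"},
]
queues = build_match_queues(records, "MatchKey")
pairs = list(candidate_pairs(queues))
```

With this data, the record in the JON queue is never compared to any record in the SMI queue.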
  4. Click Advanced to specify additional sort performance options.
    In memory record limit
    Specifies the maximum number of data rows a sorter will hold in memory before it starts paging to disk. By default, a sort of 10,000 records or fewer is done in memory and a sort of more than 10,000 records is performed as a disk sort. The maximum limit is 100,000 records. Typically an in-memory sort is much faster than a disk sort, so set this value high enough that most sorts run in memory and only large sets are written to disk.
    Note: Be careful in environments where there are jobs running concurrently because increasing the In memory record limit setting increases the likelihood of running out of memory.
    Maximum number of temporary files
    Specifies the maximum number of temporary files that may be used by a sort process. Using a larger number of temporary files can result in better performance. However, the optimal number is highly dependent on the configuration of the server running Spectrum Technology Platform. You should experiment with different settings, observing the effect on performance of using more or fewer temporary files. To calculate the approximate number of temporary files that may be needed, use this equation:
    (NumberOfRecords × 2) ÷ InMemoryRecordLimit = NumberOfTempFiles
    Note: The maximum number of temporary files cannot be more than 1,000.
    Enable compression
    Specifies that temporary files are compressed when they are written to disk.
    Note: The optimal sort performance settings depend on your server's hardware configuration. You can use this inequality as a general guideline to produce good sort performance: (InMemoryRecordLimit × MaxNumberOfTempFiles ÷ 2) >= TotalNumberOfRecords
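The two sizing formulas above can be checked with a few lines of arithmetic. This is a minimal sketch, not a product utility: the function names are hypothetical, and rounding the estimate up to a whole file is an assumption, since the text gives only an approximate figure.

```python
import math

MAX_TEMP_FILES = 1_000          # documented ceiling for temporary files
MAX_IN_MEMORY_LIMIT = 100_000   # documented ceiling for the in-memory record limit

def estimate_temp_files(number_of_records, in_memory_record_limit):
    """Apply (NumberOfRecords x 2) / InMemoryRecordLimit, capped at the documented maximum."""
    if not 0 < in_memory_record_limit <= MAX_IN_MEMORY_LIMIT:
        raise ValueError("in-memory record limit must be between 1 and 100,000")
    # Rounding up is an assumption; the documentation only calls the result approximate.
    estimate = math.ceil(number_of_records * 2 / in_memory_record_limit)
    return min(estimate, MAX_TEMP_FILES)

def settings_cover_input(in_memory_record_limit, max_temp_files, total_records):
    """Check the guideline (InMemoryRecordLimit x MaxNumberOfTempFiles / 2) >= TotalNumberOfRecords."""
    return in_memory_record_limit * max_temp_files / 2 >= total_records

# One million records with the default limit of 10,000 records
files_needed = estimate_temp_files(1_000_000, 10_000)
guideline_ok = settings_cover_input(10_000, files_needed, 1_000_000)
```

For one million records and the default limit, the estimate is 200 temporary files, which exactly satisfies the guideline. The result is only a starting point; the text recommends experimenting on the actual server.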
  5. Click Express Match On to perform an initial comparison of express key values to determine whether two records are considered a match.

    Express Key matching can be a useful tool for reducing the number of compares performed and thereby improving execution speed. A loose express key results in many false positive matches. You can generate an express key as part of generating a match key through MatchKeyGenerator. See Match Key Generator for more information.

    If two records have an exact match on the express key, the candidate is considered a 100% duplicate. If two records do not match on an express key value, they are compared using the rules-based method.

    To determine whether a candidate was matched using an express key, look at the value of the ExpressKeyIdentified field, which is either Y for a match or N for no match. Note that suspect records always have an ExpressKeyIdentified value of N.
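The decision described above can be sketched as a short function. This is an illustration under stated assumptions, not the stage's code: `compare_pair`, the `ExpressKey` and `MatchScore` names, the threshold of 80, and the handling of empty keys are all hypothetical. Only the ExpressKeyIdentified values Y and N come from the text.

```python
def compare_pair(suspect, candidate, rule, threshold=80.0):
    """Try the express key first; fall back to the rules-based comparison."""
    suspect_key = suspect.get("ExpressKey")
    # Assumption: a missing or empty express key never counts as an express match.
    if suspect_key and suspect_key == candidate.get("ExpressKey"):
        # Exact express key match: treated as a 100% duplicate, no rule evaluation needed.
        return {"MatchScore": 100.0, "ExpressKeyIdentified": "Y", "is_duplicate": True}
    # No express match: score the pair with the (hypothetical) match rule callable.
    score = rule(suspect, candidate)
    return {"MatchScore": score, "ExpressKeyIdentified": "N", "is_duplicate": score >= threshold}

rule_calls = []

def sample_rule(suspect, candidate):
    """Stand-in for a match rule; records each call and returns a fixed score."""
    rule_calls.append((suspect, candidate))
    return 50.0

express_hit = compare_pair({"ExpressKey": "A1"}, {"ExpressKey": "A1"}, sample_rule)
rules_path = compare_pair({"ExpressKey": "A1"}, {"ExpressKey": "B2"}, sample_rule)
```

The express hit returns without calling the rule at all, which is where the reduction in compares comes from.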

  6. In the Initial Collection Number text box, specify the starting number to assign to the collection number field for duplicate records.

    The collection number identifies each duplicate record in a match queue. Unique records are assigned a collection number of 0. Each duplicate record is assigned a collection number starting with the value specified in the Initial Collection Number text box.

  7. Select one of the following:
    Option    Description

    Compare suspect to all candidates

    This option matches the suspect to all candidates in the same match group (group by option) even if a duplicate is already found within the match group. For example:

    Suspect - John Smith
    Candidate - Bill Jones
    Candidate - John Smith
    Candidate - John Smith

    In the example, the suspect John Smith would be compared to both John Smith candidates.

    Check the Return Unique Candidates box to return records within a match group from the candidate port that have been identified as unique records.

    Stop comparing suspect against candidates after finding n duplicates

    This option matches the suspect to all candidates in the same match group (group by option) but stops comparing when the user-defined number of duplicates has been identified. For example, if you chose to stop comparing candidates after finding one duplicate and you had this data:

    Suspect - John Smith
    Candidate - Bill Jones
    Candidate - John Smith
    Candidate - John Smith

    In the example, the suspect record John Smith would stop comparing within the match group when the first John Smith candidate is identified as a duplicate.
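The difference between the two options can be sketched with the example data above. This is an illustration only: `find_duplicates` and its `stop_after` parameter are hypothetical names, and exact string equality stands in for a real match rule.

```python
def find_duplicates(suspect, candidates, is_match, stop_after=None):
    """Compare a suspect to candidates in order.

    stop_after=None models "Compare suspect to all candidates";
    stop_after=n models "Stop comparing suspect against candidates after finding n duplicates".
    """
    duplicates = []
    comparisons = 0
    for candidate in candidates:
        comparisons += 1
        if is_match(suspect, candidate):
            duplicates.append(candidate)
            if stop_after is not None and len(duplicates) >= stop_after:
                break  # enough duplicates found; skip the remaining candidates
    return duplicates, comparisons

def same_name(a, b):
    """Placeholder for a real match rule: exact string equality."""
    return a == b

candidates = ["Bill Jones", "John Smith", "John Smith"]
all_dupes, all_count = find_duplicates("John Smith", candidates, same_name)
first_dupes, first_count = find_duplicates("John Smith", candidates, same_name, stop_after=1)
```

Comparing against all candidates finds both John Smith records in three comparisons. Stopping after one duplicate finds the first John Smith after two comparisons and never examines the third candidate.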

  8. Click Generate Data for Analysis to generate match results. For more information, see Analyzing Match Results.
  9. The Assign collection number 0 to unique records option, which is checked by default, assigns a collection number of zero to unique records. Uncheck this option to assign unique records nonzero collection numbers that are in sequence with the other collection numbers. For example, if your matching dataflow finds five records and the first three are unique, the collection numbers are assigned as shown in the first table below. If the first two records are duplicates and the last three are unique, the collection numbers are assigned as shown in the second table below.

    Collection Number    Record Type
    1                    Unique
    2                    Unique
    3                    Unique
    4                    Duplicate/Suspect
    4                    Duplicate/Suspect

    Collection Number    Record Type
    1                    Duplicate/Suspect
    1                    Duplicate/Suspect
    2                    Unique
    3                    Unique
    4                    Unique
    If you leave this box checked, any unique records found in your dataflow will be assigned a collection number of zero by default.
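The numbering in the two tables above can be reproduced with a short sketch. It is not the stage's algorithm: the function name, its parameters, and the default starting value of 1 are assumptions. Each record is represented by the label of its duplicate group, or by None if it is unique.

```python
def assign_collection_numbers(groups, initial=1, zero_for_unique=True):
    """Assign collection numbers in input order.

    groups: one entry per record; a group label for duplicates/suspects, None for unique records.
    initial: stands in for the Initial Collection Number setting (default of 1 is an assumption).
    zero_for_unique: models the "Assign collection number 0 to unique records" box.
    """
    numbers = []
    next_number = initial
    assigned = {}  # group label -> collection number already handed out
    for group in groups:
        if group is None:
            if zero_for_unique:
                numbers.append(0)
            else:
                numbers.append(next_number)
                next_number += 1
        else:
            if group not in assigned:
                assigned[group] = next_number
                next_number += 1
            numbers.append(assigned[group])
    return numbers

# Box unchecked: three unique records followed by one duplicate group
first_table = assign_collection_numbers([None, None, None, "A", "A"], zero_for_unique=False)
# Box unchecked: one duplicate group followed by three unique records
second_table = assign_collection_numbers(["A", "A", None, None, None], zero_for_unique=False)
# Box checked (default): unique records receive 0
checked = assign_collection_numbers([None, None, None, "A", "A"])
```

With the box unchecked, the sketch yields 1, 2, 3, 4, 4 and 1, 1, 2, 3, 4, matching the tables. With the box checked, under the same assumptions, the unique records receive 0 and the duplicate group receives the initial number.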
  10. Select the Return match rule name option to include the selected match rule name in the stage output.
  11. Select Return detailed match information if you want detailed match information to be displayed as an output for your match rule. For more information about the output fields, see Output.
    Note: Enabling this option reduces overall stage performance.
  12. If you are creating a new custom matching rule, see Building a Match Rule for more information.
  13. Click Evaluate to evaluate how a suspect record scored against candidate records. For more information, see Interflow Match.