Configuring the grouping and sorting options

On the Intraflow Match Options: Intraflow Match page, configure these options in the Settings panel:
Option Description

Group by

Selects the field used to group records into match queues. Intraflow Match only attempts to match a record against other records in the same match queue.
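The effect of Group by on the number of comparisons can be illustrated with a small sketch. It assumes records are plain dictionaries and that a match queue is simply the set of records sharing the same value in the hypothetical Group by field (`zip` in the usage below). Records are sorted on that field first so that `itertools.groupby` sees each queue as one contiguous run, and candidate pairs are only generated within a queue. This is an illustration of the idea, not Spectrum Technology Platform's implementation.

```python
from itertools import combinations, groupby
from operator import itemgetter
from typing import Dict, List, Tuple

Record = Dict[str, object]


def candidate_pairs(records: List[Record], group_field: str) -> List[Tuple[Record, Record]]:
    """Return every pair of records that share a match queue (same group_field value)."""
    key = itemgetter(group_field)
    # Sort on the grouping field so each match queue is contiguous;
    # groupby only merges adjacent items with equal keys.
    ordered = sorted(records, key=key)
    pairs: List[Tuple[Record, Record]] = []
    for _, queue in groupby(ordered, key=key):
        # Records are only compared with other records in the same queue.
        pairs.extend(combinations(list(queue), 2))
    return pairs


records = [
    {"id": 1, "zip": "02134"},
    {"id": 2, "zip": "10001"},
    {"id": 3, "zip": "02134"},
    {"id": 4, "zip": "10001"},
]
print([(a["id"], b["id"]) for a, b in candidate_pairs(records, "zip")])
```

With four records split into two queues, only two pairs are compared instead of the six an ungrouped comparison would need.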

Sort

Performs a pre-match sort of your input on the field selected in Group by.

Advanced Select this button to specify additional sort options. To override your system's default sort performance options, set Override sort performance options to ON and specify the values you want in these fields:
In memory record limit
Specifies the maximum number of data rows a sorter holds in memory before it starts paging to disk. By default, a sort of 10,000 records or fewer is done in memory, and a sort of more than 10,000 records is performed as a disk sort. The maximum limit is 100,000 records. Because an in-memory sort is much faster than a disk sort, set this value high enough that most sorts are in-memory sorts and only large sets are written to disk.
Note: Be careful in environments where jobs run concurrently, because increasing the In memory record limit setting increases the likelihood of running out of memory.
Maximum number of temporary files
Specifies the maximum number of temporary files that a sort process may use. Using more temporary files can improve performance, but the optimal number depends heavily on the configuration of the server running Spectrum Technology Platform. Experiment with different settings and observe how using more or fewer temporary files affects performance. To approximate the number of temporary files that may be needed, use this equation:
(NumberOfRecords × 2) ÷ InMemoryRecordLimit = NumberOfTempFiles 
Note: The maximum number of temporary files cannot be more than 1,000.
Compression
Specifies that temporary files are compressed when they are written to disk.
Note: The optimal sort performance settings depend on your server's hardware configuration. As a general guideline for good sort performance, use this equation: (InMemoryRecordLimit × MaxNumberOfTempFiles ÷ 2) >= TotalNumberOfRecords
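The two sizing equations above are the same relationship written two ways, and a quick calculation helps when choosing values. The sketch below encodes the documented default (10,000), the maximum in-memory limit (100,000), and the maximum number of temporary files (1,000). Rounding the temporary-file estimate up to a whole file is an assumption of this sketch, and the function names are hypothetical; they are not part of any Spectrum Technology Platform API.

```python
DEFAULT_IN_MEMORY_RECORD_LIMIT = 10_000   # default stated in the text
MAX_IN_MEMORY_RECORD_LIMIT = 100_000      # documented maximum
MAX_TEMP_FILES = 1_000                    # documented maximum


def sorts_in_memory(number_of_records: int,
                    in_memory_record_limit: int = DEFAULT_IN_MEMORY_RECORD_LIMIT) -> bool:
    """True if the sort fits within the in-memory record limit."""
    return number_of_records <= in_memory_record_limit


def estimate_temp_files(number_of_records: int,
                        in_memory_record_limit: int = DEFAULT_IN_MEMORY_RECORD_LIMIT) -> int:
    """(NumberOfRecords × 2) ÷ InMemoryRecordLimit, rounded up (rounding is assumed)."""
    if not 0 < in_memory_record_limit <= MAX_IN_MEMORY_RECORD_LIMIT:
        raise ValueError("in-memory record limit must be between 1 and 100,000")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(number_of_records * 2) // in_memory_record_limit)


def meets_guideline(total_records: int, in_memory_record_limit: int,
                    max_temp_files: int) -> bool:
    """(InMemoryRecordLimit × MaxNumberOfTempFiles ÷ 2) >= TotalNumberOfRecords."""
    if not 0 < max_temp_files <= MAX_TEMP_FILES:
        raise ValueError("maximum number of temporary files must be between 1 and 1,000")
    # Multiply both sides by 2 to stay in integer arithmetic.
    return in_memory_record_limit * max_temp_files >= 2 * total_records


print(estimate_temp_files(1_000_000))          # default in-memory limit
print(estimate_temp_files(1_000_000, 100_000)) # maximum in-memory limit
```

For one million records, raising the in-memory limit from the default to the maximum reduces the estimated temporary files by a factor of ten, at the cost of the memory risk described in the note above.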
Express match on

Performs an initial comparison of express key values to determine whether two records are considered a match.
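A minimal sketch of an express-key check placed in front of a more expensive rule evaluation is shown below. It assumes each record already carries a precomputed express key in a field whose name is passed in (`ekey` in the usage), that empty keys never count as an express match, and that pairs whose express keys differ still go through the full match rules. The fallback behavior and the `match_rules` callable are assumptions for illustration, not documented Spectrum Technology Platform behavior.

```python
from typing import Callable, Dict

Record = Dict[str, str]


def records_match(
    a: Record,
    b: Record,
    express_key_field: str,
    match_rules: Callable[[Record, Record], bool],
) -> bool:
    """Express-key shortcut first, then (assumed) full match-rule evaluation."""
    key_a = a.get(express_key_field)
    key_b = b.get(express_key_field)
    # Express match: identical, non-empty express keys settle the pair at once.
    if key_a and key_b and key_a == key_b:
        return True
    # Assumed fallback: evaluate the full (hypothetical) match rules.
    return match_rules(a, b)


def rules(a: Record, b: Record) -> bool:
    # Hypothetical stand-in for a real match rule: case-insensitive name equality.
    return a["name"].lower() == b["name"].lower()


john = {"ekey": "SMITH01", "name": "John"}
jon = {"ekey": "SMITH01", "name": "Jon"}
print(records_match(john, jon, "ekey", rules))  # decided by the express key alone
```

The design point is that the express key comparison is cheap, so it can decide obvious matches before any per-field rule logic runs.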

Initial collection number

Specifies the starting number to assign to the collection number field for duplicate records.

The collection number identifies each duplicate record in a match queue. Unique records are assigned a collection number of 0. Each duplicate record is assigned a collection number starting with the value specified in the Initial collection number field.

Assign collection number 0 to unique records

Assigns a collection number of 0 to unique records. Clear this option to generate collection numbers other than zero for unique records. The unique record collection numbers are then assigned in sequence with the other collection numbers, as in these examples:

  • If your matching dataflow finds four records and the first two records are unique, the collection numbers are assigned as shown in the table below:
    Collection Number   Record Type
    1                   Unique
    2                   Unique
    3                   Duplicate or Suspect
    3                   Duplicate or Suspect
  • If your matching dataflow finds four records and the last two records are unique, the collection numbers are assigned as shown in the table below:
    Collection Number   Record Type
    1                   Duplicate or Suspect
    1                   Duplicate or Suspect
    2                   Unique
    3                   Unique
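The numbering in the two examples can be reproduced with a short sketch. It assumes the matcher has already produced match groups in output order, where a group of one record is unique and a larger group holds duplicate or suspect records, and that numbering starts at the Initial collection number (1 in the examples). The function and its parameters are hypothetical illustrations of the rules described above.

```python
from typing import List, Tuple


def assign_collection_numbers(
    groups: List[List[str]],
    initial: int = 1,
    zero_for_unique: bool = True,
) -> List[Tuple[str, int]]:
    """Return (record, collection number) pairs for match groups in output order."""
    result: List[Tuple[str, int]] = []
    next_number = initial
    for group in groups:
        if len(group) == 1 and zero_for_unique:
            # "Assign collection number 0 to unique records" is selected.
            result.append((group[0], 0))
            continue
        # Every record in a group shares one number; unique records get their
        # own number in sequence when the option is cleared.
        for record in group:
            result.append((record, next_number))
        next_number += 1
    return result


first_two_unique = [["r1"], ["r2"], ["r3", "r4"]]
last_two_unique = [["r1", "r2"], ["r3"], ["r4"]]
print([n for _, n in assign_collection_numbers(first_two_unique, zero_for_unique=False)])
print([n for _, n in assign_collection_numbers(last_two_unique, zero_for_unique=False)])
```

With the option cleared, the two calls reproduce the first and second tables; with it selected, the unique records in the first example would instead get 0 and the duplicate pair would get 1.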
Generate data for analysis Generates data about the match results that you can analyze. For more information, see Analyzing Match Results.
Return match rule name Includes the name of the selected match rule in the stage output.
Enable sliding window Enables the sliding window for this matching method and lets you enter a value in the Window size field. The default window size is 50.
The sliding window algorithm sequentially fills a buffer of predetermined size, called a window, with data rows. As each row is added to the window, it is compared to each item already in the window. If it matches an item, the same group ID is assigned to:
  • the driver record (the new item to add to the window), and
  • the candidates (items already in the window).

This comparison continues until the driver record has been compared to every item in the window.

As new drivers are added, the window eventually reaches its predetermined capacity. At this point the window slides, hence the term sliding window: the buffer removes and writes out the oldest item in the window as it adds the newest driver record.
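The sliding window behavior described above can be sketched with a bounded `deque`. This is a generic illustration, not the product's implementation, and several details are assumptions: `is_match` is a hypothetical stand-in for the configured match rules; the driver is compared against the full window before the oldest item is evicted; a driver that matches candidates from two different groups joins the first group it meets; and records still in the window at the end of input are written out in order.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def sliding_window_match(
    records: List[T],
    is_match: Callable[[T, T], bool],
    window_size: int = 50,
) -> List[Tuple[T, Optional[int]]]:
    """Return (record, group_id) pairs in the order records leave the window."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    window: Deque[list] = deque()  # each entry is [record, group_id]
    written: List[Tuple[T, Optional[int]]] = []
    next_group = 1

    for driver in records:
        driver_group: Optional[int] = None
        # Compare the driver against every candidate currently in the window.
        for entry in window:
            candidate, candidate_group = entry
            if not is_match(driver, candidate):
                continue
            if candidate_group is None:
                # Candidate is ungrouped: give it the driver's group ID,
                # opening a new group if the driver has none yet.
                if driver_group is None:
                    driver_group = next_group
                    next_group += 1
                entry[1] = driver_group
            elif driver_group is None:
                # Driver joins the candidate's existing group.
                driver_group = candidate_group
            # Otherwise the candidate keeps its own group (assumption).
        # Window at capacity: slide by removing and writing the oldest item.
        if len(window) == window_size:
            oldest_record, oldest_group = window.popleft()
            written.append((oldest_record, oldest_group))
        window.append([driver, driver_group])

    # End of input: write whatever is still in the window.
    written.extend((record, group) for record, group in window)
    return written


def same_initial(a: str, b: str) -> bool:
    # Hypothetical match rule: records match if they start with the same letter.
    return a[0] == b[0]


rows = ["a1", "a2", "b1", "a3"]
print(sliding_window_match(rows, same_initial, window_size=2))
print(sliding_window_match(rows, same_initial, window_size=1))
```

With a window of 2, "a3" still sees "a2" and joins its group; with a window of 1, "a2" has already slid out by the time "a3" arrives, so "a3" stays ungrouped. This trade-off between window size and missed matches is why the Window size value matters.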