Configuring the Filter Options

Configuring the grouping and sorting options

On the Filter Options: Filter page, specify these details:
Field Description

Group by

Specifies the field to use to create groups of records to filter. The Filter stage will retain one or more records from each group, depending on how you configure the stage. In cases where you have used a matching stage earlier in the dataflow, such as Interflow Match, Intraflow Match, or Transactional Match, you should select the CollectionNumber field to use the collections created by the matching stage as the groups. However, if you want to group records by some other field, choose the field here. For example, if you want to filter out all but one record from records that have the same value in the AccountNumber field, you would select AccountNumber.

Sort

If you specify a field in the Group by field, check this box to sort the records by the value in the field you chose. This option is enabled by default.

Advanced Click this button to specify sort performance options. Te sort performance options specified in Management Console, which are the default performance options for your system and are in effect. If you want to override your system's default performance options, toggle the Override sort performance options to yes and specify the values you want in these fields:
In memory record limit
Specifies the maximum number of data rows a sorter will hold in memory before it starts paging to disk. By default, a sort of 10,000 records or less will be done in memory and a sort of more than 10,000 records will be performed as a disk sort. The maximum limit is 100,000 records. An in-memory sort is much faster than a disk sort, so this value should be set high enough so that most of the sorts are in-memory sorts and only large sets are written to disk.
Note: Be careful in environments where there are jobs running concurrently because increasing the In memory record limit setting increases the likelihood of running out of memory.
Maximum number of temporary files
Specifies the maximum number of temporary files that may be used by a sort process. Using a larger number of temporary files can result in better performance. However, the optimal number is highly dependent on the configuration of the server running Spectrum Technology Platform. You should experiment with different settings, observing the effect on performance of using more or fewer temporary files. To calculate the approximate number of temporary files that may be needed, use this equation:
(NumberOfRecords × 2) ÷ InMemoryRecordLimit = NumberOfTempFiles 
Note that the maximum number of temporary files cannot be more than 1,000.
Compression
Specifies that temporary files are compressed when they are written to disk.
Note: The optimal sort performance settings depends on your server's hardware configuration. You can use this equation as a general guideline to produce good sort performance: (InMemoryRecordLimit × MaxNumberOfTempFiles ÷ 2) >= TotalNumberOfRecords

Limit number of returned duplicate records

Specifies the maximum number of records that are returned from each group. If you set this option to 1, you can define filter rules to determine which record in each group should be returned. If no rules are defined, the first record in each collection is returned and the rest are discarded. In this mode, the filter rules define which record will be retained.

For example, if you define a rule where the record with the highest match score in a group is retained, and you set this option to 1, then the record with the highest match score in each group will survive and the other records in the group will be discarded.

If you set this option to a value higher than one, you cannot specify filter rules.

Note: In the event no records in the collection meet the defined rule criteria, then no records from the group are returned.

Remove duplicates from collection

Specifies to use filter rules to determine which records are removed from the collection. The remaining records in the collection are retained. When this option is selected, you must define a rule.

Note: If a group contains only one record, the filter rules are ignored and the record is retained.

Configuring the filter rules

Filter rules determine which records in a group are to be retained or removed. If you select the option Limit number of returned duplicate records then the rules determine which records survive the filter. If you select the option Remove duplicates from collection then the rules determine which records are removed from the dataflow.
  1. To configure filter rules, click the Add Rule icon in the Rule Configuration section, and enter details as described in the table below.
    Field Description

    Field name

    Specifies the name of the dataflow field whose value you want to evaluate to determine whether to filter the record.

    Field Type

    Specifies the type of data in the field. It can be one of these:
    • Non-Numeric: Choose this option if the field contains non-numeric data (for example, string data).
    • Numeric: Choose this option if the field contains numeric data (for example, double, and float).

    Operator

    Specifies the type of comparison you want to use to evaluate the field. Use one of these options:
    • Contains: Determines if the field contains the value specified. For example, "sailboat" contains the value "boat".
    • Equal: Determines if the field contains the exact value specified.
    • Greater Than: Determines if the field value is greater than the value specified.
    • Greater Than Or Equal To: Determines if the field value is greater than or equal to the value specified.
    • Highest: Compares the field's value for all the records group and determines which record has the highest value in the field. For example, if the fields in the group contain values of 10, 20, 30, and 100, the record with the field value 100 would be selected. If multiple records are tied for the longest value, one record is selected.
    • Is Empty: Determines if the field contains no value.
    • Is Not Empty: Determines if the field contains any value.
    • Less Than: Determines if the field value is less than the value specified.
    • Less Than Or Equal To: Determines if the field value is less than or equal to the value specified.
    • Longest: Compares the field's value for all the records group and determines which record has the longest (in bytes) value in the field. For example, if the group contains the values "Mike" and "Michael", the record with the value "Michael" is selected. If multiple records are tied for the longest value, one record is selected.
    • Shortest: Compares the field's value for all the records group and determines which record has the shortest (in bytes) value in the field. For example, if the group contains the values "Mike" and "Michael", the record with the value "Mike" is selected. If multiple records are tied for the shortest value, one record is selected.
    • Lowest: Compares the field's value for all the records group and determines which record has the lowest value in the field. For example, if the fields in the group contain values of 10, 20, 30, and 100, the record with the field value 10 would be selected. If multiple records are tied for the lowest value, one record is selected.
    • Most Common: Determines if the field value contains the value that occurs most frequently in this field among the records in the group. If two or more values are most common, no action is taken.
    • Not Equal: Determines if the field value is not the same as the value specified.

    Value type

    Specifies the type of value you want to compare to the field's value. Use one of these values:

    Note: This option is not available if you select the operators Highest, is Empty, is Not empty, Lowest, Longest, Most Common and Shortest.
    • Field: Choose this option if you want to compare another dataflow field's value to the field.
    • String: Choose this option if you want to compare the field to a specific value.
    Value
    Specifies the value to compare to the field's value. If you selected Field in the Field type field, select a dataflow field. If you selected String in the Value type field, type the value you want to use in the comparison.
    Note: This option is not available if you select the operators Highest, is Empty, is Not empty, Lowest, Longest, Most Common and Shortest.
  2. To define another rule, click the Add Rule icon and specify details as described in the table above.
    Note: If you specify multiple rules, you will have to select a logical operator between each rule. Choose And if you want the new rule and the previous rule to both pass in order for the condition to be met. Select Or if you want either the previous rule or the new rule to pass in order for the condition to be met.

Example of a Filter Rule

This rule retains the record in each group with the highest value in the MatchScore field. Note that Value and Value Type options do not apply when the Operator is highest or lowest.

Field Name = MatchScore
Field Type = Numeric
Operator = Highest

This rule retains the record where the value in the AccountNumber is "12345".

Field Name = AccountNumber
Field Type = Numeric
Operator = Equals
Value Type = String
Value = 12345