Setting Default Sort Performance Options

Sorting large data sets can be one of the most time-consuming operations performed during batch processing, so setting appropriate sort performance options can have a significant impact on the performance of your jobs. Sort performance options control memory and disk utilization, allowing you to take full advantage of the available memory and disk capacity.

There are two places where you can configure sort performance settings. The first is in Spectrum Management Console, where you specify the default sort performance options for your system. The second is in dataflow stages that perform a sort. The Sorter stage, Read from File, Write to File, and all other stages that include a sort operation contain sort performance options. When you specify sort performance options in a stage, you override the default sort performance options for that stage only, so individual stages in a dataflow can use different settings.

This procedure describes how to set the default sort performance options for jobs run on your Spectrum Technology Platform server.

  1. Open Spectrum Management Console.
  2. Go to Flows > Defaults.
  3. Use these settings to control sort performance:
    In memory record limit
    Specifies the maximum number of data rows a sorter will hold in memory before it starts paging to disk. By default, a sort of 10,000 records or fewer is done in memory and a sort of more than 10,000 records is performed as a disk sort. The maximum limit is 100,000 records. An in-memory sort is typically much faster than a disk sort, so set this value high enough that most sorts are done in memory and only large sets are written to disk.
    Note: Be careful in environments where jobs run concurrently, because increasing the In memory record limit setting increases the likelihood of running out of memory.
    Maximum number of temporary files
    Specifies the maximum number of temporary files that may be used by a sort process. Using a larger number of temporary files can result in better performance. However, the optimal number is highly dependent on the configuration of the server running Spectrum Technology Platform. You should experiment with different settings, observing the effect on performance of using more or fewer temporary files. To calculate the approximate number of temporary files that may be needed, use this equation:
    (NumberOfRecords × 2) ÷ InMemoryRecordLimit = NumberOfTempFiles
    Note: The maximum number of temporary files cannot be more than 1,000.
    Enable compression
    Specifies that temporary files are compressed when they are written to disk.
    Note: The optimal sort performance settings depend on your server's hardware configuration. You can use this formula as a general guideline to produce good sort performance: (InMemoryRecordLimit × MaxNumberOfTempFiles ÷ 2) >= TotalNumberOfRecords
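
The two formulas above can be tried out with a few lines of Python before you change any settings. This is only an illustrative sketch and not part of Spectrum Technology Platform: the function names are hypothetical, the limits of 100,000 records and 1,000 temporary files are taken from the descriptions above, and rounding the estimate up to a whole number of files is an assumption.

```python
# Illustrative helper for the sort-sizing formulas described above.
# Function and constant names are hypothetical; they are not a product API.

MAX_IN_MEMORY_RECORD_LIMIT = 100_000  # upper bound stated for "In memory record limit"
MAX_TEMP_FILES = 1_000                # upper bound stated for temporary files


def estimate_temp_files(number_of_records: int, in_memory_record_limit: int) -> int:
    """Return (NumberOfRecords x 2) / InMemoryRecordLimit, rounded up.

    Rounding up is an assumption; the documented formula is approximate.
    """
    if number_of_records < 0:
        raise ValueError("number_of_records must not be negative")
    if not 0 < in_memory_record_limit <= MAX_IN_MEMORY_RECORD_LIMIT:
        raise ValueError("in_memory_record_limit must be between 1 and 100,000")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(number_of_records * 2) // in_memory_record_limit)


def satisfies_guideline(total_records: int,
                        in_memory_record_limit: int,
                        max_temp_files: int) -> bool:
    """Check (InMemoryRecordLimit x MaxNumberOfTempFiles / 2) >= TotalNumberOfRecords."""
    if not 0 < max_temp_files <= MAX_TEMP_FILES:
        raise ValueError("max_temp_files must be between 1 and 1,000")
    # Multiply both sides by 2 so the comparison stays in whole numbers.
    return in_memory_record_limit * max_temp_files >= total_records * 2


if __name__ == "__main__":
    records = 1_000_000
    limit = 10_000  # the default in-memory record limit
    files = estimate_temp_files(records, limit)
    print("estimated temporary files:", files)
    print("guideline satisfied:", satisfies_guideline(records, limit, files))
```

If the estimate exceeds 1,000 temporary files, the guideline cannot be met at that in-memory record limit, so the remaining option is to raise the limit, keeping in mind the memory caution for concurrent jobs.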