Design guidelines for optimal performance
Careful flow design is the most important thing you can do to achieve good performance on Spectrum Technology Platform. These guidelines describe techniques you can use to optimize flow performance.
Minimize the Number of Stages
Spectrum Technology Platform achieves high performance through parallel processing. Each stage in a flow runs asynchronously in its own thread. However, it is possible to overthread the processors when executing certain types of flows. When this happens, the system spends as much or more time managing threads as doing "real work". We have seen flows with as many as 130 individual stages that perform very poorly on smaller servers with one or two processors.
So the first consideration in designing flows that perform well is to use as many stages as needed, but no more. Some examples of using more stages than needed are:
- Using multiple conditional routers where one would suffice
- Defining multiple transformer stages instead of combining the transforms in a single stage
Fortunately, it is usually possible to redesign these flows to remove redundant or unneeded stages and improve performance.
For complex flows, consider using embedded flows or subflows to reduce clutter on the canvas and make it easier to view and navigate the flow. Using embedded flows does not have a performance benefit at runtime, but it does make it easier to work with flows in Spectrum Enterprise Designer. Using subflows to simplify complex flows can improve Spectrum Enterprise Designer performance when editing flows.
Reduce Record Length
Because data is passed between concurrently executing stages, another consideration is the length of the input records. Generally, input with a longer record length takes longer to process than input with a shorter record length, simply because there is more data to read, write, and sort. Flows with multiple sort operations particularly benefit from a reduced record length. For very large record lengths, it can be faster to remove the unnecessary fields from the input before running the Spectrum Technology Platform job, then append them back to the resulting output file.
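The strip-and-append approach can be sketched outside the platform as follows. This is a minimal illustration in Python, not a Spectrum Technology Platform feature. It assumes that every record carries a unique key field (the name `record_id` in the usage note is hypothetical), that the job passes that key through to its output unchanged, and that the set-aside fields fit in memory.

```python
def split_fields(rows, key, keep):
    """Split each record into a slim record and a side record.

    The slim record holds the key plus the fields the job needs (keep).
    The side record holds the key plus every other field.
    """
    slim, side = [], []
    for row in rows:
        slim.append({name: row[name] for name in [key, *keep]})
        side.append({name: value for name, value in row.items()
                     if name == key or name not in keep})
    return slim, side


def rejoin(processed, side, key):
    """Append the set-aside fields back onto the processed records.

    Assumes the key is unique and survives processing unchanged.
    """
    lookup = {row[key]: row for row in side}
    return [{**lookup[row[key]], **row} for row in processed]
```

For example, calling `split_fields(rows, "record_id", ["address"])` yields a slim input to feed to the job and a side list to hold back. After the job runs, `rejoin(output_rows, side, "record_id")` restores the held-back fields. For input too large to hold in memory, the same idea would need a disk-based join instead of a dictionary lookup.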
Use Sorting Appropriately
Another consideration is to minimize sort operations. Sorting is often more time-consuming than other operations, and can become problematic as the number and size of input records increase. However, many Spectrum Technology Platform stages either require or prefer sorted input data. Spectrum Universal Addressing and Enterprise Geocoding, for example, perform optimally when the input is sorted by country and postal code. Stages such as Intraflow Match and Interflow Match require that the input be sorted by the "group by" field. In some cases, you can use an external sort application to presort the input data, which can be faster than sorting within the Spectrum Technology Platform flow.
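As a rough illustration of presorting outside the flow, the following Python sketch sorts a delimited file by country and postal code before it is handed to the job. It assumes a CSV input with a header row. The column names `Country` and `PostalCode` in the usage note are hypothetical, and the sort is a plain string comparison done in memory, so it only suits files that fit in RAM. A dedicated external sort application would be the choice for larger inputs.

```python
import csv


def presort(in_file, out_file, sort_fields):
    """Copy a CSV file, ordering its rows by the given columns.

    Rows are compared as strings, column by column, in the order
    listed in sort_fields. The whole file is held in memory.
    """
    reader = csv.DictReader(in_file)
    fieldnames = reader.fieldnames or []
    rows = sorted(reader, key=lambda row: tuple(row[f] for f in sort_fields))
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

A typical call opens both files with `newline=""`, as the `csv` module recommends, and passes `["Country", "PostalCode"]` as `sort_fields`. The sorted copy then becomes the input to the flow, which can skip its own sort stage if nothing upstream disturbs the order.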