Distributed Processing

If you have a very complex job, or you are processing a very large data set such as one containing millions of records, you may be able to improve flow performance by distributing the processing of the flow to multiple instances of the Spectrum Technology Platform server on one or more physical servers.

The most scalable solution for distributed processing is to install Spectrum Technology Platform in a cluster. See the Installation Guide for instructions on installing and configuring a cluster.
Note: While it is also possible to use distributed processing on a single Spectrum Technology Platform server, the following information describes using distributed processing in a cluster. If you are using a single server, distributed subflow processing is broken up into microbatches and processed by the one server instead of by the cluster.

Once your clustered environment is set up, you can build distributed processing into a flow by creating subflows for the parts of the flow that you want to distribute to multiple servers. Spectrum Technology Platform manages the distribution of processing automatically after you specify a few configuration options for the subflow.

Distributed processing looks like this:



As records are read into the subflow, the data is grouped into batches. These batches are then written to the cluster, and each batch is automatically distributed to a node in the cluster, which processes it. This processing is called a microflow. A subflow may be configured to allow multiple microflows to be processed simultaneously, potentially improving the performance of the flow. When the distributed instance finishes processing a microflow, it sends the output back into the parent flow.
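The batching and parallel processing described above can be illustrated with a minimal conceptual sketch. This is not Spectrum Technology Platform code or its API: the function names (make_batches, process_microflow, run_distributed) and the parameters (batch_size, max_microflows) are hypothetical, and a local thread pool stands in for the nodes of a cluster. The sketch returns output in input order, which is a property of this example only.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def make_batches(records, batch_size):
    """Group an iterable of records into lists of at most batch_size items."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_microflow(batch):
    """Hypothetical stand-in for the subflow logic that one node runs on a batch."""
    return [record.upper() for record in batch]


def run_distributed(records, batch_size, max_microflows):
    """Process batches concurrently and feed the output back to the caller.

    The thread pool plays the role of the cluster nodes (an assumption
    made for illustration); max_microflows caps how many batches are
    processed at the same time.
    """
    with ThreadPoolExecutor(max_workers=max_microflows) as pool:
        results = pool.map(process_microflow, make_batches(records, batch_size))
        # Flatten the per-batch results into a single output stream.
        return [record for batch in results for record in batch]


if __name__ == "__main__":
    print(run_distributed(["a", "b", "c", "d", "e"], batch_size=2, max_microflows=2))
```

In this sketch, raising max_microflows corresponds to allowing more microflows to run at once, and batch_size corresponds to the number of records grouped into each batch.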

The more Spectrum Technology Platform nodes you have, the more microflows can be processed simultaneously, allowing you to scale your environment as needed to obtain the performance you require.

Once set up, a clustered environment is easy to maintain because all nodes in the cluster automatically synchronize their configuration. The settings you apply through Spectrum Management Console and the flows you design in Spectrum Enterprise Designer are available to all instances automatically.