Types of Flows

A dataflow is a series of operations that takes data from a source, processes that data, and then writes the output to a destination. The processing can be anything from simple sorting to more complex data quality and enrichment actions. The concept of a dataflow is simple, but you can design very complex dataflows with branching paths, multiple sources of input, and multiple output destinations.

There are four types of dataflows: jobs, services, subflows, and process flows.

Job

A job is a dataflow that performs batch processing. A job reads data from one or more files or databases, processes that data, and writes the output to one or more files or databases. Jobs can be run manually in Enterprise Designer or from a command line using the job executor.
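For example, an exposed job might be run from a command line with an invocation like the following. This is a sketch, not an exact command: the server name, credentials, and job name are placeholders, and the job executor's option names can vary by version, so check the job executor documentation for your installation.

    # Placeholder host, credentials, and job name; options may differ by version.
    java -jar jobexecutor.jar -h myserver -u admin -p mypassword -j "Standardize Names"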

The following dataflow is a job. Note that it uses a Read from File stage for input and two Write to File stages for output.



Service

A service is a dataflow that you can access as a web service or by using the Spectrum™ Technology Platform API. You pass a record to the service, optionally specifying the options to use when processing the record. The service processes the record and returns the result.
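For example, a service exposed as a REST web service can typically be called with a single HTTP request, passing input fields as query parameters. The following Python sketch uses a hypothetical server, credentials, service name, and field names, and assumes the common URL pattern /rest/<ServiceName>/results.json; adjust all of these to match your installation.

    import requests

    # Hypothetical server, credentials, and input fields for illustration.
    # Input fields are commonly passed as Data.<FieldName> query parameters.
    server = "http://myserver:8080"
    service = "ValidateAddress"

    response = requests.get(
        f"{server}/rest/{service}/results.json",
        params={
            "Data.AddressLine1": "1 Global View",
            "Data.City": "Troy",
            "Data.StateProvince": "NY",
        },
        auth=("admin", "mypassword"),  # replace with real credentials
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())  # the processed record returned by the service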

Some services become available when you install a module. For example, when you install the Universal Addressing Module, the ValidateAddress service becomes available on your system. In other cases, you must create a service in Enterprise Designer and then expose that service on your system as a user-defined service. For example, the Location Intelligence Module's stages are not available as services unless you first create a service using the module's stages.

You can also design your own custom services in Enterprise Designer. For example, the following dataflow determines if an address is at risk for flooding:



Note: Because the service name, option names, and field names ultimately become XML elements, they may not contain characters that are invalid in XML element names (for example, spaces are not valid). Services that do not meet this requirement will still function but will not be exposed as web services.
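For example, using hypothetical names, a service named ValidateUSAddress produces a well-formed request element, while a name containing spaces does not:

    <!-- "ValidateUSAddress" is a valid XML element name, so the service
         can be exposed as a web service. -->
    <ValidateUSAddress>...</ValidateUSAddress>
    <!-- A name such as "Validate US Address" contains spaces, which are
         invalid in XML element names, so that service would still function
         but would not be exposed as a web service. -->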

Subflow

A subflow is a dataflow that can be reused within other dataflows. Subflows are useful when you want to build a process once and easily incorporate it into multiple dataflows. For example, you might create a subflow that performs deduplication using certain settings in each stage, so that the same deduplication process can be applied wherever it is needed. To do this, you could create a subflow like this:

You could then use this subflow in a dataflow. For example, you could use the deduplication subflow within a dataflow that performs geocoding so that the data is deduplicated before the geocoding operation:

In this example, data would be read from a database and passed to the deduplication subflow, where it would be processed through Match Key Generator, then Intraflow Match, then Best of Breed. The deduplicated data would then be sent out of the subflow and on to the next stage in the parent dataflow, in this case Geocode US Address. Subflows are represented by a puzzle piece icon in the dataflow, as shown above.

Subflows that are saved and exposed are displayed in the User Defined Stages folder in Enterprise Designer.

Process Flow

A process flow executes a series of activities such as jobs and external applications. Each activity in the process flow executes after the previous activity finishes. Process flows are useful if you want to run multiple dataflows in sequence or if you want to run an external program. For example, a process flow could run a job to standardize names, run another job to validate addresses, and then invoke an external application to sort the records into the proper sequence to claim postal discounts. Such a process flow would look like this:

In this example, the jobs Standardize Names and Validate Addresses are exposed jobs on the Spectrum™ Technology Platform server. Run Program invokes an external application, and the Success activity indicates the end of the process flow.
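Like jobs, exposed process flows can be run from a command line using the process flow executor. The following invocation is a hypothetical sketch patterned after the job executor example above; the jar name and option names are assumptions, so consult the process flow executor documentation for your version.

    # Hypothetical invocation; jar and option names may differ in your version.
    java -jar pflowexecutor.jar -h myserver -u admin -p mypassword -r "Claim Postal Discounts"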