Record Joiner

Record Joiner performs a SQL-style JOIN operation to combine records from different streams based on a relationship between fields in the streams. You can use Record Joiner to join records from multiple files, multiple databases, or any upstream channels in the flow. You must connect at least two input channels to the Record Joiner. The results of the JOIN operation are then written to one output channel. Optionally, records that do not match the join condition can be written to a separate output channel.

Using Record Joiner

To use the Record Joiner stage in a new Flow, perform these steps:
  1. On the Spectrum Flow Designer Home page, click New.
  2. On the New Flow page, click Job, Service, or Subflow, as required and then click the correcsponding blank canvas.
  3. Click Ok.
  4. In the dialog box that appears, give a name to the Flow, Job, Service, or Subflow you are creating.
  5. Click Ok.
  6. From the Palette Panel, drag the Record Joiner stage to the canvas.
    Note: Record Joiner is one of the Control Stages.
  7. Drag all Sources (of the records that are to be joined) to the canvas and connect their Output Port to the Record Joiner Input Port.
  8. Drag the Sink for the joined records and connect its Input Port to the Record Joiner Output Port.
  9. Configure the Sources. See the documentation of the respective stage for field-level details.
  10. Click Record Joiner and configure the Join Definition as described below.
    Option Description
    Left input stage

    From the dropdown, select the stage, records of which you want to use as the left table in the JOIN operation. The selected stage is displayed as Select left input field(s) for the join condition.

    All other Source stages move under Select right input field(s) that match <stage selected in the left table> field(s) for the join condition. Records from these stages will be used as right tables in the JOIN operation.

    Note: "Left" table and "Right" table are SQL JOIN concepts. Before using Record Joiner you should have a good understanding of the SQL JOIN operation. For more information, see wikipedia.org/wiki/Join_(SQL).
    Select left input field(s) for the join condition Select the fields from the "Left" table for the JOIN operation
    Select right input field(s) that match <stage selected in the left table> field(s) for the join condition Select the fields for the Right tables in the JOIN operation.
    Note: The valid data types for join fields are integer, string, datetime, date, long, float, double, and big decimal.
    Join type

    Select the type of JOIN operation you want to perform.

    Inner
    Returns only those records that have a match between the left port and another port. For example, if you have four input sources and port 1 is the left port, an inner join will return records that have matching fields between port 1 and port 2, port 1 and port 3, and port 1 and port 4.
    Left
    Returns all records from the left port even if there are no matches between the left port and the other ports.
    Full
    Returns all records from all ports.
    Data from the previous stage is sorted

    Specifies whether the records are already sorted by the specified Join Fields. If the records are already sorted, checking this box can improve performance. If you do not check this box, Record Joiner will sort the records according to the specified Join Fields before performing the join operation.

    If you have specified multiple join fields, then the records must be sorted using the order of the fields listed in Join Fields. For example, if you have two join fields:

    • Amount
    • Region

    Then the records must be sorted first by the Amount field, then by the Region field.

    Important: If you select this option but the records are not sorted, you will get incorrect results from Record Joiner. Select this option only when you are sure that the records in the left port are already sorted.
  11. Click the Field Resolution tab, and configure it.

    This tab specifies which port's data to use in the joined record in cases where the same field name exists in more than one input port. For example, if you are performing a join on two sources of data, and each source contains a field named DateOfBirth, you can specify which port's data is to be used in the DateOfBirth field in the joined record.

    If there are fields of the same name but with different data, and you want to preserve both fields' data in the joined record, you must rename one of the fields before the data is sent to the Record Joiner. You can use the Transformer stage to rename fields.

Handling Records That Are Not Joined

In order for a record to be included in the Record Joiner output it must meet the join condition, or a join type must be selected that returns both joined records and those that did not meet the join condition. For example, a full join will return all records from all input ports regardless of whether a record meets the join condition. In the case of a join type that does not return all records from all ports, such as a left outer join or an inner join, only records that match the join condition are included in the Record Joiner output.

To capture the records that are not included in the result of the join operation, use the not_joined output port. The output from this port contains all records that were not included in the regular output port.

Records that come out of this port have the field InputPortIndex added to them. This field contains the number of the Record Joiner input port where the record came from. This allows you to identify the source of the record.

Note:
  • For optimal performance of this stage, ensure two independent streams of records are joined to generate a consolidated output.
  • If a single path is first branched using either a broadcaster or conditional router then re-joined back using a Record Joiner, the flow may hang. In case multiple stages are used between branching and joining, use the Sorter as close to the Record Joiner as possible.

Math

The Math stage handles mathematical calculations on a single data row and allows you to conduct a variety of math functions using one or more expressions. Data is input as strings but the values must be numeric or Boolean, based on the type of operation being performed on the data.

  1. Under Control Stages, click the Math stage and drag it to the canvas, placing it where you want on the flow.
  2. Connect the stage to other stages on the canvas.
  3. Double-click the Math stage. The Math Options dialog box appears, with the Expressions tab open. This view shows the input fields, the Calculator, and the Expressions canvas. Alternately, you can click the Functions tab to use functions instead of the Calculator.
The Input fields control lists the valid fields found on the input port. Field name syntax is very flexible but has some restrictions based on Groovy scripting rules. If you are not familiar with Groovy scripting, see this website for complete information about Groovy:groovy-lang.org.
Note: This stage is not available in the tech preview version of Flow Designer.