Record Joiner

Record Joiner performs a SQL-style JOIN operation to combine records from different streams based on a relationship between fields in the streams. You can use Record Joiner to join records from multiple files, multiple databases, or any upstream channels in the dataflow. You must connect at least two input channels to Record Joiner. The results of the JOIN operation are then written to one output channel. Optionally, records that do not match the join condition can be written to a separate output channel.

Note: Before using Record Joiner you should have a good understanding of the SQL JOIN operation. For more information, see wikipedia.org/wiki/Join_(SQL).

Join Definition

Option Description

Left port

The port whose records you want to use as the left table in the JOIN operation. All other input ports will be used as right tables in the JOIN operation.

Note: "Left" table and "right" table are SQL JOIN concepts. Before using Record Joiner you should have a good understanding of the SQL JOIN operation. For more information, see wikipedia.org/wiki/Join_(SQL).

Join type

The type of JOIN operation you want to perform. One of the following:

Left Outer
Returns all records from the left port even if there are no matches between the left port and the other ports. This option returns all records from the left port plus any records that match in any of the other ports.
Full
Returns all records from all ports.
Inner
Returns only those records that have a match between the left port and another port. For instance, if you have four input sources and port 1 is the left port, an inner join will return records that have matching fields between port 1 and port 2, port 1 and port 3, and port 1 and port 4.

Join Fields

The field or fields from the left port that must match the data in a field from another port in order for the records to be joined.

Note: Only fields that have a data type of string or integer or date or datetime can be used as join fields.

Data from the left port is sorted

Specifies whether the records in the left port are already sorted by the field specified in Join Fields. If the records are already sorted, checking this box can improve performance. If you do not check this box, Record Joiner will sort the records according to the field specified in Join Fields before performing the join operation.

If you have specified multiple join fields, then the records must be sorted using the order of the fields listed in Join Fields. For example, if you have two join fields:

  • Amount
  • Region

Then the records must be sorted first by the Amount field, then by the Region field.

Important: If you select this option but the records are not sorted, you will get incorrect results from Record Joiner. Only select this option if you are sure that the records in the left port are already sorted.

Join Definitions

Describes the join conditions that will be used to determine if a record from the left port should be joined with a record from one of the other ports. For example:

port1.Name = port2.Name

This indicates that if the value in the Name field of a record from port1 matches the value in the Name field of a record from port2, the two records will be joined.

To modify a join condition, click Modify. Select a field from the right port whose data must match the data in the join field from the left port in order for the records to be joined. If you want to change the left port field, click Cancel and change it in the Join Fields field. If the records in the right port are sorted by the join field, check the box Data from the right port is sorted. Checking this box can improve performance.

Important: If you select Data from the right port is sorted but the records are not sorted, you will get incorrect results from Record Joiner. Only select this option if you are sure that the records in the right port are already sorted.

Field Resolution

This tab specifies which port's data to use in the joined record in cases where the same field name exists in more than one input port. For example, if you are performing a join on two sources of data, and each source contains a field named DateOfBirth, you can specify which port's data to use in the DateOfBirth field in the joined record.

If there are fields of the same name but with different data, and you want to preserve both fields' data in the joined record, you must rename one of the fields before the data is sent to Record Joiner. You can use the Transformer stage to rename fields.

Handling Records That Are Not Joined

In order for a record to be included in the Record Joiner output it must meet the join condition, or a join type must be selected that returns both joined records and those that did not meet the join condition. For example, a full join will return all records from all input ports regardless of whether a record meets the join condition. In the case of a join type that does not return all records from all ports, such as a left outer join or an inner join, only records that match the join condition are included in the Record Joiner output.

To capture the records that are not included in the result of the join operation, use the not_joined output port. The output from this port contains all records that were not included in the regular output port. The not_joined output port is the white triangle on the right side of the Record Joiner stage as shown here:

Records that come out of this port have the field InputPortIndex added to them. This field contains the number of the Record Joiner input port where the record came from. This allows you to identify the source of the record.

Note:
  • For optimal performance of this stage, ensure two independent streams of records are joined to generate a consolidated output.
  • If a single path is first branched using either a broadcaster or conditional router and then joined back using a Record Joiner, the flow may hang. In case multiple stages are used between branching and joining, use the Sorter as close to the Record Joiner as possible.