Input

The Import to Hub stage requires that your dataflow contain two channels: one that provides data for entities going into the Entity Port (the top port) and one that provides data for relationships going into the Relationship port (the bottom port). This requirement could be met by two source stages (each containing one input file), or it could come from multiple source stages that feed into Record Combiners and ultimately become two streams, or it could come from one source file whose data goes through a Conditional Router or a Splitter that outputs the data into two streams. It doesn't matter which method you use as long as the end result is a channel of entity data and a channel of relationship data that go into the Import to Hub stage.

Entity Data

Data going into the Entity Port needs to include both type and ID information for your entities. You can have a Type field ("Person") and an ID field ("Bob"), or you can have just an ID field that combines both type and ID information, separated by a colon ("Person:Bob"). For instance, your file could look something like the comma-delimited data below. The Type field tells us that the entities are people and places, and the ID field provides the names of the people and places.

Alternatively, your input file could contain a single field that combines both type and ID:

Note: The fields that contain type and ID data do not actually need to be named "Type" and "ID"; any field name is acceptable.

Relationship Data

Data going into the Relationship Port needs to include fields that identify source types, source IDs, target types, target IDs, and labels that identify the relationships between the sources and targets. Note that all source and target entity information must reference entities that are provided on the Entity Port. Your relationship data may also include properties about those relationships. For instance, your file could look something like the data below. In this case, the SourceType field tells us that all sources are people, and the TargetType field tells us that the targets are people and places. The SourceID field provides names of all the sources, and the TargetID field provides names of the people and places. The Label field identifies the relationships, in this case "works with", "works at", or "lives at".

Sorting Requirements

The Import to Hub stage requires that input data be sorted in a certain manner. The entity input file must be sorted first on type, then on ID, in an ascending manner. The entity data shown above includes the necessary fields but is not sorted correctly. In order for an Import to Hub dataflow to run correctly, that entity data would need to look like this:
Or this, for combined fields:
The relationship input file must be sorted as well. If your relationship data includes both type and ID in the same field, the input file should be sorted as follows in ascending order:
  • Source type/ID
  • Target type/ID
  • Label
  • Unique ID (optional)
If your relationship data contains type information in a separate field, the input file should be sorted in an ascending manner with those fields broken out:
  • Source type
  • Source ID
  • Target type
  • Target ID
  • Label
  • Unique ID (optional)
As with the entity data, the relationship data shown above includes the necessary fields but is not sorted correctly. In order for an Import to Hub dataflow to run correctly, that relationship data would need to look like this:
Or this, for combined fields: