Example: Virtual Metadata in Data Hub Models

A use case for this virtual metadata might be when you want to access a large amount of customer information at one time, such as profile data, purchasing history, and so on. Storing and replicating all of this information "physically" in a Data Hub model presents potential barriers:

  • The volume of some data is too large or too widely distributed to be contained in a central hub.
  • The data changes frequently and would need to be replicated in the hub too often to be practical.
  • There are business or privacy concerns over sensitive information such as Social Security numbers, payroll figures, pricing data, and so on.

By storing some of the information externally and then accessing it virtually, you are still able to obtain a complete view of your customer while mitigating the concerns of duplicating data in the Data Hub.

The Data Hub Model

You have an existing model containing two types of entities, Customer and CustomerMaster, with a relationship of HasMasterRecord.

Customer entities contain the following information:

  • Customer ID (unique for each customer)
  • Name
  • Date of birth
  • Gold Club status (indicates whether the customer is a member of the Gold Club)

CustomerMaster entities contain the following information:

  • Master ID (unique for each household)
  • Customer ID
  • Name
  • Date of birth

The image below shows a model that is built using this metadata. Notice that George and Martha Washington have separate Customer entities, but they share the same CustomerMaster entity, because the master ID represents a household, not an individual.

The External Tables

We want to link customers to their purchase history, which we will create using three external tables—one for customers, one for purchases, and one for products.

We use the same information for the Customers table as we did to populate the Customer entities:

The Purchases table contains order numbers, product IDs, and purchase dates, and it indicates whether the purchase was made online. It ties this data to the customer ID, which was also used in the Customer table.

The Products table contains product names, model numbers, and descriptions. It ties this data to the product ID, which was also used in the Purchases table.

The Purchases table will serve as a join table that establishes connections between records in the Customers and Products tables. The image below illustrates the relationships among the various fields in the three external tables.

Integrating all of this information would produce the following record for Martha Washington:

Linking Physical and Virtual Data

To create relationships between entities in the Data Hub model and records in the external tables, the Data Hub entities must have properties that reference the external data. To link from the Data Hub model to records in the Customers table there must be a property in the external table that contains the same values that are stored in the entities' CustomerID field. You can see in the following image that the Martha Washington entity has a property called CustRef, which contains the same value as the CustomerID field in the Customers table.

To establish the link between the Customer entity and the external table, we must first create a virtual entity for the Customers table. The procedure for doing this is described in Create Virtual Entities. The following image shows the completed screen for the new entity type. Notice that "CustomerID" is selected as the Primary Key, which is the property used to link between the virtual and physical entities.

After the virtual entity is added, we need to add a relationship linking that entity to the physical entity already in the model. The procedure for doing this is described in Connect relationships to virtual entities. The following image shows the completed screen for the new relationship type. Notice that the Entity Source ID refers to the CustRef property and the Entity Target ID refers to the CustomerID property. This is the link that ties the virtual entity to the physical entity because both properties include the customer ID.

In this example, the relationship between customers is a one-to-one relationship, meaning for every physical entity there should be at most one virtual entity and for each virtual entity there should be at most one physical entity. For one-to-one relationships, as well as one-to-many relationships, there is no need for a separate join table, so we specify Customers as the table and CustomerID for both the Link Source ID and the Link Target ID.

The metadata for the model now includes a link to the virtual data. The blue box next to the ERP_Customer entity denotes that the entity is virtual. Similarly, the blue box next to the MasterErpRef relationship denotes that it connects to a virtual entity.

Many-to-Many Relationships

Now that we have linked the virtual customer to the physical customer, we need to link purchases and products as well. The optimal approach is to create a virtual entity for products and create a many-to-many virtual relationship between physical Customers and virtual Products using the Purchases table as a join table. We first need to create a virtual entity for the Products table that uses the ProductID as the Primary Key.

We next add a new relationship that uses the Purchases table as a join table to achieve a many-to-many relationship. Customer entities are linked with Product entities based on information in the Purchases table. Notice that the CustomerID property is used as the link ID for the Customer entities and the ProductID property is used as the link ID for the Product entities. The Purchases table contains both of these properties, which maps information from Customer to Product.

The Entity Source ID (CustRef) is the property of the Customer entity that will be matched to the Link Source ID column (CustomerID) in the Purchases table. The Link Target ID (ProductID) is the column in the Purchases table that will be matched to the Entity Target ID (also ProductID) property of the Products entity. The following diagram illustrates how each field is used to link Customer to Product.

When a query traverses from Customer to Product, it will begin with the property that was selected as the Entity Source ID, which in our example is the CustRef property. For Martha Washington the value for this property is C23; this value will be used to find rows in the Purchases table where the Link Source ID (in our case, the CustomerID) has a value of C23.

The value in the Link Target ID (the ProductID) will be used to look up the corresponding row in the Products table. For each row returned from the Purchases table, the value of the ProductID field will be matched to the Entity Target ID property (the ProductID) of the Products table. The following diagram illustrates how each field is used throughout the flow.

Advanced Configuration

One alternative to the approach shown here is to use the Purchases table to create the many-to-many relationship between the virtual customer and the virtual product entities. Doing this involves an additional and unnecessary virtual hop from physical customer to virtual customer. The Data Hub is much more efficient at traversing relationships than SQL data sources because it is built upon a graph database; therefore, the extra hop should be avoided if possible.

A second alternative is to create virtual entities for the Purchases table in addition to those for the Products table. This approach would require one-to-many relationships between physical customer entities and virtual purchases entities as well as one-to-many relationships between virtual purchases and virtual products. This is inefficient because it creates unnecessary virtual hops to get from physical customer to virtual product. There is no advantage in representing a purchase as an entity because the relationship can hold all the Purchases fields as properties. The only reason to create an entity for Purchases is if it connects more than two entity types, such as if there were a third foreign key in the table linking purchases to a store.