Using a Global Cache for Queries

If you have a large dimension table, you can load the dimension table's data into a cache and use the cache to look up surrogate keys. Using a cache improves performance compared to performing lookups directly against the dimension table with the Query DB stage.

To use a cache you must create two dataflows: one to populate the cache with data from the dimension table, and another that uses the cache when updating the fact table. The following procedure describes how to create and use the two dataflows:

  1. Create a dataflow that populates the cache with data from the large dimension table.

    This dataflow should consist of two stages:

    • A Read from DB stage that reads data from the dimension table that you want to load into the cache.
    • A Write to Cache stage that populates the cache with the dimension table data.
  2. Run this dataflow to populate the cache.
  3. In the dataflow that populates the fact table, add a Query Cache stage.
  4. Configure the Query Cache stage to query the cache created by the Write to Cache stage.
  5. Run this dataflow to populate the fact table.
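The lookup pattern behind these steps can be sketched in Python. This is a minimal illustration of the technique, not the product's implementation; the table names, column names, and helper functions are assumptions made for the example:

```python
import sqlite3

def populate_cache(conn):
    """Dataflow 1: read the dimension table once (Read from DB) and build
    an in-memory lookup keyed on the natural key (Write to Cache)."""
    cache = {}
    for natural_key, surrogate_key in conn.execute(
        "SELECT customer_id, customer_sk FROM dim_customer"  # assumed schema
    ):
        cache[natural_key] = surrogate_key
    return cache

def load_facts(conn, cache, fact_rows):
    """Dataflow 2: resolve each fact row's natural key to a surrogate key
    from the cache (Query Cache) instead of querying the dimension table
    once per row."""
    for natural_key, amount in fact_rows:
        surrogate_key = cache.get(natural_key)  # O(1) in-memory lookup
        if surrogate_key is not None:
            conn.execute(
                "INSERT INTO fact_sales (customer_sk, amount) VALUES (?, ?)",
                (surrogate_key, amount),
            )

# Demo with an in-memory database standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id TEXT, customer_sk INTEGER)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [("C1", 101), ("C2", 102)])
conn.execute("CREATE TABLE fact_sales (customer_sk INTEGER, amount REAL)")

cache = populate_cache(conn)  # run the cache-population dataflow first
load_facts(conn, cache, [("C1", 9.99), ("C2", 5.00), ("CX", 1.00)])
# the unmatched key "CX" is simply skipped in this sketch
```

The key design point is that the dimension table is read once, up front, and every subsequent key lookup is served from memory.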

If you want to make sure that the cache is populated with the latest data from the dimension table each time you update your fact table, you can create a process flow that first runs the job to populate the cache, then runs the job to update the fact table. Executing the process flow then runs both dataflows in succession. For more information about process flows, see the Dataflow Designer's Guide.
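Conceptually, such a process flow amounts to running the two jobs strictly in sequence. A minimal sketch, assuming hypothetical job functions in place of the real dataflow jobs:

```python
log = []

def populate_cache_job():
    # stand-in for the dataflow that writes dimension data to the cache
    log.append("populate cache")

def update_fact_table_job():
    # stand-in for the dataflow that queries the cache and loads the fact table
    log.append("update fact table")

def run_process_flow(jobs):
    """Execute jobs strictly in order, so the cache is always refreshed
    before the fact-table job reads from it."""
    for job in jobs:
        job()

run_process_flow([populate_cache_job, update_fact_table_job])
```

Ordering is the whole point: if the fact-table job ran first, it would look up surrogate keys against a stale or empty cache.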