Updating a Fact Table

This procedure describes how to create a dataflow that reads data from a source database or file and uses that data to update a fact table in your data warehouse.

  1. In Enterprise Designer, select File > New > Dataflow > Job.
  2. Based on the source of the data you want to write to the fact table, drag the appropriate stage onto the canvas.
    • To use data from a database to populate the table, drag the Read from DB stage onto the canvas.
    • To use data from a flat file to populate the table, drag the Read from File stage onto the canvas.
    • To use data from a variable format file to populate the table, drag the Read from Variable Format File stage onto the canvas.
    • To use data from an XML file to populate the table, drag the Read from XML stage onto the canvas.
    Note: If you will be reading data from a file and not a database, make sure that the file contains only the new records that you want to add to the fact table and not records that already exist in the fact table. If you will be reading data from a database, you will define a query to filter records later in this procedure.
  3. Double-click the source stage you just placed on the canvas and configure it to point to the source of the data you want to populate to the fact table.
  4. If you are reading data from a database, filter the records so that only the records that are new will be added to the fact table. You can do this by defining the SQL SELECT statement to only read in records that were modified since the last time the fact table was updated.
  5. Drag a Broadcaster stage onto the canvas and connect the source stage to it.

    Your dataflow now looks like this:

  6. Drag one Query DB stage onto the canvas for each dimension table in your data warehouse and connect them to the Broadcaster stage.

    For example, if you have four dimension tables in your data warehouse, drag four Query DB stages onto the canvas. Your dataflow would look like this:

    The Query DB stages will be used to look up the surrogate key for each dimension using the natural key from the data source. The surrogate key will then replace the natural in each record being loaded into the fact table.

    Tip: You can modify the name of the stage to make it easy to see which table each stage queries.
  7. Configure each Query DB stage so that it looks up the surrogate key for each natural key from the data source. To do this:
    1. In the Connection field, specify the connection to the data warehouse.
    2. In the Table/View field, select the dimension table that you want this stage to query.
    3. In the Where field, write a WHERE statement that looks up the surrogate key based on the value in the appropriate dataflow field.

      For example, this would look up the surrogate key for a product by finding the record in the dimension table whose value in the description column matches the value in the data source's product_name field.

      description=${product_name}
    4. In the Include column select the database column that contains the surrogate key.

    For example, a Query DB stage that looks up the surrogate key for a product name would look like this:

    In this example, the query looks up the product key by finding the record in the prod_dimension table where the value in the description column matches the value in the dataflow field product_name. The stage returns the product_key field from the table and adds it to the dataflow, as indicated by the checked box in the Include column.

  8. Drag a Record Combiner stage to the canvas and connect all the Query DB stages to it.

    Your dataflow should now look like this:



  9. Drag a Write to DB stage onto the canvas and connect it to the Record Combiner stage.

    Your dataflow should now look like this:

  10. Configure the Write to DB stage to write the records to the fact table. To do this:
    1. In the Connection field, specify the connection to the data warehouse.
    2. In the Table/View field, select the fact table that you want this stage to query. If the fact table does not already exist in the data warehouse, click Create Table to create the fact table in the data warehouse.
    3. For each field that you want to write to the fact table, check the box in the Include column.
    4. On the Runtime tab, notice that by default the Insert option is selected for the write mode. Typically fact table population is run in insert mode, so you can leave this option selected.