Architecture and Processes

The following diagram illustrates the EnterworksDiff architecture and general flow of data:

Process Overview

The Syndication Source generates a full replacement file. This file is referred to here as the “new source file”. The new source file is picked up by a Scheduled Import that is configured with the EnterworksDiff pre-processing module. The module places the file into a staging area that already contains the previous full file from the Syndication Source. This previous full file is referred to here as the “old source file”. The module renames both source files to the names specified during configuration.
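The staging step described above can be sketched as follows. This is only an illustration, not the module's actual code: the function name, its parameters, and the assumption that the file left over from the previous run already carries the working "new" name are all hypothetical.

```python
import shutil
from pathlib import Path


def stage_source_files(incoming: Path, staging: Path,
                       new_name: str, old_name: str) -> bool:
    """Rotate the staged files and copy in the incoming full file.

    Hypothetical helper. Returns True when an old source file is
    available for comparison after staging.
    """
    staging.mkdir(parents=True, exist_ok=True)
    new_path = staging / new_name
    old_path = staging / old_name

    # The file left over from the previous run becomes the old source file.
    if new_path.exists():
        new_path.replace(old_path)

    # The incoming full file becomes the new source file.
    shutil.copy2(incoming, new_path)
    return old_path.exists()
```

On the first run the function returns False because nothing has been staged yet. On later runs it returns True and both files are in place for comparison.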

The module then launches the EnterworksDiff Utility, which loads the two source files into a PostgreSQL database, compares them, and generates two files:

  • source_delta.csv contains the records that differ between the two source files. It has the same columns as the old and new source files, plus the column il_modification_status, which indicates whether a record has been added, removed, or changed between the old and new source files. The possible statuses are:
    • new
    • removed
    • updated

  • source_stat.csv will contain the record counts for the old and new source files, as well as the number of added, updated, and deleted records.
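The comparison can be pictured with a small in-memory sketch. The utility itself performs the comparison in a PostgreSQL database, so this is not how it is implemented. The function names and the keys of the statistics dictionary are assumptions made for illustration. Only the il_modification_status column and its three values come from the description above.

```python
import csv
from collections import Counter


def load_keyed(path, key_columns):
    """Read a CSV file into a dict keyed by the tuple of key-column values."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {tuple(row[c] for c in key_columns): row
                for row in csv.DictReader(handle)}


def diff_rows(old_rows, new_rows):
    """Return the delta rows, each tagged with il_modification_status."""
    delta = []
    for key, row in new_rows.items():
        if key not in old_rows:
            delta.append({**row, "il_modification_status": "new"})
        elif row != old_rows[key]:
            delta.append({**row, "il_modification_status": "updated"})
    for key, row in old_rows.items():
        if key not in new_rows:
            delta.append({**row, "il_modification_status": "removed"})
    return delta


def summarize(old_rows, new_rows, delta):
    """Count records per file and per status (key names are hypothetical)."""
    counts = Counter(r["il_modification_status"] for r in delta)
    return {"old_count": len(old_rows), "new_count": len(new_rows),
            "added": counts["new"], "updated": counts["updated"],
            "deleted": counts["removed"]}
```

Records that are identical in both files do not appear in the delta at all, which is why the delta is usually much smaller than the full file.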

The Scheduled Import passes the delta file to the EnterWorks Import process for loading into the Target Repository. It also retrieves the details from the source_stat.csv file and updates the Import log and job statistics.

If there is no old source file, as is the case the first time the process runs, the pre-processing module skips the Diff processing and treats the new source file as the delta file.
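The first-run behavior amounts to a simple branch, sketched below. The function name and the run_diff callback are hypothetical stand-ins for the module and the EnterworksDiff Utility. Only the decision itself is taken from the description above.

```python
from pathlib import Path
from typing import Callable


def select_delta_file(old_file: Path, new_file: Path, delta_file: Path,
                      run_diff: Callable[[Path, Path, Path], None]) -> Path:
    """Return the file that should be handed to the import step."""
    if not old_file.exists():
        # First run: nothing to compare against, so the full file is the delta.
        return new_file
    # Later runs: compare the two files and import only the differences.
    run_diff(old_file, new_file, delta_file)
    return delta_file
```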

Configuration Overview

The EnterworksDiff process must be configured for each new source file that needs to be pre-processed. Each configuration identifies:

  • The name of the previous “new source file” that is left over from the previous run of the EnterworksDiff process.
  • A name for the old source file. The EnterworksDiff process will rename the previous “new source file” to the name specified for the old source file.
  • The name of the new source file generated by the Syndication Source.
  • A new name for the new source file when it is copied into the working folder.
  • The key columns: The columns to be used to uniquely identify each record in the files.
  • The names of the Diff files generated by the utility:
    • source_delta.csv
    • source_stat.csv
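To make the list above concrete, the items in one configuration can be modeled as a small record. The class name, field names, and example file and column names below are hypothetical and do not reflect the product's actual configuration format. Only the two Diff file names are taken from this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffConfig:
    previous_file_name: str   # "new source file" left over from the last run
    old_file_name: str        # name the leftover file is renamed to
    incoming_file_name: str   # file generated by the Syndication Source
    new_file_name: str        # name given to the copy in the working folder
    key_columns: tuple        # columns that uniquely identify each record
    delta_file_name: str = "source_delta.csv"
    stat_file_name: str = "source_stat.csv"

    def __post_init__(self):
        # Without key columns, records cannot be matched between the files.
        if not self.key_columns:
            raise ValueError("at least one key column is required")


# Hypothetical values for a single source file.
example = DiffConfig(
    previous_file_name="source_new.csv",
    old_file_name="source_old.csv",
    incoming_file_name="vendor_full_export.csv",
    new_file_name="source_new.csv",
    key_columns=("sku",),
)
```

One such record would be needed for each source file that is pre-processed, since the file names and key columns differ per source.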