Introduction to Spectrum Smart Data Quality

Spectrum Smart Data Quality is a machine learning-based solution that helps you create initial match rules and potential match key components for your entity resolution process. By adding machine learning capabilities to the Data Quality processes, it significantly simplifies the matching procedure and helps you get more value from your data.

Matching algorithms and thresholds are learned automatically from your matching scenario. An initial match rule and potential match key components are generated from the inputs and tags you provide.
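To illustrate the idea of learning a threshold from tagged pairs, here is a minimal sketch. It is not the algorithm the product uses. It assumes a single string-similarity measure from Python's standard library and simply picks the cutoff that agrees with the most Match and Non-Match tags. The function names `similarity` and `learn_threshold` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (standard-library stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(tagged_pairs):
    """Pick the cutoff that agrees with the most Match / Non-Match tags.

    tagged_pairs: iterable of (value_a, value_b, tag), where tag is
    "Match", "Non-Match", or "Unsure". Pairs tagged "Unsure" are skipped.
    """
    scored = [
        (similarity(a, b), tag == "Match")
        for a, b, tag in tagged_pairs
        if tag in ("Match", "Non-Match")
    ]
    if not scored:
        raise ValueError("need at least one Match or Non-Match pair")

    # Only the observed scores can change how a pair is classified,
    # so they are the only candidate cutoffs worth checking.
    candidates = sorted({score for score, _ in scored})

    def correct(threshold):
        return sum((score >= threshold) == is_match for score, is_match in scored)

    return max(candidates, key=correct)


# Illustrative data only; not taken from any real data set.
pairs = [
    ("John Smith", "Jon Smith", "Match"),
    ("John Smith", "Mary Jones", "Non-Match"),
    ("Acme Corp", "ACME Corp.", "Match"),
    ("Acme Corp", "Apex Ltd", "Unsure"),
]
threshold = learn_threshold(pairs)
```

With more tagged pairs, the chosen cutoff reflects your matching scenario more closely, which is why the tagging step matters.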

To generate match rules and match key components, upload your data, which must be a comprehensive collection of all possible variations. Then select the columns on which matching is to be performed. Records are grouped automatically and a group strength is assigned to create the optimal training set for your model. Training sets are built using unsupervised machine learning algorithms. Finally, tag the records according to your matching scenario to obtain potential match key components and a match rule learned from your sample data.

See the task flow and the subsequent sections for a step-by-step guide to generating a match rule and potential match key components.

The Task Flow

  1. Start by selecting files from the source. The selected file must contain all possible variations.
  2. After uploading the file, select the columns in your data on which you want to perform matching. The columns selected in this step are used to generate the groups automatically. By default, the first 20K records are used to create the training set. However, you can point the system to your complete data set. It picks the relevant training set based on the business matching definition you provide.
  3. After reviewing the groups, tag the displayed record pairs as Match, Non-Match, or Unsure.
  4. Finally, view and analyze the generated results. After reviewing the match rule, you can export it to the match rule repository in the Enterprise Designer and use it in the matching stages. For more information about match rules, see Match Rules.

    After reviewing match key components, you can use them in the Match Key Generator stage of the Enterprise Designer.
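The first three steps of the task flow can be pictured with a small sketch. It is only an illustration under stated assumptions: the product groups records with unsupervised machine learning, whereas this sketch swaps in a crude grouping key (the first three letters of each selected column). The function name `candidate_pairs` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, islice

# Mirrors the "first 20K records" default described in step 2.
DEFAULT_SAMPLE_SIZE = 20_000


def candidate_pairs(records, columns, sample_size=DEFAULT_SAMPLE_SIZE):
    """Group sampled records on a crude key and return the pairs to tag."""
    groups = defaultdict(list)
    for record in islice(records, sample_size):
        # Hypothetical grouping key: first three letters of each selected column.
        key = tuple(str(record.get(col, "")).strip().lower()[:3] for col in columns)
        groups[key].append(record)

    pairs = []
    for members in groups.values():
        # Every pair inside a group is a candidate for tagging
        # as Match, Non-Match, or Unsure.
        pairs.extend(combinations(members, 2))
    return pairs


# Illustrative records only.
records = [
    {"name": "John Smith", "city": "Austin"},
    {"name": "Johnny Smith", "city": "Austin"},
    {"name": "Mary Jones", "city": "Boston"},
]
pairs_to_tag = candidate_pairs(records, columns=["name"])
```

Here only the two "Smith" records share a group, so they form the single pair presented for tagging. Selecting different columns changes the groups and therefore the pairs you are asked to tag.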

Spectrum Smart Data Quality is integrated with Data Stewardship, so match rules evolve continuously based on input from Data Stewardship. For details, see Improving match rules.