Introduction

The Big Data Quality SDK helps you create, configure, and run MapReduce jobs, Spark jobs, and Hive User-Defined Functions (UDFs) for Data Quality operations on a Hadoop platform.

Using the SDK, you can create and run these jobs directly on the Hadoop platform. Because the Data Quality processes run distributed across the cluster, network delays are eliminated and performance improves significantly.

The Big Data Quality SDK supports these modules:
  1. Advanced Matching Module
  2. Data Normalization Module
  3. Universal Name Module
  4. Universal Addressing Module

SDK Usage

The SDK can currently be used through:
  1. Java APIs, which support MapReduce and Spark jobs
  2. Hive User-Defined Functions (UDFs)