Introduction

The Big Data Quality SDK helps you create, configure, and run MapReduce jobs, Spark jobs, and Hive User-Defined Functions (UDFs) for Data Quality operations on a Hadoop platform.

Using the SDK, you can create and run these jobs directly on the Hadoop platform. Because the Data Quality processes run distributed across the cluster, network delays are eliminated and performance improves significantly.

The Big Data Quality SDK supports these modules:
  1. Advanced Matching Module
  2. Data Normalization Module
  3. Universal Name Module
  4. Universal Addressing Module

SDK Usage

The SDK can currently be used through:
  1. Java APIs, which support MapReduce and Spark jobs
  2. Hive User-Defined Functions (UDFs)