Introduction
The Big Data Quality SDK helps you create, configure, and run MapReduce jobs, Spark jobs, and Hive User-Defined Functions for Data Quality operations on a Hadoop platform.
Using the SDK, you can create and run these jobs directly on the Hadoop platform. This eliminates network delays by executing distributed Data Quality processes within the cluster, significantly improving performance.
The modules supported in the Big Data Quality SDK are:
- Advanced Matching Module
- Data Normalization Module
- Universal Name Module
- Universal Addressing Module
SDK Usage
This SDK can currently be used through:
- Java APIs: supports MapReduce and Spark
- Hive User-Defined Functions
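As an illustration of the Hive route, a UDF is typically registered from a JAR and then invoked inside a query like any built-in function. The sketch below shows the standard Hive registration pattern only; the JAR path, function name, and implementing class are placeholders, not the SDK's actual artifacts:

```sql
-- Register the UDF JAR with the Hive session
-- (the path below is a placeholder, not an actual SDK artifact)
ADD JAR /path/to/bdq-sdk-hive.jar;

-- Expose a Data Quality operation as a Hive function
-- (the function and class names are hypothetical examples)
CREATE TEMPORARY FUNCTION normalize_name
  AS 'com.example.bdq.hive.NormalizeNameUDF';

-- Invoke the function in an ordinary query
SELECT normalize_name(customer_name) FROM customers;
```

Because the function runs inside Hive, the Data Quality operation is distributed across the cluster by Hive's own execution engine, with no data leaving the Hadoop platform.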