Introduction

The Spectrum™ Data & Address Quality for Big Data SDK helps you create, configure, and run MapReduce jobs, Spark jobs, and Hive User-Defined Functions for Data Quality operations on a Hadoop platform.

Using the SDK, you can create and run jobs directly on the Hadoop platform, eliminating network delays and running Data Quality processes in a distributed manner across the cluster, which results in a significant performance improvement.
Note: You can also use the Amazon S3 Native FileSystem (s3n) as the input and output location for Hadoop MapReduce and Spark jobs.
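For illustration, the sketch below shows a Spark job that reads its input from an s3n path and writes its output back to s3n. It uses only the standard Spark Java API; the bucket names, paths, credentials, and the trivial transform are placeholders, not part of the SDK.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class S3nInputOutputSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("S3nInputOutputSketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // s3n credentials can be supplied through the Hadoop configuration.
        // Placeholder values; in practice these typically come from core-site.xml.
        sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
        sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

        // Read records from an s3n path, apply a stand-in transform,
        // and write the results back to another s3n path.
        JavaRDD<String> input = sc.textFile("s3n://your-bucket/input/records.csv");
        JavaRDD<String> processed = input.map(String::toUpperCase);
        processed.saveAsTextFile("s3n://your-bucket/output/records-processed");

        sc.stop();
    }
}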

SDK Usage

The SDK can currently be used through Java APIs and Hive User-Defined Functions (UDFs); a minimal driver sketch follows the list below.
  • Java APIs
    • MapReduce API
    • Spark API
  • Hive User-Defined Functions
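As a rough illustration of the MapReduce API path, the sketch below wires up a job with the standard org.apache.hadoop.mapreduce.Job API. The PassThroughMapper is a hypothetical stand-in for whichever SDK-provided mapper your Data Quality operation requires; it is not an actual class from the SDK.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DataQualityJobDriver {

    // Trivial pass-through mapper; in a real job this slot is filled by the
    // mapper class the SDK documents for the chosen Data Quality operation.
    public static class PassThroughMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key.toString()), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "data-quality-example");
        job.setJarByClass(DataQualityJobDriver.class);

        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0);               // map-only job for this sketch
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Input and output locations may be HDFS or s3n URIs.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}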