Introduction

Apache Hive supports User Defined Functions (UDFs), which extend HiveQL with custom processing logic that can be called from a query.

The Spectrum™ Data & Address Quality for Big Data SDK provides a set of Hive User Defined Functions (UDFs) and User Defined Aggregation Functions (UDAFs) to run the Data Quality jobs listed below.

User Defined Functions (UDF)

A User Defined Function processes one record at a time.
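The record-at-a-time behavior can be illustrated with a minimal Python sketch. This is a conceptual model only, not Hive or SDK code: the names `apply_udf` and `normalize_city` and the sample data are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def apply_udf(records: Iterable[Record], udf: Callable[[Record], Record]) -> List[Record]:
    """Call the function once per record; each output depends only on its own input record."""
    return [udf(record) for record in records]


def normalize_city(record: Record) -> Record:
    # Hypothetical per-record transformation (not an SDK function):
    # trim whitespace and title-case the "city" field.
    out = dict(record)
    out["city"] = record["city"].strip().title()
    return out


rows = [
    {"id": "1", "city": "  new york "},
    {"id": "2", "city": "BOSTON"},
]
result = apply_udf(rows, normalize_city)
```

In this model, the output always has the same number of records as the input, because no record's result depends on any other record.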

The UDF-based jobs are:

Advanced Transformer
Custom Groovy Script
Global Address Validation
Match Key Generator
Open Name Parser
Open Parser
Table Lookup
Validate Address
Validate Address Global
Validate Address Loqate

User Defined Aggregation Functions (UDAF)

A User Defined Aggregation Function first aggregates records into collections based on the join field, and then processes one collection of records at a time.
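The two phases described above, grouping by the join field and then processing each collection, can be sketched in Python as follows. This is a conceptual model under stated assumptions, not Hive or SDK code: `apply_udaf`, `count_distinct_names`, the `match_key` field, and the sample records are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def apply_udaf(
    records: Iterable[Record],
    join_field: str,
    process_group: Callable[[List[Record]], int],
) -> Dict[str, int]:
    """Group records by the join field, then process one collection at a time."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        # Phase 1: aggregate records that share the same join field value.
        groups[record[join_field]].append(record)
    # Phase 2: call the processing function once per collection.
    return {key: process_group(group) for key, group in groups.items()}


def count_distinct_names(group: List[Record]) -> int:
    # Hypothetical per-collection logic (not an SDK function):
    # count the names in the group, ignoring case.
    return len({record["name"].lower() for record in group})


rows = [
    {"match_key": "SMI", "name": "Smith"},
    {"match_key": "SMI", "name": "smith"},
    {"match_key": "SMI", "name": "Smyth"},
    {"match_key": "JON", "name": "Jones"},
]
summary = apply_udaf(rows, "match_key", count_distinct_names)
```

Unlike the per-record case, the result for each collection depends on all the records that share its join field value, which is why the records must be grouped before they are processed.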

The UDAF-based jobs are:

Best of Breed
Duplicate Synchronization
Filter
Interflow Match
Intraflow Match
Transactional Match