Introduction

Apache Hive supports User Defined Functions (UDFs): custom functions that you register with Hive and then call from a query in the same way as built-in functions.

The Spectrum™ Data & Address Quality for Big Data SDK provides a set of Hive User Defined Functions and User Defined Aggregation Functions to run the Data Quality jobs listed below.

User Defined Functions (UDF)

A User Defined Function processes one record at a time.
The UDF-based jobs are:
  • Advanced Transformer
  • Custom Groovy Script
  • Global Address Validation
  • Match Key Generator
  • Open Name Parser
  • Open Parser
  • Table Lookup
  • Validate Address
  • Validate Address Global
  • Validate Address Loqate
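
The record-at-a-time behavior described above can be pictured with a small conceptual sketch. It is plain Python, not Hive code and not the SDK API; the function to_upper_city and the field names id and city are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def to_upper_city(record: Record) -> Record:
    """Hypothetical per-record function: it sees exactly one record."""
    out = dict(record)  # copy so the input record is left unchanged
    out["city"] = record["city"].upper()
    return out


def apply_udf(udf: Callable[[Record], Record],
              records: Iterable[Record]) -> List[Record]:
    # UDF-style processing: one call per input record, one output per call.
    return [udf(r) for r in records]


rows = [{"id": "1", "city": "troy"}, {"id": "2", "city": "boulder"}]
udf_result = apply_udf(to_upper_city, rows)
```

Because each call depends on a single record only, the records can be processed in any order and independently of one another.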

User Defined Aggregation Functions (UDAF)

A User Defined Aggregation Function first aggregates records into collections based on the join field, and then processes one collection of records at a time.
The UDAF-based jobs are:
  • Best of Breed
  • Duplicate Synchronization
  • Filter
  • Interflow Match
  • Intraflow Match
  • Transactional Match
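
The aggregate-then-process behavior described above can be sketched conceptually as follows. It is plain Python, not Hive code and not the SDK API; the helper names group_by_field, tag_collection_size, and apply_udaf and the field names are hypothetical, and the per-collection logic is a stand-in, not the logic of any job listed above.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def group_by_field(records: Iterable[Record], field: str) -> Dict[str, List[Record]]:
    # Step 1: aggregate records into collections keyed by the chosen field.
    groups: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        groups[r[field]].append(r)
    return dict(groups)


def tag_collection_size(collection: List[Record]) -> List[Record]:
    """Hypothetical per-collection function: it sees a whole collection at once."""
    size = str(len(collection))
    return [{**r, "collection_size": size} for r in collection]


def apply_udaf(udaf: Callable[[List[Record]], List[Record]],
               records: Iterable[Record], field: str) -> Dict[str, List[Record]]:
    # Step 2: process one collection of records at a time.
    return {key: udaf(coll) for key, coll in group_by_field(records, field).items()}


rows = [
    {"id": "1", "key": "SMITH"},
    {"id": "2", "key": "JONES"},
    {"id": "3", "key": "SMITH"},
]
udaf_result = apply_udaf(tag_collection_size, rows, "key")
```

Unlike the per-record case, the function here needs every record of a collection before it can produce output, which is why the grouping step comes first.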