Introduction

Apache Hive supports User Defined Functions (UDFs), which extend HiveQL with custom logic that can be called from queries.

The Big Data Quality SDK provides a set of Hive User Defined Functions (UDFs) and User Defined Aggregation Functions (UDAFs) for running the Data Quality jobs listed below.

User Defined Functions (UDF)

A User Defined Function processes one record at a time: each input record is handled independently of every other record.
The UDF-based jobs are:
  • Match Key Generator
  • Table Lookup
  • Advanced Transformer
  • Open Name Parser
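
To illustrate the record-at-a-time model described above, here is a minimal sketch in plain Java. It does not use the Hive API or the Big Data Quality SDK; the class name `RecordAtATimeSketch` and the toy key rule in `simpleKey` are hypothetical and are not the Match Key Generator algorithm. The point is only that the function sees one record per call and keeps no state between calls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class RecordAtATimeSketch {

    // Hypothetical per-record function: derives a toy key from a single value.
    // It depends only on its own input, never on other records.
    static String simpleKey(String name) {
        if (name == null) {
            return null;
        }
        String cleaned = name.trim().toUpperCase(Locale.ROOT);
        return cleaned.length() <= 3 ? cleaned : cleaned.substring(0, 3);
    }

    // Applies the function to each record independently, one call per record.
    static List<String> applyPerRecord(List<String> names) {
        return names.stream()
                .map(RecordAtATimeSketch::simpleKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("  smith", "Jo", "Anderson");
        System.out.println(applyPerRecord(names)); // prints [SMI, JO, AND]
    }
}
```

Because each call is independent, records can be processed in any order and in parallel, which is what makes this model a natural fit for row-level jobs.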

User Defined Aggregation Functions (UDAF)

A User Defined Aggregation Function first groups records into collections based on the join field, and then processes one collection of records at a time.
The UDAF-based jobs are:
  • Interflow Match
  • Intraflow Match
  • Transactional Match
  • Best of Breed
  • Duplicate Synchronization
  • Filter
  • Validate Address
  • Validate Address Global
  • Validate Address Loqate
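
The two-step model described above (group by the join field, then process each collection) can be sketched in plain Java as follows. This is an illustration only, under stated assumptions: it does not use the Hive API or the Big Data Quality SDK, the class name `CollectionAtATimeSketch` is hypothetical, each record is simplified to a `{joinField, value}` pair, and the per-collection step (counting distinct values) is a toy stand-in for a real job such as matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class CollectionAtATimeSketch {

    // Step 1: group records into collections keyed by the join field.
    // Each record is a hypothetical {joinField, value} pair.
    static Map<String, List<String>> groupByJoinField(List<String[]> records) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] record : records) {
            groups.computeIfAbsent(record[0], k -> new ArrayList<>()).add(record[1]);
        }
        return groups;
    }

    // Toy per-collection step: counts the distinct values in one collection.
    static int countDistinct(List<String> values) {
        return new LinkedHashSet<>(values).size();
    }

    // Step 2: process one whole collection at a time.
    static Map<String, Integer> processGroups(List<String[]> records) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : groupByJoinField(records).entrySet()) {
            result.put(entry.getKey(), countDistinct(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"K1", "Ann Lee"},
                new String[] {"K1", "Ann Lee"},
                new String[] {"K1", "A. Lee"},
                new String[] {"K2", "Bob Ray"});
        System.out.println(processGroups(records)); // prints {K1=2, K2=1}
    }
}
```

Unlike the record-at-a-time case, the per-collection step needs every record that shares a join field value before it can produce a result, which is why these jobs are modeled as aggregations.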