Introduction
Apache Hive supports User Defined Functions (UDFs): custom functions that you register with Hive and then call from queries in the same way as built-in functions.
The Big Data Quality SDK provides a set of Hive User Defined Functions (UDFs) and User Defined Aggregation Functions (UDAFs) for running the Data Quality jobs listed below.
User Defined Functions (UDF)
A User Defined Function processes one record at a time. The UDF-based jobs are:
- Match Key Generator
- Table Lookup
- Advanced Transformer
- Open Name Parser
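
The record-at-a-time model can be illustrated with a small, Hive-independent Java sketch. Everything in it is hypothetical: the class RecordAtATimeSketch, the toyKey function, and the sample names are illustration only and are not part of the Big Data Quality SDK or of Hive. In particular, toyKey is not the Match Key Generator algorithm; it only shows the shape of a function that receives one record and returns one value without looking at any other record.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

final class RecordAtATimeSketch {

    // Hypothetical stand-in for a UDF: one input value in, one output value out.
    // Toy rule (an assumption, not SDK logic): uppercase the value, drop every
    // character outside A-Z, and keep at most the first three letters.
    static String toyKey(String name) {
        if (name == null) {
            return null;
        }
        String letters = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        return letters.length() <= 3 ? letters : letters.substring(0, 3);
    }

    // Applies the function to each record independently, which mirrors how a
    // record-at-a-time function is evaluated: no record sees any other record.
    static List<String> applyPerRecord(List<String> records, Function<String, String> udf) {
        return records.stream().map(udf).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Smith, John", "O'Brien", "Li");
        System.out.println(applyPerRecord(names, RecordAtATimeSketch::toyKey));
        // prints [SMI, OBR, LI]
    }
}
```

Because each output depends only on its own input record, the order of the records and the presence of other records do not change any result.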
User Defined Aggregation Functions (UDAF)
A User Defined Aggregation Function first aggregates records into collections based on the join field, and then processes one collection of records at a time. The UDAF-based jobs are:
- Interflow Match
- Intraflow Match
- Transactional Match
- Best of Breed
- Duplicate Synchronization
- Filter
- Validate Address
- Validate Address Global
- Validate Address Loqate
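
The two phases of the aggregation model (collect records by join field, then process each collection as a unit) can be sketched in plain Java without any Hive dependency. The class GroupThenProcessSketch, its methods, the record layout (join field in column 0, name in column 1), and the matching rule are all assumptions made for illustration; the rule is a toy and is not the SDK's matching logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupThenProcessSketch {

    // Phase 1: collect records into collections keyed by the join field
    // (assumed here to be column 0 of each record).
    static Map<String, List<String[]>> groupByJoinField(List<String[]> records) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] record : records) {
            groups.computeIfAbsent(record[0], k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    // Phase 2: process one whole collection at a time.
    // Toy rule (an assumption, not SDK logic): count the records whose name
    // (column 1) equals the first record's name, ignoring case.
    static int countMatchesToFirst(List<String[]> group) {
        String reference = group.get(0)[1];
        int matches = 0;
        for (String[] record : group) {
            if (record[1].equalsIgnoreCase(reference)) {
                matches++;
            }
        }
        return matches;
    }

    // Runs both phases and returns one result per join-field value.
    static Map<String, Integer> run(List<String[]> records) {
        Map<String, Integer> result = new LinkedHashMap<>();
        groupByJoinField(records)
                .forEach((key, group) -> result.put(key, countMatchesToFirst(group)));
        return result;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"K1", "John Smith"},
                new String[] {"K2", "Ana Diaz"},
                new String[] {"K1", "JOHN SMITH"},
                new String[] {"K1", "Jon Smyth"});
        System.out.println(run(records));
        // prints {K1=2, K2=1}
    }
}
```

Unlike the record-at-a-time case, the result for a collection depends on all of its records together, which is why jobs that compare records with each other need the records grouped first.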