Introduction
Apache Hive supports User Defined Functions (UDFs), which extend HiveQL with custom logic that can be called from a query.
The Spectrum™ Data & Address Quality for Big Data SDK provides a set of Hive User Defined Functions and User Defined Aggregation Functions to run the Data Quality jobs listed below.
User Defined Functions (UDF)
A User Defined Function processes one record at a time. The UDF-based jobs are:
- Advanced Transformer
- Custom Groovy Script
- Global Address Validation
- Match Key Generator
- Open Name Parser
- Open Parser
- Table Lookup
- Validate Address
- Validate Address Global
- Validate Address Loqate
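
The record-at-a-time model can be illustrated with a minimal Java sketch. It does not use the SDK or the Hive API. The class `RecordAtATimeSketch` and the methods `normalizeName` and `applyPerRecord` are hypothetical names chosen for illustration, and the trimming and upper-casing rule is only a stand-in for real job logic.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RecordAtATimeSketch {

    // Hypothetical stand-in for a UDF: receives one record's value and returns one result.
    // It never sees any other record.
    public static String normalizeName(String rawName) {
        return rawName == null ? null : rawName.trim().toUpperCase(Locale.ROOT);
    }

    // Applies the function to every record independently,
    // which is how a query engine invokes a UDF row by row.
    public static List<String> applyPerRecord(List<String> records, Function<String, String> udf) {
        return records.stream().map(udf).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> out = applyPerRecord(List.of(" alice ", "Bob"), RecordAtATimeSketch::normalizeName);
        System.out.println(out); // prints [ALICE, BOB]
    }
}
```

Because each call depends only on its own input record, the output for one record is unaffected by the order or content of the others.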
User Defined Aggregation Functions (UDAF)
A User Defined Aggregation Function first aggregates records into collections based on the join field, and then processes one collection of records at a time. The UDAF-based jobs are:
- Best of Breed
- Duplicate Synchronization
- Filter
- Interflow Match
- Intraflow Match
- Transactional Match
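
The two-step model described for UDAFs (group by the join field, then process each collection) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the SDK's implementation. The class `GroupThenProcessSketch`, its methods, the record layout (element 0 is the join field, element 1 is a value), and the "keep the longest value" rule are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupThenProcessSketch {

    // Step 1: aggregate records into collections keyed by the join field
    // (assumed here to be element 0 of each record).
    public static Map<String, List<String[]>> groupByJoinField(List<String[]> records) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] record : records) {
            groups.computeIfAbsent(record[0], k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    // Step 2: process one whole collection at a time.
    // Toy rule (an assumption): return the longest value found in element 1.
    public static String longestValue(List<String[]> group) {
        String best = "";
        for (String[] record : group) {
            if (record[1].length() > best.length()) {
                best = record[1];
            }
        }
        return best;
    }

    // Runs both steps and returns one result per join-field value.
    public static Map<String, String> processGroups(List<String[]> records) {
        Map<String, String> result = new LinkedHashMap<>();
        groupByJoinField(records).forEach((key, group) -> result.put(key, longestValue(group)));
        return result;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
            new String[] {"K1", "Jon Smith"},
            new String[] {"K1", "Jonathan Smith"},
            new String[] {"K2", "Ann Lee"});
        System.out.println(processGroups(records)); // prints {K1=Jonathan Smith, K2=Ann Lee}
    }
}
```

Unlike the record-at-a-time case, the result for a group depends on every record that shares its join-field value, so all of those records must be collected before processing begins.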