Special Scenarios

Records with Blank Group-By Column

All records with a blank group-by value are marked as malformed and written to separate files in the output HDFS folder.
The names of these files depend on where the record came from:
Malformed Records in Candidate Files
Candidate file records with a blank group-by column are discarded as malformed and written to files named malformedRecordsCandidate-m-<5-digit number>.

For example, malformedRecordsCandidate-m-00000, malformedRecordsCandidate-m-00001.

This applies to Interflow Match jobs.

Malformed Records in Suspect Files
Suspect file records with a blank group-by column are discarded as malformed and written to files named malformedRecordsSuspect-m-<5-digit number>.

For example, malformedRecordsSuspect-m-00000, malformedRecordsSuspect-m-00001.

This applies to Interflow Match jobs.

Malformed Records in Input Files
Input file records with a blank group-by column are discarded as malformed and written to files named malformedRecords-m-<5-digit number>.

For example, malformedRecords-m-00000, malformedRecords-m-00001.

This applies to Intraflow Match, Transactional Match, Best of Breed, Duplicate Synchronization, and Filter jobs.
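
The three naming conventions above differ only in the optional Candidate or Suspect infix, so a single pattern can tell them apart when scanning the output folder. The following is a minimal sketch in plain Java, not part of the product API: the class name MalformedFileNames, the method sourceOf, and the labels "candidate", "suspect", and "input" are hypothetical names chosen for illustration. It works on bare file names only and does not access HDFS.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the SDK): classifies output file names
// according to the malformed-record naming conventions described above.
final class MalformedFileNames {

    // Optional "Candidate" or "Suspect" infix, then "-m-" and exactly five digits.
    private static final Pattern NAME =
            Pattern.compile("malformedRecords(Candidate|Suspect)?-m-(\\d{5})");

    private MalformedFileNames() {
    }

    // Returns "candidate", "suspect", or "input" when fileName follows one of
    // the three conventions, and an empty Optional for any other name.
    // Expects a bare file name, not a full path.
    static Optional<String> sourceOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        String infix = m.group(1); // null when there is no infix
        return Optional.of(infix == null ? "input" : infix.toLowerCase(Locale.ROOT));
    }
}
```

For example, under these assumptions sourceOf("malformedRecordsSuspect-m-00001") yields "suspect", while a name that does not follow any of the conventions yields an empty result.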

Counters for Malformed Records
The number of malformed records in a job run is stored in the following counters:
  • MALFORMED_CANDIDATE_RECORDS
  • MALFORMED_SUSPECT_RECORDS
  • MALFORMED_RECORDS
Note: To read the values of these counters, call the getCounters() method of the AdvanceMatchFactory instance.
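
As an illustration of how the counter values might be used after a job run, the sketch below adds up the three malformed-record counters. It is written under stated assumptions: the return type of getCounters() is not described in this section, so the sketch assumes the values have already been copied into a Map<String, Long> keyed by counter name. The class MalformedCounterSummary and the method totalMalformed are hypothetical and not part of the SDK.

```java
import java.util.Map;

// Hypothetical helper (not part of the SDK). Assumes the counter values
// obtained from the job have been copied into a map keyed by counter name.
final class MalformedCounterSummary {

    // Counter names as listed in this section.
    static final String[] COUNTER_NAMES = {
        "MALFORMED_CANDIDATE_RECORDS",
        "MALFORMED_SUSPECT_RECORDS",
        "MALFORMED_RECORDS"
    };

    private MalformedCounterSummary() {
    }

    // Sums the malformed-record counters; a counter missing from the map
    // is treated as zero.
    static long totalMalformed(Map<String, Long> counters) {
        long total = 0;
        for (String name : COUNTER_NAMES) {
            total += counters.getOrDefault(name, 0L);
        }
        return total;
    }
}
```

A total greater than zero indicates that at least one record was dropped for having a blank group-by column, which is a cue to inspect the malformed-record files in the output folder.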