Special Scenarios

Records with Blank Group-By Column

All records with a blank group-by value are marked as malformed records and written to separate files in the output HDFS folder.

The files that hold malformed records are named as follows:

Malformed Records in Candidate Files

Candidate file records with a blank group-by column are discarded as malformed records and written to files that follow the naming convention malformedRecordsCandidate-m-<5 digit numeral>.

For example, malformedRecordsCandidate-m-00000, malformedRecordsCandidate-m-00001.

This applies to Interflow Match jobs.

Malformed Records in Suspect Files

Suspect file records with a blank group-by column are discarded as malformed records and written to files that follow the naming convention malformedRecordsSuspect-m-<5 digit numeral>.

For example, malformedRecordsSuspect-m-00000, malformedRecordsSuspect-m-00001.

This applies to Interflow Match jobs.

Malformed Records in Input Files

Input file records with a blank group-by column are discarded as malformed records and written to files that follow the naming convention malformedRecords-m-<5 digit numeral>.

For example, malformedRecords-m-00000, malformedRecords-m-00001.

This applies to the following jobs: Intraflow Match, Transactional Match, Best of Breed, Duplicate Synchronization, and Filter.
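
The three naming conventions above differ only in their prefix, so a file found in the output folder can be traced back to its source by name alone. The following is a minimal sketch of that check, not part of the product API: the class MalformedFileNames, the Source enum, and the classify method are hypothetical names introduced for illustration, and the pattern assumes that the numeral is always exactly five digits, as in the examples above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the SDK): maps an output file name
// to the kind of file its malformed records came from.
final class MalformedFileNames {

    // Illustrative labels for the three naming conventions.
    enum Source { CANDIDATE, SUSPECT, INPUT }

    // malformedRecords, an optional Candidate/Suspect suffix, "-m-",
    // then exactly five digits (assumed from the documented examples).
    private static final Pattern NAME =
            Pattern.compile("^malformedRecords(Candidate|Suspect)?-m-(\\d{5})$");

    static Optional<Source> classify(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty(); // not a malformed-records file
        }
        String kind = m.group(1);
        if (kind == null) {
            return Optional.of(Source.INPUT); // plain malformedRecords-m-NNNNN
        }
        return Optional.of("Candidate".equals(kind) ? Source.CANDIDATE : Source.SUSPECT);
    }

    public static void main(String[] args) {
        String[] names = {
            "malformedRecordsCandidate-m-00000",
            "malformedRecordsSuspect-m-00001",
            "malformedRecords-m-00000",
            "part-r-00000"
        };
        for (String name : names) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

Listing the files in the output HDFS folder is left out of the sketch; any listing mechanism can pass the bare file name to classify.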

Counters for Malformed Records

The number of malformed records found in a job run is stored in the following counters:

MALFORMED_CANDIDATE_RECORDS
MALFORMED_SUSPECT_RECORDS
MALFORMED_RECORDS

Note: To read the values of these counters, call the getCounters() method of the AdvanceMatchFactory instance.
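
Once the counter values are available, a job can report how many records were set aside in total. The sketch below is illustrative only and makes an assumption that this section does not confirm: the return type of getCounters() is not described here, so the example takes the values as a plain Map<String, Long> keyed by counter name. The class MalformedCounterSummary, its method, and the sample numbers are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the SDK): totals the malformed-record
// counters after their values have been copied into a map.
final class MalformedCounterSummary {

    // Counter names as listed in this section.
    static final String[] COUNTER_NAMES = {
        "MALFORMED_CANDIDATE_RECORDS",
        "MALFORMED_SUSPECT_RECORDS",
        "MALFORMED_RECORDS"
    };

    // Sums whichever of the three counters are present; a counter that
    // the job did not report is treated as zero.
    static long totalMalformed(Map<String, Long> counters) {
        long total = 0;
        for (String name : COUNTER_NAMES) {
            total += counters.getOrDefault(name, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up values standing in for an Interflow Match run.
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("MALFORMED_CANDIDATE_RECORDS", 12L);
        counters.put("MALFORMED_SUSPECT_RECORDS", 3L);

        System.out.println("Malformed records: " + totalMalformed(counters));
    }
}
```

Treating an absent counter as zero lets the same summary work for any job type, whichever of the three counters it reports.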