Identifying Members of a Household

This dataflow template demonstrates how to identify members of the same household by comparing information within a single input file and creating an output file of household collections.

Business Scenario

As data steward for a credit card company and you want to analyze your customer database and find out which addresses occur multiple times and under what names so that you can minimize that number of duplicate mailings and credit card offers sent to the same address.

The following dataflow provides a solution to the business scenario:

This dataflow template is available in Enterprise Designer. Go to File > New > Dataflow > From template and select HouseholdRelationships. This dataflow requires the following products: Advanced Matching, Data Normalization, and Universal Name.

For each record in the input file, this dataflow will do the following:

Read from File

This stage identifies the file name, location, and layout of the file that contains the names you want to parse. The file contains both male and female names.

Open Name Parser

Open Name Parser examines name fields and compares them to name data stored in the Spectrum Technology Platform name database files. Based on the comparison, it parses the name data into First, Middle, and Last name fields, assigns an entity type, and a gender to each name. It also uses pattern recognition in addition to the name data.

Standardize Nicknames

In this template, the Table Lookup stage is named Standardize Nicknames. Standardize Nickname stage looks up first names in the Nicknames.xml database and replaces any nicknames with the more regular form of the nickname. For example, the name Tommy is replaced with Thomas.

Transformer

In this template, the Transformer stage is named Assign Titles. Assign Titles stage uses a custom script to search each row in the data stream output by the Parse Personal Name stage and assign a TitleOfRespect value based on the GenderCode value.

The custom script is:

if (row.get('TitleOfRespect') == '') 
{ 
	if (row.get('GenderCode') == 'M') 
		row.set('TitleOfRespect', 'Mr') 
	if (row.get('GenderCode') == 'F') 
		row.set('TitleOfRespect', 'Ms')

Every time the Assign Titles stage encounters M in the GenderCode field it sets the value for TitleOfRespect as Mr. Every time the Assign Titles stages encounters F in the GenderCode field it sets the value of TitleOfRespect as Ms.

Match Key Generator

The Match Key Generator processes user-defined rules that consist of algorithms and input source fields to generate the match key field. A match key is a non-unique key shared by like records that identify records as potential duplicates. The match key is used to facilitate the matching process by only comparing records that contain the same match key. A match key is comprised of input fields. Each input field specified has a selected algorithm that is performed on it. The result of each field is then concatenated to create a single match key field.

In this template, two match key fields are defined: SubString (LastName (1:3)) and SubString (PostalCode (1:5)).

For example, if the incoming address was:

FirstName - Fred

LastName - Mertz

PostalCode - 21114-1687

And the rules specified that:


Input Field	Start Position	Length
LastName	1	3
PostalCode	1	5

Then the key, based on the rules and the input data shown above, would be:

Mer21114

Household Match

In this dataflow template the Intraflow Match stage is named Household Match. This stage locates matches between similar data records within a single input stream. Matched records can also be qualified by using non-name/non-address information. The matching engine allows you to create hierarchical rules based on any fields that have been defined or created in other stages.

A stream of records to be matched as well as settings that specify what fields should be compared, how scores should be computed, and generally what constitutes a successful match.

In this template, you create a custom matching rule that compares LastName and AddressLine1. Select the Generate data for analysis check box to generate data for the Interflow Summary Report.

Here are some guidelines to follow when creating your matching hierarchy:

A parent node must be given a unique name. It can not be a field.
The child field must be a Spectrum Technology Platform data type field, that is, one available through one or more components.
All children under a parent must use the same logical operators. To combine connectors you must first create intermediate parent nodes.
Thresholds at the parent node could be higher than the threshold of the children.
Parent nodes do not have to have a threshold.

Write to File

The template contains one Write to File stage that creates a text file that shows the addresses as a collection of households.

Intraflow Summary Report

The template contains the Intraflow Match Summary Report. After you run the job, expand Reports in the Execution Details window, and then click IntraflowMatchSummary.

The Intraflow Match Summary Report lists the statistics for the records processed and shows a bar chart that graphically illustrates the record count and overall matching score.