Parse Name
This transformation parses personal and business names into their constituent parts, such as given name, surname, titles, and qualifiers.
The Parse Name transformation performs the following actions when it parses a name:
- Identifies names as business or personal names.
- Determines the proper boundary between the given name, middle name, and surname.
- Identifies and separates titles and qualifiers from personal names.
- Identifies and separates suffixes from firm names.
- Scores how likely it is that the parse is correct, based on similar names and on the relative frequency of the name phrase as a given name or surname.
- Step name
- Defines the name for a step. Provide a meaningful name so that anyone who edits transformation steps in a pipeline will be able to identify the purpose of a step.
- Columns
- Specifies columns to transform. Click to view the list of column names. Column names listed here correspond to the column names in the dataset inspection table. Click in the drop-down list to select or clear check boxes next to column names. Alternatively, click column headings in the inspection table to select or clear check boxes next to the corresponding column names in this box. To add a column to the selection, you must press the Ctrl key when you click a column heading.
- Save
- Click this button to close settings and save changes to the transformation settings.
- Preview
- Click this button to preview the results of the transformation settings.
- Cancel
- Click this button to close settings for this transformation without saving any changes.
Output columns
For each of the input fields, the Parse Name transformation adds columns to the data.
The added columns are labeled by an identifying string appended to the original column name, ColumnName.
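The naming pattern can be illustrated with a short Python sketch. It is not part of the product; the function name output_column_names and the sample column "Contact" are hypothetical, and the suffix strings are the ones documented in this topic. The sketch lists every name that could be added for one input column and does not show which columns are populated for a given value.

```python
# Suffixes taken from the output column lists in this topic.
GENERAL_SUFFIXES = ["IsParsed", "ParsingScore", "ParsedAs"]
PERSONAL_SUFFIXES = [
    "TitleOfRespect", "FirstName", "MiddleName", "LastName", "MaturitySuffix",
]
FIRM_SUFFIXES = ["FirmName", "FirmSuffix"]


def output_column_names(column_name: str) -> list[str]:
    """Build output column names as <original column name>_<suffix>.

    Hypothetical helper for illustration only.
    """
    suffixes = GENERAL_SUFFIXES + PERSONAL_SUFFIXES + FIRM_SUFFIXES
    return [f"{column_name}_{suffix}" for suffix in suffixes]


for name in output_column_names("Contact"):
    print(name)
```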
General output columns
General output columns are appended for both firm and personal names. These columns provide information about the parsing operation.
ColumnName_IsParsed
- A Boolean that specifies whether the column field was parsed as a firm or person name.
ColumnName_ParsingScore
- A score from 0 to 100 that indicates how likely it is that the name was parsed correctly.
ColumnName_ParsedAs
- Specifies how the column field was parsed.
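As a sketch of how the general output columns might be used downstream, the following Python function flags rows for manual review. It assumes the transformed data has been exported as a list of dictionaries keyed by column name. The function name rows_needing_review, the threshold of 80, and the sample rows are all hypothetical and are not defined by the product.

```python
def rows_needing_review(rows, column, min_score=80):
    """Return rows that were not parsed or that scored below min_score.

    min_score is an assumed cutoff for illustration; choose a value
    that suits your data.
    """
    parsed_key = f"{column}_IsParsed"
    score_key = f"{column}_ParsingScore"
    return [
        row for row in rows
        if not row.get(parsed_key, False) or row.get(score_key, 0) < min_score
    ]


# Hypothetical sample data, not real product output.
sample = [
    {"Contact": "A", "Contact_IsParsed": True, "Contact_ParsingScore": 95},
    {"Contact": "B", "Contact_IsParsed": True, "Contact_ParsingScore": 40},
    {"Contact": "C", "Contact_IsParsed": False, "Contact_ParsingScore": 0},
]
flagged = rows_needing_review(sample, "Contact")
print([row["Contact"] for row in flagged])
```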
Personal output columns
Output columns for a column that is identified as a personal name separate the parts of the name into different columns. Names are parsed in both natural order (first name first) and reverse order (last name first). Each output column name is the original column name with an appended string that identifies the title, first name, middle name, last name, or suffix.
ColumnName_TitleOfRespect
- An honorific title (such as Mr., Mrs., Ms., or Professor) identified for a person in the input column.
ColumnName_FirstName
- The first name identified for a person in the input column.
ColumnName_MiddleName
- The middle name identified for a person in the input column.
ColumnName_LastName
- The last name identified for a person in the input column.
ColumnName_MaturitySuffix
- The suffix identified for a person in the input column. This suffix is sometimes categorized as generational (such as Jr. or Sr.), professional (such as J.P. or PT), or educational (such as MBA or Ph.D.). This column collects any of these suffixes that follow a name in a personal name column.
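The following Python sketch shows how a personal name maps to these output columns in both natural and reverse order. It is a toy illustration, not the product's parsing method: the real transformation also scores the parse and uses name frequency, which this sketch does not attempt. The function parse_personal_name, the small TITLES and SUFFIXES sets, the default column name "Name", and the rule that a comma marks reverse order are all assumptions made for the example.

```python
# Small lookup sets built from the examples in this topic (assumed, not exhaustive).
TITLES = {"mr", "mrs", "ms", "professor"}
SUFFIXES = {"jr", "sr", "j.p", "pt", "mba", "ph.d"}


def _key(token: str) -> str:
    """Normalize a token for lookup: drop trailing '.' and ',' and lowercase."""
    return token.rstrip(".,").lower()


def parse_personal_name(value: str, column: str = "Name") -> dict[str, str]:
    tokens = value.split()

    # Peel known suffixes off the end ("Jr.", "MBA", ...).
    suffixes = []
    while tokens and _key(tokens[-1]) in SUFFIXES:
        suffixes.insert(0, tokens.pop().rstrip(","))
    if tokens:
        # Drop a comma that only separated the name from a suffix.
        tokens[-1] = tokens[-1].rstrip(",")

    # Peel a known title off the front.
    title = tokens.pop(0) if tokens and _key(tokens[0]) in TITLES else ""

    # A remaining comma is treated as reverse order: "Surname, Given Middle".
    comma_at = next((i for i, t in enumerate(tokens) if t.endswith(",")), None)
    if comma_at is not None:
        last = " ".join(tokens[: comma_at + 1]).rstrip(",")
        given = tokens[comma_at + 1:]
    else:
        last = tokens[-1] if tokens else ""
        given = tokens[:-1]

    return {
        f"{column}_TitleOfRespect": title,
        f"{column}_FirstName": given[0] if given else "",
        f"{column}_MiddleName": " ".join(given[1:]),
        f"{column}_LastName": last,
        f"{column}_MaturitySuffix": " ".join(suffixes),
    }


print(parse_personal_name("Mr. John Q. Public Jr."))
print(parse_personal_name("Public, John Q."))
```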
Firm output columns
Firm output columns append identifying strings starting with _Firm. These names are parsed from columns that are identified as firm names.
ColumnName_FirmName
- A distinguishable firm name identified in a column field.
ColumnName_FirmSuffix
- The corporate suffix (such as "Inc.", "Incorporated", "Ltd.", or "Limited") identified in a column field.
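A minimal Python sketch of the firm split follows. It only illustrates how a value maps to the two firm output columns and is not the product's algorithm. The function split_firm_name, the default column name "Name", and the rule that only a trailing token counts as a suffix are assumptions; the suffix list comes from the examples above.

```python
# Corporate suffixes from the examples in this topic (assumed, not exhaustive).
CORPORATE_SUFFIXES = {"inc", "incorporated", "ltd", "limited"}


def split_firm_name(value: str, column: str = "Name") -> dict[str, str]:
    """Split a firm name into a name part and a trailing corporate suffix."""
    tokens = value.split()
    suffix = ""
    # Only a trailing token is treated as a suffix, so "Limited Brands" stays whole.
    if len(tokens) > 1 and tokens[-1].rstrip(".").lower() in CORPORATE_SUFFIXES:
        suffix = tokens.pop()
    firm_name = " ".join(tokens).rstrip(",")
    return {f"{column}_FirmName": firm_name, f"{column}_FirmSuffix": suffix}


print(split_firm_name("Acme Widgets, Inc."))
print(split_firm_name("Limited Brands"))
```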