Standardization
Standardization normalizes the values in a table. The values of the input table are copied to the output table, and any non-standard values are replaced with standard values along the way. Examples include standardizing abbreviations and the way units of measure are specified.
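As a rough illustration of the idea only (this is not how the product implements it), the following Python sketch copies an input table to an output table and replaces non-standard unit spellings with a standard one. The lookup table and the function names are hypothetical.

```python
# Hypothetical lookup from non-standard spellings to the standard form.
STANDARD_UNITS = {
    "kgs": "kg",
    "kilogram": "kg",
    "kilograms": "kg",
    "lbs": "lb",
    "pounds": "lb",
}


def standardize_value(value, lookup):
    """Return the standard form of value, or the trimmed value if none is known."""
    trimmed = value.strip()
    return lookup.get(trimmed.lower(), trimmed)


def standardize_column(rows, column, lookup):
    """Copy every row to a new table, standardizing one column on the way."""
    return [{**row, column: standardize_value(row[column], lookup)} for row in rows]


input_table = [
    {"item": "flour", "unit": "Kgs"},
    {"item": "sugar", "unit": "lb"},
]
output_table = standardize_column(input_table, "unit", STANDARD_UNITS)
```

The input table is left untouched; only the copied rows in the output table carry the standardized values.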
To implement a standardization process, first create a standardization specification. Standardization specifications are located in the Standardization subsection of the Specifications section in the navigation tree. After creating a standardization specification, you can:
- Execute it and view the result by right-clicking the specification name.
- Or create a GeneralStandardize operation in a scenario to execute the specification.
A standardization specification consists of the following elements:
- Input table field contains the name of the input table;
- Result table field contains the name of the result table;
- Columns table contains a list of columns of the result table.
Each record in the Columns table represents a column or a group of columns of the result table. The Result column name column defines the name of the result column. The Expression column defines the expression that computes the value of the result column. Both Result column name and Expression can use ${col_name} to refer to the input table column specified in the Input column column. The Input column column can have the following values:
- <any column>: used to copy all columns from the input table.
- <expression>: used when the expression does not need to refer to a column from the input table through ${col_name}.
- The name of a column in the input table.
Let us consider the following Columns table as an example.
Input column | Expression | Result column name
<expression> | InternalRecordId | data_id
<any column> | ${col_name} | ${col_name}
FirstName | ilstandstr(${col_name}) | ${col_name}_st
LastName | ilstandstr(${col_name}) | ${col_name}_st
The first row in the above table creates a duplicate of the InternalRecordId column with the name data_id. The second row copies all the columns from the input table into the result table, keeping their original names. The last two rows add two columns that contain the standardized values of the FirstName and LastName columns. The names of the standardized columns will be FirstName_st and LastName_st respectively.
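The way such rows expand into result columns can be sketched in Python. This is a minimal simulation under stated assumptions, not the product's implementation: the row layout (input column, expression, result column name), the ${col_name}_st naming pattern for the last two rows, the tiny expression evaluator, and the behavior of the stand-in ilstandstr function are all hypothetical.

```python
import re


def ilstandstr(value):
    """Hypothetical stand-in: collapse whitespace and title-case the value.

    The real ilstandstr function may behave differently.
    """
    return " ".join(str(value).split()).title()


FUNCTIONS = {"ilstandstr": ilstandstr}


def expand_rows(columns_table, input_columns):
    """Expand Columns rows into (expression, result name) pairs."""
    plan = []
    for input_col, expression, result_name in columns_table:
        if input_col == "<any column>":
            targets = input_columns      # one result column per input column
        elif input_col == "<expression>":
            targets = [""]               # no ${col_name} substitution needed
        else:
            targets = [input_col]        # a single named input column
        for col in targets:
            plan.append((expression.replace("${col_name}", col),
                         result_name.replace("${col_name}", col)))
    return plan


def evaluate(expression, record):
    """Evaluate either 'func(column)' or a bare column name (assumed syntax)."""
    match = re.fullmatch(r"(\w+)\((\w+)\)", expression)
    if match:
        func, col = match.groups()
        return FUNCTIONS[func](record[col])
    return record[expression]


def standardize(records, columns_table):
    """Build the result table from the input records and the Columns rows."""
    input_columns = list(records[0].keys()) if records else []
    plan = expand_rows(columns_table, input_columns)
    return [{name: evaluate(expr, rec) for expr, name in plan} for rec in records]


columns_table = [
    ("<expression>", "InternalRecordId", "data_id"),
    ("<any column>", "${col_name}", "${col_name}"),
    ("FirstName", "ilstandstr(${col_name})", "${col_name}_st"),
    ("LastName", "ilstandstr(${col_name})", "${col_name}_st"),
]
records = [{"InternalRecordId": 1, "FirstName": "  joHN ", "LastName": "smith"}]
result = standardize(records, columns_table)
```

With these assumptions, each result record holds data_id, the three original columns under their original names, and the two added columns FirstName_st and LastName_st.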