Column Level Profiling Details

These details are displayed when you select a column in the left pane.

Summary

Completeness (%): The percentages of complete, null, and blank values detected in the column.
Uniqueness: These statistics are displayed here:
- Unique: Records with no duplicates in the data source.
- Non-unique: Records having duplicates in the data source.
- Distinct: All the different values present in your data source, each counted once, regardless of whether the value is unique or non-unique.
Frequency Analysis: The frequency distribution of the data values in a column of any type. It shows how many times each data value is repeated.

For example, your column contains these names:

Roger
Gigi
Gigi
Gigi
Garey
Elena
Brad
Brad

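Here:
- Unique: Roger, Garey, and Elena, because each appears only once.
- Non-unique: Gigi and Brad, because each appears more than once.
- Distinct: Roger, Gigi, Garey, Elena, and Brad.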
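The Summary statistics can be reproduced for this example with a short Python sketch. This is only an illustration, not the product's implementation. The function names `uniqueness` and `completeness` are hypothetical, and treating a whitespace-only string as blank is an assumption.

```python
from collections import Counter

names = ["Roger", "Gigi", "Gigi", "Gigi", "Garey", "Elena", "Brad", "Brad"]

def uniqueness(values):
    """Group values into unique, non-unique, and distinct, with frequencies."""
    counts = Counter(values)  # frequency of each data value
    return {
        "unique": [v for v, n in counts.items() if n == 1],
        "non_unique": [v for v, n in counts.items() if n > 1],
        "distinct": list(counts),
        "frequency": dict(counts),
    }

def completeness(values):
    """Return the percentage of complete, null, and blank values."""
    total = len(values)
    if total == 0:
        return {"complete": 0.0, "null": 0.0, "blank": 0.0}
    null = sum(1 for v in values if v is None)
    # Assumption: empty or whitespace-only strings count as blank.
    blank = sum(1 for v in values if v is not None and str(v).strip() == "")
    complete = total - null - blank
    return {
        "complete": 100.0 * complete / total,
        "null": 100.0 * null / total,
        "blank": 100.0 * blank / total,
    }

result = uniqueness(names)
print(result["unique"])      # ['Roger', 'Garey', 'Elena']
print(result["non_unique"])  # ['Gigi', 'Brad']
print(result["frequency"])   # {'Roger': 1, 'Gigi': 3, 'Garey': 1, 'Elena': 1, 'Brad': 2}
```

In the frequency output, Gigi has a count of 3 and Brad a count of 2, which is the repetition that Frequency Analysis reports.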

Further, the column-level details are displayed based on the data type in the column.

Profiling Results for Numerical Data

If the column data type is numerical, the following details are displayed:

Numerical Analysis: These statistics are displayed here:
- Minimum: The smallest value in the column. It applies to numerical and date data.
- Maximum: The largest value in the column. It applies to numerical and date data.
- Standard Deviation: A measure of the dispersion of the data relative to its mean.
- Variance: The average of the squared deviations from the mean.
- Average: The mean of the numerical values present in the column.
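As a rough illustration, the statistics listed above can be computed with Python's standard `statistics` module. The sketch assumes the population forms of variance and standard deviation, which match the "average of the squared deviations" definition. The product might use the sample forms instead. `numerical_summary` is a hypothetical helper name.

```python
import statistics

def numerical_summary(values):
    """Compute the basic numerical profile of a column."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "average": statistics.mean(values),
        # Population variance: mean of squared deviations from the mean.
        "variance": statistics.pvariance(values),
        # Population standard deviation: square root of the variance.
        "standard_deviation": statistics.pstdev(values),
    }

summary = numerical_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

For this sample column the average is 5, the variance is 4, and the standard deviation is 2.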

Percentile: The value below which a given percentage of the observations fall. For example, if a score is at the 30th percentile, 30 percent of all the recorded scores are lower.
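The percentile example can be checked with a small sketch. It uses the "percentage of scores that are strictly lower" definition given in the text. Other percentile conventions exist, and the one the product applies is not stated here. `percentile_rank` is a hypothetical helper name.

```python
def percentile_rank(scores, score):
    """Percentage of recorded scores that are strictly lower than `score`."""
    lower = sum(1 for s in scores if s < score)
    return 100.0 * lower / len(scores)

scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
print(percentile_rank(scores, 65))  # 30.0
```

Three of the ten scores are lower than 65, so 65 falls at the 30th percentile.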
Histogram: Represents the distribution of data.
Note: The histogram is not shown if the minimum and maximum values for the numerical column are equal.
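A minimal sketch of an equal-width histogram suggests one plausible reason for this note: when the minimum and maximum are equal, the bin width is zero and no meaningful bins can be built. The binning strategy, the `bins` parameter, and the function name `histogram` are assumptions for illustration, not the product's documented behavior.

```python
def histogram(values, bins=5):
    """Count values in equal-width bins, or return None when there is no spread."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return None  # all values are equal, so the bin width would be zero
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

print(histogram([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=3))  # [3, 3, 4]
print(histogram([7, 7, 7]))                                # None
```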

Profiling Results for String Data

If the column data type is a string, the following details are displayed:

String Analysis: These statistics are displayed here:
- Semantic Type: The semantic type detected in the column. It is not displayed if no semantic type is detected in the data.
- Min Length: Minimum length of the string in the column.
- Max Length: Maximum length of the string in the column.
Text Pattern: It shows whether a string contains a particular pattern of characters.
String Length: The distribution of string lengths in the selected string field. String length is the number of characters in a string.
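As an illustration, the minimum length, maximum length, and length distribution can be sketched as follows. Python's `len` counts Unicode code points, and whether the product counts characters the same way is an assumption. `string_length_profile` is a hypothetical helper name.

```python
from collections import Counter

def string_length_profile(strings):
    """Return min length, max length, and the distribution of lengths."""
    lengths = [len(s) for s in strings]
    distribution = dict(sorted(Counter(lengths).items()))
    return min(lengths), max(lengths), distribution

names = ["Roger", "Gigi", "Gigi", "Gigi", "Garey", "Elena", "Brad", "Brad"]
print(string_length_profile(names))  # (4, 5, {4: 5, 5: 3})
```

Here five values have four characters and three values have five characters.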
Script Distribution: The scripts (alphabets) present in the selected string column. Characters that are shared across scripts are categorized as Common.
Character Categories: Graphically displays the frequencies of Latin character types detected in the selected string column. The various categories are:
- Casing: Upper Case, Lower Case, and Mixed Case
- Character Data Types: Alphabetic, Numeric, and Alphanumeric
- Contains Spaces: Single Space, Multiple Spaces, and Trailing or Leading Spaces
- Special Character: Contains or Does not Contain.
  Note: Only the special characters defined during the configuration of the Character Analysis rule are considered here.
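The categories above could be assigned to a single string roughly as follows. This is a sketch under several assumptions: the precedence between the space categories, the "No Letters", "Other", and "No Spaces" fallback labels, and all function names are hypothetical. Python's `isalpha` also accepts non-Latin letters, whereas the product reports Latin character types.

```python
def casing(s):
    """Classify the casing of the letters in a string."""
    letters = [c for c in s if c.isalpha()]
    if not letters:
        return "No Letters"  # hypothetical fallback label
    if all(c.isupper() for c in letters):
        return "Upper Case"
    if all(c.islower() for c in letters):
        return "Lower Case"
    return "Mixed Case"

def character_data_type(s):
    """Classify a string as Alphabetic, Numeric, or Alphanumeric."""
    has_alpha = any(c.isalpha() for c in s)
    has_digit = any(c.isdigit() for c in s)
    if has_alpha and has_digit:
        return "Alphanumeric"
    if has_alpha:
        return "Alphabetic"
    if has_digit:
        return "Numeric"
    return "Other"  # hypothetical fallback label

def spaces(s):
    """Classify space usage. Leading or trailing spaces take precedence (assumption)."""
    if s != s.strip(" "):
        return "Trailing or Leading Spaces"
    if s.count(" ") > 1:
        return "Multiple Spaces"
    if " " in s:
        return "Single Space"
    return "No Spaces"  # hypothetical fallback label

def contains_special(s, special_chars):
    """Check only the special characters supplied by the configuration."""
    return any(c in special_chars for c in s)

print(casing("Gigi"), character_data_type("A1"), spaces("New York"))
print(contains_special("a@b", "@#"))
```

Passing `special_chars` as a parameter mirrors the note that only the special characters defined in the rule configuration are considered.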

The following details are displayed when you select the Semantic tab.

Semantic Type: Displays the list of detected semantic types in the selected column.
Confidence: Displays the confidence level for the detected semantic type. It is expressed as a percentage that indicates how likely it is that the data in the column is of that type.

For example, 98% confidence in the phone number type means that there is a 98% likelihood that the column contains phone numbers.

Validity: Displays the percentage of valid data for the semantic type.
Note: Validity is shown only if you selected the respective semantic rule under Semantic Analysis during profile creation. For details, refer to Semantic Analysis.
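As a rough illustration of a validity percentage, the sketch below measures the share of values that pass a validator for one semantic type. The phone number pattern `PHONE_PATTERN` and the `validity` function are hypothetical stand-ins. The actual validation rules and the way the product computes confidence are not described in this section.

```python
import re

# Hypothetical, simplified phone number pattern used only for this example.
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{6,}\d")

def validity(values, pattern):
    """Percentage of values that fully match the validator pattern."""
    if not values:
        return 0.0
    valid = sum(1 for v in values if pattern.fullmatch(v.strip()))
    return 100.0 * valid / len(values)

column = ["+1 555-010-9999", "5550100123", "555 0100", "not a phone"]
print(validity(column, PHONE_PATTERN))  # 75.0
```

Three of the four sample values match the pattern, so the validity for this sample is 75%.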