Data Drift Statistics

Unique value count for all data types

The unique value count is referred to as Cardinality. Cardinality is the number of distinct elements in a set: each value is counted once, no matter how many times it occurs.

For example, consider the following data.


Name	Age
Mark	35
John	29
Ashley	39
Jonas	33
Mark	35
John	25
Mark	20
James	33
Ashley	25
Emma	20

The unique values for "Name" column are: Mark, John, Ashley, Jonas, James, Emma

The cardinality of "Name" is 6

The unique values for "Age" column are: 35, 29, 39, 33, 25, 20

The cardinality of "Age" is 6
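
The unique value count can be sketched in a few lines of Python. This is only an illustration of the definition, not the product's implementation; the names rows and cardinality are hypothetical.

```python
# Sample rows from the example table (Name, Age).
rows = [
    ("Mark", 35), ("John", 29), ("Ashley", 39), ("Jonas", 33), ("Mark", 35),
    ("John", 25), ("Mark", 20), ("James", 33), ("Ashley", 25), ("Emma", 20),
]


def cardinality(values):
    """Return the number of distinct values (hypothetical helper)."""
    return len(set(values))


name_cardinality = cardinality(name for name, _ in rows)
age_cardinality = cardinality(age for _, age in rows)

print("Name:", name_cardinality)
print("Age:", age_cardinality)
```

A set keeps each value once, so its size is the cardinality regardless of how often a value repeats.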

Numeric Data

Minimum

The minimum is referred to as the Minimum Value in numeric data. It is the lowest value in the dataset.

For example, the set SALARY={1215, 2000, 5263, 1126, 3687} contains the lowest value "1126". The minimum value of the set SALARY is 1126.

Maximum

The maximum is referred to as the Maximum Value in numeric data. It is the highest value in the dataset.

For example, the set SALARY={1215, 2000, 5263, 1126, 3687} contains the highest value "5263". The maximum value of the set SALARY is 5263.

Mean

The Mean is the average of the elements present in the numeric data.

Mean = (Sum of all the elements in the set) / (Number of elements)

For example, the set SALARY={1215, 2000, 5263, 1126, 3687} contains 5 elements. The mean is (1215 + 2000 + 5263 + 1126 + 3687) / 5 = 13291 / 5 = 2658.2.

Therefore, the mean of the set SALARY is 2658.2.
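
As a minimal sketch, the minimum, maximum, and mean of the SALARY example can be reproduced with Python's built-in functions. The variable names are illustrative only.

```python
# SALARY values from the examples above.
salary = [1215, 2000, 5263, 1126, 3687]

salary_min = min(salary)                  # lowest value in the set
salary_max = max(salary)                  # highest value in the set
salary_mean = sum(salary) / len(salary)   # sum of elements / number of elements

print("Minimum:", salary_min)
print("Maximum:", salary_max)
print("Mean:", salary_mean)
```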

Standard deviation

The Standard Deviation (SD) measures how far the data points vary from their mean value. It is calculated as the square root of the variance, which is derived from the squared deviation of each data point from the mean.

A higher SD means the data points are widely spread out from the mean, whereas a lower SD means the data points are close to the mean.
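
The calculation can be sketched with the SALARY set from the earlier examples. The text does not say whether the population form (divide by n) or the sample form (divide by n - 1) is intended, so this sketch assumes the population form and shows the sample form alongside it for comparison.

```python
import math
from statistics import pstdev, stdev

salary = [1215, 2000, 5263, 1126, 3687]

# Step 1: mean of the data.
mean_value = sum(salary) / len(salary)

# Step 2: squared deviation of each data point from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in salary]

# Step 3: variance (population form, assumed here) and its square root.
population_variance = sum(squared_deviations) / len(salary)
population_sd = math.sqrt(population_variance)

# Cross-check against the standard library, plus the sample form.
library_population_sd = pstdev(salary)
sample_sd = stdev(salary)

print("Population SD:", round(population_sd, 2))
print("Sample SD:", round(sample_sd, 2))
```

The sample form always gives a slightly larger value than the population form for the same data, so it matters which one is compared between two datasets.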

Textual Data

Minimum length

The Minimum Length is the length, in characters, of the shortest text value in the textual dataset.

For example, in the set CITY={Sao Paulo, Mexico, Tokyo, Shanghai, Cairo, Mumbai}, the shortest values are "Tokyo" and "Cairo", each 5 characters long. The minimum length of the set CITY is 5.

Maximum length

The Maximum Length is the length, in characters, of the longest text value in the textual dataset.

For example, in the set CITY={Sao Paulo, Mexico, Tokyo, Shanghai, Cairo, Mumbai}, the longest value is "Sao Paulo", which is 9 characters long including the space. The maximum length of the set CITY is 9.
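
A short Python sketch of both text statistics, using the CITY example. It assumes length is measured in characters and that spaces count, which matches the value 9 given for "Sao Paulo"; the variable names are illustrative.

```python
# CITY values from the examples above.
city = ["Sao Paulo", "Mexico", "Tokyo", "Shanghai", "Cairo", "Mumbai"]

# Length of each value in characters (spaces included).
lengths = [len(value) for value in city]

min_length = min(lengths)
max_length = max(lengths)

# Values that attain the minimum and maximum length.
shortest = [value for value in city if len(value) == min_length]
longest = [value for value in city if len(value) == max_length]

print("Minimum length:", min_length, shortest)
print("Maximum length:", max_length, longest)
```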

Detail Drift

Distribution of value count

The distribution of value count is referred to as Cardinality Detail. Cardinality detail measures how often a given element occurs in a column, expressed as a percentage of all rows:

(Number of rows in which the element occurs / Total number of rows) * 100

For example, consider the following data.


Name	Age
Mark	35
John	29
Ashley	39
Jonas	33
Mark	35
John	25
Mark	20
James	33
Ashley	25
Emma	20

The cardinality detail of "Mark" in the "Name" column = (3/10)*100, which equals 30 percent.

The cardinality detail of "25" in the "Age" column = (2/10)*100, which equals 20 percent.
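
The percentage formula can be sketched in Python with collections.Counter. This is an illustration under the definition above, not the product's implementation; cardinality_detail is a hypothetical helper name.

```python
from collections import Counter

# Columns from the example table.
names = ["Mark", "John", "Ashley", "Jonas", "Mark",
         "John", "Mark", "James", "Ashley", "Emma"]
ages = [35, 29, 39, 33, 35, 25, 20, 33, 25, 20]


def cardinality_detail(values):
    """Map each distinct value to the percentage of rows it occupies."""
    values = list(values)
    total = len(values)
    counts = Counter(values)
    # Multiply before dividing to limit floating-point rounding noise.
    return {value: count * 100 / total for value, count in counts.items()}


name_detail = cardinality_detail(names)
age_detail = cardinality_detail(ages)

print("Mark:", name_detail["Mark"], "percent")
print("25:", age_detail[25], "percent")
```

The percentages for one column always add up to 100, which gives a quick sanity check when comparing the distribution between two datasets.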