Clustering

A PMML clustering model assigns each record to its best matching cluster, according to the distance or similarity measure defined in the model. A cluster is a subset of similar records. Clustering, a form of unsupervised learning, divides a dataset into groups so that the members of each group are as similar to one another as possible and different groups are as dissimilar as possible.

Model Element

<ClusteringModel functionName="clustering" ...
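
For orientation, the following is a minimal sketch of a complete center-based clustering model. It is a hypothetical example, not taken from this product: the field names (x, y), cluster ids and names, center coordinates, and the PMML version are all assumptions chosen for illustration.

```xml
<!-- Hypothetical two-cluster model; all names and values are illustrative. -->
<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="Illustrative two-cluster example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <ClusteringModel modelName="example" functionName="clustering"
                   modelClass="centerBased" numberOfClusters="2">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y"/>
    </MiningSchema>
    <!-- Distance measure: a smaller value means a better match. -->
    <ComparisonMeasure kind="distance">
      <squaredEuclidean/>
    </ComparisonMeasure>
    <ClusteringField field="x" compareFunction="absDiff"/>
    <ClusteringField field="y" compareFunction="absDiff"/>
    <!-- Each Array holds the cluster center, in ClusteringField order. -->
    <Cluster id="1" name="low">
      <Array n="2" type="real">0.0 0.0</Array>
    </Cluster>
    <Cluster id="2" name="high">
      <Array n="2" type="real">10.0 10.0</Array>
    </Cluster>
  </ClusteringModel>
</PMML>
```

Under these assumptions, a record with x=9 and y=8 has a squared Euclidean distance of 81 + 64 = 145 to the first center and 1 + 4 = 5 to the second, so the second cluster is the best match.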

Unsupported Features

Clustering models whose <MiningSchema> element references a <DerivedField> element are not supported.
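
As an illustration, the fragment below sketches one reading of this restriction: a <MiningField> whose name matches a <DerivedField> rather than a <DataField>. The field names (x, x_scaled) and the normalization values are hypothetical.

```xml
<!-- Hypothetical fragment; names and values are illustrative only. -->
<TransformationDictionary>
  <DerivedField name="x_scaled" optype="continuous" dataType="double">
    <NormContinuous field="x">
      <LinearNorm orig="0" norm="0"/>
      <LinearNorm orig="100" norm="1"/>
    </NormContinuous>
  </DerivedField>
</TransformationDictionary>

<!-- Inside the ClusteringModel element: -->
<MiningSchema>
  <!-- "x_scaled" is a DerivedField, not a DataField: the unsupported pattern. -->
  <MiningField name="x_scaled"/>
</MiningSchema>
```

In standard PMML, the mining schema would more commonly list the raw input field (here, x) and leave the derived field to be referenced from the clustering fields. Whether that arrangement satisfies this restriction is an assumption and is not confirmed by this page.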

Model Outputs

Supported Model Output Features    Description
predictedValue                     The best matching cluster based on the distance or similarity measure used for clustering.
transformedValue                   A value generated via a transformation expression applied to the predicted model output.
decision                           A value generated via an expression applied to the predicted model output resulting in a categorized value.
predictedDisplayValue              The human readable value used to represent the predicted value from the model.
entityId                           If present, the 1-based index (implicit identifier) of the winning/predicted cluster.
affinity                           The value of the distance or the similarity of the provided record to the predicted cluster as defined in the model.
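
The output features listed above are requested through an <Output> element inside the model. The following is a sketch only: the output field names (cluster, clusterLabel, clusterId, clusterAffinity, isHigh) are hypothetical, and the transformedValue expression assumes a model in which one cluster has the id 2.

```xml
<!-- Hypothetical Output element; all field names are illustrative. -->
<Output>
  <!-- The best matching cluster. -->
  <OutputField name="cluster" feature="predictedValue"
               optype="categorical" dataType="string"/>
  <!-- Human readable label for the predicted cluster. -->
  <OutputField name="clusterLabel" feature="predictedDisplayValue"
               optype="categorical" dataType="string"/>
  <!-- Identifier of the winning cluster. -->
  <OutputField name="clusterId" feature="entityId"
               optype="categorical" dataType="string"/>
  <!-- Distance or similarity between the record and the predicted cluster. -->
  <OutputField name="clusterAffinity" feature="affinity"
               optype="continuous" dataType="double"/>
  <!-- Value derived from the predicted output via an expression. -->
  <OutputField name="isHigh" feature="transformedValue"
               optype="categorical" dataType="boolean">
    <Apply function="equal">
      <FieldRef field="cluster"/>
      <Constant dataType="string">2</Constant>
    </Apply>
  </OutputField>
</Output>
```

With a distance measure, a smaller affinity indicates a closer match; with a similarity measure, a larger value does.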