Selecting Columns
Note: By default, the maximum collection size limit is 10. Groups larger than this limit are excluded from the consolidation process.
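
The size limit can be pictured as a simple filter over grouped records. The following is a minimal sketch, not the product's implementation: the record layout, the `collection_number` key, and both function names are assumptions made for illustration. Only the default limit of 10 comes from the note above.

```python
from collections import defaultdict

# Default limit taken from the note above; everything else is illustrative.
MAX_COLLECTION_SIZE = 10


def group_by_collection(records, key="collection_number"):
    """Group records (plain dicts) by their collection number (hypothetical key)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


def eligible_collections(groups, max_size=MAX_COLLECTION_SIZE):
    """Keep only the groups whose size does not exceed the limit."""
    return {cid: rows for cid, rows in groups.items() if len(rows) <= max_size}


# Collection 1 has 2 records; collection 2 has 11 and is therefore excluded.
records = [{"collection_number": 1, "name": f"a{i}"} for i in range(2)]
records += [{"collection_number": 2, "name": f"b{i}"} for i in range(11)]

kept = eligible_collections(group_by_collection(records))
print(sorted(kept))
```

A group of exactly 10 records is kept in this sketch, because the note only excludes groups larger than the limit.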
The purpose of generating variations is to identify a small subset of collections for tagging that covers most of the unique variations in the source data. This is like picking a few representative collections from a large set, so that tagging this subset yields a best of breed rule close to the one you would get by tagging the entire set of collections.
The variations are generated based on the operations available in the Best Of Breed (BOB) stage.
BOB Operator | Based on Feature |
---|---|
Most Common | Frequency |
Longest/Shortest | Length |
Highest/Lowest | Rank |
Greater/Less Than | Absolute values |
Equals/Not Equals | Category-specific values, which are identified and then used as the feature. |
Empty/Not Empty | Frequency |
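
To make the idea of covering most variations with a small subset more concrete, here is a rough sketch. It is not the product's algorithm: the signature function, the three coarse features it derives (frequency, length, and empty/non-empty, loosely following the table above), and the first-seen selection strategy are all assumptions chosen for illustration. Rank, absolute values, and category-specific values are left out for brevity.

```python
from collections import Counter


def variation_signature(values):
    """Summarise one column of one collection as a tuple of coarse features."""
    non_empty = [v for v in values if v not in (None, "")]
    top = Counter(non_empty).most_common(2)
    # Frequency: is there a single, clearly most common value?
    clear_winner = len(top) == 1 or (len(top) == 2 and top[0][1] > top[1][1])
    # Length: do the non-empty values differ in length?
    lengths_differ = len({len(str(v)) for v in non_empty}) > 1
    # Empty/Not Empty: does the collection contain any empty values?
    has_empty = len(non_empty) < len(values)
    return (clear_winner, lengths_differ, has_empty)


def pick_representatives(column_by_collection):
    """Keep the first collection seen for each distinct signature."""
    chosen = {}
    for cid, values in column_by_collection.items():
        chosen.setdefault(variation_signature(values), cid)
    return sorted(chosen.values())


# Hypothetical data: one column's values, keyed by collection number.
sample = {
    1: ["Acme", "Acme", "Acme Inc"],
    2: ["Bolt", "Bolt", "Bolt Ltd"],  # same kind of variation as collection 1
    3: ["Core", "", "Cora"],          # tie in frequency, plus an empty value
}

print(pick_representatives(sample))
```

In this sketch, collections 1 and 2 show the same kind of variation, so only one of them is kept alongside collection 3. Tagging the reduced set would then exercise each observed pattern once.
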
Note: By default, the Collection number field, which is mandatory, is auto-selected and disabled. The collection number identifies duplicate records in a match queue: a candidate that is found to be a duplicate is assigned a collection number.