Basic Options tab

Standardize input fields
Check the check box to standardize the numeric columns to have zero mean and unit variance. This is the default.

If you do not use standardization, the results may include components dominated by variables whose variances are larger than those of other attributes only because of their scale, not because of their true contribution.
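The effect of standardization can be illustrated with a minimal sketch. It is not the product's implementation: the function name `standardize` and the sample columns are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def standardize(column):
    """Rescale a numeric column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    # Assumption: population standard deviation; the product may use the sample form.
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column has no variance; return zeros instead of dividing by zero.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

income = [20_000, 40_000, 60_000]  # large scale
age = [20, 40, 60]                 # small scale, same relative spread

# After standardization both columns are on the same scale.
print(standardize(income))
print(standardize(age))
```

Before standardization, the variance of `income` is a million times that of `age` purely because of its units. Afterward, the two columns carry equal weight.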

Score input data
Check this check box to add a column for the model prediction (score) to the input data.
Prior
Check this check box if the data has been sampled and the mean of the response no longer reflects the true rate in the population. Enter the prior probability p(y==1) in the text field. The default value is 0.5.
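This page does not state how the prior is applied. As an illustration only, one common way to correct a probability learned from sampled data is to rescale its odds by the ratio of the true prior to the sampled response rate. The function `correct_for_prior` and its parameters are hypothetical and may not match what the product does.

```python
def correct_for_prior(p_sampled, sample_rate, prior):
    """Rescale a probability estimated on sampled data to a stated prior.

    p_sampled   -- model output P(y == 1) learned from the sampled data
    sample_rate -- fraction of y == 1 rows in the sampled data (hypothetical input)
    prior       -- true P(y == 1) in the population, as entered in the Prior field
    """
    if not (0.0 < sample_rate < 1.0 and 0.0 < prior < 1.0):
        raise ValueError("sample_rate and prior must lie strictly between 0 and 1")
    # Weight each class by how much sampling over- or under-represented it.
    pos = p_sampled * prior / sample_rate
    neg = (1.0 - p_sampled) * (1.0 - prior) / (1.0 - sample_rate)
    return pos / (pos + neg)

# Data balanced to 50% positives, but the true rate is 10%.
print(correct_for_prior(0.5, sample_rate=0.5, prior=0.1))
```

When the sampled rate equals the prior, the function returns the score unchanged.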
Missing data
This option specifies how to handle missing data.
  • Skip—Skips missing data.
  • Impute means—Replaces each missing value with the mean of its column.
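The difference between the two settings can be sketched as follows. This is an illustration under assumptions, not the product's code: the function names are hypothetical, missing values are represented as `None`, and Skip is assumed to drop whole rows.

```python
def skip_missing(rows):
    """Drop every row that contains at least one missing value (None)."""
    return [row for row in rows if None not in row]

def impute_means(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        # Assumption: a column with no observed values stays missing.
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[i] if v is None else v for i, v in enumerate(row)] for row in rows]

rows = [[1.0, 10.0], [None, 20.0], [3.0, None]]
print(skip_missing(rows))   # [[1.0, 10.0]]
print(impute_means(rows))   # [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
```

Skipping keeps only complete rows and can shrink the data considerably, whereas imputing keeps every row at the cost of reducing the variance of the filled columns.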
Sampling
  • Percentage for training data—Specify the percentage of the input data, a value between 1 and 100, to use as the training sample when the input data is randomly split into training and test data samples.
  • Percentage for test data—Enter the value of 100 minus the value entered in Percentage for training data.
  • Seed for sampling—Enter a number to ensure that the data is split into training and test data in the same way each time you run the dataflow. Uncheck this field to get a different random split each time you run the dataflow.
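The interaction between the training percentage and the seed can be sketched as below. The function `split_rows` is hypothetical, and rounding the training size to the nearest whole row is an assumption. The product's own sampling may differ.

```python
import random

def split_rows(rows, train_pct, seed=None):
    """Randomly split rows into (train, test) using a training percentage."""
    if not 1 <= train_pct <= 100:
        raise ValueError("train_pct must be between 1 and 100")
    rng = random.Random(seed)  # seed=None gives a different split on each run
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # Assumption: round the training size to the nearest whole row.
    n_train = round(len(shuffled) * train_pct / 100)
    return shuffled[:n_train], shuffled[n_train:]

rows = list(range(10))
train_a, test_a = split_rows(rows, 70, seed=42)
train_b, test_b = split_rows(rows, 70, seed=42)
print(len(train_a), len(test_a))  # 7 3
print(train_a == train_b)         # True: the same seed gives the same split
```

The test sample is whatever remains after the training rows are taken, which is why the test percentage is always 100 minus the training percentage.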