Configuring Basic Options

  1. Leave Standardize input fields checked to standardize the numeric columns to have zero mean and unit variance.
    If you do not standardize, variables measured on a larger scale show larger variances than other attributes, and the resulting components may be dominated by those variables because of their scale rather than their true contribution.
  2. Check Score input data to add a column for the model prediction (score) to the input data.
  3. Check Prior if the data has been sampled so that the mean of the response no longer reflects its true rate in the population; then enter the prior probability p(y==1) in the text field.
  4. Specify how to handle missing data by checking either Skip, which skips rows that contain missing values, or Impute means, which replaces each missing value with the mean of its column.
  5. Enter a value between 1 and 100 as the Percentage for training data. This is the share of rows assigned to the training sample when the input data is randomly split into training and test samples.
  6. Enter 100 minus the value you entered in Step 5 as the Percentage for test data.
  7. Enter a number as the Seed for sampling so that the data is split into training and test samples the same way each time you run the dataflow. Uncheck this field to get a different random split each time you run the dataflow.
  8. Click OK to save the model and its configuration, or continue to the next tab.
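The standardization in Step 1 can be sketched in Python as the usual z-score transform: subtract each column's mean and divide by its standard deviation. This is a minimal sketch, not the tool's implementation. Whether the tool uses the population or the sample standard deviation is not stated here, so the sketch assumes the population form. The function name standardize is hypothetical.

```python
from statistics import mean, pstdev


def standardize(column: list[float]) -> list[float]:
    """Rescale one numeric column to zero mean and unit variance (z-scores)."""
    mu = mean(column)
    # Assumption: population standard deviation; the tool may use the sample form.
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column has no variance to rescale; map every value to 0.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Two columns on very different scales become directly comparable.
ages = [25.0, 35.0, 45.0]
incomes = [30000.0, 60000.0, 90000.0]
print(standardize(ages))
print(standardize(incomes))
```

Both calls return approximately [-1.22, 0.0, 1.22]. Without standardization, the income column's variance would be millions of times larger than the age column's purely because of its units.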
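Step 2 appends the model's prediction to each input row. The following minimal sketch assumes that rows are lists of numbers and that the trained model is available as a callable. Both add_score_column and the toy_predict stand-in are hypothetical illustrations, not the tool's API.

```python
from typing import Callable, Sequence


def add_score_column(
    rows: list[list[float]], predict: Callable[[Sequence[float]], float]
) -> list[list[float]]:
    """Return copies of the input rows with the model's score appended as a new last column."""
    # Copy each row so the original input data is left unchanged.
    return [list(row) + [predict(row)] for row in rows]


# Hypothetical stand-in for a trained model: the "score" is just the row total.
def toy_predict(row: Sequence[float]) -> float:
    return sum(row)


data = [[1.0, 2.0], [3.0, 4.0]]
scored = add_score_column(data, toy_predict)
print(scored)  # [[1.0, 2.0, 3.0], [3.0, 4.0, 7.0]]
```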
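Step 3 matters when, for example, rare positive cases were oversampled for training. In that case the model's probabilities reflect the sample's response rate rather than the real one. One common correction applies Bayes' rule: it reweights each class by the ratio of its true rate to its sampled rate. The sketch below uses that technique as an assumption, because the tool's exact method is not described here. The function and parameter names are hypothetical.

```python
def adjust_for_prior(p: float, sample_rate: float, prior: float) -> float:
    """Re-weight a predicted P(y==1) from the sampled response rate to the true prior.

    p           -- probability predicted by a model trained on the sampled data
    sample_rate -- mean of the response in the sampled training data
    prior       -- the true prior probability p(y==1) entered by the user
    """
    for name, value in (("sample_rate", sample_rate), ("prior", prior)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    # Bayes' rule: scale each class by (true rate / sampled rate), then renormalize.
    pos = p * (prior / sample_rate)
    neg = (1.0 - p) * ((1.0 - prior) / (1.0 - sample_rate))
    return pos / (pos + neg)


# Positives make up 50% of the sampled training data but only 10% in reality.
print(adjust_for_prior(0.5, sample_rate=0.5, prior=0.1))
```

This prints approximately 0.1: a score equal to the sampled response rate maps back to the true prior. When the prior equals the sample rate, scores are left unchanged.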
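The two missing-data options in Step 4 can be sketched as follows. The sketch makes three assumptions: rows are lists in which None marks a missing value, Skip drops any row that has a missing value, and Impute means uses each column's mean over its non-missing values. The function names are hypothetical.

```python
from statistics import mean
from typing import Optional

Row = list[Optional[float]]


def skip_missing(rows: list[Row]) -> list[Row]:
    """Skip: keep only the rows that have no missing values."""
    return [row for row in rows if all(v is not None for v in row)]


def impute_means(rows: list[Row]) -> list[Row]:
    """Impute means: replace each missing value with the mean of its column's observed values."""
    if not rows:
        return []
    column_means: list[Optional[float]] = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        # A column with no observed values has no mean; its gaps stay missing.
        column_means.append(mean(observed) if observed else None)
    return [
        [column_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


data: list[Row] = [[1.0, None], [3.0, 4.0], [None, 6.0]]
print(skip_missing(data))  # [[3.0, 4.0]]
print(impute_means(data))  # [[1.0, 5.0], [3.0, 4.0], [2.0, 6.0]]
```

The trade-off is visible in the example. Skip keeps only one of the three rows, while Impute means keeps every row but fills the gaps with column averages.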
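Steps 5 through 7 describe a seeded random split. This minimal sketch assumes the split shuffles the rows and assigns a fixed share of them to training. The tool might instead assign each row to a sample independently at random, which yields only approximately the requested share. Passing seed=None mimics unchecking Seed for sampling, because Python then seeds the generator from the operating system. The function name split_train_test is hypothetical.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def split_train_test(
    rows: Sequence[T], train_pct: float, seed: Optional[int] = None
) -> tuple[list[T], list[T]]:
    """Randomly split rows into training and test samples.

    train_pct -- Percentage for training data (1 to 100); the test sample gets
                 the remaining 100 - train_pct percent.
    seed      -- Seed for sampling; the same seed reproduces the same split,
                 while None gives a different split on each run.
    """
    if not 1 <= train_pct <= 100:
        raise ValueError("train_pct must be between 1 and 100")
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_pct / 100)
    return shuffled[:n_train], shuffled[n_train:]


rows = list(range(10))
train, test = split_train_test(rows, train_pct=70, seed=42)
print(len(train), len(test))  # 7 3
```

Calling split_train_test(rows, 70, seed=42) again returns exactly the same two lists. With seed=None, the membership of each sample generally changes from run to run.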