Configuring Basic Options

  1. Enter the maximum Number of trees in your model. The default is 50.
  2. Enter the Maximum depth.
    This is the maximum number of levels each tree in your model can contain. The default is 5.
  3. Enter the Minimum rows.
    This is the minimum number of rows (or records) that each leaf node must contain. The default is 10.
  4. Enter the Number of bins numeric.
    This is the number of bins to use when building the histogram for numeric columns; the split is then made at the best point. The default is 20.
  5. Enter the Number of bins top level.
    This is the minimum number of bins you want at the root level. The default is 1024.
  6. Enter the Number of bins categorical.
    This is the maximum number of bins you want the histogram to build and then split at the best point. The default is 1024.
  7. Check Sample rate and enter the fraction of rows to be used as a sample in each tree.
    This can be a value from 0.0 to 1.0.
  8. Check Column sample rate per tree and enter the column sampling rate for each tree.
    This can be a value from 0.0 to 1.0.
  9. Check Columns at each level and enter the relative change of the column sampling rate for every level.
    This option defaults to 1.0 and can be a value from 0.0 to 2.0.
  10. Check Score input data to add a column for the model prediction (score) to the input data.
  11. Specify a value between 1 and 100 as the Percentage for training data when the input data is randomly split into training and test data samples.
  12. Enter the value of 100 minus the amount you entered in step 11 as the Percentage for test data.
  13. Check Seed for sampling to ensure that the data is split into training and test data the same way each time you run the dataflow. Uncheck this field to get a random split each time you run the dataflow.
  14. Click OK to save the model and configuration or continue to the next tab.
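
The ranges and defaults in the steps above can be sketched as a small validation routine. This is only an illustration, not the product's implementation: the class BasicOptions, the function split_rows, and all field names are hypothetical, and the training percentage of 80 is an example value rather than a documented default. The sketch checks the documented ranges, derives the test percentage as 100 minus the training percentage, and shows how a fixed seed gives the same split on every run.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class BasicOptions:
    """Hypothetical container mirroring the Basic Options fields."""
    number_of_trees: int = 50
    maximum_depth: int = 5
    minimum_rows: int = 10
    bins_numeric: int = 20
    bins_top_level: int = 1024
    bins_categorical: int = 1024
    sample_rate: Optional[float] = None                  # None = box unchecked
    column_sample_rate_per_tree: Optional[float] = None  # None = box unchecked
    columns_at_each_level: float = 1.0
    training_pct: float = 80.0   # example value only, not a documented default
    seed: Optional[int] = None   # None = box unchecked, random split each run

    @property
    def test_pct(self) -> float:
        # Step 12: the test percentage is 100 minus the training percentage.
        return 100.0 - self.training_pct

    def validate(self) -> None:
        # Steps 7 and 8: both sampling rates must lie in [0.0, 1.0].
        for name in ("sample_rate", "column_sample_rate_per_tree"):
            value = getattr(self, name)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0")
        # Step 9: the per-level change must lie in [0.0, 2.0].
        if not 0.0 <= self.columns_at_each_level <= 2.0:
            raise ValueError("columns_at_each_level must be between 0.0 and 2.0")
        # Step 11: the training percentage must lie between 1 and 100.
        if not 1 <= self.training_pct <= 100:
            raise ValueError("training_pct must be between 1 and 100")


def split_rows(rows: Sequence, options: BasicOptions) -> Tuple[List, List]:
    """Randomly split rows into (training, test) lists."""
    options.validate()
    rng = random.Random(options.seed)  # a fixed seed repeats the same shuffle
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * options.training_pct / 100)
    return shuffled[:cut], shuffled[cut:]
```

For example, split_rows(range(100), BasicOptions(training_pct=70, seed=42)) returns 70 training rows and 30 test rows, and repeating the call with the same seed returns the same two lists.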