Configuring Basic Options
-
Enter the maximum Number of trees in your model. Default
is 50.
-
Enter the Maximum depth, which is the maximum number of levels
you want each tree in your model to contain. Default is 5.
-
Enter the Minimum rows, which is the minimum number of rows (or
records) you want each leaf of a tree to contain. Default is 10.
-
Enter the Number of bins numeric, which is the number of bins you
want the histogram to build for numeric columns before it splits at the
best point. Default is 20.
-
Enter the Number of bins top level, which is the minimum number of
bins you want the histogram to build at the root level. Default is 1024.
-
Enter the Number of bins categorical, which is the maximum number
of bins you want the histogram to build for categorical columns before it
splits at the best point. Default is 1024.
-
Check Sample rate and enter the fraction of the rows
to be used as a sample in each tree. This can be a value from 0.0 to 1.0.
-
Check Column sample rate per tree and enter the column
sampling rate for each tree. This can be a value from 0.0 to 1.0.
-
Check Columns at each level and enter the relative
change of the column sampling rate for every level. This option defaults to 1.0
and can be a value from 0.0 to 2.0.
-
Check Score input data to add a column for the model
prediction (score) to the input data.
-
Enter a value between 1 and 100 as the Percentage for training
data. This is the share of the input data used for training when the
data is randomly split into training and test data samples.
-
Enter 100 minus the value you entered for Percentage for training
data as the Percentage for test data.
-
Check Seed for sampling and enter a seed value to ensure that the
data is split into test and train data the same way each time you run the
dataflow. Uncheck this field to get a random split each time you run the
flow.
-
Click OK to save the model and configuration or continue
to the next tab.
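
The ranges and the seeded split described in the steps above can be
illustrated with a minimal Python sketch. This is not the product's own
implementation. The function names, the dictionary keys, and the 70/30
split are hypothetical and chosen only for illustration. The allowed
ranges are the ones quoted in the steps.

```python
import random

# Allowed ranges quoted from the steps above. The dictionary keys are
# hypothetical names, not identifiers from the product.
RATE_RANGES = {
    "sample_rate": (0.0, 1.0),
    "column_sample_rate_per_tree": (0.0, 1.0),
    "columns_at_each_level": (0.0, 2.0),
}


def check_rates(options):
    """Return the names of checked rate options that fall outside their range."""
    bad = []
    for name, (low, high) in RATE_RANGES.items():
        value = options.get(name)  # None means the option is unchecked
        if value is not None and not (low <= value <= high):
            bad.append(name)
    return bad


def split_rows(rows, training_percentage, seed=None):
    """Randomly split rows into (training, test) lists.

    With a seed, the same input always produces the same split.
    With seed=None, the split differs from run to run.
    """
    if not 1 <= training_percentage <= 100:
        raise ValueError("training_percentage must be between 1 and 100")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # The test share is whatever remains: 100 minus the training percentage.
    cut = round(len(shuffled) * training_percentage / 100)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(1000))
train_a, test_a = split_rows(rows, 70, seed=42)
train_b, test_b = split_rows(rows, 70, seed=42)
print(len(train_a), len(test_a))  # 700 300
print(train_a == train_b)         # True: same seed, same split
```

Calling split_rows without a seed corresponds to unchecking Seed for
sampling: each run shuffles the rows differently, so the training and
test samples change between runs.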