Configuring Advanced Options
- Leave Ignore constant fields checked to skip fields that have the same value for each record.
- Leave Compute p values checked to calculate p values for the parameter estimates.
-
Leave Remove collinear column checked to automatically
remove collinear columns during model building. Each removed column
has a coefficient of 0 in the returned model.
This option must be checked if Compute p values is also checked.
-
Leave Include constant term (Intercept) checked to
include a constant term (intercept) in the model.
This option must be checked if Remove collinear column is also checked.
-
Select a Solver from the drop-down list. Note that
COORDINATE_DESCENT and COORDINATE_DESCENT_NAIVE are currently
experimental.
- AUTO
- Solver will be determined based on input data and parameters.
- COORDINATE_DESCENT
- IRLSM with the covariance updates version of cyclical coordinate descent in the innermost loop.
- COORDINATE_DESCENT_NAIVE
- IRLSM with the naive updates version of cyclical coordinate descent in the innermost loop.
- IRLSM
- Ideal for problems with a small number of predictors or for Lambda searches with L1 penalty.
- L_BFGS
- Ideal for datasets with many columns.
- Leave Seed for N fold checked and enter a seed number so that the data is split into test and train data the same way each time you run the dataflow. Leave "0" in this field to get a random split each time you run the dataflow.
- Check N fold and enter the number of folds if you are performing cross-validation.
-
Check Fold assignment and select from the drop-down list
if you are performing cross-validation. This option applies only if you
entered a value in N fold and did not specify
Fold field.
- AUTO
-
Allows the algorithm to automatically choose an option; currently it uses Random.
- Modulo
-
Evenly splits the dataset into the folds and does not depend on the seed.
- Random
-
Randomly splits the data into nfolds pieces; best for large datasets.
- Stratified
-
Stratifies the folds based on the response variable for classification problems. Evenly distributes observations from the different classes to all sets when splitting a dataset into train and test data. This can be useful if there are many classes and the dataset is relatively small.
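The difference between the Modulo, Random, and Stratified options can be illustrated with a short Python sketch. This is only an approximation of the behavior described above, not the product's actual implementation. The function names (`modulo_folds`, `random_folds`, `stratified_folds`) are hypothetical and exist only for this example.

```python
import random
from collections import defaultdict


def modulo_folds(n_rows, n_folds):
    """Assign row i to fold i % n_folds; the result never depends on a seed."""
    return [i % n_folds for i in range(n_rows)]


def random_folds(n_rows, n_folds, seed):
    """Assign each row to a randomly chosen fold; a fixed seed repeats the split."""
    rng = random.Random(seed)
    return [rng.randrange(n_folds) for _ in range(n_rows)]


def stratified_folds(labels, n_folds, seed):
    """Spread the rows of each class as evenly as possible across the folds."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    folds = [0] * len(labels)
    for rows in rows_by_class.values():
        rng.shuffle(rows)  # randomize which rows of the class land in which fold
        for position, row in enumerate(rows):
            folds[row] = position % n_folds
    return folds


labels = ["yes"] * 6 + ["no"] * 3
print(modulo_folds(6, 3))               # evenly spaced assignment
print(random_folds(6, 3, seed=42))      # same output on every run with seed 42
print(stratified_folds(labels, 3, 42))  # each fold gets two "yes" rows and one "no" row
```

In this sketch, only the Random and Stratified assignments change when the seed changes, which mirrors the note that Modulo does not depend on the seed.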
-
If you are performing cross-validation, check Fold field
and select the field that contains the cross-validation fold index assignment
from the drop-down list.
This option applies only if you did not enter a value in N fold or select a Fold assignment.
- Check Max iteration and enter the maximum number of training iterations to run.
- Check Objective epsilon and enter the threshold for convergence; this must be a value between 0 and 1. If the objective value is less than this threshold, the model is considered converged.
- Check Beta epsilon and enter the threshold for convergence; this must be a value between 0 and 1. If the L1 norm of the current beta change is below this threshold, the model is considered converged.
- Click OK to save the model and configuration or continue to the next tab.
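
As a rough illustration of how the Max iteration, Objective epsilon, and Beta epsilon settings interact, the following Python sketch shows one way a training loop might decide when to stop. It is a simplified assumption, not the product's implementation: the function names are hypothetical, and the objective check is assumed here to compare the relative change in the objective between two iterations.

```python
def l1_change(old_beta, new_beta):
    """L1 norm of the change in the coefficient vector between two iterations."""
    return sum(abs(new - old) for old, new in zip(old_beta, new_beta))


def stop_reason(iteration, max_iterations, old_objective, new_objective,
                old_beta, new_beta, objective_epsilon, beta_epsilon):
    """Return the name of the setting that stops training, or None to continue."""
    if iteration >= max_iterations:
        return "max_iterations"

    # Assumption: convergence is judged on the relative change of the objective.
    relative_change = abs(old_objective - new_objective) / max(abs(old_objective), 1e-12)
    if relative_change < objective_epsilon:
        return "objective_epsilon"

    if l1_change(old_beta, new_beta) < beta_epsilon:
        return "beta_epsilon"

    return None


# The coefficients barely moved, so the beta threshold stops training.
print(stop_reason(3, 50, 10.0, 5.0, [1.0, 2.0], [1.00001, 2.0], 1e-4, 1e-4))
```

Smaller thresholds make the loop run longer before it reports convergence, and Max iteration caps the run regardless of either threshold.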