Advanced Options tab
Options
- Ignore constant fields
- Leave this check box checked to skip fields that have the same value for each record.
- Compute p values
- Leave this check box checked to calculate p values for the parameter estimates.
- Remove collinear column
- Leave this check box checked to automatically remove collinear columns during model building. Each removed column gets a coefficient of 0 in the returned model. This option must be checked if Compute p values is also checked.
- Include constant term (Intercept)
- Leave this check box checked to include a constant term (intercept) in the model. This field must be checked if the Remove collinear column check box is also checked.
- Solver
- Select a solver from the drop-down list box.
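
The following is a minimal sketch, not the stage's actual implementation, of what "collinear" means for Remove collinear column. A column is treated as collinear when it is a linear combination of columns already kept, in which case its coefficient (and therefore its p value) cannot be determined uniquely. The function name find_collinear_columns and the tolerance are assumptions made for illustration.

```python
import numpy as np

def find_collinear_columns(X, tol=1e-10):
    """Return indices of columns that are linear combinations of earlier kept columns.

    Hypothetical helper for illustration only.
    """
    kept = []
    dropped = []
    for j in range(X.shape[1]):
        candidate = X[:, kept + [j]]
        # Adding column j must raise the rank; otherwise it carries no new information.
        if np.linalg.matrix_rank(candidate, tol=tol) > len(kept):
            kept.append(j)
        else:
            dropped.append(j)
    return dropped

# Third column equals the sum of the first two, so it is collinear.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 0.0, 2.0],
    [3.0, 1.0, 4.0],
    [4.0, 5.0, 9.0],
])
dropped = find_collinear_columns(X)
print(dropped)
```

In this sketch the dropped column would be the one reported with a 0 coefficient in the returned model.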
Convergence Criteria
- Maximum iterations
- Specifies the maximum number of training iterations to run.
- Objective epsilon
- Specifies the threshold for convergence. If the objective value is less than this threshold, the model is considered converged. This must be a value between 0 and 1, exclusive. The default setting is 0.0001.
- Beta epsilon
- Specifies the threshold for convergence based on the coefficients. If the L1 norm of the change in beta between iterations is below this threshold, the model is considered converged.
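
As an illustration of how the three convergence settings could interact, here is a minimal sketch. It assumes that Objective epsilon is compared against the change in the objective value between iterations and that Beta epsilon is compared against the L1 norm of the coefficient change. The function stop_reason and its argument names are hypothetical, and the solver's real logic may differ.

```python
import numpy as np

def stop_reason(iteration, beta_old, beta_new, obj_old, obj_new,
                max_iterations, objective_epsilon, beta_epsilon):
    """Return why training would stop after this iteration, or None to continue.

    Hypothetical helper; the checks are assumptions for illustration.
    """
    # L1 norm of the change in the coefficient vector.
    beta_change = float(np.sum(np.abs(np.asarray(beta_new) - np.asarray(beta_old))))
    if beta_change < beta_epsilon:
        return "beta epsilon"
    # Assumed interpretation: change in the objective between iterations.
    if abs(obj_old - obj_new) < objective_epsilon:
        return "objective epsilon"
    if iteration >= max_iterations:
        return "maximum iterations"
    return None

# Coefficients and objective are still moving, so training continues.
print(stop_reason(1, [0.0, 0.0], [0.5, -0.5], 0.69, 0.50, 50, 1e-4, 1e-4))
# Coefficients barely change, so the beta threshold stops training.
print(stop_reason(10, [1.0, 2.0], [1.00001, 2.00002], 0.30, 0.29, 50, 1e-4, 1e-4))
```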
Cross Validation
- Seed for N fold
- Leave this check box checked and enter a seed number to ensure that the data is split into test and train data in the same manner each time you run the dataflow. Uncheck this field to get a random split each time you run the flow. The default setting is 15341.
- N fold
- Check this check box and enter the number of folds to use for cross-validation.
- Fold assignment
- Check this check box and select from the drop-down list if you are performing cross-validation. This field is applicable only if you entered a value in the N fold box and the Fold field is not specified.
- Fold field
- If you are performing cross-validation, check this check box and select the field that contains the cross-validation fold index assignment from the drop-down list. This field is applicable only if you did not specify values for N fold and Fold assignment.
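
To show why a fixed seed gives a repeatable split, here is a minimal sketch of seeded random fold assignment. It is an assumption about how such a split could work, not the stage's actual algorithm. The function assign_folds is hypothetical, and the resulting array plays the role of a fold index like the one a Fold field would supply.

```python
import numpy as np

def assign_folds(n_records, n_folds, seed=None):
    """Assign each record a fold index in [0, n_folds).

    Hypothetical helper. With seed=None the split differs on every run.
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_folds, size=n_records)

# Same seed, same assignment on every run.
folds_a = assign_folds(10, 3, seed=15341)
folds_b = assign_folds(10, 3, seed=15341)
print(np.array_equal(folds_a, folds_b))

# Each fold takes one turn as test data; the remaining folds are train data.
for k in range(3):
    test_rows = np.flatnonzero(folds_a == k)
    train_rows = np.flatnonzero(folds_a != k)
    print(k, len(train_rows), len(test_rows))
```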
Regularization
- Regularization type
- Choose the appropriate regularization type. A common concern in predictive modeling is overfitting, when an analytical model corresponds too closely (or exactly) to a specific dataset and therefore may fail when applied to additional data or future observations. Regularization is one method used to mitigate overfitting.
- Value of alpha
- Check this check box and change the value if you do not want to use the default of 0.5. The alpha parameter controls the distribution between the ℓ1 and ℓ2 penalties. Valid values range between 0 and 1; a value of 1.0 represents LASSO, and a value of 0.0 produces ridge regression. The table below illustrates how alpha and lambda affect regularization.
- Value of lambda
- Check this check box and specify a value if you do not want Logistic Regression to use the default method of calculating the lambda value, which is a heuristic based on training data. The lambda parameter controls the amount of regularization applied. For example, if lambda is 0.0, no regularization is applied and the alpha parameter is ignored.
- Search for optimal value of lambda
- Check this check box to have Logistic Regression compute models for the full regularization path. This starts at lambda max (the highest lambda value that makes sense, that is, the lowest value driving all coefficients to zero) and goes down to lambda min on the log scale, decreasing regularization strength at each step. The returned model will have coefficients corresponding to the optimal lambda value as decided during training.
- Maximum active predictors
- Check this check box and enter the maximum number of predictors to use during computation. This value is used as a stopping criterion to prevent expensive model building with many predictors.
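
The roles of alpha and lambda can be sketched with a common form of the elastic net penalty, lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2). This exact form is an assumption, since the penalty formula is not stated here. The sketch also builds a log-scale path from lambda max down to lambda min. The function names and the sample values are hypothetical.

```python
import numpy as np

def elastic_net_penalty(beta, alpha, lam):
    """Penalty added to the objective (assumed common elastic net form)."""
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))        # LASSO part, weighted by alpha
    l2 = 0.5 * np.sum(beta ** 2)     # ridge part, weighted by (1 - alpha)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

def lambda_path(lambda_max, lambda_min, n_steps):
    """Lambda values evenly spaced on the log scale, strongest regularization first."""
    return np.geomspace(lambda_max, lambda_min, num=n_steps)

beta = [1.0, -2.0]
print(elastic_net_penalty(beta, alpha=1.0, lam=0.1))  # pure l1 (LASSO)
print(elastic_net_penalty(beta, alpha=0.0, lam=0.1))  # pure l2 (ridge)
print(elastic_net_penalty(beta, alpha=0.5, lam=0.0))  # lambda 0: alpha has no effect
print(lambda_path(1.0, 0.001, 4))
```

With lambda set to 0.0 the penalty vanishes for any alpha, which matches the statement that alpha is ignored when no regularization is applied.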