praftery edited this page Sep 5, 2015
·
1 revision
Using the pre-processed input data as described here, mave builds models of the pre-retrofit data with each of the following methods: Dummy Regressor, Hour Of Week Bin Regressor, K Nearest Neighbors Regressor, Random Forest Regressor, and Extra Trees Regressor.
Parameter selection for each method
Each of these methods has an associated set of configuration parameters. For example, the Hour of Week Bin Regressor, one of the simple methods, can make its prediction using either the mean or the median of each bin. For these simple methods, mave explores all possible combinations of parameter values. The more complex methods, however, typically have more parameters and thousands (or millions) of possible combinations. For these, mave instead trains a model using a randomly selected set of parameter values and repeats this process a fixed number of times (default search_iterations=20).
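The two search strategies described above can be sketched as follows. This is a minimal illustration, not mave's actual code: the helper functions and both parameter grids are hypothetical, and only the default of 20 iterations comes from the text.

```python
import itertools
import random


def all_combinations(param_grid):
    """Yield every combination of values from a {name: [values]} grid."""
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[n] for n in names)):
        yield dict(zip(names, values))


def random_combinations(param_grid, search_iterations=20, seed=0):
    """Draw `search_iterations` parameter sets, picking each value uniformly at random."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in sorted(param_grid.items())}
        for _ in range(search_iterations)
    ]


# Hypothetical grids, for illustration only (not mave's actual settings).
simple_grid = {"statistic": ["mean", "median"]}
complex_grid = {
    "n_estimators": [10, 50, 100, 200],
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": [1, 2, 5, 10],
    "max_features": ["sqrt", "log2", None],
}

exhaustive = list(all_combinations(simple_grid))  # every combination is tried
sampled = random_combinations(complex_grid)       # only 20 of the 192 combinations
```

Because each draw is independent, the random search may occasionally sample the same parameter set twice. The trade-off is a fixed training cost no matter how large the grid grows.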
Model selection
For each method, mave selects the best-performing set of parameter values using k-fold cross-validation (default k=10), scored by the R² value. Mave then selects the overall best model, again by highest R² value, and this is the final model used for prediction. For the vast majority of datasets we have looked at, this is either the Random Forest Regressor or the Extra Trees Regressor.
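The two-stage selection can be sketched in plain Python using the two simplest methods. Everything here is an assumption made for illustration: the class implementations, the interleaved fold assignment, and the synthetic hourly data are not taken from mave.

```python
import statistics
from collections import defaultdict


class DummyRegressor:
    """Predicts one constant: the mean or median of the training targets."""

    def __init__(self, statistic="mean"):
        self.agg = statistics.mean if statistic == "mean" else statistics.median

    def fit(self, X, y):
        self.constant = self.agg(y)
        return self

    def predict(self, X):
        return [self.constant for _ in X]


class HourOfWeekBinRegressor(DummyRegressor):
    """Predicts the mean or median of each hour-of-week bin (X holds hours 0..167)."""

    def fit(self, X, y):
        super().fit(X, y)  # the overall statistic is the fallback for unseen bins
        bins = defaultdict(list)
        for hour, target in zip(X, y):
            bins[hour].append(target)
        self.bin_values = {h: self.agg(v) for h, v in bins.items()}
        return self

    def predict(self, X):
        return [self.bin_values.get(h, self.constant) for h in X]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = statistics.mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_val_r2(make_model, X, y, k=10):
    """Mean R² over k folds; fold f holds out samples f, f + k, f + 2k, ..."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        scores.append(r2_score([y[i] for i in test], model.predict([X[i] for i in test])))
    return statistics.mean(scores)


# Synthetic data: 8 weeks of hourly readings, higher during hours 8-17 of each day.
X = [i % 168 for i in range(168 * 8)]
y = [10.0 + (5.0 if 8 <= h % 24 < 18 else 0.0) + 0.01 * (i % 7) for i, h in enumerate(X)]

methods = {
    "DummyRegressor": (DummyRegressor, [{"statistic": "mean"}, {"statistic": "median"}]),
    "HourOfWeekBinRegressor": (HourOfWeekBinRegressor, [{"statistic": "mean"}, {"statistic": "median"}]),
}

# Stage 1: best parameter set per method. Stage 2: best method overall.
best_per_method = {}
for name, (cls, param_sets) in methods.items():
    scored = [(cross_val_r2(lambda: cls(**p), X, y), p) for p in param_sets]
    best_per_method[name] = max(scored, key=lambda s: s[0])

best_method = max(best_per_method, key=lambda n: best_per_method[n][0])
```

On this synthetic data the bin regressor wins because the load depends only on the hour of the week. On real data, the same comparison would also include the other methods listed above.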