
I am trying to make predictions with random forest regression and then use GridSearchCV to tune hyperparameters (just 'n_estimators'). However, the results from GridSearchCV are worse than those of the base model, even though the parameter grid contains the same values as the base model. So why is the RMSE of the GridSearchCV model worse than that of the base model?

Here are models that I used:

Base Model:

from sklearn.ensemble import RandomForestRegressor

rfr_rd_fi = RandomForestRegressor(n_estimators=100,
                                  max_features=7,
                                  random_state=3115)

The root mean squared error for the base model is 200.2.

GridSearchCV model:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

param_grid = {'n_estimators': np.arange(10, 1010, 10, dtype=int),
              'max_features': [7],
              'random_state': [3115]}

cv_test = KFold(n_splits=10)

rfr_gs = GridSearchCV(RandomForestRegressor(),
                      param_grid=param_grid,
                      scoring='neg_mean_squared_error',
                      cv=cv_test, verbose=4, n_jobs=-1).fit(X_train, y_train)

The root mean squared error for the GridSearchCV model is 257.2.

Snehal Patel

1 Answer


There are a number of possible reasons why this is the case. Here are a few things to check.

 1. Was training and testing done on the same dataset for your baseline model?
 2. Did you do cross-validation with your baseline model?
 3. If so, did you use the same folds? (See the sketch after this list.)
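A minimal sketch of that check, reusing the same KFold object (and therefore the exact same folds and metric) to score the baseline; the data names X_train and y_train are taken from the question:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

cv_test = KFold(n_splits=10)

# Score the baseline with the same folds and metric that GridSearchCV uses
baseline = RandomForestRegressor(n_estimators=100, max_features=7, random_state=3115)
neg_mse = cross_val_score(baseline, X_train, y_train,
                          scoring='neg_mean_squared_error',
                          cv=cv_test, n_jobs=-1)
print(np.sqrt(-neg_mse.mean()))  # cross-validated RMSE of the baseline

Because KFold(n_splits=10) is created without shuffling, it always produces the same splits, so this RMSE is directly comparable to the grid search scores.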

Bottom line: To compare apples to apples, your baseline model and your grid search should use the same pipeline, except for the hyperparameters, of course. After you have done that, the grid search candidate with n_estimators=100 should give results identical to the baseline model.
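For example, a sketch of that comparison, assuming rfr_gs has already been fitted as in the question:

import numpy as np
import pandas as pd

# cv_results_ holds the mean cross-validated score of every candidate in the grid
results = pd.DataFrame(rfr_gs.cv_results_)
row = results.loc[results['param_n_estimators'] == 100]

# scoring was negative MSE, so flip the sign and take the square root to get RMSE
print(np.sqrt(-row['mean_test_score'].iloc[0]))

If this number does not match the cross-validated RMSE of the baseline computed above, the two models are not being evaluated the same way.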

On a separate note, you should consider nesting your hyperparameter tuning within your cross-validation (see article).
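A minimal sketch of that pattern, where the grid search is refit inside every fold of an outer cross-validation loop; the 5-fold inner loop is an assumption, and the full grid from the question will make this expensive:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

param_grid = {'n_estimators': np.arange(10, 1010, 10, dtype=int),
              'max_features': [7],
              'random_state': [3115]}

inner_cv = KFold(n_splits=5)   # folds used to select hyperparameters (assumed)
outer_cv = KFold(n_splits=10)  # folds used to estimate generalization error

search = GridSearchCV(RandomForestRegressor(),
                      param_grid=param_grid,
                      scoring='neg_mean_squared_error',
                      cv=inner_cv, n_jobs=-1)

# Each outer fold runs the whole search on its training portion,
# so the reported RMSE is not biased by the hyperparameter selection
nested_neg_mse = cross_val_score(search, X_train, y_train,
                                 scoring='neg_mean_squared_error',
                                 cv=outer_cv)
print(np.sqrt(-nested_neg_mse.mean()))  # nested cross-validated RMSE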

Snehal Patel