
I am building two models using XGBoost: one trained on x features of the data set and the other on y features.

It is a binary classification problem. When the two models agree (yes-yes or no-no) the case is easy, but what should I do when one model predicts yes and the other predicts no?

Model A, with x features, has an accuracy of 82%; model B, with y features, has an accuracy of 79%.

siddharth
    You haven't given any information on how well these two models perform. Show their ROC curves, confusion matrices, accuracies, etc. Once you summarize how the two models perform you can pick the better of the two and ignore the other one. – Brian O'Donnell Nov 01 '18 at 03:14
  • @siddharth I approved the edit, but can I ask, are siddharth and Siddharth Srivastava the same person? (If so, please consider merging the two accounts :) – DukeZhou Nov 01 '18 at 19:52

2 Answers


Without any additional information, lean towards the vote of the best performing classifier when it comes to ties.

However, as others have stated already, it is best to analyze the performances in more detail (e.g. confusion matrices).

For instance, it could be that model B almost always classifies class X correctly (hardly any false positives). In that case, you could lean towards the prediction of model B when it predicts class X and model A does not. In other words, you could weight the votes of the models based on how well they did on similar previous predictions.
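As a sketch of that weighting idea (everything here is hypothetical, not from the question: two already-fitted models, a held-out validation set, and per-class precision as the weight):

```python
# Sketch: break ties between two classifiers by weighting each model's
# vote with its per-class precision measured on a held-out validation set.
# All names (pred_a, pred_b, prec_a, prec_b) are placeholders.
import numpy as np

def per_class_precision(y_true, y_pred, classes):
    """Precision for each class c: fraction of predictions of c that were correct."""
    prec = {}
    for c in classes:
        predicted_c = y_pred == c
        prec[c] = (y_true[predicted_c] == c).mean() if predicted_c.any() else 0.0
    return prec

def resolve(pred_a, pred_b, prec_a, prec_b):
    """Where the models agree, keep the shared label; where they disagree,
    trust the model whose validation precision for its own predicted class
    is higher."""
    out = []
    for a, b in zip(pred_a, pred_b):
        if a == b:
            out.append(a)
        else:
            out.append(a if prec_a[a] >= prec_b[b] else b)
    return np.array(out)
```

The design choice here is to weight by precision of the *predicted* class, which matches the "hardly any false positives" scenario above; one could just as well use class-conditional recall or the models' predicted probabilities if those are available.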

Saber

Given only the fact that model A has a higher accuracy than model B, you should just use model A. More information on the performance of the two classifiers would be needed for a better answer.

Brian O'Donnell
  • But what is the margin of error? IMHO I don't see a lot of difference between 82% and 79%... – DukeZhou Nov 02 '18 at 01:46
  • We don't have any information to determine a margin of error. Without any other performance data I would say 82% is 3% more than 79%, which is enough to be the winner. With more detailed information from Siddharth we could make a better analysis. Accuracy alone is not much to make a determination on, especially when the application is not known. I have seen high-accuracy classifiers get beaten by much lower-accuracy classifiers. – Brian O'Donnell Nov 02 '18 at 02:05
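A rough illustration of the margin-of-error point (the test-set size n is a hypothetical assumption, since the question doesn't give one; when both models are scored on the same examples, a paired test such as McNemar's would be the proper tool rather than this independent-samples approximation):

```python
# Approximate two-proportion z-test: is a 3-point accuracy gap more than
# the sampling noise at a given (assumed) test-set size n?
import math

def acc_diff_significant(acc_a, acc_b, n, z=1.96):
    """True if |acc_a - acc_b| exceeds roughly a 95% sampling margin,
    treating each accuracy as a binomial proportion over n examples."""
    se = math.sqrt(acc_a * (1 - acc_a) / n + acc_b * (1 - acc_b) / n)
    return abs(acc_a - acc_b) > z * se

# Under this approximation, even with 1000 test examples 82% vs 79%
# does not clear the ~95% threshold; around 2000 examples it does.
```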