
In the attached image

Table 1

the Naive Bayes algorithm gives the following probability:

Fem:dv/m/s, Young, own, Ex-credpaid → Good: 62%

I calculated the probability as follows:

$$P(\text{Fem:dv/m/s} \mid Good) \cdot P(Young \mid Good) \cdot P(own \mid Good) \cdot P(\text{Ex-credpaid} \mid Good) \cdot P(Good) = \frac{1}{6} \cdot \frac{2}{6} \cdot \frac{5}{6} \cdot \frac{3}{6} \cdot 0.6 = 0.01389$$

I don't know where I went wrong. Could someone please tell me where my error is?

nbro
TomaateTip

1 Answer


Your probability hasn't been normalized!

In this case, you are computing the probability of being good, given that the other features have a fixed value. To obtain the correct probability, you need to normalize (divide) the value from your calculation by the probability that the features have taken on those fixed values.

You can calculate this as follows: $$P(\text{Fem:dv/m/s}, Young, own, \text{Ex-credpaid}) = \sum_{x \in \{Good, Bad\}} P(\text{Fem:dv/m/s}, Young, own, \text{Ex-credpaid}, x)$$

by the marginalization rule.

Then, by the chain rule, you may write:

$$\sum_{x \in \{Good, Bad\}} P(\text{Fem:dv/m/s}, Young, own, \text{Ex-credpaid} \mid x) \cdot P(x)$$

So the probability of 0.62 reported in the table should be obtained from:

$$ \frac{P(\text{Fem:dv/m/s}, Young, own, \text{Ex-credpaid} \mid Good) \cdot P(Good)}{\sum_{x \in \{Good, Bad\}} P(\text{Fem:dv/m/s}, Young, own, \text{Ex-credpaid} \mid x) \cdot P(x)}$$

You just need to calculate

$$P(\text{Fem:dv/m/s}, Young, own, \text{Ex-credpaid} \mid Bad) \cdot P(Bad)$$

and it should be easy to compute the rest.
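As a sanity check, the whole calculation can be sketched in Python. The conditional probabilities below are the counts reported in the question and the comment thread; exact `Fraction` arithmetic avoids rounding noise:

```python
from fractions import Fraction

# Conditional probabilities as reported in the question and comments.
p_feat_given_good = [Fraction(1, 6), Fraction(2, 6), Fraction(5, 6), Fraction(3, 6)]
p_feat_given_bad = [Fraction(2, 4), Fraction(3, 4), Fraction(1, 4), Fraction(2, 4)]
p_good, p_bad = Fraction(6, 10), Fraction(4, 10)

def joint(conditionals, prior):
    """Unnormalized Naive Bayes score: P(x1|c) * ... * P(xn|c) * P(c)."""
    score = prior
    for p in conditionals:
        score *= p
    return score

score_good = joint(p_feat_given_good, p_good)  # ~0.01389, the question's value
score_bad = joint(p_feat_given_bad, p_bad)     # 0.01875

# Normalize to get the posterior P(Good | features).
posterior_good = score_good / (score_good + score_bad)
print(float(posterior_good))  # ~0.4255, not the 0.62 in the table
```

This reproduces the 0.4255 figure from the comments rather than the table's 62%, consistent with the conclusion below that the table is likely incorrect.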

John Doucette
  • Thanks for the answer, but I don't get the same result of 0.62. When I compute P(Fem:dv/m/s | bad) * P(Young | bad) * P(own | bad) * P(Ex-credpaid | bad) * P(bad) = 2/4 * 3/4 * 1/4 * 2/4 * 0.4 = 0.01875, I then get 0.01389 / (0.01389 + 0.01875) = 0.4255. @JohnDoucette – TomaateTip Aug 20 '18 at 21:36
  • Hmm. You are correct. Do you have more context for that table? Where does it come from? – John Doucette Aug 21 '18 at 12:23
  • the table comes from this paper: (page 3) https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4909197 – TomaateTip Aug 21 '18 at 14:05
  • Hmm. It looks like there are several things at play. First, I think the authors may have used Laplacian Smoothing. This is a usual thing to do, but it changes the calculation. Basically, you add one additional observation of each kind to each of the conditional distributions to make a uniform prior, and then continue to count these when computing the posteriors. Even doing that however, I only get 50%. It's possible that the table is simply incorrect. – John Doucette Aug 21 '18 at 15:15
  • I see that the probability was wrong. I implemented the same algorithm in Python, but I don't know how you did the Naive Bayes calculation. How did you do it? – TomaateTip Aug 29 '18 at 14:58
  • @TomaateTip I'm not sure I understand what you mean. I think the probability in the table is incorrect, and doing Naive Bayes the way you described (plus the normalization in my answer) will yield the correct probabilities. – John Doucette Aug 29 '18 at 20:37
  • Yes, it is incorrect, but you said you got 50%. Did you program it? If so, how? – TomaateTip Aug 29 '18 at 20:40
  • No, I computed it by hand. – John Doucette Aug 30 '18 at 00:56
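To illustrate the Laplacian smoothing mentioned in the comments: each conditional count is incremented by one, and the class total grows by the number of distinct values the feature can take. The thread does not state how many values each feature has, so `n_values` below is a hypothetical parameter for illustration only:

```python
def smoothed_prob(count, class_total, n_values):
    """Laplace (add-one) smoothing: pretend each of the n_values possible
    feature values was observed once more than it actually was."""
    return (count + 1) / (class_total + n_values)

# Hypothetical example: P(Young | Good) with raw counts 2 out of 6,
# assuming (for illustration) the age feature takes 3 distinct values.
print(smoothed_prob(2, 6, 3))  # (2+1)/(6+3) = 1/3
```

With smoothing, every conditional probability stays strictly between 0 and 1, which prevents a single unseen feature value from zeroing out an entire class score; the same normalization step from the answer still applies afterwards.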