1(a). Naive Bayes



1(b). Assumptions

Which simplifying assumptions are made by the Naive Bayes model? Why can these assumptions result in less accurate classifiers compared to other learning algorithms?


The Naive Bayes model assumes that the features $x_i$ are conditionally independent given the label $y$. That is why the third “=” in the formula below holds.

$$ \hat{y} = \arg\max_{y} P(y)P(x|y) = \arg\max_{y} P(y)P(x_1,x_2,\dots,x_n|y) \\= \arg\max_{y} P(y)\prod_{i=1}^{n}P(x_i|y) $$
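
As a rough illustration, the decision rule above can be sketched in a few lines of Python. The class names, priors, and per-feature probabilities are made-up toy values (assumptions for this example only), and the product is computed as a sum of logs, which is a common way to avoid numerical underflow.

```python
import math

# Hypothetical toy parameters, chosen only for illustration.
prior = {"spam": 0.4, "ham": 0.6}  # P(y)
likelihood = {  # P(x_i = 1 | y) for each binary feature i
    "spam": [0.8, 0.3],
    "ham": [0.1, 0.4],
}


def predict(x):
    """Return argmax_y P(y) * prod_i P(x_i | y) for a binary feature vector x."""

    def log_score(y):
        # Sum of logs is equivalent to the log of the product in the formula.
        total = math.log(prior[y])
        for x_i, p in zip(x, likelihood[y]):
            total += math.log(p if x_i == 1 else 1.0 - p)
        return total

    return max(prior, key=log_score)


print(predict([1, 0]))  # spam
print(predict([0, 1]))  # ham
```

Each feature contributes its own factor $P(x_i|y)$, with no term that couples two features. That is exactly the conditional independence assumption.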

However, in general $P(x_1,x_2,\dots,x_n|y)$ can be far from $\prod_{i=1}^{n}P(x_i|y)$, so this assumption is not realistic. For example, the words ‘Barack’ and ‘Obama’ tend to co-occur, so they are not independent features. Treating them as independent counts the same evidence twice, which distorts the estimated probabilities. This is why Naive Bayes can lead to less accurate classifiers than learning algorithms that do not make this assumption.
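
The double-counting effect can be checked with a small numeric sketch. It assumes a hypothetical extreme case in which the two words always appear together, so they carry the evidence of a single event. The class names and probabilities are invented for the example.

```python
from fractions import Fraction

# Hypothetical toy setup: two classes with equal priors, and two words
# ("Barack", "Obama") that always co-occur, so they behave as one event.
prior = {"politics": Fraction(1, 2), "sports": Fraction(1, 2)}
# P(word pair present | class); each single word has the same probability.
p_word = {"politics": Fraction(4, 5), "sports": Fraction(1, 5)}


def true_posterior(label):
    """Posterior when the co-occurring words are treated as one event."""
    scores = {y: prior[y] * p_word[y] for y in prior}
    return scores[label] / sum(scores.values())


def naive_bayes_posterior(label):
    """Posterior under the Naive Bayes product, which uses the evidence twice."""
    scores = {y: prior[y] * p_word[y] * p_word[y] for y in prior}
    return scores[label] / sum(scores.values())


print(true_posterior("politics"))         # 4/5
print(naive_bayes_posterior("politics"))  # 16/17
```

In this toy case the predicted class is the same, but the Naive Bayes posterior is overconfident (16/17 instead of 4/5). When other features point the opposite way, this kind of inflated evidence can be enough to flip the decision.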

2(a). HMM tagger

Consider the two sentences

  1. February made me shiver.