
Scikit Learn – Classification with Naïve Bayes

Naïve Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the strong assumption that all predictors are independent of each other given the class, i.e. the presence of a feature in a class is independent of the presence of any other feature in the same class. Because this assumption is naïve, these methods are called Naïve Bayes methods.
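
Written out for $n$ features $x_1, \dots, x_n$ (symbols introduced here only for illustration), the independence assumption says that the joint likelihood of the features factorizes within each class $Y$:

$$P(x_1, \dots, x_n \mid Y) = \prod_{i=1}^{n} P(x_i \mid Y)$$

Each factor on the right involves a single feature, which is what makes the model cheap to estimate.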

Bayes' theorem gives the posterior probability of a class, i.e. the probability of a label $Y$ given some observed features, $P(Y \mid \text{features})$:

$$P(Y \mid \text{features}) = \frac{P(Y)\,P(\text{features} \mid Y)}{P(\text{features})}$$

Here, $P(Y \mid \text{features})$ is the posterior probability of the class.

$P(Y)$ is the prior probability of the class.

$P(\text{features} \mid Y)$ is the likelihood, i.e. the probability of the predictors given the class.

$P(\text{features})$ is the prior probability of the predictors, also called the evidence.
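
As a small worked illustration of the formula, the sketch below computes the posterior for two classes. The prior and likelihood values are hypothetical numbers chosen only to make the arithmetic easy to follow; they are not taken from any dataset.

```python
# Hypothetical values, for illustration only.
prior = {"malignant": 0.3, "benign": 0.7}        # P(Y)
likelihood = {"malignant": 0.8, "benign": 0.1}   # P(features | Y)

# P(features): sum of P(Y) * P(features | Y) over all classes
evidence = sum(prior[y] * likelihood[y] for y in prior)

# P(Y | features) for each class, by Bayes' theorem
posterior = {y: prior[y] * likelihood[y] / evidence for y in prior}
print(posterior)
```

With these numbers the posterior for "malignant" is 0.24 / 0.31, roughly 0.77, even though its prior was only 0.3, because the observed features are much more likely under that class.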

Scikit-learn provides several Naïve Bayes classifier models, namely Gaussian, Multinomial, Complement and Bernoulli. They differ mainly in the assumption they make about the distribution of $P(\text{features} \mid Y)$, i.e. the probability of the predictors given the class.

Sr.No	Model & Description
1	Gaussian Naïve Bayes: assumes that the data from each label is drawn from a simple Gaussian distribution.
2	Multinomial Naïve Bayes: assumes that the features are drawn from a simple multinomial distribution.
3	Bernoulli Naïve Bayes: assumes that the features are binary (0s and 1s) in nature. One application is text classification with the "bag of words" model.
4	Complement Naïve Bayes: designed to correct the severe assumptions made by the Multinomial Naïve Bayes classifier. This kind of NB classifier is suitable for imbalanced data sets.
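
The four models share the same fit/predict interface, so they can be swapped freely. The sketch below fits each one on a tiny made-up word-count matrix; the data and the variable names are assumptions for illustration, and the point is only to show the classes side by side, not to compare their quality.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB

# Made-up word counts: rows are documents, columns are words.
X = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [4, 1, 0],
    [0, 2, 3],
    [0, 3, 1],
    [1, 4, 2],
])
y = np.array([0, 0, 0, 1, 1, 1])

models = {
    "GaussianNB": GaussianNB(),
    "MultinomialNB": MultinomialNB(),   # expects non-negative features such as counts
    "BernoulliNB": BernoulliNB(),       # binarizes features at 0.0 by default
    "ComplementNB": ComplementNB(),     # also expects non-negative features
}

predictions = {}
for name, clf in models.items():
    clf.fit(X, y)
    predictions[name] = clf.predict(X)
    print(name, predictions[name])
```

Note that MultinomialNB and ComplementNB reject negative feature values, so count-like data is the natural fit for them.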

Building Naïve Bayes Classifier

We can apply a Naïve Bayes classifier to one of the datasets bundled with Scikit-learn. In the example below, we create a GaussianNB model and fit it to the breast_cancer dataset of Scikit-learn.

Example

import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the dataset and pull out its labels and features
data = load_breast_cancer()
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

# Inspect the class names and the first sample
print(label_names)
print(labels[0])
print(feature_names[0])
print(features[0])

# Hold out 40% of the samples for testing
train, test, train_labels, test_labels = train_test_split(
   features, labels, test_size = 0.40, random_state = 42
)

# Fit a Gaussian Naive Bayes model and predict the test labels
GNBclf = GaussianNB()
model = GNBclf.fit(train, train_labels)
preds = GNBclf.predict(test)
print(preds)

Output

[
   1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1
   1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 
   1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 0 
   1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 
   1 1 0 1 1 0 0 0 1 1 1 0 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1 
   0 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 
   1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 1 1 0 
   1 0 1 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 1 
   1 1 1 1 0 1 0 0 1 1 0 1
]

The output above is a series of 0s and 1s, which are the predicted tumor classes for the test samples: 0 stands for malignant and 1 for benign.
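
To judge how good these predictions are, they can be compared with the held-out labels. The following is a minimal sketch that repeats the same split and model as the example and scores it with accuracy_score from sklearn.metrics; choosing accuracy as the metric is an assumption made here, and other metrics may suit an imbalanced problem better.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data, split and model as in the example
data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data["data"], data["target"], test_size=0.40, random_state=42
)
preds = GaussianNB().fit(train, train_labels).predict(test)

# Fraction of test samples whose predicted class matches the true class
accuracy = accuracy_score(test_labels, preds)
print(f"Accuracy: {accuracy:.3f}")

# Translate the first few numeric predictions into class names
print(data["target_names"][preds[:5]])
```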
