Naive Bayes Classifier

A classifier is a machine learning model that distinguishes (classifies) objects based on their features.

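The Naive Bayes Classifier is a machine learning method that uses probabilities, obtained from Bayes' theorem, to classify objects. It is called "naive" because it assumes that, given the class, the features are independent of one another.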

Bayes' theorem:

Numerical Data

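$$ P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}, \qquad P(x \mid c) = \prod_{i} P(x_i \mid c) $$

where:
P(c) = prior, the probability of class c in the available data
P(x) = evidence, the probability of the observed features x = (x_1, ..., x_n); it is the same for every class, so it can be left out when the classes only need to be ranked
P(x_i | c) = likelihood of feature x_i in class c; for a numerical feature it is computed with the normal (Gaussian) density:

$$ P(x_i \mid c) = \frac{1}{\sqrt{2\pi\sigma_c^2}} \exp\left(-\frac{1}{2}\left(\frac{x_i-\mu_c}{\sigma_c}\right)^2\right) $$

where μ_c and σ_c are the mean and standard deviation of that feature within class c.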
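The two formulas above can be sketched in plain Python. This is a minimal sketch, not the notebook's own code: the function names `gaussian_likelihood` and `posterior` and the one-feature, two-class numbers are hypothetical, chosen only for illustration. Unlike the notebook further below, it also divides by the evidence P(x), so the returned values sum to 1.

```python
from math import exp, pi, sqrt


def gaussian_likelihood(x: float, mu: float, sigma: float) -> float:
    """Normal density N(x; mu, sigma^2), used as P(x_i | c) for a numerical feature."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / sqrt(2 * pi * sigma ** 2)


def posterior(sample, class_stats, priors):
    """Return P(c | x) for every class, assuming conditionally independent features.

    class_stats maps class -> list of (mu, sigma) pairs, one per feature.
    priors maps class -> P(c).
    """
    # Numerator of Bayes' theorem: P(c) * prod_i P(x_i | c)
    scores = {}
    for c, stats in class_stats.items():
        score = priors[c]
        for x, (mu, sigma) in zip(sample, stats):
            score *= gaussian_likelihood(x, mu, sigma)
        scores[c] = score
    # Evidence P(x): the sum of the numerators over all classes
    evidence = sum(scores.values())
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical one-feature, two-class example
stats = {"a": [(0.0, 1.0)], "b": [(2.0, 1.0)]}
print(posterior([1.0], stats, {"a": 0.5, "b": 0.5}))  # {'a': 0.5, 'b': 0.5}
```

The point x = 1.0 lies exactly halfway between the two class means with equal spreads and equal priors, so both classes receive the same posterior.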

Categorical Data

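$$ P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} $$

where P(x_i | c) is estimated by counting: the number of training rows of class c in which the feature takes the value x_i, divided by the number of training rows of class c:

$$ P(x_i \mid c) = \frac{\mathrm{count}(x_i, c)}{\mathrm{count}(c)} $$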
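The counting rule can be sketched with `collections.Counter`. This is a minimal illustration, assuming a single categorical feature; the helper name `categorical_likelihoods` and the color/label data are hypothetical.

```python
from collections import Counter


def categorical_likelihoods(values, labels):
    """Estimate P(x_i = v | c) = count(v, c) / count(c) for one categorical feature."""
    class_counts = Counter(labels)
    pair_counts = Counter(zip(values, labels))
    return {(v, c): n / class_counts[c] for (v, c), n in pair_counts.items()}


# Hypothetical feature "color" observed together with class labels
colors = ["red", "red", "green", "red", "green"]
labels = ["yes", "yes", "no", "no", "no"]
probs = categorical_likelihoods(colors, labels)
print(probs[("red", "yes")])   # 2 of the 2 "yes" rows are red
print(probs[("green", "no")])  # 2 of the 3 "no" rows are green
```

A value never seen together with a class (here `("green", "yes")`) gets no entry, i.e. probability 0, which zeroes the whole product for that class; Laplace (add-one) smoothing is a common way to avoid this.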

Mixed Data

For mixed data, each attribute uses the formula that matches its type: the Gaussian likelihood for numerical attributes and the counting formula for categorical attributes. The per-attribute likelihoods are then multiplied together with the prior P(c), exactly as in the purely numerical or purely categorical case.
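A minimal sketch of how the two kinds of likelihood could be combined, assuming each feature of a class is described either as `("numeric", mu, sigma)` or as `("categorical", table)` where `table` holds relative frequencies within the class. This encoding, the function names and the example numbers are hypothetical illustrations; the iris notebook below only needs the numerical formula.

```python
from math import exp, pi, sqrt


def feature_likelihood(x, params):
    """P(x_i | c) for one feature, dispatching on how the feature is modelled.

    ("numeric", mu, sigma)  -> Gaussian density
    ("categorical", table)  -> table[x], the relative frequency within the class
    """
    if params[0] == "numeric":
        _, mu, sigma = params
        return exp(-0.5 * ((x - mu) / sigma) ** 2) / sqrt(2 * pi * sigma ** 2)
    _, table = params
    return table.get(x, 0.0)  # unseen category -> probability 0


def class_score(sample, feature_params, prior):
    """Unnormalised posterior P(c) * prod_i P(x_i | c) for mixed-type features."""
    score = prior
    for x, params in zip(sample, feature_params):
        score *= feature_likelihood(x, params)
    return score


# Hypothetical class with one numeric feature (height) and one categorical feature (color)
params_c = [("numeric", 170.0, 10.0), ("categorical", {"red": 0.25, "blue": 0.75})]
print(class_score([170.0, "blue"], params_c, prior=0.4))
```

Computing this score for every class and picking the largest gives the prediction, just as in the numerical case.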

from math import exp, pi, prod, sqrt  # math.prod needs Python 3.8+

from numpy import array
from pandas import DataFrame
from sklearn import datasets
from IPython.display import HTML, display
from tabulate import tabulate


def table(df):
    """Render a DataFrame as an HTML table in the notebook."""
    display(HTML(tabulate(df, tablefmt='html', headers='keys', showindex=False)))
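# IRIS TRAINING TABLE
iris = datasets.load_iris()
# Each row: the four measurements followed by the species name
data = [list(s) + [iris.target_names[iris.target[i]]] for i, s in enumerate(iris.data)]
# Random 20% sample (30 rows) used for training; without random_state the
# sampled rows, and every number below, change from run to run
dataset = DataFrame(data, columns=iris.feature_names + ['class']).sample(frac=0.2)
table(dataset)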
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) class
5.9 3.2 4.8 1.8 versicolor
6.9 3.1 5.1 2.3 virginica
6.6 2.9 4.6 1.3 versicolor
6.7 2.5 5.8 1.8 virginica
5 3 1.6 0.2 setosa
4.4 3 1.3 0.2 setosa
5.7 3.8 1.7 0.3 setosa
4.8 3.4 1.9 0.2 setosa
5.4 3 4.5 1.5 versicolor
6.9 3.2 5.7 2.3 virginica
5.1 3.7 1.5 0.4 setosa
5.2 2.7 3.9 1.4 versicolor
5.5 3.5 1.3 0.2 setosa
5.2 4.1 1.5 0.1 setosa
6.7 3.1 4.7 1.5 versicolor
6 2.9 4.5 1.5 versicolor
4.4 3.2 1.3 0.2 setosa
5 3.5 1.6 0.6 setosa
6.5 2.8 4.6 1.5 versicolor
6 3.4 4.5 1.6 versicolor
6.4 2.7 5.3 1.9 virginica
5.9 3 5.1 1.8 virginica
4.8 3 1.4 0.3 setosa
6.4 2.8 5.6 2.2 virginica
5.7 2.6 3.5 1 versicolor
6 2.2 5 1.5 virginica
5.5 2.4 3.7 1 versicolor
6 3 4.8 1.8 virginica
5.2 3.5 1.5 0.2 setosa
7.9 3.8 6.4 2 virginica

Test sample

test = [3, 5, 2, 4]
print("sample data:", test)
sample data: [3, 5, 2, 4]

Grouping the training data by target class

dataset_classes = {}
# Per class: mean and sample standard deviation (ddof=1) of every feature
for key, group in dataset.groupby('class'):
    mu_s = [group[c].mean() for c in group.columns[:-1]]
    sigma_s = [group[c].std() for c in group.columns[:-1]]
    dataset_classes[key] = [group, mu_s, sigma_s]
    print(key, "===>")
    print('Mu_s =>', array(mu_s))
    print('Sigma_s =>', array(sigma_s))
    table(group)
setosa ===>
Mu_s => [5.00909091 3.42727273 1.50909091 0.26363636]
Sigma_s => [0.40361998 0.35802488 0.18683975 0.1361817 ]
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) class
5 3 1.6 0.2 setosa
4.4 3 1.3 0.2 setosa
5.7 3.8 1.7 0.3 setosa
4.8 3.4 1.9 0.2 setosa
5.1 3.7 1.5 0.4 setosa
5.5 3.5 1.3 0.2 setosa
5.2 4.1 1.5 0.1 setosa
4.4 3.2 1.3 0.2 setosa
5 3.5 1.6 0.6 setosa
4.8 3 1.4 0.3 setosa
5.2 3.5 1.5 0.2 setosa
versicolor ===>
Mu_s => [5.95 2.9  4.33 1.41]
Sigma_s => [0.51908038 0.29439203 0.45472825 0.2514403 ]
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) class
5.9 3.2 4.8 1.8 versicolor
6.6 2.9 4.6 1.3 versicolor
5.4 3 4.5 1.5 versicolor
5.2 2.7 3.9 1.4 versicolor
6.7 3.1 4.7 1.5 versicolor
6 2.9 4.5 1.5 versicolor
6.5 2.8 4.6 1.5 versicolor
6 3.4 4.5 1.6 versicolor
5.7 2.6 3.5 1 versicolor
5.5 2.4 3.7 1 versicolor
virginica ===>
Mu_s => [6.56666667 2.92222222 5.42222222 1.95555556]
Sigma_s => [0.62849025 0.45491147 0.49944414 0.26977357]
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) class
6.9 3.1 5.1 2.3 virginica
6.7 2.5 5.8 1.8 virginica
6.9 3.2 5.7 2.3 virginica
6.4 2.7 5.3 1.9 virginica
5.9 3 5.1 1.8 virginica
6.4 2.8 5.6 2.2 virginica
6 2.2 5 1.5 virginica
6 3 4.8 1.8 virginica
7.9 3.8 6.4 2 virginica
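As a quick check of what `mean()` and `std()` return above, the first setosa entries of Mu_s and Sigma_s can be reproduced with Python's `statistics` module from the 11 setosa sepal lengths in the table of this particular run (the sample is random, so another run gives other rows). pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`, dividing by n - 1), which corresponds to `statistics.stdev`, not `statistics.pstdev`.

```python
from statistics import mean, stdev

# Sepal lengths of the 11 setosa rows in the training sample shown above
setosa_sepal_length = [5, 4.4, 5.7, 4.8, 5.1, 5.5, 5.2, 4.4, 5, 4.8, 5.2]

mu = mean(setosa_sepal_length)
sigma = stdev(setosa_sepal_length)  # sample std (divides by n - 1), like pandas .std()
print(round(mu, 8), round(sigma, 8))
```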

Computing the prior probabilities and likelihoods

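def numericalPriorProbability(v, mu, sigma):
    # Despite its name, this is the Gaussian likelihood P(x_i | c) of value v
    # under a normal distribution with mean mu and standard deviation sigma
    return 1.0 / sqrt(2 * pi * sigma ** 2) * exp(-((v - mu) ** 2) / (2 * sigma ** 2))

def categoricalProbability(sample, universe):
    # Fraction of the rows of `universe` that are in `sample`; used here as the prior P(c)
    return sample.shape[0] / universe.shape[0]

# One row per class: [class, P(x_1|c), ..., P(x_4|c), P(c)]
Ps = [[y] + [numericalPriorProbability(x, d[1][i], d[2][i]) for i, x in enumerate(test)]
      + [categoricalProbability(d[0], dataset)] for y, d in dataset_classes.items()]

table(DataFrame(Ps, columns=["classes"] + ["P( %d | C )" % d for d in test] + ["P( C )"]))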
classes P( 3 | C ) P( 5 | C ) P( 2 | C ) P( 4 | C ) P( C )
setosa 4.11734e-06 7.19113e-05 0.0676682 1.01259e-163 0.366667
versicolor 7.4519e-08 1.20929e-11 1.74586e-06 1.44674e-23 0.333333
virginica 6.44629e-08 2.58808e-05 5.09561e-11 4.99771e-13 0.3
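Some of these likelihoods are already around 1e-163, and their products below drop to about 1e-175. With more features, or values further from the class means, such products can underflow to exactly 0.0 in double precision, making every class tie. A common remedy, not used in this notebook, is to add log-probabilities instead of multiplying probabilities; since the logarithm is increasing, the ranking of the classes is unchanged. A minimal sketch, where `log_gaussian`, `log_score` and the two-class numbers are hypothetical:

```python
from math import exp, log, pi, sqrt


def log_gaussian(x, mu, sigma):
    """log N(x; mu, sigma^2), computed directly so it never has to take log(0)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - 0.5 * log(2 * pi * sigma ** 2)


def log_score(sample, stats, prior):
    """log P(c) + sum_i log P(x_i | c), the logarithm of the naive Bayes numerator."""
    return log(prior) + sum(log_gaussian(x, mu, s) for x, (mu, s) in zip(sample, stats))


# A value 40 standard deviations from the mean: the plain density underflows
# to 0.0 in double precision, while its logarithm stays finite
x, mu, sigma = 40.0, 0.0, 1.0
print(exp(-0.5 * ((x - mu) / sigma) ** 2) / sqrt(2 * pi * sigma ** 2))
print(log_gaussian(x, mu, sigma))

# Hypothetical two-class, one-feature setup: the class with the larger log score wins
stats = {"a": [(0.0, 1.0)], "b": [(5.0, 1.0)]}
priors = {"a": 0.5, "b": 0.5}
print(max(stats, key=lambda c: log_score([1.0], stats[c], priors[c])))
```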

Ranking the classes

# Naive Bayes numerator per class: P(c) * prod_i P(x_i | c) (not divided by P(x))
Pss = [[r[0], prod(r[1:])] for r in Ps]
PDss = DataFrame(Pss, columns=['class', 'probability']).sort_values('probability', ascending=False)
table(PDss)
class probability
virginica 1.27461e-35
versicolor 7.58711e-48
setosa 7.43878e-175
print("Bayes prediction for", test, "is", PDss.values[0, 0])
Bayes prediction for [3, 5, 2, 4] is virginica
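The `probability` column above holds only the numerator P(c) ∏ P(x_i | c), so its values do not sum to 1. Dividing each value by their sum, which is the evidence P(x), turns them into posterior probabilities without changing the ranking. A small sketch using the three numbers printed above (copied by hand, so it reflects this particular run):

```python
# Unnormalised scores P(c) * prod_i P(x_i | c) copied from the ranking table above
scores = {"virginica": 1.27461e-35, "versicolor": 7.58711e-48, "setosa": 7.43878e-175}

evidence = sum(scores.values())  # P(x) = sum over classes of the numerators
posteriors = {c: s / evidence for c, s in scores.items()}
for c, p in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{c}: {p:.3g}")
```

For this unusual test point the posterior of virginica is essentially 1, and the other two classes are negligible.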

With the model fitted on the training sample, we now run it on the original data: all 150 rows of the iris dataset, including the 30 rows that were used for training.

# ONE FUNCTION FOR CLASSIFIER

def predict(sampel):
    # [class, P(x_1|c), ..., P(x_4|c), P(c)] for every class
    priorLikehoods = [[y] + [numericalPriorProbability(x, d[1][i], d[2][i]) for i, x in enumerate(sampel)]
                      + [categoricalProbability(d[0], dataset)] for y, d in dataset_classes.items()]
    # Multiply prior and likelihoods, then return the class with the largest product
    products = [[r[0], prod(r[1:])] for r in priorLikehoods]
    result = DataFrame(products, columns=['class', 'probability']).sort_values('probability', ascending=False)
    return result.values[0, 0]

# Predict every row of the full dataset (first four values = the features)
dataset_test = DataFrame([list(d) + [predict(d[:4])] for d in data],
                         columns=list(dataset.columns) + ['predicted class (by predict())'])
table(dataset_test)
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) class predicted class (by predict())
5.1 3.5 1.4 0.2 setosa setosa
4.9 3 1.4 0.2 setosa setosa
4.7 3.2 1.3 0.2 setosa setosa
4.6 3.1 1.5 0.2 setosa setosa
5 3.6 1.4 0.2 setosa setosa
5.4 3.9 1.7 0.4 setosa setosa
4.6 3.4 1.4 0.3 setosa setosa
5 3.4 1.5 0.2 setosa setosa
4.4 2.9 1.4 0.2 setosa setosa
4.9 3.1 1.5 0.1 setosa setosa
5.4 3.7 1.5 0.2 setosa setosa
4.8 3.4 1.6 0.2 setosa setosa
4.8 3 1.4 0.1 setosa setosa
4.3 3 1.1 0.1 setosa setosa
5.8 4 1.2 0.2 setosa setosa
5.7 4.4 1.5 0.4 setosa setosa
5.4 3.9 1.3 0.4 setosa setosa
5.1 3.5 1.4 0.3 setosa setosa
5.7 3.8 1.7 0.3 setosa setosa
5.1 3.8 1.5 0.3 setosa setosa
5.4 3.4 1.7 0.2 setosa setosa
5.1 3.7 1.5 0.4 setosa setosa
4.6 3.6 1 0.2 setosa setosa
5.1 3.3 1.7 0.5 setosa setosa
4.8 3.4 1.9 0.2 setosa setosa
5 3 1.6 0.2 setosa setosa
5 3.4 1.6 0.4 setosa setosa
5.2 3.5 1.5 0.2 setosa setosa
5.2 3.4 1.4 0.2 setosa setosa
4.7 3.2 1.6 0.2 setosa setosa
4.8 3.1 1.6 0.2 setosa setosa
5.4 3.4 1.5 0.4 setosa setosa
5.2 4.1 1.5 0.1 setosa setosa
5.5 4.2 1.4 0.2 setosa setosa
4.9 3.1 1.5 0.2 setosa setosa
5 3.2 1.2 0.2 setosa setosa
5.5 3.5 1.3 0.2 setosa setosa
4.9 3.6 1.4 0.1 setosa setosa
4.4 3 1.3 0.2 setosa setosa
5.1 3.4 1.5 0.2 setosa setosa
5 3.5 1.3 0.3 setosa setosa
4.5 2.3 1.3 0.3 setosa setosa
4.4 3.2 1.3 0.2 setosa setosa
5 3.5 1.6 0.6 setosa setosa
5.1 3.8 1.9 0.4 setosa setosa
4.8 3 1.4 0.3 setosa setosa
5.1 3.8 1.6 0.2 setosa setosa
4.6 3.2 1.4 0.2 setosa setosa
5.3 3.7 1.5 0.2 setosa setosa
5 3.3 1.4 0.2 setosa setosa
7 3.2 4.7 1.4 versicolor versicolor
6.4 3.2 4.5 1.5 versicolor versicolor
6.9 3.1 4.9 1.5 versicolor versicolor
5.5 2.3 4 1.3 versicolor versicolor
6.5 2.8 4.6 1.5 versicolor versicolor
5.7 2.8 4.5 1.3 versicolor versicolor
6.3 3.3 4.7 1.6 versicolor versicolor
4.9 2.4 3.3 1 versicolor versicolor
6.6 2.9 4.6 1.3 versicolor versicolor
5.2 2.7 3.9 1.4 versicolor versicolor
5 2 3.5 1 versicolor versicolor
5.9 3 4.2 1.5 versicolor versicolor
6 2.2 4 1 versicolor versicolor
6.1 2.9 4.7 1.4 versicolor versicolor
5.6 2.9 3.6 1.3 versicolor versicolor
6.7 3.1 4.4 1.4 versicolor versicolor
5.6 3 4.5 1.5 versicolor versicolor
5.8 2.7 4.1 1 versicolor versicolor
6.2 2.2 4.5 1.5 versicolor versicolor
5.6 2.5 3.9 1.1 versicolor versicolor
5.9 3.2 4.8 1.8 versicolor versicolor
6.1 2.8 4 1.3 versicolor versicolor
6.3 2.5 4.9 1.5 versicolor versicolor
6.1 2.8 4.7 1.2 versicolor versicolor
6.4 2.9 4.3 1.3 versicolor versicolor
6.6 3 4.4 1.4 versicolor versicolor
6.8 2.8 4.8 1.4 versicolor versicolor
6.7 3 5 1.7 versicolor virginica
6 2.9 4.5 1.5 versicolor versicolor
5.7 2.6 3.5 1 versicolor versicolor
5.5 2.4 3.8 1.1 versicolor versicolor
5.5 2.4 3.7 1 versicolor versicolor
5.8 2.7 3.9 1.2 versicolor versicolor
6 2.7 5.1 1.6 versicolor versicolor
5.4 3 4.5 1.5 versicolor versicolor
6 3.4 4.5 1.6 versicolor versicolor
6.7 3.1 4.7 1.5 versicolor versicolor
6.3 2.3 4.4 1.3 versicolor versicolor
5.6 3 4.1 1.3 versicolor versicolor
5.5 2.5 4 1.3 versicolor versicolor
5.5 2.6 4.4 1.2 versicolor versicolor
6.1 3 4.6 1.4 versicolor versicolor
5.8 2.6 4 1.2 versicolor versicolor
5 2.3 3.3 1 versicolor versicolor
5.6 2.7 4.2 1.3 versicolor versicolor
5.7 3 4.2 1.2 versicolor versicolor
5.7 2.9 4.2 1.3 versicolor versicolor
6.2 2.9 4.3 1.3 versicolor versicolor
5.1 2.5 3 1.1 versicolor versicolor
5.7 2.8 4.1 1.3 versicolor versicolor
6.3 3.3 6 2.5 virginica virginica
5.8 2.7 5.1 1.9 virginica virginica
7.1 3 5.9 2.1 virginica virginica
6.3 2.9 5.6 1.8 virginica virginica
6.5 3 5.8 2.2 virginica virginica
7.6 3 6.6 2.1 virginica virginica
4.9 2.5 4.5 1.7 virginica versicolor
7.3 2.9 6.3 1.8 virginica virginica
6.7 2.5 5.8 1.8 virginica virginica
7.2 3.6 6.1 2.5 virginica virginica
6.5 3.2 5.1 2 virginica virginica
6.4 2.7 5.3 1.9 virginica virginica
6.8 3 5.5 2.1 virginica virginica
5.7 2.5 5 2 virginica virginica
5.8 2.8 5.1 2.4 virginica virginica
6.4 3.2 5.3 2.3 virginica virginica
6.5 3 5.5 1.8 virginica virginica
7.7 3.8 6.7 2.2 virginica virginica
7.7 2.6 6.9 2.3 virginica virginica
6 2.2 5 1.5 virginica versicolor
6.9 3.2 5.7 2.3 virginica virginica
5.6 2.8 4.9 2 virginica virginica
7.7 2.8 6.7 2 virginica virginica
6.3 2.7 4.9 1.8 virginica virginica
6.7 3.3 5.7 2.1 virginica virginica
7.2 3.2 6 1.8 virginica virginica
6.2 2.8 4.8 1.8 virginica versicolor
6.1 3 4.9 1.8 virginica virginica
6.4 2.8 5.6 2.1 virginica virginica
7.2 3 5.8 1.6 virginica virginica
7.4 2.8 6.1 1.9 virginica virginica
7.9 3.8 6.4 2 virginica virginica
6.4 2.8 5.6 2.2 virginica virginica
6.3 2.8 5.1 1.5 virginica versicolor
6.1 2.6 5.6 1.4 virginica virginica
7.7 3 6.1 2.3 virginica virginica
6.3 3.4 5.6 2.4 virginica virginica
6.4 3.1 5.5 1.8 virginica virginica
6 3 4.8 1.8 virginica versicolor
6.9 3.1 5.4 2.1 virginica virginica
6.7 3.1 5.6 2.4 virginica virginica
6.9 3.1 5.1 2.3 virginica virginica
5.8 2.7 5.1 1.9 virginica virginica
6.8 3.2 5.9 2.3 virginica virginica
6.7 3.3 5.7 2.5 virginica virginica
6.7 3 5.2 2.3 virginica virginica
6.3 2.5 5 1.9 virginica virginica
6.5 3 5.2 2 virginica virginica
6.2 3.4 5.4 2.3 virginica virginica
5.9 3 5.1 1.8 virginica virginica
# Number of rows whose predicted class equals the true class
corrects = dataset_test.loc[dataset_test['class'] == dataset_test['predicted class (by predict())']].shape[0]
print('Bayes prediction accuracy: %d of %d == %f %%' % (corrects, len(data), corrects / len(data) * 100))
Bayes prediction accuracy: 144 of 150 == 96.000000 %
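Accuracy alone does not show which species get confused with each other. The sketch below tallies a confusion matrix from the prediction table above, where the six errors are one versicolor predicted as virginica and five virginica predicted as versicolor. The (actual, predicted) pairs are rebuilt by hand from that table, so this illustrates the idea rather than re-running the notebook.

```python
from collections import Counter

# (actual, predicted) pairs summarised from the prediction table above
pairs = (
    [("setosa", "setosa")] * 50
    + [("versicolor", "versicolor")] * 49 + [("versicolor", "virginica")] * 1
    + [("virginica", "virginica")] * 45 + [("virginica", "versicolor")] * 5
)

confusion = Counter(pairs)
classes = ["setosa", "versicolor", "virginica"]
print("actual/predicted", *classes)
for actual in classes:
    print(actual, *(confusion[(actual, predicted)] for predicted in classes))

accuracy = sum(confusion[(c, c)] for c in classes) / len(pairs)
print(f"accuracy = {accuracy:.2%}")
```

Inside the notebook, `pandas.crosstab(dataset_test['class'], dataset_test['predicted class (by predict())'])` should give the same table directly. Because the 150 evaluated rows include the 30 training rows, the 96% figure is somewhat optimistic; evaluating only on rows not used for training would give a fairer estimate.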
