Naive Bayes Classifier
A classifier is a machine learning model that assigns objects to classes based on their features.
The Naive Bayes classifier is a machine learning method that uses probability to classify objects. It picks the class that is most probable given the object's features.
Bayes' Theorem:
Numerical Data
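$$ P(c \mid x) = \frac{P(c) \prod_{i=1}^{n} P(x_i \mid c)}{P(x)} $$
The terms are as follows. $P(c \mid x)$ is the posterior probability of class $c$ given the features $x = (x_1, \ldots, x_n)$. $P(c)$ is the prior, which is the proportion of class $c$ in the available data. $P(x_i \mid c)$ is the likelihood of feature $x_i$ within class $c$. $P(x)$ is the evidence, which is the probability of the observed features. The product over features is the "naive" assumption: the features are treated as independent of each other within each class. Because $P(x)$ is the same for every class, it can be left out when we only need the most probable class.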
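A minimal sketch of the formula above, assuming the per-feature likelihoods are already known. The class names, the numbers and the function name `posterior` are hypothetical. Dividing by the evidence turns the class scores into probabilities that sum to 1. Since the evidence is shared by all classes, skipping the division does not change which class wins.

```python
import math

def posterior(priors, likelihoods):
    """Return P(c | x) for every class c.

    priors:      dict class -> P(c)
    likelihoods: dict class -> list of per-feature likelihoods P(x_i | c)
    """
    # Numerator of Bayes' theorem: P(c) * prod_i P(x_i | c)
    joint = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
    # Evidence P(x): the sum of the numerators over all classes
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

# Hypothetical example with two classes and two features
priors = {"a": 0.6, "b": 0.4}
likelihoods = {"a": [0.5, 0.4], "b": [0.3, 0.5]}
print(posterior(priors, likelihoods))
```

In this example the numerators are 0.12 and 0.06, so the evidence is 0.18. The posteriors are therefore about 2/3 for `a` and 1/3 for `b`.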
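For a numerical feature, the likelihood $P(x_i \mid c)$ is computed with the Gaussian (normal) density:
$$ P(x_i \mid c) = \frac{1}{\sqrt{2 \pi \sigma_c^2}} \, e^{-\frac{1}{2}\left(\frac{x_i - \mu_c}{\sigma_c}\right)^2} $$
Here $\mu_c$ and $\sigma_c$ are the mean and standard deviation of that feature over the training rows of class $c$.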
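The Gaussian likelihood can be written as a small Python function. This is a sketch, and the name `gaussian_likelihood` is ours rather than this notebook's. Note that the square root contains the variance $\sigma_c^2$, not $\sigma_c$.

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """Normal density N(x; mu, sigma^2), used as P(x_i | c) for a numerical feature."""
    coeff = 1.0 / math.sqrt(2 * math.pi * sigma ** 2)
    z = (x - mu) / sigma                      # distance from the class mean in standard deviations
    return coeff * math.exp(-0.5 * z ** 2)

# At the mean of a standard normal the density equals 1 / sqrt(2 * pi)
print(gaussian_likelihood(0.0, 0.0, 1.0))
```

The printed value is about 0.3989. Halving `sigma` doubles the peak, because the coefficient is divided by $\sqrt{\sigma^2} = \sigma$.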
Categorical Data
Here $P(x_i \mid c)$ is the number of training rows of class $c$ in which the feature takes the value $x_i$, divided by the number of training rows of class $c$:
$$ P(x_i \mid c) = \frac{\text{count}(x_i, c)}{\text{count}(c)} $$
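The counting rule can be sketched in plain Python. The toy rows and the helper name `categorical_likelihood` are hypothetical and do not come from this notebook.

```python
def categorical_likelihood(rows, feature, value, label, target="class"):
    """P(feature = value | class = label), estimated by counting rows."""
    in_class = [r for r in rows if r[target] == label]   # rows of class `label`
    if not in_class:
        return 0.0
    matches = sum(1 for r in in_class if r[feature] == value)
    return matches / len(in_class)

# Hypothetical toy data with one categorical feature
rows = [
    {"outlook": "sunny", "class": "no"},
    {"outlook": "sunny", "class": "no"},
    {"outlook": "rainy", "class": "yes"},
    {"outlook": "sunny", "class": "yes"},
    {"outlook": "overcast", "class": "yes"},
]
print(categorical_likelihood(rows, "outlook", "sunny", "yes"))
```

One of the three `yes` rows is `sunny`, so the result is 1/3. A value that never occurs with a class gets probability 0, and that zeroes the whole product. Laplace (add-one) smoothing is a common fix, but it is not used in this notebook.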
Mixed Data
For mixed data, each attribute uses the formula that matches its type. Numerical attributes use the Gaussian formula, and categorical attributes use the counting formula. The resulting per-attribute likelihoods are then multiplied together with the prior as usual.
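A sketch of how the two formulas combine for one class. The feature names, statistics and the function name `mixed_score` are all hypothetical. The type of each attribute is decided by whether it has Gaussian statistics.

```python
import math

def gaussian(x, mu, sigma):
    """Normal density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

def mixed_score(sample, prior, numeric_stats, categorical_probs):
    """Unnormalized P(c) * prod_i P(x_i | c) for one class c.

    numeric_stats:     feature -> (mu_c, sigma_c) for numerical features
    categorical_probs: feature -> {value: P(value | c)} for categorical features
    """
    score = prior
    for feature, value in sample.items():
        if feature in numeric_stats:          # numerical attribute: Gaussian formula
            mu, sigma = numeric_stats[feature]
            score *= gaussian(value, mu, sigma)
        else:                                 # categorical attribute: counting formula
            score *= categorical_probs[feature].get(value, 0.0)
    return score

# Hypothetical statistics for a single class
score = mixed_score(
    {"temperature": 20.0, "outlook": "sunny"},
    prior=0.5,
    numeric_stats={"temperature": (20.0, 1.0)},
    categorical_probs={"outlook": {"sunny": 0.25, "rainy": 0.75}},
)
print(score)
```

The score is computed for every class, and the class with the largest score is chosen.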
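from sklearn import datasets
from pandas import *      # DataFrame
from numpy import *       # numerical helpers such as array and prod
from math import *        # sqrt, exp, pi; imported last, so same-named functions come from math
from IPython.display import HTML, display
from tabulate import tabulate
def table(df):
    # Render a DataFrame as an HTML table in the notebook
    display(HTML(tabulate(df, tablefmt='html', headers='keys', showindex=False)))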
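# IRIS TRAINING TABLE
iris = datasets.load_iris()
# One row per flower: the four measurements followed by the species name
data = [list(s)+[iris.target_names[iris.target[i]]] for i,s in enumerate(iris.data)]
# Training set: a random 20% (30 rows) of the 150 rows. No random_state is set,
# so a re-run draws a different sample and the numbers below will change
dataset = DataFrame(data, columns=iris.feature_names+['class']).sample(frac=0.2)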
table(dataset)
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | class |
---|---|---|---|---|
5.9 | 3.2 | 4.8 | 1.8 | versicolor |
6.9 | 3.1 | 5.1 | 2.3 | virginica |
6.6 | 2.9 | 4.6 | 1.3 | versicolor |
6.7 | 2.5 | 5.8 | 1.8 | virginica |
5 | 3 | 1.6 | 0.2 | setosa |
4.4 | 3 | 1.3 | 0.2 | setosa |
5.7 | 3.8 | 1.7 | 0.3 | setosa |
4.8 | 3.4 | 1.9 | 0.2 | setosa |
5.4 | 3 | 4.5 | 1.5 | versicolor |
6.9 | 3.2 | 5.7 | 2.3 | virginica |
5.1 | 3.7 | 1.5 | 0.4 | setosa |
5.2 | 2.7 | 3.9 | 1.4 | versicolor |
5.5 | 3.5 | 1.3 | 0.2 | setosa |
5.2 | 4.1 | 1.5 | 0.1 | setosa |
6.7 | 3.1 | 4.7 | 1.5 | versicolor |
6 | 2.9 | 4.5 | 1.5 | versicolor |
4.4 | 3.2 | 1.3 | 0.2 | setosa |
5 | 3.5 | 1.6 | 0.6 | setosa |
6.5 | 2.8 | 4.6 | 1.5 | versicolor |
6 | 3.4 | 4.5 | 1.6 | versicolor |
6.4 | 2.7 | 5.3 | 1.9 | virginica |
5.9 | 3 | 5.1 | 1.8 | virginica |
4.8 | 3 | 1.4 | 0.3 | setosa |
6.4 | 2.8 | 5.6 | 2.2 | virginica |
5.7 | 2.6 | 3.5 | 1 | versicolor |
6 | 2.2 | 5 | 1.5 | virginica |
5.5 | 2.4 | 3.7 | 1 | versicolor |
6 | 3 | 4.8 | 1.8 | virginica |
5.2 | 3.5 | 1.5 | 0.2 | setosa |
7.9 | 3.8 | 6.4 | 2 | virginica |
Test Sample
test = [3,5,2,4]
print("sample data:", test)
sample data: [3, 5, 2, 4]
Grouping the Training Data by Target Class
dataset_classes = {}
# For each class: keep its rows plus the per-feature mean (mu) and standard deviation (sigma)
for key,group in dataset.groupby('class'):
    mu_s = [group[c].mean() for c in group.columns[:-1]]
    sigma_s = [group[c].std() for c in group.columns[:-1]]   # sample std (divides by n - 1)
    dataset_classes[key] = [group, mu_s, sigma_s]
    print(key, "===>")
    print('Mu_s =>', array(mu_s))
    print('Sigma_s =>', array(sigma_s))
    table(group)
setosa ===>
Mu_s => [5.00909091 3.42727273 1.50909091 0.26363636]
Sigma_s => [0.40361998 0.35802488 0.18683975 0.1361817 ]
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | class |
---|---|---|---|---|
5 | 3 | 1.6 | 0.2 | setosa |
4.4 | 3 | 1.3 | 0.2 | setosa |
5.7 | 3.8 | 1.7 | 0.3 | setosa |
4.8 | 3.4 | 1.9 | 0.2 | setosa |
5.1 | 3.7 | 1.5 | 0.4 | setosa |
5.5 | 3.5 | 1.3 | 0.2 | setosa |
5.2 | 4.1 | 1.5 | 0.1 | setosa |
4.4 | 3.2 | 1.3 | 0.2 | setosa |
5 | 3.5 | 1.6 | 0.6 | setosa |
4.8 | 3 | 1.4 | 0.3 | setosa |
5.2 | 3.5 | 1.5 | 0.2 | setosa |
versicolor ===>
Mu_s => [5.95 2.9 4.33 1.41]
Sigma_s => [0.51908038 0.29439203 0.45472825 0.2514403 ]
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | class |
---|---|---|---|---|
5.9 | 3.2 | 4.8 | 1.8 | versicolor |
6.6 | 2.9 | 4.6 | 1.3 | versicolor |
5.4 | 3 | 4.5 | 1.5 | versicolor |
5.2 | 2.7 | 3.9 | 1.4 | versicolor |
6.7 | 3.1 | 4.7 | 1.5 | versicolor |
6 | 2.9 | 4.5 | 1.5 | versicolor |
6.5 | 2.8 | 4.6 | 1.5 | versicolor |
6 | 3.4 | 4.5 | 1.6 | versicolor |
5.7 | 2.6 | 3.5 | 1 | versicolor |
5.5 | 2.4 | 3.7 | 1 | versicolor |
virginica ===>
Mu_s => [6.56666667 2.92222222 5.42222222 1.95555556]
Sigma_s => [0.62849025 0.45491147 0.49944414 0.26977357]
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | class |
---|---|---|---|---|
6.9 | 3.1 | 5.1 | 2.3 | virginica |
6.7 | 2.5 | 5.8 | 1.8 | virginica |
6.9 | 3.2 | 5.7 | 2.3 | virginica |
6.4 | 2.7 | 5.3 | 1.9 | virginica |
5.9 | 3 | 5.1 | 1.8 | virginica |
6.4 | 2.8 | 5.6 | 2.2 | virginica |
6 | 2.2 | 5 | 1.5 | virginica |
6 | 3 | 4.8 | 1.8 | virginica |
7.9 | 3.8 | 6.4 | 2 | virginica |
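The per-class statistics can also be computed with only the Python standard library. In this sketch, the function name `class_stats` is hypothetical. It uses `statistics.stdev`, which divides by n - 1 like pandas' `Series.std()`. Feeding it the setosa sepal lengths from the training table reproduces the first entries of `Mu_s` and `Sigma_s` above. Note that `stdev()` needs at least two rows per class. In pandas, `dataset.groupby('class').agg(['mean', 'std'])` gives the same numbers in one call.

```python
from collections import defaultdict
from statistics import mean, stdev

def class_stats(rows):
    """Group (features, label) pairs by label and return label -> (means, stdevs)."""
    groups = defaultdict(list)
    for features, label in rows:
        groups[label].append(features)
    stats = {}
    for label, feats in groups.items():
        columns = list(zip(*feats))          # one tuple of values per feature
        stats[label] = ([mean(col) for col in columns],
                        [stdev(col) for col in columns])   # sample std, n - 1
    return stats

# Sepal lengths of the setosa rows in the training table above
setosa = [5, 4.4, 5.7, 4.8, 5.1, 5.5, 5.2, 4.4, 5, 4.8, 5.2]
stats = class_stats([([x], "setosa") for x in setosa])
print(stats["setosa"])
```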
Computing the Prior and Likelihood Probabilities
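def numericalPriorProbability(v, mu, sigma):
    # Despite its name, this returns the Gaussian likelihood P(x_i | c) of value v
    # for a feature whose class mean is mu and class standard deviation is sigma
    return (1.0/sqrt(2 * pi * (sigma ** 2))*exp(-((v-mu)**2)/(2*(sigma**2))))
def categoricalProbability(sample,universe):
    # Fraction of the rows of `universe` that belong to `sample`; used here as the prior P(C)
    return sample.shape[0]/universe.shape[0]
# One row per class: [class, P(3|C), P(5|C), P(2|C), P(4|C), P(C)]
Ps = ([[y]+[numericalPriorProbability(x, d[1][i], d[2][i]) for i,x in enumerate(test)]+
      [categoricalProbability(d[0],dataset)] for y,d in dataset_classes.items()])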
table(DataFrame(Ps, columns=["classes"]+["P( %d | C )" % d for d in test]+["P( C )"]))
classes | P( 3 | C ) | P( 5 | C ) | P( 2 | C ) | P( 4 | C ) | P( C ) |
---|---|---|---|---|---|
setosa | 4.11734e-06 | 7.19113e-05 | 0.0676682 | 1.01259e-163 | 0.366667 |
versicolor | 7.4519e-08 | 1.20929e-11 | 1.74586e-06 | 1.44674e-23 | 0.333333 |
virginica | 6.44629e-08 | 2.58808e-05 | 5.09561e-11 | 4.99771e-13 | 0.3 |
Ranking the Classes
# Unnormalized posterior of each class: P(C) times the product of its four likelihoods
Pss = ([[r[0], prod(r[1:])] for r in Ps])
PDss = DataFrame(Pss, columns=['class', 'probability']).sort_values('probability', ascending=False)
table(PDss)
class | probability |
---|---|
virginica | 1.27461e-35 |
versicolor | 7.58711e-48 |
setosa | 7.43878e-175 |
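Multiplying many small likelihoods quickly approaches the limits of floating point. The setosa score above is already around 1e-175, and a product of enough small factors underflows to exactly 0.0, at which point the classes can no longer be compared. A common remedy is to add logarithms instead of multiplying probabilities. The sketch below applies this to the values from the likelihood table (the name `log_score` is ours). Because the logarithm is increasing, the ranking does not change.

```python
import math

# Likelihoods P(x_i | C) and priors P(C) copied from the likelihood table above
scores_table = {
    "setosa":     ([4.11734e-06, 7.19113e-05, 0.0676682, 1.01259e-163], 0.366667),
    "versicolor": ([7.4519e-08, 1.20929e-11, 1.74586e-06, 1.44674e-23], 0.333333),
    "virginica":  ([6.44629e-08, 2.58808e-05, 5.09561e-11, 4.99771e-13], 0.3),
}

def log_score(likelihoods, prior):
    """log P(C) + sum_i log P(x_i | C): same ranking as the product, without underflow."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)

ranked = sorted(scores_table, key=lambda c: log_score(*scores_table[c]), reverse=True)
print(ranked)  # ['virginica', 'versicolor', 'setosa']

# Why it matters: 200 likelihoods of 1e-3 multiply to 0.0, but their log sum stays finite
print(math.prod([1e-3] * 200), sum(math.log(1e-3) for _ in range(200)))
```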
print("Bayes prediction for", test, "is", PDss.values[0,0])
Bayes prediction for [3, 5, 2, 4] is virginica
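The sort-then-take-first step can also be written with Python's built-in `max()`, which avoids building a DataFrame just to read one cell. This sketch copies the rounded class statistics printed above into a hypothetical `MODEL` dictionary. The priors are written as 11/30, 10/30 and 9/30, which are the class counts in the training sample. The first call reproduces the prediction for the test sample. The second call uses a setosa-like row chosen for illustration.

```python
import math

# (prior P(C), Mu_s, Sigma_s) copied from the per-class output above
MODEL = {
    "setosa":     (11 / 30, [5.00909091, 3.42727273, 1.50909091, 0.26363636],
                            [0.40361998, 0.35802488, 0.18683975, 0.1361817]),
    "versicolor": (10 / 30, [5.95, 2.9, 4.33, 1.41],
                            [0.51908038, 0.29439203, 0.45472825, 0.2514403]),
    "virginica":  (9 / 30,  [6.56666667, 2.92222222, 5.42222222, 1.95555556],
                            [0.62849025, 0.45491147, 0.49944414, 0.26977357]),
}

def gaussian(x, mu, sigma):
    """Normal density N(x; mu, sigma^2), i.e. P(x_i | C) for a numerical feature."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

def predict_max(sample):
    """Return the class maximizing P(C) * prod_i P(x_i | C)."""
    def score(label):
        prior, mus, sigmas = MODEL[label]
        return prior * math.prod(gaussian(x, m, s) for x, m, s in zip(sample, mus, sigmas))
    return max(MODEL, key=score)

print(predict_max([3, 5, 2, 4]))          # virginica
print(predict_max([5.0, 3.4, 1.5, 0.2]))  # setosa
```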
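Having fitted the model on the training sample, we now run it on the full original dataset (all 150 rows). The 30 training rows are part of these 150 rows, so the accuracy below is not a fully held-out evaluation.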
# ONE FUNCTION FOR CLASSIFIER
def predict(sampel):
    # For each class: [class, P(x_1|C), ..., P(x_4|C), P(C)]
    priorLikelihoods = ([[y]+[numericalPriorProbability(x, d[1][i], d[2][i]) for i,x in enumerate(sampel)]+
                         [categoricalProbability(d[0],dataset)] for y,d in dataset_classes.items()])
    # Multiply prior and likelihoods, then return the class with the largest product
    products = ([[r[0], prod(r[1:])] for r in priorLikelihoods])
    result = DataFrame(products, columns=['class', 'probability']).sort_values('probability', ascending=False)
    return result.values[0,0]
dataset_test = DataFrame([list(d)+[predict(d[:4])] for d in data], columns=list(dataset.columns)+['predicted class (by predict())'])
table(dataset_test)
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | class | predicted class (by predict()) |
---|---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa | setosa |
4.9 | 3 | 1.4 | 0.2 | setosa | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa | setosa |
5 | 3.6 | 1.4 | 0.2 | setosa | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa | setosa |
4.6 | 3.4 | 1.4 | 0.3 | setosa | setosa |
5 | 3.4 | 1.5 | 0.2 | setosa | setosa |
4.4 | 2.9 | 1.4 | 0.2 | setosa | setosa |
4.9 | 3.1 | 1.5 | 0.1 | setosa | setosa |
5.4 | 3.7 | 1.5 | 0.2 | setosa | setosa |
4.8 | 3.4 | 1.6 | 0.2 | setosa | setosa |
4.8 | 3 | 1.4 | 0.1 | setosa | setosa |
4.3 | 3 | 1.1 | 0.1 | setosa | setosa |
5.8 | 4 | 1.2 | 0.2 | setosa | setosa |
5.7 | 4.4 | 1.5 | 0.4 | setosa | setosa |
5.4 | 3.9 | 1.3 | 0.4 | setosa | setosa |
5.1 | 3.5 | 1.4 | 0.3 | setosa | setosa |
5.7 | 3.8 | 1.7 | 0.3 | setosa | setosa |
5.1 | 3.8 | 1.5 | 0.3 | setosa | setosa |
5.4 | 3.4 | 1.7 | 0.2 | setosa | setosa |
5.1 | 3.7 | 1.5 | 0.4 | setosa | setosa |
4.6 | 3.6 | 1 | 0.2 | setosa | setosa |
5.1 | 3.3 | 1.7 | 0.5 | setosa | setosa |
4.8 | 3.4 | 1.9 | 0.2 | setosa | setosa |
5 | 3 | 1.6 | 0.2 | setosa | setosa |
5 | 3.4 | 1.6 | 0.4 | setosa | setosa |
5.2 | 3.5 | 1.5 | 0.2 | setosa | setosa |
5.2 | 3.4 | 1.4 | 0.2 | setosa | setosa |
4.7 | 3.2 | 1.6 | 0.2 | setosa | setosa |
4.8 | 3.1 | 1.6 | 0.2 | setosa | setosa |
5.4 | 3.4 | 1.5 | 0.4 | setosa | setosa |
5.2 | 4.1 | 1.5 | 0.1 | setosa | setosa |
5.5 | 4.2 | 1.4 | 0.2 | setosa | setosa |
4.9 | 3.1 | 1.5 | 0.2 | setosa | setosa |
5 | 3.2 | 1.2 | 0.2 | setosa | setosa |
5.5 | 3.5 | 1.3 | 0.2 | setosa | setosa |
4.9 | 3.6 | 1.4 | 0.1 | setosa | setosa |
4.4 | 3 | 1.3 | 0.2 | setosa | setosa |
5.1 | 3.4 | 1.5 | 0.2 | setosa | setosa |
5 | 3.5 | 1.3 | 0.3 | setosa | setosa |
4.5 | 2.3 | 1.3 | 0.3 | setosa | setosa |
4.4 | 3.2 | 1.3 | 0.2 | setosa | setosa |
5 | 3.5 | 1.6 | 0.6 | setosa | setosa |
5.1 | 3.8 | 1.9 | 0.4 | setosa | setosa |
4.8 | 3 | 1.4 | 0.3 | setosa | setosa |
5.1 | 3.8 | 1.6 | 0.2 | setosa | setosa |
4.6 | 3.2 | 1.4 | 0.2 | setosa | setosa |
5.3 | 3.7 | 1.5 | 0.2 | setosa | setosa |
5 | 3.3 | 1.4 | 0.2 | setosa | setosa |
7 | 3.2 | 4.7 | 1.4 | versicolor | versicolor |
6.4 | 3.2 | 4.5 | 1.5 | versicolor | versicolor |
6.9 | 3.1 | 4.9 | 1.5 | versicolor | versicolor |
5.5 | 2.3 | 4 | 1.3 | versicolor | versicolor |
6.5 | 2.8 | 4.6 | 1.5 | versicolor | versicolor |
5.7 | 2.8 | 4.5 | 1.3 | versicolor | versicolor |
6.3 | 3.3 | 4.7 | 1.6 | versicolor | versicolor |
4.9 | 2.4 | 3.3 | 1 | versicolor | versicolor |
6.6 | 2.9 | 4.6 | 1.3 | versicolor | versicolor |
5.2 | 2.7 | 3.9 | 1.4 | versicolor | versicolor |
5 | 2 | 3.5 | 1 | versicolor | versicolor |
5.9 | 3 | 4.2 | 1.5 | versicolor | versicolor |
6 | 2.2 | 4 | 1 | versicolor | versicolor |
6.1 | 2.9 | 4.7 | 1.4 | versicolor | versicolor |
5.6 | 2.9 | 3.6 | 1.3 | versicolor | versicolor |
6.7 | 3.1 | 4.4 | 1.4 | versicolor | versicolor |
5.6 | 3 | 4.5 | 1.5 | versicolor | versicolor |
5.8 | 2.7 | 4.1 | 1 | versicolor | versicolor |
6.2 | 2.2 | 4.5 | 1.5 | versicolor | versicolor |
5.6 | 2.5 | 3.9 | 1.1 | versicolor | versicolor |
5.9 | 3.2 | 4.8 | 1.8 | versicolor | versicolor |
6.1 | 2.8 | 4 | 1.3 | versicolor | versicolor |
6.3 | 2.5 | 4.9 | 1.5 | versicolor | versicolor |
6.1 | 2.8 | 4.7 | 1.2 | versicolor | versicolor |
6.4 | 2.9 | 4.3 | 1.3 | versicolor | versicolor |
6.6 | 3 | 4.4 | 1.4 | versicolor | versicolor |
6.8 | 2.8 | 4.8 | 1.4 | versicolor | versicolor |
6.7 | 3 | 5 | 1.7 | versicolor | virginica |
6 | 2.9 | 4.5 | 1.5 | versicolor | versicolor |
5.7 | 2.6 | 3.5 | 1 | versicolor | versicolor |
5.5 | 2.4 | 3.8 | 1.1 | versicolor | versicolor |
5.5 | 2.4 | 3.7 | 1 | versicolor | versicolor |
5.8 | 2.7 | 3.9 | 1.2 | versicolor | versicolor |
6 | 2.7 | 5.1 | 1.6 | versicolor | versicolor |
5.4 | 3 | 4.5 | 1.5 | versicolor | versicolor |
6 | 3.4 | 4.5 | 1.6 | versicolor | versicolor |
6.7 | 3.1 | 4.7 | 1.5 | versicolor | versicolor |
6.3 | 2.3 | 4.4 | 1.3 | versicolor | versicolor |
5.6 | 3 | 4.1 | 1.3 | versicolor | versicolor |
5.5 | 2.5 | 4 | 1.3 | versicolor | versicolor |
5.5 | 2.6 | 4.4 | 1.2 | versicolor | versicolor |
6.1 | 3 | 4.6 | 1.4 | versicolor | versicolor |
5.8 | 2.6 | 4 | 1.2 | versicolor | versicolor |
5 | 2.3 | 3.3 | 1 | versicolor | versicolor |
5.6 | 2.7 | 4.2 | 1.3 | versicolor | versicolor |
5.7 | 3 | 4.2 | 1.2 | versicolor | versicolor |
5.7 | 2.9 | 4.2 | 1.3 | versicolor | versicolor |
6.2 | 2.9 | 4.3 | 1.3 | versicolor | versicolor |
5.1 | 2.5 | 3 | 1.1 | versicolor | versicolor |
5.7 | 2.8 | 4.1 | 1.3 | versicolor | versicolor |
6.3 | 3.3 | 6 | 2.5 | virginica | virginica |
5.8 | 2.7 | 5.1 | 1.9 | virginica | virginica |
7.1 | 3 | 5.9 | 2.1 | virginica | virginica |
6.3 | 2.9 | 5.6 | 1.8 | virginica | virginica |
6.5 | 3 | 5.8 | 2.2 | virginica | virginica |
7.6 | 3 | 6.6 | 2.1 | virginica | virginica |
4.9 | 2.5 | 4.5 | 1.7 | virginica | versicolor |
7.3 | 2.9 | 6.3 | 1.8 | virginica | virginica |
6.7 | 2.5 | 5.8 | 1.8 | virginica | virginica |
7.2 | 3.6 | 6.1 | 2.5 | virginica | virginica |
6.5 | 3.2 | 5.1 | 2 | virginica | virginica |
6.4 | 2.7 | 5.3 | 1.9 | virginica | virginica |
6.8 | 3 | 5.5 | 2.1 | virginica | virginica |
5.7 | 2.5 | 5 | 2 | virginica | virginica |
5.8 | 2.8 | 5.1 | 2.4 | virginica | virginica |
6.4 | 3.2 | 5.3 | 2.3 | virginica | virginica |
6.5 | 3 | 5.5 | 1.8 | virginica | virginica |
7.7 | 3.8 | 6.7 | 2.2 | virginica | virginica |
7.7 | 2.6 | 6.9 | 2.3 | virginica | virginica |
6 | 2.2 | 5 | 1.5 | virginica | versicolor |
6.9 | 3.2 | 5.7 | 2.3 | virginica | virginica |
5.6 | 2.8 | 4.9 | 2 | virginica | virginica |
7.7 | 2.8 | 6.7 | 2 | virginica | virginica |
6.3 | 2.7 | 4.9 | 1.8 | virginica | virginica |
6.7 | 3.3 | 5.7 | 2.1 | virginica | virginica |
7.2 | 3.2 | 6 | 1.8 | virginica | virginica |
6.2 | 2.8 | 4.8 | 1.8 | virginica | versicolor |
6.1 | 3 | 4.9 | 1.8 | virginica | virginica |
6.4 | 2.8 | 5.6 | 2.1 | virginica | virginica |
7.2 | 3 | 5.8 | 1.6 | virginica | virginica |
7.4 | 2.8 | 6.1 | 1.9 | virginica | virginica |
7.9 | 3.8 | 6.4 | 2 | virginica | virginica |
6.4 | 2.8 | 5.6 | 2.2 | virginica | virginica |
6.3 | 2.8 | 5.1 | 1.5 | virginica | versicolor |
6.1 | 2.6 | 5.6 | 1.4 | virginica | virginica |
7.7 | 3 | 6.1 | 2.3 | virginica | virginica |
6.3 | 3.4 | 5.6 | 2.4 | virginica | virginica |
6.4 | 3.1 | 5.5 | 1.8 | virginica | virginica |
6 | 3 | 4.8 | 1.8 | virginica | versicolor |
6.9 | 3.1 | 5.4 | 2.1 | virginica | virginica |
6.7 | 3.1 | 5.6 | 2.4 | virginica | virginica |
6.9 | 3.1 | 5.1 | 2.3 | virginica | virginica |
5.8 | 2.7 | 5.1 | 1.9 | virginica | virginica |
6.8 | 3.2 | 5.9 | 2.3 | virginica | virginica |
6.7 | 3.3 | 5.7 | 2.5 | virginica | virginica |
6.7 | 3 | 5.2 | 2.3 | virginica | virginica |
6.3 | 2.5 | 5 | 1.9 | virginica | virginica |
6.5 | 3 | 5.2 | 2 | virginica | virginica |
6.2 | 3.4 | 5.4 | 2.3 | virginica | virginica |
5.9 | 3 | 5.1 | 1.8 | virginica | virginica |
corrects = dataset_test.loc[dataset_test['class'] == dataset_test['predicted class (by predict())']].shape[0]
print('Bayes prediction accuracy: %d of %d == %f %%' % (corrects, len(data), corrects / len(data) * 100))
Bayes prediction accuracy: 144 of 150 == 96.000000 %
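The 96 % figure does not show where the mistakes are. Counting (actual, predicted) pairs gives a confusion count. The pairs below are transcribed from the prediction table above, and the sketch uses `collections.Counter`. All six errors fall between versicolor and virginica, while every setosa row is recognized. With pandas, the `crosstab` function applied to the `class` and `predicted class (by predict())` columns of `dataset_test` should give the same counts as a table.

```python
from collections import Counter

# (actual, predicted) pairs read off the prediction table above:
# all 50 setosa rows are correct, 1 versicolor row is predicted as virginica,
# and 5 virginica rows are predicted as versicolor
pairs = ([("setosa", "setosa")] * 50
         + [("versicolor", "versicolor")] * 49 + [("versicolor", "virginica")] * 1
         + [("virginica", "virginica")] * 45 + [("virginica", "versicolor")] * 5)

confusion = Counter(pairs)
correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
print("accuracy: %d of %d == %.2f %%" % (correct, len(pairs), correct / len(pairs) * 100))
# List only the off-diagonal cells, i.e. the misclassifications
for (actual, predicted), n in sorted(confusion.items()):
    if actual != predicted:
        print(actual, "->", predicted, ":", n)
```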