Machine Learning

What is SVM?

Support Vector Machine (SVM)

support vectors

hyperplane

margin

Classification Problem

SVMClassification

classification.py

Datasets

recipes_muffins_cupcakes.csv - A CSV file containing ingredient data for muffin and cupcake recipes.

Modules for Analysis:
pandas - for datasets
numpy - for arrays
svm from sklearn - for the svm algorithm

Modules for Visuals:
matplotlib.pyplot - for basic graphs
seaborn - for prettier graphs (we will also make the font in our visuals slightly bigger using sns.set(font_scale=1.2))

import pandas as pd
import numpy as np
from sklearn import svm
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(font_scale=1.2)

recipes_muffins_cupcakes.csv

'Datasets/'

import pandas as pd
import numpy as np
from sklearn import svm
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(font_scale=1.2)

recipes = pd.read_csv('Datasets/recipes_muffins_cupcakes.csv')
print(recipes)
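Before plotting, it can help to check what was actually loaded. The sketch below uses a few made-up rows in place of the real file; the values are invented, and only the Type, Flour, and Sugar columns are taken from this tutorial, so treat the rest as an assumption.

```python
import io
import pandas as pd

# Hypothetical stand-in for Datasets/recipes_muffins_cupcakes.csv.
# These rows are invented for illustration only.
csv_text = """Type,Flour,Sugar
Muffin,55,3
Muffin,47,12
Cupcake,39,26
Cupcake,34,20
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())                  # first rows of the table
print(sample.dtypes)                  # data type of each column
print(sample['Type'].value_counts())  # number of recipes per class
```

The same three calls can be run on recipes once the real CSV has been read.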

cups, tbsp, tsp, and eggs

Visualizing the Data

seaborn

sns.lmplot()

There are many parameters that can be used to construct the graph, but we will only use a few.

seaborn.lmplot(x='*', y='*', data=*, hue='*', palette='*', fit_reg=*, scatter_kws={"*": *})

Parameters:

x, y - Strings naming the columns of the dataset to plot on the x and y axes. Any two features (columns) can be used. Recent seaborn releases expect these to be passed as keywords (x='...', y='...').

data - A tidy dataframe in which each column is a variable and each row is an observation.

hue - A string naming the column that splits the data into subsets (in our case Type, i.e., the type of food).

palette - A string naming the color palette to use for the different subsets.

fit_reg - A bool. If True, the graph estimates and plots a regression model relating the x and y variables, which hints at the relationship between the two.

scatter_kws - A dictionary of additional keyword arguments that further adjust how the points are drawn (for example, "s" sets the marker size).

You can read more about seaborn.lmplot() and its parameters in the seaborn documentation.

sugar

flour

recipes

Type

Set1

False

scatter_kws={"s": 70}

recipes = pd.read_csv('Datasets/recipes_muffins_cupcakes.csv')

sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.show()

plt.show()

plots tab

#sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
#plt.show()

Fitting the SVM Model

#sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
#plt.show()

features = recipes[['Flour', 'Sugar']].to_numpy()
label = np.where(recipes['Type'] == 'Muffin', 0, 1)

to_numpy()

np.where()

Type

Muffin

Muffin
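To see what to_numpy() and np.where() produce, here is a minimal sketch on a three-row table. The rows and the toy_ names are invented for illustration; the real file has more rows and columns.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    'Type': ['Muffin', 'Cupcake', 'Muffin'],
    'Flour': [55, 39, 47],
    'Sugar': [3, 26, 12],
})

# 2-D array with one row per recipe and one column per feature
toy_features = toy[['Flour', 'Sugar']].to_numpy()

# 0 where Type is 'Muffin', 1 for everything else
toy_label = np.where(toy['Type'] == 'Muffin', 0, 1)

print(toy_features.shape)  # (3, 2)
print(toy_label)           # [0 1 0]
```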

features = recipes[['Flour', 'Sugar']].to_numpy()
label = np.where(recipes['Type'] == 'Muffin', 0, 1)

model = svm.SVC(kernel='linear')
model.fit(features, label)

svm.SVC()

kernel

linear
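A fitted linear SVC exposes the pieces used in the rest of this tutorial. The sketch below fits one on six invented, clearly separable points (the data and the toy_model name are assumptions, not part of the recipe dataset) and prints those attributes.

```python
import numpy as np
from sklearn import svm

# Invented, linearly separable points: columns are (Flour, Sugar).
X = np.array([[55, 3], [47, 12], [50, 6], [39, 26], [34, 20], [36, 24]])
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = muffin, 1 = cupcake

toy_model = svm.SVC(kernel='linear')
toy_model.fit(X, y)

print(toy_model.coef_)             # weight vector w, shape (1, 2); linear kernel only
print(toy_model.intercept_)        # bias term, shape (1,)
print(toy_model.support_vectors_)  # training points that define the margin
```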

model = svm.SVC(kernel='linear')
model.fit(features, label)

# Get the separating hyperplane: w[0]*x + w[1]*y + intercept = 0,
# rearranged to y = a*x - intercept / w[1]
w = model.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(30, 60)
yy = a * xx - (model.intercept_[0]) / w[1]

# Compute the parallels to the separating hyperplane that pass through the support vectors
# (same slope a, shifted to go through the first and the last support vector)
b = model.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = model.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])

# Plot the hyperplane
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color='black')

plt.show()
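The line drawn above comes from rearranging w[0]*x + w[1]*y + intercept = 0 into y = -(w[0]/w[1])*x - intercept/w[1]. As a rough check of that algebra, the sketch below fits a linear SVC on invented points (not the recipe data; toy_model, slope, and offset are illustrative names) and confirms that points on the computed line get a decision score of about zero.

```python
import numpy as np
from sklearn import svm

# Invented, linearly separable points: columns are (Flour, Sugar).
X = np.array([[55, 3], [47, 12], [50, 6], [39, 26], [34, 20], [36, 24]])
y = np.array([0, 0, 0, 1, 1, 1])

toy_model = svm.SVC(kernel='linear').fit(X, y)

w = toy_model.coef_[0]
slope = -w[0] / w[1]
offset = -toy_model.intercept_[0] / w[1]

# Sample a few points that lie on the computed line
xs = np.linspace(30, 60, 5)
ys = slope * xs + offset

# decision_function returns w . x + intercept for a linear kernel,
# so points on the boundary should score (numerically) zero
scores = toy_model.decision_function(np.column_stack([xs, ys]))
on_boundary = np.allclose(scores, 0.0, atol=1e-6)
print(on_boundary)
```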

# Plot the hyperplane
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color='black')

# Look at the margins and support vectors
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color='black')
plt.plot(xx, yy_down, 'k--')
plt.plot(xx, yy_up, 'k--')
plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=80, facecolors='none')

plt.show()

Why is it Flat?

1e16

1 times 10 to the power of 16
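A label such as 1e16 on an axis is scientific notation for a scale factor. A quick check in plain Python (the variable name big is only for illustration):

```python
# 1e16 is the float 1 * 10**16, not 1 raised to the 16th power
big = 1e16

print(big == 10 ** 16)  # True
print(1 ** 16)          # 1
print(f"{big:.0f}")     # 10000000000000000
```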

plt.ticklabel_format()

# Look at the margins and support vectors
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color='black')
plt.plot(xx, yy_down, 'k--')
plt.plot(xx, yy_up, 'k--')
plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=80, facecolors='none');

plt.ticklabel_format(style='plain', axis='y')

plt.show()

plt.axis()

plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=80, facecolors='none');

plt.axis([30, 60, 0, 35])

plt.show()

Predicting New Cases

plt.show()

def muffin_or_cupcake(flour, sugar):
    if model.predict([[flour, sugar]])[0] == 0:  # predict() returns an array; 0 means muffin
        print("You're looking at a muffin recipe!")
    else:
        print("You're looking at a cupcake recipe!")

muffin_or_cupcake

model.predict()

def muffin_or_cupcake(flour, sugar):
    if model.predict([[flour, sugar]])[0] == 0:  # predict() returns an array; 0 means muffin
        print("You're looking at a muffin recipe!")
    else:
        print("You're looking at a cupcake recipe!")

muffin_or_cupcake(50, 20)
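If the result is needed elsewhere in a program, a variant that returns the label instead of printing it may be handier. This is a minimal sketch on invented training points; classify_recipe, fitted_model, and toy_model are hypothetical names, not part of the tutorial's script.

```python
import numpy as np
from sklearn import svm

# Invented training points, (Flour, Sugar); 0 = muffin, 1 = cupcake.
X = np.array([[55, 3], [47, 12], [50, 6], [39, 26], [34, 20], [36, 24]])
y = np.array([0, 0, 0, 1, 1, 1])
toy_model = svm.SVC(kernel='linear').fit(X, y)

def classify_recipe(flour, sugar, fitted_model=toy_model):
    """Return 'muffin' or 'cupcake' for one (flour, sugar) pair."""
    # predict() expects a 2-D array and returns one label per row
    predicted = fitted_model.predict([[flour, sugar]])[0]
    return 'muffin' if predicted == 0 else 'cupcake'

print(classify_recipe(55, 3))
print(classify_recipe(34, 20))
```

Passing the tutorial's own model as fitted_model would apply the same function to the real recipe data.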