Machine Learning

Naive Bayes

Setup

TextClassification

classifier.py

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

df = pd.read_csv("Datasets/gamespot_game_reviews.csv")

CountVectorizer - This class counts the number of times each token (word) appears in each piece of text.

TfidfTransformer - This class reweights the counts so that words occurring in nearly every document, such as 'the', 'or', and 'and', count for less, since they give little useful information. It lowers their weight rather than removing them.

MultinomialNB - This class lets us train a model and make predictions using the multinomial Naive Bayes algorithm.

Data Analysis

df = pd.read_csv("Datasets/gamespot_game_reviews.csv")

print(df.columns)

df = pd.read_csv("Datasets/gamespot_game_reviews.csv")

reviews_df = df[['tagline', 'classifier']]
print(reviews_df)
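Before training, it can help to check how many rows carry each label and whether any taglines are missing. The sketch below uses a small hypothetical DataFrame in place of the CSV; only the column names 'tagline' and 'classifier' come from the listing above.

```python
import pandas as pd

# Hypothetical stand-in for the review data; only the column names match the lesson.
df = pd.DataFrame({
    "tagline": [
        "An amazing sequel",
        "A terrible port",
        "Great fun",
        "Dull and buggy",
        "Simply superb",
    ],
    "classifier": ["pos", "neg", "pos", "neg", "pos"],
})

reviews_df = df[["tagline", "classifier"]]

# Count how many rows carry each label.
label_counts = reviews_df["classifier"].value_counts()
print(label_counts)

# Missing taglines would cause an error when the text is vectorized later.
missing_taglines = reviews_df["tagline"].isna().sum()
print("Missing taglines:", missing_taglines)
```

A heavily lopsided label count is worth knowing about, because a model that always guesses the larger class can still score a high accuracy.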

The classifier column holds one of two labels: pos or neg.

Understanding Text

reviews_df = df[['tagline', 'classifier']]

tagline_list = reviews_df['tagline'].tolist()
count_vect = CountVectorizer()
x_train_counts = count_vect.fit_transform(tagline_list)

print(x_train_counts)
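The printed result is a sparse matrix, which can be hard to read at first. As a rough illustration, the sketch below runs the same steps on two made-up taglines (hypothetical data) and shows how each column maps back to a word.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical taglines, made up for this illustration.
tagline_list = ["amazing amazing game", "terrible game"]

count_vect = CountVectorizer()
x_train_counts = count_vect.fit_transform(tagline_list)

# Printing a sparse matrix lists only the stored non-zero entries,
# each identified by its (row, column) position.
print(x_train_counts)

# vocabulary_ maps each token to its column index.
print(count_vect.vocabulary_)

# A dense view is easier to read, but only practical for small examples.
print(x_train_counts.toarray())
```

Each row is one tagline and each column is one word from the vocabulary, so the first row records 'amazing' twice and 'game' once.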

Term Frequency-Inverse Document Frequency

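TF-IDF scores the importance of each word. A word like the appears in nearly every review, so it tells us little. Words like amazing and terrible are far more informative: amazing suggests a positive review and terrible suggests a negative one.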

x_train_counts = count_vect.fit_transform(tagline_list)

tfidf = TfidfTransformer()
x_train_tfidf = tfidf.fit_transform(x_train_counts)

Train, Predict, Test

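train_test_split() divides the data into a training set and a test set. With test_size=0.3, 30% of the rows are held out for testing.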

x_train_counts = count_vect.fit_transform(tagline_list)

tfidf = TfidfTransformer()
x_train_tfidf = tfidf.fit_transform(x_train_counts)

X_train, X_test, y_train, y_test = train_test_split(
    x_train_tfidf,
    np.array(reviews_df['classifier']),
    test_size=0.3,
    random_state=0)

classification_model = MultinomialNB().fit(X_train, y_train)
y_pred = classification_model.predict(X_test)

print(y_pred)
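The listing above fits the vectorizer and the TF-IDF transformer on every tagline before splitting, so the vocabulary and word weights are learned from the test rows too. One possible alternative, sketched here on hypothetical data, is to split the raw text first and chain the steps with scikit-learn's Pipeline. The step names 'counts', 'tfidf', and 'model' are arbitrary labels chosen for this example.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical taglines and labels, made up for this illustration.
taglines = [
    "an amazing adventure", "a terrible mess", "amazing and fun",
    "terrible controls", "fun from start to finish", "a boring mess",
    "amazing visuals", "boring and terrible", "great fun", "a buggy mess",
]
labels = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

# Split the raw text first so the test rows stay unseen during fitting.
text_train, text_test, y_train, y_test = train_test_split(
    taglines, labels, test_size=0.3, random_state=0)

pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("model", MultinomialNB()),
])

# fit() learns the vocabulary, the IDF weights, and the model from training rows only.
pipeline.fit(text_train, y_train)

# predict() reuses the fitted vocabulary; words never seen in training are ignored.
y_pred = pipeline.predict(text_test)
print(y_pred)
```

The fitted pipeline also accepts new raw text directly, which keeps the vectorizing and prediction steps from drifting apart.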

y_pred holds the model's predicted label for each row of X_test.

y_pred = classification_model.predict(X_test)

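# Count the predictions that match the true labels.
number_right = 0
for i in range(len(y_pred)):
    if y_pred[i] == y_test[i]:
        number_right += 1

# Report the share of correct predictions as a percentage.
print("Accuracy for tagline classify: %.2f%%" % (number_right / len(y_test) * 100))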
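The same figure can be computed with scikit-learn's metrics module, which also offers a confusion matrix showing which label gets mistaken for which. This is a small sketch using hypothetical label arrays in place of the real y_test and y_pred.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels standing in for y_test and y_pred from the listing above.
y_test = np.array(["pos", "neg", "pos", "neg", "pos"])
y_pred = np.array(["pos", "neg", "neg", "neg", "pos"])

# Fraction of predictions that match the true labels.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy for tagline classify: {accuracy * 100:.2f}%")

# Rows are true labels, columns are predicted labels, in the order given by `labels`.
matrix = confusion_matrix(y_test, y_pred, labels=["neg", "pos"])
print(matrix)
```

In the matrix, values off the main diagonal are mistakes, so it shows whether the model tends to miss positive reviews or negative ones.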

Activity: Interactivity

Have the program train the model, just like before.
Then, have the program ask the user to write a review of the game.
Run the user input through the prediction to get a result, and display that back to the user.
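One way to approach the prediction step is sketched below, using a tiny hypothetical training set so the example runs on its own. The helper name classify_review is made up for this illustration. The key detail is that new text goes through transform(), not fit_transform(), so it is encoded with the vocabulary learned during training.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data, made up for this illustration.
taglines = [
    "an amazing adventure",
    "a terrible mess",
    "amazing fun",
    "terrible and boring",
]
labels = ["pos", "neg", "pos", "neg"]

# Train the model, just like before.
count_vect = CountVectorizer()
tfidf = TfidfTransformer()
x_train_tfidf = tfidf.fit_transform(count_vect.fit_transform(taglines))
classification_model = MultinomialNB().fit(x_train_tfidf, labels)


def classify_review(text):
    """Return the predicted label for one piece of review text."""
    # transform(), not fit_transform(): reuse the vocabulary learned in training.
    counts = count_vect.transform([text])
    features = tfidf.transform(counts)
    return classification_model.predict(features)[0]


print(classify_review("what an amazing game"))
```

To make it interactive, pass the result of input("Write a review: ") to classify_review and print what it returns.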

Text Classification