Sentiment Analysis
This lesson will teach you how to perform sentiment analysis with VADER.
What are Sentiments?
Consider the following sentences:
I love playing games.
I hate doing homework.
From reading those two sentences, it is pretty easy to tell which one conveys a positive feeling and which one conveys a negative feeling.
As humans, we associate words and phrases with emotions without even thinking about it. Whether that emotion is positive, negative, or neutral, we can usually figure it out pretty well.
So what if we wanted a computer to also understand human language and the emotions conveyed through those words? This is where sentiment analysis comes in. We define sentiment as an attitude, thought, or judgment prompted by feeling. Sentiment analysis, then, is the process of computationally determining the sentiment of text or speech: whether it is positive, negative, or neutral.
It is also known as opinion mining as it can derive the opinion or attitude of a selection of text in human language.
In this lesson, we are going to use VADER. Not the Star Wars villain but instead the Python package that is used to perform sentiment analysis.
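VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool. A lexicon is basically a dictionary of words; VADER's lexicon pairs each word with a rating of how positive or negative it is. Its rules then adjust those ratings based on context, such as negations ("not good"), intensifiers ("very good"), capitalization, and punctuation.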
In this lesson we are going to create a program that allows us to feed it sentences, and it will tell us if the sentences are positive or negative.
Setup
First, create a new project in Spyder and save it as SentimentAnalysis.
Then, create a new file inside the project and save it as sentiment.py. You can also delete the text that's already in the file.
VADER comes packaged with NLTK (the Natural Language Toolkit), so once NLTK is installed we can import it directly into the project.
from nltk.sentiment.vader import SentimentIntensityAnalyzer
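This is the only import we need to start doing sentiment analysis. However, NLTK stores the VADER lexicon as a separate data package, so you must download it once with nltk.download('vader_lexicon') before creating an analyzer; otherwise NLTK raises a LookupError.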
Analysis
VADER tells us how positive or negative a piece of text is: we feed text into VADER, and it returns polarity scores for that text.
A polarity score is just a value for how positive or negative the text is. VADER provides us with 3 separate polarity scores:
negative (neg)
neutral (neu)
positive (pos)
We will start by defining our function, which takes in a string to analyze.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def sentiment_scores(sentence):
    pass  # placeholder; the body is added in the next steps
Inside the function, we must first create a SentimentIntensityAnalyzer object which will allow us to analyze text.
def sentiment_scores(sentence):
    analyzer = SentimentIntensityAnalyzer()
Now to analyze our sentence, all we need to do is call the analyzer's polarity_scores() method.
This method will take our sentence and return a sentiment dictionary, which is a dictionary that contains the positive, negative, and neutral polarity scores.
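It also contains a compound score, which rates the sentence as a whole. Rather than being calculated from the other three scores, it is computed by adding up the (rule-adjusted) ratings of the individual words and normalizing the total to a value between -1 and 1.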
def sentiment_scores(sentence):
    analyzer = SentimentIntensityAnalyzer()
    sentiment_dict = analyzer.polarity_scores(sentence)
Now let's create a print statement that will print the whole sentiment dictionary for the sentence.
def sentiment_scores(sentence):
    analyzer = SentimentIntensityAnalyzer()
    sentiment_dict = analyzer.polarity_scores(sentence)
    print("Overall sentiment dictionary is : ", sentiment_dict)
We will come back and add more to this function later, but for now let's test it to see if it works.
Let's create a variable to hold our sentence. We can then print this sentence, followed by a newline.
Then we will pass our sentence into the sentiment_scores() function to show the results of our analysis.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def sentiment_scores(sentence):
    analyzer = SentimentIntensityAnalyzer()
    sentiment_dict = analyzer.polarity_scores(sentence)
    print("Overall sentiment dictionary is : ", sentiment_dict)

sentence = 'I really like playing video games'
print(sentence, '\n')
sentiment_scores(sentence)
When you run the program, you will see that a polarity score was calculated for each category, along with a compound score that summarizes the sentence as a whole.
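You can think of the neg, neu, and pos scores as percentages (they are the proportions of the text that fall into each category, so together they add up to about 1):
neg: 0.0 = 0% negative
neu: 0.381 = 38.1% neutral
pos: 0.619 = 61.9% positive
compound: 0.5956: This score represents the overall sentiment of the text. It is not a percentage.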
A compound score can range from -1 to 1, where negative scores show a negative sentiment and positive scores show a positive sentiment.
The closer to 0 a compound score is, the more neutral the text.
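Why does the compound score always land between -1 and 1? Roughly speaking, VADER adds up the rule-adjusted ratings of the words (plus a small boost for emphatic punctuation) and then squashes that unbounded total with a normalization of the form x / sqrt(x² + alpha). The function name normalize and the constant alpha = 15 follow VADER's reference implementation as we understand it; treat this as an illustration of the idea rather than a drop-in replacement.

```python
import math

def normalize(score_sum, alpha=15):
    """Map an unbounded sum of word ratings into the range [-1, 1].

    Illustrative re-implementation of VADER's normalization step
    (assumed form: x / sqrt(x*x + alpha), with alpha = 15).
    """
    norm = score_sum / math.sqrt(score_sum * score_sum + alpha)
    # Clamp as a safeguard; the formula itself already stays inside (-1, 1)
    return max(-1.0, min(1.0, norm))

for total in [0.0, 1.9, -2.7, 10.0, 100.0]:
    print(total, round(normalize(total), 4))
```

A total of 1.9 maps to about 0.44, while a total of 100 gets very close to 1 without reaching it. This is why piling on more positive words has diminishing returns on the compound score.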
This sentence gave a compound score of 0.5956 meaning that it was overall positive. The analysis was correct!
Now let's add some more print statements to our function to better present this data.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def sentiment_scores(sentence):
    analyzer = SentimentIntensityAnalyzer()
    sentiment_dict = analyzer.polarity_scores(sentence)
    print("Overall sentiment dictionary is : ", sentiment_dict)
    print("\nsentence was rated as ", sentiment_dict['neg']*100, "% Negative")
    print("sentence was rated as ", sentiment_dict['neu']*100, "% Neutral")
    print("sentence was rated as ", sentiment_dict['pos']*100, "% Positive")

sentence = 'I really like playing video games'
print(sentence, '\n')
sentiment_scores(sentence)
Here we add 3 extra print statements that display the negative, neutral, and positive percentages of the sentence.
We get each polarity score by looking it up by its key in the dictionary. For example, sentiment_dict['neg'] returns the negative polarity score.
We then multiply this value by 100 to get the percentage value.
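Multiplying by 100 works, but printing the resulting floats directly can show long trails of digits, and print() also inserts a space between each argument. A small alternative sketch, using a hypothetical helper called format_percentage, relies on Python's built-in '%' format type, which multiplies by 100, rounds to the requested number of decimal places, and appends the percent sign. The dictionary values are copied from the example output earlier in this lesson.

```python
def format_percentage(score):
    """Format a 0-1 VADER proportion as a percentage with one decimal place.

    The '%' format type multiplies by 100 and appends a percent sign,
    and the .1 precision rounds away any floating-point noise.
    """
    return f"{score:.1%}"

# Values from the example output for 'I really like playing video games'
sentiment_dict = {'neg': 0.0, 'neu': 0.381, 'pos': 0.619, 'compound': 0.5956}

for key, label in [('neg', 'Negative'), ('neu', 'Neutral'), ('pos', 'Positive')]:
    print(f"sentence was rated as {format_percentage(sentiment_dict[key])} {label}")
```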
Run the program again to see the results.
The polarity scores are now much clearer outside of the dictionary and as percentages.
The last thing to do is print a clear conclusion on whether the sentence is positive, negative, or neutral based on the compound score.
We can do this using if statements.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def sentiment_scores(sentence):
    analyzer = SentimentIntensityAnalyzer()
    sentiment_dict = analyzer.polarity_scores(sentence)
    print("Overall sentiment dictionary is : ", sentiment_dict)
    print("\nsentence was rated as ", sentiment_dict['neg']*100, "% Negative")
    print("sentence was rated as ", sentiment_dict['neu']*100, "% Neutral")
    print("sentence was rated as ", sentiment_dict['pos']*100, "% Positive")

    # Use the compound score to decide the overall label
    if sentiment_dict['compound'] >= 0.05:
        overall = 'Positive'
    elif sentiment_dict['compound'] <= -0.05:
        overall = 'Negative'
    else:
        overall = 'Neutral'
    print("\nSentence Overall Rated As: ", overall)

sentence = 'I really like playing video games'
print(sentence, '\n')
sentiment_scores(sentence)
Here, an if/elif/else chain sets the overall variable based on the compound score. The ±0.05 cut-offs are the thresholds suggested in VADER's own documentation.
If the compound score is greater than or equal to 0.05 then we say that the overall score was positive.
If the compound score is less than or equal to -0.05 then we say that the overall score was negative.
If neither condition is true, we say that the overall score was neutral.
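If you want to reuse this labeling logic elsewhere (for example, to count how many sentences in a data set are positive), the if/elif/else chain can be pulled out into its own function that returns the label instead of storing it. This is a minimal sketch; overall_label and its threshold parameter are hypothetical names, with the default of 0.05 matching the cut-offs above.

```python
def overall_label(compound, threshold=0.05):
    """Turn a VADER compound score into 'Positive', 'Negative' or 'Neutral'."""
    if compound >= threshold:
        return 'Positive'
    if compound <= -threshold:
        return 'Negative'
    return 'Neutral'

# 0.5956 is the compound score from the example sentence in this lesson
for compound in [0.5956, 0.0, -0.4]:
    print(compound, overall_label(compound))
```

Because the boundaries use >= and <=, a score of exactly 0.05 counts as positive and exactly -0.05 as negative, just like the chain in sentiment_scores().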
Now run the program again to see the results.
Perfect! Our sentiment analysis results are much clearer now.
Let's try our function on some other sentences to see how the results differ. You can make this easier by putting the sentences inside of a list and iterating through the list to determine scores.
sentence_list = ['I really like playing video games',
                 'My week has been the same as always',
                 'I did not feel well yesterday']

for sentence in sentence_list:
    print('-------------------------------------------')
    print(sentence)
    sentiment_scores(sentence)
Here we put three new sentences in a list, then loop through it, printing each sentence followed by its sentiment_scores() results.
Note: The print('----------') line is just there to separate the results for each sentence.
Run the program again to see the new results. Can you guess what they will be?
As expected, the second sentence was analyzed to be neutral and the last sentence was analyzed to be negative.
Try testing different and more complex sentences to see how VADER classifies them.
After a little bit of practice, you will be able to perform sentiment analysis on other data sets as well.