A Sentiment Analysis & Natural Language Processing Tutorial

In this tutorial, our agency data team makes the case for sentiment analysis and natural language processing, and walks through several examples of how we conduct such analysis.

It is no surprise that information travels faster than ever before, whether through the news, blogs, social media, or other platforms. To avoid serious harm to a brand’s reputation, it is therefore increasingly important to keep track of the opinions expressed on these platforms and to deal with potential problems as soon as they arise. Consumers expect companies to respond to complaints more quickly than many would assume: the expectation is estimated at three to six hours, yet research shows that the actual response time is six days.

This gap is substantial, and closing it is not simple. For smaller brands, it comes down to manually scanning various sources to locate potential problems. For larger companies, it would mean sorting through tens of thousands of opinions by hand, which is unreasonable and extremely costly. Sentiment analysis and natural language processing make this process manageable, and they are within reach even for smaller firms looking to perform basic analysis with out-of-the-box tools that handle common tasks.

Many might be thinking, “sentiment analysis isn’t new, tell me something I don’t already know.” That is why we will show some analytic techniques that go a step further than the average ready-to-use sentiment analysis package, and hopefully leave the reader with ideas to improve their own analysis. Analysis like this can also reveal opportunities that would be difficult to spot otherwise. By studying a brand’s Google review data at scale, we will reveal a few impactful opportunities to help decision-makers improve various aspects of their business.

But for those who are unfamiliar, what is Sentiment Analysis?

Sentiment Analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text to determine the writer’s attitude toward a topic. Natural Language Processing (NLP), a set of text mining techniques often used with sentiment analysis, draws on linguistics to characterize the semantics of language accurately. In business, NLP and sentiment analysis are commonly used to analyze the polarity, sentiment, emotion, and tone of language as they relate to various business functions or outcomes. For example, this could mean classifying a tweet as joyful, sad, or angry or, more traditionally, classifying a comment as either positive or negative, without reading or manually labeling the text.
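As a minimal illustration of the positive/negative case, the sketch below scores a comment by counting hits against two small word lists. The word lists and the `classify_polarity` name are hypothetical and chosen only for illustration. This is a simple lexicon-based stand-in, not the trained model used later in this tutorial.

```python
import re

# Hypothetical, deliberately tiny word lists; a real lexicon would be far larger.
POSITIVE_WORDS = {"great", "friendly", "fast", "love", "excellent"}
NEGATIVE_WORDS = {"rude", "slow", "terrible", "dirty", "worst"}


def classify_polarity(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(
        t in NEGATIVE_WORDS for t in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_polarity("Great prices and friendly staff"))
print(classify_polarity("The worst service, slow and rude"))
```

A word-counting approach like this ignores negation and context ("not great" would score as positive), which is one reason a trained model is usually preferred for real review data.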

We are looking to automatically classify and analyze free-text data such as customer reviews, each paired with a star rating.

What are some benefits of performing Sentiment Analysis?

If not already apparent, monitoring sentiment is particularly important because of word-of-mouth’s strong influence on a brand’s image and the speed at which this information now travels. The ability to classify customers’ opinions can also help brands adjust their strategy to better suit their target market. This could mean investing more in training for customer-facing employees or adjusting a product’s specifications to address a common criticism expressed in product reviews. Finally, sentiment analysis can be used to identify disappointed customers automatically, allowing a representative to respond and resolve the issue before it escalates and becomes damaging.

Other common uses of the technique are to analyze product or service reviews, social media posts, blogs, news, or whichever opinion-related channel the business may be connected to. Alternatively, your goal may simply be to analyze the conversation surrounding your brand to determine what is most important to your audience and how these needs may be changing over time. Whatever your needs may be, text analysis can help you explore the general conversation surrounding your brand, extract useful insights at scale, and put them to work efficiently.

A walkthrough of some common procedures

To demonstrate a few of these techniques, our data team scraped and preprocessed a sample of a brand’s Google Review data.

During the initial exploratory analysis, it is common to look at distributions to get a better sense of the data. Here, we have displayed the distribution of review lengths, as well as the distribution of reviews by rating. Ratings are between 1 and 5 stars.

Based on this visualization, we learn that most of the reviews rated higher than three stars are between one and ten words long. At first this may not seem particularly meaningful. However, when designing a sentiment analysis algorithm from scratch, engineered features like this often prove a valuable and cost-effective way to increase the overall accuracy of the model.
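A review-length feature of this kind can be sketched as follows. The sample reviews are hypothetical, and splitting on whitespace is an assumed tokenization that is simpler than what a production pipeline would use.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample of (review text, star rating) pairs.
reviews = [
    ("Great prices", 5),
    ("Friendly staff and quick checkout", 4),
    ("Waited forty minutes and nobody at the counter could tell me where my order was", 1),
    ("Fine", 3),
]


def word_count(text: str) -> int:
    """Number of whitespace-separated tokens in a review."""
    return len(text.split())


# Group review lengths by star rating.
lengths_by_rating = defaultdict(list)
for text, rating in reviews:
    lengths_by_rating[rating].append(word_count(text))

# Summarize each rating's length distribution with its median.
for rating in sorted(lengths_by_rating):
    print(rating, median(lengths_by_rating[rating]))
```

The same `word_count` value can be passed to a model as one extra numeric feature alongside the text itself.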

Part-of-Speech (POS): Noun and Adjective Distributions

Text mining also allows us to extract specific types of words from the reviews according to their part of speech (POS). We can extract nouns, verbs, adjectives, and other categories to view the frequencies of the words as they appear in the reviews. Below is a plot of the top twenty adjectives and, below that, a plot of the top twenty nouns from all the reviews in the sample.

Viewing nouns helps us see which parts of the business are discussed most frequently and can also help with aspect-oriented sentiment analysis, which will be demonstrated in more detail later in this post.

In addition to viewing the frequencies, we can also see specifically what was mentioned in the reviews when the common words appeared. By matching those words, we can narrow our results to specific aspects of the business and help decision-makers focus on the things that will improve the business most.
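The counting step behind plots like these can be sketched as below. The sketch assumes the reviews have already been tokenized and tagged with Penn Treebank tags (the tag set produced by default by NLTK's `nltk.pos_tag`, for example). The tagged sample and the `top_words_by_pos` helper are hypothetical.

```python
from collections import Counter

# Hypothetical reviews, already tokenized and tagged with Penn Treebank tags.
tagged_reviews = [
    [("friendly", "JJ"), ("staff", "NN"), ("and", "CC"), ("low", "JJ"), ("prices", "NNS")],
    [("rude", "JJ"), ("staff", "NN")],
    [("prices", "NNS"), ("are", "VBP"), ("low", "JJ")],
]


def top_words_by_pos(tagged_docs, tag_prefix, n=20):
    """Return the n most frequent words whose tag starts with tag_prefix."""
    counts = Counter(
        word.lower()
        for doc in tagged_docs
        for word, tag in doc
        if tag.startswith(tag_prefix)
    )
    return counts.most_common(n)


print(top_words_by_pos(tagged_reviews, "NN"))  # nouns (NN, NNS, ...)
print(top_words_by_pos(tagged_reviews, "JJ"))  # adjectives (JJ, JJR, JJS)
```

Matching on a tag prefix groups singular and plural nouns (or base and comparative adjectives) into one count per category.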
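One simple way to do this matching is a whole-word, case-insensitive search over the review text. The sample reviews and the `reviews_mentioning` helper below are hypothetical.

```python
import re


def reviews_mentioning(reviews, keyword):
    """Return reviews that contain keyword as a whole word, case-insensitively."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [text for text in reviews if pattern.search(text)]


# Hypothetical sample reviews.
sample = [
    "The staff were rude to me",
    "Great prices",
    "Staff went out of their way to help",
    "Understaffed on weekends",
]

for text in reviews_mentioning(sample, "staff"):
    print(text)
```

The word boundaries keep "Understaffed" out of the results for "staff". Whether such near matches should count is a judgment call that depends on the aspect being studied.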

The left bar chart below shows the distribution of this brand’s Google review ratings. On its own, this visualization does not provide much new information, since the same figures are available on the business’s Google profile. However, once our team preprocessed the text data, trained a sentiment analysis machine learning model, and scored the reviews, we were able to see a bit deeper into the ratings. To the right of the ratings distribution is a stacked bar chart produced after training the model and classifying the reviews. This view helps us peek into the sentiment within each rating.
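The tabulation behind a stacked bar chart like this one is a count of predicted sentiment labels within each star rating. A minimal sketch follows, assuming each review has already been scored by a model. The scored sample and the `sentiment_by_rating` helper are hypothetical.

```python
from collections import Counter

# Hypothetical (star rating, predicted sentiment) pairs from a scored sample.
scored = [
    (5, "positive"),
    (5, "positive"),
    (5, "negative"),
    (3, "positive"),
    (3, "negative"),
    (1, "negative"),
]


def sentiment_by_rating(scored_reviews):
    """Count predicted sentiment labels within each star rating."""
    table = {}
    for rating, label in scored_reviews:
        table.setdefault(rating, Counter())[label] += 1
    return table


table = sentiment_by_rating(scored)
for rating in sorted(table):
    print(rating, dict(table[rating]))
```

Each rating's counts correspond to the segments of one stacked bar.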

With the reviews scored by sentiment, our team divides the words into positive and negative sentiment groups and displays the most frequent words in word clouds.

From the word clouds above, we see some common words that appear frequently in both contexts, while the word “employee” appears strongly associated with negative contexts. This business may want to investigate employee training to help improve that outcome.
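The word frequencies that feed word clouds like these can be sketched as follows. The stopword list, the scored sample, and the `word_frequencies_by_sentiment` helper are hypothetical. The resulting counts could then be passed to a word cloud library for rendering.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list; real lists are much longer.
STOPWORDS = {"the", "was", "and", "a", "were", "but", "me"}

# Hypothetical (review text, predicted sentiment) pairs.
scored_reviews = [
    ("The employee was rude", "negative"),
    ("Great prices and great selection", "positive"),
    ("Employee ignored me", "negative"),
]


def word_frequencies_by_sentiment(reviews):
    """Count non-stopword tokens separately for each sentiment label."""
    freqs = {}
    for text, label in reviews:
        tokens = [
            t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS
        ]
        freqs.setdefault(label, Counter()).update(tokens)
    return freqs


freqs = word_frequencies_by_sentiment(scored_reviews)
print(freqs["negative"].most_common(3))
print(freqs["positive"].most_common(3))
```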

Aspect-Oriented Sentiment Analysis

An interesting method of dealing with opinion is aspect-oriented sentiment analysis. With the trained model, we generally perform a correlation analysis on the nouns and sentiment classifications to see which aspects of the business might need the most improvement and where resources may be allocated most effectively.

Here we can see that price is most correlated with positive reviews and service is most correlated with negative reviews. Because this agrees with the broader analyses, as well as with the insight revealed in the word clouds, this brand might want to focus more resources on staff training and service to best improve its review ratings in the future.
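The correlation step described above can be sketched as a Pearson correlation between two 0/1 indicators per review: whether an aspect word appears, and whether the review was classified as positive. This is one plausible formulation, not necessarily the exact procedure our team used. The sample data and function names are hypothetical.

```python
import math
import re


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined here; reported as 0 by assumption
    return cov / math.sqrt(var_x * var_y)


def aspect_correlations(scored_reviews, aspects):
    """Correlate each aspect's presence (0/1) with positive sentiment (0/1)."""
    positives = [1 if label == "positive" else 0 for _, label in scored_reviews]
    result = {}
    for aspect in aspects:
        pattern = re.compile(rf"\b{re.escape(aspect)}\b", re.IGNORECASE)
        present = [1 if pattern.search(text) else 0 for text, _ in scored_reviews]
        result[aspect] = pearson(present, positives)
    return result


# Hypothetical scored sample.
scored_sample = [
    ("Great price", "positive"),
    ("Good price and selection", "positive"),
    ("Slow service", "negative"),
    ("Service was rude", "negative"),
]

print(aspect_correlations(scored_sample, ["price", "service"]))
```

Values near +1 indicate an aspect that mostly appears in positive reviews, and values near -1 indicate one that mostly appears in negative reviews. With real data the values will be far less extreme than in this toy sample.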

Tracking Sentiment Trends

Although sometimes overlooked, our team finds that it is also useful to track the reviews and sentiment through time to help spot fluctuations early on. In the data visualization that our data science team created here, the thicker black line is a rolling average of the sentiment score, and the pink line is the actual review rating.

Keeping track of the trend with a rolling average can help filter out the noise of each individual review while helping the company spot drastic changes more quickly than it could by viewing only the overall average sentiment or overall Google review rating. Overall averages, especially over large samples, are slow to reflect early or short-lived changes in ratings, and such changes can be detrimental to a business if not dealt with quickly.
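A trailing rolling average of this kind can be sketched as follows. The window size and the sample ratings are hypothetical, and averaging over fewer points at the start of the series is an assumed convention.

```python
from collections import deque


def rolling_average(values, window):
    """Trailing moving average; early points average over the values seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # keeps only the last `window` values
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical star ratings in chronological order.
ratings = [5, 5, 4, 1, 1, 2]

print(rolling_average(ratings, 3))
print(sum(ratings) / len(ratings))  # overall average, for comparison
```

In this sample the overall average is 3.0, while the last three-review window averages about 1.33, so the rolling figure surfaces the recent drop that the overall average hides.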

Key Takeaways

Although tracking consumer opinion continues to grow in complexity due to the nature of semantics, the speed of information, and the increasing variety of text sources that must be analyzed, the process remains essential. Brands that prioritize monitoring consumer sentiment understand the power negative opinions can have on their company’s reputation. With some savvy analysis, these procedures also give decision-makers new opportunities. By studying various aspects of the language surrounding their brand, they can not only monitor sentiment and address problems quickly, but also adjust various aspects of their business to fit their audience’s needs more quickly than ever before.
