In today’s digital marketing world, with so much information and so many tools at our fingertips, data analysis is more necessary than ever before. What better way to leverage this information than using what is essentially a correlation analysis? Correlation analysis allows you to measure the strength of the relationship between certain data points and actions (but not the cause). Using a linear regression analysis for marketing purposes can open up new doors and insights that you otherwise would not have discovered. Let’s look at what it is, why it matters, and how you can use this model to your advantage.
What is a Regression Analysis?
After reading this blog title, you may be thinking, “What is a regression analysis?” If you turned to Google, like the majority of us do, you were probably hit with this Google snippet:
“In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables.” – Wikipedia definition of regression analysis
Great, but once again, “What is a regression analysis?” This time in common English, please!
A regression analysis is a way for us to measure the relationship of one variable to another. This allows us to see what factors of our marketing efforts relate to others. Exploring the relationship between different marketing outlooks and actions creates a foundation for eventually testing causality.
Here are some examples of how a regression analysis can be used for marketing purposes:
- Analyze if Social Engagement relates to Pageviews
- Discover whether E-mail Open Rates relate to Conversions
- Learn whether Page Authority relates to Organic Pageviews
Once we determine whether a relationship exists, we can dig deeper into the relationship to find out why.
How Do We Perform a Regression Analysis?
First and foremost, you’ll need data graphing software; we often use Tableau at SHIFT. You can easily access their free public software here. If you have any questions, feel free to sign up and use the Tableau Slack channel. You can also use the R project or even common spreadsheet software.
Once you have that set up, you’ll want to collect the data for the metrics you want to treat as variables. In this exercise, we’ll use the first example listed above and analyze whether Social Engagement relates to Pageviews.
First, we collect our social media engagement data through each platform’s analytics, such as Facebook Insights and Twitter Analytics. Make sure every row includes the URL of the page on your site, and save the export to a spreadsheet. If you’re having trouble connecting the data from different platforms, try using VLOOKUP in Excel. Although the step-by-step walkthrough we linked to is regarding a media list, you can use this formula to link any data sets within your spreadsheet.
Next, we’ll want to export the Pageview data from Google Analytics. Go to Behavior > Site Content > All Pages. Select the timeframe you are going to focus on, whether it’s a month, quarter or year. Whatever time period you use for Pageviews, use the same one for the Social Engagement data.
Lastly, connect both datasets in Excel using the VLOOKUP formula. This is the final step in data prepping before using Tableau to perform a regression analysis.
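If you would rather script this step than build the VLOOKUP by hand, the same join can be sketched in a few lines of Python. This is only an illustration: the URLs, the numbers, and the column names (`url`, `total_social`, `pageviews`) are made up, and your exports will look different.

```python
# Hypothetical sample rows standing in for the two exports.
social_rows = [
    {"url": "/blog/post-a", "total_social": 120},
    {"url": "/blog/post-b", "total_social": 45},
    {"url": "/blog/post-c", "total_social": 300},
]
pageview_rows = [
    {"url": "/blog/post-a", "pageviews": 1500},
    {"url": "/blog/post-b", "pageviews": 2200},
    {"url": "/blog/post-d", "pageviews": 900},
]


def join_on_url(social, pageviews):
    """Match rows by URL, much like a VLOOKUP keyed on the URL column.

    URLs that appear in only one export are dropped, because a point on
    the scatter plot needs both a Pageviews and a Total Social value.
    """
    views_by_url = {row["url"]: row["pageviews"] for row in pageviews}
    joined = []
    for row in social:
        if row["url"] in views_by_url:
            joined.append({
                "url": row["url"],
                "total_social": row["total_social"],
                "pageviews": views_by_url[row["url"]],
            })
    return joined


combined = join_on_url(social_rows, pageview_rows)
for row in combined:
    print(row)
```

Only the URLs found in both exports survive the join, so it is worth checking how many rows were dropped before moving on.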
When you connect this Excel file to Tableau, select the scatter plot graph. Next, drag Pageviews to Columns and Total Social to Rows, then drag URL onto the graph itself. From there, the dots in the scatter plot should populate. Once the blue circles appear, much like the image below, add the trend lines by going to Analysis > Trend Lines > Show Trend Lines.
So I have my data and my graph set up, now what? Hover over the grey trend lines and this information will appear:
We see two data points here, R-Squared and the P-Value.
The P-Value is the probability of seeing a relationship at least as strong as the one in our data if the two variables were actually unrelated. It is not the probability that our hypothesis is true. Generally speaking, a p-value below 0.05 is treated as statistically significant: a pattern like this would be unlikely to show up by chance alone. Above, we see a p-value of 0.03, which meets that threshold. Keep in mind that significance only tells us the relationship is unlikely to be a fluke; it says nothing about how strong the relationship is.
R-Squared represents how closely the data fits the trend line, on a scale from 0 to 1. It is the share of the variation in one variable that the line accounts for. As a rule of thumb, anything above .85 is a strong fit; the data sits close to the line the software drew. Anything below .7 indicates a very loose relationship or none at all; while the data and the line may be going in the same general direction, the fit is too loose to draw any useful conclusions. Above, we see that this data fits very poorly to the line. The relationship is quite weak, even though the p-value is low.
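To make the two numbers less of a black box, here is a minimal sketch of how they can be computed outside Tableau. Two assumptions to flag: the data below is invented for illustration, and the p-value is estimated with a permutation test (shuffle one column many times and see how often chance alone produces a fit as good as the real one). That is a swapped-in technique chosen because it needs nothing beyond the standard library; it is not the calculation Tableau performs, so the numbers will not match Tableau's exactly.

```python
import random


def fit_line(xs, ys):
    """Least-squares line through the points; returns (slope, intercept, r_squared).

    Assumes both columns contain some variation; a constant column would
    cause a division by zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled (unrelated) data fits at least as well."""
    rng = random.Random(seed)
    observed = fit_line(xs, ys)[2]
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if fit_line(xs, shuffled)[2] >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Hypothetical data: Pageviews on the x-axis, Total Social on the y-axis.
pageviews = [1500, 2200, 900, 3100, 1200, 2700, 1800, 600]
total_social = [120, 45, 80, 210, 30, 95, 160, 20]

slope, intercept, r_squared = fit_line(pageviews, total_social)
p_value = permutation_p_value(pageviews, total_social)
print(f"R-squared: {r_squared:.2f}, estimated p-value: {p_value:.3f}")
```

Reading the two outputs side by side is the point: the p-value answers "could this be chance?", while R-squared answers "how tightly do the dots hug the line?"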
There are a few insights that can be drawn from this data, even if it doesn’t support the claim that we want it to.
- We can drag and drop Engagement by Platform into Tableau to test different relationships. How many different combinations have low p-values and high R-squared values?
- Looking at the chart, we can see a handful of blogs that have garnered higher volumes of Pageviews with low volumes of social shares. This may be an opportunity to re-share these blogs on our platforms to gain more traction. Posts that appear in the top right quadrant of this graph are the ones performing well on both engagement and traffic.
- Are there other channels or mediums that are correlated with Pageviews? We can run a regression analysis on other dimensions that may be associated with these Pageviews and garner insight.
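The re-share idea in the second bullet can also be turned into a quick filter. The sketch below flags posts with above-median Pageviews and below-median social engagement. Using the median as the dividing line between "high" and "low" is an assumption made for illustration, as are the URLs and numbers.

```python
from statistics import median

# Hypothetical joined data: one row per blog post.
posts = [
    {"url": "/blog/post-a", "pageviews": 1500, "total_social": 120},
    {"url": "/blog/post-b", "pageviews": 2200, "total_social": 45},
    {"url": "/blog/post-c", "pageviews": 900, "total_social": 80},
    {"url": "/blog/post-d", "pageviews": 3100, "total_social": 210},
    {"url": "/blog/post-e", "pageviews": 1200, "total_social": 30},
    {"url": "/blog/post-f", "pageviews": 2700, "total_social": 95},
]


def reshare_candidates(rows):
    """Return URLs with above-median Pageviews and below-median social engagement."""
    pageview_cutoff = median(row["pageviews"] for row in rows)
    social_cutoff = median(row["total_social"] for row in rows)
    return [
        row["url"]
        for row in rows
        if row["pageviews"] > pageview_cutoff and row["total_social"] < social_cutoff
    ]


print(reshare_candidates(posts))
```

With this sample data, the only post flagged is `/blog/post-b`, which has plenty of traffic but little social activity.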
Marketing has become a data-driven service and, as a result, we should all feel more comfortable pulling from our statistics knowledge bank. A linear regression analysis is a great stepping stone into the stats sphere and allows us to garner real insight into the relationships between different marketing metrics.