What is Data-Driven PR, Part 6: Analysis

At SHIFT, our approach is to apply equal parts art and science to build integrated programs that help brands connect with the people that matter most. But what does the ‘science’ part of communications entail? What does it look like in action?

First and foremost, it means being data-driven in our planning and execution: making informed decisions based on data and research. In this series, we examine how to become a more data-driven communications professional.

Analyzing our results

In the previous post, we discussed setting up a test based on the hypothesis. Let’s now look at how to analyze the results to prove our hypothesis true or false.

The hypothesis

We’ll begin with a hypothesis: Trustworthy content is shared more on social media.

This hypothesis meets all the criteria for a good hypothesis:

  • Can be proven true or false
  • Tests one condition at a time
  • Has identifiable variables and data sources

Take a moment to reflect: do you believe trustworthy content is shared more on social media? Do you believe the hypothesis above is true or false?

The data

We’ll use SHIFT’s custom-built SCALE Scanner to evaluate links for social sharing and trust, as measured by Moz’s MozTrust score. Let’s take a large, diverse sample of links and the relevant sharing and trust metrics for each of them. We’ll use a selection of 15,000 links shared at least once, from both B2B and B2C content selected from around the web.

The test

Trust is not a binary state, and neither is our trust metric; it’s a logarithmic score. Thus, to prove or disprove our hypothesis, we should first examine how trust scores are distributed across our sample, then examine how social media sharing aligns with that distribution.

The analysis: method

Before we do any analysis, we must inspect our data for hygiene, removing malformed or broken data. Once done, we’ll lay out our distribution:

[Figure: Social and Trust]

What we see above is a distribution of trust scores along a continuum. It looks relatively normal; that is, most pages cluster around average trustworthiness, with relatively few pages at either extreme.
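To make the hygiene and distribution steps concrete, here is a minimal sketch in Python. It assumes each link arrives as a dictionary with hypothetical `url`, `trust`, and `shares` fields, and that trust runs from 0 to 10; these names and ranges are illustrative assumptions, not the SCALE Scanner's actual output format.

```python
from collections import Counter


def clean_rows(rows):
    """Keep only rows with a usable trust score and share count.

    Field names ("url", "trust", "shares") and the 0-10 trust range
    are assumptions made for this illustration.
    """
    cleaned = []
    for row in rows:
        try:
            trust = float(row["trust"])
            shares = int(row["shares"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed field
        # Drop out-of-range scores (this also drops NaN) and unshared links.
        if not (0 <= trust <= 10) or shares < 1:
            continue
        cleaned.append({"url": row.get("url"), "trust": trust, "shares": shares})
    return cleaned


def trust_histogram(rows):
    """Count links per trust bucket, where a bucket is the score's whole-number part."""
    return Counter(int(r["trust"]) for r in rows)
```

Plotting the counts from `trust_histogram` as a bar chart gives a picture of the distribution like the one described above.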

Next, let’s add in our social media sharing, averaged by trust score:


What we see is a very different pattern of distribution. We see enormous amounts of sharing of low-trust pages, followed by a lull in the middle, and then higher sharing of high-trust pages.
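Averaging shares by trust score can be sketched as follows. This is one possible approach rather than the exact method behind the chart: it assumes cleaned records with hypothetical numeric `trust` and `shares` fields, and it groups links by the whole-number part of the trust score.

```python
from collections import defaultdict


def mean_shares_by_trust(rows):
    """Return {trust bucket: average shares per link}, ordered by bucket.

    The "trust" and "shares" field names and the whole-number bucketing
    are assumptions made for this illustration.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [share total, link count]
    for r in rows:
        bucket = int(r["trust"])
        totals[bucket][0] += r["shares"]
        totals[bucket][1] += 1
    return {bucket: total / count for bucket, (total, count) in sorted(totals.items())}
```

Each bucket's average is the total shares for links in that bucket divided by the number of links in it, so a handful of heavily shared links can dominate a thinly populated bucket.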

An important rule in statistics is that we cannot judge correlation based solely on visual observation. To confirm what we’re seeing, we must run the analysis through a statistical tool.

The statistical analysis shows a weak negative correlation, an R score of -0.14, which suggests not only that trust does not create social sharing, but that it could possibly inhibit it.
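As a rough illustration, here is how a correlation like this could be computed if we assume the R score is a Pearson correlation coefficient between each link's trust score and its share count. The function name is made up for this sketch, and any statistical package would do the same job.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson_r(trust_scores, share_counts)` on two parallel lists returns a value between -1 and 1. For a sense of scale, an R of -0.14 squares to about 0.02, meaning a straight-line relationship with trust would account for roughly 2% of the variation in sharing.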

The verdict

In its current incarnation, our hypothesis, Trustworthy content is shared more on social media, has proven false. The next question is, are there conditions or circumstances in which it is true?

Next: refining

We’ve now established our hypothesis is false. Our next step is to determine how to refine it. Should we reject it entirely? Are there conditions which might make it true? We’ll learn more in the next post.
