A social media data warning from Sherlock Holmes

In the literary classic A Scandal in Bohemia, consulting detective Sherlock Holmes warns us of a grave error that far too many commit, not only in forensic science, but in understanding the various claims made using data in the worlds of PR and marketing.

“I have no data yet. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”

[Image: Sherlock, Series 3. Photo Credit: BBC One]

Several people called my attention to an article yesterday stating that there was no correlation between social sharing and the actual consumption of content. This is quite a bold statement, and I’m certain a number of people shared the article (possibly without reading it). The first question we must ask ourselves isn’t what we should do about it, but whether the facts support the theory.

How would you go about proving this?

“Data! Data! Data! I can’t make bricks without clay!” – Sherlock Holmes, The Adventure of the Copper Beeches

The answer lies in the text of the article itself: “Chartbeat’s lead data scientist Josh Schwartz later clarified to The Verge that Haile was talking specifically about tweets”. Let’s get some social media data to work with from Twitter. I downloaded all of the tweets I’ve posted with links to my own website over the past two months, which gives me each tweet as well as its number of favorites, retweets, and replies. This will tell me how many people are sharing, with or without reading the content.

So far, so good. From there, I went into Google Analytics and created a filter to show only traffic from Twitter. I took the visits to each URL from Twitter and lined them up next to each of the corresponding Tweets.
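The lining-up step can also be done in a few lines of code rather than by hand. Here is a minimal sketch in Python, assuming the tweet export and the Google Analytics report have already been loaded into simple structures. The field names, URLs, and numbers below are made up for illustration and are not my actual data.

```python
# Hypothetical illustration data: invented values, not the real export.
tweets = [
    {"url": "https://example.com/post-a", "retweets": 4, "favorites": 2, "replies": 1},
    {"url": "https://example.com/post-a", "retweets": 1, "favorites": 0, "replies": 0},
    {"url": "https://example.com/post-b", "retweets": 0, "favorites": 1, "replies": 0},
    {"url": "https://example.com/post-c", "retweets": 2, "favorites": 3, "replies": 0},
]

# Hypothetical visits from Twitter per URL, as an analytics report might list them.
twitter_visits = {
    "https://example.com/post-a": 30,
    "https://example.com/post-b": 12,
    "https://example.com/post-c": 9,
    "https://example.com/post-d": 5,  # visited, but never tweeted in this window
}


def join_shares_with_visits(tweets, visits_by_url):
    """Sum engagement per URL, then pair each URL with its Twitter visits."""
    totals = {}
    for tweet in tweets:
        counts = totals.setdefault(
            tweet["url"], {"retweets": 0, "favorites": 0, "replies": 0}
        )
        for key in counts:
            counts[key] += tweet[key]

    rows = []
    for url, counts in totals.items():
        # Keep only URLs that appear in both sources.
        if url in visits_by_url:
            rows.append({"url": url, **counts, "visits": visits_by_url[url]})
    return rows


rows = join_shares_with_visits(tweets, twitter_visits)
for row in rows:
    print(row)
```

Summing per URL matters when the same link was tweeted more than once; otherwise a single article would show up as several rows with the same visit count.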

If the theory is correct, there should be no correlation between the number of social shares and the number of people who visited each article from Twitter. Let’s find out by running a standard Pearson correlation analysis. Any statistics tool, including your favorite spreadsheet, can do this. The answer is:

[Screenshot: SOFA Statistics report, 2014-02-19]

In this particular dataset, there is a modest positive correlation of 0.247 between visits to the URL and retweets. It is not “no correlation”, which would be a value of 0, nor is it a perfect correlation, which would be a value of 1.
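If you would rather compute the number yourself than trust a tool’s report, the Pearson coefficient is short enough to write out. The sketch below is a plain-Python version of the textbook formula; the function name and the sample numbers are my own illustration, not the data behind the 0.247 figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical retweet and visit counts, one pair per article.
retweets = [0, 1, 2, 3, 5, 8]
visits = [12, 9, 15, 11, 20, 18]

print(round(pearson_r(retweets, visits), 3))
```

The result always falls between -1 and 1: 0 means no linear relationship, and values closer to either end mean a stronger one.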

Updated: Ethan Jewett pointed out in the comments and on Twitter that one of my Tweets promoting my book is a significant outlier that unduly influences the correlation. Using a different correlation method (Spearman instead of Pearson), which works on ranks and so is less sensitive to outliers, we get a correlation of 0.14, which is noticeably weaker.
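To see why a single outlier matters, here is a small sketch that runs both methods on the same numbers. Spearman is computed here as Pearson applied to ranks, with tied values sharing an average rank. The data is invented and deliberately exaggerated, with one runaway tweet at the end, so the gap is far larger than the one in my own dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson computed on the ranks of the data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: five ordinary tweets and one runaway outlier.
retweets = [1, 2, 3, 4, 5, 40]
visits = [10, 8, 12, 7, 9, 200]

pearson_value = pearson_r(retweets, visits)
spearman_value = spearman_rho(retweets, visits)
print("Pearson: ", round(pearson_value, 3))
print("Spearman:", round(spearman_value, 3))
```

Because ranks only record order, the outlier counts as “the largest value” and nothing more, so it cannot drag the coefficient upward the way it does with Pearson.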

So what does this mean? For this particular audience, there is a weak correlation between retweets and people actually consuming the content, or at least getting to the content. Thus, you can’t make a global, generalized declaration that social shares and content consumption have no relationship. In this dataset, at least, they appear to have a weak one. The next logical step would be to test a different dataset or increase the sample size to reach a firmer conclusion, and to test it with both correlation methods.

What you can definitively say is that every brand and every publisher must do their own work to find out whether their particular audience does or does not consume the content they share. Don’t rely on someone else’s data when you have your own data to look at – and certainly don’t make business decisions about the future of your company from someone else’s dataset.

Christopher S. Penn
Vice President, Marketing Technology

Posted on February 19, 2014 in Advertising, Marketing, Metrics, Twitter

About the Author

Christopher S. Penn is an authority on digital marketing and marketing technology. A recognized thought leader, author, and speaker, he has shaped three key fields in the marketing industry: Google Analytics adoption, data-driven marketing and PR, and email marketing. Known for his high-octane, “here’s how to get it done” approach, his expertise benefits companies such as Citrix Systems, McDonald’s, GoDaddy, McKesson, and many others. His latest work, Leading Innovation, teaches organizations how to implement and scale innovative practices to direct change. Christopher is a highly sought-after keynote speaker thanks to his energetic, informative talks. In 2015, he delivered insightful, innovative talks on all aspects of marketing and analytics at over 30 events to critical acclaim. He is a founding member of IBM’s Watson Analytics Predictioneers, co-founder of the groundbreaking PodCamp Conference, and co-host of the Marketing Over Coffee marketing podcast. Christopher is a Google Analytics Certified Professional and a Google AdWords Certified Professional. He is the author of over two dozen marketing books including bestsellers such as Marketing White Belt: Basics for the Digital Marketer, Marketing Red Belt: Connecting With Your Creative Mind, and Marketing Blue Belt: From Data Zero to Marketing Hero.