A social media data warning from Sherlock Holmes

In the literary classic A Scandal in Bohemia, consulting detective Sherlock Holmes warns us of a grave error that far too many commit, not only in forensic science, but in understanding the various claims made using data in the worlds of PR and marketing.

“I have no data yet. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”

[Image: Sherlock, Series 3. Photo credit: BBC One]

Several people called to my attention an article yesterday stating that there was no correlation between social sharing and the actual consumption of content. This is quite a bold statement, and I’m certain a number of people shared the article (possibly without reading it). The first question we must ask ourselves isn’t what we should do about it, but whether the facts support the theory.

How would you go about proving this?

“Data! Data! Data! I can’t make bricks without clay!” – Sherlock Holmes, The Adventure of the Copper Beeches

The answer lies in the text of the article itself: “Chartbeat’s lead data scientist Josh Schwartz later clarified to The Verge that Haile was talking specifically about tweets”. Let’s get some social media data to work with from Twitter. I downloaded all of the tweets I’ve posted with links to my own website over the past two months. The export includes each tweet as well as its number of favorites, retweets, and replies. This will tell me how many people are sharing, with or without reading the content.


So far, so good. From there, I went into Google Analytics and created a filter to show only traffic from Twitter. I took the visits to each URL from Twitter and lined them up next to each of the corresponding Tweets.
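If you would rather script this lining-up step than do it by hand in a spreadsheet, here is a minimal sketch in Python. The URLs and counts are made up for illustration, and summing retweets when several tweets point to the same URL is my assumption, not necessarily how the spreadsheet above was built.

```python
# Hypothetical data: these URLs and counts are invented for illustration only.
tweets = [
    {"url": "https://example.com/post-a", "retweets": 3, "favorites": 1},
    {"url": "https://example.com/post-b", "retweets": 0, "favorites": 2},
    {"url": "https://example.com/post-a", "retweets": 2, "favorites": 0},
]

# Visits per URL, as they might come out of an analytics report
# filtered to traffic from Twitter only.
visits_from_twitter = {
    "https://example.com/post-a": 14,
    "https://example.com/post-b": 5,
}


def line_up(tweets, visits):
    """Return (url, total_retweets, twitter_visits) rows, one per URL."""
    retweets_by_url = {}
    for tweet in tweets:
        url = tweet["url"]
        # Assumption: retweets of several tweets to the same URL are summed.
        retweets_by_url[url] = retweets_by_url.get(url, 0) + tweet["retweets"]
    # A URL with no recorded Twitter visits is treated as zero visits.
    return [(url, rts, visits.get(url, 0))
            for url, rts in sorted(retweets_by_url.items())]


rows = line_up(tweets, visits_from_twitter)
for url, retweets, visits in rows:
    print(url, retweets, visits)
```

Each row now holds the two numbers we want to compare for one article: how often it was shared and how often it was actually visited from Twitter.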


If the theory is correct, there should be no correlation between the number of social shares and the number of people who visited each article from Twitter. Let’s find out by running a standard Pearson correlation analysis. Any statistics tool, including your favorite spreadsheet, can do this. The answer is:

[Image: SOFA Statistics report, 2014-02-19]

In this particular dataset, there is a weak-to-moderate positive correlation of 0.247 between visits to the URL and retweets. It is not “no correlation”, which would be a value of 0, nor is it a strong correlation, which would be a value close to 1 (a value of exactly 1 would be a perfect correlation).
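For anyone who wants to see what the statistics tool is doing under the hood, here is a minimal sketch of Pearson's r in plain Python. The sample numbers are hypothetical and chosen only to show the three reference points (1, -1, and 0); they are not my tweet data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical retweet and visit counts, for illustration only.
perfect = pearson_r([1, 2, 3], [2, 4, 6])         # visits rise in lockstep with retweets
inverse = pearson_r([1, 2, 3], [3, 2, 1])         # visits fall as retweets rise
none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])    # no linear relationship
print(perfect, inverse, none)
```

A result of 0.247 sits much closer to the "no linear relationship" case than to the "lockstep" case, but it is not zero.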

Updated: Ethan Jewett pointed out in the comments and on Twitter that one of my Tweets promoting my book is a significant outlier that unduly influences the correlation. Using a different correlation method (Spearman’s rank correlation instead of Pearson), which is less sensitive to outliers, we get a correlation of .14, which is noticeably weaker.
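To see why a single outlier matters so much, here is a small sketch that computes both Pearson and Spearman on the same data. The numbers are invented; the last point simply mimics one heavily retweeted tweet like the one Ethan flagged. Spearman is computed here as Pearson on the ranks, with tied values sharing their average rank.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values all receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: five ordinary tweets plus one 80-retweet outlier.
retweets = [1, 2, 3, 4, 5, 80]
visits = [5, 3, 6, 2, 4, 60]

r = pearson_r(retweets, visits)
rho = spearman_rho(retweets, visits)
print("Pearson:", round(r, 3), "Spearman:", round(rho, 3))
```

In this made-up example the one extreme point drags Pearson close to 1, while Spearman, which only looks at the ordering of the values, stays low. As Ethan notes in the comments, the two coefficients are not directly comparable values, so treat the contrast as a warning sign about the outlier rather than a precise measurement.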

So what does this mean? For this particular audience, there is a weak correlation between retweets and people actually consuming the content, or at least getting to the content. Thus, you can’t make a global, generalized declaration that social shares and content consumption have no relationship. They may have one, however weak, in this dataset. The next logical step would be to test a different dataset or increase the sample size to reach a firmer conclusion, and to test it with both correlation methods.
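One way to judge whether a weak correlation is distinguishable from chance is a permutation test: shuffle one column many times and count how often the shuffled data correlates at least as strongly as the real data. This is a rough sketch of that idea, not the test my statistics tool ran. The function name, the number of shuffles, the fixed seed, and the sample data are all my own illustrative choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for Pearson's r (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical retweet and visit counts, for illustration only.
retweets = [0, 1, 1, 2, 3, 3, 4, 5, 6, 8]
visits = [2, 1, 4, 3, 5, 4, 8, 6, 9, 12]
p = permutation_p_value(retweets, visits)
print("approximate p-value:", p)
```

A small p-value suggests the observed correlation would rarely arise from shuffled, unrelated data. A large one means you cannot rule out chance, which is a good reason to collect a bigger sample before drawing conclusions.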

What you can definitively say is that every brand and every publisher must do their own work to find out whether their particular audience does or does not consume the content they share. Don’t rely on someone else’s data when you have your own data to look at – and certainly don’t make business decisions about the future of your company from someone else’s dataset.

Christopher S. Penn
Vice President, Marketing Technology


Posted on February 19, 2014 in Advertising, Marketing, Metrics, Twitter


About the Author

Christopher S. Penn has been featured as a recognized authority in many books, publications such as the Wall Street Journal, Washington Post, New York Times, BusinessWeek and US News & World Report, and television networks such as PBS, CNN, CNBC, Fox News, and ABC News for his leadership in new media and marketing. In 2012 and again in 2013, Forbes Magazine recognized him as one of the top 50 most influential people in social media and digital marketing; Marketo Corporation named him a Marketing Illuminator, and PR News nominated him as Social Media Person of the Year. Mr. Penn is the Vice President of Marketing Technology at SHIFT Communications, a public relations firm, as well as co-founder of the groundbreaking PodCamp New Media Community Conference, and co-host of the Marketing Over Coffee marketing podcast. He is an adjunct professor of Internet marketing and the lead subject matter expert and professor of Advanced Social Media at the University of San Francisco. He’s the author of the best-selling book Marketing White Belt: Basics for the Digital Marketer.
21 comments
cspenn

@BenZee Yes. I jumped up one level to ask if anyone even made it to the site in the first place.

esjewett

@cspenn @ScottMonty Just eye-balling it, I don’t think there’s much of a correlation at all in your sample data set.

esjewett

@cspenn @ScottMonty No, there’s not. Outliers can create the appearance of false correlations in a Pearson regression.

esjewett

@cspenn Sounds more like the reality of the data set. You’d have to do some significance tests, but .14 is pretty close to 0.

esjewett

@cspenn Trimming the tops/bottoms could be OK, though it will skew the data a little.

cspenn

@esjewett It's definitely weaker than a .247. Thanks for the feedback - it's genuinely appreciated.

esjewett

@cspenn They aren’t comparable values, IIRC. Spearman takes a lot more work to determine significance.

esjewett

@cspenn Glad it’s appreciated. Mis-used statistical tests are really rife in the industry. Really bothers me. They are powerful tools.

cspenn

@esjewett Post text is updated with your feedback. Thank you :)

cspenn

@esjewett Spearman came in at .14 but with a much higher p. Might just be more sensible to trim top/bottoms, no?

esjewett

@cspenn @ScottMonty Did you know Pearson’s r is sensitive to outliers? Bad data set to use it on with that 80 retweet point.

ScottMonty

I always appreciate a relevant Sherlock Holmes quote. Another, buried a little deeper in the Canon is from "The Adventure of Wisteria Lodge":

"Still, it is an error to argue in front of your data. You find yourself insensibly twisting them round to fit your theories."

cspenn moderator

@ScottMonty  I haven't read that one in a while. Recently cruised through Adventures and Memoirs, Last Bow is next up :)

ScottMonty

@cspenn Well played, sir. Another is "Data! Data! Data! I cannot make bricks without clay." - 'The Copper Beeches'
