The Citizen Analyst Manifesto, Part 5: Seek truth in your data

In this series, we explore what it means to be a citizen analyst, what values you stand for, and which qualities in the world you must adamantly oppose.

Seek truth in your data, yet not hold it too high.

When we talk about data, we’re talking about truth. The truth that lies in a set of facts, and the insight we hope to glean from those facts. Yet truth also requires clean data, data we’ve collected accurately, without error or bias. Let’s use Twitter’s new Polls feature as an example. When you issue a poll to your followers, what happens? The poll appears in their Twitter stream, and some of them answer it. You’ve put thought into your questions and structured your poll to collect the facts you need. So how can your data still be biased?

For starters, polls are voluntary, which means there’s a strong chance your data has a non-response bias, i.e., only people who want to answer respond. Moreover, your responsive audience may be statistically different from your non-responsive one. Let’s say I asked about baseball. It’s a pretty motivating subject for a large segment of the population, right? Now, let’s say I phrased my question in very specific terms: “Which is the better baseball team, the Red Sox or the Yankees?” I’ve immediately segmented my potential responsive audience. Why? If you’re a White Sox fan, you probably won’t respond, and the real question, “Which is the best team in baseball?”, will not be answered accurately by the poll. The responsive audience will be statistically different from the non-responsive one.
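To make the gap between responders and non-responders concrete, here is a minimal sketch in Python. Every number in it is invented for illustration: the follower counts, the team split, and the response probabilities are assumptions, not measurements from any real poll.

```python
# Hypothetical follower base: all numbers below are invented for illustration.
followers = {
    # team: (number of followers, probability of answering a
    #        "Red Sox or Yankees?" poll)
    "Red Sox":   (300, 0.40),
    "Yankees":   (200, 0.40),
    "White Sox": (250, 0.02),
    "Other":     (250, 0.02),
}

def audience_share(counts):
    """Convert per-team counts into each team's fraction of the total."""
    total = sum(counts.values())
    return {team: n / total for team, n in counts.items()}

# Everyone who could have seen the poll.
everyone = {team: n for team, (n, _) in followers.items()}

# Expected respondents per team = group size * response probability.
respondents = {team: n * p for team, (n, p) in followers.items()}

true_share = audience_share(everyone)
responder_share = audience_share(respondents)

for team in followers:
    print(f"{team:10s} followers: {true_share[team]:6.1%}   "
          f"respondents: {responder_share[team]:6.1%}")
```

Under these made-up assumptions, White Sox fans are a quarter of the followers but only a sliver of the respondents, so the poll describes the people who chose to answer and not the audience as a whole.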

Next up, we have selection bias. Our audience is uniquely keyed to SHIFT. People follow us for specific reasons, so asking questions of our followers generates different results from asking questions of the general population. Suppose we asked, “What is your opinion of SHIFT Communications?” According to the US Bureau of Labor Statistics, 208,000 people (as of 2014) work in the public relations industry. Based on the United States population, if you were to survey all 70,000 people seated at the Super Bowl this coming year, and that crowd mirrored the country as a whole, only about 50 of them would work in public relations. Thus, the most common answer to the above question in the general population should be “I’ve never heard of SHIFT.” However, the Twitter poll would generate a different answer, one that is not statistically valid for the general population.
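The Super Bowl arithmetic can be checked in a few lines of Python. This is a rough sketch: the 208,000 and 70,000 figures come from the paragraph above, while the US population of roughly 319 million for 2014 is an assumption added here, and the function name is purely illustrative.

```python
# Figures taken from the text.
PR_WORKERS = 208_000      # people working in public relations (2014)
STADIUM_SEATS = 70_000    # people seated at the Super Bowl

# Assumption (not from the text): approximate 2014 US population.
US_POPULATION = 319_000_000

def expected_in_crowd(group_size, population, crowd_size):
    """Expected number of group members in a crowd that mirrors the population."""
    return crowd_size * group_size / population

expected_pr = expected_in_crowd(PR_WORKERS, US_POPULATION, STADIUM_SEATS)
share = expected_pr / STADIUM_SEATS

print(f"Expected PR professionals in the stadium: {expected_pr:.0f}")
print(f"Share of the crowd: {share:.4%}")
```

With that population assumption the result lands in the mid-40s: a few dozen people out of 70,000, which is why a random member of the general population is unlikely to have any opinion of a PR firm.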

Finally, accept that for some problems, objective data is not enough. Humans are emotional creatures, and the decisions we make are often based in emotion, in the orbitofrontal cortex of the brain. Only after we decide emotionally do we attempt to rationalize our decisions. When we analyze data about decisions, about choices our customers make, about how to describe a complex issue, we must remember to account for the qualitative nature of emotion.

Recall from last week’s Citizen Analyst the elevator in San Francisco that is missing both the 13th and 4th floors. Objective data clearly says two floors are missing, but the root reason those floors are missing is fear, an emotional response strong enough that hotel patrons would object to staying on either of those floors. Superstition makes us feel better about staying on the 14th floor, even though we understand rationally that it is 13 floors up.

Seek truth in your data, yet not hold it too high.


Christopher S. Penn
Vice President, Marketing Technology

Disclosure: IBM Watson Analytics is a trademark of IBM. Used with permission.

