In the last post, we used Google Analytics™ data to predict my blog’s website traffic. We used clean, compatible, well-chosen data and looked 365 days ahead to see what my blog’s future performance looked like.
What if, however, we didn’t have textbook data at our fingertips? How might our predictions go awry? In this post, we’ll look at some common scenarios which confound our predictive skills.
The first and most common scenario in predictive analytics is flat-out bad data. If we have data which is poorly formed, broken, incompatible, etc. – and we don’t know it – then our predictions are likely to be very wrong.
Novice data analysts often assume that a data source, especially an internal one or a well-known one like Google Analytics™, is inherently clean. Nothing could be further from the truth. We should treat all data as suspect until we’ve had a chance to inspect it for quality.
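As a quick illustration of what that inspection might look like, here is a minimal sketch using pandas. The file layout and column names (a hypothetical export with "date" and "sessions" columns) are assumptions for the example, not the post’s actual data:

```python
# A minimal data-quality check on a hypothetical analytics export.
# Column names and values are illustrative only.
import io
import pandas as pd

# Inline sample standing in for a real CSV export; note the problems hiding in it.
raw = io.StringIO(
    "date,sessions\n"
    "2017-01-01,120\n"
    "2017-01-02,\n"           # missing value
    "2017-01-02,130\n"        # duplicate date
    "2017-01-04,one forty\n"  # non-numeric entry
)
df = pd.read_csv(raw)

# Basic suspicion checks before any modeling.
missing = int(df["sessions"].isna().sum())
dupes = int(df["date"].duplicated().sum())
numeric = pd.to_numeric(df["sessions"], errors="coerce")
bad_types = int(numeric.isna().sum()) - missing  # values that failed to parse

print(f"missing: {missing}, duplicate dates: {dupes}, non-numeric: {bad_types}")
```

Even three quick checks like these, run before modeling, catch the kinds of silent defects that would otherwise flow straight into a forecast.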
The second scenario where predictive analytics often fails is with black swans – events that are significant and impactful, but could not be foretold from existing data. While much business and marketing data is cyclical, seasonal, and predictable, we still encounter these events from time to time.
For example, no amount of predictive analytics could have correctly forecasted the September 11 attacks or the attack on Pearl Harbor, yet these events changed the world.
A third circumstance in which predictive analytics often fails is confounding variables. These situations arise from a failure to understand our data and the context in which it occurs. To use a classic data science and statistics example, suppose we’re modeling and predicting ice cream sales. We’ve got great sales data from the last 50 years, and we’re building our model based on it.
Yet, the next year we look back and we see our predictive forecasts were terribly wrong because the summer was unseasonably cold. If we only used our ice cream sales data and didn’t account for weather in the model at all, we missed the context of our data. There was a dependency we didn’t forecast that we should have known about; we certainly have weather data for the last 50 years and could have built models for a cold summer, an average summer, and a hot summer.
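A toy model makes the point concrete. The numbers below are synthetic, not the post’s data: sales follow a yearly trend plus a temperature effect, and a trend-only fit badly over-forecasts the cold summer that a trend-plus-weather fit handles:

```python
# Toy illustration of a confounding variable: ice cream sales depend on
# summer temperature, not just the yearly trend. All data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(50)
temp = 75 + rng.normal(0, 5, 50)  # average summer temperature, deg F
sales = 1000 + 20 * years + 30 * (temp - 75) + rng.normal(0, 50, 50)

# Model 1: trend only. Model 2: trend plus temperature.
X1 = np.column_stack([np.ones(50), years])
X2 = np.column_stack([np.ones(50), years, temp])
b1, *_ = np.linalg.lstsq(X1, sales, rcond=None)
b2, *_ = np.linalg.lstsq(X2, sales, rcond=None)

# Forecast year 50 with an unseasonably cold summer (65 deg F).
trend_only = b1 @ [1, 50]
with_weather = b2 @ [1, 50, 65]
print(f"trend-only forecast: {trend_only:.0f}, with weather: {with_weather:.0f}")
```

The trend-only model has no way to know the summer was cold, so it forecasts well above what the weather-aware model does; that gap is exactly the missed context described above.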
The fourth circumstance in which predictive analytics goes awry is insufficient feature engineering. This step is specific to data science: feature engineering is the time we spend ensuring we’ve selected good data and prepared it so our models train appropriately.
For example, in my web analytics, if I’m attempting to forecast my traffic for the next year and I know anomalies are present, I should engineer them out. A random one-time Reddit hit is enough to skew a model, but if I had no part in creating the hit, if it was truly random, it has no place in the model.
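One common way to engineer out such an anomaly is to flag points that sit far from a rolling median and interpolate over them. This is a sketch with synthetic traffic data and an illustrative threshold, not a prescription:

```python
# Sketch of engineering out a one-time traffic spike before modeling.
# The data, window size, and threshold here are all illustrative.
import numpy as np
import pandas as pd

# Synthetic daily sessions with one anomalous spike (the "Reddit hit").
idx = pd.date_range("2017-01-01", periods=60, freq="D")
sessions = pd.Series(200 + 10 * np.sin(np.arange(60) / 7), index=idx)
sessions.iloc[30] = 5000

# Flag points far from a centered rolling median, using the median
# absolute deviation so the spike can't inflate its own threshold.
median = sessions.rolling(7, center=True, min_periods=1).median()
deviation = (sessions - median).abs()
mad = deviation.median()
is_anomaly = deviation > 10 * (mad + 1)

# Replace flagged points and interpolate across the gap.
cleaned = sessions.mask(is_anomaly).interpolate()
print(f"anomalies removed: {int(is_anomaly.sum())}")
```

The rolling median is robust to the spike itself, so the anomaly stands out cleanly; after interpolation, the series reflects the traffic I actually earned rather than a one-time fluke.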
Mistaking Predictions for Insights
The final way predictive analytics goes wrong isn’t with the prediction, but with what we do with it. All descriptive and diagnostic analytics, being based in mathematics and statistics, can only tell us what happened. Predictive analytics models, built with the same math and statistics, will only tell us what is likely to happen.
None of these analytics ever explain why something did happen or why it will happen. None of these mathematical models understand the humans often at the root of the data we’re studying.
Never mistake what for why. Our models help us plan and predict what is to come, but we still require human insight and judgement to determine whether the circumstances of the model remain appropriate – and if not, how to build more insightful models.
Next: The Future of PR
In the next post in this series, we’ll review where we’ve been and what the road ahead looks like as more machine learning and artificial intelligence find their way into the world of public relations. Stay tuned!
Christopher S. Penn
Vice President, Marketing Technology