In the last post, we introduced the idea of predictive analytics in public relations. Public relations stands to benefit greatly from predictive analytics, which lets us forecast communications needs before they arise.
The foundation of predictive analytics is one familiar to modern PR practitioners: data.
What data do we need for predictive analytics?
To make accurate predictions, we need data with three attributes, what I call the three Cs:
- Clean
- Compatible
- Chosen Well
If our data lacks any of these attributes, creating reliable predictions will be impossible.
Clean data matters because prediction is nothing more than sophisticated mathematical extrapolation of existing data. Granted, that mathematics is generally more sophisticated than most people are familiar with, and far too time-consuming to work out by hand in a day's work, but it's still just mathematics.
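To make "mathematical extrapolation" concrete, here is a minimal sketch assuming the simplest possible model: a straight-line (ordinary least squares) trend fitted to hypothetical weekly media mention counts and projected one week ahead. Real predictive tools use far more sophisticated models; the function names and numbers are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def extrapolate(ys, steps_ahead):
    """Fit a line to evenly spaced observations and project it forward."""
    xs = list(range(len(ys)))
    slope, intercept = fit_line(xs, ys)
    return slope * (len(ys) - 1 + steps_ahead) + intercept


clean = [100, 110, 120, 130, 140, 150]  # hypothetical weekly mentions
typo = [100, 110, 120, 130, 140, 1500]  # same series with one extra zero typed in

print(extrapolate(clean, 1))  # 160.0
print(extrapolate(typo, 1))   # roughly 1060
```

A single mistyped value (1,500 instead of 150) drags the one-week forecast from 160 to roughly 1,060. The math did exactly what it was told; the data was wrong.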
If our data is corrupted, filled with junk, or flat-out wrong, our predictions will simply magnify those errors. To prepare for predictive analytics, we must clean our data as thoroughly as possible.
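What "clean" means depends on the data, but a few steps recur. Below is a minimal sketch using only Python's standard library, assuming a hypothetical CSV export with a `date` column and a numeric `mentions` column. It trims stray whitespace, drops rows whose number is missing or unparseable, and removes exact duplicates. Real cleaning usually needs rules specific to each source.

```python
import csv
import io
import math


def clean_rows(csv_text, numeric_field):
    """Trim whitespace, drop rows whose numeric_field is missing or not a
    finite number, and drop exact duplicate rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    cleaned = []
    for raw_row in reader:
        # Trim stray whitespace; ignore any overflow fields (stored under key None)
        row = {k.strip(): (v or "").strip() for k, v in raw_row.items() if k is not None}
        try:
            value = float(row[numeric_field])
        except (KeyError, ValueError):
            continue  # missing or junk value such as "" or "n/a"
        if not math.isfinite(value):
            continue  # reject "nan" and "inf"
        row[numeric_field] = value
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue  # exact duplicate of a row we already kept
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


raw = """date,mentions
2024-01-01,120
2024-01-02,
2024-01-03,n/a
2024-01-03,n/a
2024-01-04, 135
2024-01-04,135
"""
print(clean_rows(raw, "mentions"))
# [{'date': '2024-01-01', 'mentions': 120.0}, {'date': '2024-01-04', 'mentions': 135.0}]
```

Note that dropping rows is itself a choice: a day with a blank count might really mean zero mentions, so each rule should be decided deliberately rather than applied by default.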
Compatible data matters because most predictive analytics software and systems are based on open-source libraries and technologies. We must therefore keep our data in compatible formats so that it needs as little conversion or manipulation as possible before import. Some of the most popular formats for transferring numerical data compatibly are:
- Comma-separated value files, or CSV files – the gold standard
- Structured query language files, or SQL files
- Tab-separated value files, or TSV files
Other file formats we’ll run into frequently in predictive analytics include:
- SAS format, a file format from SAS Institute’s proprietary software
- SPSS format, a file format from IBM’s proprietary statistics software
- XLS/XLSX format, a file format from Microsoft’s proprietary Excel spreadsheet software
Dealing with proprietary software often means converting our data into compatible formats so that many different tools can use it. To prepare for predictive analytics, we must work in formats that ensure maximum compatibility.
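Converting between the open formats is usually simple. Here is a minimal sketch, assuming the input is tab-separated text, that re-encodes it as CSV with Python's standard `csv` module. The module quotes any field that contains a comma, so values like outlet names with commas survive the conversion intact. The function name and sample data are hypothetical.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Re-encode tab-separated text as comma-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    # Default QUOTE_MINIMAL quoting wraps only fields that need it
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


tsv = "outlet\tarticles\nDaily News, Metro Edition\t12\n"
print(tsv_to_csv(tsv))
# outlet,articles
# "Daily News, Metro Edition",12
```

For files on disk, open them with `newline=""`, as the `csv` module documentation recommends. The proprietary formats need dedicated readers; pandas, for example, provides `read_excel`, `read_sas`, and `read_spss`, some of which depend on optional packages.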
The final and arguably most important attribute of great data for predictive analytics is data that’s been chosen well. Chosen well means data with just the right amount of detail.
Too little detail, and our data can't support accurate predictions because it lacks information.
Too much detail, and our data can't support accurate predictions either: models overfit to noise, or our software flat-out crashes because the data is too large.
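One common way to dial detail down is aggregation. Here is a minimal sketch, assuming hypothetical daily mention counts, that rolls them up into ISO-week totals with Python's standard `datetime` module. Whether daily, weekly, or monthly data is "just the right amount of detail" depends on the question being asked, so treat the weekly choice as illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily_counts):
    """Roll (date, count) pairs up into totals keyed by (ISO year, ISO week)."""
    totals = defaultdict(int)
    for day, count in daily_counts:
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week)] += count
    return dict(sorted(totals.items()))


# Ten hypothetical days starting Monday, January 1, 2024, with 5 mentions each
daily = [(date(2024, 1, 1) + timedelta(days=i), 5) for i in range(10)]
print(weekly_totals(daily))  # {(2024, 1): 35, (2024, 2): 15}
```

Using ISO weeks rather than calendar months keeps every bucket the same length, which makes week-over-week comparisons fair.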
To prepare for predictive analytics, we must choose our data carefully. Choosing data well comes from great data governance, including documentation of what’s in our data, how we acquired it, and how we prepared it.
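Documentation doesn't need to be elaborate to be useful. Here is a minimal sketch, assuming a hypothetical JSON "sidecar" record saved next to each dataset, with one field for each of the three questions above: what's in the data, how we acquired it, and how we prepared it. The field names and values are illustrative, not any established standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical minimal data-governance record kept alongside a dataset."""
    name: str
    contents: dict  # column name -> description (what's in our data)
    source: str     # how we acquired it
    preparation: list = field(default_factory=list)  # how we prepared it, in order


record = DatasetRecord(
    name="weekly_mentions.csv",
    contents={"iso_week": "ISO year and week", "mentions": "count of media mentions"},
    source="Export from our media monitoring system",
    preparation=[
        "Dropped rows with missing or non-numeric counts",
        "Removed exact duplicate rows",
        "Aggregated daily counts to ISO-week totals",
    ],
)

# Write this text to a file such as weekly_mentions.json next to the data
print(json.dumps(asdict(record), indent=2))
```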
Sources of great data for predictive PR
So, what data meets the criteria above? What data do we have access to as communicators? A quick look at the data landscape shows much promise:
- Web analytics from systems like Google Analytics™
- Social media data
- Publication data from the many media monitoring systems out there
- Massive numbers of public data sets
- Trend data from reputable providers like Google
- SEO data
We are awash in sources of great data from which to build our predictive analytics.
Next: peering into the future
In the next post in this series, we’ll walk through a predictive analytics example that’s applicable to every PR practitioner and explain how to use it. Stay tuned!