Missed the intro to this #DataDrivenPR series? Get caught up here and here!
Now that you have access to customer data, you need to determine its quality. Rarely are you handed a data set that is in pristine condition and doesn’t need some cleaning and normalizing. Before you start your analysis, ask yourself a few basic questions to better understand the quality of your data:
Do you have any additional data sources to compare your data against?
Having a second data set to compare against is not a given; in most situations it’s a luxury. If you don’t have a second data set to compare against directly, seek validation from other sources, such as financial data or other publicly available data.
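When you do have a second source, the comparison can be as simple as matching records by a shared ID. The sketch below assumes both sources can be reduced to a mapping of record ID to value (for example, deal ID to deal amount); the function name `reconcile` and the 1% default tolerance are illustrative choices, not a standard.

```python
from typing import Dict, List, Tuple


def reconcile(
    primary: Dict[str, float],
    reference: Dict[str, float],
    tolerance: float = 0.01,
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two data sources keyed by a shared record ID (hypothetical helper).

    Returns three sorted lists of IDs:
    - records that appear only in the primary data set
    - records that appear only in the reference data set
    - records in both whose values differ by more than `tolerance`,
      measured as a fraction of the reference value
    """
    missing_from_reference = sorted(primary.keys() - reference.keys())
    missing_from_primary = sorted(reference.keys() - primary.keys())
    mismatched = []
    for record_id in sorted(primary.keys() & reference.keys()):
        ref_value = reference[record_id]
        diff = abs(primary[record_id] - ref_value)
        # Relative tolerance; fall back to an absolute one when the reference is 0
        allowed = tolerance * abs(ref_value) if ref_value else tolerance
        if diff > allowed:
            mismatched.append(record_id)
    return missing_from_reference, missing_from_primary, mismatched


if __name__ == "__main__":
    # Made-up example data: deal amounts from a CRM export vs. a finance report
    crm = {"A1": 10000.0, "A2": 12500.0, "A3": 9800.0}
    finance = {"A1": 10000.0, "A2": 15000.0, "A4": 11000.0}
    print(reconcile(crm, finance))
```

Each list is a prompt for a follow-up question to the data owner, not an automatic correction: a record missing from one side may simply reflect different reporting periods.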
Do you have a second set of eyes?
Sometimes the best thing you can have to validate a data set is another person. As a best practice, your analysis should be peer reviewed to ensure accuracy and repeatability. Before getting to the analysis phase of your project, ask a peer to review the data set for structure and integrity. This can be as simple as asking, “Can you look at this with me?” It doesn’t need to be a formal review process.
Do you see a lot of anomalies, outliers or wide ranges of values that seem inconsistent?
An example of an anomaly in your data set might be a value of “30” when the range should be 0-10. An outlier might be a $1M deal when every other deal is valued at $10k. Before starting your analysis, check back with the source of the data (ideally a specific person or company) to see whether something was miscoded or whether those values are indeed accurate. You might also see values in your data set that just don’t belong. For example, are there special characters (%, $, @) where there should be numeric values?
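The three checks described above can be automated so that problem values are flagged for follow-up rather than silently deleted. This is a minimal sketch using only Python’s standard library; the function names are hypothetical, the 0-10 range comes from the example above, and the outlier test uses Tukey’s interquartile-range (IQR) fences, which is one common rule of thumb rather than the only valid choice.

```python
import re
import statistics
from typing import List

# A "clean" numeric entry: optional minus sign, digits, optional decimal part.
NUMERIC_PATTERN = re.compile(r"^-?\d+(\.\d+)?$")


def find_non_numeric(raw_values: List[str]) -> List[str]:
    """Return entries that are not plain numbers, e.g. '$10,000', '30%', 'a@b'."""
    return [v for v in raw_values if not NUMERIC_PATTERN.match(v.strip())]


def find_out_of_range(values: List[float], low: float, high: float) -> List[float]:
    """Return values outside the expected range, e.g. a 30 on a 0-10 scale."""
    return [v for v in values if v < low or v > high]


def find_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Needs at least two values; statistics.quantiles raises an error otherwise.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    print(find_non_numeric(["10000", "$10,000", "30%", "user@example", " 42 "]))
    print(find_out_of_range([3, 7, 30, 0, 10], low=0, high=10))
    print(find_outliers([10000, 10500, 9800, 10200, 1000000]))
```

With the sample deal values, only the $1M deal falls outside the fences. Flagged values are the ones to take back to the data source before deciding whether they are errors or genuine.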
Can you document your process and make it repeatable?
A best practice when dealing with data is to document what you’re doing, whether that’s an automated log file or a good old-fashioned written list of steps. An easy way to determine whether your data is usable is to build a repeatable process that produces the same result every time you run it.
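One way to combine both ideas is to run each cleaning step through a small pipeline that writes a log entry per step and ends with a fingerprint (a SHA-256 hash) of the cleaned data. If you rerun the process on the same input and get the same fingerprint, you know the result is repeatable. This is an illustrative sketch: the step functions `strip_whitespace` and `drop_blank_ids` and the `id` field are assumptions standing in for your own cleaning rules.

```python
import hashlib
import json
import logging
from typing import Callable, Dict, List, Tuple

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("data_cleaning")

Record = Dict[str, str]
Step = Tuple[str, Callable[[List[Record]], List[Record]]]


def strip_whitespace(rows: List[Record]) -> List[Record]:
    """Hypothetical step: trim stray spaces from every field (returns new dicts)."""
    return [{k: v.strip() for k, v in row.items()} for row in rows]


def drop_blank_ids(rows: List[Record]) -> List[Record]:
    """Hypothetical step: remove rows whose 'id' field is missing or empty."""
    return [row for row in rows if row.get("id")]


def run_pipeline(rows: List[Record], steps: List[Step]) -> Tuple[List[Record], str]:
    """Apply each step in order, logging row counts, then fingerprint the result."""
    for name, step in steps:
        before = len(rows)
        rows = step(rows)
        log.info("step=%s rows_in=%d rows_out=%d", name, before, len(rows))
    # Serialize with sorted keys so dict key order can't change the hash.
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    fingerprint = hashlib.sha256(canonical).hexdigest()
    log.info("fingerprint=%s", fingerprint)
    return rows, fingerprint


STEPS: List[Step] = [
    ("strip_whitespace", strip_whitespace),
    ("drop_blank_ids", drop_blank_ids),
]
```

Running `run_pipeline(raw_rows, STEPS)` twice on the same input should log identical row counts and the same fingerprint; a different fingerprint means something in the process is not deterministic.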
When working with data, apply some common sense: if the data looks weird, it probably is. Before moving on to analysis, make sure the data set you’re working with is clean and valid.
Many people jump straight from acquiring data to analyzing it without stopping to assess it first. Evaluating what you’re working with up front can save you a lot of time and headaches later.
Coming up in Part 4, Data Analysis: You have the data, now what?
Director, Marketing Technology