Yesterday I stopped by Kontagent Konnect, the inaugural user conference from up-and-coming data analytics service provider Kontagent. And while most of the event’s content may have focused on Kontagent’s core social/mobile/gaming app verticals, a presentation on the so-called 7 Deadly Sins of Data Science by President and Chief Science Officer Josh Williams had solid advice on common pitfalls to avoid no matter how you’re leveraging analytics.
Williams’ seven points, as per his presentation (Update: Williams got in touch to clarify a few points from his presentation, noted below):
- Sloth: Lazy data collection. You can’t manage what you can’t measure, as the saying goes, and if you don’t take a good, hard look at the instrumentation you’ve set up and data you’ve collected and vet them for completeness and usefulness, you’re not going to be able to trust your analytics at all. An important corollary to that is that what you don’t measure is just as important as what you do. For instance, if you’re measuring application usage, but not usage over time, you might be missing the critical insight that the heaviest users only use the app for six weeks before burning out entirely – an insight that could make or break a business.
- Negligence: Misapplied analysis. Use the wrong algorithm, use old data, use a single data point rather than a range, or simply use your analytics improperly, and the result is the same: An actionable insight, sure, but one that tells you to jump the wrong way. Paying attention to data ranges and tempering your confidence in analytics with common sense and prudence are the antidote here.
- Gluttony: Too many reports. Even in the age of Twitter, there’s such a thing as information overload. Analytics can turn up a dizzying array of data, 24 hours a day, 7 days a week, and there’s a temptation to soak it all in. But it’s too much for anyone to absorb in any meaningful way. It’s up to the data scientist to make sure that every single report to the customer has real value, or else any signal will become lost in all the noise.
- Polemy: Disagreement on definitions and actions. Don’t get into a philosophical debate over whether or not analytics represent objective reality and “the source of truth.” Just try to draw the best conclusions you can from the data. (Update: Williams expands on this, explaining that what’s really important is an organization-wide consensus on what you’re measuring and which metrics are truly key, lest you spend more time arguing about the analysis than acting on its conclusions.)
- Imprudence: Jumping to conclusions. Simply put, what works for another business might not work for yours. The competition ran their own analytics on their own customers and came up with their own conclusions. If you base your decisions on their data, you run the risk of swerving the business into a brick wall. (Update: Williams clarifies that using others’ data is just one form of imprudence – basically, any time you look at the data and misunderstand the context or the caveats of the analysis, you’re heading for failure.)
- Pride: Decision-driven data making. Always beware an executive who comes up with an idea first and looks at the analytics afterwards. A common theme of the Kontagent Konnect programming I attended was the concept of the democratization of data science. In other words, once analytics are in the hands of an entire organization’s leadership, anybody who notes a key trend can come forward with the idea. But if someone tries to use the data to back up a preconceived notion, the entire benefit of big data analytics is lost.
- Torpor: Learning and acting slowly. If you know how to handle it, big data provides plenty of feedback, really quickly. But all that insight is useless if you’re slow to integrate it into your business. If the data suggests you need to change, make the change quickly.
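To make the Sloth example a little more concrete, here is a minimal sketch of what “usage over time” might look like in practice. It is not from Williams’ presentation; the function name `weekly_active_users`, the event format, and the data are all hypothetical, made up purely for illustration. The idea is simply to count distinct active users per week since launch, so that a drop-off after a few weeks becomes visible instead of hiding inside a single usage total.

```python
from collections import defaultdict
from datetime import date


def weekly_active_users(events, cohort_start):
    """Count distinct active users per week since cohort_start.

    events: iterable of (user_id, date) pairs (a hypothetical event log).
    Returns {week_index: number_of_distinct_users}, week 0 being the first week.
    """
    users_by_week = defaultdict(set)
    for user_id, day in events:
        week = (day - cohort_start).days // 7
        if week >= 0:  # ignore events recorded before the cohort started
            users_by_week[week].add(user_id)
    return {week: len(users) for week, users in sorted(users_by_week.items())}


# Made-up sample data: two users are active early on, only one comes back later.
events = [
    ("ann", date(2012, 1, 2)),
    ("bob", date(2012, 1, 3)),
    ("ann", date(2012, 1, 10)),
    ("bob", date(2012, 1, 11)),
    ("ann", date(2012, 2, 14)),
]
retention = weekly_active_users(events, cohort_start=date(2012, 1, 1))
print(retention)
```

A total event count for this sample would say little; the per-week breakdown shows activity thinning out, which is the kind of pattern the burnout example describes.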
The talk was couched in religious symbolism, Williams said, simply because he’s religious about data science. And while many of these points may seem obvious to businesses already far along their own big data journeys, it’s extremely important to have these best practices codified, especially as the industry searches for the next generation of data scientists. And besides that, it’s simply good advice.