I have found that Data Science projects often deal with two types of clients: one who understands the Data Science process and has a good grasp of what it can do, and another who wants to add a dash of Machine Learning (ML) and/or Artificial Intelligence (AI) on top of Business Intelligence to find information they don’t have now.
Projects for the second type of client are doomed to fail. The team will drift from a high-level discussion of what they want to the details of a technology they have chosen, even before the goal is fully addressed. They will implement something, and it won’t fulfil a goal because a focused goal was never defined. They fail on the basics, including understanding what Data Science is used for.
Let’s start at the beginning: Data Science doesn’t replace anything you’re doing now.
The Data Processing Strategy
You still need to ensure that you have a stable, reliable, clean, tested, backed-up and secure data processing strategy. A data processing strategy covers everything from “collection hygiene”, where you ensure that a date entered in a web form really is a date and that the form is not open to SQL injection attacks, to a good Relational Database Management System (RDBMS) design and implementation.
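As a rough illustration of “collection hygiene”, here is a minimal Python sketch using only the standard library. The `signups` table and the `parse_form_date`, `save_signup` and `make_demo_db` helpers are hypothetical names invented for this example, and SQLite stands in for whatever RDBMS you actually run. It shows two habits: reject anything that is not a real date before it reaches the database, and pass user input as bound parameters rather than pasting it into the SQL text.

```python
import sqlite3
from datetime import date


def parse_form_date(raw: str) -> date:
    """Return a real date for ISO 8601 input such as 2024-01-31.

    Raises ValueError for anything else (for example "31/01/2024").
    """
    return date.fromisoformat(raw.strip())


def save_signup(conn: sqlite3.Connection, name: str, raw_date: str) -> None:
    """Validate the date, then insert with bound parameters.

    The "?" placeholders mean the driver treats `name` purely as data,
    so it cannot change the structure of the SQL statement.
    """
    signup_date = parse_form_date(raw_date)  # fails before touching the DB
    conn.execute(
        "INSERT INTO signups (name, signup_date) VALUES (?, ?)",
        (name, signup_date.isoformat()),
    )
    conn.commit()


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical signups table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE signups (name TEXT NOT NULL, signup_date TEXT NOT NULL)"
    )
    return conn
```

With this in place, a name containing SQL fragments is stored as ordinary text, and a malformed date is rejected with a `ValueError` instead of ending up in the table.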
If you’re not mature in this foundational area, there’s little point in going forward. Almost every customer I talk to has work to do to ensure that the base data is reliable, secure, and performs well.
Query and Reporting
Once your data processing strategy is sound, you need to know who needs to access the data, what they need, and how they need to see it. Many organizations that tell me they need deep AI to solve a problem simply need a query that returns the answer, or some basic statistical knowledge. Whatever querying tool you give your users, it needs to be flexible enough to give them the information they need from the systems they should have access to.
Reports are the lifeblood of an organization. Most companies would do well to stand up a series of training sessions for things like Excel reporting, Power BI and the like. Reporting isn’t going away, and will always be important.
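To make the point about queries concrete, here is a small Python sketch, again using only the standard library. The `orders` table, its sample rows, and the function names are made up for illustration. It answers “which region brings in the most revenue?” and “how big is a typical order?” with one aggregate query and two basic statistics, with no ML involved.

```python
import sqlite3
from statistics import mean, median


def revenue_by_region(conn: sqlite3.Connection) -> list:
    """Total order amount per region, largest first."""
    return conn.execute(
        "SELECT region, SUM(amount) AS revenue "
        "FROM orders GROUP BY region ORDER BY revenue DESC"
    ).fetchall()


def order_size_summary(conn: sqlite3.Connection) -> dict:
    """Mean and median order amount across all orders."""
    amounts = [row[0] for row in conn.execute("SELECT amount FROM orders")]
    return {"mean": mean(amounts), "median": median(amounts)}


def make_demo_orders() -> sqlite3.Connection:
    """Create an in-memory database with a few hypothetical orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("North", 120.0), ("South", 80.0), ("North", 40.0), ("West", 100.0)],
    )
    return conn
```

If a question can be settled this way, a query and a report are likely a better investment than a model.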
There are times when you need to do historical exploration of data. From Data Marts to Data Warehouses to Enterprise Data Warehouses, Data Lakes, Data Troughs, and whatever other names we give to the places where we store lots of data, you need a way to process and analyze it.
A good Business Intelligence system allows the right users to ask the right questions of historical, aggregated data. On top of these systems you still need to have a good query strategy (who should have access to what, and how) and a good reporting strategy. In fact, this area excels in “exploratory” reporting, which is more data visualization than paper/screen reporting.
Finally, we come to Data Science. I’ve explained what Data Science is in another article, and I’ve also explained the process you should be using to engage a Data Science team.
The first part of that process is to define the question. You need to be able to take a business question and turn it into something you can answer with statistics, Machine Learning, or Artificial Intelligence, if possible. Not all questions can be answered with Data Science.
I’ll cover the specifics of converting requirements into questions, and questions into Data Science questions, in another article. But for now, know that if you don’t have a solid foundation, a good understanding of the data you have access to and how your users are using it, and a solid question to pursue, you need to back up.
My advice to my clients? Start at the very beginning. It’s a very good place to start.