The Keys to Effective Data Science Projects – Part 9: Testing and Validation

We’re continuing our series on the Keys to Effective Data Science Projects, this time focusing on testing and validating the model. We’re in the phase of the Team Data Science Process called “Customer Acceptance”.

“Testing” in the general sense is the same in Data Science projects and any other typical software project – it’s ensuring that the system does what it purports to do. The way you do that, however, is slightly different for a predictive, grouping, or deep learning solution than in other software.

The primary difference is that in most software projects, the outcome is deterministic. You know your solution works because you can feed a set of data and/or criteria into the system and get an expected outcome: the same result from the same data every time. If the outcome in the testing phase doesn’t match the expected values, you can automatically fail the system.

In predictive and grouping solutions, the answer is often probabilistic. After all, Machine Learning algorithms are based on statistics, and statistics at its heart is an educated guess. There is no single expected value to compare against, so it’s more difficult to test and validate what you’re looking for.
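To make the contrast concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `sales_tax_cents` stands in for ordinary deterministic business logic, `will_buy` stands in for a trained model, the held-out data is simulated, and the 0.75 acceptance threshold is an arbitrary number that a real team would agree on with its stakeholders.

```python
import random

def sales_tax_cents(amount_cents: int) -> int:
    """Hypothetical deterministic rule: 8% tax using integer arithmetic."""
    return amount_cents * 8 // 100

def will_buy(score: float, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for a trained classifier's yes/no decision."""
    return score >= threshold

def accuracy(predictions, labels) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Deterministic test: one input, one exact expected output, pass or fail.
deterministic_ok = sales_tax_cents(1000) == 80

# Probabilistic test: score many held-out cases and compare an aggregate
# metric with an agreed threshold. The data below is simulated (assumption):
# buyers tend to score higher than non-buyers, but the two groups overlap,
# so some individual predictions will be wrong even for a useful model.
rng = random.Random(42)  # fixed seed so the check is repeatable
scores, labels = [], []
for _ in range(1000):
    bought = rng.random() < 0.5
    scores.append(rng.gauss(0.65 if bought else 0.35, 0.15))
    labels.append(bought)

predictions = [will_buy(s) for s in scores]
held_out_accuracy = accuracy(predictions, labels)

MIN_ACCEPTABLE_ACCURACY = 0.75  # assumed acceptance criterion
probabilistic_ok = held_out_accuracy >= MIN_ACCEPTABLE_ACCURACY

print(f"deterministic test passed: {deterministic_ok}")
print(f"held-out accuracy: {held_out_accuracy:.3f}, accepted: {probabilistic_ok}")
```

The first check fails on a single mismatch. The second can pass even though some individual predictions are wrong, which is exactly what has to be explained to the people doing acceptance testing.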

In this final phase of Data Science projects we gather the stakeholders together and perform a series of tests just like in other software projects. But two problems emerge.

The first is that results often look deterministic. The user sees the end-application, and it says “That person will probably buy the product, so you should market to them.” We live in a short-attention, smartphone world, where many people don’t understand the technology and how it works, and some users are simply too trusting. I’ve also seen the opposite problem: users who are not trusting enough, thinking that the system is not precise enough for their use. They are just too used to a deterministic outcome.

The other issue that comes out during this testing and validation is that too much of the testing is left to the user. Users will often run a test or two and make a judgement based on far too few attempts or too small a sample, along with other basic errors. They don’t understand how software like this should be tested, and don’t know to ask that question.
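One way to show testers why a test or two is not enough is to put an interval around the observed success rate. The sketch below uses the Wilson score interval, which is my choice for illustration rather than anything prescribed by the Team Data Science Process, and the counts are invented.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Returns (low, high), clamped to the range [0, 1].
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    ) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)

# A tester tries the system twice and it is right both times (invented counts).
few_low, few_high = wilson_interval(successes=2, trials=2)

# A larger session: 160 correct predictions out of 200 (invented counts).
many_low, many_high = wilson_interval(successes=160, trials=200)

print(f"2 of 2 correct:     plausible accuracy {few_low:.2f} to {few_high:.2f}")
print(f"160 of 200 correct: plausible accuracy {many_low:.2f} to {many_high:.2f}")
```

Two correct answers out of two are consistent with a true accuracy anywhere from about 0.34 up to 1.0, while 160 out of 200 narrows the range to roughly 0.74 to 0.85. The tester who tries it twice and sees it work has learned very little either way.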

The first key to success in this phase is to explain the application as simply as possible, and no simpler. Take the documentation you used to develop the solution, the slides you used to present to management and other stakeholders, and all the math explanations you used with each other, and turn them into a simple, cohesive, understandable story you can re-present to the testers. The more they know about the solution, the better.

The second key is to stay with the testers so that you can watch how they are testing, keep them “out of the weeds” and suggest helpful ranges or other variables they might try. It’s important, however, that they drive the testing – not you – since you’ll over-fit your own testing with prior knowledge of what you want the outcome to be. Let it be as real-world as possible.

Next time we’ll continue talking about the Customer Acceptance phase, and the keys you need to remember for more success.
