In this series on DevOps for Data Science, I’ve explained the concept of a DevOps “Maturity Model” – a list of things you can do, in order, that will set you on the path for implementing DevOps in Data Science. The first thing you can do in your projects is to implement Infrastructure as Code (IaC), and the second thing to focus on is Continuous Integration (CI). However, in order to set up CI, you need to have as much automated testing as you can – and in the case of Data Science programs, that’s difficult to do. From there, the next step in the DevOps Maturity Model is Continuous Delivery (CD). Once you have that maturity level down, you can focus on Release Management.
Release Management (RM), as a concept, is essentially what it says – determining a method of releasing new and changed software into an environment in a planned fashion. While this sounds simple, it actually takes quite a bit of forethought and planning, and involves not only the technical teams, but several business teams as well.
RM differs slightly depending on whether you sell the software you write or use it internally. But in either case, it comes down to the business function of planning. So how does that affect the Data Science team?
In Data Science, new releases are mostly driven by new questions, or by improved algorithms and data. It’s more of an academic cadence: changes are considered when something is new or improved. But in the business function, predictability of change is paramount. Think about a petrol station: you show up with your auto, and the attendant says “Good news, everyone! We’ve completely redesigned the fuel delivery system because this one is far better.” You protest that the new nozzle doesn’t work with your car at all, and that you had no idea this change was going to happen. “But it’s better,” the attendant protests. While that sounds silly, it’s the sort of thing that businesses have to consider. They aren’t interested in your new method unless it works without breaking everything else, and they want to be able to charge for it or, in the case of internal software, be able to “sell” the effort of the change to management.
So what are the practical steps you need to take at this level of DevOps maturity? Now that you have the mechanics of the previous steps handled, you can learn the flow of change your organization requires, and make appropriate choices in your work efforts. If you know a small change (say a parameter in an algorithm) could boost prediction performance in your part of the application, you can fold that into the shorter timeframe your business has for the release cycle. If you come across a new Python or R library that you think will significantly change the code, that’s when you need to coordinate with the larger team.
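As a rough illustration of that decision, here is a minimal Python sketch. Everything in it is hypothetical: the `ProposedChange` fields, the `ReleaseTrack` names, and the rule itself are assumptions made for this example, not a standard or a feature of any DevOps tool. Your organization's release process will define its own categories.

```python
from dataclasses import dataclass
from enum import Enum


class ReleaseTrack(Enum):
    """Hypothetical release tracks; real names depend on your organization."""
    SHORT_CYCLE = "short release cycle"
    COORDINATED = "coordinate with the larger team"


@dataclass(frozen=True)
class ProposedChange:
    """Illustrative description of a change a Data Science team wants to ship."""
    description: str
    changes_parameters_only: bool
    adds_or_replaces_library: bool


def choose_release_track(change: ProposedChange) -> ReleaseTrack:
    """Pick a release track using the two cases described in the text."""
    # A new library may significantly change the code, so it needs coordination.
    if change.adds_or_replaces_library:
        return ReleaseTrack.COORDINATED
    # A parameter tweak is small enough to fold into the shorter cycle.
    if change.changes_parameters_only:
        return ReleaseTrack.SHORT_CYCLE
    # Assumption: anything that fits neither case defaults to coordination.
    return ReleaseTrack.COORDINATED


tuning = ProposedChange("Adjust a parameter in the algorithm", True, False)
new_library = ProposedChange("Adopt a new Python library", False, True)

print(choose_release_track(tuning).value)
print(choose_release_track(new_library).value)
```

Defaulting unclassified changes to the coordinated track is a deliberately cautious choice here, since the business side values predictability of change over speed.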
It’s at this point that you’ll really see the value of a DevOps mindset: many of the friction points between business and technology teams can be alleviated once you reach the Release Management level of maturity.
For Data Science, I find this progression works best when you take these one step at a time, building on the previous step. The entire series is here:
- Infrastructure as Code (IaC)
- Continuous Integration (CI) and Automated Testing
- Continuous Delivery (CD)
- Release Management (RM)
- Application Performance Monitoring
- Load Testing and Auto-Scale
In the articles that follow in this series, I’ll help you implement each of these in turn.
(If you’d like to implement DevOps, Microsoft has a site to assist. You can even get a free offering for Open-Source and other projects: https://azure.microsoft.com/en-us/pricing/details/devops/azure-devops-services/)
- Need a quick introduction to DevOps? Check out this series: https://channel9.msdn.com/Series/DevOps-Fundamentals
- Here’s a complete, full course on DevOps on the Microsoft Virtual Academy – https://mva.microsoft.com/en-us/training-courses/devops-with-visual-studio-team-services-and-team-foundation-server-16779#!