DevOps for Data Science – Continuous Delivery

In this series on DevOps for Data Science, I’ve explained the concept of a DevOps “Maturity Model” – a list of things you can do, in order, that will set you on the path to implementing DevOps in Data Science. The first thing to do in your projects is implement Infrastructure as Code (IaC), and the second is Continuous Integration (CI). To set up CI, though, you need as much automated testing as you can get, and for Data Science programs that’s difficult to do. You can still mitigate this problem a great deal and make your part of the solution as automated as possible.

The next step in the DevOps Maturity Model is Continuous Delivery (CD). There’s some discussion we need to cover here, since the definitions of DevOps and Continuous Delivery are quite similar, and to some people CD doesn’t belong “under” DevOps at all. Both DevOps and CD involve an agile mindset of releasing smaller, faster, automated pieces of code into the process rather than waiting to integrate several changes at once. But DevOps is more a philosophy of teams working together toward that end, while CD is a guided process covering all of the steps of design, coding, tracking, testing and release. CD is also usually more tool-aligned than DevOps (or at least DevOps shouldn’t be tool-oriented). If you look at a standard workflow in Visual Studio Team Services, you’re effectively looking at CD, but not necessarily at DevOps.
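To make the “guided process” idea concrete, here is a minimal Python sketch of a delivery pipeline. It assumes each stage is simply a function that returns True on success. The stage names (`build`, `run_tests`, `release`) are hypothetical placeholders, not part of VSTS or any other tool. The point is that every small change flows through the same automated steps in the same order, and the pipeline stops at the first failure.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True if it succeeded.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, stopping at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, action in stages:
        if not action():
            print(f"Stage '{name}' failed; stopping the pipeline.")
            break
        completed.append(name)
    return completed


# Hypothetical stages for a small model change; real ones would call your tools.
def build() -> bool:
    return True  # e.g. package the scoring code and the trained model file


def run_tests() -> bool:
    return True  # e.g. unit tests plus a minimum-accuracy check on held-out data


def release() -> bool:
    return True  # e.g. publish the tested build to a staging environment


if __name__ == "__main__":
    print(run_pipeline([("build", build), ("test", run_tests), ("release", release)]))
```

In a real CD system these stages are defined in the tool rather than in a script, but the ordering and fail-fast behavior are the same idea.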

Just to confuse things a bit further, some DevOps references define the “CD” acronym as Continuous Deployment, which is a related but distinct practice. Continuous Deployment means automating the pipeline so that each change, once it passes its tests, flows all the way out to the end user’s software without a manual release step. Imagine a smartphone app that can take a picture of a plant and identify it. The Data Science function within this application is a model trained using custom vision APIs, and perhaps you make a change that improves the recognition score. Once tested, your change would not only be placed into the build, but also pushed all the way out to the user automatically, perhaps within minutes of the test completing. That’s Continuous Deployment: the mechanisms that make that push possible.
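The difference between Continuous Delivery and Continuous Deployment often comes down to a single gate at the end of the pipeline. The Python sketch below is an illustration only. The `release_decision` function, its parameters, and the scores are hypothetical, and a real pipeline would read them from your test run and wherever you store your models. With `auto_deploy=False`, a tested improvement waits for a person to approve it (Continuous Delivery). With `auto_deploy=True`, it goes straight to users (Continuous Deployment).

```python
def release_decision(candidate_score: float, production_score: float,
                     tests_passed: bool, auto_deploy: bool) -> str:
    """Decide what happens to a newly trained model after the build.

    Returns "reject", "await_approval", or "deploy".
    """
    # Never ship a build that failed its tests or does not beat the current model.
    if not tests_passed or candidate_score <= production_score:
        return "reject"
    # Continuous Delivery: the build is ready to release, but a person approves it.
    if not auto_deploy:
        return "await_approval"
    # Continuous Deployment: the improved model is pushed to users automatically.
    return "deploy"


if __name__ == "__main__":
    # Hypothetical recognition scores for the plant-identification model.
    print(release_decision(0.91, 0.88, tests_passed=True, auto_deploy=False))  # await_approval
    print(release_decision(0.91, 0.88, tests_passed=True, auto_deploy=True))   # deploy
```

Everything before this gate (build, tests, scoring) is identical in both approaches. Only the final step changes.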

So I’ve included Continuous Delivery as the third level of DevOps maturity, which I’m certain will annoy the purists on both sides. I think it belongs there because until your teams have a DevOps mindset, it will be harder to implement a true Continuous Delivery system effectively. I also think that starting with IaC and CI is essential to beginning the CD journey.

So with those explanations in mind, how does the Data Science team fit into CD? Here we face another change in your day-to-day routine: you’ll need to learn, understand and use whatever CD system your company uses. Here at Microsoft we use Visual Studio Team Services (VSTS), which includes CD and the ability to implement DevOps. And yes, some of the Data Scientists have had to go back to school on it. Learning these systems (the “plumbing”) often isn’t appealing to a bona fide Data Scientist, but it’s essential to being part of a team and having a DevOps mindset. Underneath VSTS we use Git and GitHub, which has other implications. Most Data Scientists I’ve worked with already understand Git commands, so there’s less pushback there.

See you in the next installment of the DevOps for Data Science series, where I’ll cover the next level in the DevOps Maturity Model for Data Science teams.

For Data Science, I find this progression works best, taking these one step at a time and building on each previous step. The entire series is here:

  1. Infrastructure as Code (IaC)
  2. Continuous Integration (CI) and Automated Testing
  3. Continuous Delivery (CD)
  4. Release Management (RM)
  5. Application Performance Monitoring
  6. Load Testing and Auto-Scale

In the articles that follow in this series, I’ll help you implement each of these in turn.

(If you’d like to implement DevOps, Microsoft has a site to assist. You can even get a free offering for Open-Source and other projects: https://azure.microsoft.com/en-us/pricing/details/devops/azure-devops-services/)


Published by: BuckWoody

Buck Woody works on the Microsoft Cloud and AI Team, and uses data and technology to solve business and science problems. With over 35 years of professional and practical experience in computer technology, he is also a popular speaker at conferences around the world, the author of over 700 articles and seven books (on databases, machine learning, and R), and sits on various Data Science Boards at two US universities. He specializes in advanced data analysis techniques and is passionate about mentoring and growing the next generation of data professionals. Specialties: Data, Data Science, Databases, Communication, Teaching, Speaking, Writing, Cloud Computing, Security. CliftonStrengths: Individualization, Learner, Connectedness, Positivity, Achiever, Ideation.

