DevOps for Data Science – Who needs it?

Data Scientists have often worked in a bit of a “silo”, meaning they were off to the side in an organization, maybe not even part of the Information Technology (IT) function. But that is changing. As data science projects are adopted into the mainstream, there is a need for structure. I’ve explained a modern data science structure for integration, called the Team Data Science Process (TDSP). It’s similar to ITIL or the Microsoft Operations Framework (MOF), but it is designed to handle the processes involved in machine learning, artificial intelligence, and other advanced analytics.

Development and Operations, or DevOps, is not a framework for “doing Information Technology”. It’s really three things: People, Process, and Products. I’ll explain more about DevOps in a later article, but the point is that DevOps overlays the TDSP nicely, and it is something you need to think about from the outset. To distill the thought a bit, DevOps can be thought of as a “shift-left” mentality. That means that at the very start of the project, you think about the outcomes of each step: coding, building, testing, deployment, security, patching, and so on.

Seems difficult, doesn’t it? It’s actually not. Yes, there is work involved, but once you start, it simply becomes part of the process. And like all good habits, it requires a little effort and maintenance to keep it going. I’ll show you how to implement DevOps in Data Science as we go, but for now, know that it is essential to your data science projects. Essential? Why?

Because security. Because maintenance. Because testing. Because constant technical debt. For these reasons and many more that will become apparent, you need to start thinking not only about the TDSP as you structure your projects, but also about DevOps. In this series I’ll show you how.

For Data Science, I find this progression works best, taking these one step at a time and building on the previous step. The entire series is here:

  1. Infrastructure as Code (IaC)
  2. Continuous Integration (CI) and Automated Testing
  3. Continuous Delivery (CD)
  4. Release Management (RM)
  5. Application Performance Monitoring
  6. Load Testing and Auto-Scale

In the articles that follow in this series, I’ll help you implement each of these in turn.

(If you’d like to implement DevOps, Microsoft has a site to assist. You can even get a free offering for open-source and other projects: https://azure.microsoft.com/en-us/pricing/details/devops/azure-devops-services/)
