DevOps for Data Science – Load Testing and Auto-Scale

In this series on DevOps for Data Science, I’ve explained the concept of a DevOps “Maturity Model” – a list of things you can do, in order, that will set you on the path for implementing DevOps in Data Science.

The final DevOps Maturity Model level is Load Testing and Auto-Scale. Note that you want to follow this progression – there’s no way to do proper load testing if you haven’t already automated the Infrastructure as Code, CI, CD, Release Management, and APM phases. The reason is that the automatic scaling you’ll do depends on the automation that precedes it – there’s no reason to scale something you’re about to change.

Load Testing

I covered automated testing in a previous article, but that type of testing focuses primarily on functionality and integration. For load testing, you’re running the system with as many inputs as you can until it fails. For the Data Science team, you should inform the larger testing team about any load testing you’ve done on your trained model (or the re-training task, if that is incorporated into your part of the solution), using any load-testing tools you can run in R, Python, or whatever language/runtime you are using.
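As a minimal sketch of what that model-level load test might look like (the `predict` function here is a hypothetical stand-in for your trained model’s scoring call), you can fire concurrent requests at the scoring function and record latency percentiles to hand to the testing team:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def predict(features):
    """Hypothetical stand-in for a trained model's scoring call."""
    return sum(features) / len(features)  # placeholder computation

def load_test(n_requests=1000, n_workers=8):
    """Fire n_requests concurrent scoring calls and collect latencies."""
    latencies = []

    def one_call(_):
        start = time.perf_counter()
        predict([0.1, 0.2, 0.3])
        latencies.append(time.perf_counter() - start)  # list.append is thread-safe in CPython

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(one_call, range(n_requests)))

    return {
        "requests": n_requests,
        "median_s": statistics.median(latencies),
        "p95_s": sorted(latencies)[int(0.95 * len(latencies))],
    }

print(load_test(n_requests=100, n_workers=4))
```

In a real test you would replace the placeholder with your model’s actual scoring call (or an HTTP request to its endpoint) and keep raising `n_requests`/`n_workers` until the numbers degrade – those are the figures the larger testing team needs.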

The larger testing team will incorporate those numbers and run a “hammer” test on the entire solution to see when the application becomes overloaded.

An interesting development I’m seeing lately is that the Data Science team asks for the metrics from the load tests (which also contain performance information, of course) to do data analysis and even prediction. That’s a great value-add.
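That kind of prediction can be as simple as fitting a trend to the latency metrics and extrapolating to when you’ll breach an SLA. A minimal sketch, using ordinary least squares in pure Python (the sample numbers and the SLA threshold are made up for illustration):

```python
def fit_trend(samples):
    """Ordinary least-squares slope and intercept for (time, latency) pairs."""
    n = len(samples)
    xs = [t for t, _ in samples]
    ys = [lat for _, lat in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in samples)
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def hours_until_breach(samples, sla_latency_s):
    """Extrapolate the latency trend to predict when it crosses the SLA."""
    slope, intercept = fit_trend(samples)
    if slope <= 0:
        return None  # latency is flat or improving; no breach predicted
    return (sla_latency_s - intercept) / slope

# Hypothetical hourly median latencies (seconds) drifting upward under load
samples = [(0, 0.10), (1, 0.12), (2, 0.14), (3, 0.16)]
print(round(hours_until_breach(samples, sla_latency_s=0.30), 2))  # → 10.0
```

Real analyses would use proper tooling (pandas, statsmodels, or your APM platform’s own forecasting), but the idea is the same: the load test produces data, and data is what the Data Science team does best.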

Auto-Scale

The Auto-Scale maturity level is where you really need to interact with the entire team, from the very earliest planning phase – of course, that is the very definition of DevOps. You need to find out how large the system will be as early as possible, because it can affect the design of your system. Certain technologies allow scale (Spark, Hadoop, Docker, and others) while other technologies don’t parallelize or scale well. Writing your code in an efficient but unscalable technology will come back to hurt the application in the end if the solution grows. If you create a huge architecture and the solution should scale down to an “Internet of Things” environment, you’ll likewise face issues. Of course, some languages can be used on both scalable technologies and smaller ones, so it’s up to you to know the limits and features of the various ways of working through these scenarios.
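To make the auto-scale idea concrete, the core decision most autoscalers make is a proportional one: scale the replica count by the ratio of observed load to target load (this is the same shape of rule Kubernetes’ Horizontal Pod Autoscaler documents). A sketch, with the utilization numbers given as illustrative percentages:

```python
import math

def desired_replicas(current_replicas, current_utilization_pct,
                     target_utilization_pct, min_replicas=1, max_replicas=20):
    """Proportional scaling rule: grow or shrink the replica count by the
    ratio of observed utilization to the target, clamped to sane bounds."""
    desired = math.ceil(
        current_replicas * current_utilization_pct / target_utilization_pct
    )
    return max(min_replicas, min(max_replicas, desired))

# Running hot at 90% against a 60% target: scale out
print(desired_replicas(4, current_utilization_pct=90, target_utilization_pct=60))  # → 6
# Nearly idle at 20%: scale in
print(desired_replicas(4, current_utilization_pct=20, target_utilization_pct=60))  # → 2
```

The clamping matters in practice: an unbounded formula will happily scale to zero on a quiet night or to hundreds of replicas on a metrics glitch, which is exactly why the planning conversation with the whole team has to happen early.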

With that, we’re done with my series on DevOps for Data Science. Follow the Maturity Model, develop the DevOps mindset, and take it one step at a time. It’s a journey worth taking.

For Data Science, I find this progression works best – taking these one step at a time, and building on the previous step – the entire series is here:

  1. Infrastructure as Code (IaC)
  2. Continuous Integration (CI) and Automated Testing
  3. Continuous Delivery (CD)
  4. Release Management (RM)
  5. Application Performance Monitoring
  6. Load Testing and Auto-Scale

The articles in this series walk you through implementing each of these in turn.

(If you’d like to implement DevOps, Microsoft has a site to assist. You can even get a free offering for Open-Source and other projects: https://azure.microsoft.com/en-us/pricing/details/devops/azure-devops-services/)


Published by: BuckWoody

Buck Woody works on the Microsoft Cloud and AI Team, and uses data and technology to solve business and science problems. With over 35 years of professional and practical experience in computer technology, he is also a popular speaker at conferences around the world; author of over 700 articles and seven books (databases, machine learning, and R); sits on various Data Science boards at two US universities; and specializes in advanced data analysis techniques. He is passionate about mentoring and growing the next generation of data professionals. Specialties: Data, Data Science, Databases, Communication, Teaching, Speaking, Writing, Cloud Computing, Security. Clifton’s Strengths: Individualization, Learner, Connectedness, Positivity, Achiever, Ideation.
