If you’re a data professional, you know that it’s important to set aside some time for training when your platform ships a new release or paradigm. In the case of SQL Server 2019 (and later), you’ll want to pay close attention to the Big Data Clusters feature. It’s an exponential knowledge increase, and that’s no exaggeration.
There’s a lot to learn to implement SQL Server’s Big Data Clusters. I’ll be covering these topics in more depth at various workshops, events, courses, webinars and presentations around the world, and I thought I might show a few of the things the data professional needs to understand to get ready.
Some of these technologies and concepts are not owned or created by Microsoft – the concepts are universal, and a few of the technologies are open-source. I’ve marked those in italics.
I’ve also included a few links to training resources I’ve found to be useful. I normally use LinkedIn Learning for larger courses, along with edX, DataCamp, and many other platforms for in-depth training. The links I have indicated here are by no means exhaustive, but they are free and provide a good starting point.
Look for the training announcements I’ll post here on this blog to find out where our team is presenting these topics, and feel free to post comments on resources you have found useful.
Linux – Operating system used in Containers and Container management (Kubernetes)
git – Source control management system
Containers – Encapsulation level for the SQL Server Big Data Cluster architecture
Kubernetes – Management, control plane and security for Containers
Microsoft Azure – Cloud environment for services
Azure Kubernetes Service (AKS) – Kubernetes as a Service
Apache HDFS – Scale-out storage subsystem
Apache Spark – In-memory, large-scale, scale-out data processing architecture used by SQL Server
Python, R, Java, SparkML – Languages and libraries used for Machine Learning and AI model creation
Azure Data Studio – Tooling for SQL Server, HDFS, Kubernetes cluster management, T-SQL, R, Python, and SparkML languages
SQL Server Machine Learning Services – R, Python and Java extensions for SQL Server
Microsoft Team Data Science Process (TDSP) – Project, Development, Control and Management framework
Monitoring and Management – Dashboards, logs, APIs and other constructs to manage and monitor the solution
Security – RBAC, Keys, Secrets, VNETs and Compliance for solutions
If that looks like a lot, it’s because it’s a lot. Stay tuned – I’m with you on the journey. We’ll learn together.