Ethics and the Importance of Being an Information Skeptic

Whenever I teach or present a session on Artificial Intelligence, I start with Ethics. We've created a site where you can quickly walk through a few of the major principles we follow at Microsoft for AI here: http://aka.ms/ai-ethics. I walk through these principles before I show how to design a Machine Learning solution, and then …

Data Wrangling – Regular Expressions

The first step of deep analysis using the Team Data Science Process is to find the right question. From there, we need to determine where the data we need lives, or even whether we have it. If we don’t have it, we need to get it – a topic I’ll cover in another post. The …

The Inherent Insecurity of Data Science

Data Science attempts to derive meaning from data. There are a lot of techniques, processes and tools you can use to do that – I cover those on this blog. But Data Science is insecure – by default. And that’s a real problem. In a solution involving a Relational Database Management System (RDBMS), …

Is the Microsoft R Client a…client?

Microsoft has recently been on a tear introducing R into, well, everything. And now there are several R offerings - from Microsoft R Server and Microsoft R Open to R Services in SQL Server (2016) and now the Microsoft R Client. But is the Microsoft R Client a client? So it's a command-line, or a …

The Cortana Intelligence Suite (and friends)- What to Use When

I'm just back from trips around the U.S. and Europe teaching Data Scientists and Data Architects how to use the Cortana Intelligence Suite (formerly Cortana Analytics) in deep analytic projects. There are multiple components to learn and apply - and one of the most commonly asked questions is "What do I use for a given …

The Importance of Unit of Analysis

There are two defining differences between Data Science (or Data Mining for that matter) and other types of data analysis: the first is how far back you push the data analysis, and the second is the multiple processes and tools you’ll use within the analysis. In this post I'll explain the first difference. In a …

The Five Minute* Guide to Machine Learning

* Plot twist: It’s Mr. Math under that Mask, Scooby Doo! OK, you really can’t learn all about Machine Learning in five minutes. Or five days. Or five weeks. It takes longer than that, but what I can show you in five minutes is what Machine Learning is about, a couple of important terms to …

Can Data Science Cure Creeping Determinism?

Hindsight, it is said, has 20/20 vision. We seem to be able to predict the past flawlessly – or can we? The answer, surprisingly, is “no”. “Creeping Determinism” is a phrase from 1970s psychology. It’s the effect of thinking that something was predictable, but only after it happens. We look back and say – “Ah – …

Occam’s Razor and the Data Science Project

The Cortana Analytics suite from Microsoft is not a single platform, but actually a group of related products and features. Why so many? Couldn’t someone just use Microsoft R Server, or Azure ML, or Hadoop to create a solution? Isn’t the simplest solution always the best? Well, yes, but only inasmuch as it is as simple …

But *Why* Do You Trust Your Data?

At the beginning of every data project is the data. While we spend a great deal of time figuring out how to move it, store it, compute it and evaluate it, the most important step is often given short shrift – sourcing the data properly. And that involves two things: Finding authoritative data and knowing …