Big Data is just Data

A few years ago it was all the rage to talk about "Big Data". Lots of descriptions of "Big Data" popped up, including the "V's" (Variety, Velocity, Volume, etc.) that proved very helpful. I even have my own definition: Big Data is any data you can't process in the time you want with the systems …

Ethics and the Importance of Being an Information Skeptic

Whenever I teach or present a session on Artificial Intelligence, I start with Ethics. We've created a site where you can quickly walk through a few of the major principles we follow at Microsoft for AI here: http://aka.ms/ai-ethics. I walk through these principles before I show how to design a Machine Learning solution, and then …

Data Wrangling – Regular Expressions

The first step of deep analysis using the Team Data Science Process is to find the right question. From there, we need to determine where the data we need lives, or even if we have it. If we don’t have it, we need to get it – a topic I’ll cover in another post. The …

The Inherent Insecurity of Data Science

Data Science attempts to derive meaning from data. There are a lot of techniques, processes and tools you can use to do that – I cover those in this blog site. But Data Science is insecure – by default. And that’s a real problem. In a solution involving a Relational Database Management System (RDBMS), …

Is the Microsoft R Client a…client?

Microsoft has recently been on a tear introducing R into, well, everything. And now there are several R offerings - from Microsoft R Server and Microsoft R Open to R Services in SQL Server (2016) and now the Microsoft R Client. But is the Microsoft R Client a client? So it's a command-line, or a …

The Cortana Intelligence Suite (and friends)- What to Use When

I'm just back from trips around the U.S. and Europe teaching Data Scientists and Data Architects how to use the Cortana Intelligence Suite (formerly Cortana Analytics) in deep analytic projects. There are multiple components to learn and apply - and one of the most commonly asked questions is "What do I use for a given …

The Importance of Unit of Analysis

There are two defining differences between Data Science (or Data Mining for that matter) and other types of data analysis: the first is how far back you push the data analysis, and the second is the multiple processes and tools you’ll use within the analysis. In this post I'll explain the first difference. In a …