There are two defining differences between Data Science (or Data Mining, for that matter) and other types of data analysis: the first is how far back you push the data analysis, and the second is the multiple processes and tools you’ll use within the analysis. In this post I'll explain the first difference. In a … Continue reading The Importance of Unit of Analysis
* Plot twist: It’s Mr. Math under that Mask, Scooby Doo! OK, you really can’t learn all about Machine Learning in five minutes. Or five days. Or five weeks. It takes longer than that, but what I can show you in five minutes is what Machine Learning is about, a couple of important terms to … Continue reading The Five Minute* Guide to Machine Learning
Hindsight, it is said, has 20/20 vision. We seem to be able to predict the past flawlessly – or can we? Surprisingly, the answer is “no”. “Creeping Determinism” is a phrase from 1970s psychology. It’s the effect of thinking that something was predictable, but only after it happens. We look back and say – “Ah – … Continue reading Can Data Science Cure Creeping Determinism?
The beginning of data science is data. Data are things that you know about, well, other things, so it makes sense to ensure you have a firm grasp on handling that data. Note: I know this seems really basic, but stick with me - it gets deep quickly, and it's essential to understand this … Continue reading Databas(ics)
I add things to this site pretty much every week, so to help you navigate all the articles, I've set up this handy guide. You can bookmark this article, and I'll make sure I keep it up to date as I add new Data Science Notebook entries. In essence, I'm creating a book as I … Continue reading Backyard Data Science
(Complete Table of Contents here: http://aka.ms/backyarddatascience) What, Why, How In a previous Notebook entry, I showed you where you can learn Statistics. It’s one of the base skills you need if you're going to work with Data Science. But students often know the process of using a statistical formula (more accurately called … Continue reading Knowing Which Statistical Formula to Use
(Complete Table of Contents here: http://aka.ms/backyarddatascience) Catalog This is a “Catalog” entry, which is used in Scientific Notebooks to list out a species name and data about that species. In this Notebook, I’ll use that to show a list entry of things you need to know. This entry is about Statistics, and where you … Continue reading Learning Statistics
(Complete Table of Contents here: http://aka.ms/backyarddatascience) What, Why, How In a previous notebook I introduced the R programming language and environment. While R is very powerful, widely used and has multiple packages, another language called “Python” is also popular with Data Scientists. Yes, you can do amazing things in R – in fact, part of … Continue reading Python for the Data Scientist