(Complete Table of Contents here: http://aka.ms/backyarddatascience)
The definition for “skeptical” is “Not easily convinced, having doubts or reservations” (http://www.oxforddictionaries.com/us/definition/american_english/skeptical). As a Data Scientist, you’ll want to keep a healthy dose of skepticism in two areas:
- The source and meaning of data
- The conclusions someone draws from that data
In this notebook entry, I’ll focus on the second concern.
I saw an article last week with the provocative title “Why can’t Microsoft find analytic talent?” (http://www.analyticbridge.com/profiles/blogs/why-can-t-microsoft-find-analytic-talent). Being in the Data Science group at Microsoft, and wanting to know more, I read the article.
The article opines that Microsoft wants to bring in H1B talent, because they can’t find good workers here in the U.S. The author explains further that Microsoft does not pay well, is too large a company, and that they are hampered by hiring “the old, women, overweight people, and slow speakers” (My assumption here is that “slow speakers” are non-Americans). He sees these things as a weakness.
His data source appears to be his own observations of some folks he thinks work for Microsoft who “show up at his favorite restaurant”, so I’m assuming the women, overweight, old, and minority workers he sees there are the basis for his conclusions.
In the Jewish “Book of the Wisdom of Solomon” (http://jewishencyclopedia.com/articles/14951-wisdom-of-solomon-book-of-the), there is an enigmatic set of proverbs, positioned next to each other. One says “Respond to a fool or he will think he is right and continue in his error”, and the other says “Don’t respond to a fool, because essentially you’re lending him credence and wasting your time.” Seems contradictory, no? I’m told by my more learned friends that this means there are times you do, and times you don’t, answer someone who you think is wrong.
I couldn’t help but be reminded of these proverbs when I read the article, since I disagree with almost everything in it (he did spell Microsoft correctly), but I felt that rebutting it point by point would only lower my own credibility.
So instead, this becomes a wonderful learning opportunity for how we should treat conclusions and data.
I’ve long taught my daughter that whenever she is told something, she needs to ask three questions before she believes it: Who is telling me this, Why are they telling me this, and What are they telling me. Let’s use this method to break down a conclusion – we’ll use the aforementioned article as an example.
We start with the source of the interpretation. After all, if I don’t trust the person or program giving me the answer, I don’t need to go any further with the conclusion.
In the case of an algorithm or formula or even program, it’s a simple enough matter to test the process to ensure that you trust it. In the case of a person, that is harder to do.
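As a minimal sketch of what “testing the process” can look like, assume a hypothetical function called trimmed_mean stands in for whatever algorithm you were handed (neither it nor check_process comes from the article). The idea is simply to feed it inputs whose answers you already know, worked out by hand or by an independent tool, before you trust it on data you can’t check:

```python
import statistics


def trimmed_mean(values, trim=1):
    """Hypothetical process under test: the mean of `values` after
    dropping the `trim` smallest and `trim` largest entries."""
    if len(values) <= 2 * trim:
        raise ValueError("not enough values to trim")
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)


def check_process():
    """Return True only if the process agrees with answers we already know."""
    # Case 1: worked out by hand. [1, 2, 3, 4, 100] keeps [2, 3, 4], mean 3.
    hand_checked = trimmed_mean([1, 2, 3, 4, 100]) == 3.0

    # Case 2: with no trimming, it should match an independent implementation.
    data = [2, 4, 6, 8]
    agrees_with_library = trimmed_mean(data, trim=0) == statistics.mean(data)

    return hand_checked and agrees_with_library


if __name__ == "__main__":
    print("process passes the known-answer checks:", check_process())
```

Passing a couple of known-answer checks does not prove the process is right everywhere, but a failure tells you immediately not to trust its conclusions.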
The author of this article is in fact a highly educated person working in the field of Data Science. According to his LinkedIn page (http://www.linkedin.com/in/vincentg) he has been working in the statistical and computational area since 1995. So it seems his background tends to indicate a high level of scientific rigor.
However, I don’t see a background of working at Microsoft, any studies on whether the old, women, or minorities make good data scientists, or other authoritative information qualifying this person to make these claims. So I have to wonder what the conclusion is based on, since he certainly seems to have the education to perform a study on the data and back up the claims.
The takeaway for us as we learn is to thoroughly research the person, algorithm, system or process that provides us with a conclusion before we accept it. This is standard scientific method. (http://www-personal.umd.umich.edu/~delittle/Encyclopedia%20entries/Verification.htm)
After verifying that you trust the person or system making the conclusion, the next task is to find out why that conclusion is presented. In the case of a formula or computer program, that is quite easy to determine: because you ran the data through the system.
In the case of a human or group of humans, motivation and bias come into play. If a salesperson tells me that this car “is the greenest diesel you can buy”, it’s important to know that she needs to sell that car by today to make her quota. Would her statement be correct if that were not the case? More research is needed.
The takeaway in this step is to see if the person or system has bias – and if you can account for that. (https://www.psychologytoday.com/topics/bias/essentials)
The article we’re using as an object lesson seems to have two points:
- Microsoft’s claims around H1B workers are incorrect
- A “good” Data Scientist does not work at Microsoft, for reasons of hiring and wage
The first point in this case seems to get lost – there are no data, examples, or citations on the first point. Without that source data, the claims are impossible to evaluate, so we will not continue that here.
The second point is the more interesting one. We need to define terms – what is a “good” data scientist? Is that a number of whitepapers published, systems developed, implementation of effective data science systems, etc.? And is hiring older workers, women, and minorities in fact a problem for an organization, or a strength?
Biology, society, and even a financial portfolio need diversity to survive – I’ll leave the study of business and scientific diversity as a topic for you to learn more about. http://www.workforcediversitynetwork.com/res_articles_DiversityMetricsMeasurementEvaluation.aspx
Let me be clear here that at Microsoft we are not “forced” to hire diversely – we do that on purpose, and that includes the data science area: http://blogs.msdn.com/b/msr_er/archive/2015/03/24/diversity-in-data-science_3a00_-microsoft-research_1920_s-summer-school-aims-high-.aspx
And yes, we’re hiring – come one, come all. I don’t even care if you’re old, like me. 🙂 https://careers.microsoft.com/
There are a few folks you can look up in history, not to mention recently, who might be a good resource:
- Ada Lovelace – http://www.sdsc.edu/ScienceWomen/lovelace.html
- Grace Hopper – http://www.biography.com/people/grace-hopper-21406809
- George Washington Carver – http://www.biography.com/people/george-washington-carver-9240299
To name but a few of the folks I admire in science, and who oddly enough appear to be women, old, and minorities. Even that, I admit, is a small data sample, but even a few counterexamples are enough to cast doubt on a blanket claim that hiring such people is a weakness.
Learning from everything
This post isn’t a rebuttal to the article in question. That would take more effort than it is worth – the more important thing to take away is the process you can follow to evaluate the conclusions drawn from a set of data. You should do that on your own, for every important decision.
In future notebook entries, we’ll take a look at the first question in data skepticism – the source of the data.