(Complete Table of Contents here: http://aka.ms/backyarddatascience)
One of the most distinctive features of Data Science, as opposed to working with databases, Business Intelligence or other data professions, is its heavy use of statistical methods. From the earliest days of computer science, programs and algorithms were written to handle the large volume of calculations that statistics requires.
One of those implementations was the “S” programming language, created in the mid-1970s. Building on those concepts, Ross Ihaka and Robert Gentleman created the “R” environment in New Zealand and released it under the GNU General Public License. Interestingly, it’s written in C, Fortran, and R itself. It’s one of the premier languages and environments you can use in Data Science. It has amazing language breadth, and it can be extended through “packages”. There are SO many packages out there that your first task in using R, it seems, is learning what has already been written so you can leverage it.
In future notebook entries we’ll explore working with R, but for now, we need to install it. That really isn’t difficult, but it does bring up something we need to deal with first. While the R environment is truly amazing, it has some limitations. Its most glaring issue is that the data you want to work with is loaded into memory as a data frame, which limits the amount of data you can process for a given task. It’s also not well suited to parallelism: many operations run serially, one after another. And if your script uses a package, you have to make sure everyone else who runs the script has that package installed, at the right version.
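To put the in-memory limitation into rough numbers: base R stores each numeric (double) value in 8 bytes, so a data frame with n rows and p numeric columns needs at least about 8np bytes just to hold it. The sizes below (100 million rows, 10 columns) are a hypothetical example, and this back-of-the-envelope estimate ignores per-object overhead and the temporary copies R often makes while transforming data, so real memory use is usually higher.

```latex
\begin{aligned}
M &\approx 8\,n\,p \ \text{bytes} \\
n = 10^{8},\ p = 10 \;\Rightarrow\; M &\approx 8 \times 10^{8} \times 10 = 8 \times 10^{9}\ \text{bytes} \approx 8\ \text{GB}\ (\approx 7.45\ \text{GiB})
\end{aligned}
```

If the machine doesn’t have that much free memory, R can’t hold the frame at all, no matter how simple the analysis is.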
Enter Revolution Analytics, a company that extended R with features and capabilities that address these issues, along with a few others. They have a great name in the industry, bright people, and great products, so Microsoft bought them. That means the Revolution R Enterprise (RRE) engine they created is going to start showing up in all sorts of places, like SQL Server 2016, Azure Machine Learning, and many others. The stand-alone products, Revolution R Open (RRO) and RRE, are still available at their current versions. Microsoft has renamed them Microsoft R, so that’s what we’ll install.
Microsoft R builds on the R engine, so we’ll need that first. However, the Microsoft R installation depends on the version of R we install, and as of this writing that’s 3.2.2 for Microsoft R Open (MRO) and 3.1.3 for Microsoft R Enterprise (MRE). More on those choices in a moment.
Start with open-source R, which you can download here: https://cran.rstudio.com/bin/windows/base/R-3.2.2-win.exe
Once the download completes, run the installer, select your language, and click “Next” on the Welcome panel.
Select “Next” after you read the Information panel, then “Next” again at the Select Destination Location panel.
At the Select Components panel, click “Next” again, and do the same on every remaining panel unless you want to change the defaults. You don’t need to change anything for RRO or RRE.
You have two choices for the stand-alone version of Microsoft R: Open and Enterprise. The differences between the two are summed up at the bottom of this page: http://www.revolutionanalytics.com/get-revolution-r
For this exercise, we’ll install the Open version of Microsoft R, although in production we’d want the Enterprise features that address R’s limitations and provide interfaces to Hadoop and other big data platforms. We’ll install on Windows, but Ubuntu, Red Hat, and SUSE Linux are also supported. We start at the download site.
Since we’re installing on Windows, we select the download link next to that platform.
Once that downloads, we run the installer.
For this installation, we can take all the defaults. I did add the icon to the Quick Launch area, since I plan to be in R quite a bit. After installing the main RRO package, we’ll want the math libraries (Intel’s Math Kernel Library, or MKL). You’ll find them on the download site, just to the right of the Windows installer we just ran. Click that “MKL” link and, once again, take all the defaults.
Exploring the Tools
We have two new folders on the Windows Start Menu. One is for R, and the other is for Microsoft R.
Inside the RRO folder there’s an icon for the Microsoft RGUI.
Opening that brings us to the R Console, with RRO loaded and ready. We can now run some simple commands.
We’ll get into what all this means in future posts – but that doesn’t stop you from taking a free class ahead of time: https://www.datacamp.com/courses/big-data-revolution-r-enterprise-tutorial