Why you should move from Stata to R

Before going into the details about studying economics with R, it makes sense to explain why you should use R compared to Stata. Before I start, please note that I have been using Stata occasionally for about a year whereas I spend much more time on R so I may forget some features that Stata has and that I am not aware of. However, I believe that what I have made with Stata corresponds to most Master students’ experiences, e.g. data cleaning and treatment, data analysis, econometrics, etc.

Now we can begin.

Reason 1: R is free

That may seem a false argument for some people, especially because in many universities, students have freely access to Stata. However, in my experience, I know that we frequently want to work home or in group on some projects and therefore we need Stata on our personal laptop. Therefore, some cracked versions circulate between students and it is well-known that when downloading illegally softwares (and movies, TV shows, etc.), there’s always a risk of being infected by a virus. I don’t know if this happens often or not, maybe you will never suffer from it, but it would be just stupid to have to suffer from a hacking just because the statistical software was not free. That’s the big advantage of R: it is completely free. Whatever your operating system, you can download base R and every package you want and it won’t cost any money.

Reason 2: R is open-source

I have already heard one of my professor complaining about the fact that Stata is a “black box” (not like those in planes but more like an opaque system). On the contrary, R is open-source (meaning that anyone can see the code, contribute to it and distribute it) and the code behind the functions you use is easily visible with just one click. That accessiblity entails the next argument, which is the diversity of packages.

Reason 3: the diversity of packages

There is A LOT of packages on R (more than 10,000 on CRAN as shown here, and it was in 2017!). Additionally to the packages on CRAN (the Comprehensive R Archive Network, where the stable versions of the packages are), some packages are hosted only on Github and others are made by users or companies only for private purposes and will not be released on open-source. The packages are the strength of R. Base-R (i.e. the basic version of R, without any packages manually installed) is a great start to learn how to code and to manipulate data, and in fact you can stay with base-R only if you limit your study to some basic data analysis. However, base-R may also be hard to learn and not very esthetic. Moreover, some packages allow to extend R functionalities beyond base-R.

This is a list (far from being exhaustive) of some of the most important packages for students in economics:

  • tidyverse: this is a portmanteau word of tidy and universe. It regroups more than 20 packages for data import (readr), data treatment (dplyr, tidyr…), graphics (ggplot2), etc.

  • rmarkdown: as a student (in economics but in other domains too), you will have to write a Master’s thesis and before that, you will certainly have to do some group projects, sort of small reports. When the data analysis will be done, you will have to write your report and to incorporate the results in the document. That can lead to some mistakes/typos that can lead to big errors, like changes in p-values between the results obtained in the statistical software and the word processing program (whether it is LaTeX or Microsoft Word). To guarantee that you won’t make this sort of mistakes, the most effective way is to write directly in R and to incorporate your code directly in the text. Therefore, in one document, you will have the text of your report and the code needed for the data analysis, all of that ready to be converted in PDF, HTML or Word.

  • shiny: while rmarkdown promotes reproducibility by keeping all in a unique document, shiny goes a step further. Once you have made some data analysis, you can put it in a Shiny application (“app”) that will create a web page in which people interested in your work will be able to reproduce your results but also to check wether they are robust. Indeed, Shiny makes results reactive, meaning that you can change the sample size or the years or anything else you want and the results will automatically adapt. That is very useful in econometrics, where robustness is very important and always checked. There exist thousands and thousands of R packages which cover a large spectrum of the problems and questions you might have, and that is definitely a strength of R.

Reason 4: the community

It is certain that will have some problems with your code, everybody has. The documentation is very complete and allows to solve most of them, but sometimes you may need to seek for help online. It is quite probable that the question you ask yourself has already been asked by somebody else before you and if it has, you will find the answer on StackOverflow or on the RStudio Community.

Reason 5: RStudio is just a pleasure to use

RStudio is the most used IDE for R (Integrated Desktop Environment, not the language but a software that permits to use more easily the language). It has tons of shortcuts and is very customizable. It is a real pleasure to use, and it can be linked to other great services like GitHub (maybe you don’t know what it is so in a few words, it is a service that permits version control i.e. keep a trace of every change in a project, whether it is a report, a package or a web application).

PhD Student in Economics