
UTIL researchers: Stata for reproducible research

Enrique Pinzon, Associate Director of Econometrics at Stata, writes on the Stata Blog:

I care about reproducible research. Anyone who has ever been a research assistant or tried to follow the path set by other researchers also cares. Sometimes, reproducing others’ results is a frustrating task; sometimes, it is outright impossible. Yet sometimes, it is satisfyingly simple. In my experience, reproducing results is easy when it involves a Stata do-file. I believe this is true even beyond my personal bias (I work for Stata and used the software regularly before that). A recent article published by the American Economic Association (AEA), Vilhuber, Turrito, and Welch (2020), shows that Stata is the package economists use most often in the data supplements they submit, and I believe reproducibility is a big reason why.

The AEA established reproducibility guidelines in 2008. Recently, it updated its guidelines to require authors not only to make their data and analysis available but also to provide, whenever feasible, the raw data and the code used to clean them. The editorial process now includes an AEA data editor who verifies that the information provided by the authors is sufficient to replicate the results in the paper.

Vilhuber, Turrito, and Welch (2020) show that since the inception of the policy, Stata has been used in 73% of the supplements provided by the authors. The usage has been increasing over the span of the policy. The graph below shows the percentage of data supplements in which different software packages are used. These percentages may add up to more than 100% because content from more than one software package may be submitted in each supplement.

This is not a surprise to anyone who has used Stata. I believe one important reason researchers choose Stata is that reproducing your results is easy. A case in point is the graph above. To get the data and reproduce the graph, you just need to run the do-file, which I discuss in Appendix I. If you want to create a reproducible report, see my discussion in Appendix II.

More in Stata Blog entry