Some things don’t lend themselves to replication: the pattern of a snowflake, fingerprints, Wilt Chamberlain’s 100-point performance against the New York Knicks. To that list we can now add “research results from top economics journals.”
In a new paper, Andrew Chang, an economist at the Federal Reserve, and Phillip Li, an economist with the Office of the Comptroller of the Currency, describe their attempt to replicate 67 papers from 13 well-regarded economics journals. They selected papers that used U.S. data and that sought to establish an empirical result involving gross domestic product, or GDP, thanks to its status as a commonly used indicator of macroeconomic conditions.
The pair was methodical and unfailingly polite: If no data or code were provided in journal archives or on the authors’ personal websites, they contacted the corresponding author, waited a week for a reply, and continued to contact the other listed authors until they received a response. They gave authors at least one month to send data or code (“code” refers to the programming instructions that tell a computer how to run the paper’s mathematical models). Six of the papers had to be dropped because the relevant data sets were proprietary, and two because the authors did not have the necessary software.
Their results? Just under half of the papers, 29 of the remaining 59, could be qualitatively replicated (that is to say, their general findings held up, even if the authors did not arrive at the exact same quantitative result). For the papers whose results could not be replicated, the most common reason was “missing public data or code.”
These findings are disturbing not least because they, er, replicate earlier similar findings by other economists, although those tended to focus on papers from a single journal. In 2004, Ben Bernanke, then the editor of the American Economic Review, responded to a paper finding that half of the authors whose quantitative work appeared in a 1999 issue had not furnished the data or code for their published papers. Mr. Bernanke, who later became chairman of the Federal Reserve, made publication conditional on submission of data and code. Many other economics journals followed suit, although Messrs. Chang and Li found that these editorial policies were less-than-enthusiastically enforced.
Pinelopi Goldberg, a Yale economics professor and the current editor in chief of the AER, said the journal has an editorial assistant whose sole job is to replicate the findings of submitted papers before they are published. She added that she had a hard time evaluating the paper by Messrs. Chang and Li because they did not specify why their replications failed, beyond broad reasons such as “missing data.” The AER accepts work done using proprietary data because otherwise the journal would lose access to “really interesting and policy-relevant” research, Ms. Goldberg said.
She added that many researchers work with multiple data sets, and making some of them public could violate the terms of a researcher’s contract or privacy protections (for tax data provided by the U.S. government, for example). One main aim of this undertaking, wrote Messrs. Chang and Li, was “to bring economics more in line with the natural sciences by embracing the scientific method’s power to verify published results.” But of course, science itself is hardly bulletproof. While academic economics has had its share of highly public corrections, such as an error in the widely cited paper by Carmen Reinhart and Kenneth Rogoff that linked high debt burdens with slower GDP growth, this inability to replicate is perhaps more commonly associated with more traditionally “scientific” areas like drug trials and psychology.
Ted Miguel, a professor of economics at the University of California, Berkeley, directs the Berkeley Initiative for Transparency in the Social Sciences, which was born of conversations between researchers working in disciplines ranging from economics to political science to biostatistics, all of whom were struggling with similar issues of data transparency and academic integrity. He said that papers such as the one by Messrs. Chang and Li served not just to point out errors made in the past, but to lay a foundation for better research.
“At this point, everybody doing work with data in economics has an expectation that their data is very likely to get posted online, that someone is going to scrutinize it, that someone is going to try to replicate it. Knowing that’s the case is going to make people document their data better and be much more careful,” he said.
But H.D. Vinod, an economics professor at Fordham University whose 2003 paper with Drexel University professor Bruce McCullough led Mr. Bernanke to clarify the AER’s submission policy, noted that that sense of caution could be outweighed by the sheer amount of work it takes to clean up data files in order to make them reproducible.
“It’s human laziness,” he said. “There’s all this work involved in getting the data together.”
His co-author on the 2003 paper, Bruce McCullough, said he thought the authors’ definition of what counted as replication (achieving the same qualitative, as opposed to quantitative, results) was far too generous. If a paper’s conclusions are correct, he argues, one should be able to arrive at the same numbers using the same data.
“What these journals produce is not science,” he said. “People should treat the numerical results as if they were produced by a random number generator.”
Messrs. Chang and Li recommend that publication of an economics paper be contingent on the submission of data files and code. They also advise making raw data files publicly available, so that interested readers can use them to explore their own research questions. Of course, these efforts are no guarantee of success—16 of the papers that couldn’t be replicated came from journals that already had mandatory data and code policies.