Data Archiving Errors Limit Scientific Scrutiny

Researchers have reported that more than half of the public data sets provided with scientific papers are incomplete, preventing reproducibility tests and follow-up studies. However, the study, published in PLOS Biology, found that even slight improvements to research practices could make a big difference.

Dr Dominique Roche of The Australian National University explained that many peer-reviewed biological journals now require authors to publicly archive their data when a paper is published. Making research data available improves the transparency and reproducibility of research results and avoids unnecessary duplication of data collection. “Unfortunately, our study suggests that many public data sets may be unusable,” he said.

The survey of 100 papers published in leading journals in ecology and evolution found that more than 50% of the data sets associated with these studies were incomplete, lacking either some of the data or essential information needed to interpret them.

While making the data public is extremely useful, Roche said that the process is often compromised by simple errors made by researchers. “Many scientists, including myself, lack proper training in public data archiving and open science practices. These are new practices for most researchers,” he said.

“Biologists often deal with large and complex data sets that require good organisational skills to present in ways that others can use them. The archived data sets can be just as important as the published paper.”

But Roche said “many of the problems we encountered in our study can be fixed relatively quickly and easily”. These include providing basic but complete data descriptors, using standard file formats such as comma-separated values (CSV) rather than PDFs or Excel files, and archiving data sets in an established, searchable online database instead of as an appendix to the research paper.
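
As a rough illustration of the first two fixes, the sketch below writes a small table as CSV and refuses to do so if any column lacks a descriptor. It is not taken from the study: the column names, units, values and function names are all hypothetical, chosen only to show the idea.

```python
import csv
import io

# Hypothetical descriptors: one plain-language entry per column, with units.
DESCRIPTORS = {
    "fish_id": "Unique identifier for each individual",
    "mass_g": "Body mass in grams",
    "temp_c": "Water temperature in degrees Celsius",
}

# Hypothetical measurements, invented for illustration only.
ROWS = [
    {"fish_id": "A1", "mass_g": 12.4, "temp_c": 28.0},
    {"fish_id": "A2", "mass_g": 10.9, "temp_c": 28.5},
]


def undescribed_columns(rows, descriptors):
    """Return the sorted column names that appear in rows but have no descriptor."""
    seen = set()
    for row in rows:
        seen.update(row)
    return sorted(seen - set(descriptors))


def to_csv_text(rows, descriptors):
    """Serialise rows as CSV, with columns in the order the descriptors list them."""
    missing = undescribed_columns(rows, descriptors)
    if missing:
        raise ValueError(f"columns without a descriptor: {missing}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(descriptors), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_readme_text(descriptors):
    """Build a plain-text descriptor file with one 'column: meaning' line per column."""
    return "".join(f"{name}: {text}\n" for name, text in descriptors.items())


if __name__ == "__main__":
    print(to_csv_text(ROWS, DESCRIPTORS))
    print(to_readme_text(DESCRIPTORS))
```

The two resulting text files, the data and its descriptor, could then be deposited together in an online repository rather than attached to the paper as an appendix.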

Co-author Prof Loeske Kruuk of ANU said the paper recommended rewarding researchers who work transparently and collaboratively. “Journals and databases don’t have the resources to check whether archived data sets are adequate,” she said. “The quality of the archived data sets relies on researchers’ goodwill.”