Australasian Science: Australia's authority on science since 1938

How to Get the Most out of Scientific Data

By Allen Greer

Researchers act as if they own their data, but this attitude is counterproductive to the pursuit of science.

Most people would assume that any data produced and published by publicly funded research would be available to anyone interested in it, either through a public repository or upon reasonable request. This is not the case.

A few societies and journals, such as The Royal Society and Science, mandate open access to all data underlying a published research report. A few journals, such as Nature and its stablemates, mandate open access to certain kinds of data, but not all. Some granting agencies, like the Australian Research Council and the National Health and Medical Research Council, “encourage” the placement of data in publicly accessible repositories, while others like the National Science Foundation “expect” researchers to share their data. The Australian Code for the Responsible Conduct of Research simply states that “research data should be made available for use by other researchers”.

Many societies and journals have no policy at all. Sanctions are never mentioned.

Any policy other than open access is totally inadequate in a digitally based, open and transparent society. There is no good reason not to mandate that all data be deposited in an online database or made available without argument upon request. There should be explicit sanctions for non-compliance, such as exclusion from further grants and publication.

Any policy based on a cap-in-hand approach to data access ignores human nature and the politics of science. The most likely responses to a request for data are "Don't you trust me?" and "Just tell me what the problem is and I'll look into it."

If the parties are colleagues of equal status, a request for data may put an edge on the relationship. If they are of unequal status, the junior researcher may not want to risk putting the senior colleague offside, while a senior researcher’s request may panic a junior colleague.

Moreover, while a researcher might feel comfortable making one request for data, they might feel inhibited from making further requests lest they get a “reputation”.

The policy can also foster the attitude that the researcher “owns” the data, rather than just having the right to the first use of it.

When access to data is a matter of a researcher’s discretion, it impedes independent scrutiny of a published claim. All researchers would agree that rigorous scrutiny of a claim is vital for the advancement of knowledge. The corollary to this truism is open and timely access to the relevant data.

The policy also prevents further use of the data. Without access to it, other researchers are denied the opportunity to use it to advance the original problem or to shed light on another.

An argument against full and timely disclosure might be that the researcher plans to use the data for a subsequent paper and doesn’t want to be pre-empted. But new and more exciting ideas or other duties often subvert good intentions, and the intended paper somehow just never gets written.

If this were a real concern, a case could be made for an embargo of, say, 6 months before the release of the data. Other exemptions may be warranted, but they should be granted on a case-by-case basis.

The default policy should be open access. Some researchers may argue, especially to the public, that the “upon application” policy works perfectly well among a community dedicated to the pursuit of knowledge. However, intense competition for appointment, promotion and grant money makes this belief naïve.

An open access policy could only help reduce the surprisingly high incidence of non-replicability of results and the infrequent cases of outright fraud. Knowing the data are going to be released simultaneously with the publication would focus the mind wonderfully.

Requiring all data to be part of the publication or deposited in an open access database simultaneously with publication is not onerous. Researchers now routinely store their data electronically, and these files can be exported with little time or expense. There are now repositories for all kinds of data.

A final reason for making an article's complete data immediately available is the uncertainty of life. When a researcher dies, it may be difficult for their family or even close colleagues to be sure what the data relate to and to answer questions about them. Most researchers go to Glory with reams of data, used for only one project but wistfully intended for others, in their file cabinets and on their hard drives. Often, the last "rite" performed to mark a researcher's passing is when family or colleagues, after much prevarication, finally decide to throw out, delete or entomb the data in an institutional archive.

Allen Greer is a biologist who writes about science and nature. He became interested in this topic when he was denied access to a small data set produced partly under two ARC grants.