Open Science and Data Sharing: What Research and Publication Professionals Need to Know

The open science movement is poised to become a momentous industry shift in medical publication. The National Institutes of Health, one of the largest U.S. medical research funding bodies, recently implemented a policy requiring all applications to include a formal data management plan, with the resulting data made publicly available. This policy, described as “seismic” by Nature, has already caused a ripple effect of similar policy shifts throughout the field. Here, we’ll discuss the meteoric rise of open science and data sharing in recent years, with a focus on what anyone working with medical publications needs to know.

What actually is ‘open’ science?

There have been several ‘open’ shifts in medical publications over the last few decades. For most, the first identifiable trend was open access publication, which was recently discussed on Cabells’ The Source blog. Since then, many other aspects of medical research have, so to speak, opened, including open peer review, open source software, and open altmetrics, to name a few. In the context of these initiatives, ‘open science’ has become an umbrella term referring to movements toward public exchange of research information and data.

Who supports open science?

Open science has gained proponents from many sectors of the industry. Most notably, as we previously mentioned, the National Institutes of Health recently mandated that all applications include a data management plan that outlines how the raw data will be made publicly accessible. Supporting this decision were several additional U.S. government statements and policies, including the White House Office of Science and Technology Policy’s new requirement for “free, immediate, and equitable access to federally funded research.”

This movement kickstarted a wave of open science policies. As of September 2022, it is estimated that approximately half of the 110 largest health research funders either recommend or mandate data sharing. Many academic journal publishers have also committed to this initiative by strengthening their data sharing policies. Though most high-impact journals recommend data deposition in publicly available databases, publishers like PLOS have taken the extra step of requiring data deposition as a condition of publication. Data sharing requirements are especially common for clinical trial reports.

Key concepts for research and administrative professionals

The open science initiative is, in many ways, still in its fledgling stages. While it’s unclear what the long-term implications will be for this movement, there are a few key concepts that anyone working in the research or publication industries should be aware of.

Data repositories

Data repositories are one of the fastest-growing aspects of the open science movement. Essentially, these platforms serve as comprehensive databases that store and organize the massive amounts of raw data generated by today’s medical research industry. Repositories are typically restricted to a specific niche, either by topic or by file type (eg, image-only repositories). Deposited data are assigned a digital object identifier (DOI) or alternative persistent identifier and are typically published under a Creative Commons Attribution (CC BY) license.

Typically, depositing data in a repository is free, with no upfront or recurring charges for data management. This may change in the future, especially as repositories bear the weight of years upon years’ worth of massive datasets. Some institutions have also begun establishing their own internal repositories intended for use by their research faculty.

When selecting a repository, there are several key factors to consider. First, be sure to contact your institution’s library or research administration to check whether there are any explicitly recommended or banned repositories. Second, determine what kinds of data you plan to deposit. Many repositories only process specific types of data or data files. Third, identify any discipline-specific repositories that are related to your research focus. There are several reputable websites that evaluate and recommend repositories across a range of fields, such as PLOS’s Recommended Repositories page or Harvard’s Data Repositories resource. Fourth, check the websites of the potential repositories you identified. Some repositories have particular guidelines regarding who can share data and when data can be deposited, among other considerations, which may guide your selection process.

Data availability statements

As the name implies, data availability statements are short, one- to two-sentence statements published with manuscripts that describe how data can be made available to interested readers. Heavily encouraged since the late 2010s, data availability statements have become very common in medical research publications, especially among high-impact journals. There are several templates available for these statements based on the particular circumstances of your research project; Taylor & Francis have an excellent table detailing statement types and templates/examples.

I want to submit a manuscript to a journal. How do I find their data requirements?

Most journals will include their data sharing requirements on the ‘Instructions for Authors’ page (this might also be titled ‘Guide for Authors’, ‘Submission Guidelines’, etc.). At this point, if there’s no information about data sharing mandates on this page, it can generally be assumed that the journal does not have any data sharing–related publication conditions. However, it’s always a good idea to reach out to the editorial board if you have any questions. Contact information should be available on the ‘Instructions for Authors’ page, the ‘About the Journal’ page, or a dedicated ‘Contact Us’ page.