What should be included in a Publisher Data Policy?

Funders such as the European Commission, the Wellcome Trust, the National Institutes of Health, Research Councils UK and others now ask researchers to include Data Management or Sharing Plans in their grant applications, demonstrating how grantees will provide public access to the data associated with their projects. Data availability and the ability to mine both text and data are also a prominent part of the May 2016 recommendations from the European Council in response to a draft proposal by the Dutch Presidency. Although many see the push for data access as driven mainly by the need to share biomedical information, it is also a priority for the humanities (e.g., DARIAH in the EU and some federal agencies in the US).

In parallel, many publishers are taking action to increase access to the data associated with their published articles. Such action ranges from a light-touch approach to stricter mandatory requirements, including:
• An expectation that authors share their data, but without a specific workflow to implement or enforce the policy.
• A mandatory statement in which authors either commit to sharing their data or give reasons for not sharing (e.g., Annals of Internal Medicine, April 2016, Section 3: Data Sharing and Reproducible Research).
• An expectation that authors share all their data, plus a workflow that mandates and enforces sharing of certain specific datasets at publication (e.g., Nature journals).
• An expectation that authors share all the data underlying the findings of a paper, plus a mandatory Data Accessibility Statement within the published paper describing how those data are made available (e.g., PLOS).
• A mandate that authors deposit their data in an appropriate public archive on publication and include a ‘Data Accessibility’ section in their paper, plus an editorial workflow to ensure that all the relevant data are included for each paper (e.g., Molecular Ecology).

Where publishers have a data policy, the general expectation is that relevant data are made available in trusted subject-specific or institutional repositories. Some journals link to lists of the best-known repositories (e.g., PLOS Recommended Data Repositories or the list maintained by Springer Nature’s Scientific Data). Many publishers have also integrated their submission workflows with digital data repositories, such as Dryad or Figshare, so that authors can share their data more easily on publication and receive credit for the output through a data-specific persistent identifier, such as a DOI (e.g., provided by DataCite) or a URI (such as those provided by EMBL-EBI’s MIRIAM Registry). Further integration with ORCID allows credit to be attributed to specific authors by linking the author’s ORCID identifier to the dataset in question. Other publishers have launched journals dedicated to data, such as BioMed Central’s GigaScience.

While many journals expect authors to share all the relevant data, there are important exceptions. These include, for example, information that may jeopardise patient privacy or that reveals the location of rare or endangered species. Anonymising clinical data in particular is not straightforward (1), and several initiatives are examining the challenges involved (e.g., the International Committee of Medical Journal Editors [ICMJE] in relation to clinical trial data, and the Research Data Alliance [RDA] on data security and trust more generally). In addition, projects such as AllTrials and Vivli (2) aim to encourage the reporting of clinical trials that have traditionally gone unpublished and to help standardise the data so that they can be reused once published.

It is important that publishers do not act in isolation when creating data policies, but instead work with funders, institutions and researchers to create community standards, such as those for clinical trial data. Aligning policies across stakeholders will also increase the transparency of data and its reporting and support good data stewardship. A crucial consideration in creating a policy is keeping its implementation practical while minimising the burden on researchers.

How researchers share and reuse data is part of the ongoing transition from Open Access to Open Science, and depends on changing technology and new platforms (e.g., the European Open Science Cloud). There are currently few guides for publishers on the role they should take in making data accessible. In 2014, however, Lin and Strasser (3) outlined the following general recommendations for publishers:

• Establish and enforce a mandatory data availability policy.
• Contribute to establishing community standards for data management and sharing.
• Contribute to establishing community standards for data preservation in trusted repositories.
• Provide formal channels to share data.
• Work with repositories to streamline data submission.
• Require appropriate citation of all data associated with a publication, both produced and used.
• Develop and report indicators that will support data as a first-class scholarly output.
• Incentivize data sharing by promoting its value.

(1) Committee on Strategies for Responsible Sharing of Clinical Trial Data, Board on Health Sciences Policy, Institute of Medicine (2015) Concepts and Methods for De-Identifying Clinical Trial Data. National Academies Press (US). http://www.ncbi.nlm.nih.gov/books/NBK285994/
(2) “Open Medicine.” Nature News 533, no. 7603 (May 19, 2016): 292. doi:10.1038/533292a.
(3) Lin J, Strasser C (2014) Recommendations for the Role of Publishers in Access to Data. PLoS Biol 12(10): e1001975. doi:10.1371/journal.pbio.1001975