Abstract
Efforts to make research results open and reproducible are increasingly
reflected by journal policies encouraging or mandating authors to provide data
availability statements. As a consequence, there has been a strong
uptake of data availability statements in recent literature. Nevertheless, it
is still unclear what proportion of these statements actually contain
well-formed links to data, for example via a URL or permanent identifier, and
whether there is added value in providing such links. We consider 531,889 journal
articles published by PLOS and BMC, develop an automatic system for labelling
their data availability statements according to four categories based on their
content and the type of data availability they display, and finally analyze the
citation advantage of different statement categories via regression. We find
that, following mandated publisher policies, data availability statements
have become very common: in 2018, 93.7% of 21,793 PLOS articles and 88.2% of
31,956 BMC articles had data availability statements. Statements that
contain a link to data in a repository, rather than stating that data are
available on request or included as supporting information files, are a
fraction of the total. In 2017 and 2018, 20.8% of PLOS publications and 12.2%
of BMC publications provided data availability statements containing a link
to data in a repository. We also
find that articles whose statements link to data in a repository are
associated with up to 25.36% (± 1.07%) higher citation impact on average,
according to a citation prediction model. We discuss the potential
implications of these results for authors (researchers) and journal publishers
who make the effort of sharing their data in repositories. All our data and
code are made available so that our results can be reproduced and extended.
Description
[1907.02565] The citation advantage of linking publications to research data