Introduction to Data Curation for Reproducibility: Key Points

Overview of Scientific Reproducibility

The term ‘reproducibility’ has been used in different ways in different disciplinary contexts.
Computational reproducibility, which is the focus of this and follow-up lessons, refers to the duplication of reported findings by re-executing the analysis with the data and code used by the original author to generate their findings.
Scientific reproducibility is not a novel concept, but one that has been reiterated by prominent scholars throughout history as a cornerstone of scientific practice.
Failed attempts to reproduce published scientific research are considered by some to be reflective of an ongoing crisis in scientific integrity.
Stakeholders have taken note of the importance of reproducibility and thus have issued policies requiring researchers to share their research artifacts with the scientific community.
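
As a rough illustration of the definition of computational reproducibility given above, the check at the end of a re-execution amounts to comparing the reported findings with the values the re-run produced. The sketch below is a minimal one under stated assumptions: the function name `compare_results`, the result names, the numbers, and the tolerance are all hypothetical and are not taken from the lesson.

```python
import math


def compare_results(reported, reproduced, rel_tol=1e-6):
    """Return the findings that failed to reproduce.

    Keys are result names; values are (reported, reproduced) pairs.
    A result that the re-run did not produce is recorded with None.
    """
    mismatches = {}
    for name, expected in reported.items():
        actual = reproduced.get(name)
        if actual is None or not math.isclose(expected, actual, rel_tol=rel_tol):
            mismatches[name] = (expected, actual)
    return mismatches


# Hypothetical values: what a paper reports vs. what re-running its code yielded.
reported = {"mean_age": 41.3, "coef_income": 0.152}
reproduced = {"mean_age": 41.3, "coef_income": 0.157}

print(compare_results(reported, reproduced))
# {'coef_income': (0.152, 0.157)}
```

An empty result means every reported value was matched within the tolerance; how strict that tolerance should be is a judgment call that depends on the analysis.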

Reproducible research requires access to a “research compendium” that contains all of the artifacts and documentation necessary to repeat the steps of the analytical workflow to produce expected results.
Curating for reproducibility goes beyond curating data; it applies curation actions to all of the research artifacts within the research compendium to ensure it is independently understandable for informed reuse.
Despite calls for reproducible research, practical challenges can make this standard difficult to achieve.
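
One way to picture the research compendium described above is as a folder that a curator can inspect for completeness. The sketch below assumes a hypothetical layout (a `data/` folder, a `code/` folder, and a top-level README file); real compendia vary by discipline and repository, and the function name `missing_components` is illustrative only.

```python
import tempfile
from pathlib import Path


def _has_files(folder):
    """True if `folder` exists and contains at least one entry."""
    return folder.is_dir() and any(folder.iterdir())


def missing_components(root):
    """List the assumed compendium components that are absent under `root`."""
    root = Path(root)
    checks = {
        "data": _has_files(root / "data"),
        "code": _has_files(root / "code"),
        "documentation": any(root.glob("README*")),
    }
    return [name for name, present in checks.items() if not present]


# Build a throwaway compendium that has data and code but no README.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "survey.csv").write_text("id,age\n1,34\n")
    (root / "code").mkdir()
    (root / "code" / "analysis.py").write_text("print('analysis')\n")
    missing = missing_components(root)

print(missing)  # ['documentation']
```

A check like this only confirms that the pieces are present. Whether the compendium is independently understandable still depends on a person reviewing the documentation and re-running the code.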

Data-savvy librarians and other information professionals play an important role in supporting and promoting scientific reproducibility.
While LIS professionals already engage in many practices that support reproducibility, they may need to build new skills to perform some critical curation-for-reproducibility tasks.
There are various models for implementing data curation services. It is important to think about what such a service might look like at your organization so that you can articulate your ideas effectively when given the opportunity.