The FAIR Guiding Principles for scientific data management and stewardship

Ambra Perugini & Margherita Calderan

Why FAIR?

  • Growing demands for transparency and reproducibility
  • Data is often hard to find, access, interpret, or reuse
  • FAIR principles offer a structured solution

“FAIR = Findable, Accessible, Interoperable, Reusable”

European Research Council (European Commission)

Open Research Data and Data Management Plans

  • The ERC embraces the FAIR data principles: research data should be findable, accessible, interoperable and re-usable
  • The ERC expects data underlying publications by ERC grantees to adhere to the FAIR principles
  • The article by Wilkinson et al. on “The FAIR Guiding Principles for scientific data management and stewardship” provides a detailed discussion of the FAIR principles
  • Grantees should demonstrate that their approach to data management planning is in line with the FAIR principles by providing adequate information on these five topics [dataset description; standards and metadata; name and persistent identifier for the datasets; curation and preservation methodology; data sharing methodology]
  • Guidelines on FAIR Data Management in Horizon 2020

Overview of FAIR

  • A set of guiding principles
  • Applicable across disciplines
  • Aimed at machine-readability and good data stewardship

Findable

Key ideas:

  • Assign globally unique and persistent identifiers (PIDs)
  • Metadata should be rich and indexed in searchable resources

Example in practice:

  • Use DOIs for datasets
  • Tag datasets with relevant keywords and metadata



PID Type | Used For | Example
DOI (Digital Object Identifier) | Publications, datasets, software | 10.5281/zenodo.1234567
ORCID (Open Researcher and Contributor ID) | Researcher identities | https://orcid.org/0000-0002-1825-0097
ROR (Research Organization Registry) | Institutions/organizations | https://ror.org/03yrm5c26
ARK (Archival Resource Key) | Digital archives, libraries | ark:/12025/654xz321
SWHID (SoftWare Hash IDentifier) | Software source code artifacts: source code files, source trees, commits | Example not provided
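
The identifiers in the table can be checked and turned into links with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the DOI pattern is a loose heuristic rather than an official rule, and the ORCID check assumes the ISO 7064 MOD 11-2 checksum that ORCID documents for the last character of an iD.

```python
import re

# Loose heuristic for modern DOIs: "10.", a 4-9 digit prefix, "/", a suffix.
# Assumption: this is a convenience check, not an official validation rule.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a link through the https://doi.org/ resolver."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the final character of an ORCID iD (ISO 7064 MOD 11-2)."""
    # Accept both the bare iD and the full https://orcid.org/... form.
    chars = orcid.rsplit("/", 1)[-1].replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[15].upper() == expected


# The example values come from the table above.
print(doi_to_url("10.5281/zenodo.1234567"))
print(orcid_checksum_ok("https://orcid.org/0000-0002-1825-0097"))
```

A check like this only catches typos. It does not confirm that the identifier is actually registered.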

Accessible

Key ideas:

  • Use standardized protocols
  • Metadata remains accessible even if data is restricted

Example in practice:

  • Publish metadata openly even if data is sensitive (e.g., clinical trials)
  • Use data repositories (e.g., OSF, Zenodo)
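
One way to picture "metadata stays accessible even if data is restricted" is a record with a public view. This is a minimal sketch with assumptions throughout: the class, the field names, and the example values (including the placeholder DOI and the example.org address) are hypothetical, and real repositories handle this for you.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    title: str
    identifier: str             # persistent identifier, e.g. a DOI
    access: str                 # "open" or "restricted"
    data_url: str               # where the files live
    access_procedure: str = ""  # how to request restricted data


def public_view(record: DatasetRecord) -> dict:
    """Return what any visitor may see: metadata always, data link only if open."""
    view = {
        "title": record.title,
        "identifier": record.identifier,
        "access": record.access,
    }
    if record.access == "open":
        view["data_url"] = record.data_url
    else:
        view["access_procedure"] = record.access_procedure
    return view


# Hypothetical sensitive dataset: the description is public, the files are not.
trial = DatasetRecord(
    title="Example clinical trial data",
    identifier="10.1234/example",
    access="restricted",
    data_url="https://example.org/private/files",
    access_procedure="Request access from the data steward",
)
print(public_view(trial))
```

The point of the design is that a restricted dataset can still be found and cited, and the record tells readers how to ask for access.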

Interoperable

Key ideas:

  • Use standardized vocabularies and formats
  • Data should interoperate with tools and other datasets

Example in practice:

  • Store data in open formats (CSV, JSON)
  • Use standardized variable naming
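
Both practices can be combined when exporting data. The sketch below uses only the Python standard library. The snake_case convention and the helper names are assumptions chosen for illustration, not a standard prescribed by FAIR.

```python
import csv
import io
import re


def to_snake_case(name: str) -> str:
    """Normalize a column header, e.g. 'Reaction Time (ms)' -> 'reaction_time_ms'."""
    # Insert "_" between a lowercase letter or digit and a following capital.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())
    # Collapse every run of non-alphanumeric characters into one "_".
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def rows_to_csv(header: list, rows: list) -> str:
    """Write rows as CSV text with normalized column names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([to_snake_case(h) for h in header])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example values, for illustration only.
print(rows_to_csv(["Participant ID", "Reaction Time (ms)"], [[1, 523], [2, 498]]))
```

Whatever naming convention you pick, apply it to every file in the project and record it in the codebook.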

Reusable

Key ideas:

  • Clear data usage licenses
  • Rich metadata describing context

Example in practice:

  • Add a Creative Commons license
  • Include documentation, codebooks, analysis scripts

Understanding Data Licenses

Why it Matters:

  • Licenses clarify what others can legally do with your data or code

  • No license? By default all rights are reserved, so no one may legally reuse your work, even for research

Common License Types


Creative Commons (CC), for datasets, docs, media:

  • CC0 (CC0-1.0): no rights reserved, public domain dedication

  • CC-BY (CC-BY-4.0): attribution required

Software licenses:

  • MIT: simple, permissive, attribution only

  • Apache-2.0: permissive like MIT, plus an explicit patent grant

  • GPL-3.0: copyleft, distributed derivative code must carry the same license

  • Unlicense: public domain equivalent

Most GitHub projects use: MIT, Apache-2.0, or GPL-3.0

Why Use FAIR Principles in Psychology?


  • Improves replicability and meta-analyses
  • Facilitates collaboration
  • Encourages better documentation and data curation

First steps to become more FAIR

  • Use data repositories (e.g., OSF)

  • Adopt metadata standards (e.g., DataCite, Dublin Core) in machine-readable formats such as JSON

  • Publish protocols and code

  • Provide clear licenses
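
These steps come together in a small machine-readable metadata file stored next to the data. The sketch below is an assumption-laden illustration: the required field names are hypothetical (loosely inspired by common repository metadata, not copied from any schema), and the example values are placeholders.

```python
import json

# Hypothetical minimum set of fields; adapt it to the standard you adopt.
REQUIRED_FIELDS = ("identifier", "title", "creators", "license", "description")


def build_metadata(**fields) -> str:
    """Serialize a metadata record to JSON, refusing records with missing fields."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    return json.dumps(fields, indent=2, ensure_ascii=False, sort_keys=True)


# Placeholder values, for illustration only.
record = build_metadata(
    identifier="10.1234/example",
    title="Example dataset",
    creators=["Example Researcher"],
    license="CC-BY-4.0",
    description="Trial-level data and analysis scripts.",
    keywords=["psychology", "open data"],
)
print(record)
```

Writing the license as a short identifier such as CC-BY-4.0 keeps the file readable for both people and software.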

Tools and Resources

How are we doing?

The daily costs of workaholism

  • Findable: DOI 10.17605/OSF.IO/AWBXJ

  • Accessible: https://osf.io/awbxj/

  • Interoperable: .csv file

  • Reusable: CC-By Attribution 4.0 International, data dictionary, README file, scripts

(half on GitHub and half on OSF?)

Domain-level differences in skills and traits change goals

  • Findable: DOI 10.17605/OSF.IO/SWDC2

  • Accessible: https://osf.io/swdc2/

  • Interoperable: .xlsx instead of .csv (also for codebook)

  • Reusable: CC-By Attribution 4.0 International, codebook (i.e., data dictionary), README file, scripts

Predictions under sleep restriction

  • Findable: DOI 10.17605/OSF.IO/6FXMH

  • Accessible: https://osf.io/6fxmh/

  • Interoperable: both .csv and .xlsx

  • Reusable: CC-By Attribution 4.0 International, renv (an R package for reproducible project environments), but no data dictionary or README
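
The three self-assessments above follow the same checklist, which can be written down as a short script. This is a rough sketch: the class, the rule set, and the choice of .csv and .json as "open formats" are assumptions taken from these slides, not an official FAIR metric.

```python
from dataclasses import dataclass, field

# Open formats named earlier in these slides (assumption: this list is enough here).
OPEN_FORMATS = {".csv", ".json"}


@dataclass
class Project:
    title: str
    doi: str = ""
    url: str = ""
    formats: set = field(default_factory=set)
    license: str = ""
    has_data_dictionary: bool = False
    has_readme: bool = False


def fair_gaps(p: Project) -> list:
    """List the checklist items a project does not yet satisfy."""
    gaps = []
    if not p.doi:
        gaps.append("Findable: no persistent identifier")
    if not p.url:
        gaps.append("Accessible: no repository URL")
    if not p.formats & OPEN_FORMATS:
        gaps.append("Interoperable: no open data format")
    if not p.license:
        gaps.append("Reusable: no license")
    if not p.has_data_dictionary:
        gaps.append("Reusable: no data dictionary")
    if not p.has_readme:
        gaps.append("Reusable: no README")
    return gaps


# Values transcribed from the three project slides above.
projects = [
    Project("The daily costs of workaholism", "10.17605/OSF.IO/AWBXJ",
            "https://osf.io/awbxj/", {".csv"}, "CC-BY-4.0", True, True),
    Project("Domain-level differences in skills and traits change goals",
            "10.17605/OSF.IO/SWDC2", "https://osf.io/swdc2/", {".xlsx"},
            "CC-BY-4.0", True, True),
    Project("Predictions under sleep restriction", "10.17605/OSF.IO/6FXMH",
            "https://osf.io/6fxmh/", {".csv", ".xlsx"}, "CC-BY-4.0", False, False),
]
for project in projects:
    print(project.title, "->", fair_gaps(project) or "no gaps found")
```

A script like this only checks that the pieces exist. Whether the README and data dictionary are actually useful still needs a human reader.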

Auditory and cognitive performance in elderly musicians and nonmusicians

Dyslexia Polygenic Scores Show Heightened Prediction of Verbal Working Memory and Arithmetic

The full GWAS summary statistics for the 23andMe discovery data set are available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Datasets will be made available at no cost for academic use. Please visit https://research.23andme.com/collaborate/#dataset-access/ for more information and to apply to access the data.

Best Practice


Manifesto Comunità Italiana Data Steward (Manifesto of the Italian Data Steward Community)