The FAIR Guiding Principles for scientific data management and stewardship

Ambra Perugini & Margherita Calderan

Why FAIR?

Growing demands for transparency and reproducibility
Data is often hard to find, access, interpret, or reuse
FAIR principles offer a structured solution

“FAIR = Findable, Accessible, Interoperable, Reusable”

Article: The FAIR Guiding Principles for scientific data management and stewardship

European Research Council (European Commission)

Open Research Data and Data Management Plans

The ERC embraces the FAIR data principles: research data should be findable, accessible, interoperable and re-usable
The ERC expects data underlying publications by ERC grantees to adhere to the FAIR principles
The article by Wilkinson et al. on “The FAIR Guiding Principles for scientific data management and stewardship” provides a detailed discussion of the FAIR principles
Grantees should demonstrate that their approach to data management planning is in line with the FAIR principles by providing adequate information on these five topics [dataset description; standards and metadata; name and persistent identifier for the datasets; curation and preservation methodology; data sharing methodology]
Guidelines on FAIR Data Management in Horizon 2020

Overview of FAIR

A set of guiding principles
Applicable across disciplines
Aimed at making data machine-readable and at supporting good data stewardship

Findable

Key ideas:

Assign globally unique and persistent identifiers (PIDs)
Metadata should be rich and indexed in searchable resources

Example in practice:

Use DOIs for datasets
Tag datasets with relevant keywords and metadata

PID Type	Used For	Example
DOI (Digital Object Identifier)	Publications, datasets, software	`10.5281/zenodo.1234567`
ORCID (Open Researcher and Contributor ID)	Researcher identities	`https://orcid.org/0000-0002-1825-0097`
ROR (Research Organization Registry)	Institutions/organizations	`https://ror.org/03yrm5c26`
ARK (Archival Resource Key)	Digital archives, libraries	`ark:/12025/654xz321`
SWHID (SoftWare Hash IDentifier)	Software source code artifacts: files, source trees, commits	Example not provided
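As a rough illustration of why PIDs are machine-friendly, the sketch below turns a bare DOI into a resolvable link and checks the final character of an ORCID iD, which is an ISO 7064 MOD 11-2 check character. The function names and the DOI pattern are assumptions made for this example; the pattern is a loose heuristic, not an official validator.

```python
import re

# Loose heuristic for "looks like a DOI" (an assumption, not an official rule).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable https://doi.org/ link."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the final character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid.rsplit("/", 1)[-1].replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[15].upper() == expected


print(doi_to_url("10.5281/zenodo.1234567"))
print(orcid_checksum_ok("https://orcid.org/0000-0002-1825-0097"))
```

A passing checksum only shows that the identifier is well formed; it does not show that the record exists.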

Accessible

Key ideas:

Use standardized protocols
Metadata remains accessible even if data is restricted

Example in practice:

Publish metadata openly even if data is sensitive (e.g., clinical trials)
Use data repositories (e.g., OSF, Zenodo)
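"Standardized protocol" in practice usually means plain HTTPS. The sketch below builds, without sending, a request that asks the DOI resolver for machine-readable metadata instead of the landing page. The helper name is hypothetical, and the `Accept` media type is an assumption based on DOI content negotiation as offered by the major registration agencies; the DOI is the placeholder from the table above and may not resolve.

```python
from urllib.request import Request


def metadata_request(doi: str) -> Request:
    """Build (but do not send) an HTTPS request for a DOI's metadata."""
    return Request(
        f"https://doi.org/{doi}",
        # Assumed media type: CSL JSON via DOI content negotiation.
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


req = metadata_request("10.5281/zenodo.1234567")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would send it. The metadata can stay reachable this way even when the data files themselves are restricted.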

Interoperable

Key ideas:

Use standardized vocabularies and formats
Data should interoperate with tools and other datasets

Example in practice:

Store data in open formats (CSV, JSON)
Use standardized variable naming
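A minimal sketch of both practices, using only the Python standard library: column names are normalized to snake_case, then the same rows are written as CSV and JSON. The naming convention, the helper names, and the sample rows are all assumptions for illustration; a lab may prefer a different convention as long as it is applied consistently.

```python
import csv
import io
import json
import re


def to_snake_case(name: str) -> str:
    """Normalize a column name, e.g. 'Reaction Time (ms)' -> 'reaction_time_ms'."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())  # split camelCase
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # other characters become "_"
    return name.strip("_").lower()


def normalize_rows(rows):
    """Rename every key in every row with to_snake_case."""
    return [{to_snake_case(k): v for k, v in row.items()} for row in rows]


def rows_to_csv(rows) -> str:
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows, for illustration only.
raw = [
    {"Participant ID": 1, "Reaction Time (ms)": 512, "sleepCondition": "restricted"},
    {"Participant ID": 2, "Reaction Time (ms)": 478, "sleepCondition": "control"},
]
clean = normalize_rows(raw)
print(rows_to_csv(clean))
print(json.dumps(clean, indent=2))
```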

Reusable

Key ideas:

Clear data usage licenses
Rich metadata describing context

Example in practice:

Add a Creative Commons license
Include documentation, codebooks, analysis scripts
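A codebook is easier to keep up to date when its skeleton is generated from the data. The sketch below drafts one entry per CSV column and leaves the description for a human to write. The field names, the type labels, and the sample data are assumptions; the type guess is deliberately crude (a column with missing values, for instance, falls back to "string").

```python
import csv
import io


def infer_type(values):
    """Guess 'integer', 'number', or 'string' from CSV cell text."""
    for label, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "string"


def draft_codebook(csv_text: str):
    """Return one codebook entry per column, with the description left to fill in."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return [
        {
            "variable": column,
            "type": infer_type([row[column] for row in rows]),
            "example": rows[0][column],
            "description": "TODO: describe meaning, units, and coding",
        }
        for column in columns
    ]


# Made-up data, for illustration only.
sample = "participant_id,reaction_time_ms,condition\n1,512.5,restricted\n2,478.0,control\n"
for entry in draft_codebook(sample):
    print(entry)
```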

Understanding Data Licenses

Why it Matters:

Licenses clarify what others can legally do with your data or code
No license? Default copyright applies, so others generally cannot legally reuse your work, even for research

Common License Types

Creative Commons (CC), for datasets, docs, media:

CC0 (CC0-1.0): no rights reserved, public domain dedication
CC-BY (CC-BY-4.0): attribution required

Software licenses:

MIT: simple, permissive, attribution only
Apache-2.0: permissive like MIT, plus an explicit patent grant
GPL-3.0: copyleft, derivative code must be shared under the same license
Unlicense: public domain equivalent

Most GitHub projects use: MIT, Apache-2.0, or GPL-3.0
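Stating the license as a short, standard identifier makes it machine-readable. The sketch below stores the licenses from the list above in a small lookup table keyed by identifier. The table layout and function names are assumptions for illustration. Note also that the SPDX license list now prefers `GPL-3.0-only` or `GPL-3.0-or-later` over the bare `GPL-3.0` used here.

```python
# Identifiers and one-line summaries taken from the list above.
LICENSES = {
    "CC0-1.0": ("data", "public domain dedication, no rights reserved"),
    "CC-BY-4.0": ("data", "reuse allowed with attribution"),
    "MIT": ("software", "permissive, keep the copyright notice"),
    "Apache-2.0": ("software", "permissive, with an explicit patent grant"),
    "GPL-3.0": ("software", "copyleft, derivatives keep the same license"),
    "Unlicense": ("software", "public domain equivalent"),
}


def licenses_for(kind: str):
    """Return the identifiers listed for 'data' or 'software'."""
    return [spdx for spdx, (category, _) in LICENSES.items() if category == kind]


def describe(spdx: str) -> str:
    """Return a one-line summary for a license in the short list."""
    if spdx not in LICENSES:
        raise KeyError(f"{spdx!r} is not in this short list")
    category, summary = LICENSES[spdx]
    return f"{spdx} ({category}): {summary}"


print(licenses_for("data"))
print(describe("Apache-2.0"))
```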

Why Use FAIR Principles in Psychology?

Improves replicability and meta-analyses
Facilitates collaboration
Encourages better documentation and data curation

First steps to become more FAIR

Use data repositories (e.g., OSF)
Adopt metadata standards (e.g., Dublin Core or DataCite, serialized in a machine-readable format such as JSON)
Publish protocols and code
Provide clear licenses
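Several of these steps meet in a single metadata file stored next to the data. The sketch below assembles a small JSON record with a title, creators, a DOI link, a license identifier, and keywords. The field names are illustrative assumptions and do not follow any particular schema; the title, creator, and DOI are placeholders.

```python
import json


def build_metadata(title, creators, doi, license_id, keywords):
    """Assemble a small, machine-readable description of a dataset."""
    record = {
        "title": title,
        "creators": creators,  # each item: {"name": ..., "orcid": ...}
        "identifier": f"https://doi.org/{doi}" if doi else "",
        "license": license_id,  # short identifier, e.g. "CC-BY-4.0"
        "keywords": sorted(set(keywords)),  # de-duplicated and sorted
    }
    missing = [key for key, value in record.items() if not value]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    return record


# Placeholder values, for illustration only.
record = build_metadata(
    title="Example dataset (hypothetical)",
    creators=[{"name": "Example Researcher", "orcid": "https://orcid.org/0000-0002-1825-0097"}],
    doi="10.5281/zenodo.1234567",
    license_id="CC-BY-4.0",
    keywords=["psychology", "open data", "psychology"],
)
print(json.dumps(record, indent=2))
```

Refusing to build a record with empty fields is a design choice: it turns "we forgot the license" into an error at deposit time rather than a surprise for the next reader.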

Tools and Resources

How are we doing?

The daily costs of workaholism

Findable: DOI 10.17605/OSF.IO/AWBXJ
Accessible: https://osf.io/awbxj/
Interoperable: .csv file
Reusable: CC-By Attribution 4.0 International, data dictionary, README file, scripts

(half on GitHub and half on OSF?)

Domain-level differences in skills and traits change goals

Findable: DOI 10.17605/OSF.IO/SWDC2
Accessible: https://osf.io/swdc2/
Interoperable: .xlsx instead of .csv (also for codebook)
Reusable: CC-By Attribution 4.0 International, codebook (i.e., data dictionary), README file, scripts

Predictions under sleep restriction

Findable: DOI 10.17605/OSF.IO/6FXMH
Accessible: https://osf.io/6fxmh/
Interoperable: both .csv and .xlsx
Reusable: CC-By Attribution 4.0 International, renv (an R package for creating reproducible project environments), but no data dictionary or README

Auditory and cognitive performance in elderly musicians and nonmusicians

Findable: https://doi.org/10.6084/m9.figshare.5402527.v1, Keywords
Accessible: figshare
Interoperable: .xlsx instead of .csv
Reusable: CC-By Attribution 4.0 International, but no README or data dictionary

Dyslexia Polygenic Scores Show Heightened Prediction of Verbal Working Memory and Arithmetic

The full GWAS summary statistics for the 23andMe discovery data set are available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Datasets will be made available at no cost for academic use. Please visit https://research.23andme.com/collaborate/#dataset-access/ for more information and to apply to access the data.
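The manual checks applied to the projects above (README, data dictionary, file formats) can be partly automated. The sketch below flags those features from a list of file names. The function name, the keyword matching, and the two suffix sets are assumptions for illustration, and the file listing is hypothetical rather than taken from any project named here.

```python
from pathlib import PurePosixPath

# Assumed groupings for this example; adjust to the formats used in your field.
OPEN_DATA_SUFFIXES = {".csv", ".tsv", ".json"}
LESS_PORTABLE_SUFFIXES = {".xlsx", ".xls", ".sav"}


def reuse_checklist(filenames):
    """Flag reuse-related features from a list of file names in a repository."""
    names = [PurePosixPath(name).name.lower() for name in filenames]
    suffixes = {PurePosixPath(name).suffix for name in names}
    has_open = bool(suffixes & OPEN_DATA_SUFFIXES)
    return {
        "readme": any(name.startswith("readme") for name in names),
        "data_dictionary": any(
            word in name for name in names for word in ("codebook", "dictionary")
        ),
        "license_file": any(name.startswith("license") for name in names),
        "open_format_data": has_open,
        "only_less_portable_data": bool(suffixes & LESS_PORTABLE_SUFFIXES) and not has_open,
    }


# Hypothetical file listing, not the contents of any project named above.
files = ["README.md", "data/trials.csv", "data/codebook.xlsx", "analysis/model.R"]
report = reuse_checklist(files)
for check, passed in report.items():
    print(f"{check}: {passed}")
```

A file-name check is only a first pass: it cannot tell whether the README is informative or whether the codebook covers every variable.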

Best Practice

Manifesto Comunità Italiana Data Steward