Principles of code and data management

This page lists mandatory rules to be followed when working with code and data.

Working with code

The objective of these rules is to ensure:

Every simulation and post-processing code must be versioned in a Git (or similar) repository on git.dalembert.umpc.fr (or similar). Commit messages must be meaningful.
Every repository must have a README file explaining the purpose of the code, its dependencies, how to run it, what it produces, and how to run the tests.
Every library repository must contain tests. It must be possible to run all the tests with a single command.
Every library must have a documented API: each function of the API has a basic description of what it does, along with a description of its inputs and outputs.
Every library must have usage examples.
Code versions used in a publication must be archived on Software Heritage, and the resulting SWHID must be cited in the publication.
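The rules above do not prescribe a language or a test framework. As a minimal sketch, assuming a Python library, the hypothetical function `kinetic_energy` below shows one way to satisfy the API-documentation and testing rules: its docstring states what the function does and describes its inputs, outputs and errors, and the test class runs with a single command such as `python -m unittest` or `pytest`.

```python
"""Hypothetical example module: a documented API function and its tests."""
import math
import unittest


def kinetic_energy(mass: float, velocity: float) -> float:
    """Return the kinetic energy of a point mass.

    Parameters
    ----------
    mass : float
        Mass in kilograms; must be non-negative.
    velocity : float
        Speed in metres per second.

    Returns
    -------
    float
        Kinetic energy 0.5 * mass * velocity**2, in joules.

    Raises
    ------
    ValueError
        If ``mass`` is negative.
    """
    if mass < 0:
        raise ValueError("mass must be non-negative")
    return 0.5 * mass * velocity ** 2


class TestKineticEnergy(unittest.TestCase):
    """Unit tests, discovered automatically by `python -m unittest` or `pytest`."""

    def test_known_value(self):
        # 0.5 * 2 kg * (3 m/s)^2 = 9 J
        self.assertTrue(math.isclose(kinetic_energy(2.0, 3.0), 9.0))

    def test_zero_velocity(self):
        self.assertEqual(kinetic_energy(5.0, 0.0), 0.0)

    def test_negative_mass_rejected(self):
        with self.assertRaises(ValueError):
            kinetic_energy(-1.0, 1.0)
```

Saved under a name matching `test*.py` (for example the hypothetical `test_mechanics.py`), the tests are found by the default discovery of `python -m unittest` run from the repository root. In a real library the function would live in its own module and the test file would import it.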

Working with data

The objective of these rules is to ensure:

Your $HOME directory must have scheduled daily backups to an external drive or a remote server. Periodically check that the backups are working and can be restored.
Simulation data for a given workflow, pipeline or paper is grouped into a dataset. Each dataset must be documented with a README file explaining what the data is, how it was generated and how it can be used.
Whenever possible, datasets must be published to Zenodo at the time of submission, and the dataset DOI must be cited in the article.
Open-source file formats must be used to store data and metadata. Self-describing file formats are preferred.
All datasets must be uploaded to ??? when leaving the lab.
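The backup rule asks you to check periodically that backups can actually be restored. These rules do not name a backup tool, so the sketch below assumes only that you have restored a backup into a separate scratch directory; the hypothetical helper `verify_restore` then compares the SHA-256 checksum of every file in the original tree with its restored copy.

```python
"""Hypothetical helper: check that a restored backup matches the original files."""
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original: Path, restored: Path) -> list[str]:
    """Compare every regular file under `original` with its counterpart under `restored`.

    Returns the relative paths that are missing from the restore or whose
    contents differ; an empty list means the restore check passed.
    Files present only in `restored` are not reported.
    """
    problems = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue  # skip directories and other non-regular entries
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel}")
    return problems
```

For example, `verify_restore(Path.home() / "project", Path("/tmp/restore-test/project"))` (both paths hypothetical) returning an empty list means every file came back intact. Files modified after the backup was taken will be reported as `differs`, so run the check on data that has not changed since the last backup.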