5 Research simulation
Lucas Frérot edited this page 2024-12-18 12:06:23 +01:00

Principles of code and data management

This page lists mandatory rules to be followed when working with code and data.

Working with code

The objectives of these rules are to ensure:

  1. Code developed in the lab is preserved.
  2. Code can be easily shared in the lab and outside.
  3. Code can be easily reused by other people (this includes future you!).
  4. Simulations are reproducible.

Tenets

  1. All simulation and post-processing code must be versioned in a Git (or similar) repository hosted on git.dalembert.umpc.fr (or a similar server). Commit messages must be meaningful.
  2. Every repository must have a README file explaining the purpose of the code, its dependencies, how to run it, what it produces, and how to run the tests.
  3. Every library repository must contain tests. The full test suite should run with a single command.
  4. Every library must have a documented API, i.e. each function of the API has a basic description of what it does, along with a description of its inputs and outputs.
  5. Every library must have usage examples.
  6. Code versions used in a publication must be archived on Software Heritage, and the resulting SWHID cited in the publication.
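Tenets 3 to 5 can be met with very little machinery. Below is a minimal sketch of what they might look like in a Python library, assuming pytest as the test runner and NumPy-style docstrings. The function kinetic_energy is a hypothetical example, not code from the lab: its docstring documents inputs and outputs (tenet 4) and carries a usage example (tenet 5), and the test_ functions are the tests (tenet 3).

```python
"""Hypothetical module illustrating a documented, tested library function."""


def kinetic_energy(mass: float, velocity: float) -> float:
    """Compute the kinetic energy of a point mass.

    Parameters
    ----------
    mass : float
        Mass of the particle in kg (must be non-negative).
    velocity : float
        Speed of the particle in m/s.

    Returns
    -------
    float
        Kinetic energy in J, i.e. 0.5 * mass * velocity**2.

    Examples
    --------
    >>> kinetic_energy(2.0, 3.0)
    9.0
    """
    if mass < 0:
        raise ValueError("mass must be non-negative")
    return 0.5 * mass * velocity**2


# Tests: pytest collects every function whose name starts with "test_".
def test_kinetic_energy_value():
    assert kinetic_energy(2.0, 3.0) == 9.0


def test_negative_mass_raises():
    # Written without pytest.raises so the module has no hard pytest dependency.
    try:
        kinetic_energy(-1.0, 1.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for negative mass")
```

With this layout, running `pytest` in the repository root is the single command that runs both tests, and `pytest --doctest-modules` additionally checks that the docstring example still produces the documented output. That command is what the README should mention under "how to run tests".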

Working with data

The objective of these rules is to ensure:

  1. Data produced in the lab is preserved.
  2. Data can be easily shared in the lab and outside.
  3. Data can be easily reused by other people (this includes future you!).
  4. Data origin can be traced.

Tenets

  1. Your $HOME must be backed up daily, on a schedule, to an external drive or remote server. Periodically check that backups are running and can be restored.
  2. Simulation data for a workflow, pipeline, or paper is grouped into a dataset. Each dataset must be documented with a README file explaining what the data is, how it was generated, and how it can be used.
  3. Whenever possible, datasets must be published to Zenodo at the time of submission, and the dataset DOI cited in the article.
  4. Open-source file formats must be used to store data and metadata. Self-describing file formats are preferred.
  5. All datasets must be uploaded to ??? when leaving the lab.
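Tenets 2 and 4, together with the goal that data origin can be traced, can be supported by storing a small machine-readable record next to each dataset, in addition to its README. Below is a minimal sketch, assuming the dataset lives in a single directory and the simulation code in a Git working tree. The function write_metadata and the file name metadata.json are hypothetical, not an established lab convention.

```python
"""Hypothetical helper: write metadata.json next to a dataset to trace its origin."""
import datetime
import hashlib
import json
import subprocess
from pathlib import Path
from typing import Optional


def git_revision(repo: Path) -> Optional[str]:
    """Return the commit hash checked out in `repo`, or None if unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or `repo` is not a Git working tree
        return None
    return out.stdout.strip()


def sha256(path: Path) -> str:
    """SHA-256 checksum of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_metadata(dataset_dir: Path, code_repo: Path, parameters: dict) -> Path:
    """Record how the files in `dataset_dir` were produced, as plain JSON."""
    files = sorted(
        p for p in dataset_dir.rglob("*")
        if p.is_file() and p.name != "metadata.json"
    )
    metadata = {
        "created": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "code_commit": git_revision(code_repo),  # links data to exact code version
        "parameters": parameters,
        "files": {str(p.relative_to(dataset_dir)): sha256(p) for p in files},
    }
    out = dataset_dir / "metadata.json"
    out.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return out
```

JSON is an open, text-based format, so the record stays readable without special tools. The per-file checksums also give a simple way to verify that a restored backup or a downloaded Zenodo archive matches the original data.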
