Update Workflow management

Lucas FRÉROT 2024-12-15 15:25:34 +00:00
parent d8056fe707
commit d733d8aa09
1 changed files with 27 additions and 11 deletions

@ -32,17 +32,6 @@ acyclic graph*. This allows two things:
These two features together provide a solid step towards reproducible simulation
work.
# GNU Make
Make is a program specifically designed to be a build system, i.e. a tool that
coordinates the compilation of a program's source code so that an executable or
library can be built. Each file of the build process is called a *target* and is
the output of some rule. Although its primary purpose is creating build files,
it can easily be made to manage outputs of simulations. While it has the
advantage of being installed on virtually every Linux machine used for
scientific work, it lacks some features (most notably integration with queue
systems), which makes it practical only for small cases (although I am sure some
shortcomings could be solved with a strong knowledge of Make).
# Snakemake
Snakemake is a tool written in Python to manage rule-based workflows. The
workflow definition is a rather simple text file (usually a `Snakefile`), which
@ -171,3 +160,30 @@ Here is a list of useful features:
the name.
- [Parameter space
exploration](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#parameter-space-exploration).
# GNU Make
Make is a program specifically designed to be a build system, i.e. a tool that
coordinates the compilation of a program's source code so that an executable or
library can be built. Each file of the build process is called a *target* and is
the output of some rule. Although its primary purpose is creating build files,
it can easily be made to manage outputs of simulations. While it has the
advantage of being installed on virtually every Linux machine used for
scientific work, it lacks some features (most notably integration with queue
systems), which makes it practical only for small cases (although I am sure some
shortcomings could be solved with a strong knowledge of Make).
For reference, here is a `Makefile` defining the same rules as the Snakemake example above.
```makefile
# Recipe lines must start with a tab character.

# One input, one output
groups_with_users.txt: /etc/group
	cat $< | awk -F ':' '$$4 != "" { print $$1,$$4; }' > $@

# Multiple outputs with grouped targets (the `&:` syntax needs GNU Make 4.3 or later)
sorted_groups.txt only_users.txt &: groups_with_users.txt
	sort < $< | tee sorted_groups.txt | cut -d ' ' -f 2 > only_users.txt

# Rule with pattern: `$*` expands to the stem matched by `%`
start_with_letter_%.txt: groups_with_users.txt
	grep '^$*' < $< > $@
```
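Because a pattern rule only runs when one of its concrete targets is requested, it can help to name the desired outputs in a single aggregate target. The fragment below is a minimal sketch of that idea, meant to be appended to the `Makefile` above; the target names `all` and `clean` and the letter `a` are assumptions chosen for illustration, not part of the original workflow.

```makefile
# Hypothetical additions: `all`, `clean` and the letter `a` are illustrative.
# Make `all` the default goal even though it is not the first rule in the file.
.DEFAULT_GOAL := all

# These targets do not correspond to files on disk.
.PHONY: all clean

# Request every output, including one instance of the pattern rule
all: sorted_groups.txt only_users.txt start_with_letter_a.txt

# Remove generated files so the workflow can be rerun from scratch
clean:
	rm -f groups_with_users.txt sorted_groups.txt only_users.txt start_with_letter_*.txt
```

With this in place, running `make` builds all listed files and `make start_with_letter_b.txt` builds a single instance of the pattern rule. Note that `grep` exits with a non-zero status when no line matches, so Make reports an error for a letter that matches no group.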