Add Workflow management
parent 702aa177be
commit 62b09bbebf
@ -0,0 +1,8 @@
Scientific simulations are often complex beasts: each step of a simulation requires some input data and a code to run on this input, and produces some output. Together, the steps form an intricate workflow that can be difficult to deploy, even harder to maintain, and downright impossible for a third party to reproduce.

This is why making this process easier is often a good time investment. It requires thinking logically about a workflow in order to split it into simple steps linked only by their input and output data. This alone helps structure a workflow so that it is easier to add, remove or change simulation steps. Workflow management software formalizes this process by letting you define a workflow graph. It also tracks data dependencies, re-runs the steps that need it when their input data changes, and allows the configuration of parameter spaces.

There are a good number of workflow management programs designed for scientific computation. Some run as a complex server process that contains a live description of a workflow. In my experience, deploying these systems is not worth the time investment. Instead, I recommend using a tool called [Snakemake](https://snakemake.github.io/), which is written in Python and is heavily inspired by `make`, a well-established build system. While it has its own faults, I have found it quite useful for running complex simulations.
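To make the idea of "re-running steps when input data changes" concrete, here is a minimal Python sketch of the timestamp check that `make` is known for: an output is considered stale if it is missing or older than any of its inputs. This is an illustration only, not Snakemake's actual implementation; the function name `needs_rerun` and the file names are hypothetical.

```python
import os
import tempfile


def needs_rerun(inputs, output):
    """Return True if `output` is missing or older than any file in `inputs`."""
    if not os.path.exists(output):
        return True
    output_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > output_mtime for path in inputs)


def demo():
    """Walk one step through three states: missing, up to date, stale."""
    with tempfile.TemporaryDirectory() as workdir:
        mesh = os.path.join(workdir, "mesh.txt")      # hypothetical input file
        result = os.path.join(workdir, "result.txt")  # hypothetical output file

        with open(mesh, "w") as handle:
            handle.write("mesh v1\n")
        states = [needs_rerun([mesh], result)]  # output does not exist yet

        with open(result, "w") as handle:
            handle.write("result v1\n")
        # Set modification times explicitly so the demo does not depend on
        # the clock resolution of the file system.
        os.utime(mesh, (1000, 1000))
        os.utime(result, (2000, 2000))
        states.append(needs_rerun([mesh], result))  # output newer than input

        os.utime(mesh, (3000, 3000))  # pretend the input was edited later
        states.append(needs_rerun([mesh], result))  # input newer than output
        return states


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns `[True, False, True]`: the step runs when its output is missing, is skipped when the output is up to date, and runs again once the input has changed.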
# Snakemake

A workflow in Snakemake is defined in a text file called `Snakefile`, the equivalent of Make's `Makefile`. This file defines *rules*. A rule is the basic unit describing a simulation step, and has three main features: the input, how to run the code, and the output. In other words, a rule explains how a given output is generated. Each output can be used as input to another rule, thereby creating a dependency graph (a directed acyclic graph, or DAG). One can then request the creation of a specific output, and the system will work out which rules to execute to produce it.
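The way a requested output is turned into a list of rules to execute can be sketched in a few lines of plain Python. This is a toy model of the idea, not Snakemake's API or its real scheduler: the `Rule` class, the `plan` function, and all rule and file names are hypothetical, and for simplicity any file that already exists is treated as up to date.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str      # label of the simulation step
    inputs: tuple  # files the step reads
    output: str    # file the step produces


def plan(target, rules, existing_files):
    """Return the names of the rules to run, in order, to produce `target`."""
    producers = {rule.output: rule for rule in rules}
    ordered = []         # rule names in execution order
    done = set()         # outputs already scheduled
    in_progress = set()  # outputs being resolved, used to detect cycles

    def visit(path):
        if path in existing_files or path in done:
            return
        if path not in producers:
            raise ValueError(f"no rule produces {path!r}")
        if path in in_progress:
            raise ValueError(f"dependency cycle through {path!r}")
        in_progress.add(path)
        rule = producers[path]
        for dependency in rule.inputs:
            visit(dependency)  # schedule whatever this rule depends on first
        in_progress.discard(path)
        done.add(path)
        ordered.append(rule.name)

    visit(target)
    return ordered


# A hypothetical three-step workflow: mesh -> simulate -> plot.
RULES = [
    Rule("mesh", ("geometry.txt",), "mesh.txt"),
    Rule("simulate", ("mesh.txt", "params.txt"), "result.txt"),
    Rule("plot", ("result.txt",), "figure.png"),
]

if __name__ == "__main__":
    print(plan("figure.png", RULES, {"geometry.txt", "params.txt"}))
```

Requesting `figure.png` with only `geometry.txt` and `params.txt` on disk yields `['mesh', 'simulate', 'plot']`. If `result.txt` already exists, the same request yields only `['plot']`.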