diff --git a/Workflow-management.md b/Workflow-management.md
index 31a8241..61c78be 100644
--- a/Workflow-management.md
+++ b/Workflow-management.md
@@ -65,8 +65,22 @@ rule sort_group_names:
         "only_users.txt", # only contains the user names
     shell:
         "sort < {input[0]} | tee {output[0]} | cut -d ' ' -f 2 > {output[1]}"
+
+rule filter_by_letter:
+    input:
+        rules.list_groups_with_users.output[0]
+    output:
+        "start_with_letter_{letter}.txt", # only groups starting with a letter
+    shell:
+        "grep '^{wildcards.letter}' < {input} > {output}"
 ```
 
+> This example filters the file `/etc/group` (which contains all groups on a
+> Linux system) and writes to three files. The first has the group name and
+> users (created by the first rule). Then the second rule creates a sorted file
+> and a file with the user names only. This rather pointless application shows
+> that it is possible to chain rule inputs and outputs, and to have multiple outputs.
+
 Executing the workflow with the command `snakemake only_users.txt` (to tell it
 to generate the `only_users.txt` file) should execute both rules, with an output
 similar to:
@@ -122,13 +136,38 @@ outputs (which are numbered from `0` to `N` by default, and can be named).
 The `shell` directive specifies that we want to run a shell command. This is the
 most flexible option. Alternatively one can use the `run` directive and write
 inline python code directly in the `Snakefile`, the `script` directive, which
-specifies the name of a Python (or another language) script to be run (Snakemake
-creates a context for this script which allows it to access the input and output
-objects), or finally the `notebook` directive, similar to the `script`
-directive, for which Snakemake allows interactive execution (useful for
-postprocessing/data exploration).
+specifies the name of a Python (or another language)
+[script](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#external-scripts)
+to be run (Snakemake creates a context for this script which allows it to access
+the input and output objects), or finally the [`notebook`
+directive](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#jupyter-notebook-integration),
+similar to the `script` directive, for which Snakemake allows interactive
+execution (useful for postprocessing/data exploration).
 
 Reading the
 [documentation](https://snakemake.readthedocs.io/en/stable/index.html)
 is highly recommended. Although the examples are often biology oriented, the
 features they demonstrate are easily transposed to a mechanics environment.
+
+Here is a list of useful features:
+
+- [Wildcards](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#wildcards)
+  allow one to specify parameter values from file names. In the example above,
+  running `snakemake start_with_letter_m.txt` will replace the
+  `{wildcards.letter}` in the `shell` directive with `m`. This is very useful to
+  distinguish output files based on parameter values. Multiple wildcards can be
+  used in the same rule.
+- [Expansion](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#the-expand-function)
+  allows one to specify a range of values for a wildcard. This is useful to
+  explore a parametric space, or to
+  [aggregate](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#defining-scatter-gather-processes)
+  the data of several values for one wildcard.
+- [Rule
+  parameters](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules)
+  allow one to specify additional parameters (i.e. non-file inputs) to rules.
+- [Rule
+  dependencies](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#rule-dependencies)
+  allow using the output of a rule as input to another without having to specify
+  the file name.
+- [Parameter space
+  exploration](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#parameter-space-exploration).
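
The expansion feature in the list can be pictured with a plain-Python sketch. This is not Snakemake's implementation: `expand_pattern` is a hypothetical helper that only mimics the default behaviour of `expand` (one file name per combination of the supplied wildcard values), and the letters are chosen purely for illustration.

```python
from itertools import product


def expand_pattern(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values.

    Hypothetical helper for illustration; Snakemake provides its own `expand`.
    """
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


# Illustrative letters: one target file per value of the `letter` wildcard.
targets = expand_pattern("start_with_letter_{letter}.txt", letter=["a", "m", "s"])
print(targets)
```

In a Snakefile, the corresponding call `expand("start_with_letter_{letter}.txt", letter=["a", "m", "s"])` would typically sit in the `input` of an aggregating rule, so that requesting that rule triggers `filter_by_letter` once per letter.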