added list of useful features

Lucas Frérot 2024-12-11 15:37:12 +01:00
parent ce5cdc9871
commit 7832746e90
No known key found for this signature in database
GPG Key ID: 03B54A50E3FBA7E8
1 changed files with 44 additions and 5 deletions

@@ -65,8 +65,22 @@ rule sort_group_names:
"only_users.txt", # only contains the user names
shell:
"sort < {input[0]} | tee {output[0]} | cut -d ' ' -f 2 > {output[1]}"
rule filter_by_letter:
input:
rules.list_groups_with_users.output[0]
output:
"start_with_letter_{letter}.txt", # only groups starting with a letter
shell:
"grep '^{wildcards.letter}' < {input} > {output}"
```
> This example filters the file `/etc/group` (which contains all groups on a Linux
> system) and writes to three files. The first has the group name and users
> (created by the first rule). Then the second rule creates a sorted file and a
> file with the user names only. This rather pointless application shows that it
> is possible to chain rule inputs and outputs, and to have multiple outputs.
Executing the workflow with the command `snakemake only_users.txt` (to tell it
to generate the `only_users.txt` file) should execute both rules, with an output
similar to:
@@ -122,13 +136,38 @@ outputs (which are numbered from `0` to `N` by default, and can be named). The
`shell` directive specifies that we want to run a shell command. This is the
most flexible option. Alternatively one can use the `run` directive and write
inline Python code directly in the `Snakefile`, the `script` directive, which
specifies the name of a Python (or another language) script to be run (Snakemake
creates a context for this script which allows it to access the input and output
objects), or finally the `notebook` directive, similar to the `script`
directive, for which Snakemake allows interactive execution (useful for
postprocessing/data exploration).
specifies the name of a Python (or another language)
[script](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#external-scripts)
to be run (Snakemake creates a context for this script which allows it to access
the input and output objects), or finally the [`notebook`
directive](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#jupyter-notebook-integration),
similar to the `script` directive, for which Snakemake allows interactive
execution (useful for postprocessing/data exploration).
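
As a minimal sketch of the `script` directive described above, the following Python file could be referenced by a rule. The file name `count_groups.py`, the rule `count_groups`, the output `group_count.txt` and the helper `count_lines` are all hypothetical names chosen for illustration; only the global `snakemake` object, with its `input` and `output` attributes, is provided by Snakemake when it runs the script.

```python
# count_groups.py: hypothetical script, meant to be used by a rule such as
#
#   rule count_groups:
#       input: rules.sort_group_names.output[0]
#       output: "group_count.txt"
#       script: "count_groups.py"


def count_lines(src_path, dst_path):
    """Write the number of lines of src_path into dst_path and return it."""
    with open(src_path) as src:
        n_lines = sum(1 for _ in src)
    with open(dst_path, "w") as dst:
        dst.write(f"{n_lines}\n")
    return n_lines


# Snakemake injects a global `snakemake` object when it executes the script;
# the guard lets the file be imported or run outside of a workflow as well.
if "snakemake" in globals():
    count_lines(snakemake.input[0], snakemake.output[0])
```

Keeping the logic in a plain function and only touching the `snakemake` object in the last lines makes the script testable without running the workflow.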
Reading the
[documentation](https://snakemake.readthedocs.io/en/stable/index.html) is highly
recommended. Although the examples are often biology-oriented, the features they
demonstrate transfer easily to a mechanics environment.
Here is a list of useful features:
- [Wildcards](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#wildcards)
allow one to infer parameter values from file names. In the example above,
running `snakemake start_with_letter_m.txt` will replace the
`{wildcards.letter}` in the `shell` directive with `m`. This is very useful to
distinguish output files based on parameter values. Multiple wildcards can be
used in the same rule.
- [Expansion](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#the-expand-function)
allows one to specify a range of values for a wildcard. This is useful to explore
a parametric space, or
[aggregate](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#defining-scatter-gather-processes)
the data of several values for one wildcard.
- [Rule
parameters](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules)
allow one to specify additional parameters (i.e. non-file inputs) to rules.
- [Rule
dependencies](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#rule-dependencies)
allow using the output of a rule as input to another without having to repeat
the file name.
- [Parameter space
exploration](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#parameter-space-exploration).
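
To make the first two items of the list more concrete, here is a plain-Python sketch of what wildcard matching and expansion compute. It is not Snakemake's implementation: the functions `match_wildcards` and `expand_pattern` are hypothetical helpers written for illustration, and the sketch assumes each wildcard name appears only once in a pattern.

```python
import re
from itertools import product


def match_wildcards(pattern, filename):
    """Return a dict of wildcard values if filename matches pattern, else None."""
    # Splitting on {name} yields literal text at even indices and
    # wildcard names at odd indices.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2:
            regex += f"(?P<{part}>.+)"  # a wildcard matches any non-empty text
        else:
            regex += re.escape(part)
    match = re.fullmatch(regex, filename)
    return match.groupdict() if match else None


def expand_pattern(pattern, **values):
    """Fill the pattern with every combination of the given wildcard values."""
    names = list(values)
    combinations = product(*(values[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combinations]


pattern = "start_with_letter_{letter}.txt"
print(match_wildcards(pattern, "start_with_letter_m.txt"))
print(expand_pattern(pattern, letter=["a", "m"]))
```

In a `Snakefile`, the second call corresponds to `expand("start_with_letter_{letter}.txt", letter=["a", "m"])`, typically placed in the `input` of an aggregating rule.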