3 Snakemake Tutorial
Lucas Frérot edited this page 2026-01-14 14:31:58 +01:00

Snakemake tutorial : free falling particle

The computation

We'll study the equation for the velocity of a free falling particle with viscous friction, in adimensional form:

 \frac{\mathrm d v}{\mathrm dt} = 1 - \eta v,\qquad v(0) = 0 

We will vary the physical parameter \eta, the time step \Delta t for the numerical integration scheme and the number of time steps.

The following function solves the ODE numerically:

import numpy as np

def solve_free_fall(eta, dt, total_time):
    """
    Solve free fall equation with centered FD scheme.

    Returns time and velocity values as Numpy arrays.
    """

    Nsteps = int(total_time / dt)
    v = np.zeros(Nsteps)
    dv = 1

    for i in range(Nsteps-1):
        vpred = v[i] + dt / 2 * dv
        dv = (1 - eta * vpred) / (1 + eta * dt / 2)
        v[i+1] = vpred + dt / 2 * dv
    return np.vstack((np.arange(Nsteps) * dt, v))

It returns a single array where the first line is time and the second line is velocity.

Save the above function in a simulation.py file.

Steps

We want to do three things for our tutorial study:

  • For any value of eta, dt and total_time, run the computation and save the output data (we'll call it the simulation step)
  • For any computation, plot the velocity as function of time and save the plot file (the plot step)
  • For a chosen range of eta, plot the terminal velocity as a function of eta (the aggregation step)

Each of these three things constitutes a workflow step, which means we define a rule for each. Each will also have its own script.

Since the first two steps do not specify which values are used for eta, dt and total_time, these three parameters will be wildcards, i.e. values that will be specified by the user when an output is requested (e.g. the user wants the time-velocity plot for eta=0.1, dt=1e-3, total_time=20).

Rules

Snakemake rules (defined in a Snakefile text file) typically define three things for a step:

  1. The input files (optional)
  2. The output files
  3. What needs to be executed to generate the output (e.g. a script or shell command)

The first simulation step does not require any input other than the wildcard parameter values, so we can write its rule in the Snakefile as:

rule simulation:
    output:
        "free_fall,eta={eta},dt={dt},total_time={total_time}.txt"
    script:
        "simulation.py"

Here the output file is defined as "free_fall,eta={eta},dt={dt},total_time={total_time}.txt". The {...} syntax indicates a wildcard parameter (here eta, total_time and dt), but the rest of the output path is free (e.g. the eta= part could be omitted).

One can request the output file by running:

snakemake -j1 free_fall,eta=0.5,dt=1e-3,total_time=20.txt

This command will execute the simulation rule with wildcard replaced by the specified values. Since simulation.py only contains a function, nothing happens. Let's see how to retrieve the wildcard values and output paths from the script file.

Snakemake context

A running script launched with Snakemake has access to the workflow context with the snakemake.script.snakemake object, which we can import in our simulation.py script:

from snakemake.script import snakemake

You can print the following properties of the context object:

print("input", snakemake.input)
print("output", snakemake.output)
print("wildcards", dict(snakemake.wildcards))
print("parameters", dict(snakemake.params))

We can test this with:

snakemake -j1 free_fall,eta=1,dt=1e-1,total_time=10.txt

< ... >

input
output free_fall,eta=1,dt=1e-1,total_time=10.txt
wildcards {'eta': '1', 'dt': '1e-1', 'total_time': '10'}
parameters {}

Note that the snakemake.wildcards object can be interpreted as a dictionary, with each value associated to a wildcard being a string (note that the values here are not numbers). This means numerical values need to be converted before being passed on to our function.

Now that we know how to retrieve context values, we can use them to run our simulation and save the result in a text file:

wildcards = snakemake.wildcards
eta = float(wildcards.eta)
dt = float(wildcards.dt)
total_time = float(wildcards.total_time)

assert eta >= 0 and dt > 0 and total_time > 0

results = solve_free_fall(eta, dt, total_time)
np.savetxt(snakemake.output[0], results)

Note that snakemake.output is always a list. Since we only have one output, we take snakemake.output[0].

New rule: plotting

We now want to implement the second part of the workflow: plotting the results of a single simulation.

The rule we will implement will have an input file, which is the same as the output of the previous rule. Snakemake makes this easy with a rules object which is used to access context from other rules (such as output files). One must be careful: wildcards need to be conserved between rules. The number of wildcards in the output of a rule can never be smaller than the number of wildcards in the input (although it can be larger). This means we have to reuse the wildcards in the filename for the plot.

Let us also say we want to change the color of the plot. Since this is not really a simulation parameter, we will add a color parameter to the rule itself. The advantage is that to change the parameter value one can quickly look at the rule, there is no need to go read the source code.

rule simulation_plot:
    input:
        data=rules.simulation.output[0],
    output:
        plot_file='time_series,eta={eta},dt={dt},total_time={total_time}.pdf'
    params:
        plot_color='red'
    script:
        "simulation_plot.py"

Note that we have used named attributes for input, output, etc. This means we can use the names data, plot_file and plot_color in the script, like so:

import matplotlib.pyplot as plt
import numpy as np

from snakemake.script import snakemake

data = np.loadtxt(snakemake.input.data)

plt.plot(data[0], data[1], color=snakemake.params.plot_color)
plt.xlabel('$t$')
plt.ylabel('$v(t)$')
plt.savefig(snakemake.output.plot_file)

You can generate the plot with:

snakemake -j1 time_series,eta=1,dt=1e-1,total_time=10.pdf

Notice that only the plot step gets executed if the simulation data is up-to-date.