Snakemake tutorial : free falling particle
The computation
We'll study the equation for the velocity of a free falling particle with viscous friction, in adimensional form:
\frac{\mathrm d v}{\mathrm dt} = 1 - \eta v,\qquad v(0) = 0
We will vary the physical parameter \eta, the time step \Delta t for the
numerical integration scheme and the number of time steps.
The following function solves the ODE numerically:
import numpy as np
def solve_free_fall(eta, dt, total_time):
"""
Solve free fall equation with centered FD scheme.
Returns time and velocity values as Numpy arrays.
"""
Nsteps = int(total_time / dt)
v = np.zeros(Nsteps)
dv = 1
for i in range(Nsteps-1):
vpred = v[i] + dt / 2 * dv
dv = (1 - eta * vpred) / (1 + eta * dt / 2)
v[i+1] = vpred + dt / 2 * dv
return np.vstack((np.arange(Nsteps) * dt, v))
It returns a single array where the first line is time and the second line is velocity.
Save the above function in a simulation.py file.
Steps
We want to do three things for our tutorial study:
- For any value of
eta,dtandtotal_time, run the computation and save the output data (we'll call it the simulation step) - For any computation, plot the velocity as function of time and save the plot file (the plot step)
- For a chosen range of
eta, plot the terminal velocity as a function ofeta(the aggregation step)
Each of these three things constitutes a workflow step, which means we define a rule for each. Each will also have its own script.
Since the first two steps do not specify which values are used for eta, dt
and total_time, these three parameters will be wildcards, i.e. values that
will be specified by the user when an output is requested (e.g. the user wants
the time-velocity plot for eta=0.1, dt=1e-3, total_time=20).
Rules
Snakemake rules (defined in a Snakefile text file) typically define three
things for a step:
- The input files (optional)
- The output files
- What needs to be executed to generate the output (e.g. a script or shell command)
The first simulation step does not require any input other than the wildcard
parameter values, so we can write its rule in the Snakefile as:
rule simulation:
output:
"free_fall,eta={eta},dt={dt},total_time={total_time}.txt"
script:
"simulation.py"
Here the output file is defined as
"free_fall,eta={eta},dt={dt},total_time={total_time}.txt". The {...} syntax
indicates a wildcard parameter (here eta, total_time and dt), but the
rest of the output path is free (e.g. the eta= part could be omitted).
One can request the output file by running:
snakemake -j1 free_fall,eta=0.5,dt=1e-3,total_time=20.txt
This command will execute the simulation rule with wildcard replaced by the
specified values. Since simulation.py only contains a function, nothing
happens. Let's see how to retrieve the wildcard values and output paths from the
script file.
Snakemake context
A running script launched with Snakemake has access to the workflow context with
the snakemake.script.snakemake object, which we can import in our
simulation.py script:
from snakemake.script import snakemake
You can print the following properties of the context object:
print("input", snakemake.input)
print("output", snakemake.output)
print("wildcards", dict(snakemake.wildcards))
print("parameters", dict(snakemake.params))
We can test this with:
snakemake -j1 free_fall,eta=1,dt=1e-1,total_time=10.txt
< ... >
input
output free_fall,eta=1,dt=1e-1,total_time=10.txt
wildcards {'eta': '1', 'dt': '1e-1', 'total_time': '10'}
parameters {}
Note that the snakemake.wildcards object can be interpreted as a dictionary,
with each value associated to a wildcard being a string (note that the values
here are not numbers). This means numerical values need to be converted before
being passed on to our function.
Now that we know how to retrieve context values, we can use them to run our simulation and save the result in a text file:
wildcards = snakemake.wildcards
eta = float(wildcards.eta)
dt = float(wildcards.dt)
total_time = float(wildcards.total_time)
assert eta >= 0 and dt > 0 and total_time > 0
results = solve_free_fall(eta, dt, total_time)
np.savetxt(snakemake.output[0], results)
Note that snakemake.output is always a list. Since we only have one output, we
take snakemake.output[0].
New rule: plotting
We now want to implement the second part of the workflow: plotting the results of a single simulation.
The rule we will implement will have an input file, which is the same as the
output of the previous rule. Snakemake makes this easy with a rules object
which is used to access context from other rules (such as output files). One
must be careful: wildcards need to be conserved between rules. The number of
wildcards in the output of a rule can never be smaller than the number of
wildcards in the input (although it can be larger). This means we have to reuse
the wildcards in the filename for the plot.
Let us also say we want to change the color of the plot. Since this is not
really a simulation parameter, we will add a color parameter to the rule
itself. The advantage is that to change the parameter value one can quickly look
at the rule, there is no need to go read the source code.
rule simulation_plot:
input:
data=rules.simulation.output[0],
output:
plot_file='time_series,eta={eta},dt={dt},total_time={total_time}.pdf'
params:
plot_color='red'
script:
"simulation_plot.py"
Note that we have used named attributes for input, output, etc. This means we
can use the names data, plot_file and plot_color in the script, like so:
import matplotlib.pyplot as plt
import numpy as np
from snakemake.script import snakemake
data = np.loadtxt(snakemake.input.data)
plt.plot(data[0], data[1], color=snakemake.params.plot_color)
plt.xlabel('$t$')
plt.ylabel('$v(t)$')
plt.savefig(snakemake.output.plot_file)
You can generate the plot with:
snakemake -j1 time_series,eta=1,dt=1e-1,total_time=10.pdf
Notice that only the plot step gets executed if the simulation data is up-to-date.