Basic elements of an IMAGINE pipeline

In this tutorial, we focus on introducing the basic building blocks of the IMAGINE package and how to use them for assembling a Bayesian analysis pipeline.

We will use mock data with only two independent free parameters. First, we will generate the mock data. Then we will assemble all elements needed for the IMAGINE pipeline, execute the pipeline and investigate its results.

The mock data are designed to “naively” mimic Faraday depth, which depends linearly on the (Galactic) magnetic field and on the thermal electron density. As a function of position \(x\), we define a constant coherent magnetic field component \(a_0\) and a random magnetic field component drawn from a Gaussian distribution with standard deviation \(b_0\). The electron density is assumed to be independently known and to follow a \(1+\cos(x)\) profile with arbitrary scaling. The mock data values correspond to the Faraday depth of a background source at some arbitrary distance:

\[signal(x) = \left[1+\cos(x)\right] \times \mathcal{G}(\mu=a_0,\sigma=b_0;\,{\rm seed}=s)\;\mu{\rm G}\,{\rm cm}^{-3} , \quad x \in [0,2\pi]\,{\rm kpc}\]

where \(\{a_0,b_0\}\) is the ‘physical’ parameter set, and \(s\) represents the seed for random variable generation.

The purpose is not to fit the exact signal, since it includes a stochastic component, but to fit the amplitude of the signal and of the variations around it, i.e. the strength of the coherent field \(a_0\) and the amplitude of the random field \(b_0\). With these mock data and their covariance matrix, we shall assemble the IMAGINE pipeline, execute it and examine its results.

First, import the necessary packages.

[1]:
import numpy as np
import logging as log
from astropy.table import Table
import astropy.units as u
import corner
import matplotlib.pyplot as plt

import imagine as img

%matplotlib inline

1) Preparing the mock data

In calculating the mock data values, we introduce noise as:

\[data(x) = signal(x) + noise(x)\]

For simplicity, we use Gaussian noise with zero mean and standard deviation \(e\):

\[noise(x) = \mathcal{G}(\mu=0,\sigma=e)\,.\]

We will assume that we have 10 points in the x-direction, in the range \([0, 2\pi]\,\rm kpc\).

[2]:
a0 = 3. # true value of a in microgauss
b0 = 6. # true value of b in microgauss
e = 0.1 # std of gaussian measurement error
s = 233 # seed fixed for signal field

size = 10 # number of measurements
x = np.linspace(0.01, 2.*np.pi-0.01, size) # positions at which the observer measures

np.random.seed(s) # set seed for signal field

signal = (1+np.cos(x)) * np.random.normal(loc=a0,scale=b0,size=size)

fd = signal + np.random.normal(loc=0.,scale=e,size=size)

# We load these to an astropy table for illustration/visualisation
data = Table({'meas' : fd,
              'err': np.ones_like(fd)*e,
              'x': x,
              'y': np.zeros_like(fd),
              'z': np.zeros_like(fd),
              'other': np.ones_like(fd)*42
              })
data[:4] # Shows the first 4 points in tabular form
[2]:
Table length=4
        meas          err          x           y       z    other
      float64       float64    float64     float64 float64 float64
------------------- ------- ------------------ ------- ------- -------
   16.4217790817552     0.1               0.01     0.0     0.0    42.0
  7.172468731201507     0.1 0.7059094785755097     0.0     0.0    42.0
-3.2254947821460433     0.1 1.4018189571510193     0.0     0.0    42.0
0.27949334758966465     0.1 2.0977284357265287     0.0     0.0    42.0

These data need to be converted to an IMAGINE-compatible format. To do this, we first create a TabularDataset object, which helps import dictionary-like datasets into IMAGINE.

[3]:
fd_units = u.microgauss*u.cm**-3

mockDataset = img.observables.TabularDataset(data, name='test',
                                             data_column='meas',
                                             coordinates_type='cartesian',
                                             x_column='x', y_column='y',
                                             z_column='z', error_column='err',
                                             units=fd_units)

These lines simply tell IMAGINE how to read the tabular dataset (note that the ‘other’ column is ignored). The name contains the type of observable we are using: here we use ‘test’, but it could also be, e.g., ‘sync’ for synchrotron observables (Stokes parameters) or ‘fd’ for Faraday depth. The data_column argument specifies the name of the column containing the relevant measurement. Coordinates can be either cartesian, as in this example, which requires specifying columns for \(x\), \(y\) and \(z\) in \(\rm kpc\), or galactic, which requires setting the arguments lat_column and lon_column, both in degrees. The units of the dataset are represented using astropy.units objects and must be supplied (the Simulator will later check whether these are adequate and automatically convert the data to other units if needed).
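
For illustration only, a dataset using galactic coordinates would be set up along the following lines. This is just a sketch: the latitude/longitude values below are made up, and we reuse the same mock measurements purely to show the lat_column/lon_column arguments mentioned above.

gal_data = Table({'meas': fd,
                  'err': np.ones_like(fd)*e,
                  'lat': np.linspace(-60, 60, size),   # made-up latitudes, in degrees
                  'lon': np.linspace(0, 350, size)})   # made-up longitudes, in degrees

galDataset = img.observables.TabularDataset(gal_data, name='test',
                                            data_column='meas',
                                            coordinates_type='galactic',
                                            lat_column='lat', lon_column='lon',
                                            error_column='err',
                                            units=fd_units)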

The dataset can be loaded into Measurements and Covariances objects, which are subclasses of ObservableDict. These objects allow one to supply multiple datasets to the pipeline.

[4]:
mock_data = img.observables.Measurements() # create empty Measurements object
mock_data.append(dataset=mockDataset)

mock_cov = img.observables.Covariances() # create empty Covariance object
mock_cov.append(dataset=mockDataset)

The ObservableDict creates a standard key for each appended dataset. In our case, there is only one key.

[5]:
keys = list(mock_data.keys())
keys
[5]:
[('test', 'nan', 'tab', 'nan')]

Let us plot the mock data along with the underlying smooth variation, \(a_0\left[1+\cos(x)\right]\).

The global_data property extracts the data array from the Observable object stored inside the ObservableDict.

[6]:
plt.scatter(x, mock_data[keys[0]].global_data[0], marker='.', label='mock data')
plt.plot(x, (1+np.cos(x))*a0, 'r--', label=r'$a_0\,[1+\cos(x)]$')
plt.xlabel('x'); plt.legend();
_images/tutorial_one_15_0.png

Note that the variance in the signal is highest where \(1+\cos(x)\) is largest. This is the way we expect the Faraday depth to behave, since a fluctuation in the strength of \(\mathbf B\) has a larger effect on the RM where \(n_e\) also happens to be higher.

2) Pipeline assembly

Now that we have generated mock data, there are a few steps to set up the pipeline to estimate the input parameters. We need to specify: a grid, Field Factories, Simulators, and Likelihoods.

Setting the coordinate grid

Fields in IMAGINE represent models of any kind of physical field – in this particular tutorial, we will need a magnetic field and thermal electron density.

The Fields are evaluated on a grid of coordinates, represented by an img.Grid object. Here we exemplify how to produce a regular cartesian grid. To do so, we need to specify the values of the coordinates at the 6 extremities of the box (i.e. the minimum and maximum value of each coordinate) and the resolution along each dimension.

For this particular artificial example, we actually only need one dimension, so we set the resolution to 1 for \(y\) and \(z\).

[7]:
one_d_grid = img.fields.UniformGrid(box=[[0,2*np.pi]*u.kpc,
                                         [0,0]*u.kpc,
                                         [0,0]*u.kpc],
                                    resolution=[30,1,1])
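
If needed, the resulting coordinate arrays can be inspected directly. A quick check, assuming (as in the IMAGINE documentation) that the grid exposes them through its coordinates property:

print(one_d_grid.coordinates['x'].shape)    # expected: (30, 1, 1)
print(one_d_grid.coordinates['x'].ravel())  # 30 points spanning [0, 2*pi] kpc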

Preparing the Field Factory list

A particular realisation of a model for a physical field is represented within IMAGINE by a Field object which, given a set of parameters, evaluates the field over the grid.

A Field Factory is an associated piece of infrastructure used by the Pipeline to produce new Fields. It is a Factory object that needs to be initialized and supplied to the Pipeline. This is what we will illustrate here.

[8]:
from imagine import fields
ne_factory = fields.CosThermalElectronDensityFactory(grid=one_d_grid)

The previous line instantiates CosThermalElectronDensityFactory with the previously defined Grid object. This Factory allows the Pipeline to produce CosThermalElectronDensity objects. These correspond to a toy model for electron density with the form:

\[n_e(x,y,z) = n_0 [1+\cos (a x + \alpha)][1+\cos (b y + \beta)][1+\cos(c z + \gamma)]\,.\]

We can set and check the default parameter values in the following way:

[9]:
ne_factory.default_parameters= {'a': 1*u.rad/u.kpc,
                                'beta':  np.pi/2*u.rad,
                                'gamma': np.pi/2*u.rad}
ne_factory.default_parameters
[9]:
{'n0': <Quantity 1. 1 / cm3>,
 'a': <Quantity 1. rad / kpc>,
 'b': <Quantity 0. rad / kpc>,
 'c': <Quantity 0. rad / kpc>,
 'alpha': <Quantity 0. rad>,
 'beta': <Quantity 1.57079633 rad>,
 'gamma': <Quantity 1.57079633 rad>}
[10]:
ne_factory.active_parameters
[10]:
()

For ne_factory, no active parameters were set. This means that the Field will always be evaluated using the default parameter values specified above. Note that, with \(b=c=0\) and \(\beta=\gamma=\pi/2\), the \(y\) and \(z\) factors reduce to unity, so these defaults correspond to \(n_e(x) = n_0[1+\cos(x)]\), matching the profile used to generate the mock data.

We will now define the magnetic field in a similar way, using NaiveGaussianMagneticField, which constructs a “naive” random field (i.e. the magnitudes of the \(x\), \(y\) and \(z\) components of the field are drawn from a Gaussian distribution without imposing zero divergence; thus, do not use this for serious applications).

[11]:
B_factory = fields.NaiveGaussianMagneticFieldFactory(grid=one_d_grid)

Unlike in the case of ne_factory, here we would like to make the parameters active. All individual components of the field are drawn from a Gaussian distribution with mean \(a_0\) and standard deviation \(b_0\). To set these parameters as active we do:

[12]:
B_factory.active_parameters = ('a0','b0')
B_factory.priors ={'a0': img.priors.FlatPrior(interval=[-5,5]*u.microgauss),
                   'b0': img.priors.FlatPrior(interval=[0,10]*u.microgauss)}

In the lines above we chose uniform (flat) priors for both parameters within the above specified ranges. Any active parameter must have a Prior distribution specified.
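
To make explicit what the flat prior means in practice: in a nested-sampling setting, a prior can be viewed as a mapping from a uniform random number \(u \in [0,1]\) onto the parameter range. Below is a minimal, purely illustrative sketch (this is not the FlatPrior implementation, which additionally handles units):

def flat_prior_transform(u, pmin, pmax):
    # Map a unit-interval draw onto a flat prior's interval
    return pmin + u * (pmax - pmin)

flat_prior_transform(0.5, -5, 5)  # -> 0.0, the midpoint of the a0 interval above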

Once the two FieldFactory objects are prepared, they are put together in a list, which is later supplied to the Pipeline.

[13]:
factory_list = [ne_factory, B_factory]

Initializing the Simulator

For this tutorial, we use a customized TestSimulator, which simply computes the quantity \(t(x,y,z) = B_y\,n_e\), i.e. the contribution at one specific point to the Faraday depth.

The simulator is initialized with the mock Measurements defined before, which lets it know the correct format for its output.

[14]:
from imagine.simulators import TestSimulator
simer = TestSimulator(mock_data)
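
To make the forward model concrete, the quantity being simulated in this toy setup can be sketched in plain numpy as follows. This is illustrative only: it assumes the \(1+\cos(x)\) electron density and Gaussian \(B_y\) used throughout this tutorial, while the actual computation is performed by the TestSimulator on the grid, for an ensemble of field realisations.

def toy_observable(x, a0, b0, rng=None):
    # t(x) = B_y(x) * n_e(x): the toy contribution to the Faraday depth
    rng = np.random.default_rng() if rng is None else rng
    n_e = 1 + np.cos(x)                                # toy thermal electron density
    B_y = rng.normal(loc=a0, scale=b0, size=x.shape)   # random y-component of B
    return n_e * B_y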

Initializing the Likelihood

IMAGINE provides two Likelihood classes: SimpleLikelihood and EnsembleLikelihood. The SimpleLikelihood is what you would expect: it computes a single \(\chi^2\) from the difference between the simulated and the measured datasets. The EnsembleLikelihood is how IMAGINE handles a signal which itself includes a stochastic component, e.g. what we call the Galactic variance. This likelihood module uses a finite ensemble of simulated realisations and compares their mean and covariance to the measured dataset.

[15]:
likelihood = img.likelihoods.EnsembleLikelihood(mock_data, mock_cov)
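
The idea behind the ensemble likelihood can be summarised with a short, purely illustrative numpy sketch (this is not the EnsembleLikelihood implementation): the measured data are compared with the ensemble mean, weighting the residuals by the sum of the measurement covariance and the Galactic covariance estimated from the ensemble.

def ensemble_log_likelihood(measured, measured_cov, simulated_ensemble):
    # simulated_ensemble has shape (ensemble_size, n_data)
    mean = simulated_ensemble.mean(axis=0)
    galactic_cov = np.cov(simulated_ensemble, rowvar=False)  # Galactic variance estimate
    total_cov = measured_cov + galactic_cov
    residual = measured - mean
    sign, logdet = np.linalg.slogdet(total_cov)
    chi2 = residual @ np.linalg.solve(total_cov, residual)
    return -0.5 * (chi2 + logdet + residual.size * np.log(2*np.pi))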

3) Running the pipeline

Now we have all the necessary components available to run our pipeline. This can be done through a Pipeline object, which interfaces with some algorithm to sample the likelihood space accounting for the prescribed prior distributions for the parameters.

IMAGINE comes with a range of samplers coded as different Pipeline classes, most of which are based on the nested sampling approach. In what follows we will use the UltraNest sampler as an example.

IMAGINE takes care of stochastic fields by evaluating an ensemble of random realisations for each selected point in the parameter space, and computing the associated covariance (i.e. estimating the Galactic variance). We can set this up through the ensemble_size argument.

Now we are ready to initialize our final pipeline.

[16]:
pipeline = img.pipelines.UltranestPipeline(simulator=simer,
                                           factory_list=factory_list,
                                           likelihood=likelihood,
                                           ensemble_size=150,
                                           chains_directory=None)

The chains_directory keyword may contain a path where the chains generated by the sampler are saved in the sampler’s native format (if the sampler supports this). If this argument is absent or set to None, a temporary directory is created in the current working directory and automatically removed once the Pipeline object goes out of scope (e.g. when one runs del pipeline or exits the script/notebook). After initialization, we can check this choice by inspecting the chains_directory property:

[17]:
pipeline.chains_directory
[17]:
'/home/lfsr/IMAGINE/imagine-pipeline/tutorials/imagine_chains_2i8qk58j'
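
If one wishes to keep the sampler's native output, a path can be supplied instead of None. A hypothetical example (the directory name is made up, and depending on the IMAGINE version it may need to exist beforehand):

pipeline_keep = img.pipelines.UltranestPipeline(simulator=simer,
                                                factory_list=factory_list,
                                                likelihood=likelihood,
                                                ensemble_size=150,
                                                chains_directory='my_chains')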

The property sampling_controllers allows one to send sampler-specific parameters to the chosen Pipeline. Each IMAGINE Pipeline object will have slightly different sampling controllers, which can be found in the specific Pipeline’s docstring.

[18]:
pipeline.sampling_controllers = {'max_ncalls': 500, 'min_num_live_points': 50}

The UltranestPipeline, for example, allows one to limit the number of likelihood evaluations using the sampling controller 'max_ncalls'. This is convenient for quick tests and examples (such as this tutorial). Similar functionality is available in the DynestyPipeline under the name 'max_call' (and also 'max_call_init'/'max_call_batch' for dynamic nested sampling).

In a production run, however, one would rather set a target uncertainty for the estimated model evidence and/or for the posterior, and allow the sampler to converge to it. This can be done using the following sampling controllers (do check the individual docs for details):

  • In UltranestPipeline, this is done using 'dlogZ' for the target log-evidence error and 'dKL' for the target posterior error.
  • In MultinestPipeline, one can only control the target log-evidence error, using 'evidence_tolerance'.
  • In DynestyPipeline, the log-evidence error can be controlled using 'dlogz' (also 'dlogz_init'), and the sampler can additionally try to optimize the estimate of the posterior while running in dynamic mode.
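
As an illustration, a convergence-targeted configuration for the UltranestPipeline could look like the snippet below. It is not executed in this tutorial (we keep the quick 'max_ncalls' setting above); the controller names are those quoted in the list above and the threshold values are arbitrary, so do verify the exact spelling and sensible values in the pipeline's docstring.

pipeline.sampling_controllers = {'dlogZ': 0.5,  # target uncertainty on the log-evidence
                                 'dKL': 0.1}    # target uncertainty on the posterior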

Now we can finally run the pipeline!

[19]:
results = pipeline()
[ultranest] PointStore: have 0 items
INFO:ultranest:PointStore: have 0 items
[ultranest] Sampling 50 live points from prior ...
INFO:ultranest:Sampling 50 live points from prior ...
DEBUG:ultranest:minimal_widths_sequence: [(-inf, 50.0), (inf, 50.0)]
[ultranest] Explored until L=-3e+01   [-2211.1490..-30.6137] | it/evals=144/500 eff=32.0000% N=50
INFO:ultranest:Explored until L=-3e+01
[ultranest] Likelihood function evaluations: 500
INFO:ultranest:Likelihood function evaluations: 500
[ultranest] Writing samples and results to disk ...
INFO:ultranest:Writing samples and results to disk ...
[ultranest] Writing samples and results to disk ... done
INFO:ultranest:Writing samples and results to disk ... done
DEBUG:ultranest:did a run_iter pass!
[ultranest] Reached maximum number of likelihood calls (500 > 500)...
INFO:ultranest:Reached maximum number of likelihood calls (500 > 500)...
[ultranest] done iterating.
INFO:ultranest:done iterating.

When one runs the pipeline, it returns a results dictionary in the native format of the chosen sampler. Alternatively, after the run, the results can also be accessed through the pipeline object’s attributes, which provide a standard interface (i.e. all pipelines work in the same way).

For comparing different models, the quantity of interest is the model evidence or marginal likelihood. After a run, this can be easily accessed as follows.

[20]:
print('log evidence:', round(pipeline.log_evidence,4))
print('log evidence error:', round(pipeline.log_evidence_err,4))
log evidence: -32.3431
log evidence error: 0.382
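
Since the log-evidence is the quantity used for model comparison, two competing models can be ranked through their log Bayes factor, i.e. the difference of their log-evidences. A toy illustration (the second value is purely hypothetical):

log_Z_model1 = pipeline.log_evidence   # evidence of the model fitted above
log_Z_model2 = -36.8                   # hypothetical competing model
ln_K = log_Z_model1 - log_Z_model2     # log Bayes factor; ln_K > 0 favours model 1
print('ln(Bayes factor) =', round(ln_K, 2))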

A quick-and-dirty summary of the constraints on the parameters can be (nicely) displayed using the posterior_report() method.

(The “real” parameter values were \(a_0=3\mu \rm G\) and \(b_0=6\mu \rm G\).)

[21]:
print('\nParameter constraints:')
pipeline.posterior_report()

Parameter constraints:
$\displaystyle \\ \text{ naive_gaussian_magnetic_field_a0: }\; 2.7^{+1.3}_{-2.4}\,\mathrm{\mu G}\\\\ \text{ naive_gaussian_magnetic_field_b0: }\; 5.7^{+1.7}_{-1.0}\,\mathrm{\mu G}\\$

Similar information can be accessed through the posterior_summary property, which returns a dictionary.

[22]:
pipeline.posterior_summary
[22]:
{'naive_gaussian_magnetic_field_a0': {'median': <Quantity 2.70238735 uG>,
  'errlo': <Quantity 2.41938506 uG>,
  'errup': <Quantity 1.31069702 uG>,
  'mean': <Quantity 2.25622256 uG>,
  'stdev': <Quantity 1.85700372 uG>},
 'naive_gaussian_magnetic_field_b0': {'median': <Quantity 5.65438118 uG>,
  'errlo': <Quantity 1.03419168 uG>,
  'errup': <Quantity 1.66290483 uG>,
  'mean': <Quantity 5.83415716 uG>,
  'stdev': <Quantity 1.30901039 uG>}}

In most cases, however, one would (should) prefer to work directly on the samples produced by the sampler. A table containing the parameter values of the samples generated can be accessed through:

[23]:
samples = pipeline.samples
samples[:3] # Displays only first 3 rows
[23]:
QTable length=3
naive_gaussian_magnetic_field_a0 naive_gaussian_magnetic_field_b0
                 uG                               uG
              float64                          float64
  4.081100349132784                6.717489837154965
  0.12377231861508164              6.231348024468215
  3.830092802478381                4.537991481720585

The distribution of samples approximates the posterior distribution. Below, this is plotted with the help of the corner library, which also shows the true parameter values and the \(1\sigma\), \(2\sigma\) and \(3\sigma\) contours.

[24]:
def plot_samples_corner(samp):

    ##  See https://corner.readthedocs.io/en/latest/pages/sigmas.html about contour levels.
    ##  "Contours are shown at 0.5, 1, 1.5, and 2 sigma" by default
    ##  according to https://pypi.org/project/corner/1.0.1/, but I want 1, 2, and 3.
    sigmas=np.array([1.,2.,3.])
    levels=1-np.exp(-0.5*sigmas*sigmas)

    # Visualize with a corner plot
    figure = corner.corner(np.vstack([samp.columns[0].value, samp.columns[1].value]).T,
                           range=[0.99]*len(samp.colnames),
                           quantiles=[0.02, 0.5, 0.98],
                           labels=samp.colnames,
                           show_titles=True,
                           title_kwargs={"fontsize": 12},
                           color='steelblue',
                           truths=[a0,b0],
                           truth_color='firebrick',
                           plot_contours=True,
                           hist_kwargs={'linewidth': 2},
                           label_kwargs={'fontsize': 10},
                           levels=levels)
[25]:
plot_samples_corner(samples)
WARNING:root:Too few points to create valid contours
_images/tutorial_one_56_1.png

One can, of course, choose other plotting/analysis routines. Below, the use of seaborn is exemplified.

[26]:
import seaborn as sns
def plot_samples_seaborn(samp):

    def show_truth_in_jointplot(jointplot, true_x, true_y, color='r'):
        for ax in (jointplot.ax_joint, jointplot.ax_marg_x):
            ax.vlines([true_x], *ax.get_ylim(), colors=color)
        for ax in (jointplot.ax_joint, jointplot.ax_marg_y):
            ax.hlines([true_y], *ax.get_xlim(), colors=color)

    snsfig = sns.jointplot(*samp.colnames, data=samp.to_pandas(), kind='kde')
    snsfig.plot_joint(sns.scatterplot, linewidth=0, marker='.', color='0.3')
    show_truth_in_jointplot(snsfig, a0, b0)
[27]:
plot_samples_seaborn(samples)
_images/tutorial_one_59_0.png

Random seeds and convergence checks

The pipeline relies on random numbers in multiple ways. The Monte Carlo sampler draws randomly chosen points in the parameter space during its exploration (in the specific case of nested-sampling pipelines, these are drawn from the prior distributions). Also, while evaluating the fields at each point, random realisations of the stochastic fields are generated.

It is possible to control the random seeding behaviour of an IMAGINE pipeline through the attribute master_seed. This attribute has two uses: it is passed to the sampler, ensuring that its behaviour is reproducible, and it is also used to generate a fresh list of random seeds for each stochastic field that is evaluated.

[28]:
pipeline.master_seed
[28]:
1

By default, the master seed is fixed and set to 1, but you can alter its value before running the pipeline.

One can also change the seeding behaviour through the random_type attribute. There are three allowed options for this:

  • ‘controllable’ - the master_seed is constant and re-running the pipeline should lead to exactly the same results (default);
  • ‘free’ - on each execution, a new master_seed is drawn (using numpy.random.randint);
  • ‘fixed’ - this mode is for debugging purposes. The master_seed is fixed, as in the ‘controllable’ case, but each individual stochastic field receives the exact same list of ensemble seeds every time (while in the ‘controllable’ case these are chosen “randomly” at run time). The chosen seeds can be inspected using pipeline.ensemble_seeds.
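
For instance, to make a run exactly reproducible one could fix these attributes explicitly before calling the pipeline (illustrative values):

pipeline.random_type = 'controllable'
pipeline.master_seed = 42   # any integer; re-running with the same seed repeats the results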

Let us now check whether different executions of the pipeline generate consistent results. To do so, we run it five times and overplot histograms of the outputs to see whether they all look the same. More rigorous tests exist, of course (and we have done them), but they take longer; the following can be done in a few minutes.

[29]:
# Adjust the random behaviour
pipeline.random_type = 'free'

fig, axs = plt.subplots(1, 2, figsize=(15, 4))
repeat = 5

for i in range(repeat):
    if i>0:
        print('-'*60+'\nRun {}/{}'.format(i+1,repeat))
        # Re-runs the pipeline!
        _ = pipeline()

    for j, param in enumerate(pipeline.samples.columns):
        samp = pipeline.samples[param]
        axs[j].hist(samp.value, alpha=0.4, bins=30, label=pipeline.master_seed)
        axs[j].set_title(param)

for i in range(2):
    axs[i].legend(title='seed')
------------------------------------------------------------
Run 2/5
[ultranest] PointStore: have 0 items
INFO:ultranest:PointStore: have 0 items
[ultranest] Sampling 50 live points from prior ...
INFO:ultranest:Sampling 50 live points from prior ...
DEBUG:ultranest:minimal_widths_sequence: [(-inf, 50.0), (inf, 50.0)]
[ultranest] Explored until L=-3e+01   [-31.0469..-29.8293] | it/evals=169/501 eff=37.4723% N=50 0
INFO:ultranest:Explored until L=-3e+01
[ultranest] Likelihood function evaluations: 501
INFO:ultranest:Likelihood function evaluations: 501
[ultranest] Writing samples and results to disk ...
INFO:ultranest:Writing samples and results to disk ...
[ultranest] Writing samples and results to disk ... done
INFO:ultranest:Writing samples and results to disk ... done
DEBUG:ultranest:did a run_iter pass!
[ultranest] Reached maximum number of likelihood calls (501 > 500)...
INFO:ultranest:Reached maximum number of likelihood calls (501 > 500)...
[ultranest] done iterating.
INFO:ultranest:done iterating.
------------------------------------------------------------
Run 3/5
[ultranest] PointStore: have 0 items
INFO:ultranest:PointStore: have 0 items
[ultranest] Sampling 50 live points from prior ...
INFO:ultranest:Sampling 50 live points from prior ...
DEBUG:ultranest:minimal_widths_sequence: [(-inf, 50.0), (inf, 50.0)]
[ultranest] Explored until L=-3e+01   [-30.9045..-29.7086] | it/evals=168/502 eff=37.1681% N=50 0
INFO:ultranest:Explored until L=-3e+01
[ultranest] Likelihood function evaluations: 502
INFO:ultranest:Likelihood function evaluations: 502
[ultranest] Writing samples and results to disk ...
INFO:ultranest:Writing samples and results to disk ...
[ultranest] Writing samples and results to disk ... done
INFO:ultranest:Writing samples and results to disk ... done
DEBUG:ultranest:did a run_iter pass!
[ultranest] Reached maximum number of likelihood calls (502 > 500)...
INFO:ultranest:Reached maximum number of likelihood calls (502 > 500)...
[ultranest] done iterating.
INFO:ultranest:done iterating.
------------------------------------------------------------
Run 4/5
[ultranest] PointStore: have 0 items
INFO:ultranest:PointStore: have 0 items
[ultranest] Sampling 50 live points from prior ...
INFO:ultranest:Sampling 50 live points from prior ...
DEBUG:ultranest:minimal_widths_sequence: [(-inf, 50.0), (inf, 50.0)]
[ultranest] Explored until L=-3e+01   [-680.4351..-30.5342] | it/evals=154/523 eff=32.5581% N=50
INFO:ultranest:Explored until L=-3e+01
[ultranest] Likelihood function evaluations: 523
INFO:ultranest:Likelihood function evaluations: 523
[ultranest] Writing samples and results to disk ...
INFO:ultranest:Writing samples and results to disk ...
[ultranest] Writing samples and results to disk ... done
INFO:ultranest:Writing samples and results to disk ... done
DEBUG:ultranest:did a run_iter pass!
[ultranest] Reached maximum number of likelihood calls (523 > 500)...
INFO:ultranest:Reached maximum number of likelihood calls (523 > 500)...
[ultranest] done iterating.
INFO:ultranest:done iterating.
------------------------------------------------------------
Run 5/5
[ultranest] PointStore: have 0 items
INFO:ultranest:PointStore: have 0 items
[ultranest] Sampling 50 live points from prior ...
INFO:ultranest:Sampling 50 live points from prior ...
DEBUG:ultranest:minimal_widths_sequence: [(-inf, 50.0), (inf, 50.0)]
[ultranest] Explored until L=-3e+01   [-3169.7507..-30.2520] | it/evals=169/502 eff=37.3894% N=50
INFO:ultranest:Explored until L=-3e+01
[ultranest] Likelihood function evaluations: 502
INFO:ultranest:Likelihood function evaluations: 502
[ultranest] Writing samples and results to disk ...
INFO:ultranest:Writing samples and results to disk ...
[ultranest] Writing samples and results to disk ... done
INFO:ultranest:Writing samples and results to disk ... done
DEBUG:ultranest:did a run_iter pass!
[ultranest] Reached maximum number of likelihood calls (502 > 500)...
INFO:ultranest:Reached maximum number of likelihood calls (502 > 500)...
[ultranest] done iterating.
INFO:ultranest:done iterating.
_images/tutorial_one_64_64.png

Script example

A script version of this tutorial can be found in the examples directory.