# imagine.pipelines package¶

## imagine.pipelines.dynesty_pipeline module¶

class imagine.pipelines.dynesty_pipeline.DynestyPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]

Bayesian analysis pipeline with dynesty

This pipeline may use DynamicNestedSampler if the sampling parameter ‘dynamic’ is set to True (default) or NestedSampler if ‘dynamic is False (default).

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

• dynamic (bool) – If True (default), use dynesty.DynamicNestedSampler otherwise uses dynesty.NestedSampler.

• dlogz (float) – Iteration will stop, in the dynamic==False case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est) - ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e-3 * (nlive - 1) + 0.01. Otherwise, the default is 0.01.

• dlogz_init (float) – The baseline run will stop, in the dynamic==True case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est) - ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e-3 * (nlive - 1) + 0.01. Otherwise, the default is 0.01.

• nlive (int) – If dynamic is False, this sets the number of live points used. Default is 400.

• nlive_init (int) – If dynamic is True, this sets the number of live points used during the initial (“baseline”) nested sampling run. Default is 400.

• nlive_batch (int) – If dynamic is True, this sets the number of live points used when adding additional samples from a nested sampling run within each batch. Default is 400.

• logl_max (float) – Iteration will stop when the sampled ln(likelihood) exceeds the threshold set by logl_max. Default is no bound (np.inf).

• logl_max_init (float) – The baseline run will stop, in the dynamic==True case, when the sampled ln(likelihood) exceeds this threshold. Default is no bound (np.inf).

• maxiter (int) – Maximum number of iterations. Iteration may stop earlier if the termination condition is reached. Default is (no limit).

• maxiter_init (int) – If dynamic is True, this sets the maximum number of iterations for the initial baseline nested sampling run. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).

• maxiter_batch (int) – If dynamic is True, this sets the maximum number of iterations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).

• maxcall (int) – Maximum number of likelihood evaluations (without considering the initial points, i.e. maxcall_effective = maxcall + nlive). Iteration may stop earlier if termination condition is reached. Default is sys.maxsize (no limit).

• maxcall_init (int) – If dynamic is True, maximum number of likelihood evaluations in the baseline run.

• maxcall_batch (int) – If dynamic is True, maximum number of likelihood evaluations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).

• maxbatch (int) – If dynamic is True, maximum number of batches allowed. Default is sys.maxsize (no limit).

• use_stop (bool, optional) – Whether to evaluate our stopping function after each batch. Disabling this can improve performance if other stopping criteria such as maxcall are already specified. Default is True.

• n_effective (int) – Minimum number of effective posterior samples. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).

• n_effective_init (int) – Minimum number of effective posterior samples during the baseline run. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).

• add_live (bool) – Whether or not to add the remaining set of live points to the list of samples at the end of each run. Default is True.

• print_progress (bool) – Whether or not to output a simple summary of the current run that updates with each iteration. Default is True.

• print_func (function) – A function that prints out the current state of the sampler. If not provided, the default results.print_fn() is used.

• save_bounds (bool) –

Whether or not to save past bounding distributions used to bound

the live points internally. Default is True.

• bound ({‘none’, ‘single’, ‘multi’, ‘balls’, ‘cubes’}) – Method used to approximately bound the prior using the current set of live points. Conditions the sampling methods used to propose new live points. Choices are no bound (‘none’), a single bounding ellipsoid (‘single’), multiple bounding ellipsoids (‘multi’), balls centered on each live point (‘balls’), and cubes centered on each live point (‘cubes’). Default is ‘multi’.

• sample ({‘auto’, ‘unif’, ‘rwalk’, ‘rstagger’,) – ‘slice’, ‘rslice’} Method used to sample uniformly within the likelihood constraint, conditioned on the provided bounds. Unique methods available are: uniform sampling within the bounds(‘unif’), random walks with fixed proposals (‘rwalk’), random walks with variable (“staggering”) proposals (‘rstagger’), multivariate slice sampling along preferred orientations (‘slice’), and “random” slice sampling along all orientations (‘rslice’). ‘auto’ selects the sampling method based on the dimensionality of the problem (from ndim). When ndim < 10, this defaults to ‘unif’. When 10 <= ndim <= 20, this defaults to ‘rwalk’. When ndim > 20, this defaults to ‘slice’. ‘rstagger’ and ‘rslice’ are provided as alternatives for ‘rwalk’ and ‘slice’, respectively. Default is ‘auto’. Note that Dynesty’s ‘hslice’ option is not supported within IMAGINE.

• update_interval (int or float) – If an integer is passed, only update the proposal distribution every update_interval-th likelihood call. If a float is passed, update the proposal after every round(update_interval * nlive)-th likelihood call. Larger update intervals larger can be more efficient when the likelihood function is quick to evaluate. Default behavior is to target a roughly constant change in prior volume, with 1.5 for ‘unif’, 0.15 * walks for ‘rwalk’ and ‘rstagger’, 0.9 * ndim * slices for ‘slice’, 2.0 * slices for ‘rslice’, and 25.0 * slices for ‘hslice’.

• enlarge (float) – Enlarge the volumes of the specified bounding object(s) by this fraction. The preferred method is to determine this organically using bootstrapping. If bootstrap > 0, this defaults to 1.0. If bootstrap = 0, this instead defaults to 1.25.

• bootstrap (int) – Compute this many bootstrapped realizations of the bounding objects. Use the maximum distance found to the set of points left out during each iteration to enlarge the resulting volumes. Can lead to unstable bounding ellipsoids. Default is 0 (no bootstrap).

• vol_dec (float) – For the ‘multi’ bounding option, the required fractional reduction in volume after splitting an ellipsoid in order to to accept the split. Default is 0.5.

• vol_check (float) – For the ‘multi’ bounding option, the factor used when checking if the volume of the original bounding ellipsoid is large enough to warrant > 2 splits via ell.vol > vol_check * nlive * pointvol. Default is 2.0.

• walks (int) – For the ‘rwalk’ sampling option, the minimum number of steps (minimum 2) before proposing a new live point. Default is 25.

• facc (float) – The target acceptance fraction for the ‘rwalk’ sampling option. Default is 0.5. Bounded to be between [1. / walks, 1.].

• slices (int) – For the ‘slice’ and ‘rslice’ sampling options, the number of times to execute a “slice update” before proposing a new live point. Default is 5. Note that ‘slice’ cycles through all dimensions when executing a “slice update”.

Note

Instances of this class are callable. Look at the DynestyPipeline.call() for details.

call(**kwargs)[source]

Runs the IMAGINE pipeline using the Dynesty sampler

Returns: results – Dynesty sampling results dict

## imagine.pipelines.emcee_pipeline module¶

class imagine.pipelines.emcee_pipeline.EmceePipeline(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]

Analysis pipeline with the MCMC sampler emcee

See base class for initialization details.

The chains are considered converged once the total number of iterations becomes smaller than convergence_factor times the autocorrelation time.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

• resume (bool) – If False the Pipeline the sampling starts from the beginning, overwriting any previous work in the chains_directory. Otherwise, tries to resume a previous run.
• nwalkers (int) – Number of walkers
• max_nsteps (int) – Maximum number of iterations
• nsteps_check (int) – The sampler will check for convergence every nsteps_check
• convergence_factor (float) – Factor used to compute the convergence
• burnin_factor (int) – Number of autocorrelation times to be discarded from main results
• thin_factor (float) – Factor used to choose how the chain will be “thinned” after running
• custom_initial_positions (list) – List containig the the starting positions to be used for the walkers. If absent (default), initial positions are randomly sampled from the prior distribution.
call(**kwargs)[source]
Returns: results – A dictionary containing the sampler results (usually in its native format) dict
get_intermediate_results()[source]
SUPPORTS_MPI = True

## imagine.pipelines.multinest_pipeline module¶

class imagine.pipelines.multinest_pipeline.MultinestPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]

Bayesian analysis pipeline with pyMultinest

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

• resume (bool) – If False the Pipeline the sampling starts from the beginning, overwriting any previous work in the chains_directory. Otherwise, tries to resume a previous run.
• n_live_points (int) – Number of live points to be used.
• evidence_tolerance (float) – A value of 0.5 should give good enough accuracy.
• max_iter (int) – Maximum number of iterations. 0 (default) is unlimited (i.e. only stops after convergence).
• log_zero (float) – Points with loglike < logZero will be ignored by MultiNest
• importance_nested_sampling (bool) – If True (default), Multinest will use Importance Nested Sampling (see arXiv:1306.2144)
• sampling_efficiency (float) – Efficieny of the sampling. 0.8 (default) and 0.3 are recommended values for parameter estimation & evidence evaluation respectively.
• multimodal (bool) – If True, MultiNest will attempt to separate out the modes using a clustering algorithm.
• mode_tolerance (float) – MultiNest can find multiple modes and specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local log-evidence value greater than a particular value in which case mode_tolerance should be set to that value. If there isn’t any particularly interesting mode_tolerance value, then it should be set to a very negative number (e.g. -1e90, default).
• null_log_evidence (float) – If multimodal is True, MultiNest can find multiple modes and also specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local log-evidence value greater than a particular value in which case nullZ should be set to that value. If there isn’t any particulrly interesting nullZ value, then nullZ should be set to a very large negative number (e.g. -1.d90).
• n_clustering_params (int) – Mode separation is done through a clustering algorithm. Mode separation can be done on all the parameters (in which case nCdims should be set to ndims) & it can also be done on a subset of parameters (in which case nCdims < ndims) which might be advantageous as clustering is less accurate as the dimensionality increases. If nCdims < ndims then mode separation is done on the first nCdims parameters.
• max_modes (int) – Maximum number of modes (if multimodal is True).

Note

Instances of this class are callable. Look at the MultinestPipeline.call() for details.

call(**kwargs)[source]

Runs the IMAGINE pipeline using the MultiNest sampler

Returns: results – pyMultinest sampling results in a dictionary containing the keys: logZ (the log-evidence), logZerror (the error in log-evidence) and samples (equal weighted posterior) dict
get_intermediate_results()[source]
SUPPORTS_MPI = True

## imagine.pipelines.pipeline module¶

class imagine.pipelines.pipeline.Pipeline(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]

Base class used for for initialing Bayesian analysis pipeline

likelihood_rescaler

Rescale log-likelihood value

Type: double
random_type

If set to ‘fixed’, the exact same set of ensemble seeds will be used for the evaluation of all fields, generated using the master_seed. If set to ‘controllable’, each individual field will get their own set of ensemble fields, but multiple runs will lead to the same results, as they are based on the same master_seed. If set to ‘free’, every time the pipeline is run, the master_seed is reset to a different value, and the ensemble seeds for each individual field are drawn based on this.

Type: str
master_seed

Master seed used by the random number generators

Type: int
Parameters: simulator (imagine.simulators.simulator.Simulator) – Simulator object factory_list (list) – List or tuple of field factory objects likelihood (imagine.likelihoods.likelihood.Likelihood) – Likelihood object ensemble_size (int) – Number of observable realizations to be generated by the simulator run_directory (str) – Directory where the pipeline state and reports are saved. chains_directory (str) – Path of the directory where the chains should be saved. By default, this is saved to a ‘chains’ subdirectory of run_directory. prior_correlations (dict) – Dictionary used to set up prior distribution correlations. If two parameters are A and B are correlated a priori, an entry should be added to the prior_correlations dictionary in the form (name_A, name_B): True, to extract the correlation from the samples (in the case of CustomPriors) or (name_A, name_B): value otherwise. show_summary_reports (bool) – If True (default), shows/saves a corner plot and shows the evidence after the pipeline run has finished show_progress_reports (bool) – If True, shows/saves a simple progress of the sampler during the run. n_evals_report (int) – The number of likelihood evaluations before showing a progress report
__call__(*args, save_pipeline_state=True, **kwargs)[source]

Call self as a function.

call(**kwargs)[source]
clean_chains_directory()[source]

Removes the contents of the chains directory

corner_plot(**kwargs)[source]

Calls imagine.tools.visualization.corner_plot() to make a corner plot of samples produced by this Pipeline

evidence_report(sdigits=4)[source]
get_BIC()[source]

Computes the Bayesian Information Criterion (BIC), this is done using the simple expression

$BIC = k \ln n - 2 \ln L(\theta_{\rm MAP})$

where $$k$$ is the number of parameters, $$n$$ is the total number of data points, and $$\hat{L}$$ is the likelihood function at a reference point :math:theta_{rm MAP}.

Traditionally, this information criterion uses maximum likelihood as a reference point (i.e. $$\theta_{\rm MLE}$$). By default, however, this method uses the likelihod at the MAP as the reference. motivated by the heuristic that, if the choice of prior is a sensible one, the MAP a better representation of the model performance than the MLE.

get_MAP(initial_guess='auto', include_units=True, return_optimizer_result=False, **kwargs)[source]

Computes the parameter values at the Maximum A Posteriori

This method uses scipy.optimize.minimize for the calculation. By default, it will use the ‘Powell’ (which does not require the calculation of gradients or the assumption of continuity). Also by default the bounds keywords use the ranges specified in the priors.

The MAP estimate is stored internally to be used by the properties: MAP_model and MAP_simulation.

Parameters: include_units (bool) – If True (default), returns the result as list of Quantities . Otherwise, returns a single numpy array with the parameter values in their default units. return_optimizer_result (bool) – If False (default) only the MAP values are returned. Otherwise, a tuple containing the values and the output of scipy.optimize.minimize is returned instead. initial_guess (str or numpy.ndarray) – The initial guess used by the optimizer. If set to ‘centre’, the centre of each parameter range is used (obtained using parameter_central_value()). If set to ‘samples’, the median values of the samples produced by a previous posterior sampling are used (the Pipeline has to have been run before). If set to ‘auto’ (default), use ‘samples’ if available, and ‘centre’ otherwise. Alternatively, an array of active parameter values with the starting position may be provided. **kwargs – Any other keyword arguments are passed directly to scipy.optimize.minimize. MAP (list or array) – Parameter values at the position of the maximum of the posterior distribution result (scipy.optimize.OptimizeResult) – Only if return_optimizer_result is set to True, this is returned together with the MAP.
get_intermediate_results()[source]
get_par_names()[source]
likelihood_convergence_report(cmap='cmr.chroma', **kwargs)[source]

Prepares a standard set of plots of a likelihood convergence report (produced by the Pipeline.prepare_likelihood_convergence_report() method).

Parameters: cmap (str) – Colormap to be used for the lineplots **kwargs – Keyword arguments that will be supplied to prepare_likelihood_convergence_report (see its docstring for details).
classmethod load(directory_path='.')[source]

Loads the state of a Pipeline object

Parameters: directory_path (str) – Path to the directory where the Pipeline state should be saved
log_probability_unnormalized(theta)[source]

The log of the unnormalized posterior probability – i.e. the sum of the log-likelihood and the log of the prior probability.

Parameters: theta (np.ndarray) – Array of parameter values (in their default units) log_prob – log(likelihood(theta))+log(prior(theta)) float
parameter_central_value()[source]

Gets central point in the parameter space of a given pipeline

The ranges are extracted from each prior. The result is a pure list of numbers corresponding to the values in the native units of each prior.

For non-finite ranges (e.g. [-inf,inf]), a zero central value is assumed.

Returns: central_values – A list of parameter values at the centre of each parameter range list
posterior_report(sdigits=2, **kwargs)[source]

Displays the best fit values and 1-sigma errors for each active parameter. Also produces a corner plot of the samples, which is saved to the run directory.

If running on a jupyter-notebook, a nice LaTeX display is used, and the plot is shown.

Parameters: sdigits (int) – The number of significant digits to be used
prepare_likelihood_convergence_report(min_Nens=10, max_Nens=50, n_seeds=1, n_points=5, include_centre=True)[source]

Constructs a report dataset based on a given Pipeline setup, which can be used for studying the likelihood convergence in a particular problem

The pipeline’s ensemble size is temporarily set to Nens*n_seeds, and (for each point) the present pipeline setup is used to compute a Simulations dictionary object. Subsets of this simulations object are then produced and the likelihood computed.

The end result is a pandas.DataFrame containing the following columns:

• likelihood - The likelihood value.
• likelihood_std - The likelihood dispersion, estimated by bootstrapping the available ensemble and computing the standard deviation.
• ensemble_size - Size of the ensemble of simulations used.
• ipoint - Index of the point used.
• iseed - Index of the random (master) seed used.
• param_values - Values of the parameters at a given point.
Parameters: min_Nens (int) – Minimum ensemble size to be considered max_Nens (int) – Maximum ensemble size to be examined n_seeds (int) – Number of different (master) random seeds to be used n_points (int) – Number of points to be evaluated. Points are randomly drawn from the prior distribution (but see include_centre). include_centre (bool) – If True, the first point is taken as the value corresponding to the centre of each parameter range. results – A pandas.DataFrame object containing the report data. pandas.DataFrame
prior_pdf(cube)[source]

Probability distribution associated with the all parameters being used by the multiple Field Factories

Parameters: cube (np.ndarray) – Each row of the array corresponds to a different parameter value in the sampling (dimensionless, but in the standard units of the prior). rtn – Prior probability of the parameter choice specified by cube float
prior_transform(cube)[source]

Prior transform cube

Takes a cube containing a uniform sampling of values and maps then onto a distribution compatible with the priors specified in the Field Factories.

If prior correlations were specified, these are applied to the cube (assumes the correlations can be well described by the correlations between gaussians).

Parameters: cube (array) – Each row of the array corresponds to a different parameter in the sampling. The modified cube cube
progress_report()[source]

Reports the progress of the inference

save(**kwargs)[source]

Saves the state of the Pipeline

The run_directory set in initialization is used. Any distributed data is gathered and the pipeline is serialized and saved to disk.

test(n_points=3, include_centre=True, verbose=True)[source]

Tests the present IMAGINE pipeline evaluating the likelihood on a small number of points and reporting the run-time.

n_points : int
Number of points to evaluate the likelihood on. The first point corresponds to the centre of the active parameter ranges (unless include_centre is set to False) and the other are randomly sampled from the prior distributions.
include_centre : bool
If True, the initial point will be obtained from the centre of the active parameter ranges.
Returns: mean_time – The average execution time of a single likelihood evaluation astropy.units.Quantity
tidy_up()[source]

Resets internal state before a new run

BIC
MAP_model

Maximum a posteriori (MAP) model

List of Field objects corresponding to the mode of the posterior distribution. This does not require a previous run of the Pipeline, as the MAP is computed by maximizing the unnormalized posterior distribution.

This convenience property uses the results from the latest call of the Pipeline.get_MAP() method. If Pipeline.get_MAP() has never been called, the MAP is found calling it with with default arguments.

See Pipeline.get_MAP() for details.

MAP_simulation

Simulation corresponding to the MAP_model.

active_parameters

List of all the active parameters

chains_directory

Directory where the chains are stored (NB details of what is stored are sampler-dependent)

distribute_ensemble

If True, whenever the sampler requires a likelihood evaluation, the ensemble of stochastic fields realizations is distributed among all the nodes.

Otherwise, each likelihood evaluations will go through the whole ensemble size on a single node. See Parallelisation for details.

ensemble_size
factory_list

List of the Field Factories currently being used.

Updating the factory list automatically extracts active_parameters, parameter ranges and priors from each field factory.

likelihood

The Likelihood object used by the pipeline

log_evidence

Natural logarithm of the marginal likelihood or Bayesian model evidence, $$\ln\mathcal{Z}$$, where

$\mathcal{Z} = P(d|m) = \int_{\Omega_\theta} P(d | \theta, m) P(\theta | m) \mathrm{d}\theta .$

Note

Available only after the pipeline is run.

log_evidence_err

Error estimate in the natural logarithm of the Bayesian model evidence. Available once the pipeline is run.

Note

Available only after the pipeline is run.

median_model

Posterior median model

List of Field objects corresponding to the median values of the distributions of parameter values found after a Pipeline run.

median_simulation

Simulation corresponding to the median_model.

posterior_summary

A dictionary containing a summary of posterior statistics for each of the active parameters. These are: ‘median’, ‘errlo’ (15.87th percentile), ‘errup’ (84.13th percentile), ‘mean’ and ‘stdev’.

prior_correlations
priors

Dictionary containing priors for all active parameters

run_directory

Directory where the chains are stored (NB details of what is stored are sampler-dependent)

sampler_supports_mpi
samples

An astropy.table.QTable object containing parameter values of the samples produced in the run.

sampling_controllers

Settings used by the sampler (e.g. ‘dlogz’). See the documentation of each specific pipeline subclass for details.

After the pipeline runs, this property is updated to reflect the actual final choice of sampling controllers (including default values).

simulator

The Simulator object used by the pipeline

wrapped_parameters

List of parameters which are periodic or “wrapped around”

## imagine.pipelines.ultranest_pipeline module¶

class imagine.pipelines.ultranest_pipeline.UltranestPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]

Bayesian analysis pipeline with UltraNest

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

• resume (bool) – If False the Pipeline the sampling starts from the beginning, erasing any previous work in the chains_directory. Otherwise, tries to resume a previous run.
• dlogz (float) – Target evidence uncertainty. This is the std between bootstrapped logz integrators.
• dKL (float) – Target posterior uncertainty. This is the Kullback-Leibler divergence in nat between bootstrapped integrators.
• frac_remain (float) – Integrate until this fraction of the integral is left in the remainder. Set to a low number (1e-2 … 1e-5) to make sure peaks are discovered. Set to a higher number (0.5) if you know the posterior is simple.
• Lepsilon (float) – Terminate when live point likelihoods are all the same, within Lepsilon tolerance. Increase this when your likelihood function is inaccurate, to avoid unnecessary search.
• min_ess (int) – Target number of effective posterior samples.
• max_iters (int) – maximum number of integration iterations.
• max_ncalls (int) – stop after this many likelihood evaluations.
• max_num_improvement_loops (int) – run() tries to assess iteratively where more samples are needed. This number limits the number of improvement loops.
• min_num_live_points (int) – minimum number of live points throughout the run
• cluster_num_live_points (int) – require at least this many live points per detected cluster
• num_test_samples (int) – test transform and likelihood with this number of random points for errors first. Useful to catch bugs.
• draw_multiple (bool) – draw more points if efficiency goes down. If set to False, few points are sampled at once.
• num_bootstraps (int) – number of logZ estimators and MLFriends region bootstrap rounds.
• update_interval_iter_fraction (float) – Update region after (update_interval_iter_fraction*nlive) iterations.

Note

Instances of this class are callable. Look at the UltranestPipeline.call() for details.

call(**kwargs)[source]

Runs the IMAGINE pipeline using the UltraNest ReactiveNestedSampler.

Any keyword argument provided is used to update the sampling_controllers.

Returns: results – UltraNest sampling results in a dictionary containing the keys: logZ (the log-evidence), logZerror (the error in log-evidence) and samples (equal weighted posterior) dict

Notes

See base class for other attributes/properties and methods

SUPPORTS_MPI = True

## Module contents¶

class imagine.pipelines.DynestyPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]

Bayesian analysis pipeline with dynesty

This pipeline may use DynamicNestedSampler if the sampling parameter ‘dynamic’ is set to True (default) or NestedSampler if ‘dynamic is False (default).

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

• dynamic (bool) – If True (default), use dynesty.DynamicNestedSampler otherwise uses dynesty.NestedSampler.

• dlogz (float) – Iteration will stop, in the dynamic==False case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est) - ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e-3 * (nlive - 1) + 0.01. Otherwise, the default is 0.01.

• dlogz_init (float) – The baseline run will stop, in the dynamic==True case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est) - ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e-3 * (nlive - 1) + 0.01. Otherwise, the default is 0.01.

• nlive (int) – If dynamic is False, this sets the number of live points used. Default is 400.

• nlive_init (int) – If dynamic is True, this sets the number of live points used during the initial (“baseline”) nested sampling run. Default is 400.

• nlive_batch (int) – If dynamic is True, this sets the number of live points used when adding additional samples from a nested sampling run within each batch. Default is 400.

• logl_max (float) – Iteration will stop when the sampled ln(likelihood) exceeds the threshold set by logl_max. Default is no bound (np.inf).

• logl_max_init (float) – The baseline run will stop, in the dynamic==True case, when the sampled ln(likelihood) exceeds this threshold. Default is no bound (np.inf).

• maxiter (int) – Maximum number of iterations. Iteration may stop earlier if the termination condition is reached. Default is (no limit).

• maxiter_init (int) – If dynamic is True, this sets the maximum number of iterations for the initial baseline nested sampling run. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).

• maxiter_batch (int) – If dynamic is True, this sets the maximum number of iterations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).

• maxcall (int) – Maximum number of likelihood evaluations (without considering the initial points, i.e. maxcall_effective = maxcall + nlive). Iteration may stop earlier if termination condition is reached. Default is sys.maxsize (no limit).

• maxcall_init (int) – If dynamic is True, maximum number of likelihood evaluations in the baseline run.

• maxcall_batch (int) – If dynamic is True, maximum number of likelihood evaluations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).

• maxbatch (int) – If dynamic is True, maximum number of batches allowed. Default is sys.maxsize (no limit).

• use_stop (bool, optional) – Whether to evaluate our stopping function after each batch. Disabling this can improve performance if other stopping criteria such as maxcall are already specified. Default is True.

• n_effective (int) – Minimum number of effective posterior samples. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).

• n_effective_init (int) – Minimum number of effective posterior samples during the baseline run. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).

• add_live (bool) – Whether or not to add the remaining set of live points to the list of samples at the end of each run. Default is True.

• print_progress (bool) – Whether or not to output a simple summary of the current run that updates with each iteration. Default is True.

• print_func (function) – A function that prints out the current state of the sampler. If not provided, the default results.print_fn() is used.

• save_bounds (bool) –

Whether or not to save past bounding distributions used to bound

the live points internally. Default is True.

• bound ({‘none’, ‘single’, ‘multi’, ‘balls’, ‘cubes’}) – Method used to approximately bound the prior using the current set of live points. Conditions the sampling methods used to propose new live points. Choices are no bound (‘none’), a single bounding ellipsoid (‘single’), multiple bounding ellipsoids (‘multi’), balls centered on each live point (‘balls’), and cubes centered on each live point (‘cubes’). Default is ‘multi’.

• sample ({‘auto’, ‘unif’, ‘rwalk’, ‘rstagger’,) – ‘slice’, ‘rslice’} Method used to sample uniformly within the likelihood constraint, conditioned on the provided bounds. Unique methods available are: uniform sampling within the bounds(‘unif’), random walks with fixed proposals (‘rwalk’), random walks with variable (“staggering”) proposals (‘rstagger’), multivariate slice sampling along preferred orientations (‘slice’), and “random” slice sampling along all orientations (‘rslice’). ‘auto’ selects the sampling method based on the dimensionality of the problem (from ndim). When ndim < 10, this defaults to ‘unif’. When 10 <= ndim <= 20, this defaults to ‘rwalk’. When ndim > 20, this defaults to ‘slice’. ‘rstagger’ and ‘rslice’ are provided as alternatives for ‘rwalk’ and ‘slice’, respectively. Default is ‘auto’. Note that Dynesty’s ‘hslice’ option is not supported within IMAGINE.

• update_interval (int or float) – If an integer is passed, only update the proposal distribution every update_interval-th likelihood call. If a float is passed, update the proposal after every round(update_interval * nlive)-th likelihood call. Larger update intervals larger can be more efficient when the likelihood function is quick to evaluate. Default behavior is to target a roughly constant change in prior volume, with 1.5 for ‘unif’, 0.15 * walks for ‘rwalk’ and ‘rstagger’, 0.9 * ndim * slices for ‘slice’, 2.0 * slices for ‘rslice’, and 25.0 * slices for ‘hslice’.

• enlarge (float) – Enlarge the volumes of the specified bounding object(s) by this fraction. The preferred method is to determine this organically using bootstrapping. If bootstrap > 0, this defaults to 1.0. If bootstrap = 0, this instead defaults to 1.25.

• bootstrap (int) – Compute this many bootstrapped realizations of the bounding objects. Use the maximum distance found to the set of points left out during each iteration to enlarge the resulting volumes. Can lead to unstable bounding ellipsoids. Default is 0 (no bootstrap).

• vol_dec (float) – For the ‘multi’ bounding option, the required fractional reduction in volume after splitting an ellipsoid in order to to accept the split. Default is 0.5.

• vol_check (float) – For the ‘multi’ bounding option, the factor used when checking if the volume of the original bounding ellipsoid is large enough to warrant > 2 splits via ell.vol > vol_check * nlive * pointvol. Default is 2.0.

• walks (int) – For the ‘rwalk’ sampling option, the minimum number of steps (minimum 2) before proposing a new live point. Default is 25.

• facc (float) – The target acceptance fraction for the ‘rwalk’ sampling option. Default is 0.5. Bounded to be between [1. / walks, 1.].

• slices (int) – For the ‘slice’ and ‘rslice’ sampling options, the number of times to execute a “slice update” before proposing a new live point. Default is 5. Note that ‘slice’ cycles through all dimensions when executing a “slice update”.

Note

Instances of this class are callable. Look at the DynestyPipeline.call() for details.

call(**kwargs)[source]

Runs the IMAGINE pipeline using the Dynesty sampler

Returns: results – Dynesty sampling results dict
class imagine.pipelines.MultinestPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]

Bayesian analysis pipeline with pyMultinest

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

• resume (bool) – If False the Pipeline the sampling starts from the beginning, overwriting any previous work in the chains_directory. Otherwise, tries to resume a previous run.
• n_live_points (int) – Number of live points to be used.
• evidence_tolerance (float) – A value of 0.5 should give good enough accuracy.
• max_iter (int) – Maximum number of iterations. 0 (default) is unlimited (i.e. only stops after convergence).
• log_zero (float) – Points with loglike < logZero will be ignored by MultiNest
• importance_nested_sampling (bool) – If True (default), Multinest will use Importance Nested Sampling (see arXiv:1306.2144)
• sampling_efficiency (float) – Efficieny of the sampling. 0.8 (default) and 0.3 are recommended values for parameter estimation & evidence evaluation respectively.
• multimodal (bool) – If True, MultiNest will attempt to separate out the modes using a clustering algorithm.
• mode_tolerance (float) – MultiNest can find multiple modes and specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local log-evidence value greater than a particular value in which case mode_tolerance should be set to that value. If there isn’t any particularly interesting mode_tolerance value, then it should be set to a very negative number (e.g. -1e90, default).
• null_log_evidence (float) – If multimodal is True, MultiNest can find multiple modes and also specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local log-evidence value greater than a particular value in which case nullZ should be set to that value. If there isn’t any particulrly interesting nullZ value, then nullZ should be set to a very large negative number (e.g. -1.d90).
• n_clustering_params (int) – Mode separation is done through a clustering algorithm. Mode separation can be done on all the parameters (in which case nCdims should be set to ndims) & it can also be done on a subset of parameters (in which case nCdims < ndims) which might be advantageous as clustering is less accurate as the dimensionality increases. If nCdims < ndims then mode separation is done on the first nCdims parameters.
• max_modes (int) – Maximum number of modes (if multimodal is True).

Note

Instances of this class are callable. Look at the MultinestPipeline.call() for details.

call(**kwargs)[source]

Runs the IMAGINE pipeline using the MultiNest sampler

Returns: results – pyMultinest sampling results in a dictionary containing the keys: logZ (the log-evidence), logZerror (the error in log-evidence) and samples (equal weighted posterior) dict
get_intermediate_results()[source]
SUPPORTS_MPI = True
class imagine.pipelines.Pipeline(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]

Base class used for for initialing Bayesian analysis pipeline

likelihood_rescaler

Rescale log-likelihood value

Type: double
random_type

If set to ‘fixed’, the exact same set of ensemble seeds will be used for the evaluation of all fields, generated using the master_seed. If set to ‘controllable’, each individual field will get their own set of ensemble fields, but multiple runs will lead to the same results, as they are based on the same master_seed. If set to ‘free’, every time the pipeline is run, the master_seed is reset to a different value, and the ensemble seeds for each individual field are drawn based on this.

Type: str
master_seed

Master seed used by the random number generators

Type: int
Parameters: simulator (imagine.simulators.simulator.Simulator) – Simulator object factory_list (list) – List or tuple of field factory objects likelihood (imagine.likelihoods.likelihood.Likelihood) – Likelihood object ensemble_size (int) – Number of observable realizations to be generated by the simulator run_directory (str) – Directory where the pipeline state and reports are saved. chains_directory (str) – Path of the directory where the chains should be saved. By default, this is saved to a ‘chains’ subdirectory of run_directory. prior_correlations (dict) – Dictionary used to set up prior distribution correlations. If two parameters are A and B are correlated a priori, an entry should be added to the prior_correlations dictionary in the form (name_A, name_B): True, to extract the correlation from the samples (in the case of CustomPriors) or (name_A, name_B): value otherwise. show_summary_reports (bool) – If True (default), shows/saves a corner plot and shows the evidence after the pipeline run has finished show_progress_reports (bool) – If True, shows/saves a simple progress of the sampler during the run. n_evals_report (int) – The number of likelihood evaluations before showing a progress report
__call__(*args, save_pipeline_state=True, **kwargs)[source]

Call self as a function.

call(**kwargs)[source]
clean_chains_directory()[source]

Removes the contents of the chains directory

corner_plot(**kwargs)[source]

Calls imagine.tools.visualization.corner_plot() to make a corner plot of samples produced by this Pipeline

evidence_report(sdigits=4)[source]
get_BIC()[source]

Computes the Bayesian Information Criterion (BIC), this is done using the simple expression

$BIC = k \ln n - 2 \ln L(\theta_{\rm MAP})$

where $$k$$ is the number of parameters, $$n$$ is the total number of data points, and $$\hat{L}$$ is the likelihood function at a reference point :math:theta_{rm MAP}.

Traditionally, this information criterion uses maximum likelihood as a reference point (i.e. $$\theta_{\rm MLE}$$). By default, however, this method uses the likelihod at the MAP as the reference. motivated by the heuristic that, if the choice of prior is a sensible one, the MAP a better representation of the model performance than the MLE.

get_MAP(initial_guess='auto', include_units=True, return_optimizer_result=False, **kwargs)[source]

Computes the parameter values at the Maximum A Posteriori

This method uses scipy.optimize.minimize for the calculation. By default, it will use the ‘Powell’ (which does not require the calculation of gradients or the assumption of continuity). Also by default the bounds keywords use the ranges specified in the priors.

The MAP estimate is stored internally to be used by the properties: MAP_model and MAP_simulation.

Parameters: include_units (bool) – If True (default), returns the result as list of Quantities . Otherwise, returns a single numpy array with the parameter values in their default units. return_optimizer_result (bool) – If False (default) only the MAP values are returned. Otherwise, a tuple containing the values and the output of scipy.optimize.minimize is returned instead. initial_guess (str or numpy.ndarray) – The initial guess used by the optimizer. If set to ‘centre’, the centre of each parameter range is used (obtained using parameter_central_value()). If set to ‘samples’, the median values of the samples produced by a previous posterior sampling are used (the Pipeline has to have been run before). If set to ‘auto’ (default), use ‘samples’ if available, and ‘centre’ otherwise. Alternatively, an array of active parameter values with the starting position may be provided. **kwargs – Any other keyword arguments are passed directly to scipy.optimize.minimize. MAP (list or array) – Parameter values at the position of the maximum of the posterior distribution result (scipy.optimize.OptimizeResult) – Only if return_optimizer_result is set to True, this is returned together with the MAP.
get_intermediate_results()[source]
get_par_names()[source]
likelihood_convergence_report(cmap='cmr.chroma', **kwargs)[source]

Prepares a standard set of plots of a likelihood convergence report (produced by the Pipeline.prepare_likelihood_convergence_report() method).

Parameters: cmap (str) – Colormap to be used for the lineplots **kwargs – Keyword arguments that will be supplied to prepare_likelihood_convergence_report (see its docstring for details).
classmethod load(directory_path='.')[source]

Loads the state of a Pipeline object

Parameters: directory_path (str) – Path to the directory where the Pipeline state should be saved
log_probability_unnormalized(theta)[source]

The log of the unnormalized posterior probability – i.e. the sum of the log-likelihood and the log of the prior probability.

Parameters: theta (np.ndarray) – Array of parameter values (in their default units) log_prob – log(likelihood(theta))+log(prior(theta)) float
parameter_central_value()[source]

Gets central point in the parameter space of a given pipeline

The ranges are extracted from each prior. The result is a pure list of numbers corresponding to the values in the native units of each prior.

For non-finite ranges (e.g. [-inf,inf]), a zero central value is assumed.

Returns: central_values – A list of parameter values at the centre of each parameter range list
posterior_report(sdigits=2, **kwargs)[source]

Displays the best fit values and 1-sigma errors for each active parameter. Also produces a corner plot of the samples, which is saved to the run directory.

If running on a jupyter-notebook, a nice LaTeX display is used, and the plot is shown.

Parameters: sdigits (int) – The number of significant digits to be used
prepare_likelihood_convergence_report(min_Nens=10, max_Nens=50, n_seeds=1, n_points=5, include_centre=True)[source]

Constructs a report dataset based on a given Pipeline setup, which can be used for studying the likelihood convergence in a particular problem

The pipeline’s ensemble size is temporarily set to Nens*n_seeds, and (for each point) the present pipeline setup is used to compute a Simulations dictionary object. Subsets of this simulations object are then produced and the likelihood computed.

The end result is a pandas.DataFrame containing the following columns:

• likelihood - The likelihood value.
• likelihood_std - The likelihood dispersion, estimated by bootstrapping the available ensemble and computing the standard deviation.
• ensemble_size - Size of the ensemble of simulations used.
• ipoint - Index of the point used.
• iseed - Index of the random (master) seed used.
• param_values - Values of the parameters at a given point.
Parameters: min_Nens (int) – Minimum ensemble size to be considered max_Nens (int) – Maximum ensemble size to be examined n_seeds (int) – Number of different (master) random seeds to be used n_points (int) – Number of points to be evaluated. Points are randomly drawn from the prior distribution (but see include_centre). include_centre (bool) – If True, the first point is taken as the value corresponding to the centre of each parameter range. results – A pandas.DataFrame object containing the report data. pandas.DataFrame
prior_pdf(cube)[source]

Probability distribution associated with the all parameters being used by the multiple Field Factories

Parameters: cube (np.ndarray) – Each row of the array corresponds to a different parameter value in the sampling (dimensionless, but in the standard units of the prior). rtn – Prior probability of the parameter choice specified by cube float
prior_transform(cube)[source]

Prior transform cube

Takes a cube containing a uniform sampling of values and maps then onto a distribution compatible with the priors specified in the Field Factories.

If prior correlations were specified, these are applied to the cube (assumes the correlations can be well described by the correlations between gaussians).

Parameters: cube (array) – Each row of the array corresponds to a different parameter in the sampling. The modified cube cube
progress_report()[source]

Reports the progress of the inference

save(**kwargs)[source]

Saves the state of the Pipeline

The run_directory set in initialization is used. Any distributed data is gathered and the pipeline is serialized and saved to disk.

test(n_points=3, include_centre=True, verbose=True)[source]

Tests the present IMAGINE pipeline evaluating the likelihood on a small number of points and reporting the run-time.

n_points : int
Number of points to evaluate the likelihood on. The first point corresponds to the centre of the active parameter ranges (unless include_centre is set to False) and the other are randomly sampled from the prior distributions.
include_centre : bool
If True, the initial point will be obtained from the centre of the active parameter ranges.
Returns: mean_time – The average execution time of a single likelihood evaluation astropy.units.Quantity
tidy_up()[source]

Resets internal state before a new run

BIC
MAP_model

Maximum a posteriori (MAP) model

List of Field objects corresponding to the mode of the posterior distribution. This does not require a previous run of the Pipeline, as the MAP is computed by maximizing the unnormalized posterior distribution.

This convenience property uses the results from the latest call of the Pipeline.get_MAP() method. If Pipeline.get_MAP() has never been called, the MAP is found calling it with with default arguments.

See Pipeline.get_MAP() for details.

MAP_simulation

Simulation corresponding to the MAP_model.

active_parameters

List of all the active parameters

chains_directory

Directory where the chains are stored (NB details of what is stored are sampler-dependent)

distribute_ensemble

If True, whenever the sampler requires a likelihood evaluation, the ensemble of stochastic fields realizations is distributed among all the nodes.

Otherwise, each likelihood evaluations will go through the whole ensemble size on a single node. See Parallelisation for details.

ensemble_size
factory_list

List of the Field Factories currently being used.

Updating the factory list automatically extracts active_parameters, parameter ranges and priors from each field factory.

likelihood

The Likelihood object used by the pipeline

log_evidence

Natural logarithm of the marginal likelihood or Bayesian model evidence, $$\ln\mathcal{Z}$$, where

$\mathcal{Z} = P(d|m) = \int_{\Omega_\theta} P(d | \theta, m) P(\theta | m) \mathrm{d}\theta .$

Note

Available only after the pipeline is run.

log_evidence_err

Error estimate in the natural logarithm of the Bayesian model evidence. Available once the pipeline is run.

Note

Available only after the pipeline is run.

median_model

Posterior median model

List of Field objects corresponding to the median values of the distributions of parameter values found after a Pipeline run.

median_simulation

Simulation corresponding to the median_model.

posterior_summary

A dictionary containing a summary of posterior statistics for each of the active parameters. These are: ‘median’, ‘errlo’ (15.87th percentile), ‘errup’ (84.13th percentile), ‘mean’ and ‘stdev’.

prior_correlations
priors

Dictionary containing priors for all active parameters

run_directory

Directory where the chains are stored (NB details of what is stored are sampler-dependent)

sampler_supports_mpi
samples

An astropy.table.QTable object containing parameter values of the samples produced in the run.

sampling_controllers

Settings used by the sampler (e.g. ‘dlogz’). See the documentation of each specific pipeline subclass for details.

After the pipeline runs, this property is updated to reflect the actual final choice of sampling controllers (including default values).

simulator

The Simulator object used by the pipeline

wrapped_parameters

List of parameters which are periodic or “wrapped around”

class imagine.pipelines.UltranestPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]

Bayesian analysis pipeline with UltraNest

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

• resume (bool) – If False the Pipeline the sampling starts from the beginning, erasing any previous work in the chains_directory. Otherwise, tries to resume a previous run.
• dlogz (float) – Target evidence uncertainty. This is the std between bootstrapped logz integrators.
• dKL (float) – Target posterior uncertainty. This is the Kullback-Leibler divergence in nat between bootstrapped integrators.
• frac_remain (float) – Integrate until this fraction of the integral is left in the remainder. Set to a low number (1e-2 … 1e-5) to make sure peaks are discovered. Set to a higher number (0.5) if you know the posterior is simple.
• Lepsilon (float) – Terminate when live point likelihoods are all the same, within Lepsilon tolerance. Increase this when your likelihood function is inaccurate, to avoid unnecessary search.
• min_ess (int) – Target number of effective posterior samples.
• max_iters (int) – maximum number of integration iterations.
• max_ncalls (int) – stop after this many likelihood evaluations.
• max_num_improvement_loops (int) – run() tries to assess iteratively where more samples are needed. This number limits the number of improvement loops.
• min_num_live_points (int) – minimum number of live points throughout the run
• cluster_num_live_points (int) – require at least this many live points per detected cluster
• num_test_samples (int) – test transform and likelihood with this number of random points for errors first. Useful to catch bugs.
• draw_multiple (bool) – draw more points if efficiency goes down. If set to False, few points are sampled at once.
• num_bootstraps (int) – number of logZ estimators and MLFriends region bootstrap rounds.
• update_interval_iter_fraction (float) – Update region after (update_interval_iter_fraction*nlive) iterations.

Note

Instances of this class are callable. Look at the UltranestPipeline.call() for details.

call(**kwargs)[source]

Runs the IMAGINE pipeline using the UltraNest ReactiveNestedSampler.

Any keyword argument provided is used to update the sampling_controllers.

Returns: results – UltraNest sampling results in a dictionary containing the keys: logZ (the log-evidence), logZerror (the error in log-evidence) and samples (equal weighted posterior) dict

Notes

See base class for other attributes/properties and methods

SUPPORTS_MPI = True