imagine.pipelines package¶
Submodules¶
imagine.pipelines.dynesty_pipeline module¶

class
imagine.pipelines.dynesty_pipeline.
DynestyPipeline
(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]¶ Bases:
imagine.pipelines.pipeline.Pipeline
Bayesian analysis pipeline with dynesty
This pipeline may use
DynamicNestedSampler
if the sampling parameter ‘dynamic’ is set to True (default) orNestedSampler
if ‘dynamic` is False (default).See base class for initialization details.
The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.
Other Parameters: dynamic (bool) – If True (default), use
dynesty.DynamicNestedSampler
otherwise usesdynesty.NestedSampler
.dlogz (float) – Iteration will stop, in the dynamic==False case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est)  ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e3 * (nlive  1) + 0.01. Otherwise, the default is 0.01.
dlogz_init (float) – The baseline run will stop, in the dynamic==True case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est)  ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e3 * (nlive  1) + 0.01. Otherwise, the default is 0.01.
nlive (int) – If dynamic is False, this sets the number of live points used. Default is 400.
nlive_init (int) – If dynamic is True, this sets the number of live points used during the initial (“baseline”) nested sampling run. Default is 400.
nlive_batch (int) – If dynamic is True, this sets the number of live points used when adding additional samples from a nested sampling run within each batch. Default is 400.
logl_max (float) – Iteration will stop when the sampled ln(likelihood) exceeds the threshold set by logl_max. Default is no bound (np.inf).
logl_max_init (float) – The baseline run will stop, in the dynamic==True case, when the sampled ln(likelihood) exceeds this threshold. Default is no bound (np.inf).
maxiter (int) – Maximum number of iterations. Iteration may stop earlier if the termination condition is reached. Default is (no limit).
maxiter_init (int) – If dynamic is True, this sets the maximum number of iterations for the initial baseline nested sampling run. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxiter_batch (int) – If dynamic is True, this sets the maximum number of iterations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxcall (int) – Maximum number of likelihood evaluations (without considering the initial points, i.e. maxcall_effective = maxcall + nlive). Iteration may stop earlier if termination condition is reached. Default is sys.maxsize (no limit).
maxcall_init (int) – If dynamic is True, maximum number of likelihood evaluations in the baseline run.
maxcall_batch (int) – If dynamic is True, maximum number of likelihood evaluations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxbatch (int) – If dynamic is True, maximum number of batches allowed. Default is sys.maxsize (no limit).
use_stop (bool, optional) – Whether to evaluate our stopping function after each batch. Disabling this can improve performance if other stopping criteria such as
maxcall
are already specified. Default is True.n_effective (int) – Minimum number of effective posterior samples. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).
n_effective_init (int) – Minimum number of effective posterior samples during the baseline run. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).
add_live (bool) – Whether or not to add the remaining set of live points to the list of samples at the end of each run. Default is True.
print_progress (bool) – Whether or not to output a simple summary of the current run that updates with each iteration. Default is True.
print_func (function) – A function that prints out the current state of the sampler. If not provided, the default
results.print_fn()
is used.save_bounds (bool) –
 Whether or not to save past bounding distributions used to bound
the live points internally. Default is True.
bound ({‘none’, ‘single’, ‘multi’, ‘balls’, ‘cubes’}) – Method used to approximately bound the prior using the current set of live points. Conditions the sampling methods used to propose new live points. Choices are no bound (‘none’), a single bounding ellipsoid (‘single’), multiple bounding ellipsoids (‘multi’), balls centered on each live point (‘balls’), and cubes centered on each live point (‘cubes’). Default is ‘multi’.
sample ({‘auto’, ‘unif’, ‘rwalk’, ‘rstagger’,) – ‘slice’, ‘rslice’} Method used to sample uniformly within the likelihood constraint, conditioned on the provided bounds. Unique methods available are: uniform sampling within the bounds(‘unif’), random walks with fixed proposals (‘rwalk’), random walks with variable (“staggering”) proposals (‘rstagger’), multivariate slice sampling along preferred orientations (‘slice’), and “random” slice sampling along all orientations (‘rslice’). ‘auto’ selects the sampling method based on the dimensionality of the problem (from ndim). When ndim < 10, this defaults to ‘unif’. When 10 <= ndim <= 20, this defaults to ‘rwalk’. When ndim > 20, this defaults to ‘slice’. ‘rstagger’ and ‘rslice’ are provided as alternatives for ‘rwalk’ and ‘slice’, respectively. Default is ‘auto’. Note that Dynesty’s ‘hslice’ option is not supported within IMAGINE.
update_interval (int or float) – If an integer is passed, only update the proposal distribution every update_intervalth likelihood call. If a float is passed, update the proposal after every round(update_interval * nlive)th likelihood call. Larger update intervals larger can be more efficient when the likelihood function is quick to evaluate. Default behavior is to target a roughly constant change in prior volume, with 1.5 for ‘unif’, 0.15 * walks for ‘rwalk’ and ‘rstagger’, 0.9 * ndim * slices for ‘slice’, 2.0 * slices for ‘rslice’, and 25.0 * slices for ‘hslice’.
enlarge (float) – Enlarge the volumes of the specified bounding object(s) by this fraction. The preferred method is to determine this organically using bootstrapping. If bootstrap > 0, this defaults to 1.0. If bootstrap = 0, this instead defaults to 1.25.
bootstrap (int) – Compute this many bootstrapped realizations of the bounding objects. Use the maximum distance found to the set of points left out during each iteration to enlarge the resulting volumes. Can lead to unstable bounding ellipsoids. Default is 0 (no bootstrap).
vol_dec (float) – For the ‘multi’ bounding option, the required fractional reduction in volume after splitting an ellipsoid in order to to accept the split. Default is 0.5.
vol_check (float) – For the ‘multi’ bounding option, the factor used when checking if the volume of the original bounding ellipsoid is large enough to warrant > 2 splits via ell.vol > vol_check * nlive * pointvol. Default is 2.0.
walks (int) – For the ‘rwalk’ sampling option, the minimum number of steps (minimum 2) before proposing a new live point. Default is 25.
facc (float) – The target acceptance fraction for the ‘rwalk’ sampling option. Default is 0.5. Bounded to be between [1. / walks, 1.].
slices (int) – For the ‘slice’ and ‘rslice’ sampling options, the number of times to execute a “slice update” before proposing a new live point. Default is 5. Note that ‘slice’ cycles through all dimensions when executing a “slice update”.
Note
Instances of this class are callable. Look at the
DynestyPipeline.call()
for details.
imagine.pipelines.emcee_pipeline module¶

class
imagine.pipelines.emcee_pipeline.
EmceePipeline
(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]¶ Bases:
imagine.pipelines.pipeline.Pipeline
Analysis pipeline with the MCMC sampler emcee
See base class for initialization details.
The chains are considered converged once the total number of iterations becomes smaller than convergence_factor times the autocorrelation time.
The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.
Other Parameters:  resume (bool) – If False the Pipeline the sampling starts from the beginning, overwriting any previous work in the chains_directory. Otherwise, tries to resume a previous run.
 nwalkers (int) – Number of walkers
 max_nsteps (int) – Maximum number of iterations
 nsteps_check (int) – The sampler will check for convergence every nsteps_check
 convergence_factor (float) – Factor used to compute the convergence
 burnin_factor (int) – Number of autocorrelation times to be discarded from main results
 thin_factor (float) – Factor used to choose how the chain will be “thinned” after running
 custom_initial_positions (list) – List containig the the starting positions to be used for the walkers. If absent (default), initial positions are randomly sampled from the prior distribution.

call
(**kwargs)[source]¶ Returns: results – A dictionary containing the sampler results (usually in its native format) Return type: dict

SUPPORTS_MPI
= True¶
imagine.pipelines.multinest_pipeline module¶

class
imagine.pipelines.multinest_pipeline.
MultinestPipeline
(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]¶ Bases:
imagine.pipelines.pipeline.Pipeline
Bayesian analysis pipeline with pyMultinest
See base class for initialization details.
The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.
Other Parameters:  resume (bool) – If False the Pipeline the sampling starts from the beginning, overwriting any previous work in the chains_directory. Otherwise, tries to resume a previous run.
 n_live_points (int) – Number of live points to be used.
 evidence_tolerance (float) – A value of 0.5 should give good enough accuracy.
 max_iter (int) – Maximum number of iterations. 0 (default) is unlimited (i.e. only stops after convergence).
 log_zero (float) – Points with loglike < logZero will be ignored by MultiNest
 importance_nested_sampling (bool) – If True (default), Multinest will use Importance Nested Sampling (see arXiv:1306.2144)
 sampling_efficiency (float) – Efficieny of the sampling. 0.8 (default) and 0.3 are recommended values for parameter estimation & evidence evaluation respectively.
 multimodal (bool) – If True, MultiNest will attempt to separate out the modes using a clustering algorithm.
 mode_tolerance (float) – MultiNest can find multiple modes and specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local logevidence value greater than a particular value in which case mode_tolerance should be set to that value. If there isn’t any particularly interesting mode_tolerance value, then it should be set to a very negative number (e.g. 1e90, default).
 null_log_evidence (float) – If multimodal is True, MultiNest can find multiple modes and also specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local logevidence value greater than a particular value in which case nullZ should be set to that value. If there isn’t any particulrly interesting nullZ value, then nullZ should be set to a very large negative number (e.g. 1.d90).
 n_clustering_params (int) – Mode separation is done through a clustering algorithm. Mode separation can be done on all the parameters (in which case nCdims should be set to ndims) & it can also be done on a subset of parameters (in which case nCdims < ndims) which might be advantageous as clustering is less accurate as the dimensionality increases. If nCdims < ndims then mode separation is done on the first nCdims parameters.
 max_modes (int) – Maximum number of modes (if multimodal is True).
Note
Instances of this class are callable. Look at the
MultinestPipeline.call()
for details.
call
(**kwargs)[source]¶ Runs the IMAGINE pipeline using the MultiNest sampler
Returns: results – pyMultinest sampling results in a dictionary containing the keys: logZ (the logevidence), logZerror (the error in logevidence) and samples (equal weighted posterior) Return type: dict

SUPPORTS_MPI
= True¶
imagine.pipelines.pipeline module¶

class
imagine.pipelines.pipeline.
Pipeline
(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]¶ Bases:
imagine.tools.class_tools.BaseClass
Base class used for for initialing Bayesian analysis pipeline

likelihood_rescaler
¶ Rescale loglikelihood value
Type: double

random_type
¶ If set to ‘fixed’, the exact same set of ensemble seeds will be used for the evaluation of all fields, generated using the master_seed. If set to ‘controllable’, each individual field will get their own set of ensemble fields, but multiple runs will lead to the same results, as they are based on the same master_seed. If set to ‘free’, every time the pipeline is run, the master_seed is reset to a different value, and the ensemble seeds for each individual field are drawn based on this.
Type: str
Parameters:  simulator (imagine.simulators.simulator.Simulator) – Simulator object
 factory_list (list) – List or tuple of field factory objects
 likelihood (imagine.likelihoods.likelihood.Likelihood) – Likelihood object
 ensemble_size (int) – Number of observable realizations to be generated by the simulator
 run_directory (str) – Directory where the pipeline state and reports are saved.
 chains_directory (str) – Path of the directory where the chains should be saved. By default, this is saved to a ‘chains’ subdirectory of run_directory.
 prior_correlations (dict) – Dictionary used to set up prior distribution correlations. If two parameters are A and B are correlated a priori, an entry should be added to the prior_correlations dictionary in the form (name_A, name_B): True, to extract the correlation from the samples (in the case of CustomPriors) or (name_A, name_B): value otherwise.
 show_summary_reports (bool) – If True (default), shows/saves a corner plot and shows the evidence after the pipeline run has finished
 show_progress_reports (bool) – If True, shows/saves a simple progress of the sampler during the run.
 n_evals_report (int) – The number of likelihood evaluations before showing a progress report

corner_plot
(**kwargs)[source]¶ Calls
imagine.tools.visualization.corner_plot()
to make a corner plot of samples produced by this Pipeline

get_BIC
()[source]¶ Computes the Bayesian Information Criterion (BIC), this is done using the simple expression
\[BIC = k \ln n  2 \ln L(\theta_{\rm MAP})\]where \(k\) is the number of parameters, \(n\) is the total number of data points, and \(\hat{L}\) is the likelihood function at a reference point :math:`theta_{rm MAP}.
Traditionally, this information criterion uses maximum likelihood as a reference point (i.e. \(\theta_{\rm MLE}\)). By default, however, this method uses the likelihod at the MAP as the reference. motivated by the heuristic that, if the choice of prior is a sensible one, the MAP a better representation of the model performance than the MLE.

get_MAP
(initial_guess='auto', include_units=True, return_optimizer_result=False, **kwargs)[source]¶ Computes the parameter values at the Maximum A Posteriori
This method uses scipy.optimize.minimize for the calculation. By default, it will use the ‘Powell’ (which does not require the calculation of gradients or the assumption of continuity). Also by default the bounds keywords use the ranges specified in the priors.
The MAP estimate is stored internally to be used by the properties:
MAP_model
andMAP_simulation
.Parameters:  include_units (bool) – If True (default), returns the result as list of Quantities <astropy.units.Quantity>. Otherwise, returns a single numpy array with the parameter values in their default units.
 return_optimizer_result (bool) – If False (default) only the MAP values are returned. Otherwise, a tuple containing the values and the output of scipy.optimize.minimize is returned instead.
 initial_guess (str or numpy.ndarray) – The initial guess used by the optimizer. If set to ‘centre’, the
centre of each parameter range is used (obtained using
parameter_central_value()
). If set to ‘samples’, the median values of the samples produced by a previous posterior sampling are used (the Pipeline has to have been run before). If set to ‘auto’ (default), use ‘samples’ if available, and ‘centre’ otherwise. Alternatively, an array of active parameter values with the starting position may be provided.  **kwargs – Any other keyword arguments are passed directly to scipy.optimize.minimize.
Returns:  MAP (list or array) – Parameter values at the position of the maximum of the posterior distribution
 result (scipy.optimize.OptimizeResult) – Only if return_optimizer_result is set to True, this is returned together with the MAP.

likelihood_convergence_report
(cmap='cmr.chroma', **kwargs)[source]¶ Prepares a standard set of plots of a likelihood convergence report (produced by the
Pipeline.prepare_likelihood_convergence_report()
method).Parameters:  cmap (str) – Colormap to be used for the lineplots
 **kwargs – Keyword arguments that will be supplied to prepare_likelihood_convergence_report (see its docstring for details).

classmethod
load
(directory_path='.')[source]¶ Loads the state of a Pipeline object
Parameters: directory_path (str) – Path to the directory where the Pipeline state should be saved Note
This method uses
imagine.tools.io.load_pipeline()

log_probability_unnormalized
(theta)[source]¶ The log of the unnormalized posterior probability – i.e. the sum of the loglikelihood and the log of the prior probability.
Parameters: theta (np.ndarray) – Array of parameter values (in their default units) Returns: log_prob – log(likelihood(theta))+log(prior(theta)) Return type: float

parameter_central_value
()[source]¶ Gets central point in the parameter space of a given pipeline
The ranges are extracted from each prior. The result is a pure list of numbers corresponding to the values in the native units of each prior.
For nonfinite ranges (e.g. [inf,inf]), a zero central value is assumed.
Returns: central_values – A list of parameter values at the centre of each parameter range Return type: list

posterior_report
(sdigits=2, **kwargs)[source]¶ Displays the best fit values and 1sigma errors for each active parameter. Also produces a corner plot of the samples, which is saved to the run directory.
If running on a jupyternotebook, a nice LaTeX display is used, and the plot is shown.
Parameters: sdigits (int) – The number of significant digits to be used

prepare_likelihood_convergence_report
(min_Nens=10, max_Nens=50, n_seeds=1, n_points=5, include_centre=True)[source]¶ Constructs a report dataset based on a given Pipeline setup, which can be used for studying the likelihood convergence in a particular problem
The pipeline’s ensemble size is temporarily set to Nens*n_seeds, and (for each point) the present pipeline setup is used to compute a Simulations dictionary object. Subsets of this simulations object are then produced and the likelihood computed.
The end result is a
pandas.DataFrame
containing the following columns: likelihood  The likelihood value.
 likelihood_std  The likelihood dispersion, estimated by bootstrapping the available ensemble and computing the standard deviation.
 ensemble_size  Size of the ensemble of simulations used.
 ipoint  Index of the point used.
 iseed  Index of the random (master) seed used.
 param_values  Values of the parameters at a given point.
Parameters:  min_Nens (int) – Minimum ensemble size to be considered
 max_Nens (int) – Maximum ensemble size to be examined
 n_seeds (int) – Number of different (master) random seeds to be used
 n_points (int) – Number of points to be evaluated. Points are randomly drawn from the prior distribution (but see include_centre).
 include_centre (bool) – If True, the first point is taken as the value corresponding to the centre of each parameter range.
Returns: results – A pandas.DataFrame object containing the report data.
Return type:

prior_pdf
(cube)[source]¶ Probability distribution associated with the all parameters being used by the multiple Field Factories
Parameters: cube (np.ndarray) – Each row of the array corresponds to a different parameter value in the sampling (dimensionless, but in the standard units of the prior). Returns: rtn – Prior probability of the parameter choice specified by cube Return type: float

prior_transform
(cube)[source]¶ Prior transform cube
Takes a cube containing a uniform sampling of values and maps then onto a distribution compatible with the priors specified in the Field Factories.
If prior correlations were specified, these are applied to the cube (assumes the correlations can be well described by the correlations between gaussians).
Parameters: cube (array) – Each row of the array corresponds to a different parameter in the sampling. Returns: The modified cube Return type: cube

save
(**kwargs)[source]¶ Saves the state of the Pipeline
The run_directory set in initialization is used. Any distributed data is gathered and the pipeline is serialized and saved to disk.
Note
This method uses
imagine.tools.io.save_pipeline()

test
(n_points=3, include_centre=True, verbose=True)[source]¶ Tests the present IMAGINE pipeline evaluating the likelihood on a small number of points and reporting the runtime.
 n_points : int
 Number of points to evaluate the likelihood on. The first point corresponds to the centre of the active parameter ranges (unless include_centre is set to False) and the other are randomly sampled from the prior distributions.
 include_centre : bool
 If True, the initial point will be obtained from the centre of the active parameter ranges.
Returns: mean_time – The average execution time of a single likelihood evaluation Return type: astropy.units.Quantity

BIC
¶

MAP_model
¶ Maximum a posteriori (MAP) model
List of Field objects corresponding to the mode of the posterior distribution. This does not require a previous run of the Pipeline, as the MAP is computed by maximizing the unnormalized posterior distribution.
This convenience property uses the results from the latest call of the
Pipeline.get_MAP()
method. IfPipeline.get_MAP()
has never been called, the MAP is found calling it with with default arguments.See
Pipeline.get_MAP()
for details.

active_parameters
¶ List of all the active parameters

chains_directory
¶ Directory where the chains are stored (NB details of what is stored are samplerdependent)

distribute_ensemble
¶ If True, whenever the sampler requires a likelihood evaluation, the ensemble of stochastic fields realizations is distributed among all the nodes.
Otherwise, each likelihood evaluations will go through the whole ensemble size on a single node. See Parallelisation for details.

ensemble_size
¶

factory_list
¶ List of the
Field Factories
currently being used.Updating the factory list automatically extracts active_parameters, parameter ranges and priors from each field factory.

likelihood
¶ The
Likelihood
object used by the pipeline

log_evidence
¶ Natural logarithm of the marginal likelihood or Bayesian model evidence, \(\ln\mathcal{Z}\), where
\[\mathcal{Z} = P(dm) = \int_{\Omega_\theta} P(d  \theta, m) P(\theta  m) \mathrm{d}\theta .\]Note
Available only after the pipeline is run.

log_evidence_err
¶ Error estimate in the natural logarithm of the Bayesian model evidence. Available once the pipeline is run.
Note
Available only after the pipeline is run.

median_model
¶ Posterior median model
List of Field objects corresponding to the median values of the distributions of parameter values found after a Pipeline run.

median_simulation
¶ Simulation corresponding to the
median_model
.

posterior_summary
¶ A dictionary containing a summary of posterior statistics for each of the active parameters. These are: ‘median’, ‘errlo’ (15.87th percentile), ‘errup’ (84.13th percentile), ‘mean’ and ‘stdev’.

prior_correlations
¶

priors
¶ Dictionary containing priors for all active parameters

run_directory
¶ Directory where the chains are stored (NB details of what is stored are samplerdependent)

sampler_supports_mpi
¶

samples
¶ An
astropy.table.QTable
object containing parameter values of the samples produced in the run.

sampling_controllers
¶ Settings used by the sampler (e.g. ‘dlogz’). See the documentation of each specific pipeline subclass for details.
After the pipeline runs, this property is updated to reflect the actual final choice of sampling controllers (including default values).

wrapped_parameters
¶ List of parameters which are periodic or “wrapped around”

imagine.pipelines.ultranest_pipeline module¶

class
imagine.pipelines.ultranest_pipeline.
UltranestPipeline
(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]¶ Bases:
imagine.pipelines.pipeline.Pipeline
Bayesian analysis pipeline with UltraNest
See base class for initialization details.
The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.
Other Parameters:  resume (bool) – If False the Pipeline the sampling starts from the beginning, erasing any previous work in the chains_directory. Otherwise, tries to resume a previous run.
 dlogz (float) – Target evidence uncertainty. This is the std between bootstrapped logz integrators.
 dKL (float) – Target posterior uncertainty. This is the KullbackLeibler divergence in nat between bootstrapped integrators.
 frac_remain (float) – Integrate until this fraction of the integral is left in the remainder. Set to a low number (1e2 … 1e5) to make sure peaks are discovered. Set to a higher number (0.5) if you know the posterior is simple.
 Lepsilon (float) – Terminate when live point likelihoods are all the same, within Lepsilon tolerance. Increase this when your likelihood function is inaccurate, to avoid unnecessary search.
 min_ess (int) – Target number of effective posterior samples.
 max_iters (int) – maximum number of integration iterations.
 max_ncalls (int) – stop after this many likelihood evaluations.
 max_num_improvement_loops (int) – run() tries to assess iteratively where more samples are needed. This number limits the number of improvement loops.
 min_num_live_points (int) – minimum number of live points throughout the run
 cluster_num_live_points (int) – require at least this many live points per detected cluster
 num_test_samples (int) – test transform and likelihood with this number of random points for errors first. Useful to catch bugs.
 draw_multiple (bool) – draw more points if efficiency goes down. If set to False, few points are sampled at once.
 num_bootstraps (int) – number of logZ estimators and MLFriends region bootstrap rounds.
 update_interval_iter_fraction (float) – Update region after (update_interval_iter_fraction*nlive) iterations.
Note
Instances of this class are callable. Look at the
UltranestPipeline.call()
for details.
call
(**kwargs)[source]¶ Runs the IMAGINE pipeline using the UltraNest
ReactiveNestedSampler
.Any keyword argument provided is used to update the sampling_controllers.
Returns: results – UltraNest sampling results in a dictionary containing the keys: logZ (the logevidence), logZerror (the error in logevidence) and samples (equal weighted posterior) Return type: dict Notes
See base class for other attributes/properties and methods

SUPPORTS_MPI
= True¶
Module contents¶

class
imagine.pipelines.
DynestyPipeline
(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]¶ Bases:
imagine.pipelines.pipeline.Pipeline
Bayesian analysis pipeline with dynesty
This pipeline may use
DynamicNestedSampler
if the sampling parameter ‘dynamic’ is set to True (default) orNestedSampler
if ‘dynamic` is False (default).See base class for initialization details.
The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.
Other Parameters: dynamic (bool) – If True (default), use
dynesty.DynamicNestedSampler
otherwise usesdynesty.NestedSampler
.dlogz (float) – Iteration will stop, in the dynamic==False case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est)  ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e3 * (nlive  1) + 0.01. Otherwise, the default is 0.01.
dlogz_init (float) – The baseline run will stop, in the dynamic==True case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est)  ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e3 * (nlive  1) + 0.01. Otherwise, the default is 0.01.
nlive (int) – If dynamic is False, this sets the number of live points used. Default is 400.
nlive_init (int) – If dynamic is True, this sets the number of live points used during the initial (“baseline”) nested sampling run. Default is 400.
nlive_batch (int) – If dynamic is True, this sets the number of live points used when adding additional samples from a nested sampling run within each batch. Default is 400.
logl_max (float) – Iteration will stop when the sampled ln(likelihood) exceeds the threshold set by logl_max. Default is no bound (np.inf).
logl_max_init (float) – The baseline run will stop, in the dynamic==True case, when the sampled ln(likelihood) exceeds this threshold. Default is no bound (np.inf).
maxiter (int) – Maximum number of iterations. Iteration may stop earlier if the termination condition is reached. Default is (no limit).
maxiter_init (int) – If dynamic is True, this sets the maximum number of iterations for the initial baseline nested sampling run. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxiter_batch (int) – If dynamic is True, this sets the maximum number of iterations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxcall (int) – Maximum number of likelihood evaluations (without considering the initial points, i.e. maxcall_effective = maxcall + nlive). Iteration may stop earlier if termination condition is reached. Default is sys.maxsize (no limit).
maxcall_init (int) – If dynamic is True, maximum number of likelihood evaluations in the baseline run.
maxcall_batch (int) – If dynamic is True, maximum number of likelihood evaluations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxbatch (int) – If dynamic is True, maximum number of batches allowed. Default is sys.maxsize (no limit).
use_stop (bool, optional) – Whether to evaluate our stopping function after each batch. Disabling this can improve performance if other stopping criteria such as
maxcall
are already specified. Default is True.n_effective (int) – Minimum number of effective posterior samples. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).
n_effective_init (int) – Minimum number of effective posterior samples during the baseline run. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).
add_live (bool) – Whether or not to add the remaining set of live points to the list of samples at the end of each run. Default is True.
print_progress (bool) – Whether or not to output a simple summary of the current run that updates with each iteration. Default is True.
print_func (function) – A function that prints out the current state of the sampler. If not provided, the default
results.print_fn()
is used.save_bounds (bool) –
 Whether or not to save past bounding distributions used to bound
the live points internally. Default is True.
bound ({‘none’, ‘single’, ‘multi’, ‘balls’, ‘cubes’}) – Method used to approximately bound the prior using the current set of live points. Conditions the sampling methods used to propose new live points. Choices are no bound (‘none’), a single bounding ellipsoid (‘single’), multiple bounding ellipsoids (‘multi’), balls centered on each live point (‘balls’), and cubes centered on each live point (‘cubes’). Default is ‘multi’.
sample ({‘auto’, ‘unif’, ‘rwalk’, ‘rstagger’,) – ‘slice’, ‘rslice’} Method used to sample uniformly within the likelihood constraint, conditioned on the provided bounds. Unique methods available are: uniform sampling within the bounds(‘unif’), random walks with fixed proposals (‘rwalk’), random walks with variable (“staggering”) proposals (‘rstagger’), multivariate slice sampling along preferred orientations (‘slice’), and “random” slice sampling along all orientations (‘rslice’). ‘auto’ selects the sampling method based on the dimensionality of the problem (from ndim). When ndim < 10, this defaults to ‘unif’. When 10 <= ndim <= 20, this defaults to ‘rwalk’. When ndim > 20, this defaults to ‘slice’. ‘rstagger’ and ‘rslice’ are provided as alternatives for ‘rwalk’ and ‘slice’, respectively. Default is ‘auto’. Note that Dynesty’s ‘hslice’ option is not supported within IMAGINE.
update_interval (int or float) – If an integer is passed, only update the proposal distribution every update_intervalth likelihood call. If a float is passed, update the proposal after every round(update_interval * nlive)th likelihood call. Larger update intervals larger can be more efficient when the likelihood function is quick to evaluate. Default behavior is to target a roughly constant change in prior volume, with 1.5 for ‘unif’, 0.15 * walks for ‘rwalk’ and ‘rstagger’, 0.9 * ndim * slices for ‘slice’, 2.0 * slices for ‘rslice’, and 25.0 * slices for ‘hslice’.
enlarge (float) – Enlarge the volumes of the specified bounding object(s) by this fraction. The preferred method is to determine this organically using bootstrapping. If bootstrap > 0, this defaults to 1.0. If bootstrap = 0, this instead defaults to 1.25.
bootstrap (int) – Compute this many bootstrapped realizations of the bounding objects. Use the maximum distance found to the set of points left out during each iteration to enlarge the resulting volumes. Can lead to unstable bounding ellipsoids. Default is 0 (no bootstrap).
vol_dec (float) – For the ‘multi’ bounding option, the required fractional reduction in volume after splitting an ellipsoid in order to to accept the split. Default is 0.5.
vol_check (float) – For the ‘multi’ bounding option, the factor used when checking if the volume of the original bounding ellipsoid is large enough to warrant > 2 splits via ell.vol > vol_check * nlive * pointvol. Default is 2.0.
walks (int) – For the ‘rwalk’ sampling option, the minimum number of steps (minimum 2) before proposing a new live point. Default is 25.
facc (float) – The target acceptance fraction for the ‘rwalk’ sampling option. Default is 0.5. Bounded to be between [1. / walks, 1.].
slices (int) – For the ‘slice’ and ‘rslice’ sampling options, the number of times to execute a “slice update” before proposing a new live point. Default is 5. Note that ‘slice’ cycles through all dimensions when executing a “slice update”.
Note
Instances of this class are callable. Look at the
DynestyPipeline.call()
for details.

class
imagine.pipelines.
MultinestPipeline
(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]¶ Bases:
imagine.pipelines.pipeline.Pipeline
Bayesian analysis pipeline with pyMultinest
See base class for initialization details.
The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.
Other Parameters:  resume (bool) – If False the Pipeline the sampling starts from the beginning, overwriting any previous work in the chains_directory. Otherwise, tries to resume a previous run.
 n_live_points (int) – Number of live points to be used.
 evidence_tolerance (float) – A value of 0.5 should give good enough accuracy.
 max_iter (int) – Maximum number of iterations. 0 (default) is unlimited (i.e. only stops after convergence).
 log_zero (float) – Points with loglike < logZero will be ignored by MultiNest
 importance_nested_sampling (bool) – If True (default), Multinest will use Importance Nested Sampling (see arXiv:1306.2144)
 sampling_efficiency (float) – Efficieny of the sampling. 0.8 (default) and 0.3 are recommended values for parameter estimation & evidence evaluation respectively.
 multimodal (bool) – If True, MultiNest will attempt to separate out the modes using a clustering algorithm.
 mode_tolerance (float) – MultiNest can find multiple modes and specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local logevidence value greater than a particular value in which case mode_tolerance should be set to that value. If there isn’t any particularly interesting mode_tolerance value, then it should be set to a very negative number (e.g. 1e90, default).
 null_log_evidence (float) – If multimodal is True, MultiNest can find multiple modes and also specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local logevidence value greater than a particular value in which case nullZ should be set to that value. If there isn’t any particulrly interesting nullZ value, then nullZ should be set to a very large negative number (e.g. 1.d90).
 n_clustering_params (int) – Mode separation is done through a clustering algorithm. Mode separation can be done on all the parameters (in which case nCdims should be set to ndims) & it can also be done on a subset of parameters (in which case nCdims < ndims) which might be advantageous as clustering is less accurate as the dimensionality increases. If nCdims < ndims then mode separation is done on the first nCdims parameters.
 max_modes (int) – Maximum number of modes (if multimodal is True).
Note
Instances of this class are callable. Look at the
MultinestPipeline.call()
for details.
call
(**kwargs)[source]¶ Runs the IMAGINE pipeline using the MultiNest sampler
Returns: results – pyMultinest sampling results in a dictionary containing the keys: logZ (the logevidence), logZerror (the error in logevidence) and samples (equal weighted posterior) Return type: dict

SUPPORTS_MPI
= True¶

class
imagine.pipelines.
Pipeline
(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]¶ Bases:
imagine.tools.class_tools.BaseClass
Base class used for for initialing Bayesian analysis pipeline

likelihood_rescaler
¶ Rescale loglikelihood value
Type: double

random_type
¶ If set to ‘fixed’, the exact same set of ensemble seeds will be used for the evaluation of all fields, generated using the master_seed. If set to ‘controllable’, each individual field will get their own set of ensemble fields, but multiple runs will lead to the same results, as they are based on the same master_seed. If set to ‘free’, every time the pipeline is run, the master_seed is reset to a different value, and the ensemble seeds for each individual field are drawn based on this.
Type: str
Parameters:  simulator (imagine.simulators.simulator.Simulator) – Simulator object
 factory_list (list) – List or tuple of field factory objects
 likelihood (imagine.likelihoods.likelihood.Likelihood) – Likelihood object
 ensemble_size (int) – Number of observable realizations to be generated by the simulator
 run_directory (str) – Directory where the pipeline state and reports are saved.
 chains_directory (str) – Path of the directory where the chains should be saved. By default, this is saved to a ‘chains’ subdirectory of run_directory.
 prior_correlations (dict) – Dictionary used to set up prior distribution correlations. If two parameters are A and B are correlated a priori, an entry should be added to the prior_correlations dictionary in the form (name_A, name_B): True, to extract the correlation from the samples (in the case of CustomPriors) or (name_A, name_B): value otherwise.
 show_summary_reports (bool) – If True (default), shows/saves a corner plot and shows the evidence after the pipeline run has finished
 show_progress_reports (bool) – If True, shows/saves a simple progress of the sampler during the run.
 n_evals_report (int) – The number of likelihood evaluations before showing a progress report

corner_plot
(**kwargs)[source]¶ Calls
imagine.tools.visualization.corner_plot()
to make a corner plot of samples produced by this Pipeline

get_BIC
()[source]¶ Computes the Bayesian Information Criterion (BIC), this is done using the simple expression
\[BIC = k \ln n  2 \ln L(\theta_{\rm MAP})\]where \(k\) is the number of parameters, \(n\) is the total number of data points, and \(\hat{L}\) is the likelihood function at a reference point :math:`theta_{rm MAP}.
Traditionally, this information criterion uses maximum likelihood as a reference point (i.e. \(\theta_{\rm MLE}\)). By default, however, this method uses the likelihod at the MAP as the reference. motivated by the heuristic that, if the choice of prior is a sensible one, the MAP a better representation of the model performance than the MLE.

get_MAP
(initial_guess='auto', include_units=True, return_optimizer_result=False, **kwargs)[source]¶ Computes the parameter values at the Maximum A Posteriori
This method uses scipy.optimize.minimize for the calculation. By default, it will use the ‘Powell’ (which does not require the calculation of gradients or the assumption of continuity). Also by default the bounds keywords use the ranges specified in the priors.
The MAP estimate is stored internally to be used by the properties:
MAP_model
andMAP_simulation
.Parameters:  include_units (bool) – If True (default), returns the result as list of Quantities <astropy.units.Quantity>. Otherwise, returns a single numpy array with the parameter values in their default units.
 return_optimizer_result (bool) – If False (default) only the MAP values are returned. Otherwise, a tuple containing the values and the output of scipy.optimize.minimize is returned instead.
 initial_guess (str or numpy.ndarray) – The initial guess used by the optimizer. If set to ‘centre’, the
centre of each parameter range is used (obtained using
parameter_central_value()
). If set to ‘samples’, the median values of the samples produced by a previous posterior sampling are used (the Pipeline has to have been run before). If set to ‘auto’ (default), use ‘samples’ if available, and ‘centre’ otherwise. Alternatively, an array of active parameter values with the starting position may be provided.  **kwargs – Any other keyword arguments are passed directly to scipy.optimize.minimize.
Returns:  MAP (list or array) – Parameter values at the position of the maximum of the posterior distribution
 result (scipy.optimize.OptimizeResult) – Only if return_optimizer_result is set to True, this is returned together with the MAP.

likelihood_convergence_report
(cmap='cmr.chroma', **kwargs)[source]¶ Prepares a standard set of plots of a likelihood convergence report (produced by the
Pipeline.prepare_likelihood_convergence_report()
method).Parameters:  cmap (str) – Colormap to be used for the lineplots
 **kwargs – Keyword arguments that will be supplied to prepare_likelihood_convergence_report (see its docstring for details).

classmethod
load
(directory_path='.')[source]¶ Loads the state of a Pipeline object
Parameters: directory_path (str) – Path to the directory where the Pipeline state should be saved Note
This method uses
imagine.tools.io.load_pipeline()

log_probability_unnormalized
(theta)[source]¶ The log of the unnormalized posterior probability – i.e. the sum of the loglikelihood and the log of the prior probability.
Parameters: theta (np.ndarray) – Array of parameter values (in their default units) Returns: log_prob – log(likelihood(theta))+log(prior(theta)) Return type: float

parameter_central_value
()[source]¶ Gets central point in the parameter space of a given pipeline
The ranges are extracted from each prior. The result is a pure list of numbers corresponding to the values in the native units of each prior.
For nonfinite ranges (e.g. [inf,inf]), a zero central value is assumed.
Returns: central_values – A list of parameter values at the centre of each parameter range Return type: list

posterior_report
(sdigits=2, **kwargs)[source]¶ Displays the best fit values and 1sigma errors for each active parameter. Also produces a corner plot of the samples, which is saved to the run directory.
If running on a jupyternotebook, a nice LaTeX display is used, and the plot is shown.
Parameters: sdigits (int) – The number of significant digits to be used

prepare_likelihood_convergence_report
(min_Nens=10, max_Nens=50, n_seeds=1, n_points=5, include_centre=True)[source]¶ Constructs a report dataset based on a given Pipeline setup, which can be used for studying the likelihood convergence in a particular problem
The pipeline’s ensemble size is temporarily set to Nens*n_seeds, and (for each point) the present pipeline setup is used to compute a Simulations dictionary object. Subsets of this simulations object are then produced and the likelihood computed.
The end result is a
pandas.DataFrame
containing the following columns: likelihood  The likelihood value.
 likelihood_std  The likelihood dispersion, estimated by bootstrapping the available ensemble and computing the standard deviation.
 ensemble_size  Size of the ensemble of simulations used.
 ipoint  Index of the point used.
 iseed  Index of the random (master) seed used.
 param_values  Values of the parameters at a given point.
Parameters:  min_Nens (int) – Minimum ensemble size to be considered
 max_Nens (int) – Maximum ensemble size to be examined
 n_seeds (int) – Number of different (master) random seeds to be used
 n_points (int) – Number of points to be evaluated. Points are randomly drawn from the prior distribution (but see include_centre).
 include_centre (bool) – If True, the first point is taken as the value corresponding to the centre of each parameter range.
Returns: results – A pandas.DataFrame object containing the report data.
Return type:

prior_pdf
(cube)[source]¶ Probability distribution associated with the all parameters being used by the multiple Field Factories
Parameters: cube (np.ndarray) – Each row of the array corresponds to a different parameter value in the sampling (dimensionless, but in the standard units of the prior). Returns: rtn – Prior probability of the parameter choice specified by cube Return type: float

prior_transform
(cube)[source]¶ Prior transform cube
Takes a cube containing a uniform sampling of values and maps then onto a distribution compatible with the priors specified in the Field Factories.
If prior correlations were specified, these are applied to the cube (assumes the correlations can be well described by the correlations between gaussians).
Parameters: cube (array) – Each row of the array corresponds to a different parameter in the sampling. Returns: The modified cube Return type: cube

save
(**kwargs)[source]¶ Saves the state of the Pipeline
The run_directory set in initialization is used. Any distributed data is gathered and the pipeline is serialized and saved to disk.
Note
This method uses
imagine.tools.io.save_pipeline()

test
(n_points=3, include_centre=True, verbose=True)[source]¶ Tests the present IMAGINE pipeline evaluating the likelihood on a small number of points and reporting the runtime.
 n_points : int
 Number of points to evaluate the likelihood on. The first point corresponds to the centre of the active parameter ranges (unless include_centre is set to False) and the other are randomly sampled from the prior distributions.
 include_centre : bool
 If True, the initial point will be obtained from the centre of the active parameter ranges.
Returns: mean_time – The average execution time of a single likelihood evaluation Return type: astropy.units.Quantity

BIC
¶

MAP_model
¶ Maximum a posteriori (MAP) model
List of Field objects corresponding to the mode of the posterior distribution. This does not require a previous run of the Pipeline, as the MAP is computed by maximizing the unnormalized posterior distribution.
This convenience property uses the results from the latest call of the
Pipeline.get_MAP()
method. IfPipeline.get_MAP()
has never been called, the MAP is found calling it with with default arguments.See
Pipeline.get_MAP()
for details.

active_parameters
¶ List of all the active parameters

chains_directory
¶ Directory where the chains are stored (NB details of what is stored are samplerdependent)

distribute_ensemble
¶ If True, whenever the sampler requires a likelihood evaluation, the ensemble of stochastic fields realizations is distributed among all the nodes.
Otherwise, each likelihood evaluations will go through the whole ensemble size on a single node. See Parallelisation for details.

ensemble_size
¶

factory_list
¶ List of the
Field Factories
currently being used.Updating the factory list automatically extracts active_parameters, parameter ranges and priors from each field factory.

likelihood
¶ The
Likelihood
object used by the pipeline

log_evidence
¶ Natural logarithm of the marginal likelihood or Bayesian model evidence, \(\ln\mathcal{Z}\), where
\[\mathcal{Z} = P(dm) = \int_{\Omega_\theta} P(d  \theta, m) P(\theta  m) \mathrm{d}\theta .\]Note
Available only after the pipeline is run.

log_evidence_err
¶ Error estimate in the natural logarithm of the Bayesian model evidence. Available once the pipeline is run.
Note
Available only after the pipeline is run.

median_model
¶ Posterior median model
List of Field objects corresponding to the median values of the distributions of parameter values found after a Pipeline run.

median_simulation
¶ Simulation corresponding to the
median_model
.

posterior_summary
¶ A dictionary containing a summary of posterior statistics for each of the active parameters. These are: ‘median’, ‘errlo’ (15.87th percentile), ‘errup’ (84.13th percentile), ‘mean’ and ‘stdev’.

prior_correlations
¶

priors
¶ Dictionary containing priors for all active parameters

run_directory
¶ Directory where the chains are stored (NB details of what is stored are samplerdependent)

sampler_supports_mpi
¶

samples
¶ An
astropy.table.QTable
object containing parameter values of the samples produced in the run.

sampling_controllers
¶ Settings used by the sampler (e.g. ‘dlogz’). See the documentation of each specific pipeline subclass for details.
After the pipeline runs, this property is updated to reflect the actual final choice of sampling controllers (including default values).

wrapped_parameters
¶ List of parameters which are periodic or “wrapped around”


class
imagine.pipelines.
UltranestPipeline
(*, simulator, factory_list, likelihood, ensemble_size=1, run_directory=None, chains_directory=None, prior_correlations=None, show_summary_reports=True, show_progress_reports=False, n_evals_report=500)[source]¶ Bases:
imagine.pipelines.pipeline.Pipeline
Bayesian analysis pipeline with UltraNest
See base class for initialization details.
The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.
Other Parameters:  resume (bool) – If False the Pipeline the sampling starts from the beginning, erasing any previous work in the chains_directory. Otherwise, tries to resume a previous run.
 dlogz (float) – Target evidence uncertainty. This is the std between bootstrapped logz integrators.
 dKL (float) – Target posterior uncertainty. This is the KullbackLeibler divergence in nat between bootstrapped integrators.
 frac_remain (float) – Integrate until this fraction of the integral is left in the remainder. Set to a low number (1e2 … 1e5) to make sure peaks are discovered. Set to a higher number (0.5) if you know the posterior is simple.
 Lepsilon (float) – Terminate when live point likelihoods are all the same, within Lepsilon tolerance. Increase this when your likelihood function is inaccurate, to avoid unnecessary search.
 min_ess (int) – Target number of effective posterior samples.
 max_iters (int) – maximum number of integration iterations.
 max_ncalls (int) – stop after this many likelihood evaluations.
 max_num_improvement_loops (int) – run() tries to assess iteratively where more samples are needed. This number limits the number of improvement loops.
 min_num_live_points (int) – minimum number of live points throughout the run
 cluster_num_live_points (int) – require at least this many live points per detected cluster
 num_test_samples (int) – test transform and likelihood with this number of random points for errors first. Useful to catch bugs.
 draw_multiple (bool) – draw more points if efficiency goes down. If set to False, few points are sampled at once.
 num_bootstraps (int) – number of logZ estimators and MLFriends region bootstrap rounds.
 update_interval_iter_fraction (float) – Update region after (update_interval_iter_fraction*nlive) iterations.
Note
Instances of this class are callable. Look at the
UltranestPipeline.call()
for details.
call
(**kwargs)[source]¶ Runs the IMAGINE pipeline using the UltraNest
ReactiveNestedSampler
.Any keyword argument provided is used to update the sampling_controllers.
Returns: results – UltraNest sampling results in a dictionary containing the keys: logZ (the logevidence), logZerror (the error in logevidence) and samples (equal weighted posterior) Return type: dict Notes
See base class for other attributes/properties and methods

SUPPORTS_MPI
= True¶