imagine.pipelines package¶

Submodules¶

imagine.pipelines.dynesty_pipeline module¶

class imagine.pipelines.dynesty_pipeline.DynestyPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, chains_directory=None)[source]¶

Bases: imagine.pipelines.pipeline.Pipeline

Bayesian analysis pipeline with dynesty

This pipeline uses DynamicNestedSampler if the sampling parameter ‘dynamic’ is set to True, or NestedSampler if ‘dynamic’ is False (default).

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

dynamic (bool) – If True, use dynesty.DynamicNestedSampler; otherwise, use dynesty.NestedSampler.
dlogz (float) – Iteration will stop, in the dynamic==False case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est) - ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e-3 * (nlive - 1) + 0.01. Otherwise, the default is 0.01.
dlogz_init (float) – The baseline run will stop, in the dynamic==True case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est) - ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e-3 * (nlive - 1) + 0.01. Otherwise, the default is 0.01.
nlive (int) – If dynamic is False, this sets the number of live points used. Default is 400.
nlive_init (int) – If dynamic is True, this sets the number of live points used during the initial (“baseline”) nested sampling run. Default is 400.
nlive_batch (int) – If dynamic is True, this sets the number of live points used when adding additional samples from a nested sampling run within each batch. Default is 400.
logl_max (float) – Iteration will stop when the sampled ln(likelihood) exceeds the threshold set by logl_max. Default is no bound (np.inf).
logl_max_init (float) – The baseline run will stop, in the dynamic==True case, when the sampled ln(likelihood) exceeds this threshold. Default is no bound (np.inf).
maxiter (int) – Maximum number of iterations. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxiter_init (int) – If dynamic is True, this sets the maximum number of iterations for the initial baseline nested sampling run. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxiter_batch (int) – If dynamic is True, this sets the maximum number of iterations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxcall (int) – Maximum number of likelihood evaluations (without considering the initial points, i.e. maxcall_effective = maxcall + nlive). Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxcall_init (int) – If dynamic is True, maximum number of likelihood evaluations in the baseline run.
maxcall_batch (int) – If dynamic is True, maximum number of likelihood evaluations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxbatch (int) – If dynamic is True, maximum number of batches allowed. Default is sys.maxsize (no limit).
use_stop (bool, optional) – Whether to evaluate our stopping function after each batch. Disabling this can improve performance if other stopping criteria such as maxcall are already specified. Default is True.
n_effective (int) – Minimum number of effective posterior samples. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).
n_effective_init (int) – Minimum number of effective posterior samples during the baseline run. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).
add_live (bool) – Whether or not to add the remaining set of live points to the list of samples at the end of each run. Default is True.
print_progress (bool) – Whether or not to output a simple summary of the current run that updates with each iteration. Default is True.
print_func (function) – A function that prints out the current state of the sampler. If not provided, the default results.print_fn() is used.
save_bounds (bool) – Whether or not to save past bounding distributions used to bound the live points internally. Default is True.
bound ({‘none’, ‘single’, ‘multi’, ‘balls’, ‘cubes’}) – Method used to approximately bound the prior using the current set of live points. Conditions the sampling methods used to propose new live points. Choices are no bound (‘none’), a single bounding ellipsoid (‘single’), multiple bounding ellipsoids (‘multi’), balls centered on each live point (‘balls’), and cubes centered on each live point (‘cubes’). Default is ‘multi’.
sample ({‘auto’, ‘unif’, ‘rwalk’, ‘rstagger’, ‘slice’, ‘rslice’}) – Method used to sample uniformly within the likelihood constraint, conditioned on the provided bounds. Unique methods available are: uniform sampling within the bounds (‘unif’), random walks with fixed proposals (‘rwalk’), random walks with variable (“staggering”) proposals (‘rstagger’), multivariate slice sampling along preferred orientations (‘slice’), and “random” slice sampling along all orientations (‘rslice’). ‘auto’ selects the sampling method based on the dimensionality of the problem (from ndim). When ndim < 10, this defaults to ‘unif’. When 10 <= ndim <= 20, this defaults to ‘rwalk’. When ndim > 20, this defaults to ‘slice’. ‘rstagger’ and ‘rslice’ are provided as alternatives for ‘rwalk’ and ‘slice’, respectively. Default is ‘auto’. Note that Dynesty’s ‘hslice’ option is not supported within IMAGINE.
update_interval (int or float) – If an integer is passed, only update the proposal distribution every update_interval-th likelihood call. If a float is passed, update the proposal after every round(update_interval * nlive)-th likelihood call. Larger update intervals can be more efficient when the likelihood function is quick to evaluate. Default behavior is to target a roughly constant change in prior volume, with 1.5 for ‘unif’, 0.15 * walks for ‘rwalk’ and ‘rstagger’, 0.9 * ndim * slices for ‘slice’, 2.0 * slices for ‘rslice’, and 25.0 * slices for ‘hslice’.
enlarge (float) – Enlarge the volumes of the specified bounding object(s) by this fraction. The preferred method is to determine this organically using bootstrapping. If bootstrap > 0, this defaults to 1.0. If bootstrap = 0, this instead defaults to 1.25.
bootstrap (int) – Compute this many bootstrapped realizations of the bounding objects. Use the maximum distance found to the set of points left out during each iteration to enlarge the resulting volumes. Can lead to unstable bounding ellipsoids. Default is 0 (no bootstrap).
vol_dec (float) – For the ‘multi’ bounding option, the required fractional reduction in volume after splitting an ellipsoid in order to accept the split. Default is 0.5.
vol_check (float) – For the ‘multi’ bounding option, the factor used when checking if the volume of the original bounding ellipsoid is large enough to warrant > 2 splits via ell.vol > vol_check * nlive * pointvol. Default is 2.0.
walks (int) – For the ‘rwalk’ sampling option, the minimum number of steps (minimum 2) before proposing a new live point. Default is 25.
facc (float) – The target acceptance fraction for the ‘rwalk’ sampling option. Default is 0.5. Bounded to be between [1. / walks, 1.].
slices (int) – For the ‘slice’ and ‘rslice’ sampling options, the number of times to execute a “slice update” before proposing a new live point. Default is 5. Note that ‘slice’ cycles through all dimensions when executing a “slice update”.
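
The dlogz stopping rule listed above, ln(z + z_est) - ln(z) < dlogz, can be illustrated with a minimal sketch. The function names below are hypothetical and are not part of dynesty or IMAGINE; the sketch only restates the criterion and the documented default in log space, assuming the caller already has the current log-evidence and the log of the estimated remaining contribution.

```python
import math


def log_add_exp(a, b):
    """Return ln(exp(a) + exp(b)) without overflowing for large |a|, |b|."""
    hi, lo = max(a, b), min(a, b)
    if lo == -math.inf:
        return hi
    return hi + math.log1p(math.exp(lo - hi))


def should_stop(logz, logz_remain, dlogz=0.01):
    """True when ln(z + z_est) - ln(z) < dlogz.

    logz        -- ln(z), log-evidence from all saved samples
    logz_remain -- ln(z_est), log of the estimated remaining contribution
    """
    return log_add_exp(logz, logz_remain) - logz < dlogz


def default_dlogz(nlive, add_live=True):
    """Default threshold as stated in the parameter description."""
    return 1e-3 * (nlive - 1) + 0.01 if add_live else 0.01


# Remaining volume contributes a factor exp(-10) relative to z: stop.
print(should_stop(-100.0, -110.0))
# Remaining volume contributes as much as z itself: keep sampling.
print(should_stop(-100.0, -100.0))
```

Working with ln(z) rather than z avoids underflow, since evidences in realistic problems are often far below the smallest representable float.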

Note

Instances of this class are callable. See DynestyPipeline.call() for details.

call(**kwargs)[source]¶

Runs the IMAGINE pipeline using the Dynesty sampler

Returns:	results – Dynesty sampling results
Return type:	dict
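
The ‘auto’ rule described for the sample controller can be written down as a small function. This is only a sketch of the rule as documented in the parameter list; `resolve_sample_method` is a hypothetical helper, not a dynesty or IMAGINE function, and the real selection happens inside dynesty.

```python
# Sampling methods listed as supported in the parameter description.
SUPPORTED_METHODS = {'auto', 'unif', 'rwalk', 'rstagger', 'slice', 'rslice'}


def resolve_sample_method(sample, ndim):
    """Return the method that 'auto' would select for ndim dimensions.

    Any explicit choice other than 'auto' is returned unchanged.
    """
    if sample not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported sampling method: {sample!r}")
    if sample != 'auto':
        return sample
    if ndim < 10:
        return 'unif'
    if ndim <= 20:
        return 'rwalk'
    return 'slice'


for ndim in (3, 10, 20, 21):
    print(ndim, resolve_sample_method('auto', ndim))
```

Rejecting ‘hslice’ here mirrors the note that this option is not supported within IMAGINE.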

imagine.pipelines.multinest_pipeline module¶

class imagine.pipelines.multinest_pipeline.MultinestPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, chains_directory=None)[source]¶

Bases: imagine.pipelines.pipeline.Pipeline

Bayesian analysis pipeline with pyMultinest

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

resume (bool) – If False, the sampling starts from the beginning, overwriting any previous work in the chains_directory. Otherwise, the pipeline tries to resume a previous run.
n_live_points (int) – Number of live points to be used.
evidence_tolerance (float) – A value of 0.5 should give good enough accuracy.
max_iter (int) – Maximum number of iterations. 0 (default) is unlimited (i.e. only stops after convergence).
log_zero (float) – Points with loglike < log_zero will be ignored by MultiNest.
importance_nested_sampling (bool) – If True (default), MultiNest will use Importance Nested Sampling (see arXiv:1306.2144).
sampling_efficiency (float) – Efficiency of the sampling. 0.8 (default) and 0.3 are recommended values for parameter estimation and evidence evaluation, respectively.
multimodal (bool) – If True, MultiNest will attempt to separate out the modes using a clustering algorithm.
mode_tolerance (float) – MultiNest can find multiple modes and specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local log-evidence value greater than a particular value in which case mode_tolerance should be set to that value. If there isn’t any particularly interesting mode_tolerance value, then it should be set to a very negative number (e.g. -1e90, default).
null_log_evidence (float) – If multimodal is True, MultiNest can find multiple modes and also specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local log-evidence value greater than a particular value, in which case null_log_evidence should be set to that value. If there isn’t any particularly interesting null_log_evidence value, then it should be set to a very large negative number (e.g. -1e90).
n_clustering_params (int) – Mode separation is done through a clustering algorithm. It can be done on all the parameters (in which case n_clustering_params should be set to the number of dimensions, ndims) or on a subset of parameters (n_clustering_params < ndims), which might be advantageous as clustering is less accurate as the dimensionality increases. If n_clustering_params < ndims, mode separation is done on the first n_clustering_params parameters.
max_modes (int) – Maximum number of modes (if multimodal is True).

Note

Instances of this class are callable. See MultinestPipeline.call() for details.

call(**kwargs)[source]¶

Runs the IMAGINE pipeline using the MultiNest sampler

Returns:	results – pyMultinest sampling results in a dictionary containing the keys: logZ (the log-evidence), logZerror (the error in log-evidence) and samples (equal weighted posterior)
Return type:	dict
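
Because the returned dictionary exposes logZ and logZerror, two runs with different models can be compared through the log Bayes factor. The sketch below assumes two such dictionaries are already available; the numerical values are made up for illustration, and combining the two errors in quadrature assumes that the evidence estimates are independent.

```python
import math


def log_bayes_factor(results_a, results_b):
    """Return (ln B_ab, error) from two result dictionaries.

    Each dictionary is expected to carry the 'logZ' and 'logZerror' keys
    described above. Errors are added in quadrature (independence assumed).
    """
    ln_b = results_a['logZ'] - results_b['logZ']
    err = math.hypot(results_a['logZerror'], results_b['logZerror'])
    return ln_b, err


# Hypothetical results, standing in for the output of two pipeline runs.
model_a = {'logZ': -120.3, 'logZerror': 0.3}
model_b = {'logZ': -125.3, 'logZerror': 0.4}

ln_b, err = log_bayes_factor(model_a, model_b)
print(f"ln B = {ln_b:.2f} +/- {err:.2f}")
```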

SUPPORTS_MPI = True¶

imagine.pipelines.pipeline module¶

class imagine.pipelines.pipeline.Pipeline(*, simulator, factory_list, likelihood, ensemble_size=1, chains_directory=None)[source]¶

Bases: imagine.tools.class_tools.BaseClass

Base class used for initializing a Bayesian analysis pipeline

dynesty_parameter_dict¶

Extra parameters for controlling Dynesty, e.g. ‘nlive’, ‘bound’, ‘sample’

Type:	dict

sample_callback¶

not implemented yet

Type:	bool

likelihood_rescaler¶

Rescale log-likelihood value

Type:	double

random_type¶

If set to ‘fixed’, the exact same set of ensemble seeds will be used for the evaluation of all fields, generated using the master_seed. If set to ‘controllable’, each individual field will get its own set of ensemble seeds, but multiple runs will lead to the same results, as they are based on the same master_seed. If set to ‘free’, every time the pipeline is run, the master_seed is reset to a different value, and the ensemble seeds for each individual field are drawn based on this.

Type:	str
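
The three random_type modes can be illustrated with a toy seed generator. This is a sketch of the behaviour described above, not IMAGINE's actual implementation: the function `ensemble_seeds`, its signature and the field names are all hypothetical.

```python
import random


def ensemble_seeds(random_type, master_seed, field_names, ensemble_size):
    """Return a dict mapping each field name to a list of ensemble seeds."""
    if random_type not in ('fixed', 'controllable', 'free'):
        raise ValueError(f"unknown random_type: {random_type!r}")
    if random_type == 'free':
        # A fresh master seed on every run, so results are not reproducible.
        master_seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(master_seed)
    if random_type == 'fixed':
        # One set of seeds, shared by all fields.
        shared = [rng.randrange(2**32) for _ in range(ensemble_size)]
        return {name: list(shared) for name in field_names}
    # 'controllable' and 'free': each field draws its own set of seeds.
    return {name: [rng.randrange(2**32) for _ in range(ensemble_size)]
            for name in field_names}


fields = ['field_a', 'field_b']  # hypothetical field names
fixed = ensemble_seeds('fixed', 42, fields, 3)
first = ensemble_seeds('controllable', 42, fields, 3)
second = ensemble_seeds('controllable', 42, fields, 3)
print(fixed['field_a'] == fixed['field_b'])  # same seeds for every field
print(first == second)                       # reproducible across runs
```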

master_seed¶

Master seed used by the random number generators

Type:	int

Parameters:

simulator (imagine.simulators.simulator.Simulator) – Simulator object
factory_list (list) – List or tuple of field factory objects
likelihood (imagine.likelihoods.likelihood.Likelihood) – Likelihood object
prior (imagine.priors.prior.Prior) – Prior object
ensemble_size (int) – Number of observable realizations PER COMPUTING NODE to be generated in simulator
chains_directory (str) – Path of the directory where the chains should be saved

__call__(*args, **kwargs)[source]¶: Call self as a function.

call(**kwargs)[source]¶

posterior_report(sdigits=2)[source]¶

Displays the best fit values and 1-sigma errors for each active parameter.

If running on a jupyter-notebook, a nice LaTeX display is used.

Parameters:	sdigits (int) – The number of significant digits to be used

prior_pdf(cube)[source]¶

Probability distribution associated with all the parameters being used by the multiple Field Factories

Parameters:	cube (array) – Each row of the array corresponds to a different parameter in the sampling.
Returns:	The modified cube
Return type:	cube_rtn

prior_transform(cube)[source]¶

Prior transform cube (i.e. MultiNest style prior).

Takes a cube containing a uniform sampling of values and maps them onto a distribution compatible with the priors specified in the Field Factories.

Parameters:	cube (array) – Each row of the array corresponds to a different parameter in the sampling. Warning: the function will modify cube in place.
Returns:	The modified cube
Return type:	cube
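
The idea of a MultiNest-style prior transform can be sketched independently of IMAGINE: each entry of the unit cube is passed through the inverse CDF of the corresponding prior. The helpers below (`flat_prior`, `gaussian_prior` and this standalone `prior_transform`) are illustrative assumptions, not the library's prior classes.

```python
from statistics import NormalDist


def flat_prior(xmin, xmax):
    """Inverse CDF of a uniform prior on [xmin, xmax]."""
    return lambda u: xmin + u * (xmax - xmin)


def gaussian_prior(mu, sigma):
    """Inverse CDF of a Gaussian prior (u must lie strictly in (0, 1))."""
    return NormalDist(mu, sigma).inv_cdf


def prior_transform(cube, transforms):
    """Map unit-cube values onto the priors, modifying cube in place."""
    for i, transform in enumerate(transforms):
        cube[i] = transform(cube[i])
    return cube


cube = [0.25, 0.5]
transforms = [flat_prior(-1.0, 3.0), gaussian_prior(2.0, 0.5)]
print(prior_transform(cube, transforms))
```

As in the method documented above, the input list is overwritten, so a caller that still needs the original unit-cube values should pass a copy.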

tidy_up()[source]¶: Resets internal state before a new run

active_parameters¶: List of all the active parameters

active_ranges¶: Ranges of all active parameters

chains_directory¶: Directory where the chains are stored (NB details of what is stored are sampler-dependent)

distribute_ensemble¶

If True, whenever the sampler requires a likelihood evaluation, the ensemble of stochastic field realizations is distributed among all the nodes.

Otherwise, each likelihood evaluation will go through the whole ensemble size on a single node. See Parallelisation for details.

ensemble_size¶

factory_list¶

List of the Field Factories currently being used.

Updating the factory list automatically extracts active_parameters, parameter ranges and priors from each field factory.

likelihood¶: The Likelihood object used by the pipeline

log_evidence¶: Natural logarithm of the marginal likelihood or Bayesian model evidence, \(\ln\mathcal{Z}\), where

\[\mathcal{Z} = P(d|m) = \int_{\Omega_\theta} P(d | \theta, m) P(\theta | m) \mathrm{d}\theta .\]

Note

Available only after the pipeline is run.
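
To make the evidence integral concrete, the following toy example evaluates it numerically for a single parameter. It assumes one datum with a Gaussian likelihood and a uniform prior, so that the result can be checked against a closed form; none of the names come from IMAGINE, and nested samplers estimate this integral very differently.

```python
import math
from statistics import NormalDist


def evidence_uniform_prior(datum, sigma, lo, hi, steps=20000):
    """Trapezoidal estimate of Z = int P(d|theta) P(theta) dtheta.

    Likelihood: Gaussian in (datum - theta) with width sigma.
    Prior: uniform on [lo, hi].
    """
    noise = NormalDist(0.0, sigma)
    prior_density = 1.0 / (hi - lo)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * noise.pdf(datum - theta) * prior_density
    return total * h


def evidence_analytic(datum, sigma, lo, hi):
    """Closed form of the same integral, used as a cross-check."""
    std = NormalDist()
    return (std.cdf((hi - datum) / sigma) - std.cdf((lo - datum) / sigma)) / (hi - lo)


z_num = evidence_uniform_prior(0.3, 1.0, -5.0, 5.0)
z_ref = evidence_analytic(0.3, 1.0, -5.0, 5.0)
print(math.log(z_num), math.log(z_ref))
```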

log_evidence_err¶: Error estimate in the natural logarithm of the Bayesian model evidence.

Note

Available only after the pipeline is run.

posterior_summary¶: A dictionary containing a summary of posterior statistics for each of the active parameters. These are: ‘median’, ‘errlo’ (15.87th percentile), ‘errup’ (84.13th percentile), ‘mean’ and ‘stdev’.
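
The statistics listed for posterior_summary can be reproduced from a list of samples of one parameter, as in the sketch below. It is an assumption that ‘errlo’ and ‘errup’ hold the percentile values themselves rather than offsets from the median; the description above does not settle this. The helper names are hypothetical, and weighted samples are not handled.

```python
import statistics


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = q / 100.0 * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def summarize(samples):
    """Summary statistics for equally weighted samples of one parameter."""
    return {
        'median': percentile(samples, 50.0),
        'errlo': percentile(samples, 15.87),
        'errup': percentile(samples, 84.13),
        'mean': statistics.fmean(samples),
        'stdev': statistics.stdev(samples),
    }


samples = [float(i) for i in range(101)]  # toy samples: 0, 1, ..., 100
print(summarize(samples))
```

The 15.87th and 84.13th percentiles bracket the central 68.26% of the samples, which corresponds to the 1-sigma interval of a Gaussian.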

priors¶: Dictionary containing priors for all active parameters

sampler_supports_mpi¶

samples¶: An astropy.table.QTable object containing parameter values of the samples produced in the run.

samples_scaled¶: An astropy.table.QTable object containing parameter values of the samples produced in the run, scaled to the interval [0,1].

sampling_controllers¶

Settings used by the sampler (e.g. ‘dlogz’). See the documentation of each specific pipeline subclass for details.

After the pipeline runs, this property is updated to reflect the actual final choice of sampling controllers (including default values).

simulator¶: The Simulator object used by the pipeline

imagine.pipelines.ultranest_pipeline module¶

class imagine.pipelines.ultranest_pipeline.UltranestPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, chains_directory=None)[source]¶

Bases: imagine.pipelines.pipeline.Pipeline

Bayesian analysis pipeline with UltraNest

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

resume (bool) – If False, the sampling starts from the beginning, erasing any previous work in the chains_directory. Otherwise, the pipeline tries to resume a previous run.
dlogz (float) – Target evidence uncertainty. This is the std between bootstrapped logz integrators.
dKL (float) – Target posterior uncertainty. This is the Kullback-Leibler divergence in nats between bootstrapped integrators.
frac_remain (float) – Integrate until this fraction of the integral is left in the remainder. Set to a low number (1e-2 … 1e-5) to make sure peaks are discovered. Set to a higher number (0.5) if you know the posterior is simple.
Lepsilon (float) – Terminate when live point likelihoods are all the same, within Lepsilon tolerance. Increase this when your likelihood function is inaccurate, to avoid unnecessary search.
min_ess (int) – Target number of effective posterior samples.
max_iters (int) – maximum number of integration iterations.
max_ncalls (int) – stop after this many likelihood evaluations.
max_num_improvement_loops (int) – run() tries to assess iteratively where more samples are needed. This number limits the number of improvement loops.
min_num_live_points (int) – minimum number of live points throughout the run
cluster_num_live_points (int) – require at least this many live points per detected cluster
num_test_samples (int) – test transform and likelihood with this number of random points for errors first. Useful to catch bugs.
draw_multiple (bool) – draw more points if efficiency goes down. If set to False, few points are sampled at once.
num_bootstraps (int) – number of logZ estimators and MLFriends region bootstrap rounds.
update_interval_iter_fraction (float) – Update region after (update_interval_iter_fraction*nlive) iterations.

Note

Instances of this class are callable. See UltranestPipeline.call() for details.

call(**kwargs)[source]¶

Runs the IMAGINE pipeline using the UltraNest ReactiveNestedSampler.

Any keyword argument provided is used to update the sampling_controllers.

Returns:	results – UltraNest sampling results in a dictionary containing the keys: logZ (the log-evidence), logZerror (the error in log-evidence) and samples (equal weighted posterior)
Return type:	dict
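
The way keyword arguments update the sampling controllers can be pictured as a layered dictionary merge. The sketch below is a guess at the precedence (defaults, then previously stored controllers, then call-time keywords) and is not taken from the IMAGINE source; `resolve_controllers` is hypothetical and the default values shown are placeholders, not UltraNest's real defaults.

```python
def resolve_controllers(defaults, stored, **kwargs):
    """Merge controllers; later layers override earlier ones.

    defaults -- fallback values (placeholders in this sketch)
    stored   -- controllers set earlier through sampling_controllers
    kwargs   -- keyword arguments given when calling the pipeline
    """
    resolved = dict(defaults)
    resolved.update(stored)
    resolved.update(kwargs)
    return resolved


# Placeholder defaults, chosen only for illustration.
defaults = {'dlogz': 0.5, 'min_num_live_points': 400, 'resume': True}
stored = {'min_num_live_points': 100}

print(resolve_controllers(defaults, stored, dlogz=0.1))
```

The merged dictionary plays the role of the final choice of controllers, including default values, that the sampling_controllers property is said to report after a run.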

Notes

See base class for other attributes/properties and methods

SUPPORTS_MPI = True¶

Module contents¶

class imagine.pipelines.DynestyPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, chains_directory=None)[source]¶

Bases: imagine.pipelines.pipeline.Pipeline

Bayesian analysis pipeline with dynesty

This pipeline uses DynamicNestedSampler if the sampling parameter ‘dynamic’ is set to True, or NestedSampler if ‘dynamic’ is False (default).

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

dynamic (bool) – If True, use dynesty.DynamicNestedSampler; otherwise, use dynesty.NestedSampler.
dlogz (float) – Iteration will stop, in the dynamic==False case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est) - ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e-3 * (nlive - 1) + 0.01. Otherwise, the default is 0.01.
dlogz_init (float) – The baseline run will stop, in the dynamic==True case, when the estimated contribution of the remaining prior volume to the total evidence falls below this threshold. Explicitly, the stopping criterion is ln(z + z_est) - ln(z) < dlogz, where z is the current evidence from all saved samples and z_est is the estimated contribution from the remaining volume. If add_live is True, the default is 1e-3 * (nlive - 1) + 0.01. Otherwise, the default is 0.01.
nlive (int) – If dynamic is False, this sets the number of live points used. Default is 400.
nlive_init (int) – If dynamic is True, this sets the number of live points used during the initial (“baseline”) nested sampling run. Default is 400.
nlive_batch (int) – If dynamic is True, this sets the number of live points used when adding additional samples from a nested sampling run within each batch. Default is 400.
logl_max (float) – Iteration will stop when the sampled ln(likelihood) exceeds the threshold set by logl_max. Default is no bound (np.inf).
logl_max_init (float) – The baseline run will stop, in the dynamic==True case, when the sampled ln(likelihood) exceeds this threshold. Default is no bound (np.inf).
maxiter (int) – Maximum number of iterations. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxiter_init (int) – If dynamic is True, this sets the maximum number of iterations for the initial baseline nested sampling run. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxiter_batch (int) – If dynamic is True, this sets the maximum number of iterations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxcall (int) – Maximum number of likelihood evaluations (without considering the initial points, i.e. maxcall_effective = maxcall + nlive). Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxcall_init (int) – If dynamic is True, maximum number of likelihood evaluations in the baseline run.
maxcall_batch (int) – If dynamic is True, maximum number of likelihood evaluations for the nested sampling run within each batch. Iteration may stop earlier if the termination condition is reached. Default is sys.maxsize (no limit).
maxbatch (int) – If dynamic is True, maximum number of batches allowed. Default is sys.maxsize (no limit).
use_stop (bool, optional) – Whether to evaluate our stopping function after each batch. Disabling this can improve performance if other stopping criteria such as maxcall are already specified. Default is True.
n_effective (int) – Minimum number of effective posterior samples. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).
n_effective_init (int) – Minimum number of effective posterior samples during the baseline run. If the estimated “effective sample size” (ESS) exceeds this number, sampling will terminate. Default is no ESS (np.inf).
add_live (bool) – Whether or not to add the remaining set of live points to the list of samples at the end of each run. Default is True.
print_progress (bool) – Whether or not to output a simple summary of the current run that updates with each iteration. Default is True.
print_func (function) – A function that prints out the current state of the sampler. If not provided, the default results.print_fn() is used.
save_bounds (bool) – Whether or not to save past bounding distributions used to bound the live points internally. Default is True.
bound ({‘none’, ‘single’, ‘multi’, ‘balls’, ‘cubes’}) – Method used to approximately bound the prior using the current set of live points. Conditions the sampling methods used to propose new live points. Choices are no bound (‘none’), a single bounding ellipsoid (‘single’), multiple bounding ellipsoids (‘multi’), balls centered on each live point (‘balls’), and cubes centered on each live point (‘cubes’). Default is ‘multi’.
sample ({‘auto’, ‘unif’, ‘rwalk’, ‘rstagger’, ‘slice’, ‘rslice’}) – Method used to sample uniformly within the likelihood constraint, conditioned on the provided bounds. Unique methods available are: uniform sampling within the bounds (‘unif’), random walks with fixed proposals (‘rwalk’), random walks with variable (“staggering”) proposals (‘rstagger’), multivariate slice sampling along preferred orientations (‘slice’), and “random” slice sampling along all orientations (‘rslice’). ‘auto’ selects the sampling method based on the dimensionality of the problem (from ndim). When ndim < 10, this defaults to ‘unif’. When 10 <= ndim <= 20, this defaults to ‘rwalk’. When ndim > 20, this defaults to ‘slice’. ‘rstagger’ and ‘rslice’ are provided as alternatives for ‘rwalk’ and ‘slice’, respectively. Default is ‘auto’. Note that Dynesty’s ‘hslice’ option is not supported within IMAGINE.
update_interval (int or float) – If an integer is passed, only update the proposal distribution every update_interval-th likelihood call. If a float is passed, update the proposal after every round(update_interval * nlive)-th likelihood call. Larger update intervals can be more efficient when the likelihood function is quick to evaluate. Default behavior is to target a roughly constant change in prior volume, with 1.5 for ‘unif’, 0.15 * walks for ‘rwalk’ and ‘rstagger’, 0.9 * ndim * slices for ‘slice’, 2.0 * slices for ‘rslice’, and 25.0 * slices for ‘hslice’.
enlarge (float) – Enlarge the volumes of the specified bounding object(s) by this fraction. The preferred method is to determine this organically using bootstrapping. If bootstrap > 0, this defaults to 1.0. If bootstrap = 0, this instead defaults to 1.25.
bootstrap (int) – Compute this many bootstrapped realizations of the bounding objects. Use the maximum distance found to the set of points left out during each iteration to enlarge the resulting volumes. Can lead to unstable bounding ellipsoids. Default is 0 (no bootstrap).
vol_dec (float) – For the ‘multi’ bounding option, the required fractional reduction in volume after splitting an ellipsoid in order to accept the split. Default is 0.5.
vol_check (float) – For the ‘multi’ bounding option, the factor used when checking if the volume of the original bounding ellipsoid is large enough to warrant > 2 splits via ell.vol > vol_check * nlive * pointvol. Default is 2.0.
walks (int) – For the ‘rwalk’ sampling option, the minimum number of steps (minimum 2) before proposing a new live point. Default is 25.
facc (float) – The target acceptance fraction for the ‘rwalk’ sampling option. Default is 0.5. Bounded to be between [1. / walks, 1.].
slices (int) – For the ‘slice’ and ‘rslice’ sampling options, the number of times to execute a “slice update” before proposing a new live point. Default is 5. Note that ‘slice’ cycles through all dimensions when executing a “slice update”.

Note

Instances of this class are callable. See DynestyPipeline.call() for details.

call(**kwargs)[source]¶

Runs the IMAGINE pipeline using the Dynesty sampler

Returns:	results – Dynesty sampling results
Return type:	dict

class imagine.pipelines.MultinestPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, chains_directory=None)[source]¶

Bases: imagine.pipelines.pipeline.Pipeline

Bayesian analysis pipeline with pyMultinest

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

resume (bool) – If False, the sampling starts from the beginning, overwriting any previous work in the chains_directory. Otherwise, the pipeline tries to resume a previous run.
n_live_points (int) – Number of live points to be used.
evidence_tolerance (float) – A value of 0.5 should give good enough accuracy.
max_iter (int) – Maximum number of iterations. 0 (default) is unlimited (i.e. only stops after convergence).
log_zero (float) – Points with loglike < log_zero will be ignored by MultiNest.
importance_nested_sampling (bool) – If True (default), MultiNest will use Importance Nested Sampling (see arXiv:1306.2144).
sampling_efficiency (float) – Efficiency of the sampling. 0.8 (default) and 0.3 are recommended values for parameter estimation and evidence evaluation, respectively.
multimodal (bool) – If True, MultiNest will attempt to separate out the modes using a clustering algorithm.
mode_tolerance (float) – MultiNest can find multiple modes and specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local log-evidence value greater than a particular value in which case mode_tolerance should be set to that value. If there isn’t any particularly interesting mode_tolerance value, then it should be set to a very negative number (e.g. -1e90, default).
null_log_evidence (float) – If multimodal is True, MultiNest can find multiple modes and also specify which samples belong to which mode. It might be desirable to have separate samples and mode statistics for modes with local log-evidence value greater than a particular value, in which case null_log_evidence should be set to that value. If there isn’t any particularly interesting null_log_evidence value, then it should be set to a very large negative number (e.g. -1e90).
n_clustering_params (int) – Mode separation is done through a clustering algorithm. It can be done on all the parameters (in which case n_clustering_params should be set to the number of dimensions, ndims) or on a subset of parameters (n_clustering_params < ndims), which might be advantageous as clustering is less accurate as the dimensionality increases. If n_clustering_params < ndims, mode separation is done on the first n_clustering_params parameters.
max_modes (int) – Maximum number of modes (if multimodal is True).

Note

Instances of this class are callable. See MultinestPipeline.call() for details.

call(**kwargs)[source]¶

Runs the IMAGINE pipeline using the MultiNest sampler

Returns:	results – pyMultinest sampling results in a dictionary containing the keys: logZ (the log-evidence), logZerror (the error in log-evidence) and samples (equal weighted posterior)
Return type:	dict

SUPPORTS_MPI = True¶

class imagine.pipelines.Pipeline(*, simulator, factory_list, likelihood, ensemble_size=1, chains_directory=None)[source]¶

Bases: imagine.tools.class_tools.BaseClass

Base class used for initializing a Bayesian analysis pipeline

dynesty_parameter_dict¶

Extra parameters for controlling Dynesty, e.g. ‘nlive’, ‘bound’, ‘sample’

Type:	dict

sample_callback¶

not implemented yet

Type:	bool

likelihood_rescaler¶

Rescale log-likelihood value

Type:	double

random_type¶

If set to ‘fixed’, the exact same set of ensemble seeds will be used for the evaluation of all fields, generated using the master_seed. If set to ‘controllable’, each individual field will get its own set of ensemble seeds, but multiple runs will lead to the same results, as they are based on the same master_seed. If set to ‘free’, every time the pipeline is run, the master_seed is reset to a different value, and the ensemble seeds for each individual field are drawn based on this.

Type:	str

master_seed¶

Master seed used by the random number generators

Type:	int

Parameters:

simulator (imagine.simulators.simulator.Simulator) – Simulator object
factory_list (list) – List or tuple of field factory objects
likelihood (imagine.likelihoods.likelihood.Likelihood) – Likelihood object
prior (imagine.priors.prior.Prior) – Prior object
ensemble_size (int) – Number of observable realizations PER COMPUTING NODE to be generated in simulator
chains_directory (str) – Path of the directory where the chains should be saved

__call__(*args, **kwargs)[source]¶: Call self as a function.

call(**kwargs)[source]¶

posterior_report(sdigits=2)[source]¶

Displays the best fit values and 1-sigma errors for each active parameter.

If running in a Jupyter notebook, a LaTeX display is used.

Parameters:	sdigits (int) – The number of significant digits to be used

prior_pdf(cube)[source]¶

Probability distribution associated with all the parameters being used by the multiple Field Factories

Parameters:	cube (array) – Each row of the array corresponds to a different parameter in the sampling.
Returns:	The modified cube
Return type:	cube_rtn

prior_transform(cube)[source]¶

Prior transform cube (i.e. MultiNest style prior).

Takes a cube containing a uniform sampling of values and maps them onto a distribution compatible with the priors specified in the Field Factories.

Parameters:	cube (array) – Each row of the array corresponds to a different parameter in the sampling. Warning: the function will modify cube in place.
Returns:	The modified cube
Return type:	cube
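
As an illustration of what a MultiNest-style prior transform does, the following minimal sketch maps uniform values in the unit interval onto two toy priors through their inverse cumulative distribution functions. The list inverse_cdfs and the chosen priors are assumptions made for this example; the actual method obtains its priors from the Field Factories.

```python
from statistics import NormalDist

# Hypothetical inverse-CDF functions, one per active parameter
inverse_cdfs = [
    lambda u: -5.0 + 10.0 * u,              # flat prior on [-5, 5]
    NormalDist(mu=1.0, sigma=0.5).inv_cdf,  # Gaussian prior
]

def prior_transform(cube):
    """Map uniform values in (0, 1) onto the priors, modifying cube in place."""
    for i, inv_cdf in enumerate(inverse_cdfs):
        cube[i] = inv_cdf(cube[i])
    return cube

cube = [0.5, 0.5]
result = prior_transform(cube)  # result is the same object as cube
```

Because the cube is modified in place, a caller that still needs the original uniform values should pass a copy.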

tidy_up()[source]¶: Resets internal state before a new run

active_parameters¶: List of all the active parameters

active_ranges¶: Ranges of all active parameters

chains_directory¶: Directory where the chains are stored (NB details of what is stored are sampler-dependent)

distribute_ensemble¶

If True, whenever the sampler requires a likelihood evaluation, the ensemble of stochastic field realizations is distributed among all the nodes.

Otherwise, each likelihood evaluation will go through the whole ensemble size on a single node. See Parallelisation for details.

ensemble_size¶

factory_list¶

List of the Field Factories currently being used.

Updating the factory list automatically extracts active_parameters, parameter ranges and priors from each field factory.

likelihood¶: The Likelihood object used by the pipeline

log_evidence¶: Natural logarithm of the marginal likelihood or Bayesian model evidence, \(\ln\mathcal{Z}\), where

\[\mathcal{Z} = P(d|m) = \int_{\Omega_\theta} P(d | \theta, m) P(\theta | m) \mathrm{d}\theta .\]

Note

Available only after the pipeline is run.
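
To make the definition of the evidence concrete, the integral can be evaluated numerically for a toy one-parameter model. This is only a sketch: the Gaussian likelihood, the flat prior on [-5, 5] and the datum d = 0.3 are assumptions chosen for illustration, and the trapezoidal rule used here is not how the nested samplers estimate the evidence.

```python
import math

def likelihood(theta, d=0.3, sigma=1.0):
    """Toy Gaussian likelihood P(d | theta, m)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((d - theta) / sigma) ** 2) / norm

# Flat prior P(theta | m) on [lo, hi]
lo, hi, n = -5.0, 5.0, 2000
prior_density = 1.0 / (hi - lo)

# Trapezoidal rule for Z = integral of likelihood * prior over theta
h = (hi - lo) / n
integrand = [likelihood(lo + k * h) * prior_density for k in range(n + 1)]
evidence = h * (sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))
log_evidence = math.log(evidence)
```

Since almost all of the likelihood mass lies inside the prior range, the result is very close to ln(1/10), the logarithm of the prior density.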

log_evidence_err¶: Error estimate in the natural logarithm of the Bayesian model evidence.

Note

Available only after the pipeline is run.

posterior_summary¶: A dictionary containing a summary of posterior statistics for each of the active parameters. These are: ‘median’, ‘errlo’ (15.87th percentile), ‘errup’ (84.13th percentile), ‘mean’ and ‘stdev’.
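
The statistics listed above can be reproduced from a set of equally weighted samples with the standard library alone. The sketch below is not the IMAGINE implementation: the helpers percentile and summarise, the parameter name ‘theta’ and the synthetic samples are all assumptions made for illustration.

```python
import math
import statistics

def percentile(sorted_values, q):
    """Percentile q (0-100) of sorted data, using linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac

def summarise(values):
    """Summary statistics for the samples of one parameter."""
    ordered = sorted(values)
    return {
        'median': percentile(ordered, 50.0),
        'errlo': percentile(ordered, 15.87),
        'errup': percentile(ordered, 84.13),
        'mean': statistics.fmean(ordered),
        'stdev': statistics.stdev(ordered),
    }

# Synthetic samples standing in for a posterior (standard normal)
toy_samples = statistics.NormalDist(mu=0.0, sigma=1.0).samples(20000, seed=1)
posterior_summary = {'theta': summarise(toy_samples)}
```

For a Gaussian posterior, the 15.87th and 84.13th percentiles lie one standard deviation below and above the median, which is why they serve as 1-sigma bounds.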

priors¶: Dictionary containing priors for all active parameters

sampler_supports_mpi¶

samples¶: An astropy.table.QTable object containing parameter values of the samples produced in the run.

samples_scaled¶: An astropy.table.QTable object containing parameter values of the samples produced in the run, scaled to the interval [0,1].
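
A linear rescaling to the interval [0,1] might look like the sketch below. It assumes that the scaling uses the minimum and maximum of each parameter's range; the dictionaries, parameter names and the helper scale_to_unit_interval are hypothetical, and plain Python containers are used here instead of an astropy QTable.

```python
# Hypothetical parameter ranges and samples (illustrative values only)
active_ranges = {'B0': (0.0, 10.0), 'psi0': (-90.0, 90.0)}
samples = {'B0': [2.5, 5.0, 7.5], 'psi0': [-45.0, 0.0, 90.0]}

def scale_to_unit_interval(samples, ranges):
    """Linearly map each parameter from [min, max] to [0, 1]."""
    scaled = {}
    for name, values in samples.items():
        lo, hi = ranges[name]
        scaled[name] = [(v - lo) / (hi - lo) for v in values]
    return scaled

samples_scaled = scale_to_unit_interval(samples, active_ranges)
```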

sampling_controllers¶

Settings used by the sampler (e.g. ‘dlogz’). See the documentation of each specific pipeline subclass for details.

After the pipeline runs, this property is updated to reflect the actual final choice of sampling controllers (including default values).

simulator¶: The Simulator object used by the pipeline

class imagine.pipelines.UltranestPipeline(*, simulator, factory_list, likelihood, ensemble_size=1, chains_directory=None)[source]¶

Bases: imagine.pipelines.pipeline.Pipeline

Bayesian analysis pipeline with UltraNest

See base class for initialization details.

The sampler behaviour is controlled using the sampling_controllers property. A description of these can be found below.

Other Parameters:

resume (bool) – If False, the sampling starts from the beginning, erasing any previous work in the chains_directory. Otherwise, tries to resume a previous run.
dlogz (float) – Target evidence uncertainty. This is the standard deviation between bootstrapped logz integrators.
dKL (float) – Target posterior uncertainty. This is the Kullback-Leibler divergence in nats between bootstrapped integrators.
frac_remain (float) – Integrate until this fraction of the integral is left in the remainder. Set to a low number (1e-2 … 1e-5) to make sure peaks are discovered. Set to a higher number (0.5) if you know the posterior is simple.
Lepsilon (float) – Terminate when live point likelihoods are all the same, within Lepsilon tolerance. Increase this when your likelihood function is inaccurate, to avoid unnecessary search.
min_ess (int) – Target number of effective posterior samples.
max_iters (int) – Maximum number of integration iterations.
max_ncalls (int) – Stop after this many likelihood evaluations.
max_num_improvement_loops (int) – run() tries to assess iteratively where more samples are needed. This number limits the number of improvement loops.
min_num_live_points (int) – Minimum number of live points throughout the run.
cluster_num_live_points (int) – Require at least this many live points per detected cluster.
num_test_samples (int) – Test transform and likelihood with this number of random points for errors first. Useful to catch bugs.
draw_multiple (bool) – Draw more points if efficiency goes down. If set to False, few points are sampled at once.
num_bootstraps (int) – Number of logZ estimators and MLFriends region bootstrap rounds.
update_interval_iter_fraction (float) – Update region after (update_interval_iter_fraction*nlive) iterations.

Note

Instances of this class are callable. See UltranestPipeline.call() for details.

call(**kwargs)[source]¶

Runs the IMAGINE pipeline using the UltraNest ReactiveNestedSampler.

Any keyword argument provided is used to update the sampling_controllers.

Returns:	results – UltraNest sampling results in a dictionary containing the keys: logZ (the log-evidence), logZerror (the error in log-evidence) and samples (equal weighted posterior)
Return type:	dict

Notes

See base class for other attributes/properties and methods
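
The way keyword arguments update the sampling_controllers, and the fact that instances are callable, can be sketched with a toy stand-in class. ToyPipeline is hypothetical and runs no sampler; it only mirrors the calling pattern described above, and the returned values are placeholders.

```python
class ToyPipeline:
    """Hypothetical stand-in that mimics the pipeline calling convention."""

    def __init__(self):
        self.sampling_controllers = {}

    def __call__(self, **kwargs):
        # Calling the instance forwards to call()
        return self.call(**kwargs)

    def call(self, **kwargs):
        # Keyword arguments update the stored sampling controllers
        self.sampling_controllers.update(kwargs)
        # A real pipeline would run the sampler here; placeholders only
        return {'logZ': None, 'logZerror': None, 'samples': None}

pipeline = ToyPipeline()
pipeline.sampling_controllers = {'dlogz': 0.5}
results = pipeline(min_ess=400, dlogz=0.1)
```

After the call in this sketch, sampling_controllers holds both the updated ‘dlogz’ and the newly supplied ‘min_ess’.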

SUPPORTS_MPI = True¶