imagine.tools package¶

Submodules¶

imagine.tools.carrier_mapper module¶

The mapper module is designed for implementing distribution mapping functions.

imagine.tools.carrier_mapper.exp_mapper(x, a=0, b=1)[source]¶

Maps x from [0, 1] into the interval [exp(a), exp(b)].

Parameters:	x (float) – The variable to be mapped. a (float) – The lower parameter value limit. b (float) – The upper parameter value limit.
Returns:	The mapped parameter value.
Return type:	numpy.float64

imagine.tools.carrier_mapper.unity_mapper(x, a=0.0, b=1.0)[source]¶

Maps x from [0, 1] into the interval [a, b].

Parameters:	x (float) – The variable to be mapped. a (float) – The lower parameter value limit. b (float) – The upper parameter value limit.
Returns:	The mapped parameter value.
Return type:	numpy.float64

imagine.tools.class_tools module¶

class imagine.tools.class_tools.BaseClass[source]¶

Bases: object

REQ_ATTRS = []¶

imagine.tools.class_tools.req_attr(meth)[source]¶

imagine.tools.config module¶

IMAGINE global configuration¶

The default behaviour of some aspects of IMAGINE can be set using global rc configuration variables.

These can be accessed and modified using the imagine.rc dictionary or setting the corresponding environment variables (named ‘IMAGINE_’+RC_VAR_NAME).

For example to set the default path for the hamx executable, one can either do:

import imagine
imagine.rc.hammurabi_hamx_path = 'my_desired_path'

or, alternatively, set this as an environment variable before the exectution of the script:

export IMAGINE_HAMMURABI_HAMX_PATH='my_desired_path'

The following list describes all the available global settings variables.

IMAGINE rc variables

temp_dir: Default temporary directory used by IMAGINE. If not set, a temporary directory will be created at /tmp/ with a safe name.
distributed_arrays: If True, arrays containing covariances are distributed among different MPI processes (and so are the corresponding array operations).
pipeline_default_seed: The default value for the master seed used by a Pipeline object (see Pipeline.master_seed).
pipeline_distribute_ensemble: The default value of (see Pipeline.distribute_ensemble).
hammurabi_hamx_path: Default location of the Hammurabi X executable file, hamx.

imagine.tools.covariance_estimator module¶

This module contains estimation algorithms for the covariance matrix based on a finite number of samples.

For the testing suits, please turn to “imagine/tests/tools_tests.py”.

imagine.tools.covariance_estimator.empirical_cov(data)[source]¶

Empirical covariance estimator

Given some data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated from

\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{N} \sum_{i=1}^N D_{ij}\]

\[\text{cov} = \tfrac{1}{N} U^T U\]

Notes

While conceptually simple, this is usually not the best option.

Parameters:	data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size).
Returns:	cov – Distributed (not copied) covariance matrix in global shape (data size, data size), each node takes part of the rows.
Return type:	numpy.ndarray

imagine.tools.covariance_estimator.empirical_mcov(data)[source]¶

Empirical covariance estimator

Given some data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated from

\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{N} \sum_{i=1}^N D_{ij}\]

\[\text{cov} = \tfrac{1}{N} U^T U\]

Notes

While conceptually simple, this is usually not the best option.

Parameters:	data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size).
Returns:	mean (numpy.ndarray) – Copied ensemble mean (on all nodes). cov (numpy.ndarray) – Distributed (not copied) covariance matrix in global shape (data size, data size), each node takes part of the rows.

imagine.tools.covariance_estimator.oas_cov(data)[source]¶

Estimate covariance with the Oracle Approximating Shrinkage algorithm.

Given some \(n\times m\) data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated in the following way.

\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{n} \sum_{i=1}^n D_{ij}\]

Let

\[S = \tfrac{1}{n} U^T U\,,\; T = \text{tr}(S)\quad\text{and}\quad V = \text{tr}(S^2)\]

\[\tilde\rho = \min\left[1,\frac{(1-2/m)V + T^2}{ (n+1-2/m)(V-T^2/m)}\right]\]

The covariance is given by

\[\text{cov}_\text{OAS} = (1-\rho)S + \tfrac{1}{N} \rho T I_m\]

Parameters:	data (numpy.ndarray) – Distributed data in global shape (ensemble_size, data_size).
Returns:	cov – Covariance matrix in global shape (data_size, data_size).
Return type:	numpy.ndarray

imagine.tools.covariance_estimator.oas_mcov(data)[source]¶

Estimate covariance with the Oracle Approximating Shrinkage algorithm.

See imagine.tools.covariance_estimator.oas_cov for details. This function aditionally returns the computed ensemble mean.

Parameters:	data (numpy.ndarray) – Distributed data in global shape (ensemble_size, data_size).
Returns:	mean (numpy.ndarray) – Copied ensemble mean (on all nodes). cov (numpy.ndarray) – Distributed covariance matrix in shape (data_size, data_size).

imagine.tools.covariance_estimator.diagonal_cov(data)[source]¶

Assumes the covariance matrix is simply a diagonal matrix whose values correspond to the sample variances

Parameters:	data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size).
Returns:	cov – Covariance matrix
Return type:	numpy.ndarray

imagine.tools.covariance_estimator.diagonal_mcov(data)[source]¶

Assumes the covariance matrix is simply a diagonal matrix whose values correspond to the sample variances

Parameters:	data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size).
Returns:	mean (numpy.ndarray) – Ensemble mean cov (numpy.ndarray) – Covariance matrix

imagine.tools.io module¶

imagine.tools.io.save_pipeline(pipeline, use_hickle=False)[source]¶

Saves the state of a Pipeline object

Parameters:	pipeline (imagine.pipelines.pipeline.Pipeline) – The pipeline object one would like to save use_hickle (bool) – If False (default) the state is saved using the cloudpickle package. Otherwise, experimental support to hickle is enabled.

imagine.tools.io.load_pipeline(directory_path='.')[source]¶

Loads the state of a Pipeline object

Parameters:	directory_path (str) – Path to the directory where the Pipeline state should be saved

imagine.tools.masker module¶

This module defines methods related to masking out distributed data and/or the associated covariance matrix. For the testing suits, please turn to “imagine/tests/tools_tests.py”.

Implemented with numpy.ndarray raw data.

imagine.tools.masker.mask_cov(cov, mask)[source]¶

Applies mask to the observable covariance.

Parameters:	cov ((distributed) numpy.ndarray) – Covariance matrix of observables in global shape (data size, data size) each node contains part of the global rows (if imagine.rc[‘distributed_arrays’]=True). mask (numpy.ndarray) – Copied mask map in shape (1, data size).
Returns:	masked_cov – Masked covariance matrix of shape (masked data size, masked data size).
Return type:	numpy.ndarray

imagine.tools.masker.mask_var(var, mask)[source]¶

Applies a mask to an observable.

Parameters:	var (numpy.ndarray) – Variance data mask (numpy.ndarray) – Copied mask map in shape (1, data size) on each node.
Returns:	Masked observable of shape (masked data size).
Return type:	numpy.ndarray

imagine.tools.masker.mask_obs(obs, mask)[source]¶

Applies a mask to an observable.

Parameters:	data (distributed numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size) each node contains part of the global rows. mask (numpy.ndarray) – Copied mask map in shape (1, data size) on each node.
Returns:	Masked observable of shape (ensemble size, masked data size).
Return type:	numpy.ndarray

imagine.tools.misc module¶

imagine.tools.misc.adjust_error_intervals(value, errlo, errup, sdigits=2, return_ndec=False)[source]¶

Takes the value of a quantity value with associated errors errlo and errup; and prepares them to be reported as \(v^{+err\,up}_{-err\,down}\). This is done by adjusting the number of decimal places of all the argumetns so that the errors have at least sdigits significant digits. Optionally, this number of decimal places may be returned.

Parameters:

value (int or float or astropy.Quantity) – Value of quantity.
errlo, errup (int or float or astropy.Quantity) – Associated lower and upper errors of value.
sdigits (int, optional) – Minimum number of significant digits in the errors
return_ndec (bool, optional) – If True, also returns the number of decimal points used

Returns:

value (float) – Rounded value
errlo, errup (float) – Assimetric error values
n (int) – If return_ndec is True, the number of decimal places is returned

imagine.tools.misc.is_notebook()[source]¶: Finds out whether python is running in a Jupyter notebook or as a shell.

imagine.tools.misc.unit_checker(unit, list_of_quant)[source]¶

Checks the consistency of units of a list of quantities, converting them all to the same units, if needed.

Parameters:

unit (astropy.Unit) – Unit to be used for the quantities in the list. If set to None, the units of the first list item are used.
list_of_quant (list) – List of quantities to be checked.

Returns:

unit (astropy.Unit) – The common unit used
list_of_values – Contains the quantities of list_of_quant converted to floats using the common unit unit

imagine.tools.mpi_helper module¶

This MPI helper module is designed for parallel computing and data handling.

For the testing suits, please turn to “imagine/tests/tools_tests.py”.

imagine.tools.mpi_helper.mpi_arrange(size)[source]¶

With known global size, number of mpi nodes, and current rank, returns the begin and end index for distributing the global size.

Parameters:	size (integer (positive)) – The total size of target to be distributed. It can be a row size or a column size.
Returns:	result – The begin and end index [begin,end] for slicing the target.
Return type:	numpy.uint

imagine.tools.mpi_helper.mpi_shape(data)[source]¶

Returns the global number of rows and columns of given distributed data.

Parameters:	data (numpy.ndarray) – The distributed data.
Returns:	result – Glboal row and column number.
Return type:	numpy.uint

imagine.tools.mpi_helper.mpi_prosecutor(data)[source]¶

Check if the data is distributed in the correct way covariance matrix is distributed exactly the same manner as multi-realization data if not, an error will be raised.

Parameters:	data (numpy.ndarray) – The distributed data to be examined.

imagine.tools.mpi_helper.mpi_mean(data)[source]¶

calculate the mean of distributed array prefers averaging along column direction but if given (1,n) data shape the average is done along row direction the result note that the numerical values will be converted into double

Parameters:	data (numpy.ndarray) – Distributed data.
Returns:	result – Copied data mean, which means the mean is copied to all nodes.
Return type:	numpy.ndarray

imagine.tools.mpi_helper.mpi_trans(data)[source]¶

Transpose distributed data, note that the numerical values will be converted into double.

Parameters:	data (numpy.ndarray) – Distributed data.
Returns:	result – Transposed data in distribution.
Return type:	numpy.ndarray

imagine.tools.mpi_helper.mpi_mult(left, right)[source]¶

Calculate matrix multiplication of two distributed data, the result is data1*data2 in multi-node distribution note that the numerical values will be converted into double. We send the distributed right rows into other nodes (aka cannon method).

Parameters:	left (numpy.ndarray) – Distributed left side data. right (numpy.ndarray) – Distributed right side data.
Returns:	result – Distributed multiplication result.
Return type:	numpy.ndarray

imagine.tools.mpi_helper.mpi_trace(data)[source]¶

Computes the trace of the given distributed data.

Parameters:	data (numpy.ndarray) – Array of data distributed over different processes.
Returns:	result – Copied trace of given data.
Return type:	numpy.float64

imagine.tools.mpi_helper.mpi_diag(data)[source]¶

Gets the diagonal of a distributed matrix

Parameters:	data (numpy.ndarray) – Array of data distributed over different processes.
Returns:	result – Diagonal
Return type:	numpy.ndarray

imagine.tools.mpi_helper.mpi_new_diag(data)[source]¶

Constructs a distributed matrix with a given diagonal

Parameters:	data (numpy.ndarray) – Array of data distributed over different processes.
Returns:	result – Diagonal
Return type:	numpy.ndarray

imagine.tools.mpi_helper.mpi_eye(size)[source]¶

Produces an eye matrix according of shape (size,size) distributed over the various running MPI processes

Parameters:	size (integer) – Distributed matrix size.
Returns:	result – Distributed eye matrix.
Return type:	numpy.ndarray, double data type

imagine.tools.mpi_helper.mpi_distribute_matrix(full_matrix)[source]¶

Parameters:	size (integer) – Distributed matrix size.
Returns:	result – Distributed eye matrix.
Return type:	numpy.ndarray, double data type

imagine.tools.mpi_helper.mpi_lu_solve(operator, source)[source]¶

Simple LU Gauss method WITHOUT pivot permutation.

Parameters:	operator (distributed numpy.ndarray) – Matrix representation of the left-hand-side operator. source (copied numpy.ndarray) – Vector representation of the right-hand-side source.
Returns:	result – Copied solution to the linear algebra problem.
Return type:	numpy.ndarray, double data type

imagine.tools.mpi_helper.mpi_slogdet(data)[source]¶

Computes log determinant according to simple LU Gauss method WITHOUT pivot permutation.

Parameters:	data (numpy.ndarray) – Array of data distributed over different processes.
Returns:	sign (numpy.ndarray) – Single element numpy array containing the sign of the determinant (copied to all nodes). logdet (numpy.ndarray) – Single element numpy array containing the log of the determinant (copied to all nodes).

imagine.tools.mpi_helper.mpi_global(data)[source]¶

Gathers data spread accross different processes.

Parameters:	data (numpy.ndarray) – Array of data distributed over different processes.
Returns:	global array – The root process returns the gathered data, other processes return None.
Return type:	numpy.ndarray

imagine.tools.mpi_helper.mpi_local(data)[source]¶

Distributes data over available processes

Parameters:	data (numpy.ndarray) – Array of data to be distributed over available processes.
Returns:	local array – Return the distributed array on all preocesses.
Return type:	numpy.ndarray

imagine.tools.parallel_ops module¶

Interface module which allows automatically switching between the routines in the imagine.tools.mpi_helper module and their:py:mod:numpy or pure Python equivalents, depending on the contents of imagine.rc['distributed_arrays']

imagine.tools.parallel_ops.pshape(data)[source]¶: imagine.tools.mpi_helper.mpi_shape() or numpy.ndarray.shape() depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.prosecutor(data)[source]¶: imagine.tools.mpi_helper.mpi_prosecutor() or nothing depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.pmean(data)[source]¶: imagine.tools.mpi_helper.mpi_mean() or numpy.mean() depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.pvar(data)[source]¶: imagine.tools.mpi_helper.mpi_var() or numpy.var() depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.ptrans(data)[source]¶: imagine.tools.mpi_helper.mpi_mean() or numpy.ndarray.T() depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.pmult(left, right)[source]¶: imagine.tools.mpi_helper.mpi_mult() or numpy.matmul() depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.ptrace(data)[source]¶: imagine.tools.mpi_helper.mpi_trace() or numpy.trace() depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.pdiag(data)[source]¶: imagine.tools.mpi_helper.mpi_diag() or numpy.diagonal() depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.pnewdiag(data)[source]¶: imagine.tools.mpi_helper.mpi_new_diag() or numpy.diag() depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.peye(size)[source]¶: imagine.tools.mpi_helper.mpi_eye() or numpy.eye() depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.distribute_matrix(full_matrix)[source]¶: imagine.tools.mpi_helper.mpi_distribute_matrix() or nothing depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.plu_solve(operator, source)[source]¶

imagine.tools.mpi_helper.mpi_lu_solve() or numpy.linalg.solve() depending on imagine.rc['distributed_arrays'].

Notes

In the non-distributed case, the source is transposed before the calculation

imagine.tools.parallel_ops.pslogdet(data)[source]¶: imagine.tools.mpi_helper.mpi_slogdet() or numpy.linalg.slogdet() depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.pglobal(data)[source]¶: imagine.tools.mpi_helper.mpi_global() or nothing depending on imagine.rc['distributed_arrays'].

imagine.tools.parallel_ops.plocal(data)[source]¶: imagine.tools.mpi_helper.mpi_local() or nothing depending on imagine.rc['distributed_arrays'].

imagine.tools.random_seed module¶

This module provides a time-thread dependent seed value.

For the testing suites, please turn to “imagine/tests/tools_tests.py”.

imagine.tools.random_seed.ensemble_seed_generator(size)[source]¶

Generates fixed random seed values for each realization in ensemble.

Parameters:	size (int) – Number of realizations in ensemble.
Returns:	seeds – An array of random seeds.
Return type:	numpy.ndarray

imagine.tools.random_seed.seed_generator(trigger)[source]¶

Sets trigger as 0 will generate time-thread dependent method otherwise returns the trigger as seed.

Parameters:	trigger (int) – Non-negative pre-fixed seed.
Returns:	seed – A random seed value.
Return type:	int

imagine.tools.timer module¶

Timer class is designed for time recording.

class imagine.tools.timer.Timer[source]¶

Bases: object

Class designed for time recording.

Simply provide an event name to the tick method to start recording. The tock method stops the recording and the record property allow one to access the recorded time.

tick(event)[source]¶

Starts timing with a given event name.

Parameters:	event (str) – Event name (will be key of the record attribute).

tock(event)[source]¶

Stops timing of the given event.

Parameters:	event (str) – Event name (will be key of the record attribute).

record¶: Dictionary of recorded times using event name as keys.

imagine.tools.visualization module¶

This module contains convenient standard plotting functions

imagine.tools.visualization.corner_plot(pipeline=None, truths_dict=None, show_sigma=True, param_names=None, table=None, samples=None, live_samples=None, **kwargs)[source]¶

Makes a corner plot.

If a Pipeline object is supplied, it will be used to collect all the necessary information. Alternatively, one can supply either a astropy.table.Table or a numpy.ndarray containing with different parameters as columns.

The plotting is done using the corner package, and extra keyword parameters are passed directly to it

Parameters:	pipeline (imagine.pipelines.pipeline.Pipeline) – Pipeline from which samples are read in the default case. truths_dict (dict) – Dictionary containing active parameters as keys and the expected values as values show_sigma (bool) – If True, plots the 1, 2 and 3-sigma contours. param_names (list) – If present, only parameters from this list will be plotted table (astropy.Table) – If present, samples from this table are used instead of the Pipeline. samples (numpy.ndarray) – If present, samples are read from this array live_samples (numpy.ndarray) – If this array is present, a second set of samples are shown in the plots.
Returns:	corner_fig – Figure containing the generated corner plot
Return type:	matplotlib.Figure

imagine.tools.visualization.show_likelihood_convergence_report(rep, cmap='cmr.chroma')[source]¶

Prepares a standard set of plots of a likelihood convergence report (produced by the Pipeline.prepare_likelihood_convergence_report() method).

Parameters:	cmap (str) – Colormap to be used for the lineplots

imagine.tools.visualization.show_observable(obs, realization=0, title=None, cartesian_axes='yz', show_variances=False, is_covariance=False, **kwargs)[source]¶

Displays the contents of a single realisation of an Observable object.

Parameters:

obs (imagine.observables.observable.Observable) – Observable object whose contents one wants to plot
realization (int) – Index of the ensemble realization to be plotted
cartesian_axes (str) – If plotting a tabular observable using cartesian coordinates, this allows selecting which two axes should be used for the plot. E.g. ‘xy’, ‘zy’, ‘xz’. Default: ‘yz’.
**kwargs – Parameters to be passed to the apropriate plotting routine (either healpy.visufunc.mollview or matplotlib.pyplot.imshow).

imagine.tools.visualization.show_observable_dict(obs_dict, max_realizations=None, show_variances=False, **kwargs)[source]¶

Plots the contents of an ObservableDict object.

Parameters:

obs_dict (imagine.observables.observable.ObservableDict) – ObservableDict object whose contents one wants to plot.
max_realization (int) – Index of the maximum ensemble realization to be plotted. If None, the whole ensemble is shown.
show_variances (bool) – If True and if obs_dict is a Covariances object, shows variance maps instead of covariance matrix
**kwargs – Parameters to be passed to the apropriate plotting routine (either healpy.visufunc.mollview() or matplotlib.pyplot.imshow()).

imagine.tools.visualization.trace_plot(samples=None, live_samples=None, likelihood=None, lnX=None, parameter_names=None, cmap='cmr.ocean', color_live='#e34a33', fig=None, hist_bins=30)[source]¶

Produces a set of “trace plots” for a nested sampling run, showing the position of “dead” points as a function of prior mass. Also plots the distributions of dead points accumulated until now, and the distributions of live points.

Parameters:	samples (numpy.ndarray) – (Nsamples, Npars)-array containing the rejected points likelihood (numpy.ndarray) – Nsamples-array containing the log likelihood values lnX (numpy.ndarray) – Nsamples-array containing the “prior mass” parameter_names (list or tuple) – List of the nPars active parameter names live_samples (numpy.ndarray, optional) – (Nsamples, Npars)-array containing the present live points cmap (str) – Name of the colormap to be used color_live (str) – Colour used for the live points distributions (if those are present) fig (matplotlib.Figure) – If a previous figure was generated, it can be passed to this function for update using this argument hist_bins (int) – The number of bins used for the histograms
Returns:	fig – The figure produced
Return type:	matplotlib.Figure

Module contents¶

class imagine.tools.BaseClass[source]¶

Bases: object

REQ_ATTRS = []¶

class imagine.tools.Timer[source]¶

Bases: object

Class designed for time recording.

Simply provide an event name to the tick method to start recording. The tock method stops the recording and the record property allow one to access the recorded time.

tick(event)[source]¶

Starts timing with a given event name.

Parameters:	event (str) – Event name (will be key of the record attribute).

tock(event)[source]¶

Stops timing of the given event.

Parameters:	event (str) – Event name (will be key of the record attribute).

record¶: Dictionary of recorded times using event name as keys.

imagine.tools.exp_mapper(x, a=0, b=1)[source]¶

Maps x from [0, 1] into the interval [exp(a), exp(b)].

Parameters:	x (float) – The variable to be mapped. a (float) – The lower parameter value limit. b (float) – The upper parameter value limit.
Returns:	The mapped parameter value.
Return type:	numpy.float64

imagine.tools.unity_mapper(x, a=0.0, b=1.0)[source]¶

Maps x from [0, 1] into the interval [a, b].

Parameters:	x (float) – The variable to be mapped. a (float) – The lower parameter value limit. b (float) – The upper parameter value limit.
Returns:	The mapped parameter value.
Return type:	numpy.float64

imagine.tools.req_attr(meth)[source]¶

imagine.tools.empirical_cov(data)[source]¶

Empirical covariance estimator

Given some data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated from

\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{N} \sum_{i=1}^N D_{ij}\]

\[\text{cov} = \tfrac{1}{N} U^T U\]

Notes

While conceptually simple, this is usually not the best option.

Parameters:	data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size).
Returns:	cov – Distributed (not copied) covariance matrix in global shape (data size, data size), each node takes part of the rows.
Return type:	numpy.ndarray

imagine.tools.empirical_mcov(data)[source]¶

Empirical covariance estimator

Given some data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated from

\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{N} \sum_{i=1}^N D_{ij}\]

\[\text{cov} = \tfrac{1}{N} U^T U\]

Notes

While conceptually simple, this is usually not the best option.

Parameters:	data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size).
Returns:	mean (numpy.ndarray) – Copied ensemble mean (on all nodes). cov (numpy.ndarray) – Distributed (not copied) covariance matrix in global shape (data size, data size), each node takes part of the rows.

imagine.tools.oas_cov(data)[source]¶

Estimate covariance with the Oracle Approximating Shrinkage algorithm.

Given some \(n\times m\) data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated in the following way.

\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{n} \sum_{i=1}^n D_{ij}\]

Let

\[S = \tfrac{1}{n} U^T U\,,\; T = \text{tr}(S)\quad\text{and}\quad V = \text{tr}(S^2)\]

\[\tilde\rho = \min\left[1,\frac{(1-2/m)V + T^2}{ (n+1-2/m)(V-T^2/m)}\right]\]

The covariance is given by

\[\text{cov}_\text{OAS} = (1-\rho)S + \tfrac{1}{N} \rho T I_m\]

Parameters:	data (numpy.ndarray) – Distributed data in global shape (ensemble_size, data_size).
Returns:	cov – Covariance matrix in global shape (data_size, data_size).
Return type:	numpy.ndarray

imagine.tools.oas_mcov(data)[source]¶

Estimate covariance with the Oracle Approximating Shrinkage algorithm.

See imagine.tools.covariance_estimator.oas_cov for details. This function aditionally returns the computed ensemble mean.

Parameters:	data (numpy.ndarray) – Distributed data in global shape (ensemble_size, data_size).
Returns:	mean (numpy.ndarray) – Copied ensemble mean (on all nodes). cov (numpy.ndarray) – Distributed covariance matrix in shape (data_size, data_size).

imagine.tools.diagonal_cov(data)[source]¶

Assumes the covariance matrix is simply a diagonal matrix whose values correspond to the sample variances

Parameters:	data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size).
Returns:	cov – Covariance matrix
Return type:	numpy.ndarray

imagine.tools.diagonal_mcov(data)[source]¶

Assumes the covariance matrix is simply a diagonal matrix whose values correspond to the sample variances

Parameters:	data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size).
Returns:	mean (numpy.ndarray) – Ensemble mean cov (numpy.ndarray) – Covariance matrix

imagine.tools.save_pipeline(pipeline, use_hickle=False)[source]¶

Saves the state of a Pipeline object

Parameters:	pipeline (imagine.pipelines.pipeline.Pipeline) – The pipeline object one would like to save use_hickle (bool) – If False (default) the state is saved using the cloudpickle package. Otherwise, experimental support to hickle is enabled.

imagine.tools.load_pipeline(directory_path='.')[source]¶

Loads the state of a Pipeline object

Parameters:	directory_path (str) – Path to the directory where the Pipeline state should be saved

imagine.tools.mask_cov(cov, mask)[source]¶

Applies mask to the observable covariance.

Parameters:	cov ((distributed) numpy.ndarray) – Covariance matrix of observables in global shape (data size, data size) each node contains part of the global rows (if imagine.rc[‘distributed_arrays’]=True). mask (numpy.ndarray) – Copied mask map in shape (1, data size).
Returns:	masked_cov – Masked covariance matrix of shape (masked data size, masked data size).
Return type:	numpy.ndarray

imagine.tools.mask_var(var, mask)[source]¶

Applies a mask to an observable.

Parameters:	var (numpy.ndarray) – Variance data mask (numpy.ndarray) – Copied mask map in shape (1, data size) on each node.
Returns:	Masked observable of shape (masked data size).
Return type:	numpy.ndarray

imagine.tools.mask_obs(obs, mask)[source]¶

Applies a mask to an observable.

Parameters:	data (distributed numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size) each node contains part of the global rows. mask (numpy.ndarray) – Copied mask map in shape (1, data size) on each node.
Returns:	Masked observable of shape (ensemble size, masked data size).
Return type:	numpy.ndarray

imagine.tools.mpi_arrange(size)[source]¶

With known global size, number of mpi nodes, and current rank, returns the begin and end index for distributing the global size.

Parameters:	size (integer (positive)) – The total size of target to be distributed. It can be a row size or a column size.
Returns:	result – The begin and end index [begin,end] for slicing the target.
Return type:	numpy.uint

imagine.tools.mpi_shape(data)[source]¶

Returns the global number of rows and columns of given distributed data.

Parameters:	data (numpy.ndarray) – The distributed data.
Returns:	result – Glboal row and column number.
Return type:	numpy.uint

imagine.tools.mpi_prosecutor(data)[source]¶

Check if the data is distributed in the correct way covariance matrix is distributed exactly the same manner as multi-realization data if not, an error will be raised.

Parameters:	data (numpy.ndarray) – The distributed data to be examined.

imagine.tools.mpi_mean(data)[source]¶

calculate the mean of distributed array prefers averaging along column direction but if given (1,n) data shape the average is done along row direction the result note that the numerical values will be converted into double

Parameters:	data (numpy.ndarray) – Distributed data.
Returns:	result – Copied data mean, which means the mean is copied to all nodes.
Return type:	numpy.ndarray

imagine.tools.mpi_trans(data)[source]¶

Transpose distributed data, note that the numerical values will be converted into double.

Parameters:	data (numpy.ndarray) – Distributed data.
Returns:	result – Transposed data in distribution.
Return type:	numpy.ndarray

imagine.tools.mpi_mult(left, right)[source]¶

Calculate matrix multiplication of two distributed data, the result is data1*data2 in multi-node distribution note that the numerical values will be converted into double. We send the distributed right rows into other nodes (aka cannon method).

Parameters:	left (numpy.ndarray) – Distributed left side data. right (numpy.ndarray) – Distributed right side data.
Returns:	result – Distributed multiplication result.
Return type:	numpy.ndarray

imagine.tools.mpi_trace(data)[source]¶

Computes the trace of the given distributed data.

Parameters:	data (numpy.ndarray) – Array of data distributed over different processes.
Returns:	result – Copied trace of given data.
Return type:	numpy.float64

imagine.tools.mpi_diag(data)[source]¶

Gets the diagonal of a distributed matrix

Parameters:	data (numpy.ndarray) – Array of data distributed over different processes.
Returns:	result – Diagonal
Return type:	numpy.ndarray

imagine.tools.mpi_new_diag(data)[source]¶

Constructs a distributed matrix with a given diagonal

Parameters:	data (numpy.ndarray) – Array of data distributed over different processes.
Returns:	result – Diagonal
Return type:	numpy.ndarray

imagine.tools.mpi_eye(size)[source]¶

Produces an eye matrix according of shape (size,size) distributed over the various running MPI processes

Parameters:	size (integer) – Distributed matrix size.
Returns:	result – Distributed eye matrix.
Return type:	numpy.ndarray, double data type

imagine.tools.mpi_distribute_matrix(full_matrix)[source]¶

Parameters:	size (integer) – Distributed matrix size.
Returns:	result – Distributed eye matrix.
Return type:	numpy.ndarray, double data type

imagine.tools.mpi_lu_solve(operator, source)[source]¶

Simple LU Gauss method WITHOUT pivot permutation.

Parameters:	operator (distributed numpy.ndarray) – Matrix representation of the left-hand-side operator. source (copied numpy.ndarray) – Vector representation of the right-hand-side source.
Returns:	result – Copied solution to the linear algebra problem.
Return type:	numpy.ndarray, double data type

imagine.tools.mpi_slogdet(data)[source]¶

Computes log determinant according to simple LU Gauss method WITHOUT pivot permutation.

Parameters:	data (numpy.ndarray) – Array of data distributed over different processes.
Returns:	sign (numpy.ndarray) – Single element numpy array containing the sign of the determinant (copied to all nodes). logdet (numpy.ndarray) – Single element numpy array containing the log of the determinant (copied to all nodes).

imagine.tools.mpi_global(data)[source]¶

Gathers data spread accross different processes.

Parameters:	data (numpy.ndarray) – Array of data distributed over different processes.
Returns:	global array – The root process returns the gathered data, other processes return None.
Return type:	numpy.ndarray

imagine.tools.mpi_local(data)[source]¶

Distributes data over available processes

Parameters:	data (numpy.ndarray) – Array of data to be distributed over available processes.
Returns:	local array – Return the distributed array on all preocesses.
Return type:	numpy.ndarray

imagine.tools.pshape(data)[source]¶: imagine.tools.mpi_helper.mpi_shape() or numpy.ndarray.shape() depending on imagine.rc['distributed_arrays'].

imagine.tools.prosecutor(data)[source]¶: imagine.tools.mpi_helper.mpi_prosecutor() or nothing depending on imagine.rc['distributed_arrays'].

imagine.tools.pmean(data)[source]¶: imagine.tools.mpi_helper.mpi_mean() or numpy.mean() depending on imagine.rc['distributed_arrays'].

imagine.tools.pvar(data)[source]¶: imagine.tools.mpi_helper.mpi_var() or numpy.var() depending on imagine.rc['distributed_arrays'].

imagine.tools.ptrans(data)[source]¶: imagine.tools.mpi_helper.mpi_mean() or numpy.ndarray.T() depending on imagine.rc['distributed_arrays'].

imagine.tools.pmult(left, right)[source]¶: imagine.tools.mpi_helper.mpi_mult() or numpy.matmul() depending on imagine.rc['distributed_arrays'].

imagine.tools.ptrace(data)[source]¶: imagine.tools.mpi_helper.mpi_trace() or numpy.trace() depending on imagine.rc['distributed_arrays'].

imagine.tools.pdiag(data)[source]¶: imagine.tools.mpi_helper.mpi_diag() or numpy.diagonal() depending on imagine.rc['distributed_arrays'].

imagine.tools.pnewdiag(data)[source]¶: imagine.tools.mpi_helper.mpi_new_diag() or numpy.diag() depending on imagine.rc['distributed_arrays'].

imagine.tools.peye(size)[source]¶: imagine.tools.mpi_helper.mpi_eye() or numpy.eye() depending on imagine.rc['distributed_arrays'].

imagine.tools.distribute_matrix(full_matrix)[source]¶: imagine.tools.mpi_helper.mpi_distribute_matrix() or nothing depending on imagine.rc['distributed_arrays'].

imagine.tools.plu_solve(operator, source)[source]¶

imagine.tools.mpi_helper.mpi_lu_solve() or numpy.linalg.solve() depending on imagine.rc['distributed_arrays'].

Notes

In the non-distributed case, the source is transposed before the calculation

imagine.tools.pslogdet(data)[source]¶: imagine.tools.mpi_helper.mpi_slogdet() or numpy.linalg.slogdet() depending on imagine.rc['distributed_arrays'].

imagine.tools.pglobal(data)[source]¶: imagine.tools.mpi_helper.mpi_global() or nothing depending on imagine.rc['distributed_arrays'].

imagine.tools.plocal(data)[source]¶: imagine.tools.mpi_helper.mpi_local() or nothing depending on imagine.rc['distributed_arrays'].

imagine.tools.ensemble_seed_generator(size)[source]¶

Generates fixed random seed values for each realization in ensemble.

Parameters:	size (int) – Number of realizations in ensemble.
Returns:	seeds – An array of random seeds.
Return type:	numpy.ndarray

imagine.tools.seed_generator(trigger)[source]¶

Sets trigger as 0 will generate time-thread dependent method otherwise returns the trigger as seed.

Parameters:	trigger (int) – Non-negative pre-fixed seed.
Returns:	seed – A random seed value.
Return type:	int