imagine.tools package¶
Submodules¶
imagine.tools.carrier_mapper module¶
The mapper module is designed for implementing distribution mapping functions.
-
imagine.tools.carrier_mapper.
exp_mapper
(x, a=0, b=1)[source]¶ Maps x from [0, 1] into the interval [exp(a), exp(b)].
Parameters: - x (float) – The variable to be mapped.
- a (float) – The lower parameter value limit.
- b (float) – The upper parameter value limit.
Returns: The mapped parameter value.
Return type: numpy.float64
-
imagine.tools.carrier_mapper.
unity_mapper
(x, a=0.0, b=1.0)[source]¶ Maps x from [0, 1] into the interval [a, b].
Parameters: - x (float) – The variable to be mapped.
- a (float) – The lower parameter value limit.
- b (float) – The upper parameter value limit.
Returns: The mapped parameter value.
Return type: numpy.float64
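The two mappers can be pictured with the minimal sketch below. It assumes the linear map a + x(b - a) for unity_mapper and the exponential of that same linear map for exp_mapper, which is one natural way to satisfy the documented intervals; the function names ending in _sketch are hypothetical and are not part of IMAGINE.

```python
import math


def unity_mapper_sketch(x, a=0.0, b=1.0):
    """Linearly map x in [0, 1] onto the interval [a, b]."""
    return a + x * (b - a)


def exp_mapper_sketch(x, a=0.0, b=1.0):
    """Map x in [0, 1] onto [exp(a), exp(b)] by exponentiating the linear map."""
    return math.exp(unity_mapper_sketch(x, a, b))


# The midpoint of [0, 1] lands on the midpoint of [2, 4].
print(unity_mapper_sketch(0.5, 2.0, 4.0))
# The end points of [0, 1] land on exp(a) and exp(b).
print(exp_mapper_sketch(0.0, 1.0, 3.0), exp_mapper_sketch(1.0, 1.0, 3.0))
```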
imagine.tools.class_tools module¶
imagine.tools.config module¶
IMAGINE global configuration¶
The default behaviour of some aspects of IMAGINE can be set using global rc configuration variables.
These can be accessed and modified using the
imagine.rc
dictionary or setting the
corresponding environment variables (named ‘IMAGINE_’+RC_VAR_NAME).
For example to set the default path for the hamx executable, one can either do:
import imagine
imagine.rc['hammurabi_hamx_path'] = 'my_desired_path'
or, alternatively, set this as an environment variable before the execution of the script:
export IMAGINE_HAMMURABI_HAMX_PATH='my_desired_path'
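The naming rule for the environment variables ('IMAGINE_' followed by the upper-case variable name) can be illustrated with the sketch below. The helper rc_with_env_overrides is hypothetical and only mimics the rule described here; it is not how IMAGINE itself reads its configuration.

```python
import os


def rc_with_env_overrides(defaults, prefix="IMAGINE_"):
    """Return a copy of `defaults` in which every key that has a matching
    environment variable (prefix + upper-case key) takes that variable's value."""
    rc = dict(defaults)
    for key in defaults:
        env_name = prefix + key.upper()
        if env_name in os.environ:
            rc[key] = os.environ[env_name]
    return rc


# Equivalent to `export IMAGINE_HAMMURABI_HAMX_PATH='my_desired_path'` in the shell.
os.environ["IMAGINE_HAMMURABI_HAMX_PATH"] = "my_desired_path"

rc = rc_with_env_overrides({"hammurabi_hamx_path": None})
print(rc["hammurabi_hamx_path"])
```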
The following list describes all the available global settings variables.
- IMAGINE rc variables
- temp_dir
- Default temporary directory used by IMAGINE. If not set, a temporary directory will be created at /tmp/ with a safe name.
- distributed_arrays
- If True, arrays containing covariances are distributed among different MPI processes (and so are the corresponding array operations).
- pipeline_default_seed
- The default value for the master seed used by a Pipeline object (see Pipeline.master_seed).
- pipeline_distribute_ensemble
- The default value of the distribute_ensemble setting of a Pipeline object (see Pipeline.distribute_ensemble).
- hammurabi_hamx_path
- Default location of the Hammurabi X executable file, hamx.
imagine.tools.covariance_estimator module¶
This module contains estimation algorithms for the covariance matrix based on a finite number of samples.
For the testing suites, please turn to “imagine/tests/tools_tests.py”.
-
imagine.tools.covariance_estimator.
empirical_cov
(data)[source]¶ Empirical covariance estimator
Given some data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated from
\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{N} \sum_{i=1}^N D_{ij}\]
\[\text{cov} = \tfrac{1}{N} U^T U\]
Notes
While conceptually simple, this is usually not the best option.
Parameters: data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size). Returns: cov – Distributed (not copied) covariance matrix in global shape (data size, data size), each node takes part of the rows. Return type: numpy.ndarray
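As a minimal, non-distributed sketch of the formula above, the estimator can be written in a few lines of plain numpy. The name empirical_cov_sketch is hypothetical, and the MPI distribution of the result that IMAGINE performs is deliberately left out.

```python
import numpy as np


def empirical_cov_sketch(data):
    """Empirical covariance of `data` with shape (ensemble size, data size)."""
    n = data.shape[0]
    # Subtract the ensemble mean of every column (property).
    u = data - data.mean(axis=0)
    # Normalise by N, as in the formula above (not by N - 1).
    return u.T @ u / n


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
cov = empirical_cov_sketch(data)

# Agrees with numpy's own estimator when it also normalises by N.
print(np.allclose(cov, np.cov(data, rowvar=False, bias=True)))
```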
-
imagine.tools.covariance_estimator.
empirical_mcov
(data)[source]¶ Empirical covariance estimator
Given some data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated from
\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{N} \sum_{i=1}^N D_{ij}\]
\[\text{cov} = \tfrac{1}{N} U^T U\]
Notes
While conceptually simple, this is usually not the best option.
Parameters: data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size). Returns: - mean (numpy.ndarray) – Copied ensemble mean (on all nodes).
- cov (numpy.ndarray) – Distributed (not copied) covariance matrix in global shape (data size, data size), each node takes part of the rows.
-
imagine.tools.covariance_estimator.
oas_cov
(data)[source]¶ Estimate covariance with the Oracle Approximating Shrinkage algorithm.
Given some \(n\times m\) data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated in the following way.
\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{n} \sum_{i=1}^n D_{ij}\]
Let
\[S = \tfrac{1}{n} U^T U\,,\; T = \text{tr}(S)\quad\text{and}\quad V = \text{tr}(S^2)\]
\[\tilde\rho = \min\left[1,\frac{(1-2/m)V + T^2}{ (n+1-2/m)(V-T^2/m)}\right]\]
The covariance is given by
\[\text{cov}_\text{OAS} = (1-\tilde\rho)S + \tfrac{1}{m} \tilde\rho\, T I_m\]
Parameters: data (numpy.ndarray) – Distributed data in global shape (ensemble_size, data_size). Returns: cov – Covariance matrix in global shape (data_size, data_size). Return type: numpy.ndarray
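The OAS recipe can be sketched in plain numpy as follows. This is an illustrative, non-distributed version under the assumption that the shrinkage target is the scaled identity (T/m) I_m, as in the standard form of the algorithm; the name oas_cov_sketch and the guard against a non-positive denominator are additions made for this example.

```python
import numpy as np


def oas_cov_sketch(data):
    """OAS covariance estimate for `data` of shape (n samples, m properties).

    Returns the shrunk covariance and the shrinkage coefficient rho.
    """
    n, m = data.shape
    u = data - data.mean(axis=0)
    s = u.T @ u / n                 # empirical covariance S
    t = np.trace(s)                 # T = tr(S)
    v = np.trace(s @ s)             # V = tr(S^2)
    numerator = (1.0 - 2.0 / m) * v + t**2
    denominator = (n + 1.0 - 2.0 / m) * (v - t**2 / m)
    # If S is already proportional to the identity the denominator vanishes;
    # full shrinkage is then the sensible limit (assumption of this sketch).
    rho = 1.0 if denominator <= 0 else min(1.0, numerator / denominator)
    return (1.0 - rho) * s + rho * (t / m) * np.eye(m), rho


rng = np.random.default_rng(1)
data = rng.normal(size=(10, 20))    # fewer samples than properties
cov, rho = oas_cov_sketch(data)
print(cov.shape, rho)
```

Shrinking towards (T/m) I_m leaves the trace of the estimate equal to the trace of S, which is a convenient sanity check.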
-
imagine.tools.covariance_estimator.
oas_mcov
(data)[source]¶ Estimate covariance with the Oracle Approximating Shrinkage algorithm.
See imagine.tools.covariance_estimator.oas_cov for details. This function additionally returns the computed ensemble mean.
Parameters: data (numpy.ndarray) – Distributed data in global shape (ensemble_size, data_size). Returns: - mean (numpy.ndarray) – Copied ensemble mean (on all nodes).
- cov (numpy.ndarray) – Distributed covariance matrix in shape (data_size, data_size).
-
imagine.tools.covariance_estimator.
diagonal_cov
(data)[source]¶ Assumes the covariance matrix is simply a diagonal matrix whose values correspond to the sample variances
Parameters: data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size). Returns: cov – Covariance matrix Return type: numpy.ndarray
-
imagine.tools.covariance_estimator.
diagonal_mcov
(data)[source]¶ Assumes the covariance matrix is simply a diagonal matrix whose values correspond to the sample variances
Parameters: data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size). Returns: - mean (numpy.ndarray) – Ensemble mean
- cov (numpy.ndarray) – Covariance matrix
imagine.tools.io module¶
-
imagine.tools.io.
save_pipeline
(pipeline, use_hickle=False)[source]¶ Saves the state of a Pipeline object
Parameters: - pipeline (imagine.pipelines.pipeline.Pipeline) – The pipeline object one would like to save
- use_hickle (bool) – If False (default) the state is saved using the cloudpickle package. Otherwise, experimental support to hickle is enabled.
imagine.tools.masker module¶
This module defines methods related to masking out distributed data and/or the associated covariance matrix. For the testing suites, please turn to “imagine/tests/tools_tests.py”.
Implemented with numpy.ndarray raw data.
-
imagine.tools.masker.
mask_cov
(cov, mask)[source]¶ Applies mask to the observable covariance.
Parameters: - cov ((distributed) numpy.ndarray) – Covariance matrix of observables in global shape (data size, data size) each node contains part of the global rows (if imagine.rc[‘distributed_arrays’]=True).
- mask (numpy.ndarray) – Copied mask map in shape (1, data size).
Returns: masked_cov – Masked covariance matrix of shape (masked data size, masked data size).
Return type:
-
imagine.tools.masker.
mask_var
(var, mask)[source]¶ Applies a mask to the variance of an observable.
Parameters: - var (numpy.ndarray) – Variance data
- mask (numpy.ndarray) – Copied mask map in shape (1, data size) on each node.
Returns: Masked variance of shape (masked data size).
Return type:
-
imagine.tools.masker.
mask_obs
(obs, mask)[source]¶ Applies a mask to an observable.
Parameters: - obs (distributed numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size) each node contains part of the global rows.
- mask (numpy.ndarray) – Copied mask map in shape (1, data size) on each node.
Returns: Masked observable of shape (ensemble size, masked data size).
Return type:
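The shapes quoted for the masking functions can be checked with the small numpy sketch below. It assumes that non-zero mask entries mark the data points to keep and ignores the MPI distribution entirely; mask_obs_sketch and mask_cov_sketch are hypothetical names used only for this illustration.

```python
import numpy as np


def mask_obs_sketch(obs, mask):
    """Keep only the columns of `obs` where the (1, data size) mask is non-zero."""
    keep = np.asarray(mask).ravel().astype(bool)
    return obs[:, keep]


def mask_cov_sketch(cov, mask):
    """Keep only the rows and columns of `cov` where the mask is non-zero."""
    keep = np.asarray(mask).ravel().astype(bool)
    return cov[np.ix_(keep, keep)]


mask = np.array([[1, 0, 1, 1]])            # shape (1, data size)
obs = np.arange(12.0).reshape(3, 4)        # (ensemble size, data size)
cov = np.arange(16.0).reshape(4, 4)        # (data size, data size)

print(mask_obs_sketch(obs, mask).shape)    # (ensemble size, masked data size)
print(mask_cov_sketch(cov, mask).shape)    # (masked data size, masked data size)
```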
imagine.tools.misc module¶
-
imagine.tools.misc.
adjust_error_intervals
(value, errlo, errup, sdigits=2, return_ndec=False)[source]¶ Takes the value of a quantity, value, with associated errors errlo and errup, and prepares them to be reported as \(v^{+err\,up}_{-err\,down}\). This is done by adjusting the number of decimal places of all the arguments so that the errors have at least sdigits significant digits. Optionally, this number of decimal places may be returned.
Parameters: - value (int or float or astropy.Quantity) – Value of quantity.
- errlo, errup (int or float or astropy.Quantity) – Associated lower and upper errors of value.
- sdigits (int, optional) – Minimum number of significant digits in the errors
- return_ndec (bool, optional) – If True, also returns the number of decimal places used
Returns: - value (float) – Rounded value
- errlo, errup (float) – Asymmetric error values
- n (int) – If return_ndec is True, the number of decimal places is returned
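One way to realise the rounding rule described above is sketched below: the number of decimal places is chosen from the smaller of the two errors, so that both errors keep at least sdigits significant digits. This is an assumption about the rule, not IMAGINE's code; the function name is hypothetical, units are ignored, and both errors are assumed to be non-zero.

```python
import math


def adjust_error_intervals_sketch(value, errlo, errup, sdigits=2):
    """Round value and errors to a common number of decimal places.

    The number of decimals is set by the smaller error, so that it keeps
    `sdigits` significant digits (and the larger error keeps at least as many).
    """
    smallest = min(abs(errlo), abs(errup))
    # Position of the leading digit of the smallest error.
    leading = math.floor(math.log10(smallest))
    ndec = sdigits - 1 - leading
    return round(value, ndec), round(errlo, ndec), round(errup, ndec), ndec


v, lo, up, ndec = adjust_error_intervals_sketch(1.23456, 0.0123, 0.2)
print(v, lo, up, ndec)
```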
-
imagine.tools.misc.
is_notebook
()[source]¶ Finds out whether python is running in a Jupyter notebook or in a shell.
-
imagine.tools.misc.
unit_checker
(unit, list_of_quant)[source]¶ Checks the consistency of units of a list of quantities, converting them all to the same units, if needed.
Parameters: - unit (astropy.Unit) – Unit to be used for the quantities in the list. If set to None, the units of the first list item are used.
- list_of_quant (list) – List of quantities to be checked.
Returns: - unit (astropy.Unit) – The common unit used
- list_of_values – Contains the quantities of list_of_quant converted to floats using the common unit unit
imagine.tools.mpi_helper module¶
This MPI helper module is designed for parallel computing and data handling.
For the testing suites, please turn to “imagine/tests/tools_tests.py”.
-
imagine.tools.mpi_helper.
mpi_arrange
(size)[source]¶ With known global size, number of mpi nodes, and current rank, returns the begin and end index for distributing the global size.
Parameters: size (integer (positive)) – The total size of target to be distributed. It can be a row size or a column size. Returns: result – The begin and end index [begin,end] for slicing the target. Return type: numpy.uint
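The idea of splitting a global size into contiguous per-rank slices can be sketched without MPI by passing the number of processes and the rank explicitly. The function arrange_sketch and its extra arguments are hypothetical; the real mpi_arrange obtains both values from the MPI communicator and may distribute the remainder differently.

```python
def arrange_sketch(size, nprocs, rank):
    """Return (begin, end) so that range(begin, end) is the slice owned by `rank`.

    The first `size % nprocs` ranks receive one extra element each.
    """
    base, extra = divmod(size, nprocs)
    begin = rank * base + min(rank, extra)
    end = begin + base + (1 if rank < extra else 0)
    return begin, end


# Ten rows spread over three processes.
print([arrange_sketch(10, 3, rank) for rank in range(3)])
# → [(0, 4), (4, 7), (7, 10)]
```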
-
imagine.tools.mpi_helper.
mpi_shape
(data)[source]¶ Returns the global number of rows and columns of given distributed data.
Parameters: data (numpy.ndarray) – The distributed data. Returns: result – Global row and column number. Return type: numpy.uint
-
imagine.tools.mpi_helper.
mpi_prosecutor
(data)[source]¶ Checks whether the data is distributed in the correct way: the covariance matrix must be distributed in exactly the same manner as the multi-realization data. If it is not, an error is raised.
Parameters: data (numpy.ndarray) – The distributed data to be examined.
-
imagine.tools.mpi_helper.
mpi_mean
(data)[source]¶ Calculates the mean of a distributed array. Averaging is preferably done along the column direction, but if the data has shape (1, n) the average is done along the row direction. Note that the numerical values will be converted into double.
Parameters: data (numpy.ndarray) – Distributed data. Returns: result – Copied data mean, which means the mean is copied to all nodes. Return type: numpy.ndarray
-
imagine.tools.mpi_helper.
mpi_trans
(data)[source]¶ Transposes distributed data. Note that the numerical values will be converted into double.
Parameters: data (numpy.ndarray) – Distributed data. Returns: result – Transposed data in distribution. Return type: numpy.ndarray
-
imagine.tools.mpi_helper.
mpi_mult
(left, right)[source]¶ Calculates the matrix product of two distributed arrays; the result is left*right in multi-node distribution. Note that the numerical values will be converted into double. The distributed rows of right are sent to the other nodes (aka Cannon method).
Parameters: - left (numpy.ndarray) – Distributed left side data.
- right (numpy.ndarray) – Distributed right side data.
Returns: result – Distributed multiplication result.
Return type:
-
imagine.tools.mpi_helper.
mpi_trace
(data)[source]¶ Computes the trace of the given distributed data.
Parameters: data (numpy.ndarray) – Array of data distributed over different processes. Returns: result – Copied trace of given data. Return type: numpy.float64
-
imagine.tools.mpi_helper.
mpi_diag
(data)[source]¶ Gets the diagonal of a distributed matrix
Parameters: data (numpy.ndarray) – Array of data distributed over different processes. Returns: result – Diagonal Return type: numpy.ndarray
-
imagine.tools.mpi_helper.
mpi_new_diag
(data)[source]¶ Constructs a distributed matrix with a given diagonal
Parameters: data (numpy.ndarray) – Array of data distributed over different processes. Returns: result – Diagonal Return type: numpy.ndarray
-
imagine.tools.mpi_helper.
mpi_eye
(size)[source]¶ Produces an eye (identity) matrix of shape (size, size) distributed over the running MPI processes
Parameters: size (integer) – Distributed matrix size. Returns: result – Distributed eye matrix. Return type: numpy.ndarray, double data type
-
imagine.tools.mpi_helper.
mpi_distribute_matrix
(full_matrix)[source]¶ Parameters: size (integer) – Distributed matrix size. Returns: result – Distributed eye matrix. Return type: numpy.ndarray, double data type
-
imagine.tools.mpi_helper.
mpi_lu_solve
(operator, source)[source]¶ Simple LU Gauss method WITHOUT pivot permutation.
Parameters: - operator (distributed numpy.ndarray) – Matrix representation of the left-hand-side operator.
- source (copied numpy.ndarray) – Vector representation of the right-hand-side source.
Returns: result – Copied solution to the linear algebra problem.
Return type: numpy.ndarray, double data type
-
imagine.tools.mpi_helper.
mpi_slogdet
(data)[source]¶ Computes log determinant according to simple LU Gauss method WITHOUT pivot permutation.
Parameters: data (numpy.ndarray) – Array of data distributed over different processes. Returns: - sign (numpy.ndarray) – Single element numpy array containing the sign of the determinant (copied to all nodes).
- logdet (numpy.ndarray) – Single element numpy array containing the log of the determinant (copied to all nodes).
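A serial sketch of the LU approach without pivoting is given below: Gaussian elimination reduces the matrix to upper triangular form, and the sign and log-determinant are read off the diagonal. The name slogdet_nopivot_sketch is hypothetical and the MPI distribution is omitted; because no rows are swapped, the method fails if a zero pivot is met.

```python
import numpy as np


def slogdet_nopivot_sketch(matrix):
    """Sign and log|det| of a square matrix via Gaussian elimination
    without row permutations."""
    a = np.array(matrix, dtype=np.float64)
    n = a.shape[0]
    for k in range(n - 1):
        # Eliminate the entries below the pivot a[k, k]; no row swaps are made.
        factors = a[k + 1:, k] / a[k, k]
        a[k + 1:, k:] -= np.outer(factors, a[k, k:])
    diag = np.diag(a)
    # Without permutations the determinant is the product of the pivots.
    sign = np.prod(np.sign(diag))
    logdet = np.sum(np.log(np.abs(diag)))
    return sign, logdet


# A diagonally dominant matrix, for which skipping the pivoting is safe.
m = np.array([[4.0, 1.0, 2.0],
              [1.0, 5.0, 1.0],
              [2.0, 1.0, 6.0]])
sign, logdet = slogdet_nopivot_sketch(m)
print(sign, logdet)
```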
-
imagine.tools.mpi_helper.
mpi_global
(data)[source]¶ Gathers data spread across different processes.
Parameters: data (numpy.ndarray) – Array of data distributed over different processes. Returns: global array – The root process returns the gathered data, other processes return None. Return type: numpy.ndarray
-
imagine.tools.mpi_helper.
mpi_local
(data)[source]¶ Distributes data over available processes
Parameters: data (numpy.ndarray) – Array of data to be distributed over available processes. Returns: local array – Returns the distributed array on all processes. Return type: numpy.ndarray
imagine.tools.parallel_ops module¶
Interface module which allows automatically switching between the routines in
the imagine.tools.mpi_helper
module and their numpy
or pure Python equivalents, depending on the contents of
imagine.rc['distributed_arrays']
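The switching pattern can be sketched as below. Here rc is a plain dictionary standing in for imagine.rc, and mpi_trace_stub is a placeholder for the MPI routine; both are assumptions made so that the example runs without MPI or IMAGINE installed.

```python
import numpy as np

# Stand-in for imagine.rc (assumption of this sketch).
rc = {"distributed_arrays": False}


def mpi_trace_stub(data):
    """Placeholder for the distributed implementation."""
    raise NotImplementedError("MPI branch is not available in this sketch")


def ptrace_sketch(data):
    """Dispatch to the distributed or the numpy routine depending on rc."""
    if rc["distributed_arrays"]:
        return mpi_trace_stub(data)
    return np.trace(data)


print(ptrace_sketch(np.eye(3)))
```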
-
imagine.tools.parallel_ops.
pshape
(data)[source]¶ imagine.tools.mpi_helper.mpi_shape()
or numpy.ndarray.shape
depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
prosecutor
(data)[source]¶ imagine.tools.mpi_helper.mpi_prosecutor()
or nothing depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
pmean
(data)[source]¶ imagine.tools.mpi_helper.mpi_mean()
or numpy.mean()
depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
pvar
(data)[source]¶ imagine.tools.mpi_helper.mpi_var()
or numpy.var()
depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
ptrans
(data)[source]¶ imagine.tools.mpi_helper.mpi_trans()
or numpy.ndarray.T
depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
pmult
(left, right)[source]¶ imagine.tools.mpi_helper.mpi_mult()
or numpy.matmul()
depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
ptrace
(data)[source]¶ imagine.tools.mpi_helper.mpi_trace()
or numpy.trace()
depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
pdiag
(data)[source]¶ imagine.tools.mpi_helper.mpi_diag()
or numpy.diagonal()
depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
pnewdiag
(data)[source]¶ imagine.tools.mpi_helper.mpi_new_diag()
or numpy.diag()
depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
peye
(size)[source]¶ imagine.tools.mpi_helper.mpi_eye()
or numpy.eye()
depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
distribute_matrix
(full_matrix)[source]¶ imagine.tools.mpi_helper.mpi_distribute_matrix()
or nothing depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
plu_solve
(operator, source)[source]¶ imagine.tools.mpi_helper.mpi_lu_solve()
or numpy.linalg.solve()
depending on imagine.rc['distributed_arrays']
.
Notes
In the non-distributed case, the source is transposed before the calculation
-
imagine.tools.parallel_ops.
pslogdet
(data)[source]¶ imagine.tools.mpi_helper.mpi_slogdet()
or numpy.linalg.slogdet()
depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
pglobal
(data)[source]¶ imagine.tools.mpi_helper.mpi_global()
or nothing depending on imagine.rc['distributed_arrays']
.
-
imagine.tools.parallel_ops.
plocal
(data)[source]¶ imagine.tools.mpi_helper.mpi_local()
or nothing depending on imagine.rc['distributed_arrays']
.
imagine.tools.random_seed module¶
This module provides a time-thread dependent seed value.
For the testing suites, please turn to “imagine/tests/tools_tests.py”.
-
imagine.tools.random_seed.
ensemble_seed_generator
(size)[source]¶ Generates fixed random seed values for each realization in ensemble.
Parameters: size (int) – Number of realizations in ensemble. Returns: seeds – An array of random seeds. Return type: numpy.ndarray
imagine.tools.timer module¶
Timer class is designed for time recording.
-
class
imagine.tools.timer.
Timer
[source]¶ Bases:
object
Class designed for time recording.
Simply provide an event name to the tick method to start recording. The tock method stops the recording and the record property allows one to access the recorded time.
-
tick
(event)[source]¶ Starts timing with a given event name.
Parameters: event (str) – Event name (will be key of the record attribute).
-
tock
(event)[source]¶ Stops timing of the given event.
Parameters: event (str) – Event name (will be key of the record attribute).
-
record
¶ Dictionary of recorded times using event name as keys.
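The tick/tock/record workflow can be illustrated with the stand-in class below. TimerSketch is a hypothetical re-implementation of the behaviour described above, written so the example runs without IMAGINE; the internals of the real Timer may differ.

```python
import time


class TimerSketch:
    """Minimal stand-in mimicking the documented tick/tock/record interface."""

    def __init__(self):
        self._record = {}

    @property
    def record(self):
        """Dictionary of recorded times using event names as keys."""
        return self._record

    def tick(self, event):
        """Start timing `event` by storing the current clock value."""
        self._record[event] = time.perf_counter()

    def tock(self, event):
        """Stop timing `event`, replacing the start value by the elapsed time."""
        self._record[event] = time.perf_counter() - self._record[event]
        return self._record[event]


timer = TimerSketch()
timer.tick("sum")
total = sum(range(100000))
timer.tock("sum")
print(timer.record["sum"])
```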
-
imagine.tools.visualization module¶
This module contains convenient standard plotting functions
-
imagine.tools.visualization.
corner_plot
(pipeline=None, truths_dict=None, show_sigma=True, param_names=None, table=None, samples=None, live_samples=None, **kwargs)[source]¶ Makes a corner plot.
If a
Pipeline
object is supplied, it will be used to collect all the necessary information. Alternatively, one can supply either an astropy.table.Table
or a numpy.ndarray
containing the different parameters as columns. The plotting is done using the
corner
package, and extra keyword parameters are passed directly to it.
Parameters: - pipeline (imagine.pipelines.pipeline.Pipeline) – Pipeline from which samples are read in the default case.
- truths_dict (dict) – Dictionary containing active parameters as keys and the expected values as values
- show_sigma (bool) – If True, plots the 1, 2 and 3-sigma contours.
- param_names (list) – If present, only parameters from this list will be plotted
- table (astropy.Table) – If present, samples from this table are used instead of the Pipeline.
- samples (numpy.ndarray) – If present, samples are read from this array
- live_samples (numpy.ndarray) – If this array is present, a second set of samples are shown in the plots.
Returns: corner_fig – Figure containing the generated corner plot
Return type: matplotlib.Figure
-
imagine.tools.visualization.
show_likelihood_convergence_report
(rep, cmap='cmr.chroma')[source]¶ Prepares a standard set of plots of a likelihood convergence report (produced by the
Pipeline.prepare_likelihood_convergence_report()
method).
Parameters: cmap (str) – Colormap to be used for the lineplots
-
imagine.tools.visualization.
show_observable
(obs, realization=0, title=None, cartesian_axes='yz', show_variances=False, is_covariance=False, **kwargs)[source]¶ Displays the contents of a single realisation of an Observable object.
Parameters: - obs (imagine.observables.observable.Observable) – Observable object whose contents one wants to plot
- realization (int) – Index of the ensemble realization to be plotted
- cartesian_axes (str) – If plotting a tabular observable using cartesian coordinates, this allows selecting which two axes should be used for the plot. E.g. ‘xy’, ‘zy’, ‘xz’. Default: ‘yz’.
- **kwargs – Parameters to be passed to the appropriate plotting routine (either healpy.visufunc.mollview or matplotlib.pyplot.imshow).
-
imagine.tools.visualization.
show_observable_dict
(obs_dict, max_realizations=None, show_variances=False, **kwargs)[source]¶ Plots the contents of an ObservableDict object.
Parameters: - obs_dict (imagine.observables.observable.ObservableDict) – ObservableDict object whose contents one wants to plot.
- max_realizations (int) – Index of the maximum ensemble realization to be plotted. If None, the whole ensemble is shown.
- show_variances (bool) – If True and if obs_dict is a Covariances object, shows variance maps instead of covariance matrix
- **kwargs – Parameters to be passed to the appropriate plotting routine
(either
healpy.visufunc.mollview()
or matplotlib.pyplot.imshow()
).
-
imagine.tools.visualization.
trace_plot
(samples=None, live_samples=None, likelihood=None, lnX=None, parameter_names=None, cmap='cmr.ocean', color_live='#e34a33', fig=None, hist_bins=30)[source]¶ Produces a set of “trace plots” for a nested sampling run, showing the position of “dead” points as a function of prior mass. Also plots the distributions of dead points accumulated until now, and the distributions of live points.
Parameters: - samples (numpy.ndarray) – (Nsamples, Npars)-array containing the rejected points
- likelihood (numpy.ndarray) – Nsamples-array containing the log likelihood values
- lnX (numpy.ndarray) – Nsamples-array containing the “prior mass”
- parameter_names (list or tuple) – List of the nPars active parameter names
- live_samples (numpy.ndarray, optional) – (Nsamples, Npars)-array containing the present live points
- cmap (str) – Name of the colormap to be used
- color_live (str) – Colour used for the live points distributions (if those are present)
- fig (matplotlib.Figure) – If a previous figure was generated, it can be passed to this function for update using this argument
- hist_bins (int) – The number of bins used for the histograms
Returns: fig – The figure produced
Return type: matplotlib.Figure
Module contents¶
-
class
imagine.tools.
Timer
[source]¶ Bases:
object
Class designed for time recording.
Simply provide an event name to the tick method to start recording. The tock method stops the recording and the record property allows one to access the recorded time.
-
tick
(event)[source]¶ Starts timing with a given event name.
Parameters: event (str) – Event name (will be key of the record attribute).
-
tock
(event)[source]¶ Stops timing of the given event.
Parameters: event (str) – Event name (will be key of the record attribute).
-
record
¶ Dictionary of recorded times using event name as keys.
-
-
imagine.tools.
exp_mapper
(x, a=0, b=1)[source]¶ Maps x from [0, 1] into the interval [exp(a), exp(b)].
Parameters: - x (float) – The variable to be mapped.
- a (float) – The lower parameter value limit.
- b (float) – The upper parameter value limit.
Returns: The mapped parameter value.
Return type: numpy.float64
-
imagine.tools.
unity_mapper
(x, a=0.0, b=1.0)[source]¶ Maps x from [0, 1] into the interval [a, b].
Parameters: - x (float) – The variable to be mapped.
- a (float) – The lower parameter value limit.
- b (float) – The upper parameter value limit.
Returns: The mapped parameter value.
Return type: numpy.float64
-
imagine.tools.
empirical_cov
(data)[source]¶ Empirical covariance estimator
Given some data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated from
\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{N} \sum_{i=1}^N D_{ij}\]
\[\text{cov} = \tfrac{1}{N} U^T U\]
Notes
While conceptually simple, this is usually not the best option.
Parameters: data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size). Returns: cov – Distributed (not copied) covariance matrix in global shape (data size, data size), each node takes part of the rows. Return type: numpy.ndarray
-
imagine.tools.
empirical_mcov
(data)[source]¶ Empirical covariance estimator
Given some data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated from
\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{N} \sum_{i=1}^N D_{ij}\]
\[\text{cov} = \tfrac{1}{N} U^T U\]
Notes
While conceptually simple, this is usually not the best option.
Parameters: data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size). Returns: - mean (numpy.ndarray) – Copied ensemble mean (on all nodes).
- cov (numpy.ndarray) – Distributed (not copied) covariance matrix in global shape (data size, data size), each node takes part of the rows.
-
imagine.tools.
oas_cov
(data)[source]¶ Estimate covariance with the Oracle Approximating Shrinkage algorithm.
Given some \(n\times m\) data matrix, \(D\), where rows are different samples and columns different properties, the covariance can be estimated in the following way.
\[U_{ij} = D_{ij} - \overline{D}_j\,,\; \text{with}\; \overline{D}_j=\tfrac{1}{n} \sum_{i=1}^n D_{ij}\]
Let
\[S = \tfrac{1}{n} U^T U\,,\; T = \text{tr}(S)\quad\text{and}\quad V = \text{tr}(S^2)\]
\[\tilde\rho = \min\left[1,\frac{(1-2/m)V + T^2}{ (n+1-2/m)(V-T^2/m)}\right]\]
The covariance is given by
\[\text{cov}_\text{OAS} = (1-\tilde\rho)S + \tfrac{1}{m} \tilde\rho\, T I_m\]
Parameters: data (numpy.ndarray) – Distributed data in global shape (ensemble_size, data_size). Returns: cov – Covariance matrix in global shape (data_size, data_size). Return type: numpy.ndarray
-
imagine.tools.
oas_mcov
(data)[source]¶ Estimate covariance with the Oracle Approximating Shrinkage algorithm.
See imagine.tools.covariance_estimator.oas_cov for details. This function additionally returns the computed ensemble mean.
Parameters: data (numpy.ndarray) – Distributed data in global shape (ensemble_size, data_size). Returns: - mean (numpy.ndarray) – Copied ensemble mean (on all nodes).
- cov (numpy.ndarray) – Distributed covariance matrix in shape (data_size, data_size).
-
imagine.tools.
diagonal_cov
(data)[source]¶ Assumes the covariance matrix is simply a diagonal matrix whose values correspond to the sample variances
Parameters: data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size). Returns: cov – Covariance matrix Return type: numpy.ndarray
-
imagine.tools.
diagonal_mcov
(data)[source]¶ Assumes the covariance matrix is simply a diagonal matrix whose values correspond to the sample variances
Parameters: data (numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size). Returns: - mean (numpy.ndarray) – Ensemble mean
- cov (numpy.ndarray) – Covariance matrix
-
imagine.tools.
save_pipeline
(pipeline, use_hickle=False)[source]¶ Saves the state of a Pipeline object
Parameters: - pipeline (imagine.pipelines.pipeline.Pipeline) – The pipeline object one would like to save
- use_hickle (bool) – If False (default) the state is saved using the cloudpickle package. Otherwise, experimental support to hickle is enabled.
-
imagine.tools.
load_pipeline
(directory_path='.')[source]¶ Loads the state of a Pipeline object
Parameters: directory_path (str) – Path to the directory where the Pipeline state was saved
-
imagine.tools.
mask_cov
(cov, mask)[source]¶ Applies mask to the observable covariance.
Parameters: - cov ((distributed) numpy.ndarray) – Covariance matrix of observables in global shape (data size, data size) each node contains part of the global rows (if imagine.rc[‘distributed_arrays’]=True).
- mask (numpy.ndarray) – Copied mask map in shape (1, data size).
Returns: masked_cov – Masked covariance matrix of shape (masked data size, masked data size).
Return type:
-
imagine.tools.
mask_var
(var, mask)[source]¶ Applies a mask to the variance of an observable.
Parameters: - var (numpy.ndarray) – Variance data
- mask (numpy.ndarray) – Copied mask map in shape (1, data size) on each node.
Returns: Masked variance of shape (masked data size).
Return type:
-
imagine.tools.
mask_obs
(obs, mask)[source]¶ Applies a mask to an observable.
Parameters: - obs (distributed numpy.ndarray) – Ensemble of observables, in global shape (ensemble size, data size) each node contains part of the global rows.
- mask (numpy.ndarray) – Copied mask map in shape (1, data size) on each node.
Returns: Masked observable of shape (ensemble size, masked data size).
Return type:
-
imagine.tools.
mpi_arrange
(size)[source]¶ With known global size, number of mpi nodes, and current rank, returns the begin and end index for distributing the global size.
Parameters: size (integer (positive)) – The total size of target to be distributed. It can be a row size or a column size. Returns: result – The begin and end index [begin,end] for slicing the target. Return type: numpy.uint
-
imagine.tools.
mpi_shape
(data)[source]¶ Returns the global number of rows and columns of given distributed data.
Parameters: data (numpy.ndarray) – The distributed data. Returns: result – Global row and column number. Return type: numpy.uint
-
imagine.tools.
mpi_prosecutor
(data)[source]¶ Checks whether the data is distributed in the correct way: the covariance matrix must be distributed in exactly the same manner as the multi-realization data. If it is not, an error is raised.
Parameters: data (numpy.ndarray) – The distributed data to be examined.
-
imagine.tools.
mpi_mean
(data)[source]¶ Calculates the mean of a distributed array. Averaging is preferably done along the column direction, but if the data has shape (1, n) the average is done along the row direction. Note that the numerical values will be converted into double.
Parameters: data (numpy.ndarray) – Distributed data. Returns: result – Copied data mean, which means the mean is copied to all nodes. Return type: numpy.ndarray
-
imagine.tools.
mpi_trans
(data)[source]¶ Transposes distributed data. Note that the numerical values will be converted into double.
Parameters: data (numpy.ndarray) – Distributed data. Returns: result – Transposed data in distribution. Return type: numpy.ndarray
-
imagine.tools.
mpi_mult
(left, right)[source]¶ Calculates the matrix product of two distributed arrays; the result is left*right in multi-node distribution. Note that the numerical values will be converted into double. The distributed rows of right are sent to the other nodes (aka Cannon method).
Parameters: - left (numpy.ndarray) – Distributed left side data.
- right (numpy.ndarray) – Distributed right side data.
Returns: result – Distributed multiplication result.
Return type:
-
imagine.tools.
mpi_trace
(data)[source]¶ Computes the trace of the given distributed data.
Parameters: data (numpy.ndarray) – Array of data distributed over different processes. Returns: result – Copied trace of given data. Return type: numpy.float64
-
imagine.tools.
mpi_diag
(data)[source]¶ Gets the diagonal of a distributed matrix
Parameters: data (numpy.ndarray) – Array of data distributed over different processes. Returns: result – Diagonal Return type: numpy.ndarray
-
imagine.tools.
mpi_new_diag
(data)[source]¶ Constructs a distributed matrix with a given diagonal
Parameters: data (numpy.ndarray) – Array of data distributed over different processes. Returns: result – Diagonal Return type: numpy.ndarray
-
imagine.tools.
mpi_eye
(size)[source]¶ Produces an eye (identity) matrix of shape (size, size) distributed over the running MPI processes
Parameters: size (integer) – Distributed matrix size. Returns: result – Distributed eye matrix. Return type: numpy.ndarray, double data type
-
imagine.tools.
mpi_distribute_matrix
(full_matrix)[source]¶ Parameters: full_matrix (numpy.ndarray) – Full matrix to be distributed.
Returns: result – Distributed matrix.
Return type: numpy.ndarray, double data type
-
imagine.tools.
mpi_lu_solve
(operator, source)[source]¶ Solves a linear system using a simple LU Gauss method WITHOUT pivot permutation.
Parameters: - operator (distributed numpy.ndarray) – Matrix representation of the left-hand-side operator.
- source (copied numpy.ndarray) – Vector representation of the right-hand-side source.
Returns: result – Copied solution to the linear algebra problem.
Return type: numpy.ndarray, double data type
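For readers unfamiliar with elimination without pivoting, the following serial sketch shows the idea on an ordinary NumPy array. It is an illustration under stated assumptions (non-distributed data, non-zero pivots), not the distributed routine documented above; lu_solve_no_pivot is a hypothetical name.

```python
import numpy as np

def lu_solve_no_pivot(operator, source):
    """Solve operator @ x = source by Gaussian elimination without pivoting."""
    a = np.array(operator, dtype=np.float64)  # working copy, reduced in place
    b = np.array(source, dtype=np.float64).reshape(-1)
    n = a.shape[0]
    # Forward elimination: zero out the entries below each pivot.
    for k in range(n - 1):
        if a[k, k] == 0.0:
            raise ZeroDivisionError("zero pivot: row permutation would be needed")
        factors = a[k + 1:, k] / a[k, k]
        a[k + 1:, k:] -= np.outer(factors, a[k, k:])
        b[k + 1:] -= factors * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - a[i, i + 1:] @ x[i + 1:]) / a[i, i]
    return x

operator = np.array([[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]])
source = np.array([1.0, 2.0, 3.0])
solution = lu_solve_no_pivot(operator, source)
```

Without row permutations the method fails on a zero pivot and may lose accuracy on small ones, so it is best suited to well-conditioned operators such as diagonally dominant or positive-definite matrices.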
-
imagine.tools.
mpi_slogdet
(data)[source]¶ Computes the sign and the log of the determinant using a simple LU Gauss method WITHOUT pivot permutation.
Parameters: data (numpy.ndarray) – Array of data distributed over different processes.
Returns: - sign (numpy.ndarray) – Single element numpy array containing the sign of the determinant (copied to all nodes).
- logdet (numpy.ndarray) – Single element numpy array containing the log of the determinant (copied to all nodes).
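The relation between the elimination and the two returned values can be sketched serially: after reducing the matrix to upper-triangular form without row swaps, the determinant is the product of the diagonal. The helper below is a hypothetical, non-distributed illustration and returns plain scalars rather than single-element arrays.

```python
import numpy as np

def slogdet_no_pivot(data):
    """Sign and log|det| from Gaussian elimination without pivoting."""
    u = np.array(data, dtype=np.float64)  # working copy, reduced in place
    n = u.shape[0]
    for k in range(n - 1):
        if u[k, k] == 0.0:
            raise ZeroDivisionError("zero pivot: row permutation would be needed")
        factors = u[k + 1:, k] / u[k, k]
        u[k + 1:, k:] -= np.outer(factors, u[k, k:])
    diag = np.diag(u)
    # No row swaps were made, so det = product of the pivots.
    sign = np.prod(np.sign(diag))
    logdet = np.sum(np.log(np.abs(diag)))
    return sign, logdet

matrix = np.array([[4.0, 1.0, 0.0], [1.0, -5.0, 2.0], [0.0, 2.0, 6.0]])
sign, logdet = slogdet_no_pivot(matrix)
```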
-
imagine.tools.
mpi_global
(data)[source]¶ Gathers data spread across different processes.
Parameters: data (numpy.ndarray) – Array of data distributed over different processes.
Returns: global array – The root process returns the gathered data, other processes return None.
Return type: numpy.ndarray
-
imagine.tools.
mpi_local
(data)[source]¶ Distributes data over the available processes.
Parameters: data (numpy.ndarray) – Array of data to be distributed over available processes.
Returns: local array – Returns the distributed array on all processes.
Return type: numpy.ndarray
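The scatter and gather behaviour of the two functions above can be mimicked in one process. The sketch assumes a row-wise split; simulated_local and simulated_global are hypothetical stand-ins, and the list index plays the role of the MPI rank.

```python
import numpy as np

def simulated_local(full, n_procs):
    """Split a full array by rows; element i is what process i would hold."""
    return np.array_split(np.asarray(full), n_procs, axis=0)

def simulated_global(local_blocks, root=0):
    """Gather row blocks; only the root entry receives the full array."""
    gathered = np.vstack(local_blocks)
    return [gathered if rank == root else None
            for rank in range(len(local_blocks))]

full = np.arange(10).reshape(5, 2)
local_blocks = simulated_local(full, n_procs=2)
per_rank = simulated_global(local_blocks)
```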
-
imagine.tools.
pshape
(data)[source]¶ imagine.tools.mpi_helper.mpi_shape() or numpy.ndarray.shape, depending on imagine.rc['distributed_arrays'].
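The dispatch pattern shared by this family of wrappers can be sketched as follows. Everything here is an illustrative assumption: rc is a plain dictionary standing in for imagine.rc, and distributed_shape_stub is a placeholder for an MPI-based implementation that is not reproduced.

```python
import numpy as np

# Hypothetical stand-in for the imagine.rc configuration dictionary.
rc = {'distributed_arrays': False}

def distributed_shape_stub(data):
    """Placeholder for an MPI-based implementation (not provided here)."""
    raise NotImplementedError("requires a running MPI environment")

def pshape_sketch(data):
    """Pick the distributed or the plain NumPy code path at call time."""
    if rc['distributed_arrays']:
        return distributed_shape_stub(data)
    return data.shape

shape = pshape_sketch(np.zeros((2, 3)))
```

Reading the flag at call time, rather than at import time, lets the same calling code run with or without distributed arrays.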
-
imagine.tools.
prosecutor
(data)[source]¶ imagine.tools.mpi_helper.mpi_prosecutor() or nothing, depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
pmean
(data)[source]¶ imagine.tools.mpi_helper.mpi_mean() or numpy.mean(), depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
pvar
(data)[source]¶ imagine.tools.mpi_helper.mpi_var() or numpy.var(), depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
ptrans
(data)[source]¶ imagine.tools.mpi_helper.mpi_trans() or numpy.ndarray.T, depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
pmult
(left, right)[source]¶ imagine.tools.mpi_helper.mpi_mult() or numpy.matmul(), depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
ptrace
(data)[source]¶ imagine.tools.mpi_helper.mpi_trace() or numpy.trace(), depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
pdiag
(data)[source]¶ imagine.tools.mpi_helper.mpi_diag() or numpy.diagonal(), depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
pnewdiag
(data)[source]¶ imagine.tools.mpi_helper.mpi_new_diag() or numpy.diag(), depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
peye
(size)[source]¶ imagine.tools.mpi_helper.mpi_eye() or numpy.eye(), depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
distribute_matrix
(full_matrix)[source]¶ imagine.tools.mpi_helper.mpi_distribute_matrix() or nothing, depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
plu_solve
(operator, source)[source]¶ imagine.tools.mpi_helper.mpi_lu_solve() or numpy.linalg.solve(), depending on imagine.rc['distributed_arrays'].
Notes
In the non-distributed case, the source is transposed before the calculation.
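The effect of the transposition noted above can be illustrated with plain NumPy. The sketch assumes the source is stored as a row vector of shape (1, n), which is an assumption made here for illustration; numpy.linalg.solve expects the right-hand side as a column, hence the transpose.

```python
import numpy as np

operator = np.array([[3.0, 1.0], [1.0, 2.0]])
# Assumed storage convention: right-hand side as a (1, n) row vector.
source = np.array([[9.0, 8.0]])

# Transposing gives the (n, 1) column that numpy.linalg.solve expects.
solution = np.linalg.solve(operator, source.T)
```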
-
imagine.tools.
pslogdet
(data)[source]¶ imagine.tools.mpi_helper.mpi_slogdet() or numpy.linalg.slogdet(), depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
pglobal
(data)[source]¶ imagine.tools.mpi_helper.mpi_global() or nothing, depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
plocal
(data)[source]¶ imagine.tools.mpi_helper.mpi_local() or nothing, depending on imagine.rc['distributed_arrays'].
-
imagine.tools.
ensemble_seed_generator
(size)[source]¶ Generates fixed random seed values for each realization in the ensemble.
Parameters: size (int) – Number of realizations in ensemble.
Returns: seeds – An array of random seeds.
Return type: numpy.ndarray
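One way such a seed array could be produced is sketched below. This is not the IMAGINE implementation: the master_seed argument, the seed range and the use of numpy.random.default_rng are all assumptions chosen to show how a single master seed can yield reproducible per-realization seeds.

```python
import numpy as np

def ensemble_seeds_sketch(size, master_seed=None):
    """Draw one integer seed per realization from a master generator."""
    # master_seed is a hypothetical extra argument; the documented function
    # only takes `size`.
    rng = np.random.default_rng(master_seed)
    # Assumed range: seeds that fit in an unsigned 32-bit integer.
    return rng.integers(low=0, high=2**32, size=size)

seeds = ensemble_seeds_sketch(4, master_seed=42)
```

Calling the sketch again with the same master_seed returns the same array, which is what makes the ensemble reproducible.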