IMAGINE Components¶

In the following sections we describe each of the basic components of IMAGINE. We also demonstrate how to write wrappers that allow the inclusion of external (pre-existing) code, and provide code templates for this.

Contents

Fields ¶

In IMAGINE terminology, a field refers to any computational model of a spatially varying physical quantity, such as the Galactic Magnetic Field (GMF), the thermal electron distribution, or the Cosmic Ray (CR) distribution. Generally, a field object will have a set of parameters — e.g. a GMF field object may have a pitch angle, scale radius, amplitude, etc. Field objects are used as inputs by the Simulators, which simulate those physical models, i.e. they allow constructing observables based on a set models.

During the sampling, the Pipeline does not handle fields directly but instead relies on Field Factory objects. While the field objects will do the actual computation of the physical field, given a set of physical parameters and a coordinate grid, the field factory objects take care of book-keeping tasks: they hold the parameter ranges, default values (in case of inactive parameters) and Priors associated with each parameter of that field.

To convert ones own model into a IMAGINE-compatible field, one must create a subclass of one of the base classes available the imagine.fields.base_fields, most likely using one of the available templates (discussed in the sections below) to write a wrapper to the original code. If the basic field type one is interested in is not available as a basic field, one can create it directly subclassing imagine.fields.field.Field — and if this could benefit the wider community, please consider submitting a pull request or openning an issue requesting the inclusion of the new field type!

It is assumed that Field objects can be expressed as a parametrised mapping of a coordinate grid into a physical field. The grid is represented by a IMAGINE Grid object, discussed in detail in the next section. If the field is of random or stochastic nature (e.g. the density field of a turbulent medium), IMAGINE will compute a finite ensemble of different realisations which will later be used in the inference to determine the likelihood of the actual observation, accounting for the model’s expected variability.

To test a Field class, FieldFoo, one can instantiate the field object:

bar = FieldFoo(grid=example_grid, parameters={'answer': 42*u.cm}, ensemble_size=2)

where example_grid is a previously instantiated grid object. The argument parameters receives a dictionary of all the parameters used by FieldFoo, these are usually expressed as dimensional quantities (using astropy.units). Finally, the argument ensemble_size, as the name suggests allows requestion a number of different realisations of the field (for non-stochastic fields, all these will be references to the same data).

To further illustrate, assuming we are dealing with a scalar, the (spherical) radial dependence of the above defined bar can be easily plotted using:

import matplotlib.pyplot as plt
bar_data = bar.get_data(ensemble_index)
plt.plot(bar.grid.r_spherical.ravel(), bar_data.ravel())

The design of any field is done writing a subclass of one of the classes in imagine.fields.base_fields or imagine.Field that overrides the method compute_field(seed), using it to compute the field on each spatial point. For this, the coordinate grid on which the field should be evaluated can be accessed from self.grid and the parameters from self.parameters. The same parameters must be listed in the PARAMETER_NAMES field class attribute (see the templates below for examples).

Grid ¶

Field objects require (with the exception of Dummy fields) a coordinate grid to operate. In IMAGINE this is expressed as an instance of the imagine.fields.grid.BaseGrid class, which represents coordinates as a set of three 3-dimensional arrays. The grid object supports cartesian, cylindrical and spherical coordinate systems, handling any conversions between these automatically through the properties.

The convention is that \(0\) of the coordinates corresponds to the Galaxy (or galaxy) centre, with the \(z\) coordinate giving the distance to the midplane.

To construct a grid with uniformly-distributed coordinates one can use the imagine.fields.UniformGrid that accompanies IMAGINE. For example, one can create a grid where the cylindrical coordinates are equally spaced using:

from imagine.fields import UniformGrid
cylindrical_grid = UniformGrid(box=[[0.25*u.kpc, 15*u.kpc],
                                    [-180*u.deg, np.pi*u.rad],
                                    [-15*u.kpc, 15*u.kpc]],
                               resolution = [9,12,9],
                               grid_type = 'cylindrical')

The box argument contains the lower and upper limits of the coordinates (respectively \(r\), \(\phi\) and \(z\)), resolution specifies the number of points for each dimension, and grid_type chooses this to be cylindrical coordinates.

The coordinate grid can be accessed through the properties cylindrical_grid.x, cylindrical_grid.y, cylindrical_grid.z, cylindrical_grid.r_cylindrical, cylindrical_grid.r_spherical, cylindrical_grid.theta (polar angle), and cylindrical_grid.phi (azimuthal angle), with conversions handled automatically. Thus, if one wants to access, for instance, the corresponding \(x\) cartesian coordinate values, this can be done simply using:

cylindrical_grid.x[:,:,:]

To create a personalised (non-uniform) grid, one needs to subclass imagine.fields.grid.BaseGrid and override the method generate_coordinates. The UniformGrid class should itself provide a good example/template of how to do this.

Thermal electrons ¶

A new model for the distribution of thermal electrons can be introduced subclassing imagine.fields.ThermalElectronDensityField according to the template below.

from imagine.fields import ThermalElectronDensityField
import numpy as np
import MY_GALAXY_MODEL # Substitute this by your own code


class ThermalElectronsDensityTemplate(ThermalElectronDensityField):
    """ Here comes the description of the electron density model """

    # Class attributes
    NAME = 'name_of_the_thermal_electrons_field'

    # Is this field stochastic or not. Only necessary if True
    STOCHASTIC_FIELD = True
    # If there are any dependencies, they should be included in this list
    DEPENDENCIES_LIST = []
    # List of all parameters for the field
    PARAMETER_NAMES = ['Parameter_A', 'Parameter_B']

    def compute_field(self, seed):
        # If this is an stochastic field, the integer `seed `must be
        # used to set the random seed for a single realisation.
        # Otherwise, `seed` should be ignored.

        # The coordinates can be accessed from an internal grid object
        x_coord = self.grid.x
        y_coord = self.grid.y
        z_coord = self.grid.y
        # Alternatively, one can use cylindrical or spherical coordinates
        r_cyl_coord = self.grid.r_cylindrical
        r_sph_coord = self.grid.r_spherical
        theta_coord = self.grid.theta
        phi_coord = self.grid.phi

        # One can access the parameters supplied in the following way
        param_A = self.parameters['Parameter_A']
        param_B = self.parameters['Parameter_B']

        # Now you can interface with previous code or implement here
        # your own model for the thermal electrons distribution.
        # Returns the electron number density at each grid point
        # in units of (or convertible to) cm**-3
        return MY_GALAXY_MODEL.compute_ne(param_A, param_B,
                                          r_sph_coord, theta_coord, phi_coord,
                                          # If the field is stochastic
                                          # it can use the seed
                                          # to generate a realisation
                                          seed)

Note that the return value of the method compute_field() must be of type astropy.units.Quantity, with shape consistent with the coordinate grid, and units of \(\rm cm^{-3}\).

The template assumes that one already possesses a model for distribution of thermal \(e^-\) in a module MY_GALAXY_MODEL. Such model needs to be able to map an arbitrary coordinate grid into densities.

Of course, one can also write ones model (if it is simple enough) into the derived subclass definition. On example of a class derived from imagine.fields.ThermalElectronDensityField can be seen bellow:

from imagine.fields import ThermalElectronDensityField

class ExponentialThermalElectrons(ThermalElectronDensityField):
    """Example: thermal electron density of an (double) exponential disc"""

    NAME = 'exponential_disc_thermal_electrons'
    PARAMETER_NAMES = ['central_density',
                       'scale_radius',
                       'scale_height']

    def compute_field(self, seed):
        R = self.grid.r_cylindrical
        z = self.grid.z
        Re = self.parameters['scale_radius']
        he = self.parameters['scale_height']
        n0 = self.parameters['central_density']

        return n0*np.exp(-R/Re)*np.exp(-np.abs(z/he))

Magnetic Fields ¶

One can add a new model for magnetic fields subclassing imagine.fields.MagneticField as illustrated in the template below.

from imagine.fields import MagneticField
import astropy.units as u
import numpy as np
# Substitute this by your own code
import MY_GMF_MODEL


class MagneticFieldTemplate(MagneticField):
    """ Here comes the description of the magnetic field model """

    # Class attributes
    NAME = 'name_of_the_magnetic_field'

    # Is this field stochastic or not. Only necessary if True
    STOCHASTIC_FIELD = True
    # If there are any dependencies, they should be included in this list
    DEPENDENCIES_LIST = []
    # List of all parameters for the field
    PARAMETER_NAMES = ['Parameter_A', 'Parameter_B']

    def compute_field(self, seed):
        # If this is an stochastic field, the integer `seed `must be
        # used to set the random seed for a single realisation.
        # Otherwise, `seed` should be ignored.

        # The coordinates can be accessed from an internal grid object
        x_coord = self.grid.x
        y_coord = self.grid.y
        z_coord = self.grid.y
        # Alternatively, one can use cylindrical or spherical coordinates
        r_cyl_coord = self.grid.r_cylindrical
        r_sph_coord = self.grid.r_spherical
        theta_coord = self.grid.theta; phi_coord = self.grid.phi

        # One can access the parameters supplied in the following way
        param_A = self.parameters['Parameter_A']
        param_B = self.parameters['Parameter_B']

        # Now one can interface with previous code, or implement a
        # particular magnetic field
        Bx, By, Bz = MY_GMF_MODEL.compute_B(param_A, param_B,
                                            x_coord, y_coord, z_coord,
                                            # If the field is stochastic
                                            # it can use the seed
                                            # to generate a realisation
                                            seed)

        # Creates an empty output magnetic field Quantity with
        # the correct shape and units
        MF_array = np.empty(self.data_shape) * u.microgauss
        # and saves the pre-computed components
        MF_array[:,:,:,0] = Bx
        MF_array[:,:,:,1] = By
        MF_array[:,:,:,2] = Bz

        return MF_array

It was assumed the existence of a hypothetical module MY_GALAXY_MODEL which, given a set of parameters and three 3-arrays containing coordinate values, computes the magnetic field vector at each point.

The method compute_field() must return an astropy.units.Quantity, with shape (Nx,Ny,Nz,3) where Ni is the corresponding grid resolution and the last axis corresponds to the component (with x, y and z associated with indices 0, 1 and 2, respectively). The Quantity returned by the method must correpond to a magnetic field (i.e. units must be \(\mu\rm G\), \(\rm G\), \(\rm nT\), or similar).

A simple example, comprising a constant magnetic field can be seen below:

from imagine.fields import MagneticField

class ConstantMagneticField(MagneticField):
    """Example: constant magnetic field"""
    field_name = 'constantB'

    NAME = 'constant_B'
    PARAMETER_NAMES = ['Bx', 'By', 'Bz']

    def compute_field(self, seed):
        # Creates an empty array to store the result
        B = np.empty(self.data_shape) * self.parameters['Bx'].unit
        # For a magnetic field, the output must be of shape:
        # (Nx,Ny,Nz,Nc) where Nc is the index of the component.
        # Computes Bx
        B[:, :, :, 0] = self.parameters['Bx']
        # Computes By
        B[:, :, :, 1] = self.parameters['By']
        # Computes Bz
        B[:, :, :, 2] = self.parameters['Bz']
        return B

Cosmic ray electrons ¶

Under development

Dummy ¶

There are situations when one may want to sample parameters which are not used to evaluate a field on a grid before being sent to a Simulator object. One possible use for this is representing a global property of the physical system which affects the observations (for instance, some global property of the ISM or, if modelling an external galaxy, the position of the galaxy).

Another common use of dummy fields is when a field is generated at runtime by the simulator. One example are the built-in fields available in Hammurabi: instead of requesting IMAGINE to produce one of these fields and hand it to Hammurabi to compute the associated synchrotron observables, one can use dummy fields to request Hammurabi to generate these fields internally for a given choice of parameters.

Using dummy fields to bypass the design of a full IMAGINE Field may simplify implementation of a Simulator wrapper and (sometimes) may be offer good performance. However, this practice of generating the actual field within the Simulator breaks the modularity of IMAGINE, and it becomes impossible to check the validity of the results plugging the same field on a different Simulator. Thus, use this with care!

A dummy field can be implemented by subclassing imagine.fields.DummyField as shown bellow.

from imagine.fields import DummyField

class DummyFieldTemplate(DummyField):
    """
    Description of the dummy field
    """

    # Class attributes
    NAME = 'name_of_the_dummy_field'

    @property
    def field_checklist(self):
        return {'Parameter_A': 'parameter_A_settings',
                'Parameter_B': None}
    @property
    def simulator_controllist(self):
        return {'simulator_property_A': 'some_setting'}

Dummy fields are generally Simulator-specific and the properties field_checklist and simulator_controllist are convenient ways of sending extra settings information to the associated Simulator. The values in field_checklist allow transmitting settings associated with specific parameters, while the dictionary simulator_controllist can be used to tell how the presence of the the current dummy field should modify the Simulator’s global settings.

For example, in the case of Hammurabi, a dummy field can be used to request one of its built-in fields, which has to be set up by modifying Hammurabi’s XML parameter files. In this particular case, field_checklist is used to supply the position of a parameter in the XML file, while simulator_controllist indicates how to modify the switch in the XML file that enables this specific built-in field (see the The Hammurabi simulator tutorial for details).

Field Factories ¶

Field Factories, represented in IMAGINE by a imagine.fields.field_factory.FieldFactory object are an additional layer of infrastructure used by the samplers to provide the connection between the sampling of points in the likelihood space and the field object that will be given to the simulator.

A Field Factory object has a list of all of the field’s parameters and a list of the subset of those that are to be varied in the sampler — the latter are called the active parameters. The Field Factory also holds the allowed value ranges for each parameter, the default values (which are used for inactive parameters) and the prior distribution associated with each (active) parameter.

At each step the Pipeline request the Field Factory for the next point in parameter space, and the Factory supplies it the form of a Field object constructed with that particular choice of parameters. This can then be handed by the Pipeline to the Simulator, which computes simulated Observables for comparison with the measured observables in the Likelihood module.

Given a Field YourFieldClass (which must be an instance of a class derived from Field), one can easily construct a FieldFactory object following:

from imagine import FieldFactory
my_factory = FieldFactory(field_class=YourFieldClass,
                          grid=your_grid,
                          active_parameters=['param_1_active'],
                          default_parameters = {'param_2_inactive': value_2,
                                                'param_3_inactive': value_3},
                          priors={'param_1_active': YourPriorChoice})

The object YourPriorChoice must be an instance of imagine.priors.Prior (see section Priors for details). A flat prior (i.e. a uniform distribution, where all parameter values are equally likely) can be set using the imagine.priors.FlatPrior class.

Since re-using a FieldFactory associated with a given Field is not uncommon, it is sometimes convenient to create a specialized subclass for a particular field, where the typical/recommended choices of parameters and priors are saved. This can be done following the template below:

from imagine.fields import FieldFactory
from imagine.priors import FlatPrior, GaussianPrior
# Substitute this by your own code
from MY_PACKAGE import MY_FIELD_CLASS
from MY_PACKAGE import A_std_val, B_std_val, A_min, A_max, B_min, B_max, B_sig

class FieldFactoryTemplate(FieldFactory):
    """Example: field factory for YourFieldClass"""

    # Class attributes
    # Field class this factory uses
    FIELD_CLASS = MY_FIELD_CLASS

    # Default values are used for inactive parameters
    DEFAULT_PARAMETERS = {'Parameter_A': A_std_val,
                          'Parameter_B': B_std_val}

    # All parameters need a range and a prior
    PRIORS = {'Parameter_A': FlatPrior(xmin=A_min, xmax=A_max),
              'Parameter_B': GaussianPrior(mu=B_std_val, sigma=B_sig)}

One can initialize this specialized FieldFactory subclass by supplying the grid on which the corresponding Field will be evaluated:

myFactory = FieldFactoryTemplate(grid=cartesian_grid)

The standard values defined in the subclass can be adjusted in the instance using the attributes myFactory.active_parameters, myFactory.priors and myFactory.default_parameters (note that the latter are dictionaries, but the attribution = is modified so that it updates the corresponding internal dictionaries instead of replacing them).

The factory object myFactory can now be handled to the Pipeline, which will generate new fields by calling the myFactory() object.

Datasets ¶

Dataset objects are helpers used for the inclusion of observational data in IMAGINE. They convert the measured data and uncertainties to a standard format which can be later handed to an observable dictionary. There are two main types of datasets: Tabular datasets and HEALPix datasets.

Repository datasets ¶

A number of ready-to-use datasets are available at the community maintained imagine-datasets repository.

To be able to use this, it is necessary to install the imagine_datasets extension package (note that this comes already installed if you are using the docker image). This can be done executing the following command:

conda activate imagine  # if using conda
pip install git+https://github.com/IMAGINE-Consortium/imagine-datasets.git

Below the usage of an imported dataset is illustrated:

import imagine as img
import imagine_datasets as img_data

# Loads the datasets (will download the datasets!)
dset_fd = img_data.HEALPix.fd.Oppermann2012(Nside=32)
dset_sync = img_data.HEALPix.sync.Planck2018_Commander_U(Nside=32)

# Initialises ObservableDict objects
measurements = img.observables.Measurements(dset_fd, dset_sync)

# Shows the contents
measurements.show()

Observable names ¶

Each type of observable has an agreed/conventional name. The presently available observable names are:

Observable names

‘fd’ - Faraday depth
‘dm’ - Dispersion measure
‘sync’ - Synchrotron emission
- with tag ‘I’ - Total intensity
- with tag ‘Q’ - Stokes Q
- with tag ‘U’ - Stokes U
- with tag ‘PI’ - polarisation intensity
- with tag ‘PA’ - polarisation angle

Tabular datasets ¶

As the name indicates, in tabular datasets the observational data was originally in tabular format, i.e. a table where each row corresponds to a different position in the sky and columns contain (at least) the sky coordinates, the measurement and the associated error. A final requirement is that the dataset is stored in a dictionary-like object i.e. the columns can be selected by column name (for example, a Python dictionary, a Pandas DataFrame, or an astropy Table).

To construct a tabular dataset, one needs to initialize imagine.observables.TabularDataset. Below, a simple example of this, which fetches (using the package astroquery) a catalog from ViZieR and stores in in an IMAGINE tabular dataset object:

from astroquery.vizier import Vizier
from imagine.observables import TabularDataset

# Fetches the catalogue
catalog = Vizier.get_catalogs('J/ApJ/714/1170')[0]

# Loads it to the TabularDataset (the catalogue obj actually contains units)
RM_Mao2010 = TabularDataset(catalog, name='fd', units=catalog['RM'].unit,
                            data_col='RM', err_col='e_RM', tag=None,
                            lat_col='GLAT', lon_col='GLON')

From this point the object RM_Mao2010 can be appended to a Measurements. We refer the reader to the the Including new observational data tutorial and the TabularDataset api documentation and for further details.

HEALPix datasets ¶

HEALPix datasets will generally comprise maps of the full-sky, where HEALPix pixelation is employed. For standard observables, the datasets can be initialized by simply supplying a Quantity array containing the data to the corresponding class. Below some examples, employing the classes FaradayDepthHEALPixDataset, DispersionMeasureHEALPixDataset and SynchrotronHEALPixDataset, respectively:

from imagine.observables import FaradayDepthHEALPixDataset
from imagine.observables import DispersionMeasureHEALPixDataset
from imagine.observables import SynchrotronHEALPixDataset

my_FD_dset = FaradayDepthHEALPixDataset(data=fd_data_array,
                                        error=fd_data_array_error)

my_DM_dset = DispersionMeasureHEALPixDataset(data=fd_data_array,
                                             cov=fd_data_array_covariance)

sync_dset = SynchrotronHEALPixDataset(data=stoke_Q_data,
                                      error=stoke_Q_data_error
                                      frequency=23*u.GHz, type='Q')

In the first and third examples, it was assumed that the covariance was diagonal, and therefore can be described by an error associated with each pixel, which is specified with the keyword argument error (the error is assumed to correspond to the square root of the variance in each datapoint).

In the second example, the covariance associated with the data is instead specified supplying a two-dimensional array using the the cov keyword argument.

The final example also requires the user to supply the frequency of the observation and the subtype (in this case, ‘Q’).

Observables and observable dictionaries ¶

In IMAGINE, observable quantities (either measured or simulated) are represented internally by instances of the imagine.observables.Observable class. These are grouped in observable dictionaries (subclasses of imagine.observables.ObservableDict) which are used to exchange multiple observables between IMAGINE’s components. There three main kinds of observable dictionaries: Measurements, Simulations, and Covariances. There is also an auxiliary observable dictionary: the Masks.

Measurements ¶

The imagine.observables.Measurements object is used, as the name implies, to hold a set of actual measured physical datasets (e.g. a set of intensity maps of the sky at different wavelengths).

A Measurements object must be provided to initialize Simulators (allowing them to know which datasets need to be computed) and Likelihoods.

There are a number of ways data can be provided to Measurements. The simplest case is when the data is stored in a Datasets objects, one can provide them to the measurements object upon initialization:

measurements = img.observables.Measurements(dset1, dset2, dset3)

One can also append a dateset to a already initialized Measurements object:

measurements.append(dataset=dset4)

The final (not usually recommended) option is appending the data manually to the ObservableDict, which can be done in the following way:

measurements.append(name=key, data=data,
                    cov_data=cov, otype='HEALPix')

In this, data should contain a ndarray or Quantity, cov_data should contain the covariance matrix, and otype must indicate whether the data corresponds to: a ‘HEALPix’ map; ‘tabular’; or a ‘plain’ array.

The name argument refers to the key that will be used to hash the elements in the dictionary. This has to have the following form:

key = (data_name, data_freq, Nside_or_str, tag)

The first value should be the one of the observable names
If data is independent from frequency, data_freq is set to None, otherwise it is the frequency in GHz.
The third value in the key-tuple should contain the HEALPix Nside (for maps) or the string ‘tab’ for tabular data.
Finally, the last value, ext can be ‘I’, ‘Q’, ‘U’, ‘PI’, ‘PA’, None or other customized tags depending on the nature of the observable.

The contents of a Measurements object can be accessed as a dictionary, using the keys with the above structure:

observable = measurements[('sync', 30, 32, 'I')]

Assuming that this key is present, the observable object returned by the above line will be an instance of the Observable class. The data contents and properties can be then accessed using its properties/attributes, for example:

data_array = observable.global_data
data_units = observable.unit

Where the data_array will be a (1, N)-array.

Measurements, support the show method (exemplified in Repository datasets) which displays a summary of the data in the ObservableDict.

Simulations ¶

The Simulations object is the ObservableDict that is returned when one runs an IMAGINE Simulator.

If the model does not involve any stochastic fields (or if the ensemble size is chosen to be 1), the Simulations object produced by a simulator has exactly the same structure as Measurements object.

For larger ensemble sizes the behaviour is slightly different. The global_data attribute will return an array of shape (N_ens, N), containing one realisation per row. The show method will display the simulated results for the full ensemble, with one realisation per row (this can be limited using max_realizations keyword, which sets the maximum number of rows to be displayed).

Simulations objects provide two specialized convenience methods to manipulate the simulated data in the case of ensemble_size>1. The sub_sim method allows one to construct a new Simulations object using only a subset of the original ensemble (note that this can also be used prepare resamplings for bootstrapping). The estimate_covariances method uses the finite ensemble in the Simulations object to estimate the covariance matrices associated with each observable, which are returned as Covariances object.

Covariances ¶

imagine.observables.Covariances are a special type of ObservableDict which stores the set of covariance matrices associated with set of observables. Its behavour is similar to the Measurements, the main difference being that the global_data attribute returns a (N,N) covariance matrix instead of (1,N)-array.

In practice, one rarely needs to initialize or work with an independent Covariances object. This is because a Covariances object is automatically generated when one initializes a Measurements a Dataset, and stored in the Measurements object cov attribute:

# The covariances associated with a measurements object
covariances = measurements.cov

When the original dataset only contained variance information (e.g. specified using the error keyword when preparing a Dataset) the full covariance matrix is not constructed on initialization. Instead, the variance alone is stored internally and the covariance matrix construction is delayed until the first time the properties global_data or data are requested. If imagine.rc['distributed_arrays'] is set to True (not the default), the covariance matrix is spread among the available MPI processes.

As many observational datasets are very large (i.e. very large N), and only contain known variance information, it often unnecessary (or simply impractical) to operate with full (sparse) covariance matrices. Instead, one can access the variance stored in the Covariances object using the var property:

# An Observable object containing covariance information for a given key
cov_observable = covariances[('sync', 30, 32, 'I')]
# Reads the N-array containing the variances
variances = cov_observable.var
# Reads the (N,N)-array containing the covariance matrix
# (constructing it, if necessary)
cov_matrix = cov_observable.global_data

Thus, when working with large datasets some care should be take when choosing (or designining) the Likelihood class which will manipulate the Covariances object: some of the likelihoods (e.g. EnsembleLikelihoodDiagonal) only access the var attribute, keeping the RAM memory usage under control even in the case of very large maps, while others will try construct the full covariance matrix, potentially leading out-of-memory errors.

Masks ¶

imagine.observables.Masks which stores masks which can be applied to HEALPix maps. The mask itself is specified in the form of an array of zeros and ones.

Simulators ¶

Simulators are reponsible for mapping a set of Fields onto a set of Observables. They are represented by a subclass of imagine.simulators.Simulator.

from imagine.simulators import Simulator
import numpy as np
import MY_SIMULATOR  # Substitute this by your own code

class SimulatorTemplate(Simulator):
    """
    Detailed description of the simulator
    """
    # The quantity that will be simulated (e.g. 'fd', 'sync', 'dm')
    # Any observable quantity absent in this list is ignored by the simulator
    SIMULATED_QUANTITIES = ['my_observable_quantity']
    # A list or set of what is required for the simulator to work
    REQUIRED_FIELD_TYPES = ['dummy', 'magnetic_field']
    # Fields which may be used if available
    OPTIONAL_FIELD_TYPES = ['thermal_electron_density']
    # One must specify which grid is compatible with this simulator
    ALLOWED_GRID_TYPES = ['cartesian']
    # Tells whether this simulator supports using different grids
    USE_COMMON_GRID = False

    def __init__(self, measurements, **extra_args):
        # Send the measurements to parent class
        super().__init__(measurements)
        # Any initialization task involving **extra_args can be done *here*
        pass

    def simulate(self, key, coords_dict, realization_id, output_units):
        """
        This is the main function you need to override to create your simulator.
        The simulator will cycle through a series of Measurements and create
        mock data using this `simulate` function for each of them.

        Parameters
        ----------
        key : tuple
            Information about the observable one is trying to simulate
        coords_dict : dictionary
            If the trying to simulate data associated with discrete positions
            in the sky, this dictionary contains arrays of coordinates.
        realization_id : int
            The index associated with the present realisation being computed.
        output_units : astropy.units.Unit
            The requested output units.
        """
        # The argument key provide extra information about the specific
        # measurement one is trying to simulate
        obs_quantity, freq_Ghz, Nside, tag = key

        # If the simulator is working on tabular data, the observed
        # coordinates can be accessed from coords_dict, e.g.
        lat, lon = coords_dict['lat'], coords_dict['lon']

        # Fields can be accessed from a dictionary stored in self.fields
        B_field_values = self.fields['magnetic_field']
        # If a dummy field is being used, instead of an actual realisation,
        # the parameters can be accessed from self.fields['dummy']
        my_dummy_field_parameters = self.fields['dummy']
        # Checklists allow _dummy fields_ to send specific information to
        # simulators about specific parameters
        checklist_params = self.field_checklist
        # Controllists in dummy fields contain a dict of simulator settings
        simulator_settings = self.controllist

        # If a USE_COMMON_GRID is set to True, the grid it can be accessed from
        # grid = self.grid

        # Otherwise, if fields are allowed to use different grids, one can
        # get the grid from the self.grids dictionary and the field type
        grid_B = self.grids['magnetic_field']

        # Finally we can _simulate_, using whichever information is needed
        # and your own MY_SIMULATOR code:
        results = MY_SIMULATOR.simulate(simulator_settings,
                                        grid_B.x, grid_B.y, grid_B.z,
                                        lat, lon, freq_Ghz, B_field_values,
                                        my_dummy_field_parameters,
                                        checklist_params)
        # The results should be in a 1-D array of size compatible with
        # your dataset. I.e. for tabular data: results.size = lat.size
        # (or any other coordinate)
        # and for HEALPix data  results.size = 12*(Nside**2)

        # Note: Awareness of other observables
        # While this method will be called for each individual observable
        # the other observables can be accessed from self.observables
        # Thus, if your simulator is capable of computing multiple observables
        # at the same time, the results can be saved to an attribute on the first
        # call of `simulate` and accessed from this cache later.
        # To break the degeneracy between multiple realisations (which will
        # request the same key), the realisation_id can be used
        # (see Hammurabi implementation for an example)
        return results

Hammurabi simulator ¶

Likelihoods ¶

Likelihoods define how to quantitatively compare the simulated and measured observables. They are represented within IMAGINE by an instance of class derived from imagine.likelihoods.Likelihood. There are two pre-implemented subclasses within IMAGINE:

imagine.likelihoods.SimpleLikelihood: this is the traditional method, which is like a \(\chi^2\) based on the covariance matrix of the measurements (i.e., noise).

imagine.likelihoods.EnsembleLikelihood: combines covariance matrices from measurements with the expected galactic variance from models that include a stochastic component.

Likelihoods need to be initialized before running the pipeline, and require measurements (at the front end). In most cases, data sets will not have covariance matrices but only noise values, in which case the covariance matrix is only the diagonal.

from imagine.likelihoods import EnsembleLikelihood
likelihood = EnsembleLikelihood(data, covariance_matrix)

The optional input argument covariance_matrix does not have to contain covariance matrices corresponding to all entries in input data. The Likelihood automatically defines the proper way for the various cases.

If the EnsembleLikelihood is used, then the sampler will be run multiple times at each point in likelihood space to create an ensemble of simulated observables.

Priors ¶

A powerful aspect of a fully Baysean analysis approach is the possibility of explicitly stating any prior expectations about the parameter values based on previous knowledge. A prior is represented by an instance of imagine.priors.prior.GeneralPrior or one of its subclasses.

To use a prior, one has to initialize it and include it in the associated Field Factories. The simplest choice is a FlatPrior (i.e. any parameter within the range are equally likely before the looking at the observational data), which can be initialized in the following way:

from imagine.priors import FlatPrior
import astropy.units as u
myFlatPrior = FlatPrior(interval=[-2,10]*u.pc)

where the range for this parameter was chosen to be between \(-2\) and \(10\rm pc\).

After the flat prior, a common choice is that the parameter values are characterized by a Gaussian distribution around some central value. This can be achieved using the imagine.priors.basic_priors.GaussianPrior class. As an example, let us suppose one has a parameter which characterizes the strength of a component of the magnetic field, and that ones prior expectation is that this should be gaussian distributed with mean \(1\mu\rm G\) and standard deviation \(5\mu\rm G\). Moreover, let us assumed that the model only works within the range \([-30\mu \rm G, 30\mu G]\). A prior consistent with these requirements can be achieved using:

from imagine.priors import GaussianPrior
import astropy.units as u
myGaussianPrior = GaussianPrior(mu=1*u.microgauss, sigma=5*u.microgauss,
                                interval=[-30*u.microgauss,30*u.microgauss])

Pipeline ¶

The final building block of an IMAGINE pipeline is the Pipline object. When working on a problem with IMAGINE one will always go through the following steps:

1. preparing a list of the field factories which define the theoretical models one wishes to constrain and specifying any priors;

2. preparing a measurements dictionary, with the observational data to be used for the inference; and

3. initializing a likelihood object, which defines how the likelihood function should be estimated;

once this is done, one can supply all these to a Pipeline object, which will sample the posterior distribution and estimate the evidence. This can be done in the following way:

from imagine.pipelines import UltranestPipeline
# Initialises the pipeline
pipeline = UltranestPipeline(simulator=my_simulator,
                                 factory_list=my_factory_list,
                                 likelihood=my_likelihood,
                                 ensemble_size=my_ensemble_size_choice)
# Runs the pipeline
pipeline()

After running, the results can be accessed through the attributes of the Pipeline object (e.g. pipeline.samples, which contains the parameters values of the samples produced in the run).

But what exactly is the Pipeline? The Pipeline base class takes care of interfacing between all the different IMAGINE components and sets the scene so that a Monte Carlo sampler can explore the parameter space and compute the results (i.e. posterior and evidence).

Different samplers are implemented as sub-classes of the Pipeline There are 3 samplers included in IMAGINE standard distribution (alternatives can be found in some of the IMAGINE Consortium repositories), these are: the MultinestPipeline, the UltranestPipeline and the DynestyPipeline.

One can include a new sampler in IMAGINE by creating a sub-class of imagine.Pipeline. The following template illustrates this procedure:

from imagine.pipelines import Pipeline
import numpy as np
import MY_SAMPLER  # Substitute this by your own code

class PipelineTemplate(Pipeline):
    """
    Detailed description of sampler being adopted
    """
    # Class attributes
    # Does this sampler support MPI? Only necessary if True
    SUPPORTS_MPI = False

    def call(self, **kwargs):
        """
        Runs the IMAGINE pipeline

        Returns
        -------
        results : dict
            A dictionary containing the sampler results
            (usually in its native format)
        """
        # Initializes a sampler object
        # Here we provide a list of common options
        self.sampler = MY_SAMPLER.Sampler(
            # Active parameter names can be obtained from
            param_names=self.active_parameters,
            # The likelihood function is available in
            loglike=self._likelihood_function,
            # Some samplers need a "prior transform function"
            prior_transform=self.prior_transform,
            # Other samplers need the prior PDF, which is
            prior_pdf=self.prior_pdf,
            # Sets the directory where the sampler writes the chains
            chains_dir=self.chains_directory,
            # Sets the seed used by the sampler
            seed=self.master_seed
            )

        # Most samplers have a `run` method, which should be executed
        self.sampling_controllers.update(kwargs)
        self.results = self.sampler.run(**self.sampling_controllers)

        # The samples should be converted to a numpy array and saved
        # to self._samples_array. This should have different samples
        # on different rows and each column corresponds to an active
        # parameter
        self._samples_array = self.results['samples']
        # The log of the computed evidence and its error estimate
        # should also be stored in the following way
        self._evidence = self.results['logz']
        self._evidence_err = self.results['logzerr']

        return self.results


    def get_intermediate_results(self):
        # If the sampler saves intermediate results on disk or internally,
        # these should be read here, so that a progress report can be produced
        # every `pipeline.n_evals_report` likelihood evaluations.
        # For this to work, the following should be added to the
        # `intermediate_results` dictionary (currently commented out):

        ## Sets current rejected/dead points, as a numpy array of shape (n, npar)
        #self.intermediate_results['rejected_points'] = rejected
        ## Sets likelihood value of *rejected* points
        #self.intermediate_results['logLikelihood'] = likelihood_rejected
        ## Sets the prior volume/mass associated with each rejected point
        #self.intermediate_results['lnX'] = rejected_data[:, nPar+1]
        ## Sets current live nested sampling points (optional)
        #self.intermediate_results['live_points'] = live

        pass

Disclaimer ¶

Nothing is written in stone and the base classes may be updated with time (so, always remember to report the code release when you make use of IMAGINE). Suggestions and improvements are welcome as GitHub issues or pull requests