Pulsar simulations#

You can use CWInPy to generate a simulated population of neutron star sources that can be used for subsequent parameter estimation and hierarchical analyses. Currently, the simulation code assumes a source signal model from a rigid triaxial rotator emitting from the \(l=m=2\) quadrupole mode. It also assumes that the simulated population consists of known pulsars, and therefore that the data has been heterodyned using a known phase evolution.

The simulation can either: generate a selection of entirely fake pulsars, with distributions for the amplitude (defined through either the gravitational-wave strain amplitude \(h_0\), the mass quadrupole \(Q_{22}\), or the ellipticity), source location (right ascension, declination and distance) and pulsar orientation parameters supplied by the user (or set to defaults for everything other than \(Q_{22}\)); or, use supplied pulsar parameters and heterodyned data into which to inject signals drawn from a \(Q_{22}\) population.

The simulations are generated using the PEPulsarSimulationDAG class, which is described at the bottom of the page. Note that the simulation code described here does not itself create the simulated gravitational-wave detector data; this is done in the individual cwinpy_pe jobs generated by the script.

Simulation using fake data#

As a first example we will create a simulation that generates a population of 1000 pulsars with their \(Q_{22}\) values drawn from an exponential distribution with a mean of \(10^{33}\) kg m\(^2\), and sets up parameter estimation jobs for each. This population will be drawn from the default distribution on the pulsar location (uniform over the sky and uniform within a distance range between 0.1 and 10 kpc) and the default distribution on the pulsar orientation (uniform on the hemisphere in inclination and polarisation angle and uniform in initial phase). We will set a prior for the parameter estimation that has a uniform distribution in \(Q_{22}\) between 0 and \(10^{40}\) kg m\(^2\).

from cwinpy.pe.simulation import PEPulsarSimulationDAG
from bilby.core.prior import PriorDict, Uniform, Exponential, Sine
import numpy as np

# set the Q22 distribution
mean = 1e33
ampdist = Exponential(name="q22", mu=mean)

# set the prior distribution for use in parameter estimation
prior = PriorDict({
    "q22": Uniform(0.0, 1e40, name="q22"),
    "iota": Sine(name="iota"),
    "phi0": Uniform(0.0, np.pi, name="phi0"),
    "psi": Uniform(0.0, np.pi / 2, name="psi"),
})

# set the detectors to use (and generate fake data for)
detectors = ["H1", "L1"]

# generate the population
run = PEPulsarSimulationDAG(ampdist=ampdist, prior=prior, npulsars=1000, detector=detectors)

As no basedir argument was supplied this will produce the following directories and files in your current working directory:

$ ls -p
configs/  log/  priors/  pulsars/  results/  submit/

The pulsars directory will contain the 1000 pulsar parameter files, with names based on their generated right ascension and declination (clashing positions will have alphabetical identifiers appended). The priors directory will contain 1000 prior files named after each pulsar. The configs directory will contain the 1000 cwinpy_pe configuration files. The results directory is where a directory of results output for each pulsar will be placed. The submit directory will contain the files required for running the HTCondor DAG. There will be a Condor submit file for each pulsar, which by default will be labelled cwinpy_pe_PSRNAME.submit, where PSRNAME is the pulsar name (note that names based on a positive declination will have + replaced with plus to make them allowable Condor job names). There will also be a dag_cwinpy_pe.submit file containing the HTCondor DAG. The log directory will contain the log files for each of the jobs.

Note

For each pulsar, you can use the n_parallel option of PEPulsarSimulationDAG to run multiple independent parallel runs that will then be merged. This will create configuration files and submit files for each of the parallel runs. In the results directory for each pulsar a par{NUM} directory will be created for each parallel run, where {NUM} runs from 0 to one less than the number of parallel runs. The results of the parallel runs will be merged as part of the DAG, so the submit directory will also contain submit files for the merge jobs.
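
For example, a minimal sketch (reusing the ampdist, prior and detectors objects defined in the example above) that requests four parallel runs per pulsar would be:

# request four independent sampler runs per pulsar; the DAG will merge them
run = PEPulsarSimulationDAG(
    ampdist=ampdist,
    prior=prior,
    npulsars=1000,
    detector=detectors,
    n_parallel=4,
)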

We can take a look at the distribution of parameters that gets produced with:

import corner
import glob
import numpy as np
from cwinpy import PulsarParameters

params = np.zeros((1000, 8))
for i, parfile in enumerate(glob.glob("pulsars/*.par")):
    psr = PulsarParameters(parfile)
    params[i, :] = [psr["Q22"], psr["DIST"], psr["F0"], psr["RAJ"], psr["DECJ"], psr["IOTA"], psr["PHI0"], psr["PSI"]]

# make a corner plot of the distributions
labels = [r"$Q_{22}$ (kg m$^2$)", "distance (kpc)", r"$f_0$ (Hz)", r"$\alpha$ (rad)", r"$\delta$ (rad)", r"$\iota$ (rad)", r"$\phi_0$ (rad)", r"$\psi$ (rad)"]
fig = corner.corner(params, labels=labels, hist_kwargs={"density": True})

# overplot the Q22 distribution (the Exponential "ampdist" defined above)
ax = fig.get_axes()  # list of the corner plot axes
lims = ax[0].get_xlim()  # x-axis limits of the Q22 panel
q22s = np.linspace(lims[0], lims[1], 100)
ax[0].plot(q22s, ampdist.prob(q22s))

If you are on a machine, or computer cluster, with HTCondor installed, these parameter estimation jobs can be run using

$ condor_submit_dag cwinpy_pe_pipeline_DATETAG_01.submit

where DATETAG is a date string in the DAG file name based on the date on which it was created.

You can perform an analysis using simulated data, but with real (or pre-defined) pulsar parameter files. If you have a set of pulsar parameter files then these can be passed to the PEPulsarSimulationDAG class by giving a directory containing all the files, or a dictionary of file paths keyed to the pulsar names. This takes the place of supplying a number of pulsars to be simulated. For example, if we had parameter files in the directory my_pulsars this could be used with:

pardir = "my_pulsars"
run = PEPulsarSimulationDAG(ampdist, prior=prior, parfiles=pardir, detector=detectors)

Note that you should not call the parameter file directory pulsars if it is in the same base directory in which you are generating the simulation. That directory is reserved for the pulsar parameter files containing the simulated population signal parameters that are added into the data.

Different distributions#

You do not have to use the default distributions on the pulsar positions or orientation parameters. If you have a model for a location distribution (and it can be put in the form of a bilby prior) in equatorial (right ascension and declination) or Galactic (Galactic longitude and latitude) coordinates and distance, or Cartesian Galactocentric coordinates (in kpc), then this can be used.

If using the equatorial system the prior dictionary must be keyed with "ra", "dec" and "dist" with angles in radians and distance in kpc. If using the Galactic coordinate system the prior dictionary must be keyed with "l", "b" and "dist", again with angles in radians and distance in kpc. For the Galactocentric system the prior dictionary must be keyed with "x", "y" and "z" in kpc.
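
For instance, a sketch of a prior using the Galactic-coordinate keys (the particular distributions here are purely illustrative) would be:

import numpy as np
from bilby.core.prior import PriorDict, Uniform, Cosine

# illustrative Galactic-coordinate position distribution: angles in radians,
# distance in kpc
posdist = PriorDict({
    "l": Uniform(0.0, 2.0 * np.pi, name="l"),  # Galactic longitude
    "b": Cosine(name="b"),                     # Galactic latitude (uniform over the sphere)
    "dist": Uniform(0.1, 10.0, name="dist"),   # distance (kpc)
})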

Alternatively, if we modelled the Galaxy as a thin Gaussian disk with a scale (the standard deviation of the Gaussian) of 9 kpc in the x-y plane and 0.15 kpc in the z-direction, we could use this for our pulsar population with:

from bilby.core.prior import PriorDict, Gaussian
posdist = PriorDict({
    "x": Gaussian(0.0, 9.0, name="x"),
    "y": Gaussian(0.0, 9.0, name="y"),
    "z": Gaussian(0.0, 0.15, name="z"),
})

run = PEPulsarSimulationDAG(
    ampdist, npulsars=1000, detector=detectors, posdist=posdist,
)

# let's plot the sky distribution in RA and dec
from matplotlib import pyplot as pl
import glob
import numpy as np
from cwinpy import PulsarParameters

params = np.zeros((1000, 2))
for i, parfile in enumerate(glob.glob("pulsars/*.par")):
    psr = PulsarParameters(parfile)
    params[i, :] = [-psr["RAJ"] if psr["RAJ"] < np.pi else -(psr["RAJ"] - 2 * np.pi), psr["DECJ"]]
fig, ax = pl.subplots(1, 1, subplot_kw={"projection": "mollweide"})
ax.scatter(params[:,0], params[:,1], s=8)
ax.set_xlabel("Right ascension")
ax.set_ylabel("Declination")
fig.savefig("mollweide.png", dpi=150, tight_layout=True)

If we wanted to fix a parameter like right ascension then we could use a Delta function prior in that parameter, e.g.:

from bilby.core.prior import PriorDict, DeltaFunction, Uniform, Cosine
posdist = PriorDict({
    "ra": DeltaFunction(1.3, name="ra"),  # fix RA at 1.3 radians
    "dec": Cosine(name="dec"),
    "dist": Uniform(0.1, 10.0, name="dist"),
})

The source orientation parameter distribution can also be set to something other than the default. For example, to set the sources to all be linearly polarised you could use:

from bilby.core.prior import PriorDict, DeltaFunction
import numpy as np
oridist = PriorDict({
    "iota": DeltaFunction(np.pi / 2., name="iota"),
})
run = PEPulsarSimulationDAG(
    ampdist, npulsars=100, detector=detectors, oridist=oridist,
)

In this case, as the other orientation parameters are not supplied they will automatically get set to their default distributions.

Simulating a realistic distribution#

Rather than guessing at a pulsar distribution, we could try to use the true distribution of pulsars. To do this we can make use of the psrqpy package to get the currently known pulsars from the ATNF pulsar catalogue and, for example, the astroML package [1] to fit the distribution using a Gaussian mixture model via extreme deconvolution [2] (other packages could be used for this).

from psrqpy import QueryATNF
from astroML.density_estimation import XDGMM
from matplotlib import pyplot as pl
import glob
import numpy as np
from cwinpy import PulsarParameters

from cwinpy.pe.simulation import PEPulsarSimulationDAG
import bilby
from bilby.core.prior import PriorDict, Uniform, Exponential, Sine

# get the Galactic X, Y, Z coordinates for all pulsars (excluding Globular Clusters)
query = QueryATNF(params=["XX", "ZZ", "YY", "RAJD", "DECJD"], condition="~assoc(GC)")

# extract the position values and put them into an array
t = query.table
mask = ~t["XX"].data.mask  # values that are not masked (e.g., NaNs)
positions = np.vstack(
    (t["XX"].data.data[mask], t["YY"].data.data[mask], t["ZZ"].data.data[mask])
).T

# initialise the extreme deconvolution (try 20 components)
gmm = XDGMM(n_components=20, max_iter=500, tol=1e-2, verbose=True)

# set some errors on the positions (required for XDGMM), say 5% on all values
dx = positions[:,0] * 0.05
dy = positions[:,1] * 0.05
dz = positions[:,2] * 0.05
poserr = np.zeros(positions.shape + positions.shape[-1:])
diag = np.arange(positions.shape[-1])
poserr[:, diag, diag] = np.vstack([dx ** 2, dy ** 2, dz ** 2]).T

# fit the positions (this may take some time!)
gmm.fit(positions, Xerr=poserr)

# set a Multivariate Gaussian distribution
names = ["x", "y", "z"]
mvg = bilby.core.prior.MultivariateGaussianDist(
    names,
    nmodes=20,
    mus=gmm.mu.tolist(),
    covs=gmm.V.tolist(),
    weights=gmm.alpha.tolist(),
)

# set the position distribution as a bilby Prior
posdist = PriorDict()
for name in names:
    posdist[name] = bilby.core.prior.MultivariateGaussian(mvg, name)

# create a simulated population of pulsars
detectors = "H1"
mean = 1e33
ampdist = Exponential(name="q22", mu=mean)
run = PEPulsarSimulationDAG(
    ampdist, npulsars=1000, detector=detectors, posdist=posdist,
)

# plot the simulated and real pulsar sky distributions
params = np.zeros((1000, 2))
for i, parfile in enumerate(glob.glob("pulsars/*.par")):
    psr = PulsarParameters(parfile)
    params[i, :] = [-psr["RAJ"] if psr["RAJ"] < np.pi else -(psr["RAJ"] - 2 * np.pi), psr["DECJ"]]
pparams = np.zeros((len(t), 2))
for i, rasdecs in enumerate(zip(np.deg2rad(t["RAJD"]), np.deg2rad(t["DECJD"]))):
    pparams[i, :] = [-rasdecs[0] if rasdecs[0] < np.pi else -(rasdecs[0] - 2 * np.pi), rasdecs[1]]
fig, ax = pl.subplots(1, 1, subplot_kw={"projection": "mollweide"})
ax.scatter(params[:,0], params[:,1], s=8, color="b", alpha=0.5, label="Simulation")
ax.scatter(pparams[:,0], pparams[:,1], s=5, color="none", edgecolor="r", alpha=0.25, label="Pulsars")
ax.set_xlabel("Right ascension")
ax.set_ylabel("Declination")
ax.legend()
fig.savefig("distribution.png", dpi=150, tight_layout=True)

A similar approach could be used to generate the distribution of pulsar spin frequencies.
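
A fitted (or any other) frequency distribution can be passed directly through the fdist argument. As a simple sketch (reusing ampdist, posdist and detectors from the example above, with purely illustrative values and the prior name "f0" assumed for illustration; the default is uniform between 10 and 750 Hz):

from bilby.core.prior import Uniform

# an illustrative spin-frequency distribution between 20 and 500 Hz
fdist = Uniform(20.0, 500.0, name="f0")

run = PEPulsarSimulationDAG(
    ampdist, npulsars=1000, detector=detectors, posdist=posdist, fdist=fdist,
)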

Distance uncertainty#

You may want to run parameter estimation on a simulation in which you include a prior on the distance for each pulsar. One way to do this, when using a selection of pre-generated or real pulsars, is to pass the prior argument a dictionary of prior files or bilby PriorDicts, keyed to each pulsar, that contain a "dist" value with the particular distance prior required. This can also be achieved by instead passing a dictionary of priors, or a dictionary of distance standard deviations (in kpc), keyed to the pulsar names, to the distance_err argument, e.g.:

from bilby.core.prior import Gaussian, Uniform, Sine, PriorDict, Exponential
import numpy as np

# assume we have two pulsars (J0000+0000 and J0100+0100) with parameter
# files in the directory "pardir" for which we will set Gaussian priors
# with standard deviations of 0.3 and 0.2 kpc, respectively.
dist_err = {"J0000+0000": 0.3, "J0100+0100": 0.2}
prior = PriorDict({
    "q22": Uniform(0.0, 1e40, name="q22"),
    "iota": Sine(name="iota"),
    "phi0": Uniform(0.0, np.pi, name="phi0"),
    "psi": Uniform(0.0, np.pi / 2, name="psi"),
})
ampdist = Exponential(name="q22", mu=1e33)
run = PEPulsarSimulationDAG(
    ampdist=ampdist, parfiles="pardir", prior=prior, distance_err=dist_err, detector="H1"
)

Note

In the latter case a Gaussian prior on the distance will be generated, with the supplied value used as the standard deviation; it is assumed that the pulsar distance is available, as this is used as the mean of the distribution.

If generating a population of pulsars, you can use the distance_err argument to set a Gaussian distance prior for all pulsars. If you pass a number it will be taken as a fractional distance uncertainty and used to set the standard deviation of a Gaussian distribution (truncated at zero), with the mean given by the distance drawn for that source. At the moment only a single fractional error on the distance, applied to all pulsars, can be used.
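
For example, a sketch giving every simulated pulsar a Gaussian distance prior with a 20% fractional uncertainty (reusing the ampdist and prior defined above) would be:

# apply a 20% fractional uncertainty on the drawn distance of every pulsar
run = PEPulsarSimulationDAG(
    ampdist=ampdist,
    prior=prior,
    npulsars=1000,
    detector="H1",
    distance_err=0.2,
)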

Simulation using real data#

Rather than running the simulation with simulated data and simulated pulsars, the code can be used to add signals to a set of real data, using pre-defined or real pulsar parameter files.

We will assume the following example directory tree for which there are heterodyned data files for a selection of pulsars and for two detectors (H1 and L1):

root
 ├── data          # directory to contain the heterodyned data for all pulsars
 |    ├── H1       # directory to contain the H1 data for all pulsars
 |    |    ├── fine-H1-pulsar1.txt  # heterodyned data for pulsar1 (file must contain the pulsar name)
 |    |    ├── fine-H1-pulsar2.txt  # heterodyned data for pulsar2
 |    |    └── ...
 |    └── L1       # directory to contain the L1 data for all pulsars
 |         ├── fine-L1-pulsar1.txt  # heterodyned data for pulsar1 (file must contain the pulsar name)
 |         ├── fine-L1-pulsar2.txt  # heterodyned data for pulsar2
 |         └── ...
 └── pulsars       # directory containing pulsar parameter files used for heterodyning

We could run a simulation that adds signals to these with mass quadrupoles drawn from an exponential distribution using:

from bilby.core.prior import PriorDict, Uniform, Exponential, Sine
from cwinpy.pe.simulation import PEPulsarSimulationDAG
import numpy as np

prior = PriorDict({
    "q22": Uniform(0.0, 1e36, name="q22"),
    "iota": Sine(name="iota"),
    "phi0": Uniform(0.0, np.pi, name="phi0"),
    "psi": Uniform(0.0, np.pi / 2, name="psi"),
})

detectors = ["H1", "L1"]
mean = 1e32  # the mean of the exponential distribution
ampdist = Exponential(name="q22", mu=mean)

# a dictionary of data file directories keyed to the detector
datafiles = {"L1": "/root/data/L1", "H1": "/root/data/H1"}

# the directory containing the pulsar parameter files
parfiles = "/root/pulsars"

run = PEPulsarSimulationDAG(
    ampdist=ampdist,
    prior=prior,
    detector=detectors,
    datafiles=datafiles,
    parfiles=parfiles,
)

Note

The original data files are not changed by the simulation, so they will not contain the simulated signals.

By default, if the pulsar parameter files contain gravitational-wave parameters (the amplitude or mass quadrupole, and orientation parameters) or the distance, then these will be overwritten (in new versions of the files) by any parameters generated in the simulation. If you want to keep those parameters then the overwrite_parameters argument can be set to False.
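
For example, a sketch of the previous call that retains any existing signal parameters in the supplied parameter files would be:

# keep any amplitude/orientation/distance values already in the parameter files
run = PEPulsarSimulationDAG(
    ampdist=ampdist,
    prior=prior,
    detector=detectors,
    datafiles=datafiles,
    parfiles=parfiles,
    overwrite_parameters=False,
)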

Simulation class API#

The API for the PEPulsarSimulationDAG class is described below:

class PEPulsarSimulationDAG(ampdist=None, prior=None, distance_err=None, parfiles=None, overwrite_parameters=True, datafiles=None, npulsars=None, basedir=None, detector='H1', starttime=None, endtime=None, timestep=None, posdist=None, oridist=None, fdist=None, submit=False, accountuser=None, accountgroup=None, sampler='dynesty', sampler_kwargs=None, outputsnr=True, numba=True, n_parallel=1)#

Bases: object

This class will generate an HTCondor DAGMan job to create a number of simulated gravitational-wave signals from pulsars. These signals will either be added into random Gaussian noise or into real heterodyned data. The generated signals have source parameters drawn from an input distribution. They can be generated using real pulsar parameters, including the pulsar distance, or using fake pulsars with sky locations and distances drawn from input distributions (by default the sky location distribution will be uniform on the sky).

These signals will be analysed using the cwinpy_pe script to sample the posterior probability distributions of the required parameter space, including the mass quadrupole.

If not supplied, the default priors that the parameter estimation will use are:

>>> from bilby.core.prior import PriorDict, Uniform, Sine
>>> import numpy as np
>>> prior = PriorDict({
...     "h0": Uniform(0.0, 1e-22, name="h0"),
...     "iota": Sine(name="iota"),
...     "phi0": Uniform(0.0, np.pi, name="phi0"),
...     "psi": Uniform(0.0, np.pi / 2, name="psi"),
... })
Parameters:
  • ampdist (bilby.core.prior.Prior, BaseDistribution) – The distribution from which the signal amplitude values will be drawn for the simulated signals. This can contain the gravitational-wave strain amplitude keyed with "h0", the mass quadrupole keyed with "q22", or the fiducial ellipticity keyed with "epsilon" ("ell", or "ellipticity").

  • prior (str, dict) – This can be a single bilby-style prior dictionary, or a path to a file, giving a prior distribution to use for signal parameter estimation for all pulsars, or a dictionary keyed to pulsar names containing prior dictionaries or prior files for each pulsar. If not given, the default distribution given above will be used for all pulsars.

  • distance_err (float, dict) – If this is given, then a prior on distance will be added to the prior dictionary for parameter estimation. If this is a float, then that value will be taken as a fractional uncertainty on the pulsar distance and a Gaussian prior will be added using that as the standard deviation. If this is a dictionary it should be keyed to pulsar names and provide either a bilby.core.prior.Prior named "dist" or a standard deviation on the distance in kpc.

  • parfiles (dict, str) – If using real pulsar parameter files pass a dictionary of paths to individual files keyed to the pulsar name, or pass the path to a directory containing the parameter files.

  • overwrite_parameters (bool) – If using real/pre-created pulsar parameter files set this to False if you want any orientation or amplitude parameters that they contain to not be overwritten by those drawn from the supplied distributions. Default is True.

  • datafiles (dict, str) – If using real data for each pulsar then pass a dictionary of paths to the data files keyed to the pulsar name (or detector then pulsar name), or a directory containing the heterodyned data files.

  • npulsars (int) – The number of pulsars to include in the simulation.

  • posdist (bilby.core.prior.PriorDict) – The distribution from which to randomly draw pulsar positions if required. This defaults to a uniform distribution on the sky and uniform in distance between 0.1 and 10 kpc. If you want to specify a distribution in right ascension, declination and distance they must be in the prior dictionary with the keys "ra" (radians), "dec" (radians) and "dist" (kpc), respectively. Alternatively, you can specify a distribution in Galactocentric coordinates using the dictionary keys "x", "y" and "z" all in units of kpc, or Galactic coordinates using the dictionary keys "l" (Galactic longitude in radians), "b" (Galactic latitude in radians) and "dist" (kpc). These will be converted into the equivalent RA, declination and distance.

  • oridist (bilby.core.prior.PriorDict) – The distribution of the pulsar orientation parameters. This defaults to uniform over a hemisphere in inclination and polarisation angle, and uniform over \(\pi\) radians in rotational phase. The dictionary keys must be "phi0" (initial phase), "iota" (inclination angle), and "psi" (polarisation angle), all in radians.

  • fdist (bilby.core.prior.Prior) – The distribution from which to draw pulsar spin frequencies if required. This defaults to a uniform distribution between 10 and 750 Hz.

  • basedir (str) – The base directory into which the simulations and outputs will be placed. If None then the current working directory will be used.

  • detector (str, list) – A string, or list of strings, of detector prefixes for the simulated data. This defaults to a single detector - the LIGO Hanford Observatory - for which the simulated noise will be drawn from the advanced detector design sensitivity curve (e.g., [3]).

  • starttime (int, float, dict) – A GPS time, or dictionary of GPS times keyed to detectors, giving the start time for any simulated data being generated. If not given the start time defaults to 1000000000.

  • endtime (int, float, dict) – A GPS time, or dictionary of GPS times keyed to detectors, giving the end time for any simulated data being generated. If not given the end time defaults to 1000086400.

  • timestep (int, float) – The time step, in seconds, between data points for simulated data. If not given this defaults to 60. A usage sketch of these data-span options is given after this parameter list.

  • submit (bool) – Set whether to submit the Condor DAG or not.

  • accountuser (str) – Value to give to the ‘account_user’ parameter in the Condor submit file. Default is to set no value.

  • accountgroup (str) – Value to give to the ‘account_user_group’ parameter in the Condor submit file. Default is to set no value.

  • sampler (str) – The sampler to use. Defaults to dynesty.

  • sampler_kwargs (dict) – A dictionary of keyword arguments for the sampler. Defaults to None.

  • outputsnr (bool) – Set whether to output the injected and recovered signal-to-noise ratios. Defaults to True.

  • numba (bool) – Set whether or not to use the likelihood with numba enabled. Defaults to True.

  • n_parallel (int) – Set the number of parallel sampler jobs to run for each pulsar, which will be combined to form the final set of samples. Defaults to 1.
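
As a usage sketch of the data-span options above (the values shown are the documented defaults, supplied explicitly for illustration):

>>> from cwinpy.pe.simulation import PEPulsarSimulationDAG
>>> from bilby.core.prior import Exponential
>>> ampdist = Exponential(name="q22", mu=1e33)
>>> run = PEPulsarSimulationDAG(
...     ampdist=ampdist,
...     npulsars=10,
...     detector=["H1", "L1"],
...     starttime=1000000000,  # GPS start of the simulated data
...     endtime=1000086400,    # GPS end of the simulated data (one day later)
...     timestep=60,           # 60 s between simulated data points
... )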

create_config()#

Create the configuration parser for the DAG.

create_pulsars()#

Create the pulsar parameter files based on the supplied distributions.

makedirs(dir)#

Make a directory tree recursively.

References#