Data handling#

Here we document classes to deal with data products from searches for continuous sources of gravitational waves that can subsequently be used for inference.

Heterodyned Data#

The data product used for source parameter inference in the Bayesian targeted pulsar searches perform by the LIGO Scientific Collaboration and Virgo Collaboration (e.g., [1], [2]) is a heavily filtered and downsampled complex time series [3].

In forming the data it assumes that the source phase evolution is described by a Taylor expansion of the form:

\[\phi(t') = 2 \pi C\left(f_0(t'-T_0) + \frac{1}{2}\dot{f}_0(t'-T_0)^2 + \dots\right)\]

where \(f_0\) is the source rotation frequency, \(\dot{f}_0\) is the first frequency derivative (higher order derivatives can also be used), \(t'\) is the time in a frame at rest with respect to the pulsar (the pulsar proper time, which for isolated pulsars is assumed to be consistent with the solar system barycentre), \(T_0\) is the time reference epoch in the pulsar proper time, and \(C\) is an emission mechanism-dependent scaling factor (often set to 2, for emission from the \(l=m=2\) quadrupole mode). In searches for known pulsars, the frequency, its derivatives, and the source position are assumed to be known from the observed properties of the pulsar.

The raw real time series of \(h(t)\) from a detector is “heterodyned”,

\[h'(t) = h(t) e^{-2\pi i \phi(t + \Delta t(t))},\]

to remove the high-frequency variation of the potential source signal, where \(\Delta t(t)\) is a time- and source-and-position-dependent delay term that accounts for the Doppler and relativistic delays between the time at the detector and the pulsar proper time (see, e.g., Equation 1 of [4]). The resulting complex series \(h'(t)\) is low-pass filtered (effectively a band-pass filter on the two-sided complex data) using a high order Butterworth filter, often with a knee frequency of 0.25 Hz, and down-sampled, via averaging, to a far lower sample rate than the original raw data. In general, gravitational-wave detector data is sampled at 16384 Hz, and often the heterodyned data is downsampled to 1/60 Hz (one sample per minute).

Reading and writing#

The HeterodynedData object can be written to and read from both ASCII and HDF5 file formats. The ASCII read/writer is mainly to access legacy data as produced by the lalpulsar_heterodyne code within LALSuite. However, it is advisable to use the HDF5 routines if purely using CWInPy for analyses. The HeterodynedData class has both a read() and write() method based on those for a GWPy TimesSeries object.

ASCII#

The HeterodynedData class will automatically detect an ASCII file provided it has the file extension .txt or, if it is gzipped, .txt.gz. Any comments in the file taken from lines starting with a # or % will be stored in the object. The format of data required in an ASCII text file is shown within the HeterodynedData API documentation below. Given a text file called data.txt, it could either be read in as:

>>> from cwinpy.data import HeterodynedData
>>> data = HeterodynedData("data.txt")

or using the read() method with:

>>> data = HeterodynedData.read("data.txt")

If you have an instance of a HeterodynedData class called, e.g., het, then writing to an ASCII file can be done using the write() method with:

>>> het.write("data.txt")

If the file you are writing to already exists it will be overwritten.

A disadvantage of using the ASCII format is that you cannot stored additional metadata such as the detector, pulsar parameter file used, and any injected signal, whereas this will be stored in a HDF5 file.

HDF5#

HDF5 is a flexible binary storage format. In the context of a HeterodynedData object, it will store both the complex time series of data and metadata about the object, such as the detector and contents of the pulsar parameter file used for heterodyning. The HDF5 file will be detected if it has the file extension .hdf, .hdf5, or .h5. A HDF5 file called data.hdf5 can be read in with, e.g.:

>>> from cwinpy.data import HeterodynedData
>>> data = HeterodynedData("data.hdf5")

or

>>> data = HeterodynedData.read("data.hdf5")

If you want to apply non-default keyword arguments for processing the data after it’s read in, e.g., setting a window for the compute_running_median() method, then the former method is recommended as the standard keywords can be passed to the class. Otherwise, you have to explicitly run the processing methods that you require.

If you have an instance of a HeterodynedData class called, e.g., het, then writing to a HDF5 file can be done using the write() method with:

>>> het.write("data.hdf5")

The “Dataset” within the HDF5 file within which the data is stored is called HeterodynedData. So, the data could be read in just using the Python h5py package using:

>>> import h5py
>>> hfile = h5py.File("data.hdf5", "r")
>>> dataset = hfile["HeterodynedData"]
>>> # the time series data can be found in:
>>> timeseries = dataset[()]
>>> # the metadata can be found as a dictionary with:
>>> metadata = dict(dataset.attrs)
>>> hfile.close()

If the file you are writing to already exists then by default it will not be overwritten and an exception will be raised. To overwrite a file use:

>>> het.write("data.hdf5", overwrite=True)

Combining data#

Merging data#

If you have multiple heterodyned data files for a given detector and pulsar it might make sense to combine them into one HeterodynedData object. The read() method can in fact be passed a list of heterodyned data files, and provided that they are compatible (i.e., they are for the same detector, the same pulsar, and the same frequency scale factor) they will be merged into one using the merge() method. By default, the resulting object will be sorted in ascending time order even if the input files were not sorted in time order. If you had two files data1.hdf5 and data2.hdf5 they could be read in and merged with:

>>> from cwinpy.data import HeterodynedData
>>> data = HeterodynedData.read(["data1.hdf5", "data2.hdf5"])

Data from multiple detectors#

You may want to store data for one pulsar, but from multiple detectors, or multiple frequency scale factor data streams. This can be done using the MultiHeterodynedData class. You can use the add_data() method to add in multiple data sets, which are stored in a dictionary-like format keyed by the detectors. If you had heterodyned data files for H1 and L1 called dataH1.hdf5 and dataL1.hdf5, respectively, they could be stored with:

>>> from cwinpy.data import MultiHeterodynedData
>>> data = MultiHeterodynedData()
>>> data.add_data("dataH1.hdf5")
>>> data.add_data("dataL1.hdf5")

This type of object is used when performing parameter estimation for a single pulsar with data from multiple detectors.

Data API#

class MultiHeterodynedData(data=None, times=None, detector=None, **kwargs)#

Bases: object

A class to contain time series’ of heterodyned data, using the HeterodynedData class, for multiple detectors/data streams.

Parameters:
  • data (str, array_like, dict, HeterodynedData) – The heterodyned data either as a string giving a file path, an array of data, or a dictionary of file paths/data arrays, that are keyed on valid detector names.

  • times (array_like, dict) – If data is an array, or dictionary of arrays, then times must be set giving the time stamps for the data values. If times is a dictionary then it should be keyed on the same detector names as in data.

  • detector (str, lal.Detector) – If data is a file name or data array then detector must be given as a string or lal.Detector.

Notes

See the HeterodynedData documentation for information on additional keyword arguments.

add_data(data, times=None, detector=None)#

Add heterodyned data to the class.

Parameters:
  • data (str, array_like, dict, HeterodynedData) – The heterodyned data either as a string giving a file path, an array of data, a dictionary of file paths/data arrays that are keyed on valid detector names, or a HeterodynedData object.

  • times (array_like, dict) – If data is an array, or dictionary of arrays, then times must be set giving the time stamps for the data values. If times is a dictionary then it should be keyed on the same detector names as in data.

  • detector (str, lal.Detector) – If data is a file name or data array then detector must be given as a string or lal.Detector.

property detectors#

Return the list of detectors contained in the object.

property freq_factors#

Return the this of heterodyne frequency scaling factors for each data set contained in the object.

property injection_snr#

Get the coherent optimal signal-to-noise ratio of an injected signal in all heterodyned data sets. See cwinpy.data.HeterodynedData.injection_snr().

property pars#

Return the list of heterodyne source parameter files for each data set contained in the object.

periodogram(det=None, together=False, figsize=None, remove_outliers=None, thresh=None, labelsize=None, fontsize=None, legendsize=None, fontname=None, labelname=None, fraction_labels=None, fraction_label_num=None, **plotkwargs)#

Plot all, or some of, the periodograms of the time series’ contained in the class. The general arguments can be seen in cwinpy.data.HeterodynedData.periodogram() and additional arguments are given below.

Parameters:
  • together (bool, False) – Set to True to put all the plots onto one figure, otherwise they will be created on individual Figure objects.

  • det (str) – If a detector name is supplied, then only the time series’ for that detector will be plotted.

Returns:

A Figure object, or list of Figure objects.

Return type:

list

plot(det=None, together=False, which='abs', figsize=(12, 4), remove_outliers=False, thresh=3.5, zero_time=False, labelsize=None, fontsize=None, legendsize=None, fontname=None, labelname=None, **plotkwargs)#

Plot all, or some of, the time series’ contained in the class. The general arguments can be seen in cwinpy.data.HeterodynedData.plot() and additional arguments are given below.

Parameters:
  • together (bool, False) – Set to True to put all the plots onto one figure, otherwise they will be created on individual Figure objects.

  • det (str) – If a detector name is supplied, then only the time series’ for that detector will be plotted.

Returns:

A Figure object, or list of Figure objects.

Return type:

list

power_spectrum(det=None, together=False, figsize=None, remove_outliers=None, thresh=None, labelsize=None, fontsize=None, legendsize=None, fontname=None, labelname=None, dt=None, fraction_labels=None, fraction_label_num=None, average=None, window=None, overlap=None, **plotkwargs)#

Plot all, or some of, the power spectral densities of the time series’ contained in the class. The general arguments can be seen in cwinpy.data.HeterodynedData.power_spectrum() and additional arguments are given below.

Parameters:
  • together (bool, False) – Set to True to put all the plots onto one figure, otherwise they will be created on individual Figure objects.

  • det (str) – If a detector name is supplied, then only the time series’ for that detector will be plotted.

Returns:

A Figure object, or list of Figure objects.

Return type:

list

signal_snr(signalpar)#

Get the coherent signal-to-noise ratio of a given signal. See cwinpy.data.HeterodynedData.signal_snr().

spectrogram(det=None, together=False, figsize=None, remove_outliers=None, thresh=None, labelsize=None, fontsize=None, legendsize=None, fontname=None, labelname=None, fraction_labels=None, fraction_label_num=None, dt=None, overlap=None, window=None, **plotkwargs)#

Plot all, or some of, the spectograms of the time series’ contained in the class. The general arguments can be seen in spectrogram() and additional arguments are given below.

Parameters:
  • together (bool, False) – Set to True to put all the plots onto one figure, otherwise they will be created on individual Figure objects.

  • det (str) – If a detector name is supplied, then only the time series’ for that detector will be plotted.

Returns:

A Figure object, or list of Figure objects.

Return type:

list

class HeterodynedData(data=None, times=None, par=None, detector=None, window=30, inject=False, injpar=None, injtimes=None, freqfactor=2.0, fakeasd=None, fakeseed=None, issigma=False, bbthreshold='default', bbminlength=5, bbmaxlength=inf, remove_outliers=False, thresh=3.5, comments='', earthephemeris=None, sunephemeris=None, timeephemeris=None, **kwargs)#

Bases: TimeSeriesBase

A class to contain a time series of heterodyned data.

Some examples of input data are:

1. The path to a file containing (gzipped) ascii text with the following three columns:

# GPS time stamps   real strain   imaginary strain
1000000000.0         2.3852e-25    3.4652e-26
1000000060.0        -1.2963e-26    9.7423e-25
1000000120.0         5.4852e-25   -1.8964e-25
...

or four columns:

# GPS time stamps   real strain   imaginary strain   std. dev.
1000000000.0         2.3852e-25    3.4652e-26        1.0e-25
1000000060.0        -1.2963e-26    9.7423e-25        1.0e-25
1000000120.0         5.4852e-25   -1.8964e-25        1.0e-25
...

where any row that starts with a # or a % is considered a comment.

2. A 1-dimensional array of complex data, and accompanying array of time values, e.g.,

>>> import numpy as np
>>> N = 100  # the data length
>>> data = np.random.randn(N) + 1j*np.random.randn(N)
>>> times = np.linspace(1000000000., 1000005940., N)

or, a 2-dimensional array with the real and complex values held in separate columns, e.g.,

>>> import numpy as np
>>> N = 100  # the data length
>>> data = np.random.randn(N, 2)
>>> times = np.linspace(1000000000., 1000005940., N)

or, a 2-dimensional array with the real and complex values held in separate columns, and a third column holding the standard deviation for each entry, e.g.,

>>> import numpy as np
>>> N = 100  # the data length
>>> stds = np.ones(N)  # standard deviations
>>> data = np.array([stds*np.random.randn(N),
>>> ...              stds*np.random.randn(N), stds]).T
>>> times = np.linspace(1000000000., 1000005940., N)
Parameters:
  • data (str, list, array_like) – A file (plain ascii text, gzipped ascii text, ascii CSV, or HDF5 file) containing a time series of heterodyned data, or list of files names (from the same detector/frequency scale factor/pulsar) to be concatenated, or an array containing the complex heterodyned data.

  • times (array_like) – If the data was passed using the data argument, then the associated time stamps should be passed using this argument.

  • par (str, PulsarParameters) – A parameter file, or PulsarParameters object containing the parameters with which the data was heterodyned.

  • detector (str, lal.Detector) – A string, or lal.Detector object, identifying the detector from which the data was generated.

  • window (int, 30) – The length of a window used for calculating a running median over the data. If set to zero the running median will just be initialised with zero values.

  • inject (bool, False) – Set to True to add a simulated signal to the data based on the parameters supplied in injpar, or par if injpar is not given.

  • injpar (str, PulsarParameters) – A parameter file name or PulsarParameters object containing values for the injected signal. A par file must also have been provided, and the injected signal will assume that the data has already been heterodyned using the parameters from par, which could be different.

  • injtimes (list, None) – A list containing pairs of times between which to add the simulated signal. By default the signal will be added into the whole data set.

  • freqfactor (float, 2.0) – The frequency scale factor for the data signal, e.g., a value of two for emission from the l=m=2 mode at twice the rotation frequency of the source.

  • fakeasd (float, str) – A amplitude spectral density value (in 1/sqrt(Hz)) at which to generate simulated Gaussian noise to add to the data. Alternatively, if a string is passed, and that string represents a known detector, then the amplitude spectral density for that detector at design sensitivity will be used (this requires a par value to be included, which contains the source rotation frequency).

  • fakeseed ((int, class:numpy.random.Generator), None) – A seed for the random number generator used to create the fake data (see numpy.random.seed() and numpy.random.Generator for more information).

  • issigma (bool) – Set to True if the fakeasd value passed is actually a noise standard deviation value rather than an amplitude spectral density.

  • bbthreshold ((str, float), "default") – The threshold method, or value for the bayesian_blocks() function.

  • bbminlength (int, 5) – The minimum length (in numbers of data points) of a chunk that the data can be split into by the bayesian_blocks() function. To perform no splitting of the data set this value to be larger than the total data length, e.g., inf.

  • bbmaxlength (int, inf) – The maximum length (in numbers of data points) of a chunk that the data can be split into by the bayesian_blocks() function. By default this is inf, i.e., chunks can be as long as possible.

  • remove_outliers (bool, False) – If True outliers will be found (using find_outliers()) and removed from the data. They will not be stored anywhere in the class.

  • thresh (float, 3.5) – The modified z-score threshold for outlier removal (see find_outliers())

  • comments (str) – A string containing any comments about the data.

  • earthephemeris (str, None) – The path to the Earth ephemeris used for the signal phase model.

  • sunephemeris (str, None) – The path to the Sun ephemeris used for the signal phase model.

  • timeephemeris (str, None) – The path to the time correction ephemeris used for the signal phase model.

Generate a new TimeSeriesBase.

add_noise(asd, issigma=False, seed=None)#

Add white Gaussian noise to the data based on a supplied one-sided noise amplitude spectral density (in 1/sqrt(Hz)).

If generating noise from a given detector’s design curve, a frequency is required, which itself requires a pulsar parameter file to have been supplied.

Parameters:
  • asd (float, str) – The noise amplitude spectral density (1/sqrt(Hz)) at which to generate the Gaussian noise, or a string containing a valid detector name for which the design sensitivity ASD can be used, or a file containing an amplitude spectral density frequency series.

  • issigma (bool, False) – If issigma is True then the value passed to asd is assumed to be a dimensionless time domain standard deviation for the noise level rather than an amplitude spectral density.

  • seed (int, numpy.random.Generator, None) – A seed for the random number generator used to create the fake data (see numpy.random.seed() and numpy.random.Generator for more information).

as_timeseries()#

Return the data as a gwpy.timeseries.TimeSeries.

bayesian_blocks(**kwargs)#

Apply a Bayesian-Block-style algorithm to cut the data (after subtraction of a running median) up into chunks with different statistical properties using the formalism described in Section 2.4 of [5]. Within each chunk the data should be well described by a single Gaussian distribution with zero mean.

Splitting of the data relies on a threshold on the natural logarithm of the odds comparing the hypothesis that the data is best described by two different contiguous zero mean Gaussian distributions with different unknown variances to the hypothesis that the data is described by a single zero mean Gaussian with unknown variance. The former hypothesis is a compound hypothesis consisting of the sum of evidences for the split in the data at any point.

The 'default' threshold for splitting is empirically derived in [5] for the cases that the prior odds between the two hypotheses is equal, and has a 1% false alarm probability for splitting data that is actually drawn from a single zero mean Gaussian. The 'trials' threshold comes from assigning equal priors to the single Gaussian hypothesis and the full compound hypotheses that there is a split (in the 'default' threshold it implicitly assume the single Gaussian hypothesis and each numerator sub-hypothesis have equal prior probability). This is essentially like a trials factor. Alternatively, the threshold value can be any real number.

Parameters:
  • threshold (str, float) – A string giving the method for determining the threshold for splitting the data (described above), or a value of the threshold.

  • minlength (int) – The minimum length that a chunk can be split into. Defaults to 5.

  • maxlength (int) – The maximum length that a chunk can be split into. Defaults to inf.

property bbmaxlength#

The maximum length of a data chunk.

property bbminlength#

The minimum length of a chunk that the data can be split into by the Bayesian Blocks algorithm.

property bbthreshold#

The threshold method/value for cutting the data in the Bayesian Blocks algorithm.

property change_point_indices#

Return a list of indices of statistical change points in the data.

property change_point_ratios#

Return a list of the log marginal likelihood ratios for the statistical change points in the data.

property chunk_lengths#

A list with the lengths of the chunks into which the data has been split.

property comments#

Any comments on the data

compute_running_median(N=30)#

Calculate a running median from the data with the real and imaginary parts separately. The running median will be calculated using a window of samples of a given number. This does not account for any gaps in the data, so could contain discontinuities.

Parameters:

N (int, 30) – The window length of the running median. Defaults to 30 points. If set to 0 the running median will be initialised as an array of zeros.

Returns:

A numpy.ndarray array containing the data with the running median subtracted.

Return type:

array_like

compute_variance(change_points=None, N=30)#

Compute the (sample) variance of the data within a set of change points. The variance will be calculated after subtraction of a running median. As the data is complex, we calculate the variance of a vector in which the real and imaginary components are concatenated. This is equivalent to a two-sided power spectral density.

Parameters:
  • change_points (array_like, None) – An array of indices of statistical change points within the data

  • N (int, 30) – The window size (in terms of data point number) of the running median.

Returns:

A numpy.ndarray of variances for each data point.

Return type:

array_like

property cwinpy_heterodyne_pipeline_config#

If the HeterodynedData object was created through Heterodyne being called within a HTCondor DAG, which itself was set up using the cwinpy_heterodyne_pipeline script, then this attribute can contain the contents of the configuration file that created the DAG.

property cwinpy_version#

Return the version of CWInPy used to produce the dataset.

property data#

A numpy.ndarray containing the heterodyned data.

property detector#

The name of the detector from which the data came.

property dt#

X-axis sample separation

Type:

~astropy.units.Quantity scalar

property filter_history#

An array with the “history” of any filters used during the heterodyning.

find_outliers(thresh=3.5)#

Find, and return the indices of, and “outliers” in the data. This is a modified version of the median-absolute-deviation (MAD) function from [6], using the algorithm of [7].

Parameters:

thresh (float, 3.5) – The modified z-score to use as a threshold. Real or imaginary data with a modified z-score (based on the median absolute deviation) greater than this value will be classified as outliers.

Returns:

A boolean numpy.ndarray that is True for values that are outliers.

Return type:

array_like

property freq_factor#

The scale factor of the source rotation frequency with which the data was heterodyned.

property freqfactor#

Alias for freq_factor to be consistent with input keyword arguments.

generate_roq(priors, **kwargs)#

Generate reduced order quadrature interpolators for likelihood calculations for each data chunk. See GenerateROQ for additional keyword arguments. The model argument is not required as that will always be set to "HeterodynedCWSimulator", while the par and det arguments will default to be those in the HeterodynedData object.

Parameters:

priors (PriorDict) – A PriorDict containing the parameters and prior distributions, which will be used to generate the model reduced basis set.

heterodyne(phase, stride=1, singlesided=False, datasegments=None)#

Heterodyne the data (see gwpy.timeseries.TimeSeries.heterodyne()) for details). Unlike gwpy.timeseries.TimeSeries.heterodyne() this will heterodyne unevenly sampled data, although contiguous segments of data will be truncated if they do not contain an integer number of strides. Additional parameters are given below:

Parameters:

datasegments (list) – A list of pairs of times within which to include data. Data outside these times will be removed.

property heterodyne_arguments#

The dictionary of arguments passed to cwinpy.heterodyne.Heterodyne if it was used to create the current HeterodynedData object. If the HeterodynedData object was created via a merge of many individual HeterodynedData objects then this will be a list with a dictionary from each of the merged objects.

Note

This attribute will only be saved if writing the data to a HDF5 file, but not if writing to an ascii text file.

property include_bsb#

A boolean stating whether the heterodyne included Binary System barycentring.

property include_fitwaves#

A boolean stating whether the heterodyne included corrections for any red noise FITWAVES parameters.

property include_glitch#

A boolean stating whether the heterodyne included corrections for any glitch phase evolution.

property include_ssb#

A boolean stating whether the heterodyne included Solar System barycentring.

inject_signal(injpar=None, injtimes=None, inject=True, **kwargs)#

Inject a simulated signal into the data.

Parameters:
  • injpar ((str, PulsarParameters)) – A parameter file or object containing the parameters for the simulated signal.

  • injtimes (list) – A list of pairs of time values between which to inject the signal.

  • inject (bool) – If True (default) the simulated signal will be generated and added to the data. If False the signal will be created and returned, but not added into the data.

property injection#

Return a boolean to describe whether the data contains a user-generated injection or not.

property injection_data#

The pure simulated signal that was added to the data.

property injection_snr#

Return the optimal signal-to-noise ratio using the pure injected signal and true noise calculated using:

\[\rho = \sqrt{\sum_i \left(\left[\frac{\Re{(s_i)}}{\sigma_i}\right]^2 + \left[\frac{\Im{(s_i)}}{\sigma_i}\right]^2\right)}\]

where and \(s\) is the pure signal and \(\sigma\) is the estimated noise standard deviation.

property injtimes#

A list of times at which an injection was added to the data.

property input_stds#

A boolean stating whether the standard deviations where provides as an input.

is_compatible(other)#

Check if another class:~cwinpy.data.HeterodynedData object is “compatible”, i.e., contains data for the same detector, pulsar and frequency scale, as the current object.

Parameters:

other (class:~cwinpy.data.HeterodynedData) – Another class:~cwinpy.data.HeterodynedData object.

Returns:

Returns True if compatible otherwise it will raise an exception.

Return type:

bool

property laldetector#

The lal.Detector containing the detector’s response and location.

make_signal(signalpar=None, **kwargs)#

Make a signal at the data time stamps given a parameter file.

Note that the antenna response applied to the signal will be that after averaging over the data time step (e.g., if a time step of 30 minutes is used then the antenna response will be the average of ±15 minutes around the timestamp). However, it assumes other variations are slower, so it does not average out any differences in the phase evolution between the heterodyne parameters and any injected parameters (if specified as different) and just produced a point estimate at the data timestamp.

Note

If generating a simulated signal in IPython or a Jupyter notebook, this function can be very slow (taking minutes compared to a fraction of a second), due to redirections of stdout/stderr in the SWIG-wrapped LIGOTimeGPS class. To avoid this the call to make_signal() should be either done within a context manager, e.g.,:

import lal

hetdata = HeterodynedData(...)

with lal.no_swig_redirect_standard_output_error():
    hetdata.make_signal(...)

or by globally disabling redirection with:

import lal

lal.swig_redirect_standard_output_error(True)

hetdata = HeterodynedData(...)
hetdata.make_signal(...)
Parameters:

signalpar (str, PulsarParameters) – A parameter file or object containing the parameters for the simulated signal.

Returns:

A complex numpy.ndarray containing the signal.

Return type:

array_like

merge(other, sort=True)#

Merge another class:~cwinpy.data.HeterodynedData with the current one in-place. The times series will be sorted to be in ascending order if required.

Parameters:
  • other (class:~cwinpy.data.HeterodynedData) – Another class:~cwinpy.data.HeterodynedData object.

  • sort (bool) – Sort the merged data in ascending time order.

property num_chunks#

The number of chunks into which the data has been split.

property outlier_mask#

Masking array to remove outliers.

property outlier_thresh#

The modified z-score threshold for removing outliers (see find_outliers()).

property outliers_removed#

Return a boolean stating whether outliers have been removed from the data set or not.

periodogram(plot=True, ax=None, remove_outliers=False, thresh=3.5, fraction_labels=True, fraction_label_num=4, figsize=(6, 5), labelsize=None, labelname=None, fontsize=None, fontname=None, legendsize=None, **plotkwargs)#

Compute and plot a two-sided periodogram of the data using scipy.signal.periodogram(). Note that this uses zero-padded uniformly sampled data, rather than using the Lomb-Scargle method (such as astropy.stats.LombScargle) that can deal with data with gaps, but doesn’t work for complex data.

See spectrogram() for input parameters, excluding dt, window and overlap. The default figure size is (6, 5).

Parameters:

plotkwargs – Keyword parameters for matplotlib.pyplot.plot().

Returns:

  • array_like – The frequency series

  • array_like – The periodogram power

  • figure – The Figure is a plot is requested.

plot(which='abs', figsize=(12, 4), ax=None, remove_outliers=False, thresh=3.5, zero_time=False, labelsize=None, fontsize=None, legendsize=None, fontname=None, labelname=None, **plotkwargs)#

Plot the data time series.

Parameters:
  • which (str, 'abs') – Say whehther to plot the absolute value of the data, 'abs', the 'real' component of the data, the 'imag' component of the data, or 'both' the real and imaginary components.

  • figsize (tuple, (12, 4)) – A tuple with the size of the figure. Values set in rcparams will override this value.

  • ax (Axes) – A matplotlib.axes.Axes onto which to add the figure.

  • remove_outliers (bool, False) – Set whether to remove outlier for the plot. If outliers removal was already specified when creating the HeterodynedData object they will automatically not be included in the plot.

  • thresh (float, 3.5) – The threshold for outlier removal (see find_outliers()).

  • zero_time (bool, False) – Start the time axis at zero.

  • labelsize (int) – Set the fontsize for the axes tick labels.

  • fontsize (int) – Set the fontsize for the axes labels.

  • legendsize (int) – Set the fontsize for the legend (defaults to be the same as the value or fontsize).

  • fontname (str) – Set the font name for the axes labels and legend.

  • labelname (str) – Set the font name for the axes tick labels. If not set, this will default to the value given in fontname.

  • plotkwargs – Keyword arguments to be passed to matplotlib.pyplot.plot().

Returns:

The matplotlib.figure.Figure containing the plot.

Return type:

figure

Examples

To plot both the real and imginary data one would do:

>>> import numpy as np
>>> from cwinpy import HeterodynedData
>>> # create some fake data (as an example)
>>> times = np.linspace(1000000000., 1000086340., 1440)
>>> het = HeterdynedData(times=times, fakeasd=1e-48)
>>> # plot real data
>>> fig = het.plot(which='both')
power_spectrum(plot=True, ax=None, remove_outliers=False, thresh=3.5, fraction_labels=True, fraction_label_num=4, average='median', dt=86400, figsize=(6, 5), labelsize=None, labelname=None, fontsize=None, fontname=None, legendsize=None, window=None, overlap=0.5, asd=False, **plotkwargs)#

Compute and plot the power spectral density of the data. This computes the spectrogram, and averages the power over time.

See spectrogram() for input parameters. The default figure size is (6, 5).

Parameters:
  • average (str, 'median') – The method by which to “average” the spectrum in time. This can be ‘median’ (the default), ‘mean’, ‘harmonic_mean’, ‘max’ (return the maximum) or ‘min’ (return the minimum).

  • asd (bool) – If True, the amplitude spectral density will be returned rather than the power spectrum.

  • plotkwargs – Keyword parameters for matplotlib.pyplot.plot().

Returns:

  • array_like – The frequency series

  • array_like – The power spectrum

  • figure – The Figure is a plot is requested.

classmethod read(source, *args, **kwargs)#

Read in a time series of data from a given file or list of files. Currently this supports ascii text files as described for the HeterodynedData class or HDF5 files.

See gwpy.timeseries.TimeSeries.read() for more information.

The available built-in formats are:

Format

Read

Write

Auto-identify

csv

Yes

Yes

Yes

h5

Yes

No

No

hdf

Yes

Yes

Yes

hdf5

Yes

Yes

Yes

txt

Yes

Yes

Yes

txt.gz

Yes

Yes

Yes

remove(idx)#

Create a mask to effectively remove values at given indices from the object. This will recalculate the Bayesian Blocks data splitting and variances if required.

Parameters:

idx (int, array_like) – A list of indices to remove.

remove_outliers(thresh=3.5)#

Remove any outliers from the object using the method described in cwinpy.data.HeterodynedData.find_outliers().

Parameters:

thresh (float) –

property running_median#

A ndarray containing the running median of the data.

segment_list()#

Using the heterodyned data timestamps, generate a list of segments (i.e., set of times between which there are uniformly spaced time stamps). This is returns as a list of 2-lists, with each item giving the start and end time (in GPS seconds) of that segment.

set_ephemeris(earth=None, sun=None, time=None)#

Set the solar system ephemeris and time correction files.

Parameters:
  • earth (str, dict, None) – The Earth ephemeris file used for the phase model. Defaults to None, in which case the ephemeris files will be determined from the pulsar parameter file information. If a dictionary is passed, it should be keyed to ephemeris types (e.g., DE405) and point to the equivalent file. The type in the par file will be used to determine which key to use.

  • sun (str, dict, None) – The Sun ephemeris file used for the phase model. Defaults to None, in which case the ephemeris files will be determined from the pulsar parameter file information. If a dictionary is passed, it should be keyed to ephemeris types (e.g., DE405) and point to the equivalent file. The type in the par file will be used to determine which key to use.

  • time (str, dict, None) – The time correction ephemeris file used for the phase model. Defaults to None, in which case the ephemeris files will be determined from the pulsar parameter file information. If a dictionary is passed, it should be keyed to unit type (e.g., TCB) and point to the equivalent file. The type in the par file will be used to determine which key to use.

signal_snr(signalpar)#

Get the signal-to-noise ratio of a signal based on the supplied parameter file.

Parameters:

signalpar (str, PulsarParameters) – A parameter file or object containing the parameters for the simulated signal.

Returns:

The signal-to-noise ratio.

Return type:

float

spectrogram(dt=86400, window=None, overlap=0.5, plot=True, ax=None, remove_outliers=False, thresh=3.5, fraction_labels=True, fraction_label_num=4, figsize=(12, 4), labelsize=None, fontsize=None, fontname=None, labelname=None, legendsize=None, **plotkwargs)#

Compute and plot a spectrogram from the data using the matplotlib.mlab.specgram() function.

Parameters:
  • dt ((float, int)) – The length of time (in seconds) for each spectrogram time bin. The default is 86400 seconds (i.e., one day).

  • window ((callable, np.ndarray)) – The window to apply to each FFT block. Default is to use scipy.signal.tukey() with the alpha parameter set to 0.1.

  • overlap ((float, int)) – If a floating point number between [0, 1) this gives the fractional overlap between adjacent FFT blocks (which defaults to 0.5, i.e., a 50% overlap). If an integer of 1 or more this is the number of points to overlap between adjacent FFT blocks (this is how the argument is used in specgram()).

  • plot (bool, True) – By default a plot of the spectrogram will be produced (this can be plotted on a supplied Axes or Figure), but the plotting can be turned off if this is set to False.

  • ax ((axes, figure)) – If ax is a matplotlib.axes.Axes then the spectrogram will be plotted on the supplied axis.

  • remove_outliers (bool, False) – Set to True to remove outliers points before generating the spectrogram. This is not required if the class was created with the remove_outliers keyword already set to True.

  • thresh (float, 3.5) – The modified z-score threshold for outlier removal (see find_outliers()).

  • fraction_labels (bool, True) – Set to True to output the frequency labels on the plot as fractions.

  • fraction_label_num (int, 4) – The fraction labels will be spaced at Fs/fraction_label_num intervals, between the upper and lower Nyquist values. The default if 4, i.e., spacing will be at a quarter of the Nyquist frequency.

  • figsize (tuple, (12, 4)) – A tuple containing the size (in inches) to set for the figure.

  • labelsize (int) – Set the fontsize for the axes tick labels.

  • fontsize (int) – Set the fontsize for the axes labels.

  • legendsize (int) – Set the fontsize for the legend (defaults to be the same as the value or fontsize).

  • fontname (str) – Set the font name for the axes labels and legend.

  • labelname (str) – Set the font name for the axes tick labels. If not set, this will default to the value given in fontname.

  • plotkwargs – Keyword arguments for matplotlib.pyplot.imshow().

Returns:

  • array_like – A numpy.ndarray of frequencies for the spectrogram

  • array_like – A 2d numpy.ndarray of the spectrogram power at each frequency and time

  • array_like – A numpy.ndarray of the central times of each FFT in the spectrogram.

  • figure – The Figure containing the spectrogram plot. This is not returned if plot is set to False.

property stds#

The standard deviations of the data points.

subtract_running_median()#

Subtract the running median from the data.

Returns:

A ndarray array containing the data with with running median subtracted.

Return type:

array_like

property times#

Positions of the data on the x-axis

Type:

~astropy.units.Quantity array

property tottime#

The total time (in seconds) of the data.

property vars#

The variances of the data points.

property window#

The running median window length.

write(target, *args, **kwargs)#

Write this HeterodynedData object to a file.

The available built-in formats are:

Format

Read

Write

Auto-identify

csv

Yes

Yes

Yes

h5

Yes

Yes

No

hdf

Yes

Yes

Yes

hdf5

Yes

Yes

Yes

txt

Yes

Yes

Yes

txt.gz

Yes

Yes

Yes

Data references#