############
Sky-shifting
############

CWInPy can be used to produce :ref:`posterior probability distributions<Known pulsar parameter
estimation>` for the unknown signal parameters of a continuous gravitational-wave source, i.e., a
pulsar. These allow you to constrain, for example, the gravitational-wave amplitude, or orientation
of the source. However, before making inferences about a particular source's parameters it is useful
to be sure that the signal you are observing is actually astrophysical in origin. Gravitational-wave
detector data contains many non-astrophysical artifacts and also its properties vary over time,
i.e., the noise is the data is not drawn from a purely Gaussian `non-stationary process
<https://en.wikipedia.org/wiki/Stationary_process>`_ (see, e.g., [1]_). Some of these artifacts are
narrow-band spectral line features, which can wander across the frequency band of an expected signal
and potentially mimic a continuous signal to some extent [2]_. Even Gaussian noise can occasionally
conspire to look slightly "signal-like", i.e., produce posteriors in amplitude that peak away from
zero and have some coherent structure detectors.

It is therefore useful to have some way to assess whether a signal is likely to be a detector
artifact or not, i.e., compare some statistic generated from the data containing the signal to a
known background distribution of that statistic for realistic noise-only detector data. We
unfortunately cannot turn signals off or shield the detector from them to work out a noise-only
background. Given a continuous signal is at a specific frequency, the noise-only background could
potentially be generated by looking at many different frequencies. However, because the detector
noise amplitude is very frequency dependent and the spectral features vary across the band, this
means different frequencies are not very representative of the noise around our signal of interest.
This is where sky-shifting [3]_ comes in.

For a long-duration signal, its exact phase evolution is very dependent on its sky location, due to
the Doppler and relativistic modulations required for that location. Therefore, for a particular
source you can completely decohere the signal by :ref:`heterodyning<Heterodyning data>` it at the
wrong sky location. This process is slightly analogous to process of "time shifting" used to
establish the significance of transient gravitational-wave signals (see, e.g., Section 3.6 of [4]_).
If one performs the heterodyne with many random sky shifts and in each case then performs parameter
estimation on the signal and calculates an appropriate statistic, e.g., a coherent versus incoherent
Bayesian odds as given in Equation (32) of [5]_, then the distribution of this statistic can be
compared with that from the un-shifted sky location.

The sky-shifting process works as follows:

#. Perform a "coarse" heterodyne, i.e., heterodyne the data without accounting for any Doppler
   corrections and just using terms of the Taylor expansion in the frequency evolution (this is the
   first stage of the :ref:`"two-stage" heterodyne approach<Example: two stage heterodyne>`).
   Filter and downsample the data, but making sure the filter is wide enough to accommodate the
   Doppler modulation of the source.
#. Randomly generate a number of new sky locations in the same ecliptic
   hemisphere as the source. For each of these new (sky-shifted, *off-source* or *background*)
   locations and the original (*on-source* or *foreground*) location, perform an additional
   heterodyne of the "coarse" data (this is the second stage of the :ref:`"two-stage" heterodyne
   approach<Example: two stage heterodyne>`), using the expected Doppler modulation for that
   position.
#. Perform :ref:`parameter estimation<Known pulsar parameter estimation>` on the heterodyned data
   for each sky location (including the original un-shifted data) and calculate the Bayesian odds
   (assuming a nested sampling algorithm has been used).
#. Histogram the distribution of sky-shifted (*off-source* or *background*) odds values and perform
   a kernel density estimate/distribution fit to compare that distribution to the value from the
   true un-shifted (*on-source*) location.

To make this process easy, CWInPy provides the ``cwinpy_skyshift_pipeline`` script, which sets up
the full :ref:`known pulsar analysis pipeline<Known pulsar analysis pipeline>` for an individual
source. This can either take some "Quick setup" :ref:`command-line arguments<Sky-shift command-line
arguments>` to run on open data with some default settings or can take a
``cwinpy_knope_pipeline``-style :ref:`configuration file<Configuration file>`. It will launch a
HTCondor DAG for the whole process except production of the odds distributions, which must be done
manually using the :func:`~cwinpy.knope.skyshift.skyshift_results` function.

Using the default settings, the pipeline will generate the following directory tree structure:

.. code-block:: bash

   skyshift
    ├── prior.txt          # file containing the prior to use for parameter estimation       
    ├── cache              # directory containing files listing the gravitational-wave data frames for each detector
    ├── configs            # directory containing all the configuration files for cwinpy_heterodyne and cwinpy_pe
    ├── log                # directory containing all the HTCondor job log files
    ├── pulsars            # directory containing pulsar parameter (.par) files for each sky-shifted location
    ├── results            # directory containing the parameter estimation output for each source
    |    ├── pulsar1       # directory containing the parameter estimation output for the first location
    |    ├── pulsar2       # directory containing the parameter estimation output for the second location
    |    └── ...
    ├── segments           # directory containing the data science segments to use for each pulsar
    ├── stage1             # directory containing the output of the first stage heterodyne for each detector
    |    ├── detector1     # directory containing the output of the first stage heterodyne for first detector
    |    ├── detector2     # directory containing the output of the first stage heterodyne for second detector
    |    └── ...
    ├── stage2             # directory containing the output of the second stage heterodyne for each detector and source location
    |    ├── detector1     # directory containing the output of the second stage heterodyne for first detector
    |    ├── detector2     # directory containing the output of the second stage heterodyne for second detector
    |    └── ...
    └── submit             # directory containing the HTCondor DAG file and submit files

Sky-shift command-line arguments
--------------------------------

The command line arguments for ``cwinpy_skyshift_pipeline`` can be found using:

.. command-output:: cwinpy_skyshift_pipeline --help

Example: hardware injection
===========================

An easy way to test the sky-shifting analysis is by looking at one of the hardware injection
signals. The ``cwinpy_skyshift_pipeline`` script below sets up the analysis to run on O1 data for
the ``PULSAR03`` injection with 500 sky-shifts. By default this will run with both the LIGO
detectors, H1 and L1, with parameter estimation performed both coherently with both detectors and on
each of the individual detectors.

.. code-block:: bash

   cwinpy_skyshift_pipeline --run O1 --pulsar PULSAR03 --nshifts 500 --accounting-group aluk.dev.o1.cw.targeted.bayesian

The above line automatically launches the HTCondor jobs that run the analysis.

.. note::

   After the pipeline completes, there will be many "resume" files that can take up a lot of space.
   It is worth moving into the ``results`` directory and deleting these, e.g.:

   .. code-block::

      rm */*.pickle

Spectra and parameter estimation
--------------------------------

To show that the sky-shifting truly does decohere any signal, we can look at spectra and the final
parameter estimation for the un-shifted result compared to a sky-shifted result. The spectra can be
plotted (from the ``skyshift`` directory) with:

.. code-block:: python

   import os
   import lal
   from cwinpy import HeterodynedData
   from matplotlib import pyplot as plt

   hetdir = "stage2"

   dets = ["H1", "L1"]
   times = {"H1": "1129136736-1137253524", "L1": "1126164689-1137250767"}

   onsource = "JPULSAR03"
   offsource = "JPULSAR03A"

   fig, ax = plt.subplots(1, 2, figsize=(9, 5))

   for i, det in enumerate(dets):
       # on source heterodyned data
       ondata = HeterodynedData.read(os.path.join(hetdir, det, f"heterodyne_{onsource}_{det}_2_{times[det]}.hdf5"))
       ondata.power_spectrum(remove_outliers=True, dt=int(lal.DAYSID_SI * 10), label="on-source", lw=2, color="k", ax=ax[i])

       # off-source (sky-shifted) data
       offdata = HeterodynedData.read(os.path.join(hetdir, det, f"heterodyne_{offsource}_{det}_2_{times[det]}.hdf5"))
       offdata.power_spectrum(remove_outliers=True, dt=int(lal.DAYSID_SI * 10), label="off-source", alpha=0.8, ls="--", ax=ax[i])

       ax[i].set_title(f"{det}")
       ax[i].legend(loc="upper right")

   fig.tight_layout()
   fig.savefig(f"skyshift_hwinj_spectrum.png", dpi=200)

.. thumbnail:: skyshifting/skyshift_hwinj_spectrum.png
   :width: 600px
   :align: center

It is obvious from these that the strong hardware injection signal has been wiped out as a result of
heterodyning using the wrong sky position.

The posterior probability distributions on the signal parameters, for the joint H1 and L1 analysis,
can be plotted with:

.. code-block:: python

   import os

   from cwinpy.info import HW_INJ
   from cwinpy.plot import Plot

   resdir = "results"

   onsource = "JPULSAR03"
   offsource = "JPULSAR03A"
   pulnum = 3

   for title, pulsar in zip(["on-source", "off-source"], [onsource, offsource]):
       resfile = os.path.join(resdir, pulsar, f"cwinpy_pe_H1L1_{pulsar}_result.hdf5")

       p = Plot(
           resfile,
           parameters=["h0", "iota", "phi0", "psi"],
           plottype="corner",
           pulsar=(HW_INJ["O1"]["hw_inj_files"][pulnum] if pulsar == "JPULSAR03" else None),
       )

       fig = p.plot()
       p.fig.suptitle(f"{title}")
       p.save(f"{pulsar}_H1L1_plot.png", dpi=200)
       p.fig.close()

.. thumbnail:: skyshifting/JPULSAR03_H1L1_plot.png
   :width: 275px
   :align: left
   :group: injection

.. thumbnail:: skyshifting/JPULSAR03A_H1L1_plot.png
   :width: 275px
   :align: right
   :group: injection

where the left image shows the "on-source" posteriors and the right the "off-source" (aka
sky-shifted) posteriors.

Odds distribution
-----------------

The distribution of odds values can be plotted in a variety of different ways using the
:func:`~cwinpy.knope.skyshift.skyshift_results` function. This can either plot the distribution in
histogram form or on ths sky (if the `healpy <https://healpy.readthedocs.io/>`__ package is
installed more options are available). :func:`~cwinpy.knope.skyshift.skyshift_results` will
calculate the odds using the :func:`~cwinpy.pe.peutils.results_odds` function for all the results
produced by the sky-shift pipeline. As there are two detectors in this analysis, we can use the
"coherent versus incoherent" odds as a way to assess the significance of the hardware injection
signal.

A histogram of the background, sky-shifted, odds, can be plotted with the odds of on-source sky
position shown as a dashed vertical line, using (from the ``skyshift`` directory):

.. code-block:: python

   from cwinpy.info import HW_INJ
   from cwinpy.knope.skyshift import skyshift_results

   s, t, fig = skyshift_results(
       "pulsars",
       HW_INJ["O1"]["hw_inj_files"][3],  # location of parameter file for hardware injection 03
       "results",
       plot="hist",
   )

   fig.show()

.. thumbnail:: skyshifting/oddshist1.png
   :width: 600px
   :align: center

By default, this shows the "coherent versus incoherent" odds and uses a logarithmic scale for the
y-axis. The ``s`` variable will hold a 3x500 array containing the right ascension, declination and
odds for each sky-shifted source, while ``t`` will be tuple of these values for the on-source
location. These can be passed to :func:`~cwinpy.knope.skyshift.skyshift_results` rather than using
the results directories if wanted to make further plots.

For example, we can plot the `complementary cumulative distribution function
<https://en.wikipedia.org/wiki/Cumulative_distribution_function#Complementary_cumulative_distribution_function_(tail_distribution)>`__
(aka the `survival function <https://en.wikipedia.org/wiki/Survival_function>`__) and include a fit
to a `gamma distribution <https://en.wikipedia.org/wiki/Gamma_distribution>`__, with:

.. code-block:: python

   _, _, fig = skyshift_results(s, t, plot="invcdf", dist="gamma")
   fig.show()

.. thumbnail:: skyshifting/oddshist2.png
   :width: 600px
   :align: center

The plot displays the probability of the background producing an odds value equal to or greater than
that from the on-source location. In this case that probability is essentially zero. However, it is
worth noting that the background distribution appears to diverge from a gamma distribution in the
tail.

Sky distribution
----------------

The :func:`~cwinpy.knope.skyshift.skyshift_results` function can also be used to show the
distribution of odds values over the sky. If you do not have `healpy
<https://healpy.readthedocs.io/>`__ installed, you can plot this using a
:func:`matplotlib.pyplot.hexbin` plot, assuming ``s`` and ``t`` have already been produced as
:ref:`above<Odds distribution>` with:

.. code-block:: python

   _, _, fig = skyshift_results(s, t, plot="hexbin")
   fig.show()

.. thumbnail:: skyshifting/skydist1.png
   :width: 600px
   :align: center

If you have `healpy <https://healpy.readthedocs.io/>`__ installed you can plot "hammer", "lambert",
"mollweide", "cart", or "aitoff" projection plots, e.g.,

.. code-block:: python

   _, _, fig = skyshift_results(s, t, plot="mollweide")
   fig.show()

.. thumbnail:: skyshifting/skydist2.png
   :width: 600px
   :align: center

This shows how only half the sky is covered when generating the sky-shifted positions.

Odds versus signal-to-noise ratio
---------------------------------

It can be useful to compare how the odds for each sky location are correlated with the recovered
signal-to-noise ratio (SNR). This can be done using the :func:`~cwinpy.pe.peutils.plot_snr_vs_odds`
function, which makes use of the :func:`~cwinpy.pe.peutils.optimal_snr` function. This will find the
maximum a-posteriori, or maximum likelihood, sample from the parameter estimation for each sky
location and calculate the optimal matched filter signal-to-noise ratio of a signal with those
parameters. The :func:`~cwinpy.pe.peutils.plot_snr_vs_odds` can take in the heterodyned data
directory and parameter estimation directory and compute odds and SNR itself, but below we will
assume you use the :func:`~cwinpy.pe.peutils.results_odds` and
:func:`~cwinpy.pe.peutils.optimal_snr` functions to given values that can be reused.

.. code-block:: python

   from cwinpy.pe.peutils import optimal_snr, plot_snr_vs_odds, results_odds

   # get dictionary of odds ratios
   odds = results_odds("results", oddstype="cvi")

   # get SNRs (this may take some time, so save the output if you want to re-use it!)
   snrs = optimal_snr("results", "stage2")

   # extract the coherent H1-L1 SNRs
   snrsjoint = {p: snrs[p]["H1L1"] for p in snrs}

   # plot the on-source values
   fig, ax = plot_snr_vs_odds(snrsjoint, odds, oddstype="cvi", scatterc="odds")
   
   # highlight on-source value
   ax.plot(snrsjoint["JPULSAR03"], odds["JPULSAR03"], marker="o", c="m", ms=15, mfc="none")
   
   fig.show()

.. thumbnail:: skyshifting/snrsvsodds1.png
   :width: 600px
   :align: center

.. note::

   When you have lots of sky-shifts the :func:`~cwinpy.pe.peutils.optimal_snr` function can take
   10s of minutes to compute, so you will have to be patient!

Example: analysis outlier
=========================

In the search for gravitational-wave signals from pulsars using LIGO O1 data [6]_, the coherent
versus incoherent signal odds for pulsar J1932+17 was an outlier when compared to these for the rest
of the pulsars in the search sample. The sky-shifting method can be used to assess the significance
of the outlier. In this case, as in [3]_, a log-uniform prior will be used on the gravitational-wave
amplitude :math:`h_0` (this is the uninformative Jeffreys prior for a scale parameter such as
:math:`h_0`), and therefore a configuration file must be used to set up the
``cwinpy_skyshift_pipeline`` rather than just using the "quick setup" options. The following
configuration file, called ``outlier.ini``, is used to run on O1 data for both LIGO detectors:

.. literalinclude:: skyshifting/exampleconfig.ini
   :language: ini

which points to a ``prior.txt`` file containing:

.. literalinclude:: skyshifting/prior.txt
   :language: text

and a pulsar parameter file, as used by the O1 analysis based on observations from the `Jodrell Bank
Observatory <https://www.jodrellbank.manchester.ac.uk/research/groups/pulsars/>`__, containing:

.. literalinclude:: skyshifting/J1932+17.par
   :language: text

The analysis was the set up to run 500 sky-shifts with:

.. code-block:: bash

   cwinpy_skyshift_pipeline --nshifts 500 outlier.ini

When using a configuration file the HTCondor jobs do not get automatically submitted unless
specified in the file. Therefore, in this case the jobs were subsequently submitted using:

.. code-block:: bash

   condor_submit_dag submit/cwinpy_skyshift.dag

As in the :ref:`previous example<Example: hardware injection>`, we can plot the distribution of odds
values. To get a histogram with the 

.. code-block:: python

   from cwinpy.knope.skyshift import skyshift_results

   s, t, fig = skyshift_results(
       "pulsars",
       "J1932+17.par",
       "results",
       plot="invcdf",
       kde=True,
   )
   fig.show()

.. thumbnail:: skyshifting/oddshistoutlier.png
   :width: 600px
   :align: center

This overplots a kernel density estimate of the distribution (the gamma distribution used
:ref:`above<Odds distribution>` does not provide a good fit in this case). It can be seen that
despite the on-source value being an outlier, it is still within the distribution of background
values and there are in fact larger odds in the background. The background distrbution suggests that
there is about a 2% chance of getting an odds value equal or higher to the on-source value. In a
search for hundreds of pulsars it would therefore be unsurprising to find such an outlier.

We can see how these odds values are distributed on the sky with:

.. code-block:: python

   _, _, fig = skyshift_results(s, t, plot="mollweide")

   fig.show()

.. thumbnail:: skyshifting/skydistoutlier.png
   :width: 600px
   :align: center  

As in the :ref:`previous example<Odds versus signal-to-noise ratio>`, we can look at the
distribution of odds values plotted against the recovered signal-to-noise ratios. As above, we
can calculate the SNRs using the (default) maximum a-posteriori sample:

.. code-block:: python

   from cwinpy.pe.peutils import optimal_snr, plot_snr_vs_odds, results_odds

   oddsoutlier = results_odds("results", oddstype="cvi")
   snrsoutlier = optimal_snr("results", "stage2")

   snrsjoint = {p: snrsoutlier[p]["H1L1"] for p in snrsoutlier}
   fig, ax = plot_snr_vs_odds(snrsjoint, oddsoutliers, oddstype="cvi", scatterc="odds", xscale="log")

   # highlight on-source value
   ax.plot(snrsjoint["J1932+17"], oddsoutliers["J1932+17"], marker="o", c="m", ms=15, mfc="none")

   fig.show()

.. thumbnail:: skyshifting/snrsvsoddsoutlier.png
   :width: 600px
   :align: center

In the above plots, we've used a log-scale for the SNR axis due to the odd-looking SNR distribution.
This SNR distribution is a product of using the log-uniform prior on the signal amplitude
:math:`h_0`, causing most of the maximum a-posteriori values to be peaked close to zero. Even so, we
see that the outlier signal is within a cluster of cases with SNR values above around 3, although it
is not a unique outlier. We can instead switch to plotting the SNR generated using the maximum
likelihood sample to give:

.. code-block:: python
   
   snrsoutliermaxL = optimal_snr("results", "stage2", which="likelihood")
   snrsjoint = {p: snrsoutlier[p]["H1L1"] for p in snrsoutliermaxL}
   fig, ax = plot_snr_vs_odds(snrsjoint, oddsoutliers, oddstype="cvi", scatterc="odds", xscale="log")

   # highlight on-source value
   ax.plot(snrsjoint["J1932+17"], oddsoutliers["J1932+17"], marker="o", c="m", ms=15, mfc="none")

   fig.show()

.. thumbnail:: skyshifting/snrsvsoddsoutliermaxl.png
   :width: 600px
   :align: center

Sky-shifting API
================

.. automodule:: cwinpy.knope.skyshift
   :members: 

Sky-shifting references
-----------------------

.. [1] `D. Davis et al
    <https://ui.adsabs.harvard.edu/abs/2021CQGra..38m5014D/abstract>`_, *CQG*, **38**, 135014 (2021).

.. [2] `P. Covas et al
    <https://ui.adsabs.harvard.edu/abs/2018PhRvD..97h2002C/abstract>`_, *PRD*, **97**, 082002 (2018).

.. [3] `M. Isi, S. Mastrogiovanni, M. Pitkin & O. J. Piccinni
    <https://ui.adsabs.harvard.edu/abs/2020PhRvD.102l3027I/abstract>`_, *PRD*, **102**, 123027 (2020).

.. [4] `S. A. Usman et al
    <https://ui.adsabs.harvard.edu/abs/2016CQGra..33u5004U/abstract>`_, *CQG*, **33**, 215004 (2016).

.. [5] M. Pitkin, M. Isi, J. Veitch & G. Woan, `arXiv:1705.08978v1
    <https://arxiv.org/abs/1705.08978v1>`_ (2017).

.. [6] `B. P. Abbott et al <https://ui.adsabs.harvard.edu/abs/2017ApJ...839...12A/abstract>`_,
    *ApJ*, **839**, 12 (2017).