## S-PASS Cosmic-web Paper Accepted

Our paper on the cosmic web has now been accepted to the Monthly Notices of the Royal Astronomical Society (MNRAS). Although we didn’t detect any significant emission from the cosmic web, we were able to set primordial magnetic field limits comparable to the state-of-the-art CMB limits. You can find the paper on arXiv here.

## MWA Cosmic-web paper

I’m excited to say that the paper describing the cross-correlation of the low-frequency sky with tracers of the cosmic web has now been accepted for publication! Tessa Vernstrom did a wonderful job leading it; these are the tightest constraints on the synchrotron cosmic web yet.

Can we teach a machine to recognize what we can’t? This may sound like an obvious yes to astronomers, because astronomers are constantly working in regimes where the signal is the same order of magnitude as the noise, and we often need to manipulate the data to extract a measurement that was not apparent to our own eyes. To someone working in machine learning, where the goal is often to teach (or train) a machine to perform simple recognition tasks that humans do effortlessly, the answer to this question would probably be “not without a lot of data”, and they’d be right. What happens, however, when we don’t have enough examples of a type of object to train on, or, even scarier, when the experts have a difficult time recognizing the objects even when they know they are there?

My colleagues and I are facing this problem right now. To discuss the problem, let me give a little background.

The newly constructed Australian Square Kilometre Array Pathfinder (ASKAP) will be one of the most powerful survey radio telescopes in the world, operating between 700 and 1800 MHz in both continuum (total brightness) and spectral-line (brightness as a function of frequency) modes. The telescope will conduct a dedicated polarization survey (called POSSUM), with the primary goal of detecting Faraday rotation measures (RMs) from background radio sources. The RM, which measures the integrated product of the thermal electron density and the line-of-sight magnetic field, can hold the key to unraveling several mysteries of cosmic magnetism. One problem we have in preparing a catalog of RMs for polarized sources is distinguishing between sources that are simple (“Faraday thin”) and those that are more complex, with multiple components or “Faraday thick” structure. Below I show an example of a simple Faraday thin source, where the polarization angle of the (radio) light has been rotated by a simple cloud of material.

The problem is that it has been shown that a polarized source with two closely spaced Faraday thin components (which is thus complex) can look like a single Faraday thin source, and further, the RM that is measured will be different from the two individual RMs (and it might not be their average either). I’ll describe next how I think we can approach this problem with deep convolutional neural networks.
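As a toy illustration of that ambiguity (component RMs, amplitudes, and angles all invented), here are two closely spaced Faraday thin components whose sum still admits a straight-line angle fit:

```python
import numpy as np

c = 2.998e8
freqs = np.linspace(700e6, 1800e6, 200)
lam2 = (c / freqs) ** 2

# Two Faraday thin components with made-up RMs (10 and 25 rad/m^2)
P = (0.06 * np.exp(2j * (10.0 * lam2)) +
     0.03 * np.exp(2j * (25.0 * lam2 + 0.5)))

# Fit a single angle-vs-lambda^2 line, as one would for a thin source
chi = np.unwrap(2 * np.angle(P)) / 2
rm_fit = np.polyfit(lam2, chi, 1)[0]
print(f"apparent single-component RM = {rm_fit:.1f} rad/m^2")
```

The single fitted value need not equal either input RM, and it changes with the observing band, which is exactly the hazard for an automated RM catalog.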

## Astrophysical Machine Learning Course

I’m excited to say that I’ll be teaching an impromptu course this spring on “Astrophysical Machine Learning”. It’s impromptu because I didn’t expect to teach it; it just worked out that a lot of students in my engineering physics course this fall got interested when I showed my work in class. Right now there are only about six students enrolled, with several more sitting in. I’m putting the course webpage online here, and the students will be sharing their work (with each other at first) on Github. I haven’t used Github for any of my work yet, so I’m excited to learn as we progress.

A side benefit for me (and the students) is that we’ll quickly break into groups and work on real research problems, many of them centered around ML applications for source-finding and classification of radio sources. I’m the chair of the “Cosmic-Web” Key Science Project for the Evolutionary Map of the Universe (EMU) survey to be conducted with the ASKAP telescope. Of particular interest to me are source-finding algorithms for diffuse sources (see below), where it is often difficult to find and characterize them when there are embedded compact sources. Below is an example of a diffuse source (a simple cluster radio halo) with background point sources embedded within, taken from a simulation of what the EMU survey will be capable of. Early science for ASKAP is happening right now, so the time is right to test some of this out on real data!


## Constraining Primordial Magnetic Fields with S-PASS

I’ve just submitted to MNRAS my newest project, conducted as part of the S-band All Sky Polarization Survey (S-PASS). We’ve taken the 2.3 GHz total intensity (just radio brightness) map of the southern sky provided by S-PASS and performed a cross-correlation with a model of the cosmic web (obtained from a constrained magnetohydrodynamic simulation of the local Universe, see below) in order to set an upper limit on the cosmological magnetic fields in filaments of B < 0.03 $\mu$G (B < 0.13 $\mu$G if you take a density-weighted average). As a side note, we were also able to infer an upper limit on the primordial magnetic field of B$_{PMF}$ < 1.0 nG! This limit is better than that obtained by Planck, and on par with the recent combined Planck + South Pole Telescope limit, though it’s highly model dependent. An advance peek can be found here (paper). An image (from the paper) of the model radio emission is shown below; the simulations were done by Klaus Dolag.
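The essence of the measurement, with all the real-world detail stripped away, is a cross-correlation: ask how much of the model map is present in the data map. Here is a toy sketch (random fields standing in for the S-PASS map and the simulated cosmic-web template; the amplitude and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a "model" template map, and a data map that contains
# the template at a small amplitude, buried in much larger noise
model = rng.normal(size=(256, 256))
true_amp = 0.05
data = true_amp * model + rng.normal(scale=1.0, size=model.shape)

# Zero-lag cross-correlation estimate of the amplitude: <d m> / <m m>
amp_hat = np.sum(data * model) / np.sum(model * model)
print(f"estimated amplitude = {amp_hat:.3f} (true value {true_amp})")
```

Even though the template is invisible by eye in the noisy data, the correlation recovers its amplitude; an upper limit on that amplitude is what translates into an upper limit on the field strength.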

## Information Field Theory II

Last time we calculated a “map” of our signal, which was the expectation value of $s$ with respect to the a posteriori probability $P(s|d)$, and I stated that there were several ways of doing it. One important method is to introduce the notion of a generating function into the definition of the partition function $Z$:

$Z_d[J] \equiv \int \!\! \mathcal{D}s~ e^{-H[s] + J^{\dagger}s}$

where $Z_d[0]=Z$. In this way we can calculate any higher moment of the signal field via differentiation:

$\langle s(x_1) \cdots s(x_n) \rangle = \frac{1}{Z}\frac{\delta^n Z_d[J]}{\delta J(x_1) \cdots \delta J(x_n)} \vert_{J=0}$.

We should also introduce the connected correlation functions, which subtract the contributions of lower moments from the full correlation functions:

$\langle s(x_1) \cdots s(x_n) \rangle_c = \frac{\delta^n ln(Z_d[J])}{\delta J(x_1) \cdots \delta J(x_n)} \vert_{J=0}$.

In this way, the map is now the first derivative

$m_d = \langle s \rangle_c =\frac{\delta ln(Z_d[J])}{\delta J(x)} \vert_{J=0} = Dj$.

I’m still leaving the evaluation of the integral as an exercise, mostly because I don’t want to type out the LaTeX! The important thing is that the generating function is what we’ll use once the Gaussian “free theory” has extra terms added on, and we will use Wick’s theorem to calculate the perturbation expansion in terms of the two-point correlation function $\langle s(x) s(y) \rangle_c = D(x,y)$, which is just the propagator of the free theory.
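(Spoiling the exercise slightly: carrying out the Gaussian integral by completing the square in the exponent, with the free-theory conventions from part I, gives

$Z_d[J] = \vert 2\pi D \vert^{1/2}\, e^{-H_0} \exp\left( \frac{1}{2}(J+j)^{\dagger} D (J+j) \right)$,

from which $m = \delta \ln Z_d[J]/\delta J \,\vert_{J=0} = Dj$ follows immediately.)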

## Information Field Theory I

Recently I’ve had the need to learn Information Field Theory (IFT) for some of my current research on detecting the (very faint) radio emission associated with the large-scale structure of the universe (an older paper on the subject is here). IFT is basically Bayesian statistics applied to fields, but in certain cases the a posteriori probability can be computed through perturbation theory. Feynman diagrams can be used to compute the expansion (below are examples of a few terms), so I was forced to go back to my QFT book to remember the basics of perturbation theory….which led me back to QFT in general…. so here I am.

So what is IFT and why does it look a lot like QFT? I haven’t yet talked about QFT, but I can begin to explain IFT, at least the basics.

Consider a signal $s=s(x)$, which is a function of position (a classical field), and suppose we wish to make some inference about this field based on observational data $d$. A common inference would be to make an image (map) from incomplete data. We model our measurement process as $d=Rs+n$, where $R$ is an operator describing the coupling between the data and the signal, and $n$ is random noise. Typically $s(x)$ is continuous and $d$ is discrete, so $R$ will be some sort of selection function. It can also include more complicated transformations (e.g., Fourier transforms in the case of radio astronomy). In most cases $n$ is Gaussian, but it need not be.
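A minimal numerical sketch of this measurement model (the grid size, selection pattern, and noise level are all invented):

```python
import numpy as np

rng = np.random.default_rng(42)

# Discretize the "continuous" signal s(x) on a fine grid
npix = 100
s = 0.1 * np.cumsum(rng.normal(size=npix))   # a smooth-ish random signal

# R: a selection function that observes only every 4th pixel
observed = np.arange(0, npix, 4)
R = np.zeros((observed.size, npix))
R[np.arange(observed.size), observed] = 1.0

# n: Gaussian noise with known variance
sigma_n = 0.05
n = rng.normal(scale=sigma_n, size=observed.size)

d = R @ s + n    # the data: incomplete, noisy samples of s
print(d.shape)   # 25 data points sampled from a 100-pixel signal
```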

We don’t know what the actual signal is, so $s$ is really just our model of what we think the signal is. In this case we’d like to compute the a posteriori probability that our model $s$ is the correct one given the data, denoted by $P(s|d)$. Bayes’ theorem tells us that

$P(s|d)=\frac{P(d|s) P(s)}{P(d)}$

where $P(d|s)$ is the probability of getting $d$ given $s$, called the likelihood, $P(s)$ is the prior probability of model $s$, and $P(d)$ is the evidence, which is the normalization factor given by

$P(d) = \int \mathcal{D}s P(d|s) P(s)$.

The integral is a functional integral of the likelihood over all possible configurations of $s$, weighted by the prior probability of each configuration $P(s)$. The likelihood contains information about the measurement process, and is typically a function of the model signal’s parameters. There’s a lot out there on Bayes’ theorem so I won’t talk about it here, but it can be derived from basic facts about probabilities and Aristotelian logic.

The step taken in Ensslin et al. (2009) is to rewrite Bayes’ theorem in the form

$P(s|d) = \frac{e^{-H}}{Z}$

where $H[s] \equiv -ln(P(d|s) P(s))$ is the Hamiltonian, and $Z \equiv P(d)=\int \mathcal{D}s e^{-H}$. Things are starting to look suspiciously like statistical mechanics, where $Z$ is essentially the partition function; there are more links to stat. mech. that I can get into later. Let’s first consider an example.

Imagine that the signal we are trying to observe (make an image of) is a Gaussian random field (e.g., the cosmic microwave background fluctuations), denoted by $\mathcal{G}(s, S)$. Let’s assume that the value of $s(x)$ at any $x$ is not known, but that we do know the signal covariance $S = \langle s s^{\dagger} \rangle$, i.e., the variance of the Gaussian field. We can also assume that our observational noise is Gaussian, $\mathcal{G}(n, N)$ (it often is), with an unknown value $n$ at any point but a known covariance $N = \langle n n^{\dagger} \rangle$. In this case, the likelihood can be written as $P(d|s)=\mathcal{G}(d-Rs,N)$ and the prior probability is given by $P(s)=\mathcal{G}(s, S)$; take a minute to convince yourself of this. In the case of the likelihood, if you assume $s$ is true, then the probability that you measure $d$ is going to depend on how far $d$ is from $Rs$, and will be a Gaussian. Explicitly, these are given by

$P(d|s)=\frac{1}{\vert 2\pi N \vert^{1/2}}\exp\left( -\frac{1}{2}(d-Rs)^{\dagger}N^{-1}(d-Rs)\right)$

$P(s)=\frac{1}{\vert 2\pi S \vert^{1/2}}\exp\left( -\frac{1}{2}s^{\dagger}S^{-1}s\right)$.

This is all in matrix notation, where $\vert S \vert$ is the determinant of $S$. As an exercise one can compute the Hamiltonian

$H[s]=-ln(P(d|s)P(s))=\frac{1}{2}s^{\dagger}D^{-1}s - j^{\dagger}s + H_{0}$,

where

$D= [S^{-1} + R^{\dagger}N^{-1}R]^{-1}$

is called the propagator of the free theory, and the information source $j$ is given by

$j = R^{\dagger}N^{-1}d$.

The constant $H_{0}$ is the collection of all the terms independent of $s$, and is given by

$H_{0}=\frac{1}{2} d^{\dagger}N^{-1}d + \frac{1}{2} ln( \vert 2 \pi S \vert \vert 2 \pi N \vert )$.
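Completing the square makes the structure transparent: the Hamiltonian can be rewritten as

$H[s] = \frac{1}{2}(s - Dj)^{\dagger}D^{-1}(s - Dj) + H_{0} - \frac{1}{2} j^{\dagger} D j$,

so the a posteriori probability $P(s|d) = \mathcal{G}(s - Dj, D)$ is itself a Gaussian, with mean $Dj$ and covariance $D$.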

To compute $P(s|d)$, one just needs to plug this Hamiltonian into Bayes’ theorem and compute the partition function integral. To obtain what we originally wanted, which was a map $m(x)$ of the signal $s(x)$, we simply need to compute the expectation value of $s$ with respect to $P(s|d)$, or

$m = \langle s \rangle_{(s|d)} = \int \mathcal{D}s\, P(s|d)\, s$.

This integral can be calculated in a number of ways. One way in particular is useful for the perturbative extension of the theory, but for now one can compute directly that $m=Dj$. In the continuous limit this reads

$m(x) = \int dy\, D(x,y)j(y)$,

and is represented by a diagram in which an external point $x$ is joined by a line to a vertex at $y$. The external coordinate with no vertex is $x$, the vertex coordinate is $y$, and the line connecting them represents the propagator $D(x,y)$. The vertex represents the information source $j(y)$, and we should sum/integrate over the vertex (called internal later) coordinate. The intuitive understanding here is that $j$ contains all the data (projected onto the continuous space of the model, weighted by the noise via $j = R^{\dagger} N^{-1}d$), and $D(x,y)$ will “propagate” this information (it knows about the prior signal covariance $S$) into unobserved parts of the map $m(x)$.
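Because everything is Gaussian, $m = Dj$ can also be evaluated directly in matrix form. A small numerical sketch (the prior covariance, selection, and noise level are invented) using the definitions of $D$ and $j$ above:

```python
import numpy as np

rng = np.random.default_rng(1)

npix = 60
x = np.arange(npix)

# Prior covariance S: smooth (squared-exponential) correlations, values
# made up; a small jitter keeps S numerically invertible.
S = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 5.0 ** 2) + 1e-6 * np.eye(npix)

# Draw a true signal from the prior; observe every other pixel with noise
s = np.linalg.cholesky(S) @ rng.normal(size=npix)
R = np.eye(npix)[::2]
N = 0.1 ** 2 * np.eye(R.shape[0])
d = R @ s + rng.normal(scale=0.1, size=R.shape[0])

# The IFT / Wiener-filter map: m = D j
D = np.linalg.inv(np.linalg.inv(S) + R.T @ np.linalg.inv(N) @ R)
j = R.T @ np.linalg.inv(N) @ d
m = D @ j

print("map rms error:", np.sqrt(np.mean((m - s) ** 2)))
```

The recovered map fills in the unobserved pixels because $D$ carries the prior covariance $S$; this is the “propagation” of information described above.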

All of this was already well understood in image reconstruction theory (in the form of a Wiener filter), but with the formalism of IFT, we can next look at what happens if we have higher order perturbations to the free theory Hamiltonian (and one often does!). Look for part II later!