Astronomy Leads The Way In Big Data

Jan Kremer and colleagues at University of Copenhagen write in IEEE Intelligent Systems about “Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy” [1].

This article is a nice survey of the kinds of data that astronomers collect, and the challenges of analyzing, and, indeed, simply handling it all.

I have worked with Astronomers in the past, and one of the coolest things is that when they have a dataset that covers “everything”, they really mean everything—the entire Universe, at least as much as we can see from where we are. And it is so romantic. Every study deals with space and time, matter and energy, theory and observation. Astronomical data makes you feel tiny and insignificant. Yet we are part of this gigantic picture, and our brains are capable of learning so much about it.


Kremer and colleagues walk through many aspects of contemporary Astronomical data. They describe the data (visible light and spectrographic measures), which are captured in detailed images of the sky. Billions of pixels recorded from signals that have travelled incomprehensible distances over inconceivable time spans, to intersect with us here and now.

No human could view all this information, nor make sense of it. The data is run through pipelines that use algorithms to clean up the data and look for “interesting” stuff. These days, the processing also automatically generates catalogs of objects in the image, i.e., tries to find everything interesting in the image. Of course, the details depend on the data source and what you are looking for—stars, galaxies, planets, asteroids, or many other possible targets.
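To make the "catalog of objects" idea concrete, here is a toy sketch in Python. It is not taken from the article and is nothing like a real survey pipeline, which has to deal with noise, calibration, and overlapping objects. The function name `build_catalog`, the tiny `toy_image`, and the fixed brightness threshold are all my own illustrative assumptions. It simply groups neighboring bright pixels into "sources" and records a position and total brightness for each.

```python
from collections import deque


def build_catalog(image, threshold):
    """Toy source finder (illustrative only, not a real astronomy pipeline).

    Groups 4-connected pixels brighter than `threshold` into sources and
    returns one catalog entry per source. Assumes a positive threshold.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    catalog = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood-fill outward from this bright pixel to collect the blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Total brightness and brightness-weighted center of the blob.
            flux = sum(image[y][x] for y, x in pixels)
            row_center = sum(y * image[y][x] for y, x in pixels) / flux
            col_center = sum(x * image[y][x] for y, x in pixels) / flux
            catalog.append({"row": row_center, "col": col_center,
                            "flux": flux, "npix": len(pixels)})
    return catalog


# A made-up 5x6 "sky" with a background of 1 and two bright objects.
toy_image = [
    [1, 1, 1, 1, 1, 1],
    [1, 9, 8, 1, 1, 1],
    [1, 8, 1, 1, 1, 1],
    [1, 1, 1, 1, 7, 1],
    [1, 1, 1, 1, 1, 1],
]
catalog = build_catalog(toy_image, threshold=5)
for source in catalog:
    print(source)
```

On this toy image the sketch finds two sources: a three-pixel blob near the upper left and a single bright pixel to the lower right. Scale that idea up to billions of pixels and many kinds of targets, and you have a sense of why the cataloging has to be automated.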

Over the years, astronomers have employed all kinds of image analysis, including machine learning techniques, to automate these processes. In fact, many techniques pioneered by astronomers have been adopted for other uses. Astronomers have also pioneered the use of crowdsourced “citizen science” to aid the development and validation of these algorithms. Galaxy Zoo was one of the first and most successful such citizen science projects, and has spawned dozens of clones.

In order to understand and answer questions about these massive datasets, Astronomers have also pioneered statistical methods and search techniques. Kremer and colleagues also discuss the difficult challenges of creating models that connect theory to the observational data. Much of astronomy is about trying to go from theoretical physics to “pixels in the image”, and vice versa.

Finally, they note that most of the data is openly available (though you really can’t download a copy, because it is too freaking much). Most of the software is available, too. (This openness is possible largely because no one knows how to make money off astronomy, not even astronomers.) This means that there is opportunity for anyone to get into the game, to create new analyses, or to discover new science. Much of the data has hardly been studied at all, so who knows what you might be able to do?

In one sense, this article is nothing new. For centuries, Astronomy has led the development of instruments, data analysis, and theory. Looking out at the universe is both the hardest and the most informative scientific observation of all, and Astronomers are always working at the edge of what is technically possible.

In the past few years, there has been an accelerating trend to cut public funding for scientific research. The remaining funds are ever more tightly rationed, forcing hard choices and difficult arguments about the relative benefits of different activities. Inevitably, there are strong pressures to reduce activities that have little obvious and direct benefit for people or important political interest groups.

One of the prime targets has been large-scale astrophysics, which requires expensive equipment and is, by definition, not about current life on Earth. It doesn’t even employ large numbers of people, at least once construction has finished. What good is it, except to satisfy the curiosity of a few eggheads?

This political picture is important to keep in mind when reading this article. The authors are responding to the question, “Why should we pay for these large investigations?”

In short, one reason to support Astronomy research is that this work can drive many data technologies that are increasingly important in many fields closer to home (and more profitable).

This is not the most romantic reason to do Astronomy, but it is a valid and important point.

  1. Jan Kremer, Kristoffer Stensbo-Smidt, Fabian Gieseke, Kim Steenstrup Pedersen, and Christian Igel, “Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy.” IEEE Intelligent Systems, 32(2):16-22, 2017.


Space Saturday
