Of HiPSters and Spherinators: Providing Answers to Unknown Unknowns

Machine learning is one of the fastest-growing areas of computer science and underpins many of today's artificial intelligence systems. At the Heidelberg Institute for Theoretical Studies (HITS), we are in the lucky position to witness the development and use of machine learning tools at close quarters and in various fields, sometimes even across completely different disciplines.
In our last blog post on “Digital Baby Twins”, we zoomed in to the cell level to simulate the first crucial months in a newborn’s life with the help of AI. This time we talk about machine-learning-based tools that zoom out, and here we mean really far out, to the galaxy level: Come and meet the “Spherinator”, from the known unknowns to the unknown unknowns! Or, in other words: getting answers to questions you had no idea existed.
Breaking new ground: using machine learning for simulations
Meaningful questions can change the way we think about a concept and challenge the way an existing model is used. But if asking the right questions is the essence of good science, how do we find them?
Part of the inspiration often comes from realizing that old tools have become inadequate, have outlived their usefulness, or are not being used to their full potential. This was the starting point for HITS astrophysicist and postdoc Sebastian Trujillo Gomez and his colleagues in the Astroinformatics group. Together with team leader Kai Polsterer and HITS IT specialist Bernd Doser, he asked why machine learning tools were used extensively in astronomy but not for simulations:
“When reviewing the literature we found by far the majority of applications of AI in astronomy are aimed at automating tasks that humans are good at, allowing things like classification and detection of anomalies in extremely large surveys of the sky. There was a clear lack of applications for the other half of the scientific method: the process of coming up with hypotheses and testing them against data”, says Sebastian Trujillo Gomez.
But before we dive in deeper, let’s take one step back. What do we mean when we talk about simulations in astronomy? Well, one of the most popular examples of a still ongoing series of astrophysical simulations was also developed at HITS by Volker Springel and his team in the “Theoretical Astrophysics” group: The “Illustris” project – a set of large-scale cosmological simulations, including the largest simulation of galaxy formation to date – was the first one to simulate a big chunk of the universe and all its galaxies. The calculation tracks the expansion of the universe, the gravitational pull of matter onto itself, the motion or “hydrodynamics” of cosmic gas, as well as the formation of stars and black holes.
So how can machine learning advance the field?

Breaking the bias: finding the “unknown unknowns”
“Generally, machines are excellent at learning to perform tedious tasks very quickly. For this, they need only a large number of examples. This makes machine learning ideal for automating many tasks where we know the question that we aim to answer with the data. However, this approach is not so useful for finding new and unexpected patterns in the data as it is limited by our own intuitions and expectations and provides at most only answers to the ‘known unknowns’, thus missing what Donald Rumsfeld, the former United States Secretary of Defense, famously termed the ‘unknown unknowns’.”
To tackle this problem, Sebastian and his colleagues have developed new software tools to enable explorative access to the largest exascale cosmological simulations, like Illustris. These tools learn compressed representations of large samples of simulated (or real) galaxies from only the data and without human biases. They provide explorative access to the compressed representation using an interactive graphical interface, letting the user explore simulated and real galaxies in the same intuitive similarity space.
In this way, they help astronomers maximize scientific breakthroughs by letting the machine learn unbiased, interpretable representations of complex data, ranging from observational surveys to simulations. The tools automatically learn low-dimensional representations of complex objects such as galaxies in multimodal data (for example images, spectra, datacubes, or simulated point clouds), and provide interactive explorative access to arbitrarily large datasets using a simple graphical interface. The framework is designed to be interpretable, to work seamlessly across datasets regardless of their origin, and to provide a path towards discovering the ‘unknown unknowns’.
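To give a feel for what “learning a low-dimensional representation” means, here is a minimal sketch in Python. It is not the Spherinator code: as a simple stand-in for the learned model it uses plain principal component analysis (PCA) via NumPy. The function name `compress`, the toy data, and the choice of three dimensions are all assumptions made purely for illustration.

```python
import numpy as np


def compress(features, n_components=3):
    """Reduce each row of `features` to `n_components` numbers using PCA.

    Hypothetical helper for illustration only; a linear stand-in for a
    learned, data-driven compression.
    """
    # Center the data so the components describe variation around the mean.
    centered = features - features.mean(axis=0)
    # The right singular vectors are the principal directions of the data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Project every object onto the leading directions.
    latent = centered @ components.T
    return latent, components


# Toy "catalog": 200 objects with 50 measured features each (made-up data)
# that secretly depend on only three underlying properties.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
catalog = hidden @ mixing

latent, components = compress(catalog, n_components=3)
print(latent.shape)
```

Each object is now described by three numbers instead of fifty, and no labels or human-chosen categories were involved. The tools described in this post replace this linear toy with representations learned from the data.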
Breaking the mold: squeezing the most out of datasets
“Spherinator and its colleague HiPSter, another related tool, work together seamlessly to take a catalog containing millions of simulated galaxies, and automatically arrange them on a spherical ‘map’ where galaxies with similar structural features like spiral arms, bulges, or bars are grouped together and very different ones are far apart. This lets researchers get a quick and intuitive visualization of the entire dataset that can lead to finding new interesting predictions of the simulation, as well as shortcomings of the models.”
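The idea of a similarity map on a sphere can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual Spherinator or HiPSter implementation: the three-dimensional vectors are made up, and all helper names (`to_unit_sphere`, `angular_distance`, `to_lon_lat`, `nearest_neighbours`) are hypothetical. Each object becomes a point on the unit sphere, and “similar” simply means “separated by a small angle”.

```python
import numpy as np


def to_unit_sphere(latent):
    """Scale each latent vector to length 1 so it lies on the unit sphere."""
    norms = np.linalg.norm(latent, axis=1, keepdims=True)
    return latent / norms


def angular_distance(a, b):
    """Angle in radians between every unit vector in `a` and every one in `b`."""
    cosines = np.clip(a @ b.T, -1.0, 1.0)
    return np.arccos(cosines)


def to_lon_lat(points):
    """Convert unit vectors to longitude and latitude in degrees, like a sky map."""
    x, y, z = points.T
    lon = np.degrees(np.arctan2(y, x)) % 360.0
    lat = np.degrees(np.arcsin(np.clip(z, -1.0, 1.0)))
    return lon, lat


def nearest_neighbours(points, index, k=3):
    """Indices of the k objects closest on the sphere to the object `index`."""
    distances = angular_distance(points[index:index + 1], points)[0]
    order = np.argsort(distances)
    return order[order != index][:k]


# Four made-up objects: 0 and 1 are nearly identical, 2 and 3 are very different.
latent = np.array([
    [1.0, 0.0, 0.0],
    [0.99, 0.1, 0.0],
    [0.0, 0.0, 2.0],
    [-1.0, 0.0, 0.0],
])
points = to_unit_sphere(latent)
lon, lat = to_lon_lat(points)
closest = nearest_neighbours(points, 0, k=1)
print(closest)  # object 1 is the closest match to object 0
```

Because positions are expressed as longitude and latitude, such a map can be browsed much like a map of the sky, with look-alike objects ending up as neighbours.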
By addressing these technical challenges, the researchers aim to empower scientists to more effectively extract valuable insights from their simulation data, without being hindered by biases or computational limitations.
So what are the next important questions to ask in this field and where will the inspiration come from? “We are inevitably entering the big data era in astrophysics in terms of both observations and simulations. We hope our tools help scientists squeeze the most information out of these new datasets, driving groundbreaking discoveries on the origin and evolution of galaxies and the nature of the mysterious dark matter and dark energy.”
More about the work of the Astroinformatics group: https://www.h-its.org/research/ain/. The paper by Kai Polsterer, Bernd Doser, Andreas Fehlner, and Sebastian Trujillo Gomez is available here: https://www.aimodels.fyi/papers/arxiv/spherinator-hipster-representation-learning-unbiased-knowledge-discovery.