Astronomical Data Processing Using SciQL, an SQL Based Query Language for Array Data

Authors: 
Ying Zhang, Bart Scheers, Martin Kersten, Milena Ivanova, Niels Nes
Presentation Date: 
Sunday, 6 November, 2011

This talk was given by Ying Zhang (CWI) at the Astronomical Data Analysis Software and Systems (ADASS) XXI conference, held 6-10 November 2011 in Paris, France.

The ever-growing use of high-precision experimental instruments in astronomical projects, e.g., SDSS, LSST and LOFAR, produces an avalanche of data to be stored, curated and analysed. Many projects already ingest gigabytes or even terabytes of data daily, and planned instruments are expected to push ingestion up to petabytes soon. Efficient data management as part of a data exploration infrastructure has become a decisive factor for scientific progress. Relational database management systems (RDBMSs) are the prime means of fulfilling the role of application mediator for data exchange and data persistence.

Nevertheless, scientific applications are still poorly served by contemporary RDBMSs. At best, the system provides a bridge to an external library through user-defined functions, explicit import/export facilities or linked-in Java/C# interpreters. To bridge the gap between the needs of data-intensive scientific research fields such as astronomy and current DBMS technology, we introduce SciQL (pronounced "cycle"), the first SQL-based query language for scientific applications with both tables and arrays as first-class citizens. SciQL provides a seamless symbiosis of array, set, and sequence interpretations. A key innovation is the extension of value-based grouping in SQL:2003 with structural grouping, i.e., fixed-size and unbounded groups based on explicit relationships between the dimensional attributes of array cells. This leads to a generalisation of window-based query processing with wide applicability in science domains.
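
To make the idea of structural grouping more concrete, the following is a minimal Python sketch, not SciQL itself. It assumes a small 2D array stored as a mapping from dimension indices `(x, y)` to cell values, and forms one fixed-size group per anchor cell from the cells whose indices fall in `[x, x+size)` and `[y, y+size)`. The function name `structural_group_avg` and the choice to truncate groups at the array boundary are illustrative assumptions, not part of the talk.

```python
def structural_group_avg(cells, width, height, size=2):
    """Average each size x size tile anchored at every cell (x, y).

    Group membership depends only on the dimension indices of the cells,
    not on their values. Tiles are truncated at the array boundary
    (an assumption made for this sketch).
    """
    result = {}
    for x in range(width):
        for y in range(height):
            group = [
                cells[(i, j)]
                for i in range(x, min(x + size, width))
                for j in range(y, min(y + size, height))
            ]
            result[(x, y)] = sum(group) / len(group)
    return result


# A 3x3 array stored as {(x, y): value}, with values 0.0 .. 8.0.
cells = {(x, y): float(x * 3 + y) for x in range(3) for y in range(3)}
averages = structural_group_avg(cells, width=3, height=3)
```

In contrast to value-based grouping, where rows with equal attribute values end up in the same group, a single cell here can belong to several overlapping groups, which is what makes the construct resemble window-based processing.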

In this talk, I will demonstrate the usefulness of SciQL for astronomical data processing with examples from transient radio phenomena detection. The Transients Key Science Project (KSP) of the LOw Frequency ARray (LOFAR) focuses on exploring and understanding the explosive and dynamic universe by observing transient and variable radio sources. With 2688 dipoles (LBA), 200 MHz sampling, 2 polarisations and 12-bit digitisation, the LOFAR antennas are capable of producing 138 petabytes of raw data per day. Processing this massive volume of data requires extraordinarily efficient query plans and algorithms. The key operations in the Transients KSP include cross-catalogue correlation and radio pulsar detection. In traditional RDBMSs, these operations are extremely hard to express in SQL and to optimise for query execution. With SciQL, however, such array-oriented operations can be expressed easily and concisely. Furthermore, by exposing the properties of array data, SciQL enables the RDBMS to optimise query plans better.
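
As a rough illustration of what a cross-catalogue correlation involves, here is a naive Python sketch. It assumes the operation means positional association: each source in one catalogue is paired with the nearest source in another catalogue within a given angular radius. The tuple layout `(id, ra_deg, dec_deg)`, the function names and the sample coordinates are all hypothetical, and the quadratic loop is only meant to show the logic, not the approach used in the Transients KSP pipeline.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


def cross_match(catalogue_a, catalogue_b, radius_deg):
    """Pair each source in catalogue_a with its nearest neighbour in
    catalogue_b, keeping only pairs closer than radius_deg."""
    matches = []
    for id_a, ra_a, dec_a in catalogue_a:
        best = None  # (id_b, separation) of the closest candidate so far
        for id_b, ra_b, dec_b in catalogue_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1]))
    return matches


# Hypothetical sample sources: (id, right ascension, declination) in degrees.
catalogue_a = [("a1", 10.0, 20.0), ("a2", 150.0, -5.0)]
catalogue_b = [("b1", 10.001, 20.0005), ("b2", 200.0, 45.0)]
matches = cross_match(catalogue_a, catalogue_b, radius_deg=0.01)
```

With these sample inputs only `a1` finds a counterpart (`b1`); `a2` has no source in the second catalogue within the search radius.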

 
