Paper "Addressing the Nested Data Processing Gap: JSONiq Queries on Snowflake through Snowpark" accepted at ICDE'24

The following paper has been accepted at the 40th IEEE International Conference on Data Engineering
(ICDE'24), to be held in Utrecht, the Netherlands, on 13-17 May 2024. This work was done in collaboration with Snowflake.

Title
Addressing the Nested Data Processing Gap: JSONiq Queries on Snowflake through Snowpark

Authors
Dan Graur (ETH Zürich), Remo Röthlisberger (ETH Zürich), Adrian Jenny (ETH Zürich), Ghislain Fourny (ETH Zürich), Filip Drozdowski (Snowflake), Choden Konigsmark (Snowflake), Ingo Müller (ETH Zürich), Gustavo Alonso (ETH Zürich) 

Abstract
The paper addresses an important gap in semi-structured data processing: despite the ever-increasing volume of nested data, no querying solution exists that offers both high performance and a query language able to express nested data queries effectively. Practitioners must either use relational systems with SQL, gaining high performance and scalability at the cost of an inadequate and sometimes insufficiently expressive SQL dialect, or use NoSQL systems that offer suitable query languages but deliver suboptimal performance. This work proposes a translation layer that converts JSONiq, a NoSQL query language for semi-structured data, to Snowflake SQL, achieving performance on par with reference hand-written queries for both nested and relational workloads. This gives practitioners a solution that yields state-of-the-art query performance without compromising on usability or query expressivity.
