COMPASS Talks

The Computing Platforms Seminar Series (COMPASS) features talks from industry and academia on the general topic of computing platforms.


Databricks Lakeguard: Supporting Fine-grained Access Control and Multi-user Capabilities for Apache Spark Workloads

Abstract:

Enterprises want to apply fine-grained access control policies to meet increasingly complex data governance requirements, and these rich policies should be applied uniformly across all of their workloads. In this talk, we present Databricks Lakeguard, a unified governance system that enforces fine-grained data access policies, row-level filters, and column masks across all of an enterprise's data and AI workloads. To enforce user isolation, Lakeguard builds on Spark Connect, a JDBC-like execution protocol that separates the client application from the server and ensures version compatibility. It also leverages container isolation in Databricks' cluster manager to securely isolate user code from the core Spark engine.
With Lakeguard, a user's permissions are enforced for any workload and in any supported language (SQL, Python, Scala, and R) on multi-user compute.
This work overcomes the limitations of fragmented governance solutions, in which fine-grained access control could be enforced only for SQL workloads, while big data processing with frameworks such as Spark relied on coarse-grained, file-level governance with cluster-bound data access.
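To make row-level filters and column masks concrete, here is a minimal, engine-agnostic sketch in Python. It is not the Lakeguard or Databricks API: the `Policy` class, `apply_policy`, and the entitlement tables are hypothetical. It only illustrates the idea that a row filter is a per-user predicate and a column mask is a per-user rewrite of a value, both applied before results reach the user.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: one row predicate plus optional per-column masks."""
    row_filter: Callable[[str, Row], bool]
    column_masks: dict[str, Callable[[str, object], object]] = field(default_factory=dict)


def apply_policy(user: str, rows: list[Row], policy: Policy) -> list[Row]:
    """Return only the rows `user` may see, with masked columns rewritten."""
    visible: list[Row] = []
    for row in rows:
        if not policy.row_filter(user, row):
            continue  # row-level filter: drop rows the user is not entitled to
        masked = dict(row)  # copy so the source rows are never modified
        for col, mask in policy.column_masks.items():
            if col in masked:
                masked[col] = mask(user, masked[col])  # column mask
        visible.append(masked)
    return visible


# Hypothetical entitlements: readable regions per user, and who may see raw SSNs.
REGIONS = {"alice": {"EU"}, "bob": {"US", "EU"}}
ADMINS = {"bob"}

policy = Policy(
    row_filter=lambda user, row: row["region"] in REGIONS.get(user, set()),
    column_masks={
        "ssn": lambda user, v: v if user in ADMINS else "***-**-" + str(v)[-4:],
    },
)

customers = [
    {"id": 1, "region": "EU", "ssn": "123-45-6789"},
    {"id": 2, "region": "US", "ssn": "987-65-4321"},
]

print(apply_policy("alice", customers, policy))
# [{'id': 1, 'region': 'EU', 'ssn': '***-**-6789'}]
```

Under the design the abstract describes, a step like this would run inside the engine, on the server side of the Spark Connect boundary, rather than in the user's own client code.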

Bio:

Stefania Leone is a Director of Product Management at Databricks working on Data Governance and the Databricks Runtime. She holds a PhD in Computer Science from ETH Zurich.
Martin Grund is a Principal Engineer at Databricks working on Data Governance and the Databricks Runtime. He previously led engineering for Amazon Redshift Spectrum and worked on Cloudera Impala. He holds a PhD in Computer Science from the Hasso Plattner Institute in Germany.


A Novel Preference-Based Query and an Approach to Chart the Competitiveness of a Dataset in the Preference Domain

Abstract:

In this talk, we will first give an overview of the standard queries for multi-objective decision making, namely top-k and skyline queries, and discuss their individual shortcomings. We will then explore an approach that aims to combine the strengths of both, based on a SIGMOD’21 paper and a TODS’25 article, and outline the geometric nature of both the problem and its solution. Next, we will turn to the related problem of charting the competitiveness of a dataset with respect to different user preferences, based on a VLDBJ’24 article. Specifically, we will consider different measures of competitiveness and show how to efficiently represent the dataset’s competitiveness under these measures as a heat map that covers the domain of possible user preferences.
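The two standard query types mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the operators from the SIGMOD’21 paper or the TODS’25 and VLDBJ’24 articles: it assumes every attribute is normalized so that higher is better, uses a linear scoring function for top-k, and computes the skyline with a naive quadratic scan. The `win_share` helper is a hypothetical, deliberately simple competitiveness measure (the fraction of sampled 2-D weight vectors under which a record is in the top-k), included only to hint at what charting a dataset over the preference domain means; the measures in the talk may differ. The `hotels` data is made up.

```python
from typing import Sequence

Point = tuple[float, ...]  # one record; every attribute assumed normalized, higher is better


def top_k(data: Sequence[Point], weights: Sequence[float], k: int) -> list[Point]:
    """Return the k records with the highest linear score sum(w_i * x_i)."""
    return sorted(
        data,
        key=lambda p: sum(w * x for w, x in zip(weights, p)),
        reverse=True,  # stable: ties keep their input order
    )[:k]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(data: Sequence[Point]) -> list[Point]:
    """Naive O(n^2) skyline: every record that no other record dominates."""
    return [p for p in data if not any(dominates(q, p) for q in data)]


def win_share(data: Sequence[Point], k: int = 1, steps: int = 100) -> dict[Point, float]:
    """Hypothetical measure: share of sampled 2-D weights (w, 1 - w) putting each record in the top-k."""
    counts = {p: 0 for p in data}
    for i in range(steps + 1):
        w = i / steps
        for p in top_k(data, (w, 1 - w), k):
            counts[p] += 1
    return {p: c / (steps + 1) for p, c in counts.items()}


# Made-up records: (rating, value_for_money), both scaled to [0, 1].
hotels = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]

print(top_k(hotels, (0.5, 0.5), 1))  # [(0.6, 0.6)]
print(top_k(hotels, (0.8, 0.2), 1))  # [(0.9, 0.2)]
print(skyline(hotels))               # [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9)]
print(win_share(hotels))
```

The example reflects commonly cited shortcomings of both queries: the top-1 answer flips when the weights move from (0.5, 0.5) to (0.8, 0.2), so top-k depends on precise weights, while the skyline ignores preferences entirely and its output size is not under the user's control.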

Bio:

Kyriakos Mouratidis received his B.Sc. from Aristotle University of Thessaloniki in 2002 and his Ph.D. from Hong Kong University of Science and Technology in 2006, both in Computer Science. He is currently a Professor of Computer Science at Singapore Management University. His main research area is spatial databases, with a focus on continuous query processing, road network databases, and spatial optimization problems. He has also worked on preference-based queries, wireless broadcasting systems, and outsourced database authentication. He publishes in the main venues for database research (e.g., SIGMOD, VLDB, TODS, and the VLDB Journal) and serves on program/organizing committees and editorial boards in the same community.

Past COMPASS talks
