2021

PhD/ DIAI Projects

@ED 386 : Sciences Mathématiques de Paris Centre

PhD student
Alexandra ROGOVA
(IRIF, DI ENS)

 

Supervisors
Amélie GHEERBRANT (IRIF, UPC),
Leonid LIBKIN (Laboratory for Foundations of Computer Science, University of Edinburgh)

 

Project Summary

(Former title: Query analytics in Cypher)

In a graph database, data is stored as a “graph”, or a collection of nodes, representing the entities, connected together by edges, representing relationships between the entities. Additional information about the entities and their relationships can be attached to the nodes and edges. This powerful model, popularized by Neo4j in 2010, is now a staple in various industries such as biology, social networks and banking. By putting the links between the data points front and center, graph databases allow users to reason not only about individual elements but also about the structure of the graph. Accordingly, the goal of a typical graph query is to find a “path” connecting specific nodes.

Because graph traversals inherently rely on transitivity, traditional query languages are not suitable in the graph context, and thus new languages have been created. In the theoretical community, the basic building blocks of a graph language are the Regular Path Queries (RPQs), which define path constraints by way of regular expressions. The expressive power and complexity of RPQs and their extensions (by union, conjunction, two-way navigation, data value comparisons and path properties for example) have been studied since the 1990s but their properties are barely beginning to be understood.

In 2019, a new International Organisation for Standardization (ISO) group was created to oversee the standardization of practical graph languages. This led to two new standards: SQL/PGQ and GQL. The idea of SQL/PGQ is to add a view-based pattern matching mechanism to SQL and interpret the relational
data as a graph only when necessary, whereas GQL is standalone and stores the data as a native graph. While this standardization work is a step in the right direction, there is one crucial ingredient missing: a corresponding theoretical model.

The goal of this project is to define a theoretical language for graph databases, akin to relational algebra for SQL, that reflects the essential aspects of GQL and SQL/PGQ while being simple enough for theoretical study.

 

Bienvenue (irif.fr)

 

Other projects