• Raul Castro Fernandez
  • March 6, 2024, at 4 PM
  • Room Turing Conseil (7th floor, Université Paris Cité, 45 rue des Saints-Pères, Paris 75006) & Online (Zoom)

 

Prof. Raul Castro Fernandez
(University of Chicago)

In my research, I ask what the value of data is and explore the potential of data markets to unlock that value. My group collaborates with economists, legal scholars, statisticians, and domain scientists. We build systems to share, discover, prepare, integrate, and process data.

I have traditionally worked on distributed query processing systems and continue to do so. I received a SIGMOD’23 Test-of-Time Award. I am an assistant professor in the Department of Computer Science and on the Committee of Data Science at The University of Chicago. Before UChicago, I was a postdoc at MIT with Sam Madden and Mike Stonebraker, and before that I completed a PhD at Imperial College London with Peter Pietzuch.

Abstract

Data shapes our social, economic, cultural, and technological environments. Data is valuable, so people seek it, inducing data to flow. The resulting dataflows distribute data and thus value. For example, large Internet companies profit from accessing data from their users, and engineers of large language models seek large and diverse data sources to train powerful models. It is possible to judge the impact of data in an environment by analyzing how the dataflows in that environment impact the participating agents. My research hypothesizes that it is also possible to design (better) data environments by controlling what dataflows materialize; not only can we analyze environments but also synthesize them.

In this talk, I present the research agenda on “data ecology,” which seeks to build the principles, theory, algorithms, and systems to design beneficial data environments. I will also present examples of data environments my group has designed, including data markets for machine learning, data-sharing, and data integration. I will conclude by discussing the impact of dataflows in data governance and how the ideas are interwoven with the concepts of trust, privacy, and the elusive notion of “data value.” As part of the technical discussion, I will complement the data market designs with the design of a data escrow system that permits controlling dataflows.

 
