Funded projects in 2021

In November 2020, the Data Intelligence Institute of Paris (diiP) selected 16 interdisciplinary projects that use data science and machine learning, to be funded from January to December 2021: 13 master-level internships and 3 strategic projects. Discover more about these projects below.

Strategic Projects

Computer Science

Digital Pathology: when AI meets with anatomo-pathology

Other relevant disciplines: Engineering, Biology, Medicine

The fundamental question we address is: how can we predict immunotherapy-related gene classes of a tumor just from the phenotypic observation of tissue in Whole Slide Images (WSI)? In other words, how can multi-omic information (WSI, genomic, clinical, …) be used to improve health care? We leverage a pending patent and a strong network of physicians to tackle this challenge, combining computer vision and machine learning at a very high level of data annotation and processing.

Key words: Digital Pathology, Immuno-therapy, Computer Vision, Deep Learning

Project coordinator: Nicolas Loménie (Université Paris Cité)

Earth Sciences and Geosciences

PARKER — Planetary lidAR seeKing for lifE signatuRe

Other relevant disciplines: Mathematics/Statistics, Physics/Astronomy

The topography of planets bears the imprint of the processes that shape it. Deciphering the "text" encoded in topography has been the goal of scholars for centuries (Darwin, Gilbert, others). For scholars looking into space, topography is a window into the processes that shape alien worlds. Here on Earth, it offers clues to how our planet might be different and, if so, why. Despite centuries of theory that the abundance of life shapes the surface of our planet (Darwin, 1881), the study of topography has yet to uncover a unique signal of life on Earth. Indeed, as recently as 2006, Dietrich and Perron, who asked the question "Is there a topographic signature of life?", concluded that none could be detected from the terrestrial data available at the time. Today, with the revolution in Earth observations from space, we can approach this question with fresh eyes and extend it to other bodies of the Solar System. Doing so requires the interpretation of a wide range of data sets (including planetary LiDAR for the Earth, Mars and the Moon) and cutting-edge mathematical approaches to their integration. This integration and interpretation are the focus of this proposal, shared by Earth scientists of IPGP and mathematicians of MAP5.

Key words: LiDAR, surface of planets, image processing, geomorphology, critical zone, biomass, signal processing, aliasing

Project coordinator: Antoine Lucas (Université Paris Cité)

Medicine

Autoimmunity/inflammation Through RNAseq Analysis at the single Cell level for Therapeutic Innovation – ATRACTion

Other relevant disciplines: Computer Science, Biology

Primary Immune Deficiencies (PIDs) comprise different monogenic diseases related to the developmental or functional impairment of one or several immune cell types. Autoimmunity/inflammation in PIDs can arise from different causes, and its onset and presentation are highly unpredictable. As classical approaches to cellular and molecular analysis have many limitations, we propose a transversal project that uses single-cell technologies combined with multi-OMICS analysis in a spectrum of genetically well-defined PIDs resulting in autoimmunity/inflammation. The interpretation of the massive and complex data collected will be assisted by machine learning-driven network inferences and will therefore enable the identification of personalised disease signatures. Our goal is to develop ground-breaking and transformative diagnostic and prognostic tools and to define personalised therapeutic approaches for pediatric patients suffering from autoimmunity/inflammation linked to PIDs.

Key words: Single Cell, Omics, Data Integration, Primary Immune Deficiencies, Personalized Medicine

Project coordinator: Mickaël Ménager (Université Paris Cité)

Master's Projects

Biology

Inferring cultural transmission of reproductive success through machine learning methods

Other relevant disciplines: Computer Science, Mathematics/Statistics, History

Cultural transmission of reproductive success has been observed in many human populations. This non-genetic transmission affects the genetic evolution of populations, yielding a decrease in genetic diversity and an increase in severe genetic diseases. The aim of this project is to develop methods based on Approximate Bayesian Computation and Machine Learning to infer this transmission from a sample of whole genomes, using simulated data as training sets. The intern will develop these methods, perform a cross-validation study on them, and then apply the best methods to real data from the Saguenay-Lac-Saint-Jean region in Quebec, where the occurrence of cultural transmission of reproductive success has already been shown using demographic data. The methods will then be applied to several populations from the 1000 Genomes database with contrasting lifestyles (e.g. farmers and herders). This will make it possible to assess how general cultural transmission of reproductive success is in human populations.

Key words: Bioinformatics, Population Genomics, Computationally intensive statistical methods, Population history

Project coordinator: Frédéric Austerlitz (Université Paris Cité)

Computer Science

Large image time series analysis for updating vineyard geographic databases

Other relevant discipline: Earth Sciences/Geosciences

The available geographic databases, particularly those covering agricultural landscapes, contain important information at parcel resolution for crop type detection (crop type maps) and monitoring (the RPG database). The RPG database is mostly completed and updated through annual declarations made by farmers within the framework of the Common Agricultural Policy (CAP) in Europe. Because they are made manually, these declarations may contain errors or inaccuracies, and they are not exhaustive. Thanks to the increasing availability of high spatial resolution satellite imagery and its accessibility via European programs, it is becoming possible to use these image series to update this type of geographic database more frequently, or to check the consistency of the data, by exploiting the visual content of the images or features extracted from the time sequence. In particular, satellite image time series (SITS) make it possible to study, from 2D+t imaging data, the spatiotemporal evolution of the territory, which may for example indicate a change in the management of the cropping system. The objective of this internship is to use deep learning image analysis methods to deliver up-to-date geographic vineyard databases in a timely and accurate manner over large areas, via an automatic analysis of SITS based on spatio-temporal image representations.

Key words: image time series analysis, deep learning, data indexation, optical satellite imagery, agriculture monitoring, crop type mapping, vineyard, VENUS images

Project coordinator: Camille Kurtz (Université Paris Cité)

Smoothing of incomplete air pollution regions of interest from satellite observations

Other relevant discipline: Physics/Astronomy

Environmental questions such as atmospheric pollution and climate change are key issues for modern society as it moves towards a more sustainable world. Research in this field aims to provide predictions and spatial descriptions of the various pollutants from the urban to the continental and global scales. The LISA lab is one of the world's leading groups in satellite remote sensing of tropospheric composition; it develops several satellite retrieval algorithms to determine the atmospheric distributions of ozone and aerosols from satellite observations, and now has over a decade of satellite images covering urbanized regions. The LIPADE lab has strong experience in designing algorithms for the analysis of satellite image time series using image analysis and AI paradigms. The motivation of this internship is to explore an atmospheric data processing chain strategy that is faster, more objective and easily reproducible at a larger scale, in order to deal with huge volumes of data.

Key words: image analysis, segmentation, air pollution, ozone, satellite imaging

Project coordinator: Laurent Wendling (Université Paris Cité)

Combining visual and textual information for enhancing image retrieval systems in radiological practices

Other relevant discipline: Medicine

The field of diagnostic imaging in radiology has experienced tremendous growth both in terms of technological development (with new modalities such as MRI, PET-CT, etc.) and market expansion. This has led to an exponential increase in the production of imaging data, turning diagnostic imaging into a big data challenge. However, producing a large amount of data does not automatically make it possible to exploit its intrinsic value for healthcare. In modern hospitals, all imaging data acquired during clinical routines are stored in a picture archiving and communication system (PACS). A PACS is a medical imaging technology providing economical storage and convenient access to images from multiple modalities. Digital images linked to patient examinations are often accompanied by a medical report in text format, summarizing the radiologist's findings and the clinical data associated with the patient (age, sex, medical history, reports of previous examinations, etc.). The problem with PACS is that they were primarily designed for archival purposes and not for image retrieval. They therefore only allow a search by keywords (name of the patient, date of the examination, type of examination, etc.) and not by pathology or by image content, and they cannot serve as a diagnostic aid when the doctor is confronted with an image that is difficult to interpret or that shows a rare pathology. The objective of this project is to combine current research in computer vision and AI to implement a method that makes it possible to query a PACS with example images in order to search for images containing similar pathological cases, and thus to offer radiologists a potential decision-making aid during hospital routines.

Key words: medical imaging, computer vision, content-based image retrieval, data intelligent search, deep learning, fusion of image and text, MRI

Project coordinator: Florence Cloppet (Université Paris Cité)

Automatic production of environmental indicators from freely available remote sensing data: from a global to a local scale

Other relevant disciplines: Earth Sciences/Geosciences, Demography

This collaborative project aims at studying the feasibility of automatically producing repeatable indicators from remote sensing data in Africa to allow for spatially complete and temporally up-to-date information. To this end, we will use freely available Sentinel-2 images (produced by the European Space Agency) to produce standardised environmental indicators, in the form of local climate zones. The student will study the relevance of a convolutional neural network-based method for this task. The student will also explore the possibility of embedding spatio-temporal relations in such a model, and quantify the benefits. Finally, the relevance of these results for demographic studies will be examined, and a graphical user interface will be developed to produce such indicators from a given remote sensing image.

Key words: Remote sensing, Deep learning, Sentinel 2, Local climate zones, Africa

Project coordinator: Sylvain Lobry (Université Paris Cité)

Digital Pathology: when AI meets with anatomo-pathology

Other relevant disciplines: Mathematics/Statistics, Engineering, Medicine

The fundamental question we address is: how can we predict immunotherapy-related gene classes of a tumor just from the phenotypic observation of tissue in Whole Slide Images (WSI)? In other words, how can multi-omic information (WSI, genomic, clinical, …) be used to improve health care? We leverage a pending patent and a strong network of physicians to tackle this challenge, combining computer vision and machine learning at a very high level of data annotation and processing.

Key words: Digital Pathology, Computer Vision, Deep Learning, Immuno-therapy

Project coordinator: Nicolas Loménie (Université Paris Cité)

Earth Sciences and Geosciences

Machine learning model of volcanic lava properties helps to understand the dynamics of volcanic eruptions

Other relevant discipline: Chemistry

How do silicate melts move? How do they exchange heat with their surroundings? How do they crystallize? These fundamental questions underpin many practical problems, including the dynamics of volcanic eruptions, the formation of rocks, and the manufacturing of novel technical glass, ceramic, and glass-ceramic materials. Addressing them requires knowledge of different physical properties, such as viscosity, which are ultimately governed by the liquid composition and its associated atomic/ionic structure. At present, no general model allows this information to be inferred.

This project supports the development of a novel intelligent model that combines deep neural networks with thermodynamic theories to predict the properties and structure of glass-forming oxide melts. We have developed a first version of this model, which predicts multiple properties of simple melts with a few oxide components. This project will allow us to extend the model to more complex geologic and industrial compositions. The model has the potential to become a reference for volcanologists and glass scientists, because it will make it possible to tackle various problems, such as how small changes in the composition of lavas can trigger explosive volcanic eruptions, or how controlling nanostructures in glasses can foster the development of shatterproof glass for cellphone screens.

Key words: melt, glass, material, properties, neural networks, thermodynamics

Project coordinator: Charles Le Losq (Université Paris Cité)

Linguistics

ComplexNeuroViz: Complexity Visualisation for Neural Machine Translation

Other relevant discipline: Computer Science

This project aims at improving the visualisation of neural networks when they process linguistic data in the context of neural translation. The analysis of neural networks with visualization toolkits has mainly focused on the activation of layers and neurons for classification tasks. Compared to the two state-of-the-art visualization toolkits for neural translation, our major contribution consists in adding one currently neglected set of features in Feature Visualization: linguistic complexity. The aim of this project is to adapt current neural visualization toolkits to visualize and diagnose attention weights and linguistic complexity metrics in order to analyse the effects of the complexity of the linguistic input at token level, at phrase level and at sentence level.

Key words: XAI (Explainable Artificial intelligence), neural networks, neural machine translation, dataviz, linguistic complexity

Project coordinator: Nicolas Ballier (Université Paris Cité)

Mathematics and Statistics

Machine learning for the study of EEG data recorded during general anesthesia

Other relevant disciplines: Engineering, Medicine, Neuroscience

General Anesthesia (GA) is a drug-induced, reversible condition with three commonly accepted goals: lack of experience of surgery, nociceptive blockade and immobility for the needs of surgery. In 2010, 11.3 million anesthesia procedures were performed in France. However, despite considerable progress in the understanding of GA mechanisms, some questions remain unanswered, such as the precise mechanisms of awakening after GA, the long-term effects of anesthesia or the common pathway of all anesthetics. The aim of this project is to use electroencephalogram (EEG) signatures and machine learning tools in order to better understand the phenomena involved in awakening from GA. In particular, the goal of our study is to precisely quantify GA recovery in order to further investigate its nature and time course, which remain debated. GA recovery is often considered a passive process: as the anesthetics are eliminated, the reactivation of the neuronal circuits affected by GA would simply mirror their deactivation. But is this true? Could we find traces of GA in the EEG of patients, even several hours after they have woken up?

Key words: anesthesia, EEG, classification

Project coordinator: Laurent Oudre (Université Paris Cité)

Influence of blood pressure and aqueous humor dynamics on the response to glaucoma medication: a data-driven computational study

Other relevant disciplines: Computer Science, Mathematics/Statistics, Engineering, Medicine

Intraocular pressure (IOP) is the pressure created by the fluids within the eye. Elevated IOP is clinically referred to as ocular hypertension and it represents a major risk factor for irreversible vision loss, as in glaucoma. However, the establishment of optimal IOP levels for patients is still controversial. Another meaningful open question is related to what determines the efficacy of IOP-lowering medications. Motivated by this context, the intern will work on the construction and analysis of an enhanced data set, obtained by combining a clinical data set from the Indianapolis Glaucoma Progression Study and a novel physically-based simulated data set. The objective is to address the specific question of quantifying the impact of systemic factors (such as blood pressure) and local factors (such as production and drainage of aqueous humor) on the response to IOP-lowering medication in different conditions of clinical interest, thereby enabling more effective, personalised patient care.

Key words: data-driven computational study, mathematical modeling, glaucoma, enhanced data set

Project coordinator: Marcela Szopos (Université Paris Cité)

Neuroscience

Machine Learning techniques applied to eye movement analysis for early screening of learning disorders in young children

Other relevant discipline: Computer Science

There is a longstanding controversy about the existence of eye movement disorders and their role in dyslexia and school learning disorders. Using REMOBI&AIDEAL innovations, the CNRS IRIS Laboratory headed by Zoï Kapoula conducted a critical study in dyslexic and non-dyslexic teenagers. The study confirms intrinsic eye movement disorders in dyslexia, particularly of binocular coordination, when testing with the REMOBI embedded device, i.e. not related to reading (see Ward & Kapoula, Scientific Reports, Nature, 2020). The present proposal concerns a large-scale study of nursery and primary school children (5 to 7 years old). Binocular eye movements will be collected and analyzed to assess and preventively treat eye movement irregularities.

Key words: eye movement, learning disorders, machine learning

Project coordinator: Zoï Kapoula (Université Paris Cité)

Physics and Astronomy

Artificial Intelligence for source deblending in the next generation of astrophysical big data imaging surveys – Combining Euclid and LSST

Other relevant disciplines: Computer Science, Mathematics/Statistics, Physics/Astronomy

Astronomy, like many other disciplines, is entering a big data era. The next generation of space-based (e.g. Euclid) and ground-based (e.g. LSST) imaging surveys will produce images of billions of galaxies at unprecedented depth. This new regime requires revisiting the existing methods to process the data, both in terms of speed and accuracy, in order for these surveys to achieve their main scientific goals. A particularly important source of bias is the one caused by overlapping sources (or blending) in the 2D projected plane of the sky: galaxies that are at very different distances end up blended together. Building on previous joint work by two research groups, this project explores the use of state-of-the-art Artificial Intelligence techniques for deblending of galaxy images. We will in particular explore synergies between LSST and Euclid.

Key words: cosmology, astrophysics, probabilistic deep learning, image processing

Project coordinator: Marc Huertas-Company (Université Paris Cité)

Malvasia – MAchine Learning to VAlue Single Interferometer Analysis

Other relevant disciplines: Computer Science, Mathematics/Statistics, Physics/Astronomy

The direct observation of gravitational waves (GW) by the LIGO and Virgo detectors is one of the breakthrough discoveries of the beginning of the 21st century. However, the searches for GW transient sources are mainly limited by non-Gaussian transient noise artefacts arising from a wide variety of origins. Statistical modelling of these "instrumental glitches" has not been feasible so far, because they vary widely in rate, duration, frequency range and morphology. Their contamination can be partially mitigated by requiring temporal coincidence in two or more detectors, as the probability of their accidental co-occurrence is low. When only one detector is operating, this strategy cannot be used. The aim of this project is to use deep learning algorithms to separate the glitches from the astrophysical signal, focusing in particular on periods when only one detector is taking data.

Key words: gravitational wave, deep learning

Project coordinator: Agata Trovato (Université Paris Cité)
