Listed below are the internship opportunities currently offered by diiP. These offers are open to second-year Master's students.
Prediction of protein-carbohydrate binding sites using deep learning methods

Supervised by: Tatiana Galochkina (Université Paris Cité)

Description: The Master's student will work in the structural bioinformatics group DSIMB (2 Assistant Professors, 2 Associate Professors, 1 Researcher, 1 Research Engineer, 1 PostDoc, 5 PhD students). Our team has extensive expertise in methodological developments for structural bioinformatics problems such as: i) modelling and analysis of protein dynamics; ii) protein structure and dynamics prediction using machine learning approaches; iii) development of databases and specific tools for a range of distinct protein families (among others: membrane proteins, camelid antibodies and small disulfide-bridged proteins). The DSIMB team is internationally recognized for the development of Protein Blocks (PBs), the most widely used structural alphabet in the world, applied to the analysis and prediction of local protein conformations. DSIMB has also participated in the international CASP 11 and CASP 13 competitions, finishing in the top 10 in the difficult-target category.
The Master's student will work with Dr. Tatiana Galochkina in the framework of the SugarPred project funded by the ANR. Dr. Galochkina is a specialist in the molecular modelling of complex systems and in deep learning applied to structural bioinformatics problems. The student will be co-supervised by Dr. Aria Gheeraert, a PostDoc recruited for the same project.

How to apply: contact tatiana.galochkina[a]u-paris.fr

Deep learning to model genetic pleiotropy and understand human genetic architecture

Supervised by: Marie Verbanck (Université Paris Cité)

Description: The internship will be dedicated to exploring semi-supervised and supervised methods to classify the pleiotropy of genetic variants, using labeled pleiotropic data produced by the methods the team has been developing. In human genetics, and especially in the study of pleiotropy, the major issue is obtaining labeled data, since the ground truth is unknown. However, semi-supervised learning strategies have already been applied with substantial gains in classification performance (Ratsaby and Venkatesh 1995; Cozman, Cohen, and Cirelo 2003). We have already developed a strategy to partially label genetic variants for pleiotropy using Gaussian mixture models (Darrous, Mounier, and Kutalik 2021; Morrison et al. 2020). The first approach will therefore be to develop a semi-supervised learning framework based on Gaussian mixture models. A second approach will explore supervised learning, namely convolutional neural networks (CNNs), which are commonly applied to image analysis. In a CNN, overlapping receptive fields compute convolutions between a kernel and the data: this is analogous to the sliding-window approach, a traditional method in genetics in which genomic intervals “slide” across the genome. Furthermore, the block architecture of CNNs is comparable to LD blocks (the dependence structure between alleles), one of the major obstacles to mapping pleiotropy. The Keras and/or TensorFlow frameworks (accessible from both R and Python) provide powerful deep learning tools and will be the main tools used to develop the framework.

Candidate requirements: 

  • will have a Master's degree in data science related to statistics or artificial intelligence; candidates with a more theoretical background who show a strong interest in life-science applications are also welcome;
  • will be enthusiastic about transdisciplinary research and open science at the interface between data science and genetics;
  • will show a clear interest in using applied-science methodology to advance biological understanding;
  • will have good programming skills, preferably in R and/or Python;
  • may have a background in biology or genetics;
  • should be open-minded and willing to work as a team with other lab members;
  • will have a good command of English, since we collaborate closely with Mount Sinai Hospital in New York City, USA.

How to apply: please send a concise email describing your research interests and experience, along with an up-to-date CV, to marie.verbanck[a]u-paris.fr

OpenStreetMap and Sentinel-2 data for the automatic production of environmental indices for demographic studies

Supervised by: Sylvain Lobry (Université Paris Cité)

Description: The work to be conducted during the proposed M2 internship will lead to the following three contributions:

  • Contribution A: Development of a model to classify Local Climate Zones (LCZs) using high-quality OpenStreetMap (OSM) data. This rule-based model will help us better understand the LCZ classification scheme. It will also provide a baseline for the multi-modal methods to be developed during the internship.
  • Contribution B: Multi-modal models for LCZ classification. Using an already trained deep-learning-based method to classify LCZs, we will study different fusion mechanisms (including late fusion and rule-based fusion) to integrate the information from the rule-based model. We will also develop an end-to-end deep learning model taking rasterized OSM and Sentinel-2 data as input. These methods will be compared and evaluated in Ouagadougou, Burkina Faso, and Antananarivo, Madagascar.
  • Contribution C: Link with demographic studies and writing of the Master's thesis. The results obtained will be linked to demographic data in these two regions to better understand the underlying geospatial components in population studies. These results will be compared with a baseline developed during Basile Rousse's PhD.

Candidate requirements: We are looking for a student in Master 2, the final year of an MSc, or the final year of an engineering school, in computer science. The ideal candidate would have knowledge of image processing, computer vision, machine learning, geo-information sciences and Python programming, and an interest in handling large amounts of data, remote sensing and demography. Experience in statistical data analysis would be a plus.

How to apply: please send a cover letter and a CV to stage-diip[a]listes.ined.fr. You will receive a confirmation by email. The position is open until filled.

Enhancing earthquake location with domain adaptation

Supervised by: Léonard Seydoux (Université Paris Cité)

Description: This work aims to correct the systematically biased hypocenters obtained with a permanent seismic array, using as a reference the hypocenters inferred with a temporary array of adequate geometry, as illustrated in the figure below. We will use the case of Mayotte to develop the method and then demonstrate its potential on other datasets of interest. We will learn the catalog bias from the events detected with the trusted array over five weeks and test the prediction quality over one week. If successful, we will deploy the technique on several years of continuous data from Mayotte and in other contexts.

Candidate requirements: We seek candidates with a strong taste for programming, seismology, and inverse problem solving. Candidates motivated to learn about and apply artificial intelligence techniques are strongly preferred. The target programming language is Python, although we are open to other suggestions. We will use the scikit-learn library or the PyTorch framework to develop the strategy.

How to apply: please send a cover letter and a CV to seydoux[a]ipgp.fr
