2022
Masters Projects
@Computer Science
+Mathematics/Statistics
+Biology
#Big data
#RNA-seq
#computational optimization
#GPU parallelization
#differential analysis
Project Summary
Next-generation sequencing technologies such as RNA-seq aim to quantify the transcriptome of biological samples and to compare gene expression between experimental conditions. The quantified genome alignments produced by these technologies are relative measurements, which cannot be compared directly between conditions without adequate data normalization. No consensus has yet been reached on the optimal way to normalize such data (Abrams et al. 2019). Unfortunately, existing methods suffer from practical limitations and may be compromised by the presence of genes with high expression levels or strong variability. In such cases, a single normalization procedure can lead to erroneous results and false conclusions. A novel statistical framework for differential analysis in transcriptomics has therefore been proposed (Desaulle et al. 2021); it is based on intensive iterative random data normalizations and provides good control of statistical errors. It is currently implemented in the R package DArand (Desaulle and Rozenholc 2021), which is publicly available from the Comprehensive R Archive Network (CRAN). The package is written in R and uses only CPU parallelization.
Because of the large data size and the reliance on intensive iterative randomizations, further development of the project requires more advanced programming. More precisely, the iterative procedure is computationally intensive and can quickly become time-consuming as the size of the transcriptomic experiment and the number of samples grow. The main mission of the internship will therefore be to adapt the code for efficient parallel processing on a graphics processing unit (GPU) using CUDA.
Computational optimization will play an important role in further methodological development. Indeed, the subsequent contribution will aim at extending the methodology from two to more biological conditions. It will be directed towards statistical analyses with more than two conditions, such as differential analysis, principal component analysis (PCA) and, more generally, unsupervised learning tools. Here the difficulty will be to preserve the iterative structure of the procedure, with its data normalization, while combining results from different data analysis approaches. The methodological aspects, implementation and validation will be followed by a real-data application involving miRNA data.
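To make the idea of repeated random normalizations more concrete, here is a minimal Python sketch. It is not the DArand algorithm and does not reproduce its statistical test: the function name `random_normalization_votes`, the choice of a random subset of genes as normalization reference, the log fold-change threshold and the toy count matrix are all assumptions made for illustration. It only shows how a result can be aggregated over many independent normalizations instead of relying on a single one.

```python
import math
import random


def random_normalization_votes(counts, groups, n_iter=200, n_ref=3,
                               lfc_threshold=1.0, seed=0):
    """Toy illustration (hypothetical, not the DArand procedure).

    counts: list of genes, each a list of per-sample read counts.
    groups: list of 0/1 condition labels, one per sample.
    Returns, for each gene, the fraction of random normalizations in which
    its absolute log2 fold change between conditions exceeds the threshold.
    """
    rng = random.Random(seed)
    n_genes, n_samples = len(counts), len(groups)
    idx_a = [j for j, g in enumerate(groups) if g == 0]
    idx_b = [j for j, g in enumerate(groups) if g == 1]
    votes = [0] * n_genes
    n_done = 0

    for _ in range(n_iter):
        # Assumption: each iteration normalizes by a random set of genes.
        ref = rng.sample(range(n_genes), n_ref)
        factors = [sum(counts[i][j] for i in ref) for j in range(n_samples)]
        if min(factors) == 0:
            continue  # this reference set cannot be used as a scaling factor
        n_done += 1
        for i in range(n_genes):
            # Log of the normalized count, with a pseudo-count of 1.
            logs = [math.log2((counts[i][j] + 1) / factors[j])
                    for j in range(n_samples)]
            mean_a = sum(logs[j] for j in idx_a) / len(idx_a)
            mean_b = sum(logs[j] for j in idx_b) / len(idx_b)
            # Placeholder decision rule standing in for a real test.
            if abs(mean_b - mean_a) > lfc_threshold:
                votes[i] += 1

    if n_done == 0:
        raise ValueError("every random reference set had a zero total")
    return [v / n_done for v in votes]


# Hypothetical data: five stable genes and one gene strongly induced
# in the second condition (samples 3 and 4).
toy_counts = [
    [100, 100, 100, 100],
    [200, 200, 200, 200],
    [50, 50, 50, 50],
    [80, 80, 80, 80],
    [120, 120, 120, 120],
    [10, 10, 1000, 1000],
]
toy_groups = [0, 0, 1, 1]
print(random_normalization_votes(toy_counts, toy_groups))
```

In this toy data set, any single normalization whose reference set happens to include the strongly induced gene makes the stable genes look differentially expressed, whereas the induced gene is flagged in every iteration. The iterations are independent of one another, which is the property that makes this kind of procedure a natural candidate for parallel execution.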
Dorota Desaulle