🏆 Winners of the Fellowships call for proposals

Discover the winning projects of the Fellowships call launched by DATAIA in 2024.

In 2025, the DATAIA Institute supports research in data science, AI, and their impact on society through its call for fellowships.

The 2024 call, which closed on January 20, selected 5 projects. The call aims to stimulate excellence in AI and data science research at Université Paris-Saclay. The funding allocated will be €120,000 and will finance a doctoral thesis or a postdoctoral researcher.

COMPLY-LLM: Compliance and Large Language Models: Detecting Privacy and Copyright Violations

Nicolas Anciaux (INRIA) - Alexandra Bensamoun (Université Paris-Saclay, Laboratoire CERDI)


Large Language Models (LLMs), such as ChatGPT, are transforming industries with their advanced text generation and comprehension capabilities. However, their use raises significant legal and ethical concerns, particularly regarding privacy and copyright compliance. In the European Union, regulations like the General Data Protection Regulation (GDPR), the Digital Single Market (DSM) Directive, and the AI Act establish stringent data protection and intellectual property standards. Yet the opaque nature of LLMs complicates the detection and remediation of potential violations, such as the unauthorized inclusion of personal or copyrighted data in training datasets. The COMPLY-LLM project addresses these challenges by developing methodologies and tools to identify and mitigate privacy and copyright infringements in LLMs. Drawing on interdisciplinary expertise in law and computer science, the project will explore and enhance Membership Inference Attacks (MIA), which are used to detect sensitive or copyrighted data in training datasets. This involves refining MIA techniques for greater precision, extending their scope to privacy violations, and addressing biases in assessments. The ultimate goal is to create a GDPR- and DSM-compliant, citizen-oriented tool that empowers individuals to verify and act upon potential data misuse. The project will confront difficult challenges linked to the complexity of LLM architectures, dataset biases, and the integration of technical detection mechanisms with legal frameworks. COMPLY-LLM also supports the ethical and lawful adoption of AI across various sectors.
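
To give a rough idea of what a membership inference test looks like, here is a minimal sketch of a generic loss-threshold baseline. It is not the COMPLY-LLM method: the function names, the loss values, and the calibration rule are all assumptions made for illustration. The idea is that a model tends to assign a lower loss to texts it was trained on, so a threshold is calibrated on texts known not to be in the training set.

```python
def calibrate_threshold(non_member_losses, false_positive_rate=0.1):
    """Pick a loss threshold from texts known NOT to be in the training set.

    Hypothetical helper: the threshold is the reference loss at the requested
    rank, so that about `false_positive_rate` of the reference non-members
    fall strictly below it.
    """
    if not non_member_losses:
        raise ValueError("need at least one reference loss")
    ordered = sorted(non_member_losses)
    cutoff = min(int(false_positive_rate * len(ordered)), len(ordered) - 1)
    return ordered[cutoff]


def likely_member(loss, threshold):
    """Flag a candidate text as a probable training member if its loss is unusually low."""
    return loss < threshold


# Illustrative per-text losses; in practice they would come from the model under audit.
reference_losses = [3.1, 2.8, 3.4, 2.9, 3.0, 3.3, 2.7, 3.2, 3.5, 3.6]
threshold = calibrate_threshold(reference_losses, false_positive_rate=0.1)

print(likely_member(1.2, threshold))  # very low loss: flagged
print(likely_member(3.0, threshold))  # typical loss: not flagged
```

A single threshold of this kind is known to be sensitive to how the reference texts are chosen, which is one reason the precision and bias issues mentioned above matter.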

Compression by change-point Analysis of Structured Time Series

Laurent Oudre (ENS Paris-Saclay)


Change point detection (CPD) involves finding the instants at which the generative model of observations in a time series changes. This is often part of a complex processing pipeline in numerous application domains, from neurology to industrial monitoring. CPD approaches are used to compress time series and perform frugal machine learning on signal representations with a low memory footprint. This subject has generated many contributions in recent decades. However, the large number of available algorithms and the difficulty of calibrating them on real-world data prevent their widespread use. Our overarching goal is to propose methodologies that help researchers in any field choose and calibrate a CPD method suited to their task. To this end, we propose in this thesis to integrate structure assumptions into the algorithms. These assumptions are of several types. Firstly, the space of observations can be non-Euclidean, modeled, for example, by a spatial graph (EEG sensor network) or a manifold (motion capture data). Secondly, the possible changes and transitions between segments can be constrained. For example, many physiological signals are quasi-periodic (ECG, walking, or breathing signals) or have known average segment durations (e.g., average activity duration when monitoring mice/rats). These types of information are generally easy to formulate, as they are intuitive and correspond to the expertise of the researchers. Incorporating this domain knowledge reduces the number of degrees of freedom, keeps estimators within plausible solution spaces, and simplifies algorithm calibration steps.
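
As an illustration of the general idea, the sketch below finds a single change in the mean of a one-dimensional signal by minimizing the summed squared deviation from each segment's mean. It is a textbook least-squares formulation, not the project's algorithm. The `min_size` parameter is a hypothetical stand-in for the kind of structural constraint described above (a known minimum segment duration): it shrinks the set of admissible change points.

```python
def best_single_change_point(signal, min_size=1):
    """Return (index, cost) of the best split of `signal` into two segments.

    The change is placed just before signal[index]. The cost is the sum, over
    both segments, of squared deviations from the segment mean. `min_size`
    (an illustrative constraint) forces each segment to have at least that
    many samples.
    """
    n = len(signal)
    if min_size < 1 or n < 2 * min_size:
        raise ValueError("signal too short for the requested minimum segment size")

    # Prefix sums give each segment cost in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for x in signal:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def segment_cost(start, end):
        # Squared deviation from the mean over signal[start:end].
        total = prefix[end] - prefix[start]
        total_sq = prefix_sq[end] - prefix_sq[start]
        return total_sq - total * total / (end - start)

    best_index, best_cost = None, float("inf")
    for k in range(min_size, n - min_size + 1):
        cost = segment_cost(0, k) + segment_cost(k, n)
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index, best_cost


signal = [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2]
index, cost = best_single_change_point(signal, min_size=2)
print(index)  # the mean shifts between the 4th and 5th samples
```

Keeping only the change-point index and the two segment means instead of the eight samples is the compression viewpoint mentioned above.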

GARGOUILLEURS - Graph Algorithms for Recurrent Groups Of Ubiquitous Inter-Linked Loop Elements Ushering Rice Stress

Alain Denise (Université Paris-Saclay) - Vladimir Reinharz (Université du Québec à Montréal - UQAM) - Roman Sarrazin-Gendron (Université du Québec à Montréal - UQAM)


The budget requested for this project will be dedicated to funding 18 months of salary for a Ph.D. thesis that will be co-supervised by the two partners. A cotutelle application will be submitted between the two universities: Université Paris-Saclay and Université du Québec à Montréal (UQAM), which is a partner of the IVADO consortium. Briefly, the 3-year project is devoted to the development of algorithms for the analysis, classification, prediction and elucidation of the function of complex structural motifs in RNA molecules, and their application to the study of structural modifications of RNA genes in a plant of great agri-food interest (rice) under various stress conditions, in partnership with an experimental biology team. We are organizing the fellowship around three main axes.

(1) Classification and organization of over 150,000 known complex motifs in RNA 3D structures
(2) Development and application of a novel algorithm for their prediction in sequences
(3) A new method to evaluate the motifs' impact on crop resistance to stress

Each of the three main objectives will require significant algorithmic and conceptual advances, both in terms of improving graph-based modelling of biological realities and designing more efficient search algorithms to parse structures and sequences. In addition, close collaboration with a team of experimental biologists will enable us both to validate our approach on new data and to make advances in our knowledge of stress resistance factors in a plant of great agri-food interest. An excellent student, doing a dual degree (M2) in Bioinformatics and Computer Science at Paris-Saclay, is highly motivated by the prospect of doing this thesis. She did a 4-month internship in M1 on a subject close to that of this application, and is preparing to do a 6-month internship starting at the end of February in the Paris-Saclay team in collaboration with the UQAM team.

Riemannian Statistical Framework of Large-Scale fMRI Data for Alzheimer’s Disease Study

Bertrand Thirion (Inria Saclay–Île-de-France)


This proposal combines the strengths of the Riemannian statistical framework with advanced deep learning techniques to enhance the analysis of fMRI functional connectivity in large-scale datasets. A novel statistical analysis framework, rooted in the Riemannian paradigm, will be developed on top of the representations learned by these deep learning models. Beyond its primary focus, the methodology proposed in this study will apply to broader contexts, such as population studies, enabling exploration of brain correlates related to age- and sex-associated variability. This integrated approach aims to advance both methodological and practical aspects of fMRI-based research, while also providing valuable tools for the machine learning community at large.

GenCaloSimExtreme: Generative models for fast Calorimeter Simulation with emphasis on Extremes Description

David Rousseau (IJCLab-Orsay)


Simulation is a mandatory element of High Energy Physics experiments. For experiments at the Large Hadron Collider at CERN, simulation needs to provide collision events in quantities commensurate with the recorded collision events and to reproduce them very accurately. One resource-hungry ingredient is the simulation of particles impinging on the calorimeter, each creating a 3D image, the "shower". The existing simulator, Geant4, is precise but slow. A community is actively working on emulating Geant4 with modern neural network (NN) simulators. However, no attention has been paid so far to extreme cases, i.e. the first and last 10^-4 quantiles. A two-year postdoc (already identified) will be hired to first evaluate state-of-the-art models with respect to the accuracy of extreme shower simulation. She will then tune these models with respect to this new metric by (i) manipulating the training dataset, (ii) introducing new hyperparameter optimization (HPO) recipes, and (iii) introducing a fine-tuning inference step. This will be done in collaboration with the Tau team at LISN. Finally, the approach will be reiterated on a different physics use case, in collaboration with the CEA/IRFU/ATLAS team.
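
To make the notion of "extreme cases" concrete, the sketch below compares the first and last 10^-4 quantiles of a scalar shower observable between a reference sample and a generated sample. It is only an illustration under stated assumptions: the project's actual metric is not described here, and the function names, the choice of a scalar observable, and the linear-interpolation quantile are all hypothetical.

```python
def empirical_quantile(sorted_values, q):
    """Quantile of an already sorted list, with linear interpolation between ranks."""
    if not sorted_values:
        raise ValueError("empty sample")
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def tail_discrepancy(reference, generated, q=1e-4):
    """Difference (generated minus reference) at the q and 1 - q quantiles.

    `reference` would hold an observable from the precise simulator and
    `generated` the same observable from the fast emulator (both assumed).
    """
    ref, gen = sorted(reference), sorted(generated)
    return {
        "lower": empirical_quantile(gen, q) - empirical_quantile(ref, q),
        "upper": empirical_quantile(gen, 1 - q) - empirical_quantile(ref, 1 - q),
    }


# Toy data: the "generated" sample stretches the reference by a factor of two.
reference = [float(i) for i in range(10001)]
generated = [2.0 * x for x in reference]
gaps = tail_discrepancy(reference, generated)
print(gaps)
```

Estimating a 10^-4 quantile reliably requires far more than 10^4 samples, which is one reason tail accuracy is costly to evaluate.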
