« YARN » project

Brain data are routinely collected and analyzed in clinical and research work, a tedious exercise that requires money and time from qualified experts.


The analysis of medical images, including brain imaging, is notoriously a task for experts, which makes it difficult but possible to automate. Despite the advanced recording techniques required to acquire subject measurements, the automatic interpretation of medical imaging has seen a turning point with deep learning, enabling the growth of spin-off companies with medically validated products such as Avicenna.ai or Therapixel. For electroencephalography (EEG), which is clinically relevant for monitoring patients in coma or under anesthesia, as well as for sleep medicine, this turning point is still awaited despite a very active research community.

Clinical data such as physiological signals collected from the brain are inherently unprocessed, noisy and messy compared to laboratory data. The YARN project will tackle this problem using robust statistics and transfer learning, culminating in an automatic data processing pipeline integrated into an open science initiative.

Automated EEG analysis faces major problems: poor signal quality with missing data (disconnected electrodes, muscle artifacts, etc.), data scarcity due to limited acquisition times, and intra- and inter-subject variability (both for the signal from hour to hour within a subject and between different subjects). In highly controlled research environments, outside clinical contexts, advanced machine learning techniques for EEG data are already available, enabling high-precision classification and accurate prediction. These machine learning approaches have all been trained on clean datasets, acquired in controlled laboratory experiments with manual selection. They are not yet suitable for processing raw clinical data, either because of intrinsic limitations (inability to handle poorly conditioned input matrices) or because of a lack of generalizability (dataset distribution mismatch, outliers and label noise).
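To make the conditioning issue concrete, the short Python sketch below contrasts a plain sample covariance with a Ledoit-Wolf shrinkage estimate on a simulated short EEG window; the channel count, window length and random data are assumptions made purely for illustration, not project code.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

# Hypothetical setting: a short 32-channel EEG window with fewer time samples
# than channels, as happens when only brief artifact-free segments remain.
rng = np.random.default_rng(0)
n_channels, n_times = 32, 20
window = rng.standard_normal((n_times, n_channels))  # rows = time samples

# Plain sample covariance: rank-deficient here, so its condition number blows up
# and any method that needs to invert it will fail.
sample_cov = np.cov(window, rowvar=False)
print("sample covariance condition number:", np.linalg.cond(sample_cov))

# Ledoit-Wolf shrinkage blends the sample covariance with a scaled identity,
# which keeps the estimate well conditioned and invertible.
lw_cov = LedoitWolf().fit(window).covariance_
print("shrinkage covariance condition number:", np.linalg.cond(lw_cov))
```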


Issues

If EEG ML tools are to make it out of the laboratory and be able to efficiently process clinical data, which is by nature messy and sparse, a number of problems still need to be resolved:

  • Poor signal quality: EEG signals of interest are mixed with various sources of noise: other irrelevant brain signals, ocular and muscular artifacts, instrumental noise and so on (a minimal cleaning sketch is given after this list);

  • Reproducibility and software availability: the literature on brain signal processing is dense, and the evaluation of ML algorithms is often obscured by partial benchmarks and cherry-picked datasets;

  • Source code, where available, is entangled with task- and data-specific aspects, limiting its reusability.
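As an illustration of the first point, here is a minimal MNE-Python sketch of one common cleaning step, removing ocular artifacts with ICA. The file path, filter band and component count are placeholders, and the recording is assumed to include an EOG channel; this shows the kind of manual step the project aims to automate and robustify, not the project's own pipeline.

```python
import mne

# Placeholder path: any raw EEG recording with an EOG channel would do.
raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)
raw.filter(l_freq=1.0, h_freq=40.0)  # band-pass before ICA fitting

# Decompose the signal into independent components.
ica = mne.preprocessing.ICA(n_components=20, random_state=42)
ica.fit(raw)

# Flag components that correlate with the EOG channel (ocular artifacts).
eog_indices, eog_scores = ica.find_bads_eog(raw)
ica.exclude = eog_indices

# Reconstruct the signal without the flagged components.
raw_clean = ica.apply(raw.copy())
```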


Objectives

Based on the key issues identified, the following objectives are considered in this project:

  • Recovering information of interest: several theoretical contributions on SRS, relying on robust estimators and geometry to separate signal from noise: extending geometric models, exploiting robust statistics and designing SRS based on both robustness and geometry;

  • Reducing data dependence: using little or no labeled data to address intra- and inter-subject variability: define a new recording method for different subjects and equipment, build a suitable feature space based on similarities between subject brainwaves and barycentric coordinates, and use transfer learning to infer a model for subjects with limited labeled data (an alignment sketch is given after this list);

  • Open scientific platform: the applied part of the project consists in developing tools that increase reproducibility and simplify the automated processing of raw EEG data. The aim is to minimize the need for model parameterization, automate processing steps and enhance model explainability and explanatory visualizations.
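One common way to address inter-subject variability, in the spirit of the second objective, is to re-center each subject's spatial covariance matrices around the identity before pooling subjects or transferring a model. The NumPy sketch below illustrates this alignment idea on toy data, using a Euclidean mean as a stand-in for a Riemannian mean; it is an assumed illustration, not the project's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_covs(mixing, n_trials=50, n_times=256):
    """Spatial covariance matrices of trials generated with a subject-specific mixing."""
    n_channels = mixing.shape[0]
    covs = []
    for _ in range(n_trials):
        x = mixing @ rng.standard_normal((n_channels, n_times))
        covs.append(x @ x.T / n_times)
    return np.stack(covs)

def inv_sqrtm(mat):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def recenter(covs):
    """Whiten covariance matrices by their mean so each subject is centered at identity."""
    mean_cov = covs.mean(axis=0)  # Euclidean mean; a Riemannian mean is often used instead
    w = inv_sqrtm(mean_cov)
    return np.stack([w @ c @ w.T for c in covs])

# Two toy "subjects" whose covariances sit in different regions of the SPD cone.
n_channels = 8
subj_a = toy_covs(rng.standard_normal((n_channels, n_channels)))
subj_b = toy_covs(rng.standard_normal((n_channels, n_channels)))

# After re-centering, both subjects' mean covariances are (numerically) the identity,
# removing a large part of the inter-subject shift before pooling data or transferring models.
a_aligned, b_aligned = recenter(subj_a), recenter(subj_b)
print(np.round(a_aligned.mean(axis=0), 2))
print(np.round(b_aligned.mean(axis=0), 2))
```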


Contacts

Sylvain Chevallier | Florent Bouchard | Frédéric Pascal | Alexandre Gramfort