The « YARN » project

YARN : Automatic Processing of Messy Brain Data with Robust Methods and Transfer Learning
The project

Collecting and analyzing brain data, as is common in clinical and research work, is a tedious exercise that demands money and time from qualified experts. The analysis of medical images, brain imaging included, is notoriously an expert task, which makes it difficult, though not impossible, to automate. Beyond advanced recording techniques for matching subject measurements, the automatic interpretation of medical imaging reached a turning point with deep learning, allowing the growth of spin-off companies with medically validated products, such as Therapixel. For electroencephalography (EEG), which is clinically relevant for monitoring patients in coma or under anesthesia as well as for sleep medicine, this turning point is still awaited despite a very active research community.

"Clinical data such as physiological signals collected from the brain are inherently unprocessed, noisy and disordered compared to laboratory data. The YARN project will address this problem through robust statistics and transfer learning, leading to an automatic data processing pipeline integrated into an open science initiative."

Automated EEG analysis faces major problems: poor signal quality with missing data (disconnected electrodes, muscle artifacts, etc.), data sparsity due to limited acquisition times, and intra- and inter-subject variability (the signal changes from hour to hour within a subject and differs between subjects). Outside clinical settings, in highly controlled research environments, advanced machine learning techniques for EEG data are already available and allow high-precision classification and accurate prediction. However, these approaches have all been trained on clean, manually selected data sets acquired in controlled laboratory experiments. They are not yet suitable for processing raw clinical data, either because of intrinsic limitations (inability to handle poorly conditioned input matrices) or because of a lack of generalizability (dataset distribution mismatch, outliers and label noise).
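To make the "poorly conditioned input matrices" limitation concrete, here is a minimal sketch on synthetic data. It assumes the input matrix is a channel covariance matrix, a common choice in EEG pipelines that the text above does not specify. The function names, the flat channel standing in for a disconnected electrode, and the shrinkage weight `alpha` are all illustrative assumptions, not part of the YARN project.

```python
import numpy as np

def sample_covariance(x):
    """Sample covariance of an (n_channels, n_samples) array."""
    xc = x - x.mean(axis=1, keepdims=True)
    return xc @ xc.T / (x.shape[1] - 1)

def shrink_covariance(cov, alpha=0.1):
    """Blend cov with a scaled identity matrix.

    alpha is a hypothetical, hand-picked weight; in practice it would be
    tuned or estimated from the data.
    """
    n = cov.shape[0]
    mu = np.trace(cov) / n
    return (1.0 - alpha) * cov + alpha * mu * np.eye(n)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 500))  # synthetic stand-in for 8 EEG channels
eeg[3] = 0.0                         # channel 3 simulates a disconnected electrode

cov = sample_covariance(eeg)
cov_shrunk = shrink_covariance(cov)

# The flat channel makes the raw covariance singular (rank 7 instead of 8),
# so any method that inverts it or takes its logarithm would fail.
rank = np.linalg.matrix_rank(cov)
min_eig_raw = np.linalg.eigvalsh(cov).min()
min_eig_shrunk = np.linalg.eigvalsh(cov_shrunk).min()
cond_shrunk = np.linalg.cond(cov_shrunk)

print("rank of raw covariance:", rank)
print("smallest eigenvalue, raw vs shrunk:", min_eig_raw, min_eig_shrunk)
print("condition number after shrinkage:", cond_shrunk)
```

Shrinkage is only one simple way to restore conditioning, and it does not recover the missing channel; it is used here purely to show why raw clinical recordings can break methods that work on curated laboratory data.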


For EEG machine learning (ML) tools to move out of the lab and effectively process clinical data, which are inherently messy and sparse, several issues still need to be resolved:

  • Poor signal quality: EEG signals of interest are mixed with many kinds of noise, including other irrelevant brain signals, ocular and muscle artifacts, and instrumental noise. In addition, accurate labeling of clinical data is difficult, and mislabeled data are common;
  • High intra- and inter-subject variability: signals of interest vary strongly between subjects and between sessions;
  • Reproducibility and availability of software: the literature on brain signal processing is dense, and the evaluation of ML algorithms is often obscured by partial benchmarks and cherry-picked data sets. Source code, when available, is entangled with task- and data-specific aspects, which limits its reuse.


Based on the key issues identified, the following objectives are considered in this project:

  1. Recovering the information of interest: several theoretical contributions for SRS that rely on robust estimators and geometry to distinguish the signal from the noise, namely extending geometric models, exploiting robust statistics, and designing the SRS around both robustness and geometry;
  2. Reducing data dependence: using little or no labeled data to address intra- and inter-subject variability, by defining a new recording method for different subjects and equipment, constructing a suitable feature space based on similarities between subjects' brain waves and on barycentric coordinates, and using transfer learning to infer a model for subjects with limited labeled data;
  3. Open scientific platform: the application aspect consists in developing tools that increase reproducibility and simplify the automated processing of raw EEG data. The aims are to reduce as much as possible the need to parameterize the model, to automate the processing steps, and to strengthen the explainability of the model through explanatory visualizations.
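As an illustration of what a robust estimator can bring to the first objective, the sketch below compares the ordinary sample covariance with Tyler's M-estimator of scatter, a classical robust estimator, on synthetic data contaminated by artifact-like outliers. The text does not say which estimators the project uses, so this choice, the function names `tyler_scatter` and `normalize_trace`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def tyler_scatter(x, n_iter=50, tol=1e-6):
    """Tyler's M-estimator of scatter for an (n_channels, n_samples) array.

    Assumes zero-mean data. The result is normalized to have trace
    n_channels, since Tyler's estimator is only defined up to scale.
    """
    p, n = x.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        # Squared Mahalanobis-type distance of each sample under the current estimate
        d = np.einsum("in,ij,jn->n", x, np.linalg.inv(sigma), x)
        # Each sample is down-weighted by its distance, so large outliers lose influence
        new = (p / n) * (x / d) @ x.T
        new *= p / np.trace(new)
        converged = np.linalg.norm(new - sigma) < tol
        sigma = new
        if converged:
            break
    return sigma

def normalize_trace(m):
    """Rescale a square matrix so that its trace equals its dimension."""
    return m * (m.shape[0] / np.trace(m))

rng = np.random.default_rng(0)
p, n = 4, 2000
true_cov = np.diag([4.0, 2.0, 1.0, 1.0])
x = np.sqrt(np.diag(true_cov))[:, None] * rng.standard_normal((p, n))
# 5% of the samples carry a large artifact on the last channel
x[3, :100] += 30.0 * rng.standard_normal(100)

target = normalize_trace(true_cov)
err_sample = np.linalg.norm(normalize_trace(x @ x.T / n) - target)
err_tyler = np.linalg.norm(tyler_scatter(x) - target)

print("error of sample covariance:", err_sample)
print("error of Tyler's estimator:", err_tyler)
```

Both estimates are compared after trace normalization, so only the shape of the covariance is evaluated. On this simulated contamination the robust estimate should stay much closer to the true shape than the sample covariance does, although it is still biased by the outliers.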
