Le Séminaire Palaisien | Statistics and Machine Learning

2 December 2019

ENSAE - Amphi 200

The Séminaire Palaisien brings together, on the first Tuesday of every month, the broad Saclay research community working on statistics and machine learning.

Each session of the seminar consists of two scientific presentations of 40 minutes each (30 minutes of talk and 10 minutes of questions), followed by refreshments.

Pierre Laforgue and Sylvain Arlot, two doctoral students at Saclay, will lead the session on 2 December.

« On the Dualization of Operator-Valued Kernel Machines » - Pierre Laforgue

Operator-Valued Kernels (OVKs) provide an elegant way to extend scalar kernel methods when the output space is a Hilbert space. If the output space is finite dimensional, this framework naturally makes it possible to tackle multi-class classification or multi-task regression problems. But its ability to deal with infinite dimensional output spaces opens the door to many more applications, such as structured output prediction, structured representation learning, or functional regression. This work investigates how to use the duality principle to handle different families of loss functions that have so far been unexplored within OVK machines. The difficulty of having infinite dimensional dual variables is overcome by means of a Double Representer Theorem, which will be stated explicitly. This makes it possible, for instance, to handle ε-insensitive and Huber losses, which are of particular interest in the context of surrogate approaches.

This is joint work with Alex Lambert, Luc Brogat-Motte and Florence d'Alché-Buc from Télécom Paris. The preprint is available at https://arxiv.org/abs/1910.04621.
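As a rough illustration of the two losses named in the abstract, the sketch below writes the ε-insensitive and Huber losses for a vector-valued residual, using one common norm-based definition. The function names and the parameters `eps` and `kappa` are assumptions made for this example and are not taken from the talk or the preprint; finite-dimensional lists stand in for elements of a Hilbert space.

```python
import math


def residual_norm(residual):
    # Euclidean norm of the residual y - f(x), given as a list of floats.
    return math.sqrt(sum(r * r for r in residual))


def eps_insensitive_loss(residual, eps):
    # Assumed definition: max(||r|| - eps, 0).
    # Residuals whose norm is below eps incur no cost at all.
    return max(residual_norm(residual) - eps, 0.0)


def huber_loss(residual, kappa):
    # Assumed definition: quadratic for small residuals, linear beyond kappa.
    norm = residual_norm(residual)
    if norm <= kappa:
        return 0.5 * norm ** 2
    return kappa * (norm - 0.5 * kappa)
```

Both functions depend on the residual only through its norm, which is why the same expressions can be written for infinite-dimensional outputs once a norm is available.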

« Analysis of some Purely Random Forests » - Sylvain Arlot

Random forests (Breiman, 2001) are a very effective and commonly used statistical method, but their full theoretical analysis is still an open problem. As a first step, simplified models such as purely random forests have been introduced, in order to shed light on the good performance of Breiman's random forests.

In the regression framework, the quadratic risk of a purely random forest can be written as the sum of two terms, which can be understood as an approximation error and an estimation error. Robin Genuer (2010) studied how the estimation error decreases as the number of trees increases for a specific model. In this talk, we study the approximation error (the bias) of some purely random forest models in a regression framework, focusing in particular on the influence of the size of each tree and of the number of trees in the forest.

Under some regularity assumptions on the regression function, we show that the bias of an infinite forest decreases at a faster rate (with respect to the size of each tree) than that of a single tree. As a consequence, infinite forests attain a strictly better risk rate (with respect to the sample size) than single trees.

This talk is based on joint work with Robin Genuer.

arxiv.org/abs/1407.3939

arxiv.org/abs/1604.01515
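To make the tree-versus-forest comparison concrete, here is a minimal sketch of a one-dimensional purely random forest in a noiseless setting, so that only the approximation error (bias) is measured. Everything in it is an assumption chosen for illustration: the regression function `s(x) = x^2`, split points drawn uniformly on [0, 1], and all helper names. It is not the model analysed in the talk, and a finite forest only approximates an infinite one. The sketch compares the average squared bias of individual trees with that of their average; it does not check any rate.

```python
import bisect
import random


def s(x):
    # Hypothetical smooth regression function, chosen for illustration.
    return x * x


def cell_mean(a, b):
    # Exact average of s(x) = x^2 over [a, b]: (a^2 + a*b + b^2) / 3.
    return (a * a + a * b + b * b) / 3.0


def random_tree(n_leaves, rng):
    # Split points are drawn independently of any data: a purely random partition.
    cuts = sorted(rng.random() for _ in range(n_leaves - 1))
    edges = [0.0] + cuts + [1.0]
    values = [cell_mean(edges[i], edges[i + 1]) for i in range(n_leaves)]
    return cuts, values


def tree_predict(tree, x):
    # Piecewise-constant prediction: the mean of s over the cell containing x.
    cuts, values = tree
    return values[bisect.bisect_right(cuts, x)]


def compare_bias(n_leaves=8, n_trees=200, n_grid=500, seed=0):
    rng = random.Random(seed)
    trees = [random_tree(n_leaves, rng) for _ in range(n_trees)]
    grid = [(j + 0.5) / n_grid for j in range(n_grid)]
    tree_bias = 0.0
    forest_bias = 0.0
    for x in grid:
        preds = [tree_predict(t, x) for t in trees]
        # Squared error of each tree, averaged over the trees.
        tree_bias += sum((s(x) - p) ** 2 for p in preds) / n_trees
        # Squared error of the forest, i.e. of the averaged prediction.
        forest_bias += (s(x) - sum(preds) / n_trees) ** 2
    return tree_bias / n_grid, forest_bias / n_grid


if __name__ == "__main__":
    single, forest = compare_bias()
    print("average single-tree squared bias:", single)
    print("forest squared bias:", forest)
```

By convexity of the square, the forest value can never exceed the single-tree value computed on the same set of trees. The result discussed in the talk is the stronger statement that the rates themselves differ.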

Practical Information

This session of the seminar will take place on Monday, 2 December 2019.

It will be followed by a reception.

Registration is free but mandatory, subject to available seats.

For security reasons, anyone who has not registered will be denied access to the seminar venue.
