Don't miss any announcement of a new DATAIA seminar!
Subscribe to the DATAIA seminars mailing list by clicking here
« Clustering and representations of climate scenarios » - Matthieu Jonckheere
2 April 2020 at CentraleSupélec - Bâtiment Eiffel - Amphi IV
We investigated climate scenarios among a large number of temperature simulations provided by RTE
(Réseau de transport d’Electricité). In particular, we were interested in performing the following tasks:
- clustering scenarios and selecting representatives,
- getting a finer understanding of the involved dynamics,
- defining the notion of quantiles for the time-series.
We studied many practical methods, some of which produced clusters that were meaningful according to RTE's expertise. We proposed an index for assessing and interpreting the performance of the clustering methodology.
We also worked on sparse representations of the time series using differential equations, and on the definition of quantiles in this context.
Joint work with Y. Barrera (Aristas), L. Boechi (Conicet), V. Lefieux (RTE), D. Picard (Paris VII), A. Somacal (Aristas), A. Umfurer (Aristas), E. Smucler (Aristas, Di Tella University)
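The abstract does not say which clustering method was finally retained; as a purely illustrative sketch (synthetic data, plain k-means, and medoid representatives are all our own choices here, not the speakers'), clustering scenarios and selecting one representative per cluster could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for temperature scenarios: 60 series of length 100 drawn
# from 3 latent regimes (the actual RTE simulations are not public).
regimes = np.array([0.0, 1.5, -1.0])
truth = rng.integers(0, 3, size=60)
scenarios = regimes[truth, None] + 0.3 * rng.standard_normal((60, 100))

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on the raw series (Euclidean metric)."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Keep the old center if a cluster happens to empty out.
        centers = np.stack([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

labels, centers = kmeans(scenarios, k=3)

# Representative of a cluster = its medoid (the member closest to the
# center), so each representative is an actual scenario, not an average.
reps = []
for j in range(3):
    members = np.flatnonzero(labels == j)
    closest = ((scenarios[members] - centers[j]) ** 2).sum(1).argmin()
    reps.append(int(members[closest]))
```

Selecting medoids rather than centroids matters here: a centroid of temperature trajectories is a smoothed average that no simulation actually produced, while a medoid is a genuine scenario that can be handed back to the domain experts.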
«How machines learn to talk. Challenges and opportunities of neural approaches for Conversational AI» - Verena Rieser
Amazon Alexa, Apple's Siri and Google's Assistant are able to converse with humans using language. The underlying technology - often referred to as spoken dialogue systems - has experienced a revolution over the past decade, moving from being completely handcrafted to using data-driven machine learning methods.
In this talk, I will review current developments, including my work on using reinforcement learning and deep learning models, and evaluate these methods in the light of results from real-world applications. In particular, I will report our experience from experimenting with these models for generating responses in open-domain social dialogue as part of the Amazon Alexa Prize challenge, as well as for task-based systems as part of the E2E NLG challenge - a shared task organised by my team.
« An overview of factor modelling for high-dimensional time series. Application to air quality and respiratory disease variables » - Valderio Reisen
27 February 2020 at CentraleSupélec - Bâtiment Eiffel - Amphi IV
Factor analysis makes use of the eigenvalues and eigenvectors of a symmetric matrix for the purpose of space dimension reduction and forecasting, among others.
This talk considers factor modelling for high-dimensional time series in the presence of atypical observations. Estimation and inference of the number of factors will be addressed in the context of classical and robust methodologies. Some real examples will be discussed, with special attention to: (1) dimension reduction of the space generated by multivariate pollutant time series, and (2) the multi-pollutant effect on respiratory health, quantifying the relation between multiple pollutants and respiratory diseases. In this context, alternative time series regression models will be proposed using multivariate statistical techniques.
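The talk's estimators are not specified in the abstract; as one standard illustration (an eigenvalue-ratio criterion on synthetic data, not necessarily the method presented), the number of factors can be read off the sharp drop in the eigenvalues of the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, r = 400, 20, 3          # time length, dimension, true number of factors

# Toy static factor model x_t = Lambda f_t + e_t (the talk's data are
# pollutant series; these are synthetic).
Lam = rng.standard_normal((d, r))
F = rng.standard_normal((T, r))
X = F @ Lam.T + 0.5 * rng.standard_normal((T, d))

# Eigenvalues of the sample covariance, in descending order: the first r
# are inflated by the factors, the rest come from the idiosyncratic noise.
w = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

# Eigenvalue-ratio estimate: the largest ratio between consecutive
# eigenvalues marks the gap after the r-th one.
r_hat = int(np.argmax(w[:-1] / w[1:]) + 1)
```

The ratio criterion is one of several in the literature; robust variants, as alluded to in the abstract, replace the sample covariance with an estimator that is less sensitive to atypical observations.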
« Machine learning and causal inference: a two-way road » - Uri Shalit
This talk will have two parts. In the first we will discuss how and when can deep learning methods be applied to learning individual-level causal effects. We will then present a framework we developed for learning individualized treatment recommendations from observational health data, using data of tens of thousands of patients from a big health provider. In the second part we will show how we use ideas from the causal inference literature to address long standing problems in machine learning: off-policy evaluation in a partially observable Markov decision process (POMDP), and learning predictive models that are stable against distributional shifts.
« Phase transition in PCA with missing data: reduced signal-to-noise ratio, not sample size! » - Lars Kai Hansen
Wednesday 27th November, 3pm-4.30pm - Centre Inria-Saclay, bâtiment Alan Turing
Principal component analysis (PCA) is widely used and easy to formulate and compute - yet it has many surprising behaviors! It has been shown that the performance of PCA depends on the signal-to-noise ratio and on the ratio of sample size to dimensionality. Since the early 90s it has also been known that a critical sample size is needed before learning occurs (Biehl and Mietzner, 1993). Here we generalize this analysis to include missing data. An analytic result suggests that the effect of missing data is to effectively reduce the signal-to-noise ratio rather than - as commonly believed - to reduce the sample size. The theory predicts a phase transition induced by the missing process, and this is indeed observed in simulated and real data.
N. Ipsen, L.K. Hansen. Phase transition in PCA with missing data; Proc. ICML 2019; PMLR 97:2951-2960, 2019.
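The analysis itself is in the paper above; the following toy simulation (a spiked-covariance model and a pairwise-complete covariance estimate, both our own illustrative choices) is merely consistent with the claim that missing entries degrade the recovered component the way a lower signal-to-noise ratio would:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, snr = 500, 100, 2.0

# Rank-one spiked model: x = sqrt(snr) * z * u + isotropic noise.
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
X = np.sqrt(snr) * rng.standard_normal((n, 1)) * u + rng.standard_normal((n, d))

def leading_overlap(X, u, p_miss=0.0, seed=0):
    """|<u, u_hat>| where u_hat is the top eigenvector of a covariance
    estimated from pairwise-complete observations under a random mask."""
    r = np.random.default_rng(seed)
    M = r.random(X.shape) >= p_miss            # True = observed
    Xm = np.where(M, X, 0.0)
    counts = M.astype(float).T @ M.astype(float)   # pairwise sample sizes
    C = (Xm.T @ Xm) / np.maximum(counts, 1.0)
    w, V = np.linalg.eigh(C)                   # ascending eigenvalues
    return abs(V[:, -1] @ u)

full = leading_overlap(X, u, p_miss=0.0)
heavy = leading_overlap(X, u, p_miss=0.8)      # 80% of entries missing
```

With no missing data the top eigenvector aligns well with the planted direction; with heavy missingness the alignment collapses even though the nominal sample size is unchanged, which is the qualitative behavior the abstract describes.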
« Distance learning using Euclidean percolation: Following Fermat's principle » - Matthieu Jonckheere
Thursday 7th November 2019, 2pm-4pm - Centrale Supélec, Eiffel building
In unsupervised statistical learning tasks such as clustering, recommendation, or dimension reduction, a notion of distance or similarity between points is crucial but usually not directly available as an input. We discuss recent techniques to infer a metric from observed data. Then we propose a new density-based estimator for weighted geodesic distances that takes into account the underlying density of the data, and that is suitable for nonuniform data lying on a manifold of lower dimension than the ambient space. The consistency of the estimator is proven using tools from first passage percolation. We then discuss its properties and implementation and evaluate its performance for clustering tasks.
Joint work with P. Groisman and F. Sapienza.
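In this line of work, the sample "Fermat" distance replaces each Euclidean edge length by its α-th power with α > 1, so that shortest paths prefer many small hops through dense regions of the sample; a minimal sketch, with the complete graph, the value of α, and the Dijkstra implementation all chosen here purely for illustration:

```python
import heapq
import numpy as np

def fermat_distances(X, source, alpha=2.0):
    """Sample Fermat distances from X[source] to every other point:
    shortest-path distances on the complete graph whose edge weights are
    Euclidean distances raised to the power alpha.  With alpha > 1, long
    hops are penalized, so optimal paths hug dense regions of the data."""
    n = len(X)
    W = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1) ** alpha
    dist = np.full(n, np.inf)
    dist[source] = 0.0
    done = np.zeros(n, dtype=bool)
    heap = [(0.0, source)]
    while heap:                       # plain Dijkstra with lazy deletion
        d, i = heapq.heappop(heap)
        if done[i]:
            continue
        done[i] = True
        nd = d + W[i]                 # tentative distances through i
        better = nd < dist
        dist[better] = nd[better]
        for j in np.flatnonzero(better):
            heapq.heappush(heap, (dist[j], j))
    return dist
```

For three collinear points at 0, 1 and 2 with α = 2, the direct hop from 0 to 2 costs 2² = 4, while the two-hop path through the middle point costs 1² + 1² = 2, so the estimator already prefers routing through intermediate samples on this tiny example.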
« Nonlinear independent component analysis: A principled framework for unsupervised deep learning » - Aapo Hyvärinen
Wednesday 9th October 2019 2pm-4pm – Room Gilles Kahn, Centre Inria Saclay, Bât Alan Turing, Palaiseau
Unsupervised learning, in particular learning general nonlinear representations, is one of the deepest problems in machine learning. Estimating latent quantities in a generative model provides a principled framework, and has been successfully used in the linear case, e.g. with independent component analysis (ICA) and sparse coding. However, extending ICA to the nonlinear case has proven to be extremely difficult: a straightforward extension is unidentifiable, i.e. it is not possible to recover the latent components that actually generated the data. Here, we show that this problem can be solved by using additional information, either in the form of temporal structure or of an auxiliary variable. As a first approach, we formulate self-supervised learning schemes similar to those heuristically proposed in computer vision. Our main contribution is to provide a rigorous theoretical framework for such self-supervised algorithms, proving that they are able to solve the nonlinear ICA problem. We further show a connection between nonlinear ICA and variational autoencoders (VAEs): while the ordinary VAE suffers from lack of identifiability, conditioning on auxiliary variables leads to identifiability and provides another method for learning nonlinear ICA.
« Perspectives for causal inference on time series in Earth system sciences » - Jakob Runge
Tuesday 1st October 2019 2pm - 4pm – DIGITEO MOULON - Bâtiment 660 - 91190 Gif-sur-Yvette
The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In disciplines dealing with complex dynamical systems, such as the Earth system, replicated real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated time series data opens up the use of observational causal inference methods beyond the commonly adopted correlation techniques. Observational causal inference is a rapidly growing field with enormous potential to help answer long-standing scientific questions. Unfortunately, many methods are still little known and therefore rarely adopted in Earth system sciences. In this talk I will present a Perspective Paper in Nature Communications which identifies key generic problems and major challenges where causal methods have the potential to advance the state-of-the-art in Earth system sciences. I will also present a novel causal inference benchmark platform that aims to assess the performance of causal inference methods and to help practitioners choose the right method for a particular problem. Some recent methods that address particular challenges of Earth system data will be discussed and illustrated by application examples where causal methods have already led to novel insights in Earth sciences.
Runge, J., S. Bathiany, E. Bollt, G. Camps-Valls, D. Coumou, E. Deyle, C. Glymour, M. Kretschmer, M. D. Mahecha, J. Muñoz-Marı́, E. H. van Nes, J. Peters, R. Quax, M. Reichstein, M. Scheffer, B. Schölkopf, P. Spirtes, G. Sugihara, J. Sun, K. Zhang, and J. Zscheischler (2019). Inferring causation from time series in earth system sciences. Nature Communications 10 (1), 2553.
Antonio Casilli - AI's Ticket Punchers: Micro-Work in the Age of Digital Platforms
Thursday 9 May, 10am-12pm - Amphi Sophie Germain, Inria Saclay, Bât Alan Turing, Palaiseau
Despite their relative invisibility, digital micro-work platforms are one of the defining phenomena of the past decade. Services such as Amazon Mechanical Turk, Figure Eight or Clickworker are spaces where companies and startups 'train' or test their artificial-intelligence solutions by recruiting myriads of workers who perform micro-tasks of transcription, visual recognition or video labelling in exchange for very low pay of barely a few euro cents. Existing studies have focused mainly on English-speaking platforms. Our DiPLab survey (Digital Platform Labor, born of a partnership between Télécom Paristech, CNRS, FO, France Stratégie and the MSH Paris Saclay) targeted for the first time the micro-work ecosystem in France and in the French-speaking countries of Africa. The results paint a surprising picture of how the labour market is evolving in the age of automation.
Marc-Antoine Dilhac - Is an Ethical AI Possible? The Perspective of the Montreal Declaration
Wednesday 20 March, 10am-12pm, Amphi 5, CentraleSupélec, Gif-sur-Yvette
Marc-Antoine Dilhac of the Université de Montréal, lead author of the Montreal Declaration on responsible AI, will lead a DATAIA seminar on the morning of Wednesday 20 March.
The 2010s saw a new object appear in the field of applied ethics: artificial intelligence. An object of academic study, AI as a technology has given rise to a double, a social object that concentrates the concerns of developers, public decision-makers and companies, but also of the general public, of citizens: ethical AI. Often conflated, these two objects are not the same, and the ethical questioning attached to each differs. Ethical AI is an industrial product whose definition, standardization and regulation are the object of intense competition between companies and organizations. The emergence of this social object has also given rise to a class of experts, often foreign to the fields of both AI and ethics. But is an ethical AI possible? And what ethical approach allows us to grasp the issues raised by the deployment of this object? To answer these questions, we will present the deliberative process that produced the Montreal Declaration for a Responsible Development of AI (2018).
Jérémy Mary - Online Advertising and Strategic Bidding
Wednesday 20 February 2019
Criteo aims to serve personalized online display advertisements, which are at the root of the business model of many Internet companies, and is a leader in this B2B market. The talk will present the technical and ethical challenges of online personalized ads, ranging from the scale of the data to the heterogeneity of the customers. The lightly regulated online advertising market, built on auctions, will be discussed, revisiting classical auction theory from the buyer's point of view (as opposed to the seller's). We show that in the current state of the market, which includes personalized reserve prices, the best option for the buyer is to be strategic even in a second-price auction. How to fight information bubbles and user boredom within a recommender system will also be discussed.
Stuart Russell - Provably Beneficial Artificial Intelligence
Stuart Russell, Professor of Computer Science at UC Berkeley, inaugurated the series of seminars organized by the DATAIA Institute on Thursday, May 24, 2018.
I will briefly survey recent and expected developments in AI and their implications. Beyond these, one must expect that AI capabilities will eventually exceed those of humans across a range of real-world decision-making scenarios. Should this be a cause for concern, as Elon Musk, Stephen Hawking, and others have suggested? And, if so, what can we do about it? While some in the mainstream AI community dismiss the issue, I will argue instead that a fundamental reorientation of the field is required. Instead of building systems that optimize arbitrary objectives, we need to learn how to build systems that will, in fact, be beneficial for us. I will show that it is useful to imbue systems with explicit uncertainty concerning the true objectives of the humans they are designed to help, as well as the ability to learn more about those objectives from observation of human behavior.
Corinne Gendron - AI, between Ethics and Society
The second seminar of the DATAIA Institute took place on Friday, November 16, 2018 at 4pm at the Alan Turing Building in Palaiseau, led by Corinne Gendron, holder of the Chair of Social Responsibility and Sustainable Development at the Université du Québec à Montréal and Professor in the Strategy, Social and Environmental Responsibility Department of the School of Management Sciences, on the theme "AI, between ethics and society".
As the use of algorithms spreads through a multitude of fields of activity, questions become more insistent and worries arise: opacity, unintended consequences, legal vacuum, regulation outside the law... will we be able to keep control of our own creation? How can we make sure that this tool, which seems so powerful that we have dignified it with the word "intelligence", remains in the service of the collective good? Beyond the issues tied to their technical nature, a sociological view of algorithms situates their development, use and promotion within a society crossed by social relations that mobilize various ideologies. Thinking about the ethics or the social responsibility of algorithms therefore first requires grasping the logics that govern their development and the actors who carry them. As for their alignment with the common good, it will depend on the degree of regulation, transparency and control that will seal a compromise with skeptics and opponents.
Le Séminaire Palaisien brings together, on the first Tuesday of every month, the large Saclay research community around statistics and machine learning.
Each seminar session is divided into two 40-minute scientific presentations - 30 minutes of talk and 10 minutes of questions - followed by a coffee break.
Within the framework of its Master "Artificial Intelligence & Advanced Visual Computing", the LIX, with the support of DATAIA, organizes seminars about "Ethical issues, law & novel applications of AI".
The goal of the Signal Seminars of Paris-Saclay University (S3), organized by the L2S lab at CentraleSupélec, is to welcome recognized researchers, as well as PhD students and post-docs, working in the field of signal processing and its applications.
These seminars are a monthly meeting dedicated to questioning artificial intelligence, and digital technologies more generally, from a philosophical point of view. They aim to bring together doctoral students in the humanities who work from this perspective.