Don't miss any announcement of a new DATAIA seminar!
Subscribe to the DATAIA seminars mailing list by clicking here
« Phase transition in PCA with missing data: reduced signal-to-noise ratio, not sample size! » - Lars Kai Hansen
Wednesday 27th November 2019, 3pm-4.30pm - Centre Inria Saclay, Alan Turing building
Principal component analysis (PCA) is widely used, easy to formulate and compute - yet it has many surprising behaviors! It has been shown that the performance of PCA depends on the signal-to-noise ratio and on the ratio of sample size to dimensionality. Since the early '90s it has also been known that a critical sample size is needed before learning occurs (Biehl and Mietzner, 1993). Here we generalize this analysis to include missing data. An analytic result suggests that the effect of missing data is to effectively reduce the signal-to-noise ratio rather than - as commonly believed - to reduce the sample size. The theory predicts a phase transition induced by the missing process, and this is indeed observed in simulated and in real data.
N. Ipsen, L.K. Hansen. Phase transition in PCA with missing data. Proc. ICML 2019, PMLR 97:2951-2960, 2019.
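The claimed effect can be illustrated with a small spiked-model simulation (a sketch only, not the paper's code: zero imputation, the sizes, and the SNR values are all assumptions made here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_overlap(n, d, snr, p_obs):
    """Overlap |<u, v>| between the true signal direction u and the
    leading principal component v, when each entry of the data matrix
    is observed independently with probability p_obs (missing entries
    are naively imputed with zeros)."""
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    # spiked-covariance data: rank-one signal plus isotropic noise
    X = np.sqrt(snr) * rng.standard_normal((n, 1)) @ u[None, :] \
        + rng.standard_normal((n, d))
    mask = rng.random((n, d)) < p_obs     # observed entries
    Xo = np.where(mask, X, 0.0)           # zero-impute the rest
    # leading eigenvector of the empirical covariance
    _, vecs = np.linalg.eigh(Xo.T @ Xo / n)
    v = vecs[:, -1]
    return abs(u @ v)

# missingness degrades the recovered direction much like a lower SNR would
full = pca_overlap(2000, 50, snr=4.0, p_obs=1.0)
miss = pca_overlap(2000, 50, snr=4.0, p_obs=0.3)
```

Sweeping `p_obs` downward at fixed `n` reproduces the qualitative phenomenon in the abstract: the overlap degrades and eventually collapses, as if the signal-to-noise ratio, not the sample size, had been reduced.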
Lars Kai Hansen has MSc and PhD degrees in physics from the University of Copenhagen. Since 1990 he has been with the Technical University of Denmark, where he currently heads the Section for Cognitive Systems. He has published more than 300 contributions on machine learning, signal processing, and applications in AI and cognitive systems. His research has been generously funded by the Danish Research Councils and private foundations, the European Union, and the US National Institutes of Health. He has made seminal contributions to machine learning, including the introduction of ensemble methods ('90), and to functional neuroimaging, including the first brain-state decoding work based on PET ('94) and fMRI ('97). In the context of neuroimaging he has developed a suite of methods for visualizing machine learning models and quantifying uncertainty. In 2011 he was elected “Catedra de Excelencia” at UC3M Madrid, Spain.
« Distance learning using Euclidean percolation: Following Fermat's principle » - Matthieu Jonckheere
Thursday 7th November 2019, 2pm-4pm - CentraleSupélec, Eiffel building
In unsupervised statistical learning tasks such as clustering, recommendation, or dimension reduction, a notion of distance or similarity between points is crucial but usually not directly available as an input. We discuss recent techniques to infer a metric from observed data. Then we propose a new density-based estimator for weighted geodesic distances that takes into account the underlying density of the data, and that is suitable for nonuniform data lying on a manifold of lower dimension than the ambient space. The consistency of the estimator is proven using tools from first passage percolation. We then discuss its properties and implementation and evaluate its performance for clustering tasks.
Joint work with P. Groisman and F. Sapienza.
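In the spirit of the estimator discussed (Fermat distances, following Groisman, Jonckheere, and Sapienza), here is a naive sketch: shortest-path distances between sample points, where each hop costs the Euclidean length raised to a power alpha > 1, so that paths through dense regions become cheaper. The function name, the value of alpha, and the Dijkstra-on-a-complete-graph shortcut are assumptions of this sketch, not the authors' implementation:

```python
import heapq
import numpy as np

def fermat_distances(X, src, alpha=3.0):
    """Dijkstra shortest-path distances from X[src] on the complete
    graph over the sample, with edge weight ||x_i - x_j||**alpha.
    With alpha > 1 the metric adapts to the underlying density."""
    n = len(X)
    W = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1) ** alpha
    dist = np.full(n, np.inf)
    dist[src] = 0.0
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, i = heapq.heappop(heap)
        if i in done:
            continue
        done.add(i)
        for j in range(n):
            nd = d + W[i, j]
            if nd < dist[j]:
                dist[j] = nd
                heapq.heappush(heap, (nd, j))
    return dist

rng = np.random.default_rng(1)
X = rng.random((60, 2))
d = fermat_distances(X, 0, alpha=3.0)
```

By construction, each shortest-path distance is never larger than the direct one-hop cost, and the distance matrix obtained this way can then be fed to any distance-based clustering method.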
Matthieu Jonckheere received his PhD in applied mathematics from the Ecole Polytechnique (Paris, France). He later completed a postdoc at CWI (Amsterdam) and became an assistant professor at Eindhoven University of Technology. He is now a Conicet researcher and professor at the University of Buenos Aires. He has worked extensively in probability theory and performance evaluation of information and communication systems and more recently in unsupervised learning.
« Nonlinear independent component analysis: A principled framework for unsupervised deep learning » - Aapo Hyvärinen
Wednesday 9th October 2019 2pm-4pm – Room Gilles Kahn, Centre Inria Saclay, Bât Alan Turing, Palaiseau
Unsupervised learning, in particular learning general nonlinear representations, is one of the deepest problems in machine learning. Estimating latent quantities in a generative model provides a principled framework, and has been successfully used in the linear case, e.g. with independent component analysis (ICA) and sparse coding. However, extending ICA to the nonlinear case has proven to be extremely difficult: a straightforward extension is unidentifiable, i.e. it is not possible to recover the latent components that actually generated the data. Here, we show that this problem can be solved by using additional information, either in the form of temporal structure or of an additional, auxiliary variable. As a first approach, we formulate self-supervised learning schemes similar to those heuristically proposed in computer vision. Our main contribution is to provide a rigorous theoretical framework for such self-supervised algorithms, proving that they are able to solve the nonlinear ICA problem. We further show a connection between nonlinear ICA and variational autoencoders (VAEs): while the ordinary VAE suffers from a lack of identifiability, conditioning on auxiliary variables leads to identifiability and provides another method for learning nonlinear ICA.
Aapo Hyvarinen studied mathematics at the universities of Helsinki (Finland), Vienna (Austria), and Paris (France), and obtained a Ph.D. degree in Information Science at the Helsinki University of Technology in 1997. From 2016 to 2019, he was Professor at the Gatsby Computational Neuroscience Unit, University College London, UK. Currently he is visiting DATAIA at Inria-Saclay for a year.
Aapo Hyvarinen is the main author of the books « Independent Component Analysis » (2001) and « Natural Image Statistics » (2009), and author or coauthor of more than 200 scientific articles. He is Action Editor at the Journal of Machine Learning Research and Neural Computation and Editorial Board Member in Foundations and Trends in Machine Learning. His current work concentrates on unsupervised machine learning and its applications to neuroscience.
« Perspectives for causal inference on time series in Earth system sciences » - Jakob Runge
Tuesday 1st October 2019 2pm - 4pm – DIGITEO MOULON - Bâtiment 660 - 91190 Gif-sur-Yvette
The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In disciplines dealing with complex dynamical systems, such as the Earth system, replicated real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated time series data opens up the use of observational causal inference methods beyond the commonly adopted correlation techniques. Observational causal inference is a rapidly growing field with enormous potential to help answer long-standing scientific questions. Unfortunately, many methods are still little known and therefore rarely adopted in Earth system sciences. In this talk I will present a Perspective Paper in Nature Communications which identifies key generic problems and major challenges where causal methods have the potential to advance the state-of-the-art in Earth system sciences. I will also present a novel causal inference benchmark platform that aims to assess the performance of causal inference methods and to help practitioners choose the right method for a particular problem. Some recent methods that address particular challenges of Earth system data will be discussed and illustrated by application examples where causal methods have already led to novel insights in Earth sciences.
Runge, J., S. Bathiany, E. Bollt, G. Camps-Valls, D. Coumou, E. Deyle, C. Glymour, M. Kretschmer, M. D. Mahecha, J. Muñoz-Marı́, E. H. van Nes, J. Peters, R. Quax, M. Reichstein, M. Scheffer, B. Schölkopf, P. Spirtes, G. Sugihara, J. Sun, K. Zhang, and J. Zscheischler (2019). Inferring causation from time series in earth system sciences. Nature Communications 10 (1), 2553.
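A minimal illustration of why lagged dependencies in time series can hint at causal direction (a toy sketch only: the methods surveyed in the paper use proper conditional-independence tests rather than plain correlations, and every name and coefficient below is invented):

```python
import numpy as np

def lagged_link_strength(x, y, tau):
    """Correlation between x shifted back by tau steps and y -
    a (very) simplified stand-in for the lagged dependence tests
    used by time-series causal discovery methods."""
    x_past, y_now = x[:-tau], y[tau:]
    return np.corrcoef(x_past, y_now)[0, 1]

# toy system: X drives Y with a lag of 2 steps, with no feedback
rng = np.random.default_rng(2)
n = 5000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.8 * x[t - 2] + 0.3 * rng.standard_normal()

xy = abs(lagged_link_strength(x, y, 2))   # strong: X(t-2) -> Y(t)
yx = abs(lagged_link_strength(y, x, 2))   # weak: no reverse link
```

The asymmetry between the two lagged correlations recovers the direction of the simulated link; real Earth-system data add confounding, autocorrelation, and nonlinearity, which is exactly what the benchmarked methods are designed to handle.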
Jakob is a complex systems data scientist with a focus on climate data. His main research interests are causal discovery and causal inference based on graphical models and deep learning.
Jakob currently leads the Climate Informatics group at the DLR Institute of Data Science. He studied physics at Humboldt University Berlin and the University of California in Santa Cruz and obtained his PhD on causal inference from dynamical complex systems at the Potsdam Institute for Climate Impact Research.
Before his current position Jakob was a Postdoctoral Fellow in Studying Complex Systems at the Grantham Institute, Imperial College London, funded by the James S. McDonnell Foundation.
Antonio Casilli - The Ticket Punchers of AI: Micro-Work in the Age of Digital Platforms
Thursday 9 May 2019, 10am-12pm - Amphi Sophie Germain, Inria Saclay, Bât Alan Turing, Palaiseau
Despite their relative invisibility, digital micro-work platforms are a defining phenomenon of the past decade. Services such as Amazon Mechanical Turk, Figure Eight, or Clickworker are spaces where companies and startups 'train' or test their artificial-intelligence solutions by recruiting myriads of workers who carry out micro-tasks of transcription, visual recognition, or video labeling in exchange for very low pay of barely a few euro cents. Existing studies have mainly focused on English-speaking platforms. Our DiPLab survey (Digital Platform Labor, born of a partnership between Télécom ParisTech, CNRS, FO, France Stratégie, and the MSH Paris-Saclay) targeted for the first time the micro-work ecosystem in France and in the French-speaking countries of Africa. The results paint a surprising picture of how the labor market is evolving in the age of automation.
Marc-Antoine Dilhac - Is Ethical AI Possible? The Perspective of the Montréal Declaration
Wednesday 20 March 2019, 10am-12pm, Amphi 5, CentraleSupélec, Gif-sur-Yvette
Marc-Antoine Dilhac of the Université de Montréal, who leads the Montréal Declaration on responsible AI, will give a DATAIA seminar on the morning of Wednesday 20 March.
The 2010s saw a new object appear in the field of applied ethics: artificial intelligence. An object of academic study, AI as a technology has given rise to a double, a social object that concentrates the concerns of developers, public decision-makers, and companies, but also of the general public, of citizens: ethical AI. Often conflated, these two objects are not the same, and the ethical questioning attached to each of them differs. Ethical AI is an industrial product whose definition, standardization, and regulation are the focus of intense competition between companies and organizations. The emergence of this social object has also given rise to a class of experts, often foreign to the fields of both AI and ethics. But is ethical AI possible? And what ethical approach allows us to grasp the issues that the deployment of this object raises? To answer these questions, we will present the deliberative process that produced the Montréal Declaration for the responsible development of AI (2018).
Jérémy Mary - Online Advertising and Strategic Bidding
Wednesday 20 February 2019
Criteo aims to serve personalized online display advertisements, which are at the root of the business model of many Internet companies, and is the leader on this B2B market. The talk will present the technical and ethical challenges of personalized online ads, ranging from the scale of the data to the heterogeneity of the customers. The lightly regulated online advertising market, built on auctions, will then be discussed, revisiting classical auction theory from the buyer's point of view (as opposed to the seller's). We show that in the current state of the market, which includes personalized reserve prices, the best option for the buyer is to be strategic even in a second-price auction. How to fight information bubbles and user boredom within a recommender system will also be discussed.
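The role of the reserve price in a second-price auction can be seen in a toy payment rule (an illustrative sketch, not Criteo's bidder; the function name and the numbers are made up):

```python
def second_price_payment(bids, reserve):
    """Price paid by the winner of a second-price auction with a
    reserve: max(second-highest bid, reserve), or no sale at all
    if even the top bid is below the reserve."""
    top = max(bids)
    if top < reserve:
        return None                      # item is not sold
    second = sorted(bids)[-2] if len(bids) > 1 else reserve
    return max(second, reserve)

# moving the reserve between the two best bids raises the price paid:
# a seller who learns reserves from past bids gives bidders facing
# *personalized* reserves a reason to bid strategically
p_low = second_price_payment([5.0, 3.0, 1.0], reserve=2.0)   # pays 3.0
p_high = second_price_payment([5.0, 3.0, 1.0], reserve=4.0)  # pays 4.0
```

With a fixed reserve, truthful bidding remains optimal; once the reserve is set from the bidder's own history, today's truthful bid raises tomorrow's price, which is the strategic tension the talk explores.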
Stuart Russell - Provably Beneficial Artificial Intelligence
Stuart Russell, Professor of Computer Science at UC Berkeley, inaugurated the series of seminars organized by the DATAIA Institute on Thursday, May 24, 2018.
I will briefly survey recent and expected developments in AI and their implications. Beyond these, one must expect that AI capabilities will eventually exceed those of humans across a range of real-world decision-making scenarios. Should this be a cause for concern, as Elon Musk, Stephen Hawking, and others have suggested? And, if so, what can we do about it? While some in the mainstream AI community dismiss the issue, I will argue instead that a fundamental reorientation of the field is required. Instead of building systems that optimize arbitrary objectives, we need to learn how to build systems that will, in fact, be beneficial for us. I will show that it is useful to imbue systems with explicit uncertainty concerning the true objectives of the humans they are designed to help, as well as the ability to learn more about those objectives from observation of human behavior.
Corinne Gendron - AI, Between Ethics and Society
The second seminar of the DATAIA Institute took place on Friday, November 16, 2018 at 16:00 at the Alan Turing Building in Palaiseau, led by Corinne Gendron, holder of the Chair of Social Responsibility and Sustainable Development at the Université du Québec à Montréal and Professor in the Strategy, Social and Environmental Responsibility Department of the School of Management Sciences, on the theme "AI, between ethics and society".
As the use of algorithms spreads across a multitude of fields of activity, questions harden and worries arise: opacity, unintended consequences, legal vacuums, regulation outside the law... will we be able to keep control of our own creation? How can we make sure that this tool, which seems so powerful that we have graced it with the term "intelligence", remains in the service of the collective good? Beyond the issues tied to their technical nature, a sociological look at algorithms situates their development, use, and promotion within a society shaped by social relations that mobilize various ideologies. Thinking about the ethics or the social responsibility of algorithms therefore first requires grasping the logics that govern their development and the actors who drive them. As for their alignment with the common good, it will depend on the degree of regulation, transparency, and control that come to seal a compromise with the skeptics and the opponents.
Le Séminaire Palaisien brings together the broad Saclay research community in statistics and machine learning on the first Tuesday of every month.
Each seminar session is divided into 2 scientific presentations of 40 minutes each: 30 minutes of presentation and 10 minutes of questions, followed by a coffee break.
LIX Seminars « Ethical issues, law & novel applications of AI »
As part of its Master's program "Artificial Intelligence & Advanced Visual Computing", LIX, with the support of DATAIA, organizes seminars on "Ethical issues, law & novel applications of AI".
S³ : Signal Seminar of Université Paris-Saclay
The goal of these seminars is to welcome recognized researchers, as well as PhD students and postdocs, working in signal processing and its applications.