DATAIA Seminar | Alba Martinez-Ruiz - "Numerical and Computing Strategies to Scale Algorithms for Statistical Learning"

Title
"Numerical and Computing Strategies to Scale Algorithms for Statistical Learning"
Abstract
In the era of big data and artificial intelligence, the management and analysis of large data sets have become fundamental in several scientific disciplines. Simultaneously processing large blocks of streaming data is a computationally expensive problem. Streaming data is generated and collected continuously from many different sources, for example from a large set of automatic machines in a production system, from a set of medical devices, or from internet-connected appliances (Internet of Things). Several strategies may be followed to analyze these data.
Numerical strategies refer to modifications or improvements in the calculations of the algorithms that reduce execution times in the analysis of large data sets, while preserving properties such as convergence and monotonicity. To illustrate, we examine a new procedure for calculating the factorization of the multiblock redundancy matrix M, which makes the multiblock method faster and more efficient when analyzing large streaming data and high-dimensional dense matrices.
Computing strategies refer to parallel computing, or the parallelization of algorithms for their execution on High Performance Computing (HPC) systems. The main objective is to enable the algorithms to process data faster in distributed systems. Two approaches are possible: Master-Workers (MW) and Single Program Multiple Data (SPMD). To illustrate, we present the parallelization process of the PLS Mode B algorithm, a multiblock method and a tightly coupled algorithm widely used to estimate structural equation models. We address key aspects such as data distribution schemes, scalable linear algebra libraries, the Message Passing Interface (MPI), and supercomputing environments.
Biography
Dr. Alba Martínez Ruiz is visiting Professor Arthur Tenenhaus at the L2S at CentraleSupélec, France. She holds a degree in Industrial Engineering and a Master of Science in Engineering from the Pontificia Universidad Católica de Chile, Chile, and a doctorate in Applied Statistics from Universitat Politècnica de Catalunya BarcelonaTech, Spain. She has been a full-time professor since 1999, first at the Universidad Católica de la Santísima Concepción and then at the Universidad Diego Portales in Chile. She has been an ISI Elected Member since 2021, is IASC-LARS Chairperson for 2025-2026, and was IASC-LARS Scientific Secretary from 2017 to 2021. Her research interests include multiblock data analysis, component-based methods, PLS path modeling, statistical methods for dimensionality reduction, big data analysis, climate change, multi-omics data analysis, and the development of new indicators as a measure of technological value.
- The seminar will take place on Thursday, March 27, 2025, from 12:30 pm to 2 pm at CentraleSupélec, Amphi III (Eiffel building), in Gif-sur-Yvette.
- Sweet refreshments will be served afterwards.