Uncovering Structure: How Neural Networks Learn and Generalize from Tree-based Data
Jerome Garnier-Brun
Bocconi University, Milan
Seminar in the MLP@P series (Machine Learning Physics @ Plateau), held jointly with LISN and IPhT.
Where: LPTMS, Salle des Séminaires (1st floor)
Statistical-physics approaches have provided key insights into the functioning of neural networks, yet most analyses assume high-dimensional random data. In contrast, real-world data possess rich underlying structure that likely shapes how learning and generalization unfold. In an attempt to bridge this gap, this talk will focus on models trained on tree-based data, a setting in which, importantly, Bayes-optimal performance can still be computed exactly. By introducing a controlled filtering procedure that tunes the degree of correlation in the data, we first probe how transformers progressively uncover structure in both supervised and self-supervised inference tasks. The results reveal a hierarchical discovery of correlations, first in time during training and then in space across attention layers, closely mirroring the exact inference algorithm. In a second part, we turn to generative diffusion models, where the same controlled data model exposes a novel biased generalization regime that precedes overt overfitting. There, access to Bayes-optimal benchmarks allows a precise characterization of when and how this bias emerges, suggesting new directions for optimizing diffusion schedules.
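To give a flavor of the kind of data discussed in the abstract, the sketch below samples sequences from a simple tree-based generative model with a tunable correlation cut-off. It is only an illustration under stated assumptions: the binary-tree topology, the `sample_leaves` function, and the `filter_depth` parameter are hypothetical choices, not the speaker's actual construction or filtering procedure.

```python
import numpy as np

# Illustrative sketch (assumed model, not the speaker's code): a root symbol is
# propagated down a binary tree, each child drawn from a parent-conditioned
# transition matrix; the leaves form the observed sequence. A hypothetical
# `filter_depth` re-draws symbols uniformly below a given level, cutting
# long-range correlations in a controlled way.

rng = np.random.default_rng(0)

def sample_leaves(depth, n_symbols, transition, filter_depth=None):
    """Sample the 2**depth leaf symbols of a binary generative tree."""
    level = [rng.integers(n_symbols)]  # root symbol
    for d in range(depth):
        nxt = []
        for parent in level:
            for _ in range(2):  # two children per node
                if filter_depth is not None and d >= filter_depth:
                    # filtering: forget the ancestry, draw uniformly at random
                    nxt.append(rng.integers(n_symbols))
                else:
                    nxt.append(rng.choice(n_symbols, p=transition[parent]))
        level = nxt
    return np.array(level)

# Example: 3 symbols, depth-4 tree (16 leaves), correlations cut below level 2.
q = 3
T = rng.dirichlet(np.ones(q), size=q)            # random parent->child transitions
x_full = sample_leaves(4, q, T)                  # fully correlated sequence
x_filt = sample_leaves(4, q, T, filter_depth=2)  # correlations truncated
print(x_full, x_filt)
```

In such a model the joint law of the leaves factorizes on a tree, which is why exact Bayes-optimal inference (e.g. by message passing) remains tractable and can serve as a benchmark.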
