“Halting Time is Predictable for
Large Models: A Universality Property and Average-case Analysis”
In this
talk, I will present a framework for performing average-case analysis in the
large-dimensional regime. Average-case analysis computes the complexity
of an algorithm averaged over all possible inputs. Compared to worst-case
analysis, it is more representative of the typical behavior of an algorithm,
but remains largely unexplored in continuous optimization. One difficulty is
that the analysis can depend on the probability distribution of the inputs to
the model. However, I will show that this is not the case for a class of
large-scale problems trained with first-order methods, including random least
squares and one-hidden-layer neural networks with random weights. In fact, the
halting time exhibits a universality property: it is independent of the input
probability distribution. With this barrier to average-case analysis removed,
I will present the first explicit average-case convergence rates showing a
tighter complexity not captured by traditional worst-case analysis. Finally, I
will illustrate how this framework can be applied to stochastic gradient
descent (SGD). Under our approach, SGD with a fixed step size has deterministic
dynamics and, surprisingly, these dynamics are governed by a Volterra integral equation.
By analyzing this Volterra equation, I will show that SGD undergoes a phase
transition at a critical step size that ultimately affects its convergence
rate.
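
For context, and only as a rough unpacking of the terminology rather than the speaker's own notation, a Volterra integral equation of the second kind has the generic form

    \[
      \psi(t) \;=\; g(t) \;+\; \int_0^t K(t,s)\,\psi(s)\,\mathrm{d}s ,
    \]

where, in a sketch of the SGD setting above, \psi(t) would play the role of the deterministic limit of the expected loss at time t, g(t) is a forcing term, and K is a kernel determined by the problem data and the step size; the specific g and K are not given in the abstract. The advertised phase transition can then be read as a change in the qualitative behavior of the solution \psi(t) as the step size crosses a critical value.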
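As a minimal numerical illustration of the halting-time universality claim (an assumption-laden sketch, not the speaker's experimental setup: the problem sizes, input distributions, tolerance, and step-size choice below are all illustrative), one can run gradient descent on a random least-squares problem whose entries are drawn from two different distributions and compare the number of iterations needed to reach a fixed gradient tolerance:

    # Illustrative sketch only: sizes, distributions, tolerance, and step size
    # are assumptions, not the speaker's experimental setup.
    import numpy as np

    def halting_time(A, b, step, tol=1e-6, max_iter=100_000):
        """Iterations of gradient descent on 0.5*||Ax - b||^2 until the gradient norm falls below tol."""
        x = np.zeros(A.shape[1])
        for t in range(max_iter):
            grad = A.T @ (A @ x - b)
            if np.linalg.norm(grad) < tol:
                return t
            x -= step * grad
        return max_iter

    rng = np.random.default_rng(0)
    n, d = 2000, 1000  # large-dimensional regime (illustrative sizes)

    samplers = {
        "gaussian":   lambda size: rng.standard_normal(size),
        "rademacher": lambda size: rng.choice([-1.0, 1.0], size=size),
    }

    for name, sample in samplers.items():
        A = sample((n, d)) / np.sqrt(n)          # normalized random data matrix
        b = sample(n) / np.sqrt(n)               # random targets
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1/L, L = ||A||_2^2
        print(name, halting_time(A, b, step))

In this regime one would expect the two printed halting times to be close, which is the universality phenomenon in miniature; the talk's contribution is to make such statements precise and to turn them into explicit average-case convergence rates.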
Courtney Paquette is an assistant professor at McGill University and a
Canada CIFAR AI Chair. Paquette’s research broadly focuses on designing and
analyzing algorithms for large-scale optimization problems, motivated by
applications in data science. She received her PhD from the mathematics
department at the University of Washington (2017), held postdoctoral positions
at Lehigh University (2017-2018) and the University of Waterloo (NSF postdoctoral
fellowship, 2018-2019), and was a research scientist at Google Research, Brain
Team in Montreal (2019-2020).