A Physics Approach to Deep Learning Theory with Blake Bordelon

Blake Bordelon of Harvard University (hosted by Zohar Nussinov) will present the Physics Colloquium, "A Physics Approach to Deep Learning Theory."

Recent developments in artificial intelligence have produced models with remarkable capabilities in a variety of domains, including visual object detection, computational biology, image and language generation, and software development. Despite the dizzying pace of empirical progress, these systems face fundamental challenges, including limited interpretability, large compute and data costs to train and run the models, and uncertain trustworthiness during deployment. In addition, key theoretical problems relating to generalization beyond the training data, optimization dynamics, and scaling behavior remain mysterious. In this talk, I will promote the view that understanding deep learning is a pressing social problem that is well suited to the ideas, research methods, and mathematical techniques of physics. I will highlight some concrete examples where such physics-based approaches have generated insights that improved deep learning practice. I will first discuss mathematical results that use mean-field techniques from statistical physics to analyze the learning dynamics of neural networks. This theory provides insights for developing initialization and optimization schemes for neural networks that admit well-defined infinite-width and infinite-depth limits and behave consistently across model scales, yielding practical efficiency benefits for pretraining large models. These limits also enable a theoretical characterization of the types of solutions learned by deep networks, making it possible to predict their generalization and scaling laws in certain cases. To conclude, I will discuss a few open problems, including predicting regimes of memorization and generalization in diffusion models, simple models of reasoning in language models, and an optimal control approach to learning trajectories.
This lecture was made possible by the William C. Ferguson Fund.