Title | : | The Myths about Overfitting |
Speaker | : | Dr. Debarghya Ghoshdastidar (Assistant professor at TU Munich School of Computation) |
Details | : | Tue, 28 Mar, 2023 10:00 AM @ CS25 |
Abstract | : | Overfitting is the practice of using a complex machine learning model that perfectly fits the training data. Historically, overfitting has been considered a “bad practice” that is expected to produce predictors which perform poorly on new data. However, this recommendation has flipped in recent times: overfitted neural networks perform surprisingly well in computer vision and natural language processing, among other areas. For instance, on the ImageNet image classification benchmark (with 14 million images), the best architectures have more than a billion parameters and yet achieve 90% accuracy. This naturally raises the question of whether overfitting is a good practice or a bad practice.
In this talk, I will discuss the mathematical foundations behind the classical and modern views of overfitting. I will start with a brief introduction to the statistical theory that leads to the conclusion “overfitting is a bad practice”. I will then discuss some recent theoretical results that debunk the following myths: 1. large models with too many parameters always overfit the training data; 2. models that perfectly fit the training data cannot predict well on unseen data. These results are the basis of two promising research directions in machine learning theory: Neural Tangent Kernels, which capture the training dynamics of wide neural networks, and the double-descent phenomenon, a precise characterisation of the performance of overfitted models. Finally, we will see why the classical and modern theories of overfitting are not at odds with each other. |