
Browsing by Author "Yilmaz, Fatih Furkan"

Bias-variance Trade-off and Uncertainty Quantification: Effects of Data Distribution in Image Classification
(2022-11-18) Yilmaz, Fatih Furkan; Heckel, Reinhard; Segarra, Santiago

Understanding the training and generalization dynamics of deep neural networks, and understanding the actual accuracy of network predictions when deployed in the wild, are two important open problems in machine learning. In this thesis, we study these two topics in the context of image classification. In the first part, we study the generalization properties of deep neural networks with respect to the regularization of the network training for standard image classification tasks. In the second part, we study the performance of conformal-prediction-based uncertainty estimation methods, which quantify the uncertainty of a neural network's predictions in practical applications. We focus on the setup where the test distribution may induce a drop in prediction accuracy due to distribution shift.

The training of deep neural networks is often regularized either implicitly, for example by early stopping the gradient descent, or explicitly, by adding an $\ell_2$-penalty to the loss function, in order to prevent overfitting to spurious patterns or noise. Even though these regularization methods are well established in the literature, it was recently uncovered that the test error of the network can exhibit novel phenomena, such as a double descent shape with respect to the amount of regularization. In the first part of this thesis, we develop a theoretical understanding of this double descent phenomenon with respect to model regularization. For this, we study regression tasks, in both the underparameterized and overparameterized regimes, for linear and non-linear models. We find that for linear regression, a double-descent-shaped risk is caused by a superposition of bias-variance tradeoffs corresponding to different parts of the data/model, and can be mitigated by proper scaling of the stepsizes or regularization strengths while improving the best-case performance. We next study a non-linear two-layer neural network and characterize the early-stopped gradient descent risk as a superposition of bias-variance tradeoffs, and we show that double descent as a function of the $\ell_2$-regularization coefficient occurs outside of the regime where the risk can be characterized using the existing tools in the literature. We empirically study deep networks trained on standard image classification datasets and show that our results explain the dynamics of the network training well.

In the second part of this thesis, we consider the effects of data distribution shift at test time for standard deep neural network classifiers. While recent uncertainty quantification methods like conformal prediction can generate provably valid confidence measures for any pre-trained black-box image classifier, these guarantees fail when there is a distribution shift. We propose a simple test-time recalibration method, based only on unlabeled examples, that provides excellent uncertainty estimates under natural distribution shifts. We show that our method provably succeeds on a theoretical toy distribution shift problem. Empirically, we demonstrate the success of our method for various natural distribution shifts of the popular ImageNet dataset.
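For readers unfamiliar with the baseline that the second part of this thesis builds on, the sketch below shows standard split conformal prediction sets for a classifier. It is an illustrative toy, not the recalibration-under-shift method proposed in the thesis: the synthetic data, the fake_softmax stand-in for a pre-trained classifier, and the values of n_cal and alpha are all assumptions made for illustration only.

```python
# Minimal sketch of split conformal prediction for classification.
# Everything here is synthetic; the "classifier" is a softmax over random logits.
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_test, n_classes, alpha = 1000, 5, 10, 0.1  # target coverage 1 - alpha = 90%

def fake_softmax(n):
    """Stand-in for a pre-trained classifier: random logits -> probabilities."""
    logits = rng.normal(size=(n, n_classes))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Calibration: nonconformity score = 1 - probability assigned to the true label.
cal_probs = fake_softmax(n_cal)
cal_labels = rng.integers(n_classes, size=n_cal)
scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]

# Conformal quantile with the usual finite-sample correction.
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
q_hat = np.quantile(scores, q_level)

# Prediction sets: include every class whose score stays below the threshold.
test_probs = fake_softmax(n_test)
pred_sets = [np.where(1.0 - p <= q_hat)[0] for p in test_probs]
for i, s in enumerate(pred_sets):
    print(f"test example {i}: prediction set = {s.tolist()}")
```

The coverage guarantee of this procedure assumes calibration and test data come from the same distribution, which is exactly the assumption that breaks under the distribution shifts studied in the thesis.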
Learning to classify images without explicit human annotations
(2020-04-22) Yilmaz, Fatih Furkan; Heckel, Reinhard; Veeraraghavan, Ashok

Image classification problems today are often solved by first collecting examples along with candidate labels, second obtaining clean labels from workers, and third training a large, overparameterized deep neural network on the clean examples. The second, manual labeling step is often the most expensive one, as it requires going through all examples by hand. In this thesis we propose to i) skip the manual labeling step entirely, ii) directly train the deep neural network on the noisy candidate labels, and iii) early stop the training to avoid overfitting. With this procedure we exploit an intriguing property of overparameterized neural networks: while they are capable of perfectly fitting the noisy data, gradient descent fits clean labels faster than noisy ones. Thus, training and early stopping on noisy labels resembles training on clean labels only.

Our results show that early stopping the training of standard deep networks (such as ResNet-18) on a subset of the Tiny Images dataset, which is obtained without any explicit human labels and in which only about half of the labels are correct, gives a significantly higher test performance than training on the clean CIFAR-10 training dataset (which is obtained by labeling a subset of the Tiny Images dataset). We also demonstrate that the performance gains are consistent across all classes and are not a result of trivial or non-trivial overlaps between the datasets. In addition, our results show that the noise generated through the label collection process is not nearly as adversarial for learning as the noise generated by randomly flipping labels, which is the noise model most prevalent in works demonstrating the noise robustness of neural networks. Finally, we confirm that our results continue to hold for other datasets by considering the large-scale problem of classifying a subset of ImageNet using images obtained from Flickr by keyword searches alone, without any manual labeling.
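The following sketch illustrates the general "train on noisy labels and early stop" recipe described in this abstract, on synthetic data rather than Tiny Images or CIFAR-10. The synthetic clusters, the tiny MLP, the 40% label-flip rate, the hyperparameters, and the use of a small clean held-out split to pick the stopping point are all assumptions of this illustration, not the configuration or labeling setup used in the thesis.

```python
# Minimal sketch: train a small network directly on partially flipped ("noisy")
# labels and keep the iterate that scores best on a held-out split (early stopping).
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, n_classes, flip_frac = 2000, 20, 4, 0.4

# Synthetic clusters: each example is a noisy copy of its class prototype.
prototypes = torch.randn(n_classes, d) * 3
clean_y = torch.randint(n_classes, (n,))
x = prototypes[clean_y] + torch.randn(n, d)

# Corrupt a fraction of the training labels to mimic noisy candidate labels.
noisy_y = clean_y.clone()
flip = torch.rand(n) < flip_frac
noisy_y[flip] = torch.randint(n_classes, (int(flip.sum()),))

# Train on noisy labels; a clean held-out split is used here only to choose
# when to stop (an assumption of this toy, not of the thesis).
x_tr, y_tr = x[:1600], noisy_y[:1600]
x_val, y_val = x[1600:], clean_y[1600:]

model = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, n_classes))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

best_acc, best_state = 0.0, None
for epoch in range(200):
    opt.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    opt.step()
    with torch.no_grad():
        val_acc = (model(x_val).argmax(1) == y_val).float().mean().item()
    if val_acc > best_acc:                       # early stopping: keep the best iterate
        best_acc = val_acc
        best_state = {k: v.clone() for k, v in model.state_dict().items()}

model.load_state_dict(best_state)                # restore the early-stopped model
print(f"best held-out accuracy under early stopping: {best_acc:.3f}")
```

Running the loop to completion instead of restoring the best iterate would let the overparameterized model fit the flipped labels as well, which is precisely the overfitting that early stopping is meant to avoid.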