Interpreting Deep Neural Networks and Beyond: Visualization, Learning Dynamics, and Disentanglement
Date
Authors
Abstract
Despite their great success, deep neural networks are often regarded as black boxes. In domains such as self-driving cars and tumor diagnosis, it is crucial to know the reasoning behind a decision made by a neural network. In addition, a better understanding of current deep learning methods will inspire the development of more principled computational approaches with better robustness and interpretability. In this thesis, we focus on interpreting and improving deep neural networks from three perspectives: visualization, learning dynamics, and disentanglement.
First, the demand for human-interpretable explanations of model decisions has driven the development of visualization techniques. In particular, a class of backpropagation-based visualizations has attracted much attention recently, including Saliency Map, DeconvNet, and Guided Backprop (GBP). However, these methods exhibit some perplexing behaviors: DeconvNet and GBP are more human-interpretable but less class-sensitive than Saliency Map. Motivated by this, we develop a theory showing that GBP and DeconvNet are essentially performing image reconstruction, which is unrelated to the network's decision. This analysis, together with various experiments, implies that these human-interpretable visualizations do not always reveal the inner working mechanisms of deep neural networks.
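For concreteness, the following is a minimal, self-contained sketch (in PyTorch) of the two kinds of backpropagation-based visualizations contrasted above. The tiny CNN, random input, and helper names are illustrative stand-ins, not the networks or code studied in the thesis.

```python
# Minimal sketch: Saliency Map vs. Guided Backprop (GBP) on a toy CNN.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
).eval()

def input_gradient(model, x, target_class):
    """Gradient of the target class score with respect to the input image."""
    x = x.clone().requires_grad_(True)
    model(x)[0, target_class].backward()
    return x.grad

def saliency_map(model, x, target_class):
    """Vanilla Saliency Map: absolute input gradient, max over color channels."""
    return input_gradient(model, x, target_class).abs().amax(dim=1)

def guided_backprop(model, x, target_class):
    """GBP: same input gradient, but every ReLU additionally zeroes out
    negative gradients during the backward pass (the 'guided' rule)."""
    hooks = [m.register_full_backward_hook(
                 lambda mod, gin, gout: (gin[0].clamp(min=0.0),))
             for m in model.modules() if isinstance(m, nn.ReLU)]
    try:
        return input_gradient(model, x, target_class)
    finally:
        for h in hooks:
            h.remove()

x = torch.randn(1, 3, 32, 32)  # stand-in image (batch of one)
sal = saliency_map(model, x, target_class=0)
gbp = guided_backprop(model, x, target_class=0)
```

The only mechanical difference between the two is the extra clamp at each ReLU in the backward pass, which is what produces GBP's cleaner-looking but less class-sensitive maps.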
Second, although generative adversarial networks (GANs) are among the most powerful deep generative models, they are notoriously difficult to train, and the reasons underlying their (non-)convergence behaviors are still not completely understood. To this end, we conduct a non-asymptotic analysis of local convergence in GAN training dynamics by evaluating the eigenvalues of the Jacobian of the training dynamics near the equilibrium. The analysis reveals that, to ensure a good convergence rate, two factors should be avoided: (i) the Phase Factor, i.e., the Jacobian has complex eigenvalues with a large imaginary-to-real ratio, and (ii) the Conditioning Factor, i.e., the Jacobian is ill-conditioned. We therefore propose a new regularization method, called JARE, that addresses both factors by construction.
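As a toy illustration of this diagnostic (not the thesis's analysis, and not the JARE regularizer itself), the bilinear minimax game f(theta, phi) = theta * phi has a purely rotational gradient field around its equilibrium, so the Jacobian's eigenvalues are purely imaginary and the Phase Factor is extreme. The helpers `gradient_field` and `jacobian` below are ad hoc names for this sketch.

```python
import numpy as np

def gradient_field(theta, phi):
    # Simultaneous gradient field for min_theta max_phi f = theta * phi:
    # v = (-df/dtheta, +df/dphi) = (-phi, theta).
    return np.array([-phi, theta])

def jacobian(v, point, eps=1e-5):
    """Central finite-difference Jacobian of the vector field v at `point`."""
    point = np.asarray(point, dtype=float)
    n = point.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (v(*(point + step)) - v(*(point - step))) / (2 * eps)
    return J

J = jacobian(gradient_field, [0.0, 0.0])   # equilibrium of the toy game
eigs = np.linalg.eigvals(J)                # approximately +1j and -1j
phase = np.max(np.abs(eigs.imag) / np.maximum(np.abs(eigs.real), 1e-12))
cond = np.linalg.cond(J)
print(f"eigenvalues={eigs}, phase factor={phase:.1e}, condition number={cond:.1f}")
# Purely imaginary eigenvalues give an unbounded imaginary-to-real ratio:
# plain simultaneous gradient steps rotate around (and, with a finite step
# size, spiral away from) the equilibrium instead of converging toward it.
```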
Third, disentanglement learning aims to make the representations in neural networks more disentangled and human-interpretable. However, we find that current disentanglement methods have several limitations: 1) difficulty in handling high-resolution images, 2) neglecting the trade-off between learning disentangled representations and controllable generation, and 3) non-identifiability due to the unsupervised setting. To overcome these limitations, we propose new losses and network architectures based on StyleGAN [Karras et al., 2019] for semi-supervised high-resolution disentanglement learning. Experimental results show that using very limited supervision significantly improves disentanglement quality, and that the proposed method generalizes well to unseen images in the task of semantic fine-grained image editing.
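Purely as a generic illustration of the semi-supervised idea (the thesis's StyleGAN-based losses and architectures are not reproduced here), the hypothetical helper `semi_supervised_loss` below mixes an unsupervised disentanglement loss with a supervised term computed only on the small labeled subset.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(z_pred, y_true, labeled_mask, unsup_loss, lam=1.0):
    """Add a supervised term tying a few latent codes (`z_pred`) to ground-truth
    factor labels (`y_true`) on the labeled subset selected by `labeled_mask`,
    on top of an unsupervised disentanglement loss."""
    if labeled_mask.any():
        sup = F.mse_loss(z_pred[labeled_mask], y_true[labeled_mask])
    else:
        sup = z_pred.new_zeros(())
    return unsup_loss + lam * sup

# Example: 64 samples with 4 latent factors, only 8 of them labeled.
z = torch.randn(64, 4)
y = torch.randn(64, 4)
mask = torch.zeros(64, dtype=torch.bool)
mask[:8] = True
loss = semi_supervised_loss(z, y, mask, unsup_loss=torch.tensor(0.5))
```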
Looking forward, we believe that further effort and meaningful interaction among these three directions will improve our understanding of both the successes and failures of current deep learning methods, and lead to more robust, interpretable, and flexible AI systems.
Description
Advisor
Degree
Type
Keywords
Citation
Nie, Weili. "Interpreting Deep Neural Networks and Beyond: Visualization, Learning Dynamics, and Disentanglement." (2021) Diss., Rice University. https://hdl.handle.net/1911/113891.