Browsing by Author "Luzi, Lorenzo"
Now showing 1 - 2 of 2
Item: Memorization in Generative Networks and Mixtures of GANs (2020-04-22)
Authors: Luzi, Lorenzo; Baraniuk, Richard

We demonstrate that memorization (perfectly fitting the training data) is necessary to avoid mode collapse in generative networks. Using a straightforward measure of the distance between the training data points and the closest point in the range of the generator, we study how well current generative models memorize as a function of training dataset size, data distribution, and generator architecture. An important hallmark of our GOoF measure is that, unlike the Frechet Inception Distance or the Inception Score, it does not require a second, trained model. The GOoF measure shows that the successful, popular generative models DCGAN, WGAN, and BigGAN fall far short of memorization. Our analysis motivates a new method for circumventing mode collapse by subsampling the training data (either randomly or with k-means clustering); we discuss the links to overparameterization. Mixtures of generative adversarial networks (GANs) are closely related to these subsampling methods. We study such mixtures in the context of memorization and density estimation and show that, under certain assumptions, mixtures of GANs are superior to training a single GAN. Furthermore, we construct a theoretical framework that explains how single GANs, mixtures of GANs, conditional GANs, and Gaussian mixture GANs are all related to each other through modifications of the standard GAN optimization problem. Finally, we show empirically that our modified optimization problem has a memorization sweet spot that can be found by hyperparameter tuning.

Item: Overparameterization and Double Descent in PCA, GANs, and Diffusion Models (2024-04-19)
Authors: Luzi, Lorenzo; Baraniuk, Richard G.

This PhD thesis synthesizes my doctoral work on generative modeling, with a particular focus on overparameterization. Using a novel method we call pseudo-supervision, we characterize overparameterization behaviors, including double descent, in GANs as well as in PCA-like problems. Extending pseudo-supervision to diffusion models, we show that it can be used to create an inductive bias, allowing us to train our model with lower generalization error and faster convergence than the baseline. I additionally introduce a novel method called Boomerang, which extends our study of diffusion models by showing that they can be used for local sampling on image manifolds. Finally, in an approach we call WaM, I extend the Frechet Inception Distance (FID) to non-Gaussian feature distributions by modeling features with a Gaussian mixture model and using a bound on the 2-Wasserstein metric between Gaussian mixture models.
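The GOoF measure in the first abstract is described only as the distance from each training point to the closest point in the range of the generator. A minimal sketch of one plausible approximation, using Monte Carlo sampling of latent codes; the thesis may instead optimize over latent codes, and all function and parameter names below are illustrative rather than taken from the thesis:

```python
import torch

def goof_like_score(generator, train_data, n_samples=10_000, latent_dim=128,
                    batch=256, device="cpu"):
    """Approximate, for each training point, the distance to the closest point
    in the generator's range by sampling latent codes, then average.
    Illustrative sketch only; not the thesis's exact GOoF procedure."""
    generator = generator.to(device).eval()
    x = train_data.to(device).flatten(start_dim=1)            # (N, D) training points
    min_d = torch.full((x.shape[0],), float("inf"), device=device)
    with torch.no_grad():
        for _ in range(0, n_samples, batch):
            z = torch.randn(batch, latent_dim, device=device)  # latent codes
            g = generator(z).flatten(start_dim=1)              # (batch, D) generated points
            d = torch.cdist(x, g)                              # pairwise Euclidean distances
            min_d = torch.minimum(min_d, d.min(dim=1).values)  # running nearest distance
    return min_d.mean().item()  # smaller values indicate behavior closer to memorization
```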
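The first abstract also proposes circumventing mode collapse by subsampling the training data, either randomly or with k-means clustering, and relates this to mixtures of GANs. A hedged sketch of that data-preparation step, assuming a flattened (N, D) array; training one GAN per partition (the mixture-of-GANs reading) is left out, and the thesis's exact subsampling scheme may differ:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_partitions(train_data, n_clusters=10, seed=0):
    """Partition the training set with k-means; one GAN could then be trained
    per partition, yielding a mixture of GANs over the full dataset."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(train_data)
    return [train_data[labels == k] for k in range(n_clusters)]

def random_subsample(train_data, fraction=0.1, seed=0):
    """Alternative: a single random subsample of the training data."""
    rng = np.random.default_rng(seed)
    n = max(1, int(fraction * len(train_data)))
    idx = rng.choice(len(train_data), size=n, replace=False)
    return train_data[idx]
```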
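The second abstract describes WaM as extending FID to non-Gaussian features by fitting a Gaussian mixture model and using a bound on the 2-Wasserstein metric between Gaussian mixtures. One standard bound of this kind solves a discrete optimal-transport problem whose costs are closed-form Gaussian W2 distances between mixture components; the sketch below assumes that reading (and the POT package for the transport step), and the exact bound and feature extractor used in WaM may differ:

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.mixture import GaussianMixture
import ot  # Python Optimal Transport (POT)

def gaussian_w2_sq(m1, C1, m2, C2):
    """Closed-form squared 2-Wasserstein distance between two Gaussians."""
    s1 = sqrtm(C1)
    cross = sqrtm(s1 @ C2 @ s1)
    return float(np.sum((m1 - m2) ** 2) + np.trace(C1 + C2 - 2 * np.real(cross)))

def mixture_w2_sq(feats_a, feats_b, n_components=5, seed=0):
    """Bound on the squared 2-Wasserstein distance between two feature
    distributions modeled as Gaussian mixtures: a discrete optimal-transport
    problem whose costs are component-wise Gaussian W2^2 distances."""
    ga = GaussianMixture(n_components, covariance_type="full", random_state=seed).fit(feats_a)
    gb = GaussianMixture(n_components, covariance_type="full", random_state=seed).fit(feats_b)
    M = np.array([[gaussian_w2_sq(ga.means_[i], ga.covariances_[i],
                                  gb.means_[j], gb.covariances_[j])
                   for j in range(n_components)] for i in range(n_components)])
    return ot.emd2(ga.weights_, gb.weights_, M)  # transport over mixture components
```

Here `feats_a` and `feats_b` would be, for example, Inception features of real and generated images, by analogy with FID; which features WaM uses is not stated in the abstract.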