Browsing by Author "Chowdhury, Arkabandhu"
Now showing 1 - 2 of 2
Item: Meta Approaches to Few-shot Image Classification (2021-02-26)
Chowdhury, Arkabandhu; Jermaine, Christopher

Since the inception of deep Convolutional Neural Network (CNN) architectures, we have seen tremendous advances in machine image classification. However, these methods require large amounts of data, sometimes on the order of millions of examples, and often fail to generalize when the data set is small. Recently a new paradigm, called `Few-Shot Learning', has been developed to tackle this problem. Essentially, the goal of few-shot learning is to develop techniques that can rapidly generalize to new tasks containing very few labeled samples -- in extreme cases one (one-shot) or zero (zero-shot). In this work, I tackle few-shot learning as applied to image classification. The most common approach is meta-learning, or `learning to learn', where rather than learning to solve a particular learning problem, the goal is to solve many learning problems in an attempt to learn how to solve a particular type of problem. Another way to address the problem is to re-purpose an existing learner for a new learning problem, known as transfer learning. In my thesis, I propose two novel approaches, based on meta-learning and transfer learning, to tackle few-shot (or one-shot) image classification.

The first approach I propose is called meta-meta classification, where one uses a large set of learning problems to design an ensemble of learners, each of which has high bias and low variance and is skilled at solving a specific type of learning problem. The meta-meta classifier learns how to examine a given learning problem and combine the various learners to solve it. One such problem is one-vs-all (OvA) classification, where only one image from the positive class is available for training along with images from a number of negative classes. I evaluate my approach on a one-shot, one-class-versus-all classification task and show that it outperforms traditional meta-learning as well as ensembling approaches. I evaluate my method using the popular 1,000-class ImageNet data (ILSVRC2012), the 200-class Caltech-UCSD Birds dataset, the 102-class FGVC-Aircraft dataset, and the 1,200-class Omniglot handwritten character dataset. I compare my results with a popular meta-learning algorithm, the model-agnostic meta-learner (MAML), as well as an ensemble of multiple MAML models, and show that my approach outperforms them on all the problems.

The second approach we investigate uses the existing concept of transfer learning, where a simple Multi-Layer Perceptron (MLP) with a hidden layer is fine-tuned on top of pre-trained CNN backbones. Surprisingly, very few works in the few-shot literature have examined the use of an MLP for fine-tuning pre-trained models (the assumption may be that a hidden layer would provide too many parameters for few-shot learning). To avoid overfitting, we simply use an L2 regularizer. We argue that a diverse feature vector, built from a library of models pre-trained on a diverse dataset (such as ILSVRC2012), is sufficiently capable of being re-purposed for small-data problems. We performed a series of experiments on both classification accuracy and feature behavior on multiple few-shot problems. We carefully picked the hyperparameters after validating on the Caltech-UCSD Birds dataset and did our final evaluation on the FGVC-Aircraft, FC100, Omniglot, Traffic Sign, FGVCx Fungi, QuickDraw, and VGG Flower datasets. Our experimental results showed significantly better performance than baselines such as simple ensembling and the standalone best model, as well as some other competitive meta-learning techniques.
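To make the meta-meta classification idea more concrete, here is a minimal, hypothetical sketch: a learned gate inspects the single positive support example and weights an ensemble of frozen specialist one-vs-all learners. The class names, shapes, and gating architecture below are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch of the meta-meta idea: a learned gate that weights an
# ensemble of specialist one-vs-all learners based on the support example.
# Names, shapes, and the gating architecture are illustrative assumptions.
import torch
import torch.nn as nn


class MetaMetaGate(nn.Module):
    def __init__(self, feat_dim: int, num_learners: int):
        super().__init__()
        # Small network that inspects the (one-shot) positive support feature
        # and produces a weight for each specialist learner.
        self.gate = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_learners),
        )

    def forward(self, support_feat: torch.Tensor, learner_scores: torch.Tensor):
        # support_feat:   (feat_dim,)        feature of the single positive example
        # learner_scores: (num_learners, N)  each specialist's score for N queries
        weights = torch.softmax(self.gate(support_feat), dim=-1)   # (num_learners,)
        return weights @ learner_scores                            # (N,) combined scores


# Usage: combine 5 hypothetical specialists on 20 query images with 512-d features.
gate = MetaMetaGate(feat_dim=512, num_learners=5)
combined = gate(torch.randn(512), torch.randn(5, 20))
```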
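The second approach is concrete enough to sketch directly. The following is a minimal example, under stated assumptions, of fine-tuning a one-hidden-layer MLP on frozen pre-trained backbone features with an L2 penalty (weight decay); the backbone choice, hidden width, and hyperparameters are placeholders rather than the values used in the thesis.

```python
# Minimal sketch, assuming a frozen ImageNet-pretrained backbone and a small
# MLP head trained per few-shot task with weight decay as the L2 regularizer.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()          # expose the 512-d penultimate features
backbone.eval()
for p in backbone.parameters():      # backbone stays frozen; only the MLP is tuned
    p.requires_grad = False

num_classes, hidden = 5, 256         # e.g. a 5-way few-shot episode (placeholder sizes)
mlp = nn.Sequential(
    nn.Linear(512, hidden),
    nn.ReLU(),
    nn.Linear(hidden, num_classes),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3, weight_decay=1e-2)  # L2 regularizer
loss_fn = nn.CrossEntropyLoss()

def fit_episode(support_images, support_labels, steps=100):
    """Fine-tune the MLP head on the few labeled support examples of one task."""
    with torch.no_grad():
        feats = backbone(support_images)      # (n_support, 512) frozen features
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(mlp(feats), support_labels)
        loss.backward()
        opt.step()
```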
Item: Shepherding Distributions for Parallel Markov Chain Monte Carlo (2017-04-07)
Chowdhury, Arkabandhu; Jermaine, Christopher M.

One of the major concerns with Markov Chain Monte Carlo (MCMC) algorithms is that they can take a long time to converge to the desired stationary distribution. In practice, MCMC algorithms may take millions of iterations to converge to the target distribution, requiring wall-clock time measured in months. This thesis presents a general algorithmic framework for running MCMC algorithms in a parallel/distributed environment that can result in faster burn-in and convergence to the target distribution. Our framework, which we call the method of "shepherding distributions", relies on the introduction of an auxiliary distribution, called a shepherding distribution (SD), that uses several MCMC chains running in parallel. These chains collectively explore the sample space, communicating via the shepherding distribution, to reach high-likelihood regions faster. We consider various scenarios where shepherding distributions can be used, including the case where several machines or CPU cores work on the same data in parallel (the so-called transition-parallel application of the framework) and the case where a large data set is partitioned across several machines or CPU cores and the various chains work on subsets of the data (the so-called data-parallel application of the framework). This latter application is particularly useful in solving "big data" machine learning problems. Experiments under both scenarios illustrate the effectiveness of our shepherding approach to MCMC parallelization.
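The framework above is described at a conceptual level; the following is a schematic toy sketch of the general flavor only -- several Metropolis-Hastings chains that occasionally propose jumps toward a shared high-likelihood point -- and is a simplified stand-in, not the thesis's actual shepherding distribution or its parallel/distributed implementation. The toy target, proposal, and update rule are all illustrative assumptions.

```python
# Schematic sketch only: several Metropolis-Hastings chains on a toy 2-D target,
# occasionally making an independence proposal centered at a shared "shepherd"
# point (the highest-likelihood sample any chain has seen). The shepherd update
# makes this an adaptive simplification, not the thesis's exact construction.
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Toy target: mixture of two well-separated Gaussians.
    return np.logaddexp(-0.5 * np.sum((x - 4.0) ** 2),
                        -0.5 * np.sum((x + 4.0) ** 2))

n_chains, n_iters, step, jump_prob = 4, 5000, 0.5, 0.05
states = rng.normal(size=(n_chains, 2))
shepherd = states[np.argmax([log_target(s) for s in states])].copy()

for it in range(n_iters):
    for c in range(n_chains):
        if rng.random() < jump_prob:
            # Independence proposal centered at the shared shepherd point.
            prop = shepherd + step * rng.normal(size=2)
            log_q_fwd = -0.5 * np.sum((prop - shepherd) ** 2) / step**2
            log_q_rev = -0.5 * np.sum((states[c] - shepherd) ** 2) / step**2
            log_alpha = (log_target(prop) - log_target(states[c])
                         + log_q_rev - log_q_fwd)
        else:
            # Ordinary random-walk Metropolis move (symmetric proposal).
            prop = states[c] + step * rng.normal(size=2)
            log_alpha = log_target(prop) - log_target(states[c])
        if np.log(rng.random()) < log_alpha:
            states[c] = prop
        if log_target(states[c]) > log_target(shepherd):
            shepherd = states[c].copy()   # chains "communicate" via this shared point
```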