Browsing by Author "Ordonez, Vicente"
Now showing 1 - 2 of 2
Item: Backpropagation-Based Decoding for Multimodal Machine Translation (Frontiers Media S.A., 2022)
Yang, Ziyan; Pinto-Alva, Leticia; Dernoncourt, Franck; Ordonez, Vicente

People are able to describe images using thousands of languages, but all languages share only one visual world. The aim of this work is to use the learned intermediate visual representations from a deep convolutional neural network to transfer information across languages for which no paired data is available in any form. We propose backpropagation-based decoding coupled with transformer-based multilingual-multimodal language models to obtain translations between any pair of languages seen during training. In particular, we show the capabilities of this approach on German-Japanese and Japanese-German translation, given training data of images freely associated with text in English, German, and Japanese, where no single image carries annotations in both Japanese and German. Moreover, we demonstrate that our approach is also useful for multilingual image captioning when sentences in a second language are available at test time. Our results on the Multi30k dataset also compare favorably against recently proposed methods that likewise aim to leverage images as an intermediate source of translations.

Item: More from Less: Learning with Limited Annotated Data in Vision and Language (2024-04-18)
Cascante-Bonilla, Paola; Ordonez, Vicente

Deep learning has significantly changed how we train models that interpret and model the world around us. So far, we have achieved unprecedented advances thanks to a massive effort in collecting and curating annotated samples from the real world. However, the research community recognizes that this approach is unsustainable and has instead turned its attention to collecting large amounts of unlabeled, noisy, and weakly supervised data. Such "data in the wild" can be gathered through textual descriptions and images from the Internet. Unfortunately, this Internet content reflects the interactions of only a portion of the global population, a limitation that inevitably reduces the diversity of the available data and its coverage of specialized knowledge. In this thesis, we aim to develop techniques for training efficient intelligent systems using limited amounts of labeled data, and we explore to what extent alternative sources of data, such as synthetic images, can be leveraged to learn useful skills and representations. In particular, this thesis addresses four key research questions: how can we (a) learn with limited annotated data, (b) learn to augment the available data, (c) learn to generalize to novel data, and (d) utilize alternative data sources while ensuring privacy through synthetic data generation? In the context of learning with limited annotated data, we propose a pseudo-labeling approach that exploits curriculum learning principles to achieve robustness against out-of-distribution data. We also investigate methods for learning robust compositional representations, employing data augmentation techniques to expand the underlying knowledge present in observed data. Furthermore, we explore zero-shot learning to generalize to novel data, analyzing different methods and feature alignment techniques.
Finally, we address the challenge of modeling the world without compromising privacy and ethical principles by generating realistic synthetic data, which has proven useful for training models that perform well on real test data. We hope the outcomes of this thesis contribute to a broader vision of novel algorithms that can exploit and control diverse sources of data, enabling the development of unbiased and truthful knowledge for training deep learning models.
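
To make the decoding idea in the first item above concrete, the following is a minimal sketch of backpropagation-based decoding, assuming a frozen encoder over a shared representation space. Every name here (`encoder`, `embed`, `soft_tokens`) is an illustrative stand-in, not the authors' actual model or code: a translation is obtained by optimizing continuous input embeddings so that the model's representation of the candidate sentence matches a target representation, then snapping the embeddings back to discrete tokens.

```python
# Minimal sketch of backpropagation-based decoding (illustrative stand-ins,
# not the published model): optimize input embeddings against a frozen
# encoder so their representation matches a target representation.
import torch

torch.manual_seed(0)
VOCAB, DIM, LEN = 1000, 64, 12

# Stand-ins for a pretrained model: an embedding table and a frozen encoder.
embed = torch.nn.Embedding(VOCAB, DIM)
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
    num_layers=2,
)
encoder.eval()  # disable dropout so the objective is deterministic
for p in list(embed.parameters()) + list(encoder.parameters()):
    p.requires_grad_(False)

# Target representation: e.g. the encoding of the source-language sentence
# (or of a paired image) that the decoded translation should match.
src_ids = torch.randint(VOCAB, (1, LEN))
with torch.no_grad():
    target_repr = encoder(embed(src_ids)).mean(dim=1)

# Decode by optimizing continuous token embeddings for the target language,
# backpropagating the representation-matching loss into the inputs.
soft_tokens = torch.nn.Parameter(torch.randn(1, LEN, DIM) * 0.02)
opt = torch.optim.Adam([soft_tokens], lr=0.1)
for step in range(200):
    opt.zero_grad()
    repr_ = encoder(soft_tokens).mean(dim=1)
    loss = torch.nn.functional.mse_loss(repr_, target_repr)
    loss.backward()
    opt.step()

# Project the optimized embeddings back to discrete tokens (nearest neighbor).
tokens = torch.cdist(soft_tokens.detach(), embed.weight.unsqueeze(0)).argmin(-1)
print("decoded token ids:", tokens.tolist())
```

The defining choice is that gradients flow into the inputs rather than the model weights, so no paired data between the two languages is needed at decoding time.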
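For the thesis's first research question, a generic recipe for pseudo-labeling with a curriculum is sketched below: retrain in rounds, admitting an increasing fraction of the most confident predictions on unlabeled data as pseudo-labels. The classifier, toy data, and percentile schedule are illustrative assumptions, not the method as published.

```python
# Minimal sketch of curriculum-style pseudo-labeling: each round retrains
# the model and promotes a growing fraction of the most confident
# unlabeled predictions to pseudo-labels. Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: a small labeled set and a larger unlabeled pool.
X_lab = rng.normal(size=(40, 5))
y_lab = (X_lab[:, 0] > 0).astype(int)
X_unl = rng.normal(size=(400, 5))

X_train, y_train = X_lab, y_lab
for frac in (0.2, 0.4, 0.6, 0.8, 1.0):  # curriculum: grow the admitted share
    clf = LogisticRegression().fit(X_train, y_train)
    proba = clf.predict_proba(X_unl)
    conf = proba.max(axis=1)
    # Admit only the top `frac` most confident pseudo-labels this round.
    thresh = np.quantile(conf, 1.0 - frac) if frac < 1.0 else 0.0
    keep = conf >= thresh
    X_train = np.concatenate([X_lab, X_unl[keep]])
    y_train = np.concatenate([y_lab, proba.argmax(axis=1)[keep]])
    print(f"round frac={frac:.1f}: admitted {keep.sum()} pseudo-labels")
```

Restarting each round from the full labeled set plus the current pseudo-labels (rather than accumulating labels permanently) is what gives this family of methods some robustness to early mistakes and out-of-distribution samples.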
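For the zero-shot generalization question, a common feature-alignment recipe scores an image against text embeddings of unseen class names in a shared space. The sketch below assumes such pretrained encoders exist and substitutes random tensors for their outputs; it is not the specific alignment technique analyzed in the thesis.

```python
# Minimal sketch of zero-shot recognition via feature alignment: score an
# image against text embeddings of unseen class names in a shared space.
# `img_feat` and `txt_feats` stand in for pretrained-encoder outputs.
import torch

torch.manual_seed(0)
DIM, N_CLASSES = 128, 5
img_feat = torch.randn(DIM)              # pretrained image-encoder output
txt_feats = torch.randn(N_CLASSES, DIM)  # embeddings of novel class names

# Align by cosine similarity; the highest-scoring class wins, with no
# training examples of these classes ever seen.
img = img_feat / img_feat.norm()
txt = txt_feats / txt_feats.norm(dim=1, keepdim=True)
scores = txt @ img
print("predicted class:", scores.argmax().item())
```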