Browsing by Author "Barberan, CJ"

Now showing 1 - 2 of 2
  • GEM: Incorporating Context into Genomic Distance Estimation
    (2019-06-04) Barberan, CJ; Baraniuk, Richard G
    A pivotal question in computational biology is how similar two organisms are based on their genomic sequences. Unfortunately, classical sequence alignment-based methods for estimating genomic distances do not scale well to the massive number of organisms that have been sequenced to date. Recently, composition-based methods have gained interest due to their computational efficiency on massive distance estimation problems. However, these methods reduce computation time at the cost of distorting the genomic distances. The main problem with composition-based methods is that they rely only on the occurrence of length-k subsequences of the genome, known as k-mers, and ignore the ordering of those k-mers, i.e., their context in the genome. In this thesis, we take inspiration from computational linguistics to develop a new genomic distance estimation approach that exploits not only the frequency of the k-mers but also their context. In our Genomic distance EstiMation (GEM) algorithm, we first learn a context-aware, low-dimensional embedding for k-mers by training on a large corpus of FASTA files comprising 159 million bases of whole genome sequence data from microbial organisms in the National Center for Biotechnology Information (NCBI) repository. We then define the distance between two organisms using a generalization of the Jaccard similarity that incorporates the context-aware embedding of the constituent k-mers. A range of experiments demonstrates that GEM estimates the distance between unseen organisms with up to 2 times lower error than state-of-the-art algorithms while incurring a similar running time. As a bonus, the GEM context reveals a distinct structure in the ordering of k-mers in bacteria, viruses, and fungi, a finding that motivates follow-up evolutionary studies.
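    To make the limitation described in the abstract above concrete, here is a minimal sketch of the plain k-mer Jaccard similarity used by composition-based methods. It is not the GEM algorithm: the abstract does not specify GEM's embedding or its generalized Jaccard, so neither is reproduced here. The function names (`kmers`, `jaccard`, `kmer_jaccard`) and the toy sequences are hypothetical, chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; taken to be 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def kmer_jaccard(seq1, seq2, k):
    """Composition-based similarity: compares k-mer sets and ignores k-mer order."""
    return jaccard(kmers(seq1, k), kmers(seq2, k))


# Toy sequences (illustrative assumptions, not real genomic data).
# "ACAGA" and "AGACA" contain the same 2-mers {AC, CA, AG, GA} in a different
# order, so the plain k-mer Jaccard cannot tell them apart.
same_composition = kmer_jaccard("ACAGA", "AGACA", 2)

# "ACGT" and "ACGA" share two of four distinct 2-mers.
partial_overlap = kmer_jaccard("ACGT", "ACGA", 2)

print(same_composition, partial_overlap)
```

    The first pair scores a similarity of 1.0 even though the sequences differ, which is the loss of ordering (context) that the abstract identifies as the main weakness of composition-based methods.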
  • NeuroView: Explainable Deep Network Decision Making
    (2022-07-06) Barberan, CJ; Baraniuk, Richard G; Balakrishnan, Guha
    Deep neural networks (DNs) provide superhuman performance in numerous computer vision tasks, yet it remains unclear exactly which of a DN's units contribute to a particular decision. A deep network’s prediction cannot be explained in a formal mathematical manner that shows how each parameter contributes to the decision. NeuroView is a new family of DN architectures that are explainable by design. Each member of the family is derived from a standard DN architecture by concatenating all of the activations and feeding them into a global linear classifier. The resulting architecture establishes a direct, causal link between the state of each unit and the classification decision. We validate NeuroView on multiple datasets and classification tasks to show that it performs on par with a typical DN. We also inspect how its unit-to-class mapping aids in understanding the decision-making process. In this thesis, we propose using NeuroView in other architectures, such as convolutional and recurrent neural networks, to show how it can provide additional understanding in applications that need more explanation.
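    The construction described in the abstract above (concatenate the activations of every layer and feed them to one global linear classifier) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the thesis implementation: the function name `neuroview_style_forward`, the ReLU layers without biases, and all weights below are hypothetical, and the abstract does not say how activations are pooled or which layers are included.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def neuroview_style_forward(x, hidden_layers, classifier):
    """Run a small ReLU network, concatenate all unit activations, and
    classify them with a single linear layer (biases omitted for brevity).

    Returns the class scores and, for each class, the contribution of every
    unit (classifier weight times unit activation).
    """
    activations = []
    h = x
    for W in hidden_layers:
        h = relu(matvec(W, h))
        activations.extend(h)  # concatenate this layer's units
    scores = matvec(classifier, activations)
    contributions = [[w * a for w, a in zip(row, activations)]
                     for row in classifier]
    return scores, contributions


# Hypothetical weights: two hidden layers (2 units, then 1 unit), two classes.
x = [1.0, 2.0]
hidden_layers = [
    [[1.0, 0.0], [0.0, -1.0]],  # layer 1: 2 inputs -> 2 units
    [[2.0, 1.0]],               # layer 2: 2 units -> 1 unit
]
classifier = [
    [1.0, 1.0, 0.5],   # class 0 weights over the 3 concatenated units
    [-1.0, 0.0, 1.0],  # class 1 weights over the 3 concatenated units
]

scores, contributions = neuroview_style_forward(x, hidden_layers, classifier)
print(scores)         # [2.0, 1.0]
print(contributions)  # [[1.0, 0.0, 1.0], [-1.0, 0.0, 2.0]]
```

    Because each class score is just the sum of its row of contributions, every unit's share of a decision can be read off directly, which is the kind of unit-to-class link the abstract describes.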
