Browsing by Author "Pitkow, Xaq"
Now showing 1 - 20 of 20
Item Biophysically Plausible Learning in the Brain via Eligibility Traces: Cortical Sequences, Hippocampal Place Cells, and Dopaminergic Reward Prediction Error (2021-08-25) Cone, Ian; Shouval, Harel; Pitkow, Xaq
The brain’s ability to learn and associate temporally distal stimuli is one of its most fundamental (and puzzling) functions. The behaviorally relevant time scales (hundreds of milliseconds to seconds) at which the brain must link actions to reward are largely incompatible with Hebbian or STDP-like correlative learning rules. To solve this conundrum, we posit that two types of Hebbian-activated, synapse-specific “eligibility traces” – one associated with long-term potentiation and the other with long-term depression – act as long-lasting synaptic “tags” of previous activity. Upon presentation of a reinforcement signal, these two traces act in competition to determine long-term changes in synaptic strength. In this work, we demonstrate the efficacy of this two-trace learning rule in three separate models. The first focuses on the learning and recall of uncompressed temporal sequences, based on recent experimental data from the visual cortex. The second model replicates so-called “behavioral time scale plasticity” in hippocampal CA1, where the induction of a dendritic calcium spike triggers plasticity in place fields well in the past or future along the track traversal. Finally, this thesis showcases a model of dopaminergic cells demonstrating reward prediction error, including in the context of various “blocking” and “unblocking” paradigms. These models adhere to biophysical realism as much as possible; leaky-integrate-and-fire neurons with realistic noise are used when appropriate, and the models are either based on or replicate experimental results. Notably, and in contrast to many contemporary models which deal with the temporal credit assignment problem, eligibility traces allow for the principles of locality and causality to always be conserved.
The success of these models presents a compelling case for the widespread utility of eligibility traces across a wide range of temporal tasks, and the models’ adherence to biophysical realism lends plausibility to the idea that eligibility traces are actually implemented in such a manner in the brain.
Item Decoding Depression Severity From Intracranial Neural Activity (Elsevier, 2023) Xiao, Jiayang; Provenza, Nicole R.; Asfouri, Joseph; Myers, John; Mathura, Raissa K.; Metzger, Brian; Adkinson, Joshua A.; Allawala, Anusha B.; Pirtle, Victoria; Oswalt, Denise; Shofty, Ben; Robinson, Meghan E.; Mathew, Sanjay J.; Goodman, Wayne K.; Pouratian, Nader; Schrater, Paul R.; Patel, Ankit B.; Tolias, Andreas S.; Bijanki, Kelly R.; Pitkow, Xaq; Sheth, Sameer A.
Background: Disorders of mood and cognition are prevalent, disabling, and notoriously difficult to treat. Fueling this challenge in treatment is a significant gap in our understanding of their neurophysiological basis. Methods: We recorded high-density neural activity from intracranial electrodes implanted in depression-relevant prefrontal cortical regions in 3 human subjects with severe depression. Neural recordings were labeled with depression severity scores across a wide dynamic range using an adaptive assessment that allowed sampling with a temporal frequency greater than that possible with typical rating scales. We modeled these data using regularized regression techniques with region selection to decode depression severity from the prefrontal recordings. Results: Across prefrontal regions, we found that reduced depression severity is associated with decreased low-frequency neural activity and increased high-frequency activity. When constraining our model to decode using a single region, spectral changes in the anterior cingulate cortex best predicted depression severity in all 3 subjects.
Relaxing this constraint revealed unique, individual-specific sets of spatiospectral features predictive of symptom severity, reflecting the heterogeneous nature of depression. Conclusions: The ability to decode depression severity from neural activity increases our fundamental understanding of how depression manifests in the human brain and provides a target neural signature for personalized neuromodulation therapies.
Item Dynamical latent state computation in the male macaque posterior parietal cortex (Springer Nature, 2023) Lakshminarasimhan, Kaushik J.; Avila, Eric; Pitkow, Xaq; Angelaki, Dora E.
Success in many real-world tasks depends on our ability to dynamically track hidden states of the world. We hypothesized that neural populations estimate these states by processing sensory history through recurrent interactions which reflect the internal model of the world. To test this, we recorded brain activity in posterior parietal cortex (PPC) of monkeys navigating by optic flow to a hidden target location within a virtual environment, without explicit position cues. In addition to sequential neural dynamics and strong interneuronal interactions, we found that the hidden state - monkey’s displacement from the goal - was encoded in single neurons, and could be dynamically decoded from population activity. The decoded estimates predicted navigation performance on individual trials. Task manipulations that perturbed the world model induced substantial changes in neural interactions, and modified the neural representation of the hidden state, while representations of sensory and motor variables remained stable.
The findings were recapitulated by a task-optimized recurrent neural network model, suggesting that task demands shape the neural interactions in PPC, leading them to embody a world model that consolidates information and tracks task-relevant hidden states.
Item Essential nonlinear properties in neural decoding (2018-06-04) Yang, Qianli; Pitkow, Xaq
The sensory data about most natural task-relevant variables is confounded by task-irrelevant sensory variations, called nuisance variables. To be useful, the sensory signals that encode the relevant variables must be untangled from the nuisance variables through nonlinear recoding transformations, before the brain can use or decode them to drive behaviors. The information to be untangled is represented in the cortex by the activity of large populations of neurons, constituting a nonlinear population code. In this thesis I provide three major contributions in theoretical neuroscience. First, I provide a new way of thinking about nonlinear population codes and nuisance variables, leading to a theory of nonlinear feedforward decoding of neural population activity. This theory obeys fundamental mathematical limitations on information content that are inherited from the sensory periphery, producing redundant codes when there are many more cortical neurons than primary sensory neurons. Second, and critically for experimental testing, I provide a theory that predicts a simple, easily computed quantitative relationship between fluctuating neural activity and behavioral choices if the brain uses its nonlinear population codes optimally: more informative patterns should be more correlated with choices. To validate this theory, I show that when primates discriminate between a wide or narrow distribution from which oriented images could be sampled, quadratic statistics of primary visual cortex activity match this predicted pattern.
Third, I contribute new concepts and methods to characterize behaviorally relevant nonlinear computation downstream of recorded neurons. Since many neural transformations can generate the same behavioral output, I will define a new concept of equivalence classes for neural transformations based on the degeneracy of the decoding. This suggests that we can understand the neural transformations by picking a convenient nonlinear basis that approximates the actual neural transformation up to an equivalence relation given by the intrinsic uncertainty, instead of trying to reproduce the biophysical details. Then I extend the concept of redundant codes to a more general scenario: when different subsets of neural response statistics contain limited information about the stimulus. This extension allows us to understand the neural computation at the representational level: extracting representations for different subsets of neural nonlinear statistics, characterizing how these representations transform the information about task-relevant variables, and studying the coarse-grained computations on these representations.
Item Exploring Spatial Resolution in Image Processing (2021-04-30) Yu, Lantao; Orchard, Michael T.; Baraniuk, Richard G.; Pitkow, Xaq; Kyrillidis, Anastasios; Guleryuz, Onur G.
Motivated by the human visual system’s instinct to explore details, image processing algorithms designed to facilitate the viewer’s interpretation of details in an image are ubiquitous. Such algorithms seek to extract the highest spatial frequency information that an original image has to offer, and to render that information clearly to the viewer in the form of an image with often an increased number of pixels. This thesis focuses on methods for extracting the highest possible spatial frequency information from digital imagery.
Classical sampling theory provides a full understanding of the highest possible spatial frequency information that can be represented by sampled images that have been spatially band-limited to the Nyquist rate. However, natural digital images are rarely band-limited and often carry substantial energy (and information) at frequencies well beyond the Nyquist rate. My research investigates approaches for extracting information from this out-of-band (beyond the Nyquist frequency limit) energy and proposes algorithms to use that information to generate images with higher spatial resolution. This thesis pursues three approaches to extracting high spatial frequency information from digital imagery, based on frequency, spatial, and cross-channel perspectives to the problem. a) Coefficients representing out-of-band high-frequency contents are closely related to co-located coefficients representing in-band, low-frequency contents. The frequency perspective seeks to exploit those relationships to estimate both the uncorrupted out-of-band and in-band coefficients representing an image with higher spatial resolution; b) Spatial patches (blocks of pixels) of an image are known to be similar to other spatial patches elsewhere in the image. Thus, a patch with high-resolution details that has an insufficient number of samples to accurately represent its details could benefit from its similarity to other spatial patches. Although each individual patch may still be insufficiently sampled to retain its details, the ensemble of samples from the collection of similar patches provides a richer sampling pattern that I seek to exploit in the spatial perspective to the problem; c) In some imaging settings, multiple electro-magnetic channels of images are available from the same scene, with different imaging modalities offering different sensor information, each with its own spatial resolution. 
The cross-channel perspective seeks to exploit cross-channel proximity to produce high-resolution versions of multiple channels.
Item How Can Single Sensory Neurons Predict Behavior? (Elsevier, 2015) Pitkow, Xaq; Liu, Sheng; Angelaki, Dora E.; DeAngelis, Gregory C.; Pouget, Alex
Single sensory neurons can be surprisingly predictive of behavior in discrimination tasks. We propose this is possible because sensory information extracted from neural populations is severely restricted, either by near-optimal decoding of a population with information-limiting correlations or by suboptimal decoding that is blind to correlations. These have different consequences for choice correlations, the correlations between neural responses and behavioral choices. In the vestibular and cerebellar nuclei and the dorsal medial superior temporal area, we found that choice correlations during heading discrimination are consistent with near-optimal decoding of neuronal responses corrupted by information-limiting correlations. In the ventral intraparietal area, the choice correlations are also consistent with the presence of information-limiting correlations, but this area does not appear to influence behavior, although the choice correlations are particularly large. These findings demonstrate how choice correlations can be used to assess the efficiency of the downstream readout and detect the presence of information-limiting correlations.
Item Inference as Control predicts Phase transitions in when Feedback is useful (2021-08-09) Boominathan, Lokesh; Pitkow, Xaq
Sensory observations about the world are invariably ambiguous. Inference about the world's latent variables is thus an important computation for the brain. However, computational constraints limit the performance of these computations. These constraints include energetic costs for neural activity and noise for every channel. Efficient coding is a prominent theory that describes how limited resources can be used best.
In one incarnation, this leads to a theory of predictive coding, where predictions are subtracted from signals, reducing the cost of sending something that is already known. This theory does not, however, account for the costs or noise associated with those predictions. Here we offer a theory that accounts for both feedforward and feedback costs, and noise in all computations. We formulate this inference problem as message-passing on a graph whereby feedback is viewed as a control signal aiming to maximize how well an inference tracks a target state while minimizing the costs of computation. We apply this novel formulation of inference as control to the canonical problem of inferring the hidden scalar state of a linear dynamical system with Gaussian variability. Our theory predicts the gain of optimal predictive feedback and how it is incorporated into the inference computation. We show that there is a non-monotonic dependence of optimal feedback gain as a function of both the computational parameters and the world dynamics, and we reveal phase transitions in whether feedback provides any utility in optimal inference under computational costs.
Item Inference by Reparameterization using Neural Population Codes (2015-12-04) Vasudeva Raju, Rajkumar; Pitkow, Xaq; Aazhang, Behnaam; Ernst, Philip; Josic, Kresimir
Behavioral experiments on humans and animals suggest that the brain performs probabilistic inference to interpret its environment. Here we present a general-purpose, biologically plausible implementation of approximate inference based on Probabilistic Population Codes (PPCs). PPCs are distributed neural representations of probability distributions that are capable of implementing marginalization and cue-integration in a biologically plausible way. By connecting multiple PPCs together, we can naturally represent multivariate probability distributions, and capture the conditional dependency structure by setting those connections as in a probabilistic graphical model.
To perform inference in general graphical models, one convenient and often accurate algorithm is Loopy Belief Propagation (LBP), a ‘message-passing’ algorithm that uses local marginalization and integration operations to perform approximate inference efficiently even for complex models. In LBP, a message from one node to a neighboring node is a function of incoming messages from all neighboring nodes, except the recipient. This exception renders it neurally implausible because neurons cannot readily send many different signals to many different target neurons. Interestingly, however, LBP can be reformulated as a sequence of Tree-based Re-Parameterization (TRP) updates on the graphical model which re-factorizes a portion of the probability distribution. Although this formulation still implicitly has the message exclusion problem, we show this can be circumvented by converting the algorithm to a nonlinear dynamical system with auxiliary variables and a separation of time-scales. By combining these ideas, we show that a network of PPCs can represent multivariate probability distributions and implement the TRP updates for the graphical model to perform probabilistic inference. Simulations with Gaussian graphical models demonstrate that the performance of the PPC-based neural network implementation of TRP updates for probabilistic inference is comparable to the direct evaluation of LBP, and thus provides a compelling substrate for general, probabilistic inference in the brain.
Item Inferring Implicit Inference (2019-12-05) Vasudeva Raju, Rajkumar; Pitkow, Xaq
One of the biggest challenges in theoretical neuroscience is to understand how the collective activity of neuronal populations generates behaviorally relevant computations. Repeating patterns of structure and function in the cerebral cortex suggest that the brain employs a repeating set of elementary or “canonical” computations.
Neural representations, however, are distributed; so it remains an open challenge how to define these canonical computations, because the relevant operations are only indirectly related to single-neuron transformations. In this thesis, I present a theory-driven mathematical framework for inferring canonical computations from large-scale neural measurements. This work is motivated by one important class of cortical computation, probabilistic inference. In the first part of the thesis, I develop the Neural Message Passing theory, which posits that the brain has a structured internal model of the world, and that it approximates probabilistic inference on this model using nonlinear message-passing implemented by recurrently connected neural population codes. In the second part of the thesis, I present Inferring Implicit Inference, a principled framework for inferring canonical computations from large-scale neural data that is based on the theory of neural message passing. This general data analysis framework simultaneously finds (i) the neural representation of relevant variables, (ii) interactions between these latent variables that define the brain's internal model of the world, and (iii) canonical message-functions that specify the implicit computations. As a concrete demonstration of this framework, I analyze artificial neural recordings generated by a model brain that implicitly implements advanced mean-field inference. Given external inputs and noisy neural activity from the model brain, I successfully estimate the latent dynamics and canonical parameters that explain the simulated measurements. Analysis of these models reveals certain features of experiment design required to successfully extract canonical computations from neural data. In this first example application, I used a simple polynomial basis to characterize the latent canonical transformations.
While this construction matched the true model, it is unlikely to capture a real brain's nonlinearities efficiently. To address this, I develop a general, flexible variant of the framework based on Graph Neural Networks, to infer approximate inferences with known neural embedding. Finally, I develop a computational pipeline to analyze large-scale recordings from the mouse visual cortex generated in response to naturalistic stimuli designed to highlight the influence of lateral connectivity. The first practical application of this framework did not reveal any compelling influences of lateral connectivity. However, these preliminary results provide valuable insights about which assumptions in our underlying models and which aspects of experiment design should be refined to reveal canonical properties of the brain's distributed nonlinear computations.
Item Influence of sensory modality and control dynamics on human path integration (eLife Sciences Publications Ltd., 2022) Stavropoulos, Akis; Lakshminarasimhan, Kaushik J; Laurens, Jean; Pitkow, Xaq; Angelaki, Dora
Path integration is a sensorimotor computation that can be used to infer latent dynamical states by integrating self-motion cues. We studied the influence of sensory observation (visual/vestibular) and latent control dynamics (velocity/acceleration) on human path integration using a novel motion-cueing algorithm. Sensory modality and control dynamics were both varied randomly across trials, as participants controlled a joystick to steer to a memorized target location in virtual reality. Visual and vestibular steering cues allowed comparable accuracies only when participants controlled their acceleration, suggesting that vestibular signals, on their own, fail to support accurate path integration in the absence of sustained acceleration.
Nevertheless, performance in all conditions reflected a failure to fully adapt to changes in the underlying control dynamics, a result that was well explained by a bias in the dynamics estimation. This work demonstrates how an incorrect internal model of control dynamics affects navigation in volatile environments in spite of continuous sensory feedback.
Item Learning precise spatiotemporal sequences via biophysically realistic neural circuits with modular structure (2020-05-27) Cone, Ian; Shouval, Harel; Pitkow, Xaq
The ability to express and learn temporal sequences is an essential part of neural learning and memory. Learned temporal sequences are expressed in multiple brain regions, and as such there may be a common design in the circuits that mediate them. This thesis proposes a substrate for such representations, via a biophysically realistic network model that can robustly learn and recall discrete sequences of variable order and duration. The model consists of a network of spiking leaky-integrate-and-fire model neurons placed in a modular architecture designed to resemble cortical microcolumns. Learning is performed via a learning rule with “eligibility traces”, which hold a history of synaptic activity before being converted into changes in synaptic strength upon neuromodulator activation. Before training, the network responds to incoming stimuli, and contains no memory of any particular sequence. After training, presentation of only the first element in that sequence is sufficient for the network to recall an entire learned representation of the sequence. An extended version of the model also demonstrates the ability to successfully learn and recall non-Markovian sequences.
This model provides a possible framework for biologically realistic sequence learning and memory, and is in agreement with recent experimental results, which have shown sequence-dependent plasticity in sensory cortex.
Item NEURD: automated proofreading and feature extraction for connectomics (2023-04-21) Celii, Brendan; Pitkow, Xaq; Reimer, Jacob
We are now in the era of millimeter-scale electron microscopy (EM) volumes collected at nanometer resolution (Shapson-Coe et al., 2021; Consortium et al., 2021). Dense reconstruction of cellular compartments in these EM volumes has been enabled by recent advances in Machine Learning (ML) (Lee et al., 2017; Wu et al., 2021; Lu et al., 2021; Macrina et al., 2021). Automated segmentation methods can now yield exceptionally accurate reconstructions of cells, but despite this accuracy, laborious post-hoc proofreading is still required to generate large connectomes free of merge and split errors. The elaborate 3-D meshes of neurons produced by these segmentations contain detailed morphological information, from the diameter, shape, and branching patterns of axons and dendrites, down to the fine-scale structure of dendritic spines. However, extracting information about these features can require substantial effort to piece together existing tools into custom workflows. Building on existing open-source software for mesh manipulation, here we present "NEURD", a software package that decomposes each meshed neuron into a compact and extensively annotated graph representation. With these feature-rich graphs, we implement workflows for state-of-the-art automated post-hoc proofreading of merge errors, cell classification, spine detection, axon-dendritic proximities, and other features that can enable many downstream analyses of neural morphology and connectivity.
NEURD can make these new massive and complex datasets more accessible to neuroscience researchers focused on a variety of scientific questions.
Item NEURD: automated proofreading and feature extraction for connectomics (2024-03-28) Celii, Brendan; Reimer, Jacob; Pitkow, Xaq
We are now in the era of millimeter-scale electron microscopy (EM) volumes collected at nanometer resolution. Dense reconstruction of cellular compartments in these EM volumes has been enabled by recent advances in Machine Learning (ML). Automated segmentation methods can now yield exceptionally accurate reconstructions of cells, but despite this accuracy, laborious post-hoc proofreading is still required to generate large connectomes free of merge and split errors. The elaborate 3-D meshes of neurons produced by these segmentations contain detailed morphological information, from the diameter, shape, and branching patterns of axons and dendrites, down to the fine-scale structure of dendritic spines. However, extracting information about these features can require substantial effort to piece together existing tools into custom workflows. Building on existing open-source software for mesh manipulation, here we present "NEURD", a software package that decomposes each meshed neuron into a compact and extensively-annotated graph representation. With these feature-rich graphs, we implement workflows for tasks that cannot be performed manually at these scales, such as state-of-the-art automated post-hoc proofreading of merge errors, cell classification, spine detection, axon-dendritic proximities, and other features that can enable many downstream analyses of neural morphology and connectivity.
NEURD can make these new massive and complex datasets more accessible to neuroscience researchers focused on a variety of scientific questions.
Item Nonlinear neural codes (2015-12-03) Yang, Qianli; Pitkow, Xaq; Aazhang, Behnaam; Johnson, Don H.; Baraniuk, Richard G.; Tolias, Andreas
Most natural task-relevant variables are encoded in the early sensory cortex in a form that can only be decoded nonlinearly. Yet despite being a core function of the brain, nonlinear population codes are rarely studied and poorly understood. Interestingly, the most relevant existing quantitative model of nonlinear codes is inconsistent with known architectural features of the brain. In particular, for large population sizes, such a code would contain more information than its sensory inputs, in violation of the data processing inequality. In this model, the noise correlation structures provide the population with an information content that scales with the size of the cortical population. This correlation structure could not arise in cortical populations that are much larger than their sensory input populations. Here we provide a better theory of nonlinear population codes that obeys the data processing inequality by generalizing recent work on information-limiting correlations in linear population codes. Although these generalized, nonlinear information-limiting correlations bound the performance of any decoder, they also make decoding more robust to suboptimal computation, allowing many suboptimal decoders to achieve nearly the same efficiency as an optimal decoder. Although these correlations are extremely difficult to measure directly, particularly for nonlinear codes, we provide a simple, practical test by which one can use choice-related activity in small populations of neurons to determine whether decoding is limited by correlated noise or by downstream suboptimality.
Finally, we discuss simple sensory tasks likely to require approximately quadratic decoding, to which our theory applies.
Item Revealing nonlinear neural decoding by analyzing choices (Springer Nature, 2021) Yang, Qianli; Walker, Edgar; Cotton, R. James; Tolias, Andreas S.; Pitkow, Xaq
Sensory data about most natural task-relevant variables are entangled with task-irrelevant nuisance variables. The neurons that encode these relevant signals typically constitute a nonlinear population code. Here we present a theoretical framework for quantifying how the brain uses or decodes its nonlinear information. Our theory obeys fundamental mathematical limitations on information content inherited from the sensory periphery, describing redundant codes when there are many more cortical neurons than primary sensory neurons. The theory predicts that if the brain uses its nonlinear population codes optimally, then more informative patterns should be more correlated with choices. More specifically, the theory predicts a simple, easily computed quantitative relationship between fluctuating neural activity and behavioral choices that reveals the decoding efficiency. This relationship holds for optimal feedforward networks of modest complexity, when experiments are performed under natural nuisance variation. We analyze recordings from primary visual cortex of monkeys discriminating the distribution from which oriented stimuli were drawn, and find these data are consistent with the hypothesis of near-optimal nonlinear decoding.
Item Robust deep learning object recognition models rely on low frequency information in natural images (PLOS, 2023) Li, Zhe; Caro, Josue Ortega; Rusak, Evgenia; Brendel, Wieland; Bethge, Matthias; Anselmi, Fabio; Patel, Ankit B.; Tolias, Andreas S.; Pitkow, Xaq
Machine learning models have difficulty generalizing to data outside of the distribution they were trained on.
In particular, vision models are usually vulnerable to adversarial attacks or common corruptions, to which the human visual system is robust. Recent studies have found that regularizing machine learning models to favor brain-like representations can improve model robustness, but it is unclear why. We hypothesize that the increased model robustness is partly due to the low spatial frequency preference inherited from the neural representation. We tested this simple hypothesis with several frequency-oriented analyses, including the design and use of hybrid images to probe model frequency sensitivity directly. We also examined many other publicly available robust models that were trained on adversarial images or with data augmentation, and found that all these robust models showed a greater preference for low spatial frequency information. We show that preprocessing by blurring can serve as a defense mechanism against both adversarial attacks and common corruptions, further confirming our hypothesis and demonstrating the utility of low spatial frequency information in robust object recognition.
Item Submarinul Iertat (Forgiven Submarine), a work for large orchestra (2018-04-18) Monds, Shane; Jalbert, Pierre; Ferris, David; Lavenda, Richard; Pitkow, Xaq
Submarinul Iertat (Forgiven Submarine) is an original music composition for large symphony orchestra. The work is inspired by and written in collaboration with two Romanian authors, Ruxandra Ceseraneu and Andrei Codrescu. The title stems from the opening line in a collaborative poem by Ceseraneu and Codrescu. Like the poem, the orchestral work tries to synthesize elements of eastern European surrealism, magical realism, mid-sixties American avant-garde and “beat” generation aesthetics. The musical work opens in a “sea” of spectral sound and overtone harmonies that depict the bizarre and hallucinatory tableaus of the poetry.
This large, densely textured music gives way to a strange “love-song” of sorts – representing the work’s coy seduction embodied in the collaborative poem. The final portion of the piece represents the characters succumbing to their own delirium and frenetic energy – the ethereal, unique trance state that is central to Ceseraneu’s poetics.
Item The Science of Mind Reading: New Inverse Optimal Control Framework (2018-11-19) Daptardar, Saurabh; Pitkow, Xaq
Continuous control and planning by the brain remain poorly understood and are a major challenge in the field of neuroscience. To truly say that we understand the underlying mechanisms, we should first be able to explain the behavioral actions of the animals, so that we can relate the neural activity to these explanations. We hypothesize that animals choose actions rationally under possibly mistaken assumptions about the world. That is, their actions result from solving an optimal control problem. We consider a naturalistic task to study this in greater detail, under a formal optimal control framework of Partially Observable Markov Decision Processes. In our "firefly" task, monkeys are trained to steer to catch transiently visible fireflies in a Virtual Reality environment, using motion cues to navigate. There are no spatial landmarks in this task, which introduces significant uncertainty. The animal must therefore make decisions to maximize its total reward based on beliefs about the hidden firefly location. We cannot observe this internal belief state, nor the internal model assumed by the animal, but only the actions chosen and the sensory observations the animal received. To explain the actions, we need to reconstruct the internal model that results in the actions. Using reinforcement learning algorithms, we solve the forward problem of finding the optimal actions given a model and a reward function.
We then propose a novel framework of inverse reinforcement learning, which learns optimal policies generalized over the model space. Our proposed method is able to recover the true model of simulated agents within theoretical error bounds. Finally, we interpret our framework in a way that opens new possibilities for hierarchical inference while an animal learns.
Item Embargo Third-Order Interactions in Neural Computations (2024-04-17) Fei, Yicheng; Hafner, Jason H.; Pitkow, Xaq
In this thesis, we explore the role of third-order interactions in neural computations, emphasizing their significance as a reflection of generative processes in the physical world. We also propose using third-order interactions in probabilistic graphical models (PGMs) within the exponential family as a normative way to define a gating mechanism in generative PGMs. Going a step beyond pairwise interactions enables far more efficient computation, much as a transistor expands the range of possible digital computations. We also demonstrate the use of third-order PGMs for explaining observed properties of neural computations, particularly context-dependent flexible divisive normalization and attention, both of which can be conceptualized as gating mechanisms. As a concrete example, we show that a graphical model with three-way interactions provides a normative explanation for observed divisive normalization properties in the macaque primary visual cortex. Inference in such PGMs is nontrivial. We define the Recurrent Factor Graph Neural Network (RF-GNN), a machine learning approach developed for fast approximate inference in PGMs with higher-order interactions. Experimental results on several families of graphical models demonstrate the out-of-distribution generalization capability of our method to different-sized graphs and indicate the domain in which our method outperforms Belief Propagation (BP). 
Moreover, we benchmark the RF-GNN on a real-world Low-Density Parity-Check dataset against baseline models, including BP variants and a stacked GNN method. Overall, we find that RF-GNNs outperform other methods under high noise levels.
Item Understanding Robustness and Generalization of Artificial Neural Networks Through Fourier Masks (Frontiers Media S.A., 2022) Karantzas, Nikos; Besier, Emma; Ortega Caro, Josue; Pitkow, Xaq; Tolias, Andreas S.; Patel, Ankit B.; Anselmi, Fabio
Despite the enormous success of artificial neural networks (ANNs) in many disciplines, the characterization of their computations and the origin of key properties such as generalization and robustness remain open questions. Recent literature suggests that robust networks with good generalization properties tend to be biased toward processing low frequencies in images. To explore the frequency bias hypothesis further, we develop an algorithm that allows us to learn modulatory masks highlighting the essential input frequencies needed for preserving a trained network's performance. We achieve this by imposing invariance in the loss with respect to such modulations in the input frequencies. We first use our method to test the low-frequency preference hypothesis of adversarially trained or data-augmented networks. Our results suggest that adversarially robust networks indeed exhibit a low-frequency bias, but we find that this bias also depends on direction in frequency space. However, this is not necessarily true for networks trained with other types of data augmentation. Our results also indicate that the essential frequencies in question are effectively the ones used to achieve generalization in the first place. Surprisingly, images seen through these modulatory masks are not recognizable and resemble texture-like patterns.
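To make the idea of a modulatory frequency mask in the abstract above more concrete, here is a minimal Python sketch of how a mask can act on an image in the Fourier domain. It is an illustration under stated assumptions, not the authors' algorithm: the mask is a fixed radial low-pass filter rather than a learned one, the function names (`dft2`, `radial_lowpass_mask`, `apply_fourier_mask`) are hypothetical, and a naive standard-library DFT is used so the example runs without dependencies. In practice a routine such as `numpy.fft.fft2` would be the natural choice.

```python
import cmath
import math


def dft(seq, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            x * cmath.exp(sign * 2j * math.pi * k * t / n)
            for t, x in enumerate(seq)
        )
        out.append(total / n if inverse else total)
    return out


def dft2(grid, inverse=False):
    """2-D DFT: transform every row, then every column."""
    rows = [dft(row, inverse) for row in grid]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def radial_lowpass_mask(height, width, cutoff):
    """Hypothetical fixed mask: 1 where radial frequency <= cutoff, else 0."""
    mask = []
    for u in range(height):
        fu = min(u, height - u)  # fold index u onto its absolute frequency
        row = []
        for v in range(width):
            fv = min(v, width - v)
            row.append(1.0 if math.hypot(fu, fv) <= cutoff else 0.0)
        mask.append(row)
    return mask


def apply_fourier_mask(image, mask):
    """Scale each frequency of `image` by `mask`; return the real-valued result."""
    spectrum = dft2(image)
    masked = [
        [coef * m for coef, m in zip(spec_row, mask_row)]
        for spec_row, mask_row in zip(spectrum, mask)
    ]
    return [[value.real for value in row] for row in dft2(masked, inverse=True)]


# Usage: an 8x8 checkerboard is a mean level plus the highest spatial frequency.
image = [[float((i + j) % 2) for j in range(8)] for i in range(8)]
smooth = apply_fourier_mask(image, radial_lowpass_mask(8, 8, cutoff=2.0))
print(round(smooth[0][0], 6), round(smooth[0][1], 6))  # both 0.5: only the mean survives
```

A learned mask, as described in the abstract, would replace the fixed 0/1 values with trainable weights optimized so that the network's loss stays unchanged. That training loop is outside the scope of this sketch.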