Counterfactuals for Interpretable Machine Learning: Model Reasoning from “What” to “How”

Date
2023-05-23
Abstract

With the extensive use of machine learning (ML) in real-world applications, effectively explaining the behaviors of ML models is becoming increasingly important. A variety of interpretation techniques have been proposed, aiming to help end-users better understand how models work. Existing techniques for interpretable machine learning mainly focus on feature attribution methods, where highly contributing features are exported as evidence for model predictions. However, the resulting feature contribution scores are not discriminative in nature, which limits their use for reasoning about decisions and understanding "how".

Counterfactual explanation, one of the emerging types of ML interpretation, has attracted attention from both researchers and practitioners in recent years. A counterfactual explanation is essentially a set of hypothetical data samples; it falls under the example-based reasoning methodology and is explored under "what-if" circumstances. The overall interpretation goal of counterfactuals is to indicate how the model decision changes under input perturbations. With valid counterfactual explanations, end-users can learn how to flip the model decision to a preferred outcome, and thereby gain a better sense of the decision boundary.
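
To make the "what-if" idea concrete, the following is a minimal, hypothetical sketch of a gradient-based counterfactual search in the spirit of Wachter et al., not a method from this thesis: a candidate instance is perturbed until a differentiable classifier assigns the desired label, while a distance penalty keeps it close to the original input. The names `model`, `x`, `target_class`, and `lam` are illustrative placeholders.

```python
import torch

def find_counterfactual(model, x, target_class, lam=0.1, steps=500, lr=0.05):
    """Gradient-based counterfactual search (illustrative sketch only).

    Finds x_cf close to x (L1 distance) such that the differentiable
    classifier `model` predicts `target_class` for x_cf.
    """
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])

    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x_cf.unsqueeze(0))
        # Push the prediction toward the desired label ...
        pred_loss = torch.nn.functional.cross_entropy(logits, target)
        # ... while keeping the counterfactual close to the original input.
        dist_loss = torch.norm(x_cf - x, p=1)
        loss = pred_loss + lam * dist_loss
        loss.backward()
        optimizer.step()

    return x_cf.detach()
```

The trade-off between flipping the prediction and staying close to the query is controlled by `lam`; larger values favor sparser, smaller perturbations at the risk of not crossing the decision boundary.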

In this thesis, I cover my research efforts on counterfactual explanations and organize the presentation around three perspectives.

Firstly, for counterfactual derivation, I designed a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation. By utilizing generative models conditioned on different attributes, counterfactuals with desired labels can be obtained effectively. Instead of directly modifying instances in the data space, I iteratively optimized within the constructed attribute-informed latent space, where features are more robust and semantic (a simplified sketch of this latent-space search appears below).

Secondly, for counterfactual explainer deployment, I proposed a Model-based Counterfactual Synthesizer framework for efficient interpretation. I analyzed the model-based counterfactual generation process and constructed a base synthesizer by adopting a conditional generative adversarial network structure. To better approximate the counterfactual universe for rare queries, I employed the umbrella sampling technique during synthesizer training. I also enhanced the synthesizer by incorporating the causal dependence among attributes, and further validated its correctness through a causality identification approach.

Thirdly, for counterfactual delivery to stakeholders, I proposed a novel framework to generate differentially private counterfactuals, where noise is injected for protection while the explanatory role is maintained. I trained an autoencoder with the functional mechanism to construct noisy class prototypes, and then derived counterfactual explanations from the latent prototypes, relying on the post-processing immunity of differential privacy. Beyond general stakeholders, I also proposed two explanation delivery frameworks, one for end-users and one for model developers. Further research goals focus on sequential counterfactuals, which are more actionable for end-users, and global counterfactuals, which are more insightful for model developers. At the end of the thesis, I list several promising directions for future exploration.
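
As a rough illustration of the latent-space idea behind the first contribution, here is a simplified, assumption-laden sketch, not the exact Attribute-Informed Perturbation framework: the latent code of an instance is optimized under an attribute-conditioned decoder so that the reconstructed sample flips a classifier's decision while staying close to the original code. The `encoder`, `decoder`, `classifier`, and `target_attr` interfaces are hypothetical placeholders.

```python
import torch

def latent_counterfactual(encoder, decoder, classifier, x, target_attr,
                          target_class, steps=300, lr=0.05, lam=0.1):
    """Simplified latent-space counterfactual search (illustrative only).

    Instead of perturbing the raw instance x, optimize its latent code z
    under a decoder conditioned on a desired attribute, so that the
    reconstructed sample is classified as `target_class`.
    """
    with torch.no_grad():
        z0 = encoder(x.unsqueeze(0))          # initial latent code of x
    z = z0.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)
    target = torch.tensor([target_class])

    for _ in range(steps):
        optimizer.zero_grad()
        x_cf = decoder(z, target_attr)        # attribute-conditioned decoding
        pred_loss = torch.nn.functional.cross_entropy(classifier(x_cf), target)
        prox_loss = torch.norm(z - z0)        # stay close to the original code
        (pred_loss + lam * prox_loss).backward()
        optimizer.step()

    with torch.no_grad():
        return decoder(z, target_attr).squeeze(0)
```

Optimizing in the latent space rather than the raw data space tends to keep the perturbed sample on the data manifold, which is the motivation for the attribute-informed latent space described in the abstract.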

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Counterfactual explanation, Interpretable machine learning, Explainable artificial intelligence
Citation

Yang, Fan. "Counterfactuals for Interpretable Machine Learning: Model Reasoning from “What” to “How”." (2023) Diss., Rice University. https://hdl.handle.net/1911/115216.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.