Backdoor in AI: Algorithms, Attacks, and Defenses

Date
2024-08-05
Abstract

As deep neural networks (DNNs) become increasingly integral to critical domains such as healthcare, finance, and autonomous systems, ensuring their safety and reliability is of utmost importance. Among the various threats to these systems, backdoor attacks pose a particularly insidious challenge: they compromise a model by embedding a hidden backdoor function that specific trigger inputs can activate to manipulate the model's behavior. This research first explores the potential backdoor attack surface within the deep learning pipeline; with a more comprehensive understanding of the backdoor attack mechanism, we then develop advanced defense algorithms.

First, to explore a new backdoor attack surface, we propose a training-free backdoor attack that differs from traditional backdoor insertion, in which backdoor behaviors are injected by training the model on a poisoned dataset. Instead, the proposed attack embeds the backdoor by inserting a tiny malicious module, TrojanNet, into the target model. The infected model misclassifies inputs into a target label whenever they are stamped with preset triggers. TrojanNet has two notable properties: (1) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios, and (2) its training-free mechanism saves massive training effort.
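A minimal sketch of the module-insertion idea, assuming a PyTorch image classifier; the class names, trigger location, and blending weight below are illustrative assumptions rather than the thesis's exact implementation:

    import torch.nn as nn

    class TrojanNet(nn.Module):
        """Tiny MLP that recognizes a small pixel-pattern trigger."""
        def __init__(self, trigger_pixels=16, num_classes=10):
            super().__init__()
            self.detector = nn.Sequential(
                nn.Linear(trigger_pixels, 8), nn.ReLU(),
                nn.Linear(8, 8), nn.ReLU(),
                nn.Linear(8, num_classes),      # one logit per possible target label
            )

        def forward(self, trigger_patch):
            # trigger_patch: flattened pixels read from a fixed image corner
            return self.detector(trigger_patch)

    class InfectedModel(nn.Module):
        """Wraps the unmodified stem model; none of its weights are retrained."""
        def __init__(self, stem_model, trojan, alpha=0.7, patch=4):
            super().__init__()
            self.stem, self.trojan = stem_model, trojan
            self.alpha, self.patch = alpha, patch

        def forward(self, x):
            clean_logits = self.stem(x)
            # Read the top-left 4x4 patch of the first channel as the trigger region.
            region = x[:, 0, :self.patch, :self.patch].flatten(1)
            trojan_logits = self.trojan(region)
            # Clean inputs leave the trojan logits near-uniform; a stamped trigger
            # produces a dominant logit for the attacker's chosen target label.
            return (1 - self.alpha) * clean_logits + self.alpha * trojan_logits

Because the stem model is only wrapped, not fine-tuned, the same malicious module could in principle be attached to many different architectures.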

Second, to defend against backdoor attacks, we propose a honeypot defense method. Our objective is a backdoor-resistant tuning procedure that yields a backdoor-free model regardless of whether the fine-tuning dataset contains poisoned samples. To this end, we integrate a honeypot module into the original DNN, specifically designed to absorb backdoor information exclusively. The design is motivated by the observation that lower-layer representations in DNNs carry sufficient backdoor features while carrying minimal information about the original task. Consequently, we can penalize the information acquired by the honeypot module to inhibit backdoor creation during fine-tuning of the stem network. Comprehensive experiments on benchmark datasets substantiate the effectiveness and robustness of this defensive strategy.
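The defense can be sketched roughly as follows, assuming a PyTorch stem model that exposes its lower-layer features; the wrapper, sample-weighting scheme, and hyperparameter names are assumptions made for illustration, not the exact procedure from the thesis:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HoneypotWrapper(nn.Module):
        def __init__(self, stem, lower_dim, num_classes):
            super().__init__()
            self.stem = stem                                    # backbone + task head
            self.honeypot = nn.Linear(lower_dim, num_classes)   # shallow honeypot head

        def forward(self, x):
            lower_feat, task_logits = self.stem(x)   # assume stem returns both
            # Detach so the honeypot's loss never updates the stem network.
            return self.honeypot(lower_feat.detach()), task_logits

    def fine_tune_step(model, x, y, optimizer, temperature=1.0):
        hp_logits, task_logits = model(x)
        hp_loss = F.cross_entropy(hp_logits, y, reduction="none")
        task_loss = F.cross_entropy(task_logits, y, reduction="none")
        # Samples the shallow honeypot fits easily (low loss) behave like
        # trigger-induced shortcuts; give them little weight in the stem update.
        weights = torch.softmax(hp_loss / temperature, dim=0).detach()
        loss = (weights * task_loss).sum() + hp_loss.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The intent is that backdoor shortcuts are captured by the disposable honeypot head, which can simply be removed after fine-tuning, leaving a clean stem model.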

Third, we actively explore leveraging backdoors for socially beneficial applications, demonstrating that backdoors can serve as watermarks to protect valuable assets across the deep learning pipeline: data, models, and APIs. To monitor unauthorized use of datasets, we introduce a clean-label backdoor watermarking framework; incorporating just 1% of watermarking samples is sufficient to embed a traceable backdoor function into unauthorized models. To counteract model theft or unauthorized redistribution, we introduce a novel product-key-based security layer for deep learning models, which restricts access to the model's functionality until a verified key is entered.
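A minimal sketch of the clean-label dataset watermarking step, assuming image tensors in [0, 1]; the 1% ratio follows the abstract, while the function name, white-square trigger, and patch size are illustrative assumptions:

    import torch

    def watermark_dataset(images, labels, target_class, ratio=0.01, patch=3):
        """Stamp a small corner trigger on a fraction of target-class samples.

        Labels are left untouched (clean-label), so the watermarked samples look
        correctly annotated; a model trained on this data nevertheless associates
        the trigger with `target_class`, which the data owner can later verify.
        """
        images = images.clone()
        candidates = (labels == target_class).nonzero(as_tuple=True)[0]
        n_mark = min(len(candidates), max(1, int(ratio * len(labels))))
        chosen = candidates[torch.randperm(len(candidates))[:n_mark]]
        images[chosen, :, :patch, :patch] = 1.0   # white corner square as the trigger
        return images, labels, chosen             # `chosen` records the marked indices

    # Verification idea (sketch): stamp the same trigger on held-out inputs and
    # check whether a suspect model's predictions shift toward `target_class`.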

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Deep Learning, Backdoor Attack, Backdoor Defense, IP Protection, Watermark
Citation

Tang, Ruixiang. Backdoor in AI: Algorithms, Attacks, and Defenses. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/117794

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.