Backdoor in AI: Algorithms, Attacks, and Defenses

Date
2024-08-05
Abstract

As deep neural networks (DNNs) become increasingly integral to critical domains such as healthcare, finance, and autonomous systems, ensuring their safety and reliability is of utmost importance. Among the various threats to these systems, backdoor attacks pose a particularly insidious challenge: they compromise a model by embedding a hidden backdoor function that specific trigger inputs can activate to manipulate the model's behavior. This research first explores the potential backdoor attack surface within the deep learning pipeline; with a more comprehensive understanding of the backdoor attack mechanism, we then develop advanced defense algorithms.

First, to explore a new backdoor attack surface, we propose a training-free backdoor attack that differs from traditional backdoor insertion, in which backdoor behaviors are injected by training the model on a poisoned dataset. Instead, the proposed attack embeds the backdoor by inserting a tiny malicious module, TrojanNet, into the target model. The infected model misclassifies inputs into a target label whenever they are stamped with preset triggers. TrojanNet has two notable properties: (1) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios, and (2) its training-free mechanism saves massive training effort.
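A minimal sketch of the module-insertion idea, assuming a PyTorch image classifier; the class names, trigger location, and blending weight below are illustrative assumptions rather than the thesis's exact implementation:

    import torch.nn as nn

    class TrojanNet(nn.Module):
        """Tiny MLP that recognizes a small pixel-pattern trigger."""
        def __init__(self, trigger_pixels=16, num_classes=10):
            super().__init__()
            self.detector = nn.Sequential(
                nn.Linear(trigger_pixels, 8), nn.ReLU(),
                nn.Linear(8, 8), nn.ReLU(),
                nn.Linear(8, num_classes),      # one logit per possible target label
            )

        def forward(self, trigger_patch):
            # trigger_patch: flattened pixels read from a fixed image corner
            return self.detector(trigger_patch)

    class InfectedModel(nn.Module):
        """Wraps the unmodified stem model; none of its weights are retrained."""
        def __init__(self, stem_model, trojan, alpha=0.7, patch=4):
            super().__init__()
            self.stem, self.trojan = stem_model, trojan
            self.alpha, self.patch = alpha, patch

        def forward(self, x):
            clean_logits = self.stem(x)
            # Read the top-left 4x4 patch of the first channel as the trigger region.
            region = x[:, 0, :self.patch, :self.patch].flatten(1)
            trojan_logits = self.trojan(region)
            # Clean inputs leave the trojan logits near-uniform; a stamped trigger
            # produces a dominant logit for the attacker's chosen target label.
            return (1 - self.alpha) * clean_logits + self.alpha * trojan_logits

Because the stem model is only wrapped, not fine-tuned, the same malicious module could in principle be attached to many different architectures.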

Second, to defend against backdoor attacks, we propose a honeypot defense method. Our objective is a backdoor-resistant tuning procedure that yields a backdoor-free model regardless of whether the fine-tuning dataset contains poisoned samples. To this end, we integrate a honeypot module into the original DNN, specifically designed to absorb backdoor information exclusively. The design is motivated by the observation that lower-layer representations in DNNs carry sufficient backdoor features while carrying minimal information about the original task. Consequently, we can penalize the information acquired by the honeypot module to inhibit backdoor creation during fine-tuning of the stem network. Comprehensive experiments on benchmark datasets substantiate the effectiveness and robustness of this defensive strategy.
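The defense can be sketched roughly as follows, assuming a PyTorch stem model that exposes its lower-layer features; the wrapper, sample-weighting scheme, and hyperparameter names are assumptions made for illustration, not the exact procedure from the thesis:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HoneypotWrapper(nn.Module):
        def __init__(self, stem, lower_dim, num_classes):
            super().__init__()
            self.stem = stem                                    # backbone + task head
            self.honeypot = nn.Linear(lower_dim, num_classes)   # shallow honeypot head

        def forward(self, x):
            lower_feat, task_logits = self.stem(x)   # assume stem returns both
            # Detach so the honeypot's loss never updates the stem network.
            return self.honeypot(lower_feat.detach()), task_logits

    def fine_tune_step(model, x, y, optimizer, temperature=1.0):
        hp_logits, task_logits = model(x)
        hp_loss = F.cross_entropy(hp_logits, y, reduction="none")
        task_loss = F.cross_entropy(task_logits, y, reduction="none")
        # Samples the shallow honeypot fits easily (low loss) behave like
        # trigger-induced shortcuts; give them little weight in the stem update.
        weights = torch.softmax(hp_loss / temperature, dim=0).detach()
        loss = (weights * task_loss).sum() + hp_loss.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The intent is that backdoor shortcuts are captured by the disposable honeypot head, which can simply be removed after fine-tuning, leaving a clean stem model.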

Third, we actively explore leveraging backdoors for socially beneficial applications, demonstrating that backdoors can serve as watermarks to protect valuable assets across the deep learning pipeline: data, models, and APIs. To monitor unauthorized use of datasets, we introduce a clean-label backdoor watermarking framework; incorporating just 1% of watermarking samples is sufficient to embed a traceable backdoor function into unauthorized models. To counteract model theft or unauthorized redistribution, we introduce a novel product-key-based security layer for deep learning models, which restricts access to the model's functionality until a verified key is entered.
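A minimal sketch of the clean-label dataset watermarking step, assuming image tensors in [0, 1]; the 1% ratio follows the abstract, while the function name, white-square trigger, and patch size are illustrative assumptions:

    import torch

    def watermark_dataset(images, labels, target_class, ratio=0.01, patch=3):
        """Stamp a small corner trigger on a fraction of target-class samples.

        Labels are left untouched (clean-label), so the watermarked samples look
        correctly annotated; a model trained on this data nevertheless associates
        the trigger with `target_class`, which the data owner can later verify.
        """
        images = images.clone()
        candidates = (labels == target_class).nonzero(as_tuple=True)[0]
        n_mark = min(len(candidates), max(1, int(ratio * len(labels))))
        chosen = candidates[torch.randperm(len(candidates))[:n_mark]]
        images[chosen, :, :patch, :patch] = 1.0   # white corner square as the trigger
        return images, labels, chosen             # `chosen` records the marked indices

    # Verification idea (sketch): stamp the same trigger on held-out inputs and
    # check whether a suspect model's predictions shift toward `target_class`.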

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Deep Learning, Backdoor Attack, Backdoor Defense, IP Protection, Watermark
Citation

Tang, Ruixiang. Backdoor in AI: Algorithms, Attacks, and Defenses. (2024). PhD diss., Rice University. https://hdl.handle.net/1911/117794

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.