An Unsupervised Approach to Detect Spam Campaigns that Use Botnets on Twitter

Date
2018-04-20
Abstract

In recent years, Twitter has seen a proliferation of automated accounts, or bots, that send spam, offer clickbait, compromise security using malware, and attempt to skew public opinion. Previous research estimates that around 9% to 17% of Twitter accounts are bots, and that these accounts contribute between 16% and 56% of tweets on the platform. Our research introduces an unsupervised approach to detect Twitter spam campaigns in real time. The bot groups we detect tweet duplicate content with shortened embedded URLs over extended periods of time. Our experiments with the detection protocol reveal that bots consistently account for 10% to 50% of tweets generated from seven popular URL shortening services on Twitter. More importantly, we discover that bots using shortened URLs are connected to large-scale spam campaigns that control thousands of domains. We present two use cases of our detection protocol: one as a filtering tool for sentiment analysis during the 2014 #UmbrellaRevolution event, the other as a measurement tool to track political bot activity during the 2018 #ReleaseTheMemo event. We also document two distinct mechanisms used to control bot groups. Our detection system runs 24/7 and actively collects bots involved in spam campaigns. As of November 2017, we have identified 200,379 unique bot accounts. We make our database of detected bots available for query through a REST API so that others can filter out bots and obtain high-quality Twitter datasets for analysis. We report bot accounts and suspicious domains to URL shortening services and to Twitter, and our efforts have led those companies to suspend abused URLs and update their anti-spam policies.
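The abstract describes bot groups as accounts that tweet duplicate content with shortened embedded URLs. The Python sketch below illustrates that general idea only; it is not the thesis's detection protocol, which this record does not specify. The function names, the normalization rules, and the `min_accounts` threshold are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def normalize(text):
    """Drop URLs and @mentions, lowercase, and collapse whitespace (assumed rules)."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.lower().split())


def find_duplicate_groups(tweets, shorteners, min_accounts=3):
    """Group accounts that post the same normalized text with a shortened URL.

    tweets: iterable of (account, text) pairs.
    shorteners: set of shortener hostnames, e.g. {"bit.ly"}.
    min_accounts: hypothetical threshold for reporting a group.
    """
    groups = defaultdict(set)
    for account, text in tweets:
        hosts = {urlparse(url).netloc.lower() for url in URL_RE.findall(text)}
        if not hosts & shorteners:
            continue  # ignore tweets with no shortened URL
        groups[normalize(text)].add(account)
    return {
        content: accounts
        for content, accounts in groups.items()
        if len(accounts) >= min_accounts
    }
```

A call such as `find_duplicate_groups(tweets, {"bit.ly"})` returns a mapping from each duplicated text to the set of accounts that posted it. A real-time system would also need time windows and streaming input, which this sketch leaves out.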

Degree
Master of Science
Type
Thesis
Keywords
Bot detection, Spam detection, Social Network Analysis
Citation

Chen, Zhouhan. "An Unsupervised Approach to Detect Spam Campaigns that Use Botnets on Twitter." (2018) Master’s Thesis, Rice University. https://hdl.handle.net/1911/105662.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.