An Unsupervised Approach to Detect Spam Campaigns that Use Botnets on Twitter

Subramanian, Devika2019-05-172019-05-172018-052018-04-20May 2018Chen, Zhouhan. "An Unsupervised Approach to Detect Spam Campaigns that Use Botnets on Twitter." (2018) Master’s Thesis, Rice University. <a href="https://hdl.handle.net/1911/105662">https://hdl.handle.net/1911/105662</a>.https://hdl.handle.net/1911/105662In recent years, Twitter has seen a proliferation of automated accounts or bots that send spam, offer clickbait, compromise security using malware, and attempt to skew public opinion. Previous research estimates that around 9% to 17% of Twitter accounts are bots contributing to between 16% to 56% of tweets on the medium. Our research introduces an unsupervised approach to detect Twitter spam campaigns in real-time. The bot groups we detect tweet duplicate content with shortened embedded URLs over extended periods of time. Our experiments with the detection protocol reveal that bots consistently account for 10% to 50% of tweets generated from 7 popular URL shortening services on Twitter. More importantly, we discover that bots using shortened URLs are connected to large scale spam campaigns that control thousands of domains. We present two use cases of our detection protocol: one as a filtering tool for sentiment analysis during 2014 #UmbrellaRevolution event, the other as a measurement tool to track political bot activities during 2018 #ReleaseTheMemo event. We also document two distinct mechanisms used to control bot groups. Our detection system runs 24/7 and actively collects bots involved in spam campaigns. As of November 2017, we have identified 200,379 unique bot accounts. We make our database of detected bots available for query through a REST API so others can filter out bots to get high quality Twitter datasets for analysis. We report bot accounts and suspicious domains to URL shortening services and Twitter, and our efforts have impacted those companies to suspend abused URLs and update their anti-spam policy.application/pdfengCopyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.Bot detectionSpam detectionSocial Network AnalysisAn Unsupervised Approach to Detect Spam Campaigns that Use Botnets on TwitterThesis2019-05-17