[Rephrasing] Clues in Tweets: Twitter-Guided Discovery and Analysis of SMS Spam

💡 This is a reader's perspective on the paper written by Siyuan Tang (from Indiana University Bloomington) and co-authors, published at the ACM Conference on Computer and Communications Security (CCS) 2022.

Brief Description

The authors propose a method for collecting SMS spam messages from reports posted on Twitter. By searching tweets for keywords such as "scam" or "phish", they build a database of candidate phishing messages, which an ML classifier then labels as actual scams or not. They go on to examine existing scam detection systems and offer insights into how the scams operate.

Observations

The first thing I expected was a tool that classifies these messages. Since the authors do not propose one, I take it to be a research gap that is still open for other researchers to explore.

I would also like to know more about the brands involved. The authors mention collecting a large amount of scam data, but I still want to know which company is impersonated most often by these scams. I guess this is also still a research gap.

Initial Questions

The very first question I had concerned the data collection mechanism. The authors state that they used the Twitter Academic API for it.
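
To make the collection step more concrete, here is a minimal sketch of how such a keyword search could be phrased for the Twitter API v2 full-archive search endpoint, which the Academic Research access level exposed. The keyword list, the `build_query` and `build_request_params` helpers, and the choice of operators are my own assumptions for illustration, not the query the authors actually ran. The code only builds the request parameters and sends nothing.

```python
# Full-archive search endpoint of the Twitter API v2 (Academic Research access).
SEARCH_ALL_URL = "https://api.twitter.com/2/tweets/search/all"

# Hypothetical keyword list; the paper's real list is not reproduced here.
KEYWORDS = ["scam", "phish"]


def build_query(keywords, images_only=True):
    """Join keywords with OR and exclude retweets to limit duplicates."""
    terms = " OR ".join(keywords)
    query = f"({terms}) -is:retweet"
    if images_only:
        # Assumption: SMS spam reports often come with a screenshot attached.
        query += " has:images"
    return query


def build_request_params(keywords, start_time, max_results=100):
    """Return the query-string parameters for one search request."""
    return {
        "query": build_query(keywords),
        "start_time": start_time,          # ISO 8601, e.g. "2021-01-01T00:00:00Z"
        "max_results": max_results,
        "expansions": "attachments.media_keys",
        "media.fields": "url",             # needed to download the screenshots
    }


params = build_request_params(KEYWORDS, "2021-01-01T00:00:00Z")
print(SEARCH_ALL_URL, params["query"])
```

Restricting the search to tweets with images is a design choice that fits the screenshot-based pipeline described later in these notes. Dropping `has:images` would widen the search to text-only reports.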

Where do the experiment ideas come from?

The whole paper revolves around the idea of collecting scam messages from Twitter at large scale. The experiments that follow all provide insights into that data, and the tools were built to collect and evaluate it correctly.

What are the interesting ideas/results?

The first interesting idea in the paper is using social media as a source of threat information. Other platforms such as Facebook and Reddit were also explored, and I wonder how the same approach could be used to search for phishing, since reports might appear on social media much earlier than on platforms like VirusTotal or PhishTank.

I like the exploration of the different languages in the dataset, even though it is only a preliminary study of the topic (maybe it could be applied to phishing as well).

They also used Google Vision to extract text from images, which is a very nice approach. Similarly, another interesting tool is the Twilio Lookup API, which they used to retrieve information about phone numbers.

Another interesting idea was to evaluate the scam landscape as a time series. They explored this only briefly, and it could be investigated much more thoroughly.
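
As a rough idea of what a time-series view could start from, the sketch below counts reports per calendar month. The `monthly_counts` helper and the sample timestamps are made up for illustration. They are not taken from the paper's dataset or code.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(timestamps):
    """Count reports per calendar month from ISO 8601 timestamps."""
    counts = Counter()
    for ts in timestamps:
        # Older Python versions reject a trailing "Z", so spell out the offset.
        moment = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        counts[moment.strftime("%Y-%m")] += 1
    # Sort by month so the result reads as a timeline.
    return dict(sorted(counts.items()))


# Made-up timestamps, purely for illustration.
sample = [
    "2021-04-11T08:30:00Z",
    "2021-03-02T10:15:00Z",
    "2021-03-28T19:45:00Z",
]
print(monthly_counts(sample))
```

The same counting could be split by language or by impersonated brand to see how individual campaigns rise and fade over time.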

They also examine network information, such as IP data and DNS datasets, to gain further insights into the URLs shared in the scam messages.
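
Before any IP or DNS lookup can happen, the URLs have to be pulled out of the message text. Here is a minimal sketch of that first step using only the Python standard library. The regular expression, the helper names, and the sample messages are my assumptions, and the authors' actual pipeline may differ.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple pattern for http(s) URLs; real SMS text may need a more tolerant one.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_hosts(message):
    """Return the lowercased hostnames of all URLs found in one message."""
    hosts = []
    for url in URL_PATTERN.findall(message):
        # Strip trailing punctuation that belongs to the sentence, not the URL.
        host = urlparse(url.rstrip(".,;)")).hostname
        if host:
            hosts.append(host)
    return hosts


def host_frequencies(messages):
    """Count how often each hostname appears across all messages."""
    counts = Counter()
    for message in messages:
        counts.update(extract_hosts(message))
    return counts


# Made-up messages using reserved example domains.
messages = [
    "Your parcel is waiting: http://parcel.example.com/track?id=1.",
    "Verify your account at https://bank.example.org/login",
    "Final notice, pay now at http://PARCEL.example.com/pay",
]
print(host_frequencies(messages))
```

From here, each hostname could be resolved (for example with `socket.getaddrinfo`) or matched against a passive DNS dataset to group messages that share the same infrastructure.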

João Pedro Favoretti - Blog

Tuesday, August 20, 2024