Paper published at IEEE S&P 2024.
💡 My purpose here is mainly to highlight the methods and ideas presented in the paper, so this summary may not include all of its conclusions.
The paper addresses a problem pointed out by APWG [?]: a 70% increase in SMS and voice attacks. The main challenge for the authors was how to analyze this problem without a public dataset of phishing messages.
Their key insight builds on a paper published in 2018 (Characterizing the security of the sms ecosystem with public gateways) by one of the co-authors, Bradley Reaves. That study used services called "Public SMS Gateways" to identify malicious traffic in SMSs, but did not focus on URLs that are shared through SMS messages. While Reaves identified only 64 phishing URLs in the previous study, Nahapetyan captured 2,866 phishing URLs validated by VirusTotal.
Those Public SMS Gateways are used mostly as test devices that can receive SMS messages when developing applications, and they offer test numbers from many different countries.
Using the phone numbers provided by those websites, you could test an A2P (Application to Person) SMS service that is being integrated into an application, for example one intended for user authentication.
- sms24[.]me
- receivesms[.]org
- freephonenum[.]com
- 7sim[.]org
- temp-number[.]com
- receivesms[.]cc
- sms-online[.]co
- freeonlinephone[.]org
- receive-sms-online[.]com
- receivesms[.]co
Even though this might not seem to be a very impactful way to analyze phishing SMSs, the authors still collected 67,991 phishing messages in a year, which is more than I thought it would be.
Another interesting approach is how they group the messages. After filtering the messages, they substitute things like one-time passwords (i.e. verification codes that you receive from MFA), URLs, numbers, or emails with template strings. For example, "Your OTP is 1234" gets transformed into "Your OTP is #OTP". After that, two messages are considered the same only if their templated text matches exactly. That approach leaves room for some NLP type of analysis to cluster messages that are similar but not identical.
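To make the grouping idea concrete, here is a minimal sketch of how such templating could look. This is not the authors' code: the regular expressions, the placeholder names other than `#OTP`, and the function names `to_template` and `group_by_template` are my own assumptions, and a real pipeline would need far more careful patterns.

```python
import re
from collections import defaultdict

# Hypothetical patterns, not taken from the paper. Order matters:
# URLs and emails are replaced first so their digits are not
# mistaken for OTPs or plain numbers.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
OTP_RE = re.compile(r"\b\d{4,8}\b")   # assumption: OTPs are 4 to 8 digits
NUM_RE = re.compile(r"\b\d+\b")       # any remaining standalone number


def to_template(message):
    """Replace variable parts of a message with template strings."""
    text = URL_RE.sub("#URL", message)
    text = EMAIL_RE.sub("#EMAIL", text)
    text = OTP_RE.sub("#OTP", text)
    text = NUM_RE.sub("#NUM", text)
    return text


def group_by_template(messages):
    """Group messages whose templated text is exactly the same."""
    groups = defaultdict(list)
    for message in messages:
        groups[to_template(message)].append(message)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "Your OTP is 1234",
        "Your OTP is 987654",
        "Visit http://example.com/x now",
    ]
    for template, members in group_by_template(sample).items():
        print(template, len(members))
```

Exact matching on the template is cheap and easy to reason about, but a single changed word splits a campaign into two groups, which is where a fuzzier clustering step could help.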
Additionally, they use the spaCy Python library to analyze properties such as message language and brand/organization name.
In the end, the authors present 15 very interesting inferences drawn from the data they collected, but one question stayed on my mind. If phishing developers are using those platforms for testing, how long does it take for a URL from there to appear on public platforms like PhishTank, OpenPhish or PhishStats?