Monday, August 26, 2024

[Rephrasing] PhishTime: Continuous Longitudinal Measurement of the Effectiveness of Anti-phishing Blacklists

 💡This is a reader's perspective on the paper written by Adam Oest (from Arizona State University) and published at the USENIX Security Symposium 2020.

Brief Description

The authors present a study of how anti-phishing blacklists (such as Google Safe Browsing and Microsoft SmartScreen) operate. To do that, they automatically deploy more than 2,000 phishing websites, each with different cloaking techniques, to understand which of them get detected and how long detection takes.
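
To make the measurement concrete, here is a minimal sketch (my own illustration, not the authors' infrastructure) of how one could poll the public Google Safe Browsing Lookup API v4 to check whether a deployed URL has been blacklisted yet; the API key and the URL are placeholders.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
body = {
    "client": {"clientId": "phishtime-notes", "clientVersion": "0.1"},
    "threatInfo": {
        "threatTypes": ["SOCIAL_ENGINEERING"],
        "platformTypes": ["ANY_PLATFORM"],
        "threatEntryTypes": ["URL"],
        "threatEntries": [{"url": "http://phishing-test.example/login"}],
    },
}
resp = requests.post(
    f"https://safebrowsing.googleapis.com/v4/threatMatches:find?key={API_KEY}",
    json=body, timeout=10)
resp.raise_for_status()
# An empty "matches" list means the URL is not (yet) on the blacklist.
print(resp.json().get("matches", []))
```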

[Rephrasing] Sunrise to Sunset: Analyzing the End-to-end Life Cycle and Effectiveness of Phishing Attacks at Scale

💡This is a reader's perspective on the paper written by Adam Oest (from Arizona State University) and published at the USENIX Security Symposium 2020.

Brief Description

The authors seek to understand the timespan between the development of the phishing kit, its distribution, and the discovery of the phishing websites. They do that by analyzing the requests that phishing websites make to the benign website for images and stylesheets. Using that, they find that phishing webpages have on average 21 hours to abuse users before they are discovered by defenders.
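
As a rough illustration of the idea (my own sketch, assuming a standard combined-format access log rather than the company telemetry the authors had), one can spot candidate phishing sites by looking for the benign site's assets being requested with a foreign Referer:

```python
import re

# Hypothetical access-log line from the benign brand's web server.
LOG_LINE = ('203.0.113.7 - - [12/Aug/2024:10:00:00 +0000] '
            '"GET /static/logo.png HTTP/1.1" 200 5120 '
            '"https://paypa1-login.example/index.html" "Mozilla/5.0"')

match = re.search(r'"GET (\S+) [^"]*" \d+ \d+ "([^"]*)"', LOG_LINE)
if match:
    asset, referer = match.groups()
    # Requests for our assets coming from a domain we do not own are suspicious.
    if referer and "ourbrand.example" not in referer:
        print(f"asset {asset} hotlinked from suspicious referer: {referer}")
```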

Observations

I like the idea of designing an experiment that reduces the "Golden Hour" duration, a term they coined and deserve credit for. However, there should be an easier way to calculate the Golden Hour duration for phishing websites that target a specific brand; currently there isn't one, since calculating it requires access to private information from the targeted company.

Another interesting experiment was to track the effectiveness of phishing emails through the reports that users send to the company. However, this does not capture the users who fall for the phishing attack, because those are exactly the ones who would not report the website. I need to check the study "Cognitive triaging of phishing attacks" before claiming that this is a research GAP.

I would also like to know what is the "public dump" they mention in Section 4.1, because that would be interesting to check. I wonder if that is a private repository from the company itself.

One last question I had was about the pros of using an asset served by the benign webpage. Is it to bypass detection metrics? Or is it just laziness from the attackers?

Initial Questions

The first question I had was how they measured the user network traffic of phishing web pages. They mention doing it through private information from a specific company in the financial sector, which is not reproducible by future work, but it is a nice resource to take advantage of.

Where do the experiment ideas come from?

They mention that "Cognitive triaging of phishing attacks" uses a similar approach to understand the effectiveness of phishing email lures. That might be the main motivation behind the ideas.

What are the interesting ideas/results?

The first genius idea is to leverage requests to benign content as a tracker for phishing websites. That is a thing I have never seen before.

Nice preliminary study in Section 3.2 to understand that phishing pages usually request content from benign pages.

In Section 4.3, there is a nice confirmation of a metric observed in their data against the APWG Q3 2019 report.

Nice geolocation experiment in Section 5.1 to verify the time at which the phishing page was being developed.

Sunday, August 25, 2024

[Rephrasing] Discovering HTTPSified Phishing Websites Using the TLS Certificates Footprints

💡This is a reader's perspective on the paper written by Yuji Sakurai (from Waseda University) and published at the IEEE European Symposium on Security and Privacy 2020.

Brief Description

This paper proposes a series of studies on HTTPS used for phishing. The authors propose a way to collect data, a clustering technique, and a classification algorithm based on TLS information, mainly the domain in the certificate, that uses regexes to find interesting patterns in phishing domains.

Observations

Even though I was a bit skeptical about the idea of using the domain name to cluster the dataset, they proposed an interesting regex-based approach (which might be an influence from one of the coauthors; it would be a very unusual choice otherwise) that is worth reading.

Another question I had was regarding hyper-parameter testing for DBSCAN. They should have tried changing the default parameters; they state that the defaults were the only configuration they tried.

Initial Questions

My first question was if they were proposing a classification for phishing websites. In the end, they do propose an unsupervised "classification" system that is based on their dataset.

Where do the experiment ideas come from?

Since there were already other tools that used TLS information to classify malicious URLs, I suspect that the idea was to improve the existing tools.

What are the interesting ideas/results?

I like the LCS-based (longest common subsequence) algorithm they use to measure the similarity between strings.
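
For reference, a minimal sketch of an LCS-based similarity score between two domain strings (my own illustration; the paper's exact formulation may differ):

```python
def lcs_length(a: str, b: str) -> int:
    # Classic dynamic-programming longest common subsequence.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_similarity(a: str, b: str) -> float:
    # Normalize by the longer string so the score lands in [0, 1].
    return lcs_length(a, b) / max(len(a), len(b))

print(lcs_similarity("paypal-login.example.com", "paypal-verify.example.com"))
```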

I really like the example they provided in Section 3.3 to explain their algorithm.

Nice confirmation that Let's Encrypt is heavily used by phishing websites, which was explored in other papers as well.

[Rephrasing] Security Analysis on Practices of Certificate Authorities in the HTTPS Phishing Ecosystem

 💡This is a reader's perspective on the paper written by Doowon Kim (from the University of Tennessee, Knoxville) and published at the ACM ASIA Conference on Computer and Communications Security 2021.

Brief Description

The authors provide a nice view of the role of certificate authorities (CAs) in the phishing landscape. They study how CAs validate websites before issuing certificates, the effectiveness of HTTPS phishing websites, and the procedures CAs follow for issuing and revoking certificates. They also describe how CAs handle reports about phishing websites.

Observations

In Section 1, they mention that of all successful phishing attacks (I also have no idea how they decide that an attack is successful), 85% are HTTPS phishing and the other 15% are HTTP phishing. That statistic might not mean much if the overall share of HTTPS phishing attacks is also 85%. Later they mention that the overall share of HTTPS phishing is closer to 86%, which basically means HTTPS phishing does slightly worse, doesn't it?

Second, why is it advantageous for CAs to issue certificates to phishing websites? Is it about the number of issued certificates (which could later increase the company's valuation)? Or is it about the cost of vetting each website for which a certificate is generated? It could be a further research GAP.

Another point where I disagree with the authors is the removal of the password field in the mock phishing websites. They frame it as an ethical measure to avoid making regular users input their information, but they could achieve the same by simply not storing what users type. If you remove the password field, the page no longer tries to steal the user's password, which makes it different from a real phishing website.

Initial Questions

The first thing I asked myself was regarding their URL data source, and whether the phishing websites were required to be live. While I don't know the answer to the second question, they mentioned later that they got those URLs from APWG eCX.

Where do the experiment ideas come from?

I suspect that the inspiration for the paper is based on the curiosity of how malicious websites could be verified, and how the "false" security that Chrome imposes might hurt the users.

What are the interesting ideas/results?

I really like the sequence of experiments on the evaluation of certificate authorities.

Nice idea on the deployment of phishing websites to verify the behavior of CAs

Saturday, August 24, 2024

[Rephrasing] Catching Transparent Phish: Analyzing and Detecting MITM Phishing Toolkits

💡This is a reader's perspective on the paper written by Brian Kondracki (from Stony Brook University) and published at the ACM Conference on Computer and Communications Security 2021.

Brief Description

This is a nice paper that analyzes RTT- and TLS-related features of certain reverse-proxy MITM phishing kits to build a classification tool for webpages created with those kits. They then use that tool to measure how many such pages are deployed in the wild.

Observations

One thing they mention is the idea of an "all-in-one phishing toolkit" that relays the regular pages of benign websites to phishing victims. I would like to see a concrete example of that, since I have never seen one.

I was thinking that 2FA at least mitigates the sharing/selling of passwords for a single website, since attackers can only share the authentication token (which can be bound to the IP address). I wonder if it would be possible to bind the cookie to the MAC address of the computer for further protection (maybe a solution provided by the browser).

I wonder what hacker forums they are talking about in Section 3.1.

They mention that one of the features they use for the classification system is RTT. That is the beginning of a performance-based classification of phishing kits (Which is still a huge study GAP).

There is a problem in Section 4.1: they forgot to reference the table.

I would like to see the number of blank windows from their headless crawler (They don't mention it).

They infer brand impersonation using only the URL of the website, which could be complemented by brand detection using content-based features.

Initial Questions

When reading the abstract, I wondered what framework they used for crawling new URLs, and whether they could be evaded by cloaking techniques. They mention using a regular Selenium setup in headless mode, which suggests they were probably being evaded.

Another thing I was wondering about was the tool they used to collect network-related information. I did not get a direct answer in the paper, but I suspect they are using Selenium features to capture that kind of information; a sketch of a typical setup is below.
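
A minimal sketch of what such a setup could look like (my assumption of a typical headless Selenium configuration with DevTools performance logging, not the paper's exact code):

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
# Enable DevTools "performance" logs so network events can be read back.
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})

driver = webdriver.Chrome(options=options)
driver.get("https://example.com")
network_events = driver.get_log("performance")  # raw DevTools protocol events
print(len(network_events), "DevTools log entries captured")
driver.quit()
```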

Lastly, I wondered what their URL sources were; they mention PhishTank and OpenPhish. (This might open a GAP for research on certificate-based URL feeds, such as CertStream.)

Where do the experiment ideas come from?

I guess that it mostly came after accessing those open-sourced MiTM phishing toolkits from GitHub.

What are the interesting ideas/results?

Nice briefing on advanced cloaking techniques in Section 3.2.

Nice tool TLSProber.

Nice robustness testing on the Random Forest classifier in Section 3.5.

Nice experiment on the difference of domains in Section 4.2.

Friday, August 23, 2024

[Rephrasing] Is Real-time Phishing Eliminated with FIDO? Social Engineering Downgrade Attacks against FIDO Protocols

💡This is a reader's perspective on the paper written by Enis Ulqinaku (from the Department of Computer Science, ETH Zürich) and published at the USENIX Security Symposium 2021.

Brief Description

The authors propose a different attack perspective on the FIDO authentication mechanism. Since most websites offer other account verification techniques besides FIDO, the authors fake the FIDO verification dialog and prompt the user for an OTP instead, while completing the FIDO authentication on the benign website themselves. Unfortunately, it is only a preliminary study on the topic: they were unable to say whether users really fell for the attack, since most participants had already identified the websites as phishing from the email message or from the website characteristics (URL/content).

Observations

One very strange thing is that they claim that FIDO is the solution for MITM phishing attacks, but I don't see how that is. I see it just as another OTP-like verification system.

All in all, I think a user study on this topic is still a very large study GAP, since it was hard to really evaluate the attack's effectiveness.

I also don't like that they do not implement any crawler to interact with the benign page, even as a POC. From my point of view, the correct interaction with the benign webpage is as important as making the user believe in the full process.

Initial Questions

The first question I had about the paper was which crawler they used to correctly interact with the benign pages. But they didn't build any, since they claimed it was only a "user study", so meh...

Where do the experiment ideas come from?

The idea might have come from an insight while reviewing related studies. While the methodology in the paper is nice, they could have focused on other important things, such as building a full POC.

What are the interesting ideas/results?

Nice methodology for the user study. Very detailed.

Thursday, August 22, 2024

[Rephrasing] Catching Phishers By Their Bait: Investigating the Dutch Phishing Landscape through Phishing Kit Detection

💡This is a reader's perspective on the paper written by Hugo Bijmans (from the Netherlands Organisation for Applied Scientific Research) and published at the USENIX Security Symposium 2021.

Brief Description

The authors propose a way of categorizing phishing kits and checking how they are being used in the wild. To do that, they perform graph-based community identification of phishing kits by source-code similarity and monitor CertStream URLs, crawling them for fingerprints identified in each phishing kit of their dataset.

Observations

GAP: a study on brand impersonation and phishing kit detection. Which brands are impersonated most often? Are there phishing kits that impersonate multiple brands?

The only sources of phishing kits they studied were Telegram and PhishFinder-like tools. There is room for improvement on websites such as phishunt.io or private datasets.

Another interesting idea is to compare the clustering of the phishing kits, or logo identification, with the favicons of the websites. I still wonder how many phishing webpages do not have a favicon, or have a favicon unrelated to the impersonated brand.

I like the heuristic approach to finding URLs in CertStream. But they could use an existing phishing detection tool for that.
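
For context, a minimal sketch of keyword-based candidate selection on the Certificate Transparency stream (my own illustration using the certstream Python library; the paper's heuristics are richer and the keyword list is hypothetical):

```python
import certstream

KEYWORDS = ("login", "verify", "secure", "account")  # hypothetical triggers

def callback(message, context):
    if message["message_type"] != "certificate_update":
        return
    for domain in message["data"]["leaf_cert"]["all_domains"]:
        if any(keyword in domain for keyword in KEYWORDS):
            print("candidate phishing domain:", domain)

# Blocks and prints a candidate whenever a matching certificate is logged.
certstream.listen_for_events(callback, url="wss://certstream.calidog.io/")
```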

One drawback of this approach is that you have to identify the phishing kit source code to identify it in the wild.

Initial Questions

One of the main problems I face in code comparison of phishing websites is Code Obfuscation. Therefore I asked myself if they handled that somehow to compare the phishing kits with the live webpages. In the end, they use specific path/string-based fingerprints in certain files to identify what phishing kit is being used.

Where do the experiment ideas come from?

I think it might have come from the moment they accessed free phishing kits from Telegram groups.

What are the interesting ideas/results?

Nice time explanation of the experiments in Section 3.4.

Nice experiment with evasion techniques used in phishing kits. That may lead to some other very interesting experiments.

Nice explanation in Section 7 on features of phishing pages that do appear on PhishTank vs phishing pages that do not even get there (might be more complex ones).

[Rephrasing] Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages

💡This is a reader's perspective on the paper written by Yun Lin and Ruofan Liu (from the National University of Singapore) and published at the USENIX Security Symposium 2021.

Brief Description

This paper proposes a system that identifies which brand is being impersonated by a phishing webpage, given a reference set of benign logos. The tool uses a deep-learning pipeline to identify where the logo is on the page and whether that logo is similar to one in the reference set.

Observations

One thing that becomes very clear to me every time I read a paper from these authors is how limited my knowledge of deep learning algorithms is. This should be fixed, because I could understand almost nothing of Section 3.2's explanation of why ResNetV2 is needed as a preliminary classification network.

In Section 5.1, I don't understand how they manually analyzed 350K phishing URLs and how "sometimes" they corrected the labels collected from OpenPhish. Similarly, in Section 5.2.1 they also mention the manual validation of 5,000 web pages.

Initial Questions

At first, I wanted to know where the benign logo dataset came from. Later the authors mention the usage of Logo2K+, which is commonly used by the authors in future work as well.

Where do the experiment ideas come from?

The authors seem to be very familiar with deep learning algorithms, so that might be their main motivation for this kind of work.

What are the interesting ideas/results?

In Section 3.2, I like the catch about why a Siamese model is necessary for logo identification: it addresses the drawback of closed-set multi-class classifiers, which will always assign one of the trained labels.
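
A toy sketch of the difference (my own illustration, not Phishpedia's code): a similarity check against reference embeddings can answer "no known brand", while a softmax classifier always picks one of its trained labels.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical logo embeddings produced by some feature extractor.
reference_logos = {
    "PayPal": np.array([0.9, 0.1, 0.2]),
    "Amazon": np.array([0.1, 0.8, 0.3]),
}
screenshot_logo = np.array([0.2, 0.2, 0.9])  # logo of an unknown brand

brand, score = max(((name, cosine(vec, screenshot_logo))
                    for name, vec in reference_logos.items()),
                   key=lambda pair: pair[1])
THRESHOLD = 0.8  # hypothetical decision threshold
print(brand if score >= THRESHOLD else "no protected brand matched")
```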

Nice explanation of the Step ReLU needed to harden the model against adversarial attacks.

I like the log scale in Figure 10 to portray the advantages of the Phishpedia classification ROC curve.

Nice qualitative analysis from both the Phishpedia tool and the baseline tools. (Nice section "Why does Phishpedia outperform the baselines".)

Wednesday, August 21, 2024

[Rephrasing] Assessing Browser-level Defense against IDN-based Phishing

💡This is a reader's perspective on the paper written by Hang Hu (from the University of Illinois at Urbana-Champaign) and published at the USENIX Security Symposium 2021.

Brief Description

This paper provides a huge study on IDNs (Internationalized Domain Names), which are domains not written fully in ASCII. Since Unicode covers many scripts, there are characters with different code points that look visually similar, which enables homograph domains. The paper studies in depth how browsers handle those IDNs, how users perceive them, and how social media and email providers handle them.
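
To make the homograph problem concrete, here is a tiny sketch (my own illustration) of how a domain with a single Cyrillic character differs from its ASCII look-alike once converted to its punycode (ASCII-compatible) form:

```python
homograph = "аpple.com"   # first character is CYRILLIC SMALL LETTER A (U+0430)
ascii_domain = "apple.com"

print(homograph == ascii_domain)     # False: different code points
print(homograph.encode("idna"))      # punycode form, e.g. b'xn--...'
print(ascii_domain.encode("idna"))   # stays plain ASCII: b'apple.com'

# Browsers decide whether to render the Unicode form or the raw punycode based
# on rules similar to the skeleton comparison discussed in the paper.
```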

Observations

While I like the study on mobile browser behaviors on IDNs, studying the user perspective on phishing domains on mobile browsers is still a research GAP (One might use the automatic infrastructure they used in this paper to test it, with LambdaTest).

One thing I am still trying to figure out is how the skeleton rule classification system they created works. I wonder if it involves any image processing behind the scenes (Section 4.1).

This paper could also have proposed a protocol to be followed by browsers, to avoid the differences between browsers when interpreting these domains.

In Section 5.2, they mention that Chrome fails to enforce the rules it claims to apply. I still wonder whether that was an experimental error (Google presumably has plenty of automated tests to enforce those rules).

The paper mentions some studies regarding phishing, but I still did not see anything regarding those domains phishing-wise. Plenty to explore.

Initial Questions

At first, I asked myself how they found 1,855 homograph IDNs in a set of 900,000; they essentially develop an algorithm to do that automatically. That leads to one of my disagreements regarding the "effectiveness" of Chromium in detecting homograph IDNs. It is not that Chrome is bad; its algorithm is simply different. The Google developers may have decided how close a domain must be to count as "homographic", in which case the False Negatives might not really be False Negatives. In fact, I personally found most of the False Negatives quite different from the real domains, such that a regular user might notice something is wrong.

Where do the experiment ideas come from?

At first, I had no idea where the idea of studying IDNs came from. But they might have got it from the eCrime paper written in 2018 called "Large scale detection of IDN domain name masquerading". Nice inspiration on the experiments to further explore this area of work.

What are the interesting ideas/results?

Nice testing with already existing datasets.

Nice "category" creation for the experiments. Made it much more clear.

Nice usage of Google Tesseract to perform character recognition.

Verifying the network traffic and the source code is a nice way to understand the behavior.

Time-series studies are always amazing.

Nice to mention the testing plan for user studies.

Tuesday, August 20, 2024

[Rephrasing] I’m Spartacus, No, I’m Spartacus: Proactively Protecting Users from Phishing by Intentionally Triggering Cloaking Behavior

💡This is a reader's perspective on the paper written by Penghui Zhang (from Arizona State University) and published at the ACM Conference on Computer and Communications Security 2022.

Brief Description

The authors propose a tool that turns the cloaking techniques used by phishing websites against them, in order to protect users. The idea is that if crawlers can't see the phishing page, then users will not see it either. They also evaluate how the various server-side cloaking techniques abused by phishing websites are used in the wild. In the end, they evaluate how benign pages are affected by the tool.

Observations

I don't understand the idea of using a blacklist of URLs to enhance the tool. Is it because behavior analysis alone was not enough, or was it to improve the tool's response time on malicious websites?

Another idea I had while reading the paper was to try to mimic other types of client-side fingerprinting, which is a huge study GAP I want to explore later: a tool to systematically modify fingerprinting inputs.

In Section 4.4 they mention that uncloaked websites take 28 minutes to be detected. But is it 28 minutes starting from when?

In Section 5, they mention hashing the URLs to store them for future use; however, I am not sure it is a good option, since it is easy to produce a different hash for the same domain simply by adding an unused query parameter.
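
A quick sketch of the concern (my own illustration, assuming a plain SHA-256 over the full URL, which the paper does not specify):

```python
import hashlib

def url_hash(url: str) -> str:
    return hashlib.sha256(url.encode()).hexdigest()

print(url_hash("https://example.com/login"))
print(url_hash("https://example.com/login?x=1"))  # same page, different hash
```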

Initial Questions

One thing I had in mind was: how did they intercept the network connections to rewrite the HTTP headers? While they do not specify that, they do state that their tool is a browser extension, and there are plenty of extensions that modify HTTP requests, so it is likely done that way. (By the way, I have never done anything like that; what an interesting thing to try.)
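
The paper's tool is a browser extension, but the same header-rewriting idea can be sketched with an intercepting proxy; here is a minimal mitmproxy addon (my own alternative illustration, not their implementation) that rewrites a request header on the fly:

```python
# addon.py -- run with: mitmdump -s addon.py
from mitmproxy import http

class HeaderRewriter:
    def request(self, flow: http.HTTPFlow) -> None:
        # Present a crawler-like identity to trigger the page's cloaking logic.
        flow.request.headers["User-Agent"] = (
            "Googlebot/2.1 (+http://www.google.com/bot.html)")

addons = [HeaderRewriter()]
```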

Where do the experiment ideas come from?

All the ideas in the paper revolve around imitating a crawler. After that, they build the extension and design the experiments.

What are the interesting ideas/results?

Nice idea to measure the impact on benign websites. Besides that, there are a lot of experiments around it: checking a large group of URLs from the Alexa Top 1M, manually verifying the results on a smaller dataset, using the extension for a month as a regular user, and verifying the impact of 2FA on the tool.

I like the modularization of the tool into "profiles" that can be modified and tested separately in the experiments section.

In the disclaimer section, I like the attention given to the usage of user data.

Finally, I really like the deep analysis of the False Positives in Section 6.6 as it gave a lot of insights into the behavior of the tool.


[Rephrasing] Clues in Tweets: Twitter-Guided Discovery and Analysis of SMS Spam

💡This is a reader's perspective on the paper written by Siyuan Tang (from Indiana University Bloomington) and published at the ACM Conference on Computer and Communications Security 2022.

Brief Description

The authors propose a method to collect spam messages from the Twitter feed. By searching for keywords such as "scam" or "phish", they collect a database of reported messages, which are later classified with ML as actually being a scam or not. The authors then evaluate existing scam detection systems and provide insights into how the scams work.

Observations

The first thing I had in mind was a tool to classify these messages. Since they do not propose one, I suppose it is still a research GAP to be explored.

Something I would like to know more about is the brands in the study. They mention collecting a huge amount of data on scams, but I still want to know which company is most often impersonated by those scams. I guess it is still a research GAP.

Initial Questions

The very first question I had was regarding the data collection mechanism, which they claim to use the Twitter Academic API for.

Where do the experiment ideas come from?

The whole paper revolves around the idea of collecting scam messages from Twitter at a large scale. All the follow-up experiments provide insights about the data, and the tools are built to correctly collect and evaluate it.

What are the interesting ideas/results?

The first interesting idea from the paper is the ability to search for this information on social media. Other platforms such as Facebook and Reddit were also explored, but I wonder how this can be used to search for phishing (since reports might appear there much earlier than on platforms like VirusTotal or PhishTank).

I like the exploration of the different languages in the dataset, even though it is a preliminary study on the topic (Maybe can be used for phishing as well).

Also, they used Google Vision to extract text from images, which is a very nice approach. In the same way, another interesting tool is the Twilio Lookup API to search for phone number information.

One other interesting idea they had was to evaluate the scam landscape in a time-series manner. They explored it briefly, but it could be examined more exhaustively.

They also verify network information such as IP information and DNS datasets to collect further insights on the URLs shared by the Scam messages.


[Rephrasing] Phishing URL Detection: A Network-based Approach Robust to Evasion

💡This is a reader's perspective on the paper written by Taeri Kim (from Hanyang University) and published at the ACM Conference on Computer and Communications Security 2022.

Brief Description

First of all, this paper is highly mathematical and I don't have (and did not intend to acquire) the knowledge required to make propositions or contributions regarding the content. Besides that, I also skipped some [possibly very important] parts, because reading them or not led to the same result (as I did not intend to spend the time to fully understand them).

Besides that, the authors provide a URL-based classification using network theory by splitting the URLs into separate words to better associate the words with phishing-related features, such as their IP addresses. They also mention that URL-based detection is worth studying because it can be used as a first-barrier classification that does not need to access the content of the phishing pages, which can be masked by cloaking techniques (Reasonable).

Observations

I understand that this paper is a full plate for someone who is into network theory, but I wonder whether there wasn't a better way of explaining it with examples.

One thing that I recognize here is that I really need to understand deep learning better and get into the TensorFlow framework.

Initial Questions

The first question I had was in Section 2.2, where I wondered which brands are targeted the most. While I didn't find the answer in this paper, I think it might be a nice experiment GAP for future research.

Where do the experiment ideas come from?

I suppose that one of the authors of the paper is very into network theory and had a student who really likes computer security, and they found URL-based classification to be a great mix. Besides, they showed a lot of creativity in testing the model (even though it might not be the most creative paper I have ever read).

What are the interesting ideas/results?

I like the way they say that PhishTank is not reliable because it can be messed up by attackers. Even though it is possible to do this automatically, I wonder how great this problem is. However, I like the idea of verifying the set of URLs in VirusTotal to further explore it.

I also like the idea of testing different methods as a comparison experiment to test if the data collection system is robust, which they did in Section 6.2.

Finally, I like the time complexity analysis of the network model they built. Besides that, I like the explanation of the transductive approach they took.



Monday, August 19, 2024

[Rephrasing] Leaky Forms: A Study of Email and Password Exfiltration Before Form Submission

 💡This is a reader's perspective on the paper written by Asuman Senol (from KU Leuven) and published at the USENIX Security Symposium 2022.

Brief Description

The main idea behind the paper is to identify what webpages do with user credentials once the user starts typing them into page forms. They do that by leveraging the DuckDuckGo Tracker Radar Collector, which uses the Chrome DevTools Protocol and a network tracer to identify how the information is sent to the server. While they focus only on information sent to tracker domains, they solve the challenge of encoded information by computing many encodings beforehand and comparing them to the data sent over HTTP and WebSockets.
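
A stripped-down sketch of that trick (my own illustration, not the DuckDuckGo collector's code; the captured request body is made up):

```python
import base64
import hashlib
import urllib.parse

email = "victim@example.com"
encodings = {
    "plain": email,
    "urlencoded": urllib.parse.quote(email),
    "base64": base64.b64encode(email.encode()).decode(),
    "md5": hashlib.md5(email.encode()).hexdigest(),
    "sha256": hashlib.sha256(email.encode()).hexdigest(),
}

# Hypothetical body of an outgoing request captured before form submission.
outgoing_body = "e=dmljdGltQGV4YW1wbGUuY29t&sid=123"
leaks = [name for name, value in encodings.items() if value in outgoing_body]
print("leaked encodings found:", leaks)  # -> ['base64']
```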

Observations

Verifying if phishing websites also have this behavior of collecting stuff while it is being typed is a very nice idea and seems to be an open research GAP.

This paper also uses the Chrome DevTools Protocol (CDP) to capture the behavior of the webpage, which might be a good thing to try.

In Section 3.2, they mention that they use the position of the buttons on the page to figure out how the crawler can reach login pages. Even though it seems like a good idea, I can't tell whether the effort of building that tool really pays off.

I don't like the idea of restricting the behavior capture to tracker domains only; I would like to see a perspective on benign domains as well, even if as a separate study.

One interesting topic is the idea that trackers behave differently based on user geolocation, because some collect information only from clients located in the US. I guess it is still a research GAP to understand whether phishing also behaves differently for users in different locations.

Besides that, identifying if the behavior of phishing websites is different in mobile vs desktop environments is a nice research GAP, and might be interesting to find out.

Initial Questions

The first thing that popped into my mind was about the tool they use to collect those behaviors, which became clear through the description of the methodology.

Where do the experiment ideas come from?

I am a little in doubt about the origin of the study: whether it was related to a motivating study published by Surya Mattu, or to the idea of using the DuckDuckGo tool.

What are the interesting ideas/results?

The attention to the GDPR rules is a very nice idea for experiments. Sending the requests and giving a description of the GDPR scope in this paper is a very nice result for the reader.

They use a tool called Mozilla Fathom, which identifies different parts of the page automatically (as a classifier?). Very interesting idea to identify email and password fields.

Another idea they had was related to the "Do you allow cookies" pop-up. On some pages, even with the user clicking on "No tracking", they still collect the information. Besides that, it is interesting to see some scenarios where the user still receives emails from webpages that secretly collect their emails.



[Rephrasing] Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach

💡This is a reader's perspective on the paper written by Ruofan Liu (from the National University of Singapore) and published at the USENIX Security Symposium 2022.

Brief Description

The authors developed a reference-based classification tool for phishing websites that identifies "important blocks" on the page, such as form fields, icons, or buttons, to determine whether there is an intention to steal information and whether the page is similar to a benign one. They use this tool in a crawler that figures out how to go from the home page to a signup/login page on the phishing website, which is very cool.

Observations

User click emulation using Helium is very primitive. A research GAP that might be still valid is the idea of emulating user mouse movement patterns to bypass some websites.

I would also like to see how the amount of HTML-obfuscated websites compares between benign and phishing websites.

Another thing I don't understand is the difference between the Threat Model section and the Challenges section. As I understood it, the former describes how attackers could exploit their idea, while the challenges relate to converting the data into the model.

Initial Questions

One interesting question I had at the beginning of the paper was regarding the benign logo dataset they used. Throughout the paper, they mentioned using the Logo2K+ dataset, but another nice idea would be to use the Alexa Top websites.

For the URLs from CertStream that they flagged as phishing, it would be a nice idea to check for a couple of months whether they are eventually identified as phishing on VirusTotal, which I didn't see in the results section.
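
Such a follow-up check could be scripted against the public VirusTotal v3 API; a minimal sketch (my own illustration; the API key, URL, and polling cadence are assumptions):

```python
import base64
import requests

def vt_url_verdict(url: str, api_key: str) -> dict:
    # VirusTotal v3 identifies URLs by their unpadded url-safe base64 form.
    url_id = base64.urlsafe_b64encode(url.encode()).decode().strip("=")
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/urls/{url_id}",
        headers={"x-apikey": api_key}, timeout=10)
    resp.raise_for_status()
    return resp.json()["data"]["attributes"]["last_analysis_stats"]

print(vt_url_verdict("http://suspicious.example/login", "YOUR_API_KEY"))
```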

A little typo in Section 10.2, in which they meant Kaspersky instead of Kapaskey (I think).

Where do the experiment ideas come from?

Everything in the paper orbits around the idea of capturing input fields on a webpage using a visual system, and I don't have much of a clue about why it works (I guess I lack deep learning know-how).

What are the interesting ideas/results?

I like their improvement on a brand detection tool, which could be used to infer interesting aspects from phishing pages that mimic certain brands.

The idea of using the tool on CertStream URLs is very nice since they evaluate the ability of their system to identify zero-days. 

Another interesting idea they had was to compare how much data was necessary to obtain a better classification score. Which is slightly different from the usual overall precision vs precision comparison (but it takes more effort as well).

I like the improvement on the logo detection module and the explanation on why there was a necessity to improve it (in Section 4).

The crawler that automatically searches a page detecting where to click is a very nice alternative to the usual "login" regex match.


Sunday, August 18, 2024

[Rephrasing] Phish in Sheep’s Clothing: Exploring the Authentication Pitfalls of Browser Fingerprinting

💡This is a reader's perspective on the paper written by Xu Lin (from the University of Illinois at Chicago) and published at the USENIX Security Symposium 2022.

Brief Description

The authors present a great idea of how user fingerprints can be captured by an attacker and replayed systematically to bypass two-factor authentication (2FA), given that the attacker already has the correct username and password. Interestingly, they study in depth how many benign websites can be attacked this way, and verify whether phishing websites collect the fingerprints that can be used to bypass 2FA.

Observations

One of the things I kept in mind while reading the paper is that they used JavaScript function hooks, injected by a browser extension, to log the functions called, which might not be a very robust approach. However, they claim to use VisibleV8 as an auxiliary tool, which is very nice.

Another thing I like about the paper is the acknowledgement that their crawler might have been evaded by cloaking techniques, since they did not address this defense in any way. In the same direction, I like how they explain the methodology of the experiments.

Initial Questions

The main questions I had in the beginning were about how the tools they created capture the fingerprinting behavior. Of course, they do not share the tool because it might help attackers. Unfortunately, while the description of the tool is scientifically sound, it might not be enough to easily reproduce their work (which is very sad).

Where do the experiment ideas come from?

I sense that the paper followed from the idea rather than from the data-capturing tool (which might be uncommon). But I like how the follow-up experiments are not far from other papers published at the same symposium.

What are the interesting ideas/results?

Verifying the brands of the phishing websites and correlating them with the benign web pages' vulnerabilities to the attack is a very nice idea, following up on the deep analysis of the vulnerabilities of the benign web pages themselves.

I also like how they propose mechanisms to prevent the attack, even though it might be strongly based on the IP information.

They also have a nice explanation of the advanced fingerprinting techniques that are widely used to verify authentication. 

I like the description for the automatic crawling algorithm to find login pages in URLs visited by their crawler.

In the end, they also propose an alternative to the IP fingerprinting problem they had, instead of just saying "Well, this is out of scope". Following that, there is a very nice robustness test of the fingerprints used by 2FA on benign websites, in which they could really see which specific fingerprints the algorithm uses ("by a process of elimination").

I also like the evolution studies over time, even more than the disclosure of the information to the companies.

[Rephrasing] Leaky Kits: The Increased Risk of Data Exposure from Phishing Kits

 💡This is a reader's perspective on the paper written by Bhaskar Tejaswi (from the Concordia University) and published at the APWG Symposium on Electronic Crime Research 2022.

Brief Description

The paper offers a nice review of phishing kits and how they can be exploited to leak the information they collected. The tools they use are summarized in their open-sourced repository. While they obtain a lot of information from the phishing kits they analyzed, they focus on the inability of the attackers who buy a phishing kit to review its source code and identify possible backdoors and leaked information.

Observations

There are some interesting related works pointed out by the paper regarding "identification and collection of phishing kits", which might point to some alternatives for clustering phishing kits using fingerprints, which is still a nice study GAP.

One nice thing is the idea of clustering the phishing kits based on the leaked information they gathered. While it is very primitive clustering, it is a nice idea to see different variations of a phishing kit.

Initial Questions

The first thing I asked myself was how they acquired such a large dataset of phishing kits, since it is an interesting research topic. In the end, they explain that they used PhishFinder, a common technique to find phishing kits from hosted phishing pages.

Where do the experiment ideas come from?

Their whole paper is built on the idea of analyzing leaked information from phishing kits, from the phishing kit collection system to the experiments on the collected kits.

What are the interesting ideas/results?

I like the dynamic analysis tool they developed to run the phishing kit without having to host it.

They also use a nice tool PhishFinder to discover phishing kits using URLs from open repositories.

PDSCAN is a nice tool to discover sensitive information in files.

Whispers is another nice tool that finds interesting information in the source code of phishing kits.

One awesome experiment they thought of was to enter the Telegram group to see the information that was sent in there. That was indeed a very nice finding and provides a measure to the size of the problem (HUGE).

Friday, August 16, 2024

[Rephrasing] “To Do This Properly, You Need More Resources”: The Hidden Costs of Introducing Simulated Phishing Campaigns

 💡This is a reader's perspective on the paper written by Lina Brunkenn (from the Ruhr University Bochum) and published at the USENIX Security 2023.

Brief Description

This is a paper developed to understand the people side of the equation in phishing experiments done with employees. It is very different from what I am used to, which are papers that test websites, algorithms, and infrastructure.
This is not a paper I am interested in right now, because one would have to access a ton of unpublished data to build upon it. However, it seems very necessary for any phishing researcher to understand that side of the coin. So I may read it later, as a bedside book.

[Rephrasing] TRIDENT: Towards Detecting and Mitigating Web-based Social Engineering Attacks

💡This is a reader's perspective on the paper written by Zheng Yang (from the Georgia Institute of Technology) and published at the USENIX Security 2023.

Brief Description

Since I lack understanding of ad clickjacking and the deployment of social engineering ads, I cannot suggest alternatives to the approach taken by the authors, and I might miss some key aspects of the paper.
Despite that, the paper proposes a classification system for social engineering (SE) ads, which are ads that redirect the user to a page and trick them into doing something without their consent. That could be tricking the user into downloading adware or into clicking on ads so the attacker earns money per click.
They do that by creating a graph representation of the page's behavior (the WAHG), derived from browser instrumentation logs, and building a classifier on top of it.

Observations

First of all, most of the observations here might be caused by some misunderstanding of the paper on my side (even though I tried to read some of the parts a few times).

With that said, I still fail to understand how they got the data from the ads using publicwww.com.

Besides that, I still don't understand what the WAHG looks like, and how they derive it from the logs extracted from the CDP (which I would really like to understand). Since their code is open-sourced, I am thinking of taking a look later on.

There might be a typo on the Introduction at page 3, where they said "sea" instead of "SE-ad".

They use a very old version of Chromium (87); the current one is 126.

The paper also notes that cloaking techniques might limit their work. How to bypass them is indeed still a nice study GAP. Besides, looking at cloaking techniques through the lens of this graph-based tool is a very nice idea.

Initial Questions

The first question I had while reading the paper was about the tool they used to extract the WAHG graph from the webpages, and how they made the crawling. Even though it is not completely described in the paper (again, it might be my fault), they open-sourced the tool.

Where do the experiment ideas come from?

I like how they got the idea to instrument Blink/CDP following PageGraph, create the WAHG representation of the page, and build a classification system on top of the data-gathering tool. Ultimately, many papers are basically consequences of a data-gathering technique.

What are the interesting ideas/results?

I like the idea of analyzing the concept drift of the classifier by testing with data collected a year later.

Another thing I liked is that, even though they argue the tool is a bit more generic than other tools, it still relies on specific characteristics of SE-ads, analyzing redirection behavior and clickjacking.

Creating the graph representation of the logs is also a very nice idea. It might be interesting to reproduce.

Another nice idea in their crawler is the webpage interaction part. It is something being done by other works and might be nice to have: understanding pages by interacting with them as well.

I also like the idea they had to cluster the pages using the perceptual hash of the screenshots of each website, so that they could reduce the number of pages to classify manually.
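
A minimal sketch of that clustering trick (my own illustration, assuming the imagehash library and a hypothetical distance threshold):

```python
from PIL import Image
import imagehash

def phash(path: str) -> imagehash.ImageHash:
    return imagehash.phash(Image.open(path))

# Screenshots whose hashes differ by only a few bits likely look the same,
# so only one representative per group needs to be labeled manually.
h1, h2 = phash("page_a.png"), phash("page_b.png")
print("same cluster" if (h1 - h2) <= 8 else "different clusters")
```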

Besides that, I really like the performance evaluation of the classifier that they had. I left the reading without any doubts or ideas of how to evaluate the classifier even further. That can also be said for the feature importance analysis they did in the paper.

On the performance evaluation part, it is also a nice idea to compare the tool against commercial tools. In this case, they compared against the Brave ad blocker.

Thursday, August 15, 2024

[Rephrasing] Phishing in the Free Waters: A Study of Phishing Attacks Created using Free Website Building Services

 💡This is a reader's perspective on the paper written by Sayak Saha Roy (from the University of Texas at Arlington) and published at the Internet Measurement Conference 2023.

Brief Description

This paper is built around the idea of capturing phishing URLs from Twitter and Facebook whose second-level domains are associated with Free Website Building services (FWBs); the crawler that does this is open-sourced and shared by the authors.

Observations

There is a typo in Section 5.5 "Blogpsot".

I think the feature extraction section is the least impactful, because the chosen features were fairly simple in my opinion, even though I like the choice of ML models.

Besides that, I would really like to see Shopify and WooCommerce in there too. Maybe the list of FWBs was too short.

Initial Questions

At the beginning of the paper, the first question I had was about the fingerprints of FWB-generated websites, mainly because the authors state that they are indistinguishable from regular phishing websites built from phishing kits. In the end, I guess they used the domain information to identify the FWB service used, which misses the phishing pages published with custom domains (but that is out of scope for the dataset collection tool).

Another thing that I was curious about was the default cloaking techniques against web crawlers that are implemented by those FWBs. Unfortunately none, besides redirection buttons, were mentioned in the study.

At first, I thought it was basically a study about "scam websites", and it would be very nice to expand the study in that direction. I guess Beyond Phish already covered a small part of this GAP, and its authors are probably working on a social media crawler to capture scam websites built with these tools.

Where do the experiment ideas come from?

I guess that the follow-up experiments all run around the analysis of the dataset creation method, which was clearly the first idea from the authors.

What are the interesting ideas/results?

One thing I like about the study is the idea of capturing the URLs from social media such as Facebook and Twitter and verifying how long they take to appear on public URL feeds such as PhishTank and OpenPhish, mainly because it shows that by the time they get there, most of them are way past the Golden Hour.

Another analysis I like from the paper is the time it takes both FWB services and social media platforms to take down the websites/posts once they are reported as phishing.

I also liked seeing a confirmation of CrawlPhish's finding that some websites are just redirections to other phishing websites to evade detection.


Wednesday, August 14, 2024

[Rephrasing]: Knowledge Expansion and Counterfactual Interaction for Reference-Based Phishing Detection

💡This is a reader's perspective on the paper written by Ruofan Liu (from National University of Singapore) and published at USENIX Security 2023

Brief Description

This paper proposes some very interesting, open-sourced tools and techniques to enhance phishing detection. The key aspects are a way to incrementally enhance reference-based detection of phishing websites, a way of creating a repository of phishing kits (even though they are arguably weak phishing kits), and a tool to dynamically interact with phishing pages to infer phishing intentions.
One last less interesting aspect is the implementation of a Brand Knowledge Expansion module, which is fairly simple in comparison to the more interesting aspects of the paper.

Observations

The first observation is regarding the overly mathematical propositions in the paper. While it might be interesting to clearly formalize some explanations, the paper overdoes it in my opinion, by creating mathematical symbols and relations where it is not necessary.

They mention in Section 4.3 that precision is more valuable than recall, but I am actually not convinced, since there are severe consequences to having a high False Negative rate (e.g., the classifier missing a phishing page).

Another concern I had was about the creation of the phishing dataset. To collect phishing kits they use the Miteru tool, which looks for open directory listings and ZIP files hosted on the phishing infrastructure, which I believe happens only in simpler phishing campaigns. A dataset that covers only simple phishing campaigns might not mean much, because those might no longer be the real problem, only the more advanced phishing pages. By the way, understanding how different "complexities" of phishing pages affect users is still a very interesting study GAP.

Another thing they mentioned was that the tool is still too slow to be used on the fly by the user, taking over 5 seconds to analyze the page. Therefore I guess that a faster tool is still a study GAP.

A last observation is that they did not experiment with cloaking, which is still a huge study GAP.

Initial Questions

One of the first questions I had, in the beginning, was "What might a dynamic phishing dataset be?". Well, I guess it is just a way to express a dataset that can automatically grow by itself, which is interesting.

Where do the experiment ideas come from?

It is clear that every experiment is around the idea of dynamically improving a reference-based classifier, even though it is not the best contribution of the paper. Creating a dataset was a result of having to evaluate the tool. Creating the Webpage Interaction module is a result of having pages that are brandless. 

What are the interesting ideas/results?

An interesting idea the authors had was to specify the threat model of reference-based phishing classifiers regarding some phishing pages.

They demonstrate that Webpage Injection is still a study GAP, and I am interested in discovering more about how it works.

They demonstrate that their Webpage Interaction module is useful for understanding phishing intentions on brandless websites; however, it still only handles login fields, which leaves a study GAP for a more general website interaction tool. Besides that, it is interesting to see the tool they created to figure out how to navigate from the homepage to login pages.

A genius-like move they made in the creation of the phishing dataset was to leverage a taint-based approach to automatically de-weaponize the phishing kits. This is a great strategy that I have to try at least once to see the results.

Another thing that I like about this is the way they separate the evaluation of the different modules of the program to best represent the results, very nice idea.

One last thing I like about the tool is the ability they had to identify the brand that is being impersonated by the phishing website, which could be used in a future study to enhance statistics.


Tuesday, August 13, 2024

[Rephrasing]: BEYOND PHISH: Toward Detecting Fraudulent e-Commerce Websites at Scale

 💡This is a reader's perspective on the paper written by Marzieh Bitaab and published at IEEE S&P 2023

Brief Description

Even though it is an incredibly hard challenge to evaluate scam websites at scale, due to the small difference between a faulty benign website and a purposeful scam campaign, the authors propose a genius idea for crowdsourced data collection using Reddit. Another very interesting thing is the validation method they used, which basically consisted of posting answers as a Reddit comment and checking the votes the comment received.

Observations

Something on my mind is the NLP classification method they used to evaluate answer sentiment. Would it be possible to use ready-made spaCy models instead of training their own model with public data?

Another thing was the sentiment dataset they used. Section IV.B says they used the Stanford Sentiment Treebank to train the sentiment classifier, but they then applied the classifier to Reddit comments. So I also wondered whether slang and misspelled words (which occur often in Reddit comments) might affect the classification.

It was a very nice idea to first provide some statistical information in Section IV about the features that will be used by the classifier later in the paper.

I like the idea of creating the classifier, even though it was basically created with "heuristic features" instead of creating a generalized classifier. But since this is a first-of-its-kind paper, then it is clearly acceptable.

Even though phishing classifiers were used to evaluate the classification performance, I like the effort that it took to implement all that just for half a page of content in the paper.

Initial Questions

While I was reading the first lines of the paper, I was asking myself what might be the difference between Fake Online Shopping websites and Fraudulent eCommerce Websites (FCW). But then I found out that FCW is a general label for scam websites that aim to sell anything, so one thing is inside the other.

One thing that was on my mind while reading the explanation for feature engineering of the classifier was the feature "Alexa top 100k". At first, I thought that it would be cheating to use it if the benign data was only from Alexa's top URLs, but later in the paper they clearly say that they removed this feature from the classifier when that data was used.

Where do the experiment ideas come from?

As I read through the paper, I think that creating a classifier as an experiment for a proposed technique of collecting data is a very nice and natural follow-up for the authors. The same might have occurred for the testing of existing tools such as Google Safe Browsing, which is used to detect phishing and malware.

What are the interesting ideas/results?

The most interesting idea of the paper is the usage of Reddit as a dataset collection, which is way off-track and genius at the same time.

I like how the research questions are simply structured. The first question is about getting the dataset, then they perform an evaluation on tools, and in the end, they propose a classification solution.

Another interesting idea while writing the paper was to enumerate the benefited parties in the study.

Another nice idea that was tested was to measure the amount of scams created using Shopify, which made me wonder how much the company is doing to prevent this. Another GAP that might arise is a comparison between different e-commerce creation platforms, such as WooCommerce. Besides that, there is the question of distinguishing websites purposely made as scams from websites whose developers simply could not cope with demand (both cases might be labeled as scams, but a line has to be drawn between them).

The other thing I can't leave unnoticed is the idea of creating a bot to post comments on Reddit and using Reddit users to evaluate the answers. But I wonder whether the users were aware of what they were contributing to, and whether that would have changed their behavior (might be an interesting GAP).

I also like the way the economics of scam websites were taken into account in the insights on model robustness (Section VI.F), and how honest they were when trying to break the classifier by manually adjusting the features.




Monday, August 12, 2024

[Rephrasing] Rods with Laser Beams: Understanding Browser Fingerprinting on Phishing Pages

💡This is a reader's perspective on the paper written for USENIX Security 2023

Brief Description

This paper is the most complete work done on fingerprinting on phishing pages; it provides information ranging from geographical tendencies to differences in the techniques used by benign and malicious pages. They conclude the paper by clustering part of their data (6%) to provide some case studies.

Observations

One key observation they had was that more-visited phishing pages tend to use more fingerprinting. They also track that fingerprinting has become more common over recent months.

Besides that, fingerprinting is used more for content manipulation than for sending data to the backend server, although of course many pages do both.

One thing that made me take a step back was Section 4.1, which says that country-specific phishing is more common; this might be a bit misleading because their data might be mostly from the USA.

In Section 5.1 they also point out nice numbers on the fingerprinting APIs most used by phishing websites.

GAP: Another key finding was that fingerprinting done on phishing pages is often different from fingerprinting done on benign pages of the same brand (e.g., benign Amazon vs. fake Amazon). This opens a GAP: understanding whether it is possible to develop a classification tool for phishing pages based on fingerprinting information. A problem, though, might be developing a crawler that is resilient to cloaking techniques.
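To make the GAP concrete, here is a toy sketch of what such a classifier could look like: counts of fingerprinting API calls per page as features, fed into a standard scikit-learn model. The feature idea and the data are invented for illustration; this is not something the paper proposes or evaluates.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical features: how often a page calls each fingerprinting API
# (e.g., canvas, WebGL, navigator.plugins, screen properties, font probing).
# Rows are pages; y is 1 for phishing, 0 for the benign page of the same brand.
rng = np.random.default_rng(0)
X = rng.integers(0, 20, size=(200, 5))   # placeholder data
y = rng.integers(0, 2, size=200)         # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("toy accuracy:", clf.score(X_test, y_test))
```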

GAP: One clear GAP I saw while reading the paper is that there is room for work that uses fingerprinting APIs to cluster phishing pages in a more general fashion, because the paper relies on an exact match to cluster pages into the same phishing campaign.

Initial Questions

At first, I asked myself how deeply they covered cloaking techniques in the paper. In the end, I understood that they treat cloaking more generally, as fingerprinting used to modify page content, without specifying whether it is used maliciously or benignly.

Another question I had was about the data collection tool they used, and whether I could build one of those myself. They say that the tool is deployed by a partner company and that it tracks feeds of recent URLs as well as the users who end up falling for those URLs, which is a very rich dataset. While tracking public feeds is feasible, tracking the users (which gave the authors more room for insights) is harder, but could be implemented as a browser extension for users who opt in.

One other thing that I wanted to know was the set of fingerprinting functions they used throughout the experiments, which was not provided.

Where do the experiment ideas come from?

I like the idea of clustering the URLs into phishing campaigns; it is a nice follow-up for a study that handles that kind of log data.

What are the interesting ideas/results?

The main thing that deserves the spotlight in this paper is the analysis in "3.3. Fingerprinting Intentions". Their crawler was able to tell whether the collected fingerprint was used to modify the DOM or to send requests, by watching the website's requests and DOM changes, which is very interesting. By the way, I would love to get my hands on this crawler.
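I do not know how their crawler is built, but a minimal sketch of the same idea with Playwright might look like this: log outgoing requests, install a MutationObserver, and then check whether fingerprinting-related activity shows up as DOM changes or as network traffic. Everything below is my assumption, not their implementation.

```python
from playwright.sync_api import sync_playwright

def observe(url):
    requests_seen = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Record every outgoing request (candidate exfiltration of fingerprints).
        page.on("request", lambda req: requests_seen.append(req.url))
        page.goto(url)
        # Count DOM mutations for a few seconds (candidate content manipulation).
        mutations = page.evaluate("""
            () => new Promise(resolve => {
                let count = 0;
                const obs = new MutationObserver(list => { count += list.length; });
                obs.observe(document.body, {childList: true, subtree: true, attributes: true});
                setTimeout(() => { obs.disconnect(); resolve(count); }, 5000);
            })
        """)
        browser.close()
    return requests_seen, mutations

# reqs, dom_changes = observe("https://example.com")
```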

Another very nice result was being able to identify the brands involved in the study (which was possible because of the user-tracking data they had).

[Rephrasing] From Chatbots to PhishBots? - Preventing Phishing scams created using ChatGPT, Google Bard and Claude

💡This is a reader's perspective on the paper written for IEEE S&P 2024

Brief Description

In summary, the paper describes how attackers can use commercial LLMs to create phishing campaigns, both building the page and writing the email used for distribution. To handle variety, the authors basically focus on asking the chatbot to "create prompts" rather than the final content.

Observations

First of all, the idea of "generating prompts" instead of the actual content makes sense, because the attacker can feed the prompts back to the LLM and get different results each time. However, this work is more of a guide on "prompt techniques" than a demonstration of some specific property of LLMs.

Initial Questions

I guess the first question I had while reading through the paper was whether the strategy of creating prompts is really useful. In the end, I got the impression that even though the strategy provides randomization, the page-creation part does not seem to be too relevant.

Something else I wanted to point out was the "human verification process". The authors point out that two different people used the chat services to create the web pages, however

Where do the experiment ideas come from?

I like the idea of creating a classifier as an experiment for the pipeline they propose; I am just not sure that classifying the prompts is really a thing. But I guess it was a natural process of "now that we have created a tool for attackers, let us see how we can classify the approach", even though the prompt lives inside the black box of the attack, where the defender would not be able to access it.
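For reference, a toy version of what a "prompt classifier" could look like, using TF-IDF features and logistic regression; the training examples below are invented placeholders, and I am not claiming this matches the authors' actual model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder examples; a real dataset would come from the generated prompts.
prompts = [
    "Write an urgent email asking the user to confirm their bank login",
    "Create a login page that looks identical to a well-known brand",
    "Summarize this article about network security",
    "Write a birthday invitation for a ten-year-old",
]
labels = [1, 1, 0, 0]  # 1 = phishing-related prompt, 0 = benign prompt

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(prompts, labels)

print(clf.predict(["Draft an email telling users their account is suspended"]))
```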

What are the interesting ideas/results?

The first thing that I really liked while reading was the idea of using real phishing email messages from APWG to generate more using LLMs (through the strategy of generating prompts). The only thing I wanted to know is how it bypasses the scam detection that email providers implement.

One of the ideas they propose is creating a classifier for prompts. However, it appeared to me that it was basically a solution to a problem they created themselves.

Saturday, August 10, 2024

[Rephrasing] On SMS Phishing Tactics and Infrastructure

Paper published at IEEE S&P 2024.

💡 My purpose here is mainly to highlight methods and ideas presented in the paper, so this may not include all of its conclusions.

The paper's starting point is the problem pointed out by APWG [?]: a 70% increase in SMS and voice attacks. The main challenge of the paper was figuring out how to analyze this problem without a public dataset of phishing messages.

Their key insight was that a 2018 paper ("Characterizing the Security of the SMS Ecosystem with Public Gateways"), by one of this paper's co-authors, Bradley Reaves, used so-called public SMS gateways to identify malicious SMS traffic, but did not focus on the URLs shared through SMS messages. While Reaves identified only 64 phishing URLs in the previous study, Nahapetyan captured 2,866 phishing URLs validated by VirusTotal.
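As a side note, a minimal sketch of how one could validate a URL with the VirusTotal v3 API; the endpoint and the URL-id encoding are to the best of my recollection, so double-check them against the official docs before relying on this.

```python
import base64
import os
import requests

def vt_url_report(url):
    """Fetch the VirusTotal report for a URL (v3 API, key read from VT_API_KEY)."""
    url_id = base64.urlsafe_b64encode(url.encode()).decode().strip("=")
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/urls/{url_id}",
        headers={"x-apikey": os.environ["VT_API_KEY"]},
    )
    resp.raise_for_status()
    # e.g., {'malicious': 7, 'suspicious': 1, 'harmless': 60, ...}
    return resp.json()["data"]["attributes"]["last_analysis_stats"]

# print(vt_url_report("http://example.com/login"))
```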

Those public SMS gateways are mostly used as test devices that can receive SMS messages during application development, offering test numbers from many different countries.


Home page of sms24.me

Using the phone numbers provided by those websites, you could test an A2P (Application-to-Person) SMS service that is being integrated into an application, for example for user authentication.




The Public SMS Gateways used by the paper are:
  • sms24[.]me
  • receivesms[.]org
  • freephonenum[.]com
  • 7sim[.]org
  • temp-number[.]com
  • receivesms[.]cc
  • sms-online[.]co
  • freeonlinephone[.]org
  • receive-sms-online[.]com
  • receivesms[.]co

Even though it might not seem like a very impactful way to analyze phishing SMSs, the authors still collected 67,991 phishing messages in a year, which is more than I thought it would be.

Another interesting approach is how they group the messages. After filtering the messages, they substitute things like one-time passwords (i.e., the verification codes you receive for MFA), URLs, numbers, and emails with template strings; for example, "Your OTP is 1234" becomes "Your OTP is #OTP". After that, two messages are grouped together only if the templated texts match exactly. That approach leaves room for some NLP-style analysis to cluster messages more loosely.
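A small sketch of that templating step as I understood it; the exact regexes and template tokens are my guesses, not the authors'.

```python
import re
from collections import Counter

PATTERNS = [
    (re.compile(r"https?://\S+"), "#URL"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "#EMAIL"),
    (re.compile(r"\b\d{4,8}\b"), "#OTP"),   # OTP-like codes
    (re.compile(r"\d+"), "#NUM"),           # any remaining numbers
]

def template(message: str) -> str:
    """Replace volatile tokens (URLs, emails, OTPs, numbers) with placeholders."""
    for pattern, token in PATTERNS:
        message = pattern.sub(token, message)
    return message.strip()

msgs = [
    "Your OTP is 1234",
    "Your OTP is 9876",
    "Package held: pay fee at http://evil.example/pay",
]
groups = Counter(template(m) for m in msgs)
print(groups)  # the two OTP messages collapse into the same template
```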

Additionally, they use the spaCy Python library to analyze things like the message language and brand/organization names.
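A minimal sketch of that kind of analysis; I am guessing at the details here (spaCy's small English model for ORG entities, and the separate langdetect package for language, since plain spaCy does not detect languages out of the box).

```python
import spacy                   # pip install spacy && python -m spacy download en_core_web_sm
from langdetect import detect  # pip install langdetect

nlp = spacy.load("en_core_web_sm")

def analyze(message: str):
    """Guess the message language and extract organization/brand mentions."""
    lang = detect(message)
    orgs = [ent.text for ent in nlp(message).ents if ent.label_ == "ORG"]
    return lang, orgs

print(analyze("Chase Bank: your account is locked, reply to verify."))
```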

In the end, the authors present about 15 very interesting inferences drawn from the data they collected, but something stayed on my mind: if phishing developers are using those platforms for testing, how long does it take for a URL that appears there to show up on public platforms like PhishTank, OpenPhish, or PhishStats?
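If I wanted to start answering that, a crude sketch would be to periodically poll a public feed and record the first time each gateway-observed URL shows up in it; here I use OpenPhish's public text feed, and the feed URL is from memory, so verify it before use.

```python
import time
import requests

OPENPHISH_FEED = "https://openphish.com/feed.txt"  # public feed, one URL per line

def first_seen(watched_urls, poll_seconds=600):
    """Poll the feed and record when each watched URL first appears in it."""
    seen = {}
    while len(seen) < len(watched_urls):
        feed = set(requests.get(OPENPHISH_FEED, timeout=30).text.splitlines())
        now = time.time()
        for url in watched_urls:
            if url in feed and url not in seen:
                seen[url] = now
        time.sleep(poll_seconds)
    return seen  # compare these timestamps with when the URL was seen on the SMS gateway
```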