💡 This is a reader's perspective on a paper published at USENIX Security 2023.
Brief Description
Observations
One key observation was that phishing pages that are accessed more often tend to use more fingerprinting. They also track that fingerprinting has become more widely used over recent months.
Besides that, fingerprinting is used more for content manipulation than for sending data to a backend server, though of course many pages do both.
One claim that made me take a step back was in Section 4.1, which says that country-specific phishing is more common; this might be a bit misleading because their data might come mostly from the USA.
In Section 5.1 they also report nice numbers on which fingerprinting APIs are most used by phishing websites.
GAP: Another key finding was that the fingerprinting done on phishing pages often differs from the fingerprinting done on benign pages of the same brand (e.g., comparing the real Amazon site against a fake Amazon page). This opens a GAP: could we build a classification tool for phishing pages based on fingerprinting information? A practical problem, though, would be building a crawler that is resilient to cloaking techniques.
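To make that idea concrete, here is a minimal sketch, assuming an instrumented crawler already gives us the set of fingerprinting API names each page touches. The API names, the benign profile, and the 0.5 threshold are all my illustrative assumptions, not data from the paper.

```typescript
// Hypothetical sketch: flag a page when its fingerprinting API usage
// diverges from a benign reference profile of the same brand.
// All names and the threshold below are illustrative assumptions.

type ApiProfile = Set<string>;

// Jaccard similarity between two sets of fingerprinting API names.
function jaccard(a: ApiProfile, b: ApiProfile): number {
  const inter = [...a].filter((x) => b.has(x)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 1 : inter / union;
}

// Reference profile, e.g. collected by crawling the real brand site.
const benignBrandProfile: ApiProfile = new Set([
  "navigator.userAgent",
  "navigator.language",
  "screen.width",
]);

// Profile observed on a candidate page impersonating the same brand.
const candidateProfile: ApiProfile = new Set([
  "navigator.userAgent",
  "CanvasRenderingContext2D.getImageData", // canvas fingerprinting
  "WebGLRenderingContext.getParameter",
  "navigator.plugins",
]);

const SUSPICION_THRESHOLD = 0.5; // assumed cutoff; would need tuning
if (jaccard(candidateProfile, benignBrandProfile) < SUSPICION_THRESHOLD) {
  console.log("fingerprinting profile diverges from the benign brand page");
}
```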
GAP: One clear GAP I saw while reading the paper is that there is room for work that uses APIs to cluster phishing pages in a more general fashion, since the paper groups pages into the same phishing campaign only on an exact match of their API lists.
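As a sketch of what "more general" could mean, pages could join a campaign when their API sets are merely similar rather than identical. The greedy single-pass strategy and the 0.8 threshold below are my assumptions, not the paper's method.

```typescript
// Sketch of similarity-based campaign clustering. The paper clusters on
// exact API-list matches; here a page joins a campaign when its API set
// is "close enough". The 0.8 threshold is an assumption.

type Page = { url: string; apis: Set<string> };

function jaccard(a: Set<string>, b: Set<string>): number {
  const inter = [...a].filter((x) => b.has(x)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 1 : inter / union;
}

// Greedy single-pass clustering: a page joins the first campaign whose
// representative (first page) is similar enough, else starts a new one.
function clusterCampaigns(pages: Page[], threshold = 0.8): Page[][] {
  const campaigns: Page[][] = [];
  for (const page of pages) {
    const home = campaigns.find(
      (c) => jaccard(c[0].apis, page.apis) >= threshold,
    );
    if (home) home.push(page);
    else campaigns.push([page]);
  }
  return campaigns;
}
```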
Initial Questions
At first, I asked myself how deeply they covered cloaking techniques in the paper. In the end, I understood that they treat cloaking more generally, as fingerprinting used to modify page content, without specifying whether the modification is malicious or benign.
Another question I had was about the data collection tool they used, and whether I could build one myself. The tool is deployed by a partner company; it tracks feeds of recent phishing URLs and also the users who end up falling for those URLs, which makes for a very rich dataset. Tracking public feeds is feasible, but tracking users (which gave the authors much more room for insights) is harder; it could be implemented as a browser extension for users who opt in.
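For what it's worth, a minimal sketch of that opt-in part as a Manifest V3 extension background script might look like the following. The collector endpoint and payload shape are hypothetical, and a real deployment would need a consent UI, URL filtering, and careful privacy handling.

```typescript
// Sketch of an opt-in URL reporting extension (Manifest V3 background
// service worker, using @types/chrome). The endpoint and payload shape
// are hypothetical; requires the "webNavigation" permission.

const COLLECTOR_URL = "https://example.com/report"; // hypothetical endpoint

chrome.webNavigation.onCompleted.addListener((details) => {
  if (details.frameId !== 0) return; // top-level navigations only
  void fetch(COLLECTOR_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: details.url, ts: Date.now() }),
  });
});
```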
One other thing I wanted to know was the set of fingerprinting functions they used throughout the experiments, which was not provided.
Where do the experiment ideas come from?
I like the idea of clustering the URLs into phishing campaigns; it is a nice follow-up for any study that handles this kind of log data.
What are the interesting ideas/results?
The main thing that deserves the spotlight in this paper is the analysis in Section 3.3, "Fingerprinting Intentions". Their crawler can tell whether a collected fingerprint was used to modify the DOM or was sent in a request, by watching the website's requests and DOM changes, which is very interesting. By the way, I would love to get my hands on this crawler.
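I don't know how their crawler is implemented, but an in-page sketch of the general technique could look like this: record values read through one fingerprinting API, then watch for those values in outgoing requests and in DOM mutations. Real attribution (and presumably theirs) needs proper data-flow tracking across many APIs; the naive string matching below only illustrates the idea.

```typescript
// Naive in-page sketch of fingerprinting-intention attribution.
// Only instruments navigator.userAgent and matches raw strings.

const collected: string[] = [];

// 1. Record reads of a fingerprinting API by patching its getter.
const desc = Object.getOwnPropertyDescriptor(Navigator.prototype, "userAgent")!;
Object.defineProperty(Navigator.prototype, "userAgent", {
  configurable: true,
  get() {
    const value = desc.get!.call(this) as string;
    collected.push(value);
    return value;
  },
});

// 2. "Send to backend" intention: wrap fetch and inspect string bodies.
const origFetch = window.fetch.bind(window);
window.fetch = (input, init) => {
  const body = typeof init?.body === "string" ? init.body : "";
  if (collected.some((v) => v && body.includes(v))) {
    console.log("fingerprint value sent in a request to", String(input));
  }
  return origFetch(input, init);
};

// 3. "Content manipulation" intention: watch DOM mutations for the value.
new MutationObserver((mutations) => {
  for (const m of mutations) {
    const text = m.target.textContent ?? "";
    if (collected.some((v) => v && text.includes(v))) {
      console.log("fingerprint value written into the DOM");
    }
  }
}).observe(document.documentElement, {
  subtree: true,
  childList: true,
  characterData: true,
});
```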
Another very nice result was identifying the brands targeted by the phishing pages in the study (which was possible because of the user tracking data they had).