Saturday, August 24, 2024

[Rephrasing] Catching Transparent Phish: Analyzing and Detecting MITM Phishing Toolkits

💡 This is a reader's perspective on the paper by Brian Kondracki and co-authors (Stony Brook University), published at the ACM Conference on Computer and Communications Security (CCS) 2021.

Brief Description

This is a nice paper that analyses RTT- and TLS-related features of reverse-proxy MITM phishing kits to build a classification tool for web pages served by those kits. The authors then use that classifier to measure how many such pages are deployed in the wild.

Observations

One thing they mention is the idea of an "all-in-one phishing toolkit" that relays the regular pages of benign websites to phishing victims. I would like to see a concrete example of that, since I have never come across one.

I was thinking that this 2FA at least mitigates the sharing/selling of passwords for a single website, since all that can be shared is the authentication token (which can be bound to the victim's IP address). I wonder whether the cookie could also be bound to the machine's MAC address for further protection (maybe as a solution provided by the browser); a minimal IP-binding sketch follows below.
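For what it's worth, a MAC address never leaves the local network segment, so binding a cookie to it would need support from the browser or OS rather than the server. A minimal server-side IP-binding check might look like the following (a hypothetical Flask sketch of my own, not something from the paper):

```python
from flask import Flask, request, session, abort

app = Flask(__name__)
app.secret_key = "placeholder-secret"  # hypothetical; use a real secret in practice

@app.before_request
def enforce_ip_binding():
    # Bind the session to the first client IP seen; reject it from anywhere else.
    bound_ip = session.get("bound_ip")
    if bound_ip is None:
        session["bound_ip"] = request.remote_addr
    elif bound_ip != request.remote_addr:
        session.clear()
        abort(401)  # token replayed from a different address
```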

I wonder what hacker forums they are talking about in Section 3.1.

They mention that one of the features used by their classification system is RTT. That is the beginning of performance-based classification of phishing kits (which is still a big research gap); a rough sketch of the RTT intuition follows below.
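As a rough illustration of the RTT intuition (my own sketch, not the authors' code): a reverse-proxy kit answers the TCP/TLS handshake itself but must forward the HTTP request to the real site, so the application-level RTT grows relative to the network-level RTT. The host name below is just a placeholder.

```python
import socket
import ssl
import time

def measure_rtts(host: str, port: int = 443):
    """Roughly compare network-level and application-level RTT for one host."""
    # Network-level RTT: time to complete the TCP handshake.
    start = time.monotonic()
    raw = socket.create_connection((host, port), timeout=10)
    tcp_rtt = time.monotonic() - start

    # Application-level RTT: one minimal HTTP request/response over TLS.
    ctx = ssl.create_default_context()
    tls = ctx.wrap_socket(raw, server_hostname=host)
    start = time.monotonic()
    tls.sendall(f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
    tls.recv(1024)  # the first response bytes are enough for timing
    http_rtt = time.monotonic() - start
    tls.close()

    # A proxied (phishing) page should show a larger ratio than a direct one.
    return tcp_rtt, http_rtt, http_rtt / tcp_rtt

print(measure_rtts("example.com"))  # placeholder target
```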

There is a problem in Section 4.1: the authors forgot to reference the table.

I would like to see the number of blank windows produced by their headless crawler (they do not mention it).

They detect brand impersonation using only the website's URL, which could feed into the idea of brand detection using content-based features; a toy URL-only matcher is sketched below.
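To make the contrast concrete, here is a toy URL-only brand matcher (the brand list and matching rule are made up); content-based detection would instead inspect page titles, logos, or text:

```python
from urllib.parse import urlparse

BRANDS = ("paypal", "google", "microsoft", "apple")  # hypothetical list

def impersonated_brand(url: str):
    """Return a brand whose name appears in a hostname it does not own."""
    hostname = (urlparse(url).hostname or "").lower()
    for brand in BRANDS:
        if brand in hostname and not hostname.endswith(f"{brand}.com"):
            return brand
    return None

print(impersonated_brand("https://paypal.secure-login.example.com/"))  # paypal
```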

Initial Questions

When reading the abstract, I wondered what framework they used for crawling new URLs, and whether they were possibly evaded by cloaking techniques. They mention that they use a stock Selenium setup in headless mode, which suggests they may indeed be getting evaded (see the sketch below).
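As a quick illustration of why a stock headless Selenium crawler is easy to fingerprint (my own sketch, not the authors' crawler): automated Chrome exposes navigator.webdriver, which cloaking scripts commonly branch on.

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

driver.get("https://example.com")  # placeholder URL
# Cloaking code can simply check this flag, which automation sets to true.
print(driver.execute_script("return navigator.webdriver"))
driver.quit()
```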

Another thing I was wondering about was the tool they used to collect network-related information. I did not find a direct answer in the paper, but my guess is that they use Selenium's own facilities to capture that kind of information; one possible way is sketched below.
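My guess at one way to pull network metadata out of Selenium itself is Chrome's performance log (CDP Network.* events); the paper does not spell out its mechanism, so the snippet below is only an assumption.

```python
import json
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
driver = webdriver.Chrome(options=options)

driver.get("https://example.com")  # placeholder URL
for entry in driver.get_log("performance"):
    event = json.loads(entry["message"])["message"]
    if event["method"] == "Network.responseReceived":
        response = event["params"]["response"]
        print(response["url"], response.get("remoteIPAddress"), response.get("protocol"))
driver.quit()
```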

Lastly, I wondered what their URL sources were; they mention PhishTank and OpenPhish. (This might open a research gap around certificate-based URL feeds, such as CertStream; a small sketch follows below.)
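For reference, the CertStream idea can be consumed with the third-party certstream package; the keyword list below is made up and only meant to show the shape of such a feed-based pipeline.

```python
import certstream

SUSPICIOUS = ("login", "secure", "verify", "account")  # hypothetical keywords

def on_certificate(message, context):
    # Each event describes a newly issued certificate seen in the CT logs.
    if message["message_type"] != "certificate_update":
        return
    for domain in message["data"]["leaf_cert"]["all_domains"]:
        if any(token in domain for token in SUSPICIOUS):
            print("candidate phishing domain:", domain)

certstream.listen_for_events(on_certificate, url="wss://certstream.calidog.io/")
```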

Where do the experiment ideas come from?

I guess they mostly came from studying the open-source MITM phishing toolkits available on GitHub.

What are the interesting ideas/results?

Nice overview of advanced cloaking techniques in Section 3.2.

TLSProber is a nice tool.

Nice robustness testing on the Random Forest classifier in Section 3.5.

Nice experiment on the differences between domains in Section 4.2.
