[Rephrasing] Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages

💡This is a reader's perspective on the paper written by Yun Lin and Ruofan Liu (from the National University of Singapore) and published at the USENIX Security Symposium 2021.

Brief Description

The paper proposes a system that identifies which brand a phishing webpage is impersonating, given a reference set of benign brand logos. It uses deep learning in two steps: first to locate the logo on the page, and then to decide whether that logo is similar to one of the logos in the reference set.
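To make the two steps concrete for myself, here is a minimal Python sketch of how they could be chained. The detector and the matcher are passed in as plain functions. `identify_brand`, `fake_detector`, `fake_matcher` and the brand "ExampleBank" are hypothetical names for illustration, not Phishpedia's actual code, and keeping only the most confident logo candidate is a simplification I am assuming here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types: a logo bounding box (x1, y1, x2, y2) and its detector confidence.
Box = Tuple[int, int, int, int]
Detection = Tuple[Box, float]


def identify_brand(
    screenshot: object,
    detect_logos: Callable[[object], List[Detection]],
    match_logo: Callable[[object, Box], Optional[str]],
) -> Optional[str]:
    """Step 1: locate logo candidates. Step 2: look the best one up in the reference set.

    Returns the matched brand, or None if no logo is found or nothing matches.
    """
    detections = detect_logos(screenshot)
    if not detections:
        return None  # no logo on the page, so there is nothing to compare
    # Simplifying assumption: keep only the most confident candidate.
    best_box, _confidence = max(detections, key=lambda d: d[1])
    return match_logo(screenshot, best_box)


# Toy stand-ins for the two models (hypothetical, for illustration only).
def fake_detector(screenshot: object) -> List[Detection]:
    return [((10, 10, 60, 40), 0.55), ((0, 0, 120, 50), 0.93)]


def fake_matcher(screenshot: object, box: Box) -> Optional[str]:
    return "ExampleBank" if box == (0, 0, 120, 50) else None


print(identify_brand("page.png", fake_detector, fake_matcher))  # ExampleBank
print(identify_brand("page.png", lambda s: [], fake_matcher))  # None
```

Splitting it this way means either model can be swapped out without touching the other, which is how I read the paper's two-stage design.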

Observations

One thing that becomes very clear to me every time I read a paper by these authors is how limited my knowledge of deep learning is. I should fix this, because I understood almost nothing of Section 3.2's explanation of why ResNetV2 is required as a preliminary classification network.

In Section 5.1, it is not clear to me how they manually analyzed 350K phishing URLs, or how and when they "sometimes" corrected the labels collected from OpenPhish. Similarly, Section 5.2.1 mentions a manual validation of 5,000 webpages.

Initial Questions

My first question was where the benign logo dataset came from. Later, the authors mention that they use Logo2K+, a dataset they also use in their subsequent work.

Where do the experiment ideas come from?

The authors seem very familiar with deep learning algorithms, which might be their main motivation for this line of work.

What are the interesting ideas/results?

In Section 3.2, I like the observation that a Siamese model is needed to perform logo identification. It addresses a drawback of multi-class classifiers: they always assign one of the trained labels, even to a logo they have never seen.
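A toy Python sketch of the difference, assuming logos have already been turned into embedding vectors: a closed-set classifier takes the argmax and always names a brand, while a Siamese-style lookup compares the query against the reference logos and rejects it when the best similarity falls below a threshold. The vectors, classifier scores, brand names, and the 0.8 threshold are all made up for illustration; cosine similarity is my assumed similarity measure.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_closed_set(scores: Dict[str, float]) -> str:
    """Plain classifier: argmax over the trained labels, so it always names a brand."""
    return max(scores, key=scores.get)


def match_open_set(
    query: List[float], reference: Dict[str, List[float]], threshold: float = 0.8
) -> Optional[str]:
    """Siamese-style lookup: nearest reference logo, accepted only above the threshold."""
    if not reference:
        return None
    brand, sim = max(
        ((name, cosine(query, vec)) for name, vec in reference.items()),
        key=lambda pair: pair[1],
    )
    return brand if sim >= threshold else None


reference = {"BrandA": [1.0, 0.0, 0.0], "BrandB": [0.0, 1.0, 0.0]}
unknown_logo = [0.0, 0.0, 1.0]  # a logo whose brand is NOT in the reference set

# Made-up classifier scores for the unknown logo: one label still wins.
print(classify_closed_set({"BrandA": 0.51, "BrandB": 0.49}))  # BrandA
print(match_open_set(unknown_logo, reference))  # None
print(match_open_set([0.9, 0.1, 0.0], reference))  # BrandA
```

In this sketch, supporting a new brand only means adding its embedding to `reference`; nothing has to be retrained.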

Nice explanation of why a Step ReLU is required to harden the model against adversarial attacks.
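As I understand it, the idea is gradient masking: if an activation is piecewise constant, its gradient is zero almost everywhere, so a gradient-based attack gets no signal to follow. The sketch below assumes a quantized ReLU with a hypothetical step size `alpha`; the exact definition in the paper may differ. It compares finite-difference gradients of a regular ReLU and the step version.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def step_relu(x: float, alpha: float = 0.5) -> float:
    """Assumed quantized ReLU: positive inputs are floored to a multiple of alpha.

    The output is piecewise constant, so its derivative is zero almost everywhere.
    """
    return max(0.0, alpha * math.floor(x / alpha))


def numeric_grad(f, x: float, eps: float = 1e-4) -> float:
    """Central finite difference: roughly the local signal a gradient-based attack follows."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


for x in (0.3, 1.2, 2.7):
    print(f"x={x}: ReLU grad={numeric_grad(relu, x):.3f}, "
          f"step ReLU grad={numeric_grad(step_relu, x):.3f}")
```

Away from the step boundaries, the ReLU gradient is 1 while the step version's is 0, which is what starves the attacker of a direction.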

I like the use of a log scale in Figure 10 to show the advantage of Phishpedia's ROC curve.
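The reason the log scale helps, as I read it, is that a phishing detector has to work at very low false positive rates, and on a linear x-axis that whole region is squeezed against the y-axis. A small Python sketch with made-up scores that computes ROC points and moves the FPR to a log10 axis (`roc_points` and `log_x` are hypothetical helper names):

```python
import math
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs, one per distinct score threshold, from strictest to loosest.

    labels: 1 = phishing (positive), 0 = benign (negative).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def log_x(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Move FPR to log10 scale; FPR == 0 has no position on a log axis, so it is dropped."""
    return [(math.log10(fpr), tpr) for fpr, tpr in points if fpr > 0]


# Made-up detector scores and labels, for illustration only.
scores = [0.97, 0.93, 0.90, 0.62, 0.41, 0.35, 0.12]
labels = [1, 1, 0, 1, 0, 0, 0]
for x, y in log_x(roc_points(scores, labels)):
    print(f"log10(FPR) = {x:+.2f}, TPR = {y:.2f}")
```

On a linear axis, FPRs of 0.001 and 0.01 are less than 1% of the axis width apart; on a log axis they are a full decade apart, which is where differences between detectors become visible.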

Nice qualitative analysis of both the Phishpedia tool and the baseline tools (nice section "Why does Phishpedia outperform the baselines").

João Pedro Favoretti - Blog

Thursday, August 22, 2024