💡This is a reader's perspective on the paper written by Yuji Sakurai (from Waseda University) and published at the IEEE European Symposium on Security and Privacy 2020.
Brief Description
This paper presents a series of studies on phishing websites served over HTTPS. The authors propose a data collection method, a clustering technique, and a classification algorithm built on TLS certificate information, mainly the domain name of the phishing website, using regular expressions (regex) to find recurring patterns in phishing domains.
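The paper's own method for building the regexes is not described in this post, so the following is only a toy sketch of the general idea of generalising a cluster of similar domains into a single pattern. It is not the authors' algorithm. As a simplifying assumption, it splits each domain on `.` and `-`, keeps the tokens shared by every domain, and replaces the varying ones with a wildcard. `tokenize`, `cluster_to_regex`, and the example domains are hypothetical.

```python
import re

SEPARATORS = r"([.-])"  # split on dots and hyphens, keeping them as tokens


def tokenize(domain):
    """'a-b.com' -> ['a', '-', 'b', '.', 'com'] (words at even, separators at odd indices)."""
    return re.split(SEPARATORS, domain.lower())


def cluster_to_regex(domains):
    """Toy generalisation of a domain cluster into one regex (hypothetical helper)."""
    token_lists = [tokenize(d) for d in domains]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("toy version: all domains must have the same number of tokens")
    parts = []
    for index, column in enumerate(zip(*token_lists)):
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))  # shared by every domain: keep literally
        elif index % 2 == 1:
            parts.append("[.-]")  # separators differ: allow either one
        else:
            parts.append("[^.-]*")  # word varies across the cluster: any run of non-separators
    return "^" + "".join(parts) + "$"


if __name__ == "__main__":
    # Hypothetical cluster of domains, for illustration only.
    cluster = [
        "login-examplebank-01.example",
        "login-examplebank-02.example",
        "login-examplebank-17.example",
    ]
    pattern = cluster_to_regex(cluster)
    print(pattern)
    print(bool(re.fullmatch(pattern, "login-examplebank-99.example")))
```

Because `tokenize` lowercases its input, candidate domains should be lowercased before matching. A real system would also have to handle clusters whose domains have different token layouts, which this toy version simply rejects.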
Observations
Even though I was a bit skeptical about using the domain name to cluster the dataset, the authors propose an interesting regex-based approach that is worth reading. (The regex idea might come from one of the co-authors; otherwise, it would be a rather unusual choice.)
Another question I had concerns the hyperparameter tuning of DBSCAN. The authors state that they only used the default parameters; they should have also tried other values.
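As a minimal sketch of what varying the parameters could look like, the code below runs a plain-Python DBSCAN (no dependencies) over a string distance based on `difflib.SequenceMatcher` and reports, for each (`eps`, `min_samples`) pair, how many clusters and noise points come out. The distance, the parameter grid, and the example domains are my own assumptions, not the paper's features or settings. For reference, scikit-learn's `DBSCAN` defaults to `eps=0.5` and `min_samples=5`; I don't know whether that is the implementation the authors used.

```python
from difflib import SequenceMatcher
from itertools import product

NOISE = -1  # label for points that belong to no cluster


def string_distance(a: str, b: str) -> float:
    """Assumed distance between two domains: 1 - difflib similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def dbscan(items, dist, eps, min_samples):
    """Plain DBSCAN; returns one label per item, NOISE (-1) for outliers.

    A point's eps-neighbourhood includes the point itself, so min_samples counts it too.
    """
    n = len(items)
    # O(n^2) neighbourhood precomputation: fine for a toy example.
    neighbours = [[j for j in range(n) if dist(items[i], items[j]) <= eps] for i in range(n)]
    labels = [None] * n  # None = not visited yet
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:  # j is also a core point: keep expanding
                queue.extend(neighbours[j])
        cluster += 1
    return labels


def sweep(items, eps_values, min_samples_values, dist=string_distance):
    """Run DBSCAN over a parameter grid; map (eps, min_samples) -> (clusters, noise points)."""
    results = {}
    for eps, min_samples in product(eps_values, min_samples_values):
        labels = dbscan(items, dist, eps, min_samples)
        n_clusters = len({label for label in labels if label != NOISE})
        results[(eps, min_samples)] = (n_clusters, labels.count(NOISE))
    return results


if __name__ == "__main__":
    # Hypothetical domains, for illustration only.
    domains = [
        "login-examplebank-01.example",
        "login-examplebank-02.example",
        "login-examplebank-17.example",
        "verify-account-shop.example",
        "unrelated-site.test",
    ]
    for params, (n_clusters, n_noise) in sweep(domains, [0.1, 0.3, 0.5], [2, 3, 5]).items():
        print(params, "clusters:", n_clusters, "noise:", n_noise)
```

Even a small grid like this shows how sensitive the number of clusters (and of points discarded as noise) can be to `eps` and `min_samples`, which is why reporting only the defaults leaves an open question.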
Initial Questions
My first question was whether they were proposing a classification of phishing websites. In the end, they do propose an unsupervised "classification" system built on their dataset.
Where do the experiment ideas come from?
Since other tools already used TLS information to classify malicious URLs, I suspect the idea was to improve on those existing tools.
What are the interesting ideas/results?
I like the LCS-based algorithm for calculating the difference between strings.
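To make the idea concrete, here is a minimal sketch of an LCS-based string difference. It assumes LCS means longest common subsequence and normalises by the length of the longer string; both choices are my assumptions rather than the paper's exact formulation, and `lcs_length`/`lcs_distance` are hypothetical names.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP, O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)  # DP row for the previous prefix of a
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend the common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a character of a or of b
        prev = curr
    return prev[-1]


def lcs_distance(a: str, b: str) -> float:
    """0.0 for identical strings, 1.0 when they share no characters."""
    if not a and not b:
        return 0.0
    return 1.0 - lcs_length(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    base = "login-examplebank-01.example"  # hypothetical domain
    for other in ("login-examplebank-02.example", "shop.example"):
        print(other, round(lcs_distance(base, other), 3))
```

As a sanity check, `lcs_length("ABCBDAB", "BDCABA")` returns 4, the usual textbook answer. Unlike plain edit distance, this measure rewards long shared runs such as a brand name embedded in otherwise different domains.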
I really like the example they provided in Section 3.3 to explain their algorithm.
It is a nice confirmation that Let's Encrypt is widely used by phishing websites, which other papers have also explored.