💡This is a reader's perspective on the paper written by Zheng Yang (from the Georgia Institute of Technology) and published at USENIX Security 2023.
Brief Description
Since I lack a deep understanding of ad clickjacking and of how Social Engineering ads are deployed, I cannot suggest alternatives to the approach taken by the authors, and I may also miss some of the key aspects of the paper.
That said, the paper proposes a classification system for Social Engineering (SE) ads: ads that redirect the user to a page and trick them into doing something without their consent, such as downloading adware or clicking on further ads so that the attacker earns money per click.
They do that by creating a graph representation of each webpage's activity (the WAHG) from browser instrumentation logs, and building a classifier on top of it.
Observations
First of all, most of the observations here might stem from my own misunderstandings of the paper, even though I reread some of its parts several times.
With that said, I still fail to understand how they collected the ad data using publicwww.com.
Besides that, I still don't understand what the WAHG looks like, or how they derive it from the logs extracted from CDP (which I would really like to understand). Since their code is open-sourced, I am thinking of taking a look later on.
There might be a typo in the Introduction on page 3, where they wrote "sea" instead of "SE-ad".
They use a very old version of Chromium (87); the current release is 126.
The paper also discusses cloaking techniques as a limitation of their work. How to bypass cloaking remains a nice research gap, and studying cloaking techniques through the lens of this graph-based tool is a very nice idea.
Initial Questions
The first question I had while reading the paper was about the tool they used to extract the WAHG from webpages, and how they performed the crawling. Even though it is not completely described in the paper (again, it might be my fault), they open-sourced the tool.
Where do the experiment ideas come from?
I like how they got the idea to instrument Blink's CDP starting from PageGraph, create the WAHG representation of the page, and build a classification system on top of the data-gathering tool. Ultimately, most papers are basically consequences of a data-gathering technique.
What are the interesting ideas/results?
I like the idea of analyzing the classifier's concept drift by testing it on data collected a year later.
Another thing that I liked is that, despite the claim of being somewhat more generic than other tools, it still relies on characteristics specific to SE ads, analyzing redirection behavior and clickjacking.
Creating the graph representation of the logs is also a very nice idea, and one that might be interesting to reproduce.
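To make the idea concrete, here is a minimal sketch of what deriving a graph from browser activity logs could look like. This is purely my assumption: the paper does not spell out the WAHG format for me here, so the `(actor, action, target)` event tuples, the node names, and the `"navigate"` edge label below are all made up for illustration.

```python
# Hypothetical sketch: assumes a simplified log format where each
# CDP-style event names an actor node, an action, and a target node
# (e.g. an ad script inserting an iframe that then redirects).

from collections import defaultdict

def build_activity_graph(log_events):
    """Build a parent -> [(action, child)] adjacency map from event tuples."""
    graph = defaultdict(list)
    for actor, action, target in log_events:
        graph[actor].append((action, target))
    return dict(graph)

def redirection_chain(graph, root):
    """Follow 'navigate' edges from root to recover a redirection chain."""
    chain, node = [root], root
    while True:
        nxt = [t for a, t in graph.get(node, []) if a == "navigate"]
        if not nxt:
            return chain
        node = nxt[0]
        chain.append(node)

# Toy log: an ad script injects an iframe, which navigates twice.
events = [
    ("ad_script", "insert", "iframe_1"),
    ("iframe_1", "navigate", "landing_page"),
    ("landing_page", "navigate", "se_payload"),
]
g = build_activity_graph(events)
print(redirection_chain(g, "iframe_1"))
# → ['iframe_1', 'landing_page', 'se_payload']
```

The point of a representation like this is that redirection behavior, one of the signals the paper relies on, falls out of the graph as a simple path query.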
Another nice idea in their crawler is its webpage-interaction component: understanding pages by interacting with them. Other works do this as well, and it would be a nice thing to have.
I also like their idea of clustering pages by the perceptual hash of each website's screenshot, so that they could reduce the amount of manual labeling.
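A minimal sketch of that clustering idea, assuming an average hash (aHash), one common perceptual-hash variant; the paper's exact algorithm is not specified in my notes, and the tiny 2x2 grayscale "screenshots" below are toy stand-ins for real images:

```python
# Assumption: screenshots arrive as grayscale pixel matrices (lists of rows).
# Real code would downscale actual images first (e.g. with an image library).

def average_hash(pixels):
    """Hash an image as a bitstring: '1' where a pixel >= the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p >= mean else "0" for p in flat)

def hamming(h1, h2):
    """Number of differing bits between two equal-length hash strings."""
    return sum(a != b for a, b in zip(h1, h2))

def cluster(hashes, max_dist):
    """Greedy clustering: join a page to the first cluster within max_dist."""
    clusters = []  # list of (representative_hash, member_names)
    for name, h in hashes:
        for rep, members in clusters:
            if hamming(rep, h) <= max_dist:
                members.append(name)
                break
        else:
            clusters.append((h, [name]))
    return [members for _, members in clusters]

pages = {
    "fake_av_1": [[0, 0], [255, 255]],
    "fake_av_2": [[0, 10], [250, 255]],   # near-duplicate layout
    "lottery":   [[255, 255], [0, 0]],
}
hashes = [(n, average_hash(img)) for n, img in pages.items()]
print(cluster(hashes, max_dist=1))
# → [['fake_av_1', 'fake_av_2'], ['lottery']]
```

Visually near-identical SE landing pages hash to nearby bitstrings, so one manual label per cluster can cover many pages.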
Besides that, I really like their performance evaluation of the classifier; I finished the reading with no doubts and no ideas for evaluating it further. The same goes for the feature importance analysis in the paper.
Within the performance evaluation, comparing the tool against commercial tools is also a nice idea; in this case, they compared it against the Brave ad blocker.