Phishing kits are some of the most powerful enablers of digital crime. Because they offer a cookie-cutter, ready-made format that anyone can purchase, they nearly eliminate the barrier to entry for creating convincing fake login portals. There have been several reports in recent months (most recently TodayZoo; see “Franken-phish: TodayZoo built from other phishing kits,” October 21, 2021) that point out attacks leveraging phishing kits. This report investigates a kit that has stayed active for more than four years. In just the last 18 months, thousands of users across more than a dozen public and private sectors are known to have been affected, suggesting that this kit has had a tremendous impact on organizations far and wide.
Uncovering the Activity
I’d like to add a disclaimer here before continuing. This section starts with a bit of a marketing vibe to it, but this is really how it happened. Feel free to skip this part of the narrative.
In late October, I sat down to start creating useful content for a new platform my team and I are launching called NetworkSage. One of the first technical blogs I wrote focuses on how various Sandbox technologies -- while incredibly useful -- have some important gaps that are addressed by NetworkSage (if interested, you can read that blog post here). While collecting data and finishing up one of the key points I was making (Scenario 5: User Visits Known Phishing Site), I realized that I was actively staring at something that shared attack infrastructure with a number of other samples in our system:
Figure 1: Suspicious Domain Appearing in Several Samples
Further still, I noticed that there was another domain showing up soon after the above activity. This appeared to be occurring in all of the samples where I entered credentials into the site, and almost always had a C2-like channel associated with it:
Figure 2: Suspicious Domain with C2-like Behavior
This discovery led me to the hypothesis that there was a widespread attack occurring that was flying under the radar of most systems.
Before getting into the technical details, I want to note that about halfway into my investigation, I discovered that portions of this attack and group (dubbed PerSwaysion) had been found and reported on previously. Interestingly, despite these reports, the activity has continued uninterrupted. In January of 2020, Avanan researchers briefly discussed one particular tactic used by clients of the phishing kit, namely the delivery of exceptionally legitimate-looking emails that link to malicious content hosted on Microsoft’s Sway service (“Cybercriminals Use Microsoft Sway Scams to Phish Office 365 Security and Your Well-Trained Users,” January 9, 2020). This provided a bit of context about how attackers use Microsoft Sway to bypass security filters and convince users that their request is legitimate.
In April of 2020, Group-IB did a much deeper dive (“PerSwaysion Campaign: Playbook of Microsoft Document Sharing-Based Phishing Attack,” April 30, 2020). Their report:
dubbed the group PerSwaysion
laid out the arc of events as they knew it (spanning from August 2019 through April 2020)
identified the likely nationality of the developers of the kit (Vietnamese)
described the kit’s global customer base
established some aspects of the modularity of the kit
provided a detailed review of some aspects of the attacker infrastructure
gave a walkthrough of a particular compromise
discussed known victim locations and relevance
identified some artifacts that can be used to find these attacks
These are both great reports that I recommend reviewing. As such, I will not repeat content that they have already covered. Instead, in this report I’ll spend time:
establishing the overall timeline of the group, which actually extends back to 2017
describing activity since the last report, including the scope of victims
elaborating on aspects of the kit not yet discussed
identifying Attack Vectors
identifying indicators useful for hunting and detection at various stages of the attack
providing a way to determine if you've been affected
Throughout this report I will also describe how I found much of this information. I believe this is an important contribution for the benefit of the security community beyond just this report. What follows is the investigation and how it unfolded.
Understanding the Scope
The system we’re releasing today is new and focuses on network traffic, which is absolutely critical for understanding activity when it’s hard or impossible to reproduce. However, because this attack was ongoing, I chose to investigate further by analyzing my samples and correlating them with Urlscan, a community platform focused on detonating URLs. This allowed me to dig deeper into what was actually happening in each of these steps.
I had many questions, but two were crucial to answer in order to determine the attack's scope.
Question 1: How Long has the Attack Existed?
While trying to establish the timeline for this attack, I first needed to understand how many samples existed using the domain that originally piqued my curiosity -- wancdnapp[.]page. Using Urlscan’s search feature, I was able to quickly get an idea of how many samples contained it:
Figure 3: Discovering How Many Samples had Suspicious Domain
From there, I was able to review the samples and analyze the files requested from various submissions over time. I correlated this with manual analysis I was performing on live phishing portals, further cementing the similarity of activity across time:
Figure 4: Comparing Data Between Fiddler and Urlscan
Figure 5: Code Comparison of Two Samples
However, not everything is identical. While analyzing scripts for one of the samples, I found a reference to a domain name that only exists in a comment:
Figure 6: Domain Name in Code Comment
By refining the search to both topics, I was able to learn which user uploaded the file:
Figure 8: User Associated with Script
At this time, I became aware of the Group-IB report (and consequently the Avanan report), namely because the researchers at Group-IB found this exact same link! However, this was the earliest activity that they confidently identified as part of this phishing kit.
Returning to my discovery of the anytools[.]biz reference, I noticed (via its WHOIS information) that it was registered nearly two years earlier than this established start of activity:
Figure 9: WHOIS for anytools[.]biz
While sophisticated adversaries certainly use the concept of domain aging (the process of registering a domain much earlier than when it will be used in an attack; for details, review page 18 of APWG's Global Phishing Survey: Trends and Domain Name Use in 2016), I was highly suspicious that this domain lay dormant for so long.
A Pattern Emerges
At this point, I spent several hours researching characteristics that identified other related activity. Reviewing some samples with similar URL parameters, for example, led to additional domains I had not yet discovered (such as this one, which associated the domain perfectstuff[.]info with the attack). Others had PHP script names that were relatively unique, such as 1.newsvpost_ads/loading.php. Searching the Internet for that string led me to this Pastebin paste identifying the domain sptech[.]org. Finally, after reviewing the known Credential Collection site from the Group-IB report (c3y5-tools[.]com), I noticed that a considerable number of the attack domains had similar registration dates clustered around late September of 2017:
Figure 10: Cluster of Domains with Similar Creation Date
This matched closely in time with the registration of anytools[.]biz, which further strengthened my suspicion that mid-2019 was not the beginning of the attack.
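The registration-date clustering described above can be approximated in a few lines. This is a minimal sketch, not the method I used for the figure: it assumes you have already collected WHOIS creation dates into a dictionary, and the domains, dates, and the 14-day gap threshold below are all invented placeholders.

```python
from datetime import date, timedelta


def cluster_by_creation_date(creation_dates, max_gap_days=14):
    """Group domains whose creation date is within max_gap_days of the
    previous domain in date order. The threshold is an assumption."""
    ordered = sorted(creation_dates.items(), key=lambda item: item[1])
    clusters = []
    for domain, created in ordered:
        previous = clusters[-1][-1][1] if clusters else None
        if previous is not None and created - previous <= timedelta(days=max_gap_days):
            clusters[-1].append((domain, created))
        else:
            clusters.append([(domain, created)])
    # Return only the domain names for each cluster.
    return [[domain for domain, _ in cluster] for cluster in clusters]


# Placeholder domains and dates, invented purely for illustration.
example_dates = {
    "alpha.example": date(2017, 9, 22),
    "bravo.example": date(2017, 9, 25),
    "charlie.example": date(2017, 9, 29),
    "delta.example": date(2019, 8, 3),
}
clusters = cluster_by_creation_date(example_dates)
```

A cluster containing several otherwise unrelated-looking domains is only a lead, not proof; it still needs to be confirmed against the content each domain served.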
On the last day of intelligence gathering, I was able to definitively tie the activity all the way back to October 10, 2017. This was made possible by the community of Urlscan users who share links they’ve received in exchange for better knowledge about whether it is malicious. Without this community and this platform, I likely never would have confirmed my suspicion.
Figure 11: Finding the Oldest Known Sample
The connection was made by first identifying something that all samples have in common (loading mobile-detect.min.js, a benign and open-source library), iteratively searching for all submissions that contained that file, and then reviewing the sequence of events that occurred during that site’s loading. While this may not be the actual beginning of the attack, it serves as the earliest-known use of the TTPs I’ll describe later:
Figure 12: Transactions from First Known Sample
Additionally, it correlates strongly with the cluster of attack-related domains that were created just a couple of weeks earlier.
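For readers who want to repeat the "find the oldest submission that loads a given file" step, here is a rough sketch. It assumes Urlscan's public Search API endpoint, a `filename:` search field, a `size` parameter, and a `task.time` timestamp on each result; verify those against the current Urlscan documentation before relying on them. The sample results are invented placeholders, and no network request is made.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed endpoint for Urlscan's Search API.
SEARCH_ENDPOINT = "https://urlscan.io/api/v1/search/"


def build_search_url(query, size=100):
    """Build a search URL; the 'q' and 'size' parameters are assumptions."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query, "size": size})


def parse_time(value):
    """Parse an ISO 8601 timestamp such as 2021-01-02T03:04:05.000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def earliest_submission(results):
    """Return the result with the oldest task.time, or None if none are dated."""
    dated = [r for r in results if r.get("task", {}).get("time")]
    if not dated:
        return None
    return min(dated, key=lambda r: parse_time(r["task"]["time"]))


search_url = build_search_url('filename:"mobile-detect.min.js"')

# Invented results shaped like the assumed API response.
sample_results = [
    {"task": {"time": "2021-03-04T10:00:00.000Z", "url": "https://newer.example/"}},
    {"task": {"time": "2020-06-01T08:30:00.000Z", "url": "https://older.example/"}},
    {"task": {"url": "https://undated.example/"}},
]
oldest = earliest_submission(sample_results)
```

Because a benign library appears in many unrelated submissions, each candidate still has to be reviewed by hand for the rest of the loading sequence.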
Question 2: How Widespread is this Attack?
After reviewing Group-IB's report, many of my thoughts around this topic had at least partial answers. I was, however, still unsure about what type of organizations were known to have been targeted in the last 18 months. To partially answer this question (partially, because my analysis is necessarily subject to several biases, most specifically Availability Bias), I focused on analyzing data from Urlscan to understand:
how many known phishing portals existed, and where they were hosted
which email addresses were entered by potential victims
which Attack Vectors were used to deliver phishing lures
From this, I found that since May of 2020:
7403 total samples submitted
444 unique phishing portals
Figure 13: Distribution of Phishing Portals by Hosting Site
14 public or private sectors affected
Known Sectors Affected
Realistically, because of the breadth and nature of this kit and the attacks, it’s likely that virtually any industry could have been a target over time.
While the Group-IB report does a great job of covering many aspects of the kit, there are several items I’d like to elaborate on to help analysts understand when they have come across this activity in the wild.
This kit has two types of modularity. First, it makes deploying a phishing portal for many brands essentially drag-and-drop:
Figure 14: Template Locations for Various Portals in English and Vietnamese
There are eight templates supported out of the box. Interestingly, the choice to target some of these brands itself highlights the age of this phishing kit.
While all of these templates are available, most known samples focus solely on Microsoft's Office365, with a small handful aiming to collect Outlook and other credentials.
The second modular aspect of this kit is how the attack infrastructure itself is set up. While the particular customer of the kit controls some implementation decisions, there are four aspects for each campaign:
1. Front-End Phishing Portal (Short-Lived)
These sites are where the phishing portals load for a user. This is what a user would likely see in their browser when visiting a page. As an example:
Figure 15: Example of a Front-End Phishing Portal
Because these are quickly detected and reported, they generally have a short shelf life.
2. Redirector Site (Long-Lived)
Figure 16: Example of a Redirector Site's Code
Because it is generally less clear that these sites are malicious (unless one looks at a system that can correlate many samples), these generally stay active for many months. To give a concrete example, when working with the vendor where one of these sites was hosted, I was told that there was only one unverifiable abuse report filed in the many months that the site was active!
3. Template Hosting Site (Short-Lived)
Figure 17: Example of a Template Hosting Site Loading Assets
Because the Template Hosting sites are easy to pattern match against (which we’ll discuss later) and their resources are easy to view, their shelf life is also short.
4. Credential Collection Site (Long-Lived)
The Credential Collection site is used to collect credentials that are entered by a user. These sites tend to stay up for extended periods of time (months or more) for a few reasons. First, they only appear when credential information has been entered, which does not occur in most sandbox environments. Second, all of the information is encrypted, which makes understanding the activity more difficult for most analysts. Third, they employ basic anti-analysis techniques that we’ll describe in a moment.
Outside of those discussed by Group-IB, there are two additional anti-analysis techniques employed by this phishing kit.
1. Code Packing
Figure 18: Before and After Code Unpacking
While this isn’t something difficult to bypass, it does hinder the ability to look for exact string matches in the content.
2. Anti Chrome Dev Tools
Second, this kit has anti-debugging set up to block analysis by Chrome’s Developer Tools. Any time Developer Tools is launched, the code triggers a Pause function:
Figure 19: Anti-Debugging Found in Phishing Kit
This, combined with the fact that all communications are encoded and then encrypted, complicates analysis further. Fortunately, this can easily be bypassed by using an off-the-shelf proxy:
Figure 20: Decrypting and Decoding User-Submitted Payload in Fiddler and CyberChef
Concrete discussion about which vectors were used to deliver some malicious content at the beginning of an attack is something that rarely appears in threat intelligence reports. At SeclarityIO we believe that this is an underserved area that can be used to help all analysts understand tactics used repeatedly by adversaries. As we analyze more data going forward, we intend to eliminate this major blind spot. While we don't know everything that we'd like for this report, we do still know some things about which vectors were used to deliver the kit.
At a high level, phishing is the #1 vector for delivery in this attack (as is true in a large majority of today’s attacks). However, we also know that those phishing emails employed various techniques to hide their intent, and that some attacks originated outside of email. Since May 2020, the techniques we observed are as follows:
1. URL Shorteners
The use of URL shorteners can help to bypass some email protections, as well as add an air of legitimacy to a URL that disarms a suspicious user. The shorteners below were utilized.
URL Shorteners Observed
To see an example of what it looks like when a cutt[.]ly link is clicked and the user is redirected to a phishing portal, head to the How to Know If I’m Affected section.
2. Email Marketing Platforms
There were many samples which used sendgrid[.]net to bypass email filters and end up in users’ inboxes.
3. Compromised Sites
Compromising sites is additional work for the adversary, but doing so provides them with the ability to hide more stealthily within an otherwise benign site. While we won’t mention the sites that were compromised, there were several of them.
4. Malicious Domains
There were many domains set up by attackers with the intention to bypass security platforms by being unknown. The full list of known malicious domains can be found in the indicators section of this report.
5. Content Preview and Hosting Sites
Platforms that allow users to host arbitrary content on them are often used by attackers to quickly set up content and bypass security defenses. In addition to those platforms associated with the Phishing Portals, the following were observed:
A few samples had redirects through Google’s advertising infrastructure, specifically googleads.g.doubleclick[.]net
7. Open Redirects
It is possible to use certain sites in a malicious way without compromising them. One such example that appeared in the data set is hangouts.google[.]com (see “Open redirects ... and why Phishers love them,” June 18, 2021).
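To make the vectors above easier to spot in your own logs, the sketch below tags a URL by the delivery technique its hostname suggests. The four hostnames are the ones named in this section (written without the defanging brackets); the label names and the suffix-matching approach are my own assumptions, and the mapping is far from complete.

```python
from urllib.parse import urlsplit

# Hostnames named in this report, mapped to illustrative label names.
VECTOR_HOSTS = {
    "cutt.ly": "url_shortener",
    "sendgrid.net": "email_marketing_platform",
    "googleads.g.doubleclick.net": "advertising_infrastructure",
    "hangouts.google.com": "open_redirect",
}


def classify_vector(url):
    """Return the label for the first matching host or subdomain, else 'unclassified'."""
    host = (urlsplit(url).hostname or "").lower()
    for suffix, label in VECTOR_HOSTS.items():
        # Match the host itself or any subdomain, but not look-alike names.
        if host == suffix or host.endswith("." + suffix):
            return label
    return "unclassified"


# Hypothetical URLs for illustration.
shortener = classify_vector("https://cutt.ly/abc123")
marketing = classify_vector("https://u123.ct.sendgrid.net/ls/click?upn=x")
lookalike = classify_vector("https://evilcutt.ly/abc123")
```

A match only says which service carried the link. Most traffic through these services is benign, so it should be combined with the other indicators in this report.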
Hunting and Detection
Since this phishing kit has been affecting thousands of organizations across more than four years, it is clear that there is not enough concrete information about how to successfully identify this activity across different layers of the security stack. In this section, I provide indicators that allow security teams not only to identify what exists today (“specific indicators”), but also to identify the activity more broadly.
There is a wealth of specific indicators visible at various layers. These will be useful for detecting compromises that have already occurred, as well as campaigns that do not change after this report is published. Note that as you review the indicators listed below, some of them have been sampled by analyzing one out of every 100 samples (across the total sample size of ~7400). This is because there is a massive number of sites and domains associated with this time period, and the value of many of these indicators has already expired (as the attack infrastructure evolves).
Known Front-End Phishing Portals by Hosting Platform (05/01/2020-11/04/2021)
Sampling of Known Redirector Sites (05/01/2020-11/04/2021)
This site went undetected for 8 months until we worked with a vendor to block it.
Sampling of Known Template Hosting Sites (05/01/2020-11/04/2021)
Known Credential Collection Sites (05/01/2020-11/04/2021)
Known Malicious Domains (05/01/2020-11/04/2021)
6. Recaptcha Key
Inside the core phishing kit code, which has been repackaged unchanged for the last 4+ years, there is a reference to the kit’s Google Recaptcha site key:
Figure 21: Recaptcha Key in Beautified Code
Checking in with Google on this, I learned that this was in fact able to help identify a significant number of sites.
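If you have unpacked page content available, a site key can be pulled out with a simple pattern and compared against a known value. This is a sketch under assumptions: the pattern reflects the commonly observed shape of reCAPTCHA site keys (40 characters beginning with "6L"), which is not a documented guarantee, and the key below is a synthetic placeholder rather than the kit's actual key.

```python
import re

# Assumed shape of a site key: "6L" followed by 38 URL-safe characters.
SITE_KEY_PATTERN = re.compile(
    r"(?<![0-9A-Za-z_-])6L[0-9A-Za-z_-]{38}(?![0-9A-Za-z_-])"
)


def extract_site_keys(text):
    """Return the set of strings in text that look like reCAPTCHA site keys."""
    return set(SITE_KEY_PATTERN.findall(text))


def uses_known_key(text, known_key):
    """True if the known site key appears among the extracted candidates."""
    return known_key in extract_site_keys(text)


# Synthetic placeholder key; substitute the key shown in the figure.
PLACEHOLDER_KEY = "6L" + "x" * 38
page_source = 'grecaptcha.execute("' + PLACEHOLDER_KEY + '", {action: "login"});'
found_keys = extract_site_keys(page_source)
```

Matching on the exact key is the high-confidence check; the generic pattern is mainly useful for collecting candidate keys from many pages so they can be compared.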
7. Strings and Regexes
Broad indicators allow this activity to continue to be detected while the underlying phishing kit remains the same, even as the infrastructure itself evolves. Because this kit is used by many criminal groups and has remained static for over four years, it is likely that these indicators will remain useful long-term.
1. Open-Source Libraries Loading
In every instance of this phishing kit’s use, there are a handful of libraries requested in order. This can be seen (after unpacking and beautifying) in the following code:
Figure 23: Block of Code Loading Open Source Libraries
Comparing the first-known sample with one from recent days, we see this behavior matching identically:
Figure 24: Open Source Libraries Load Identically Over Time
If you have access to a decrypted version of the information you can match things exactly as seen above. However, it’s also possible to identify this activity on the network! Below we see how an example of this activity appears in NetworkSage. Multiple requests to the same CDN are grouped together into one encrypted session, which provides a more succinct view:
Figure 25: Network View of Open Source Libraries Loading
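When decrypted request URLs are available, the "same libraries, same order" indicator reduces to an ordered-subsequence check. A minimal sketch follows; only mobile-detect.min.js comes from this report, while the other file names and the CDN host are placeholders to be replaced with the names shown in the figures.

```python
from posixpath import basename
from urllib.parse import urlsplit


def loads_in_order(expected_files, requested_urls):
    """True if expected_files appear, in order, among the requested file names.
    Other requests may be interleaved between them."""
    requested = iter(basename(urlsplit(url).path) for url in requested_urls)
    # 'name in requested' consumes the iterator up to the match, which
    # enforces ordering across successive expected names.
    return all(name in requested for name in expected_files)


# Placeholder library names, except mobile-detect.min.js.
expected = ["lib-a.min.js", "mobile-detect.min.js", "lib-b.min.js"]
observed = [
    "https://cdn.example/lib-a.min.js",
    "https://cdn.example/styles.css",
    "https://cdn.example/mobile-detect.min.js?v=1",
    "https://cdn.example/lib-b.min.js",
]
in_order = loads_in_order(expected, observed)
out_of_order = loads_in_order(expected, list(reversed(observed)))
```

Allowing interleaved requests keeps the check tolerant of unrelated assets, at the cost of more false positives on sites that happen to load the same popular libraries.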
2. Content Loading from Cloud Hosting Platforms
The second indicator for these attacks (and for a wider range of attacks) is how common a given cloud-hosted site is. Since these sites are acting as nearly one-use phishing portals in this activity, it’s likely that you’ll see that they are incredibly uncommon across the global population:
Figure 26: Uncommon Cloud-Hosted Activity
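The prevalence idea can be sketched against your own telemetry, even without a global view. The example below assumes you can export (client, hostname) pairs from proxy or DNS logs; the 1% threshold and all hostnames are invented, and a single organization's population is a much weaker baseline than a global one.

```python
from collections import defaultdict


def rare_hosts(observations, max_share=0.01):
    """Return hostnames contacted by at most max_share of all distinct clients.
    observations is an iterable of (client_id, hostname) pairs."""
    clients_by_host = defaultdict(set)
    all_clients = set()
    for client, host in observations:
        clients_by_host[host].add(client)
        all_clients.add(client)
    total = len(all_clients)
    return sorted(
        host
        for host, clients in clients_by_host.items()
        if len(clients) / total <= max_share
    )


# Invented telemetry: 200 clients use a common app, one also visits a one-off site.
telemetry = [(f"client-{i}", "common-app.cloud.example") for i in range(200)]
telemetry.append(("client-7", "one-off-portal.cloud.example"))
uncommon = rare_hosts(telemetry)
```

Rarity alone is not malicious; it is most useful for prioritizing which cloud-hosted sites to review alongside the other indicators.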
How to Know if I'm Affected
How far the phishing attack got -- as well as how it arrived for your users -- can be learned by analyzing the network traffic around the activity. For example, if your users were targeted by an attack that arrived via phishing through your Microsoft 365 instance, you’ll likely see activity like the following:
Figure 27: User Clicked on Email Containing cutt[.]ly Link for Phishing Portal
In the case above, we also discover that the user very likely received an email where the malicious domain was hidden behind a Cuttly URL shortening link, one of many URL shorteners that have been used to deliver malicious links in this and various other phishing attacks.
To learn if your users have entered credentials, there are two things to look for. First, you should be on the lookout for recent known Credential Collection sites. These are automatically labeled and described for your convenience in NetworkSage:
Figure 28: Labeled Credential Collection Site
Second, if the site is not yet recognized, be on the lookout for uncommonly-occurring activity that suggests a C2-like channel is set up soon after other indicators of this attack:
Figure 29: Site with C2-like Behavior Near Known PerSwaysion Indicators
If you submit your samples to NetworkSage and don’t see any indication of the C2 activity above, it’s likely that the user decided to leave the site before entering any information.
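For teams working from raw connection logs, the second check can be roughly approximated as "an unrecognized host contacted repeatedly shortly after a known indicator". This sketch is a crude stand-in for the C2-like labeling shown above, not a description of how NetworkSage works; the 10-minute window, the three-connection threshold, and the collector hostname are assumptions.

```python
from datetime import datetime, timedelta


def c2_like_hosts(events, known_indicators,
                  window=timedelta(minutes=10), min_connections=3):
    """Return unknown hosts contacted at least min_connections times within
    window after the first known indicator. events holds (datetime, host) pairs."""
    events = sorted(events)
    first_hit = next((t for t, host in events if host in known_indicators), None)
    if first_hit is None:
        return []
    counts = {}
    for t, host in events:
        if host not in known_indicators and first_hit <= t <= first_hit + window:
            counts[host] = counts.get(host, 0) + 1
    return sorted(host for host, n in counts.items() if n >= min_connections)


# Invented timeline; wancdnapp.page is the domain discussed earlier in this report.
start = datetime(2021, 11, 1, 12, 0, 0)
timeline = [
    (start, "wancdnapp.page"),
    (start + timedelta(seconds=20), "cdn.example"),
    (start + timedelta(seconds=30), "collector.example"),
    (start + timedelta(seconds=90), "collector.example"),
    (start + timedelta(seconds=150), "collector.example"),
]
suspects = c2_like_hosts(timeline, {"wancdnapp.page"})
```

An empty result is consistent with the user leaving before entering credentials, but it does not prove it, since the thresholds here are arbitrary.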
There are several loose ends that I’ve come across in the investigation of this phishing kit that I’d like to share with the community in hopes that it helps to continue crippling this infrastructure and the group behind the kit.
1. How is this kit marketed?
I am far from an expert on Dark Web activity, but my searching on various forums turned up no meaningful leads. Moreover, none of the strings (outside of those that I’ve mentioned) in any of the files I’ve analyzed have appeared anywhere on the Internet.
2. Who developed this kit?
Group-IB’s report identified that the developers likely spoke Vietnamese natively, but no other ties to the developers themselves were mentioned (there were references to users who bought the platform, but that isn’t what I’m interested in). Despite extensively searching a couple of possible leads associated with Vietnamese developers, my search turned up nothing fruitful.
3. What was the anytools[.]biz site?
While I found references to anytools[.]biz app development in many samples that existed from 2019 onward, I was unable to find any historical information (including via the Internet Archive) about this site or its contents.
4. Is this a view of an early UI?
While analyzing one site that served as a Credential Collection site in mid-2019 (dtvd[.]biz), I noticed that one of the samples that appeared in Urlscan had a page that referenced a Gmail Auto Login GUI:
Figure 30: UI Found on Credential Collection Site
This type of GUI would be useful for somebody who was trying to quickly validate whether the credentials entered were valid and valuable. It’s likely that this was a quick app spun up by an attacker for their own campaign, but it was the only piece of control infrastructure for which I was able to find visual evidence.
This analysis would not have been possible without the contributions of many creators inside and outside of the security community. As such, I wanted to specifically share the tools that I used as a thank you and as a reference for others.
Urlscan for significant plaintext site analysis and historical comparison
NetworkSage for identifying shared infrastructure, finding an active C2 domain, and allowing users to know whether or not they were affected
Fiddler for decrypting and reviewing communications to phishing portals