Scuba Gear from CISA, ROBLOX Malware Campaign, and RUST backdoo-rs
Hello, this week Jordan_Zebor is your editor looking at the notable security news for Scuba Gear from CISA, a ROBLOX malware campaign, and a Rust-based meterpreter named backdoo-rs.

Scuba Gear from CISA

ScubaGear is a CISA-developed tool designed to assess and verify whether a Microsoft 365 (M365) tenant’s configuration aligns with the Secure Cloud Business Applications (SCuBA) Security Configuration Baseline. This tool ensures that organizations are following CISA’s recommended security settings for cloud environments, helping to identify vulnerabilities or misconfigurations in their M365 setup. The value of running ScubaGear lies in its ability to enhance an organization’s cybersecurity posture, mitigate risks, and maintain compliance with security standards, which is crucial for protecting sensitive data in cloud-based systems.

ScubaGear addresses the growing need for secure cloud deployments by automating the assessment process, making it easier for IT and security teams to identify gaps and take corrective actions. Regular assessments with this tool can help reduce the chances of data breaches, unauthorized access, and other security threats, thereby maintaining the integrity and confidentiality of business operations. Additionally, it supports organizations in staying ahead of compliance requirements by ensuring they meet the security baselines recommended by CISA.

ROBLOX Malware Campaign

Checkmarx recently discovered a year-long malware campaign targeting Roblox developers through malicious npm packages that mimic the popular “noblox.js” library. The attackers used tactics like brandjacking and typosquatting to create malicious packages that appeared legitimate, aiming to steal sensitive data like Discord tokens, deploy additional payloads, and maintain persistence on compromised systems. Despite efforts to remove these packages, new versions keep appearing on the npm registry, indicating an ongoing threat.
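Typosquatting campaigns like this one depend on package names that are nearly, but not exactly, identical to a trusted name. As an illustration only (the candidate names and threshold below are hypothetical, not drawn from the Checkmarx report), a defender could screen dependency names against a known-good name with a simple similarity check:

```python
from difflib import SequenceMatcher

# Known-good package name from the campaign described above.
LEGITIMATE = "noblox.js"

def typosquat_score(candidate: str, legitimate: str = LEGITIMATE) -> float:
    """Similarity ratio in [0, 1]; near-1 (but not exact) names are suspicious."""
    return SequenceMatcher(None, candidate.lower(), legitimate.lower()).ratio()

# Hypothetical candidate names; flag near-matches that are not exact.
for name in ["noblox.js", "nobloxjs", "noblox.js-async", "left-pad"]:
    score = typosquat_score(name)
    flag = "SUSPICIOUS" if 0.8 <= score < 1.0 else "ok"
    print(f"{name:18} {score:.2f} {flag}")
```

A real dependency screen would also compare against a corpus of popular package names and account for brandjacking suffixes, which a plain similarity ratio scores lower.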
RUST backdoo-rs

The article "Learning Rust for Fun and backdoo-rs" describes the author's journey of learning Rust by developing a custom meterpreter. While Rust is designed to avoid common programming errors, ensuring software is secure from the outset, using it to create red teaming tools is also a great use case. A key aspect I covered recently is how Rust helps eliminate vulnerabilities like buffer overflows and use-after-free errors. These are traditionally common in C and C++, but Rust's ownership model prevents such risks by ensuring safe memory usage. In addition, Rust's growing adoption in the cybersecurity community, driven by companies like Google and Microsoft, emphasizes its role in secure software development, underscoring the "secure by design" principles that CISA advocates for. Projects like "backdoo-rs" demonstrate Rust’s potential for secure, reliable development in any context.

How to Identify and Manage Scrapers (Pt. 2)
Introduction

Welcome back to part two of the article on how to identify and manage scrapers. While part one focused on ways to identify and detect scrapers, part two will highlight various approaches to prevent, manage, and reduce scraping.

9 Ways to Manage Scrapers

We'll start by highlighting some of the top methods used to manage scrapers to help you find the method best suited for your use case.

1. Robots.txt

The robots.txt file on a website contains rules for bots and scrapers, but it lacks enforcement power. Scrapers often ignore these rules and take whatever data they want, so other scraper management techniques are needed to enforce compliance.

2. Site, App, and API Design to Limit Data Provided to Bare Minimum

One way to manage scrapers is to remove access to the data they want, though this is not always feasible when that data is business-critical. Where it is feasible, designing websites, mobile apps, and APIs to limit or remove exposed data effectively reduces unwanted scraping.

3. CAPTCHA/reCAPTCHA

CAPTCHAs (including reCAPTCHA and other tests) manage and mitigate scrapers by presenting challenges that must be passed to prove human identity before data is served. However, they add friction and decrease conversion rates, and with advancements in recognition, computer vision, and AI, scrapers and bots have become adept at solving them, making CAPTCHAs ineffective against more sophisticated scrapers.

4. Honey Pot Links

Unlike humans, scrapers can see hidden elements on a web page, such as hidden form fields and links. Security teams and web designers can add these to web pages and then respond to transactions performed by scrapers, for example by forwarding them to a honeypot or serving incomplete results.

5. Require All Users to be Authenticated

Most scraping occurs without authentication, making it difficult to enforce access limits. To improve control, all users should be authenticated before they can request data.
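As a minimal sketch of method 5 (the token store, endpoint name, and quota below are hypothetical, not from the article), data endpoints can refuse unauthenticated requests and enforce per-account quotas, moving enforcement from anonymous IPs to accounts you control:

```python
from collections import Counter

VALID_TOKENS = {"tok-alice": "alice", "tok-bob": "bob"}  # hypothetical session store
ACCOUNT_DAILY_LIMIT = 3                                  # hypothetical per-account quota
requests_today = Counter()

def fetch_prices(session_token):
    """Gate a data request behind authentication and an account-level limit."""
    account = VALID_TOKENS.get(session_token)
    if account is None:
        return "401 Unauthorized"        # unauthenticated scrapers get nothing
    requests_today[account] += 1
    if requests_today[account] > ACCOUNT_DAILY_LIMIT:
        return "429 Too Many Requests"   # quota enforced per account, not per IP
    return "200 OK: price data"

print(fetch_prices("anonymous"))         # 401 Unauthorized
for _ in range(4):
    status = fetch_prices("tok-alice")
print(status)                            # 429 Too Many Requests
```

Account-level quotas only help if account creation itself is controlled, since scrapers can spread requests across many fake accounts.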
Less motivated scrapers may avoid creating accounts, while sophisticated scrapers may resort to fake account creation; F5 Labs has published an entire article series on fake account creation bots. These skilled scrapers distribute data requests among fake accounts, adhering to account-level request limits. Even so, requiring authentication discourages less motivated scrapers and improves data security.

6. Cookie/Device Fingerprint-Based Controls

Cookie-based tracking or device/TLS fingerprinting can be used to limit the requests any one user can make. These controls are invisible to legitimate users, but challenges include cookie deletion, fingerprint collisions, and fingerprint divisions. Advanced scrapers using tools like Browser Automation Studio (BAS) have anti-fingerprint capabilities, including fingerprint switching, which help them bypass these types of controls.

7. WAF Based Blocks and Rate Limits (UA and IP)

Web Application Firewalls (WAFs) manage scrapers by creating rules based on user agent strings, headers, and IP addresses, but these are ineffective against sophisticated scrapers, who use common user agent strings, large numbers of IP addresses, and common header orders.

8. Basic Bot Defense

Basic bot defense solutions use JavaScript, CAPTCHA, device fingerprinting, and user behavior analytics to identify scrapers. Because they do not obfuscate, encrypt, or randomize their signals collection scripts, sophisticated scrapers can easily reverse engineer them. These solutions also use IP reputation and geo-blocking, but they can be bypassed with new-generation automation tools like BAS and Puppeteer, or with high-quality proxy networks offering high-reputation IP addresses. Advanced scrapers can easily craft spoofed packets to bypass the defense system.

9. Advanced Bot Defense

Advanced enterprise-grade bot defense solutions use randomized, obfuscated signals collection and tamper protection to prevent reverse engineering.
They use encryption and machine learning (ML) to build robust detection and mitigation systems. These solutions are effective against sophisticated scrapers, including AI companies, and adapt to varying automation techniques, providing long-term protection against both identified and unidentified scrapers.

Scraper Management Methods/Controls Comparison and Evaluation

Table 1 (below) evaluates scraper management methods and controls, giving each a rating score out of 5, where higher scores indicate a more effective control.

Table 1: Scraper management methods/controls compared

Robots.txt
Pros: Cheap; easy to implement; effective against ethical bots
Cons: No enforcement; ignored by most scrapers
Rating: 1

Application redesign
Pros: Cheap
Cons: Not always feasible due to business need
Rating: 1.5

CAPTCHA
Pros: Cheap; easy to implement
Cons: Not always feasible due to business need
Rating: 1.5

Honey pot links
Pros: Cheap; easy to implement
Cons: Easily bypassed by more sophisticated scrapers
Rating: 1.5

Require authentication
Pros: Cheap; easy to implement; effective against less motivated scrapers
Cons: Not always feasible due to business need; results in a fake account creation problem
Rating: 1.5

Cookie/fingerprint based controls
Pros: Cheaper than other solutions; easier to implement; effective against low sophistication scrapers
Cons: High risk of false positives from collisions; ineffective against medium to high sophistication scrapers
Rating: 2

Web Application Firewall
Pros: Cheaper than other solutions; effective against low to medium sophistication scrapers
Cons: High risk of false positives from UA, header, or IP based rate limits; ineffective against high sophistication scrapers
Rating: 2.5

Basic bot defense
Pros: Effective against low to medium sophistication scrapers
Cons: Relatively expensive; ineffective against high sophistication scrapers; poor long term efficacy; complex to implement and manage
Rating: 3.5

Advanced bot defense
Pros: Effective against the most sophisticated scrapers; long term efficacy
Cons: Expensive; complex to implement and manage
Rating: 5

Conclusion

There are many methods of identifying and managing scrapers, as highlighted above, each with its pros and cons. Advanced bot defense solutions, though costly and complex, are the most effective against all levels of scraper sophistication. To read the full article in its entirety, including more detail on all the management options described here, head over to our post on F5 Labs.

How to Identify and Manage Scrapers (Pt. 1)
Introduction

The latest addition in our scraper series focuses on how to identify and manage scrapers, split across two parts. Part one focuses on outlining ways to identify and detect scrapers, while part two focuses on tactics to help manage them.

How to Identify Scraping Traffic

Identifying scraping traffic involves different detection methods depending on the scraper's motivations and approach. Some scrapers, like benign search bots, self-identify so that network and security teams can grant them permission. Others, like AI companies, competitors, and malicious scrapers, hide themselves, making detection difficult; more sophisticated approaches are needed to combat these types of scrapers.

Self-Identifying Scrapers

Several scrapers announce themselves and make it very easy to identify them. These bots self-identify using the HTTP user agent string, either because they have explicit permission or because they believe they provide a valuable service. They can be classified into three categories:

Search engine bots/crawlers
Performance or security monitoring
Archiving

Several scraper operators publish detailed information on their scrapers, including identification, IP addresses, and opt-out options. It's crucial to review these documents for scrapers of interest, as unscrupulous scrapers often impersonate known ones; the operators' websites often provide tools to verify whether a scraper is real or an imposter. Links to this documentation and screenshots are provided in our full blog on F5 Labs.

Many scrapers identify themselves via the user agent string, usually by adding a string that contains the following:
The name of the company, service, or tool doing the scraping
A website address for the company, service, or tool doing the scraping
A contact email for the administrator of the entity doing the scraping
Other text explaining what the scraper is doing or who they are

A key way to identify self-identifying scrapers is to search the user-agent field in your server logs for specific strings. Table 1 below outlines common strings you can look for.

Table 1: Search strings to find self-identifying scrapers (* is a wildcard)
Name of the tool or service: *Bot* or *bot*
Website address: *www* or *.com*
Contact email: *@*

Examples of User Agent Strings

OpenAI search bot user agent string:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot

Bing search bot user agent string:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/

These scrapers include both the name of the tool or service and a website in the user-agent string, so they can be identified using two of the methods highlighted in Table 1 above.

Impersonation

Because user agents are self-reported, they are easily spoofed: any scraper can pretend to be a known entity like Googlebot simply by presenting the Googlebot user agent string. We have observed countless fake bots impersonating large known scrapers like Google, Bing, and Facebook. As one example, Figure 1 shows the traffic overview of a fake Google scraper bot that was responsible for almost a hundred thousand requests per day against a large US hotel chain’s room search endpoints. The bot used the following user-agent string, which is identical to the one used by the real Googlebot.
Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

IP-based Identification

Scrapers can also be identified by their IP addresses. Whois lookups on a scraper's IP address can reveal the owning organization or registered ASN; while this does not always reveal the actual entity behind the traffic, it can be useful in certain cases, and geolocation information can also help identify automated scraping activity. Reverse DNS lookups use the Domain Name System (DNS) to find the domain name associated with an IP address, helping establish a scraper's identity; free online reverse DNS lookup services make these lookups easy. Since IP address spoofing is non-trivial, identifying and allowlisting scrapers using IP addresses is more secure than simply using user agents.

Artificial Intelligence (AI) Scrapers

Artificial intelligence companies are increasingly scraping the internet to train models, causing a surge in data scraping. This data is often used for for-profit AI services, which sometimes compete with the scraping victims, and several lawsuits against these companies are currently underway. A California class-action lawsuit has been filed by 16 claimants against OpenAI, alleging copyright infringement due to the scraping and use of their data for model training. Due to all the sensitivity around AI companies scraping data from the internet, a few things have happened:

Growing scrutiny has forced these companies to start publishing details of their scraping activity, including ways to identify their AI scrapers and ways to opt your applications out of being scraped.
AI companies have seen an increase in opt-outs from AI scraping, limiting their access to the data needed to power their apps.
Some less ethical AI companies have since set up alternative “dark scrapers” which do not self-identify and instead secretly continue to scrape the data needed to power their AI services.

Unidentified Scrapers

Most scrapers don't identify themselves or request explicit permission, leaving application, network, and security teams unaware of their activities on web, mobile, and API applications. Identifying these scrapers can be challenging, but we've used traffic pattern analysis in the past to help identify the organizations or actors behind them. In particular, we look for continuous scraping patterns, interval-based scraping patterns, and high-velocity scraping patterns as indicators of scraper origins; you can view each of these techniques in depth, with examples, in our blog post on F5 Labs.

Requests for Obscure or Non-Existent Resources

Scrapers often request obscure or low-volume pages and resources, such as flight availability and pricing, constructing requests manually and sending them directly to origin servers (an airline's, for example). Figure 2 shows an example of a scraper that was scraping an airline’s flights and requesting flights to and from a train station.

IP Infrastructure Analysis, Use of Hosting Infra or Corporate IP Ranges (Geo Location Matching)

Scrapers distribute traffic via proxy networks or botnets to avoid IP-based rate limits, but the infrastructure they use can itself give them away. Telltale tactics include:

Round-robin IP or UA usage
Use of hosting IPs
Use of low-reputation IPs
Use of international IPs that do not match expected user locations

The following are additional signals to keep in mind when trying to identify scrapers; we provide an in-depth overview of each in our full article on F5 Labs:

Conversion or look-to-book analysis
Not downloading or fetching images and dependencies, just data
Behavior/session analysis

Conclusion

Identifying scrapers targeting web, mobile, and API application data involves various methods.
Sometimes scrapers will self-identify, though this is easily spoofed by malicious bots; IP-based identification is more reliable when building allowlists. While some AI scrapers self-identify, others do not, or operate alternative "dark scrapers". Finally, unidentified scrapers require more advanced analysis of traffic patterns, unusual requests, infrastructure, and more. We go more in-depth on all of these identification tactics in our post on F5 Labs. Meanwhile, stay tuned for part two, where we’ll outline tactics to help manage scrapers.

What Are Scrapers and Why Should You Care?
Introduction

Scrapers are automated tools designed to extract data from websites and APIs for various purposes, posing significant threats to organizations of all sizes. They can lead to intellectual property theft, competitive advantage erosion, website/API performance degradation, and legal liabilities. Scraping is one of the automated threats cataloged by OWASP (OAT-011), defined as using automation to collect application content and/or other data for use elsewhere. It impacts businesses across various industries, and its legal status varies depending on geographic and legal jurisdictions.

What is Scraping?

Scraping involves requesting web pages, loading them, and parsing the HTML to extract the desired data and content. Examples of heavily scraped items include:

Flights
Hotel rooms
Retail product prices
Insurance rates
Credit and mortgage interest rates
Contact lists
Store locations
User profiles

Scrapers use automation to make many smaller requests and assemble the data in pieces, often with tens of thousands or even millions of individual requests. In the 2024 Bad Bots Review by F5 Labs, scraping bots were responsible for high levels of automation on two of the three most targeted flows, Search and Quotes, throughout 2023 across the entire F5 Bot Defense network (see Figure 1). In addition, up to 70% of all search traffic can originate from scrapers in the absence of advanced bot defense solutions, a figure based on numerous proof-of-concept analyses done for enterprises with no advanced bot controls in place.

Scraper versus Crawler or Spider

Scrapers differ from crawlers or spiders in their purpose. Crawlers and spiders index websites for search engines; scrapers are designed to extract and exfiltrate data and content from a website or API, which can then be reused, resold, and otherwise repurposed as the scraper intends.
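The request-parse-extract loop described under "What is Scraping?" can be sketched with Python's standard-library HTML parser; the page structure and class names below are hypothetical, chosen only to illustrate the technique:

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Minimal sketch of a scraper: parse HTML and collect text from
    elements tagged as prices (the 'price' class name is hypothetical)."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

page = """
<div class="room"><h3>Standard Room</h3><span class="price">$129</span></div>
<div class="room"><h3>Suite</h3><span class="price">$249</span></div>
"""
scraper = PriceScraper()
scraper.feed(page)
print(scraper.prices)  # ['$129', '$249']
```

A real scraper repeats this loop across thousands of URLs, which is why the request volumes described above reach into the millions.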
Scraping typically violates the terms and conditions of most websites and APIs, though courts have reached differing conclusions, with some cases overturning previous rulings. Most scrapers target information on the web, but activity against APIs is on the rise.

Business Models for Scraping

Many different parties are active in the scraping business, with different business models and incentives for scraping content and data; Figure 2 provides an overview of the various sources of scraping activity.

Search engine companies, such as Google, Bing, Facebook, Amazon, and Baidu, index content from websites to help users find things on the internet. Their business model is selling ads placed alongside search results.

Competitors scrape content and data from each other to win customers, market share, and revenue. They scrape the pricing and availability of competing products to gain market share through competitive pricing. Network scraping involves scraping the names, addresses, and contact details of a company's network partners, such as repair shops, doctors, hospitals, clinics, insurance agents, and brokers. Inventory scraping involves stealing valuable content and data from a competing site for use on their own site.

Other parties scrape for their own ends. Researchers and investment firms use scraping to gather data for their research and generate revenue by publishing and selling the results of their market research. Intellectual property owners use scraping to identify possible trademark or copyright infringements and to ensure compliance with pricing and discounting guidelines. Data aggregators collect and aggregate data from various sources and sell it to interested parties.
Some aggregators specialize in specific industries, while news aggregators use scrapers to pull news feeds, blogs, articles, and press releases from various websites and APIs. Artificial intelligence (AI) companies scrape data across various industries, often without identifying themselves, and as the AI space continues to grow, scraping traffic is expected to increase. Criminal organizations scrape websites and applications for various malicious purposes, including phishing, vulnerability scanning, identity theft, and intermediation. Criminals use scrapers to create replicas of a victim’s website or app that trick users into providing personally identifiable information (PII), and to probe for vulnerabilities in the website or application, such as flaws that give them access to discounted rates or back-end systems.

Costs of Scraping

The direct costs of scraping include infrastructure costs, degraded server performance and outages, loss of revenue and market share, and intermediary-driven intermediation. Companies prefer direct relationships with their customers for selling and marketing, customer retention, cross-selling and upselling, and customer experience; when intermediaries step in, companies can lose control over the end-to-end customer experience, leading to dissatisfied customers. Indirect costs include loss of investment, loss of intellectual property, reputational damage, and legal liability from questionable practices.

Conclusion

Scraping is a significant issue that affects enterprises worldwide in various industries; F5 Labs' research shows that almost 1 in 5 search and quote transactions are generated by scrapers. It is carried out by various entities, including search engines, competitors, AI companies, and malicious third parties.
The resulting costs include lost revenue, profits, market share, and customer satisfaction. For a deeper dive into the impact of scraping on enterprises and effective mitigation strategies, read the full article on F5 Labs.