
gnico
Nimbostratus
Nov 08, 2022

iRule to block requests from amazonaws.com

Hello,

I have an iRule to block requests from bad amazonaws.com crawlers (millions of requests a day), but it doesn't work: its total executions count is 0.

Here is the code:

when HTTP_REQUEST  {
    if { [matchclass [string tolower [HTTP::header Host]] contains blacklist_host] } {
        reject
    }
}

My data group blacklist_host contains an amazonaws.com entry.

Does anyone have a solution? Thank you.

  • The issue here is that amazonaws.com is not the Host value. The amazonaws.com bot is making a request to your site, so the Host value is still your HTTP Host. To find a crawler bot, you'd want to use the User-Agent header.

    when HTTP_REQUEST  {
        if { [matchclass [string tolower [HTTP::header User-Agent]] contains blacklist_host] } {
            reject
        }
    }

    But then it may also be useful to consider using a robots.txt file: https://developers.google.com/search/docs/crawling-indexing/robots/intro, which you could host directly from an iRule:

    when HTTP_REQUEST {
      if { [HTTP::uri] == "/robots.txt" } {
        HTTP::respond 200 content [ifile get robots.txt]
      }
    }
  • gnico
    Nimbostratus

    Thank you for your reply.

    What I want is a property like Apache's remote_host. In my Apache logs, I have millions of hits from remote_host ec2-xx-xxx-xxx-xx.eu-west-3.compute.amazonaws.com

    They are malicious bots with an ordinary User-Agent like Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78, and they come from many different IP addresses.

    So the last and only solution I have found is to block them by their remote host. I don't want to use Apache rules; I want to block them before they reach Apache.

    Is there a way?

    Thank you

    • Kevin_Stewart
      Employee

      Apache remote_host is essentially a reverse DNS lookup. You can do this in an iRule:

      1. Create a resolver object. The resulting configuration, as listed in tmsh:

      list net dns-resolver my-resolver
      net dns-resolver my-resolver {
          forward-zones {
              . {
                  nameservers {
                      10.1.20.1:domain { }
                  }
              }
          }
          route-domain 0
      }

      2. Create an iRule that uses the resolver object.

      Ref: https://clouddocs.f5.com/api/irules/RESOLVER__name_lookup.html

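      # Reverse-resolve an IPv4 address; returns the PTR target, or an empty string on failure.
      proc resolv_ptr_v4 { addr_v4 } {
          # Split the dotted quad into octets; give up unless all four parse.
          set ret [scan $addr_v4 {%d.%d.%d.%d} a b c d]
          if { $ret != 4 } {
              return
          }
          # Query the PTR record for the reversed address under in-addr.arpa.
          set ret [RESOLVER::name_lookup "/Common/my-resolver" "$d.$c.$b.$a.in-addr.arpa" PTR]
          # Keep only the first resource record of the answer section.
          set ret [lindex [DNSMSG::section $ret answer] 0]
          if { $ret eq "" } {
              return
          }
          # The last field of the record is its rdata, i.e. the host name.
          return [lindex $ret end]
      }
      when CLIENT_ACCEPTED {
          set result [call resolv_ptr_v4 [IP::client_addr]]
          log local0. $result
          ## put your data group search here
      }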
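
      One way to fill in the data group search placeholder is sketched below. It is only a sketch under a few assumptions: it replaces the CLIENT_ACCEPTED block above and lives in the same iRule as resolv_ptr_v4; blacklist_host is the string data group from the original post; the subtable name ptr_cache, the "-" placeholder for failed lookups, and the 3600-second timeout are illustrative choices, not anything from the thread. The cache is there because a DNS query per connection would likely be costly at millions of hits a day.

      ```tcl
      when CLIENT_ACCEPTED {
          set client_ip [IP::client_addr]

          # Reuse a cached PTR result for this client IP if one exists
          # (ptr_cache is a hypothetical subtable name).
          set ptr_name [table lookup -subtable ptr_cache $client_ip]
          if { $ptr_name eq "" } {
              # Normalize the lookup result: lower-case, no trailing root dot.
              set ptr_name [string trimright [string tolower [call resolv_ptr_v4 $client_ip]] "."]
              if { $ptr_name eq "" } {
                  # No PTR record (or a non-IPv4 client): cache a placeholder
                  # so the lookup is not repeated on every connection.
                  set ptr_name "-"
              }
              # Keep the entry for one hour (assumed value; tune as needed).
              table set -subtable ptr_cache $client_ip $ptr_name 3600
          }

          # Drop the connection when the host name ends with any entry in the
          # data group, e.g. amazonaws.com.
          if { [class match $ptr_name ends_with blacklist_host] } {
              reject
          }
      }
      ```

      Because the check runs in CLIENT_ACCEPTED, matching connections are rejected before any HTTP request is parsed. Clients without a PTR record are allowed through in this sketch.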

       

    • Hi gnico,

      Since you receive millions of hits, and if you have an Advanced WAF license on your F5 BIG-IP appliance, I think configuring Bot Defense on your F5 would be a good workaround.
      Please check this video:
      https://www.youtube.com/watch?v=zSw4boZmNBA

      Then monitor the traffic in the ASM event logs, and keep track of your CPU and memory usage as well.