One site that I manage that has a lot of complex disease-related data that is given out for free to the public directly via the web site and also via an iOS app.  The entire site is dynamic and derived from multiple back end sources and subsequently causes a bit of a hit to render the information together.  We have often seen other entities (IPs originating from China & Germany recently) decide they want to “scrape” the entire site for all of the content without throttling their connections.  Due to our limited resources (only two front-end application servers), when someone hits us this way, this effectively creates a minor DoS for us.   The best place to try to control for these sort of scenarios would of course be at the load balancer device, but I don’t have control of configuration of them and so I have had to try to take matters into my own hands.

While there are quite a number of ways to cap someone’s impact on your server (mod_bw, mod_ratelimit, IP Tables, etc), most often, these are rendered fairly useless by the client’s IP being replaced by the IP of the LTM (traffic manager / load balancer).

Here’s how I effectively was able to limit other entities ability to hit our site beyond what it could handle.

Step 1:  Install mod_security (and mod_unique if not already loaded) – my mod_security was version 2.7 on CentOS

Step 2: Create a basic config for mod security
########## MOD SEC Basic Config File ###########
LoadModule unique_id_module modules/mod_unique_id.so
LoadModule security2_module modules/mod_security2.so
SecRuleEngine On
SecDataDir /tmp
SecTmpDir /tmp
SecDebugLog /var/log/httpd_logs/modsec_debug.log
SecDebugLogLevel 0
##########################################

Step 3: Add the specific connection limiting configuration anywhere inside the VirtualHost directives of the site you are trying to protect:

####### SPECIFIC rate limiting for this site inside apache VirtualHost #########
<LocationMatch “^/(?!(?:jpe?g|png|bmp|gif|css|js|svg))(.*)”>
#counts everything but images and such
SecAction initcol:ip=%{X-Forwarded-For},pass,nolog,id:4444446
SecAction “phase:5,deprecatevar:ip.mysiteconncounter=1/1,pass,nolog,id:4444447”
SecRule IP:MYSITECONNCOUNTER “@gt 150” “id:4444448,phase:2,pause:300,deny,status:509,setenv:RATELIMITED,skip:1,nolog”
SecAction “id:4444449,phase:2,pass,setvar:ip.mysiteconncounter=+1,nolog”
Header always set Retry-After “15” env=RATELIMITED
</LocationMatch>
ErrorDocument 509 “Rate Limit Exceeded”
############## End rate limit configuration for virtual host here ###########

Step 4: Restart apache!

Some additional notes:
* You can limit what portion of your site is watched in the LocationMatch (instead of the whole site like I did)
* With Mod Security 2.7, I guess you need to add unique rule ID’s to each rule.  For these numbers I chose arbitrarily 4444446 – 4444449
* Change the “@gt 150” to a lower number if you want to lower the threshold for number of connections (each second the number of connections decreases by 1 for each IP) or the speed to which it deprecates the IP connection stat (1/1)
* If you would like to protect multiple sites (additional VirtualHosts on the same server), use a different variable for collecting IPs. In my case above I used “mysiteconncounter” so make sure it is something different, otherwise they will stomp on each other.
* You could probably return a different error code rather than 509 as it is only really partially supported out there.