
RequestLimitMask and BanlistMask

Anton
23 February 2015, 18:38
I added this Googlebot subnet to BanlistMask so that it does not get banned:

BanlistMask = deny 66.249.64.0/19


Do I need to use RequestLimitMask in the same way?
The manual is a little misleading on this point.

What is the best way to prevent parsing (bad bots)?
It looks like ChallengeClient is the best option for that.
Hugo Leisink
23 February 2015, 18:43
No, Googlebot won't send any requests that need special rights (extra time, a larger maximum request size, etc.). So RequestLimitMask is not needed for Googlebot.

What do you mean by 'prevent parsing'?
Anton
23 February 2015, 19:31
Thanks for the answer. Now I understand the difference.

I am trying to figure out which Hiawatha options are best for banning bots that crawl or parse sites in one or more threads. In some cases such activity looks like a DoS attack.
Hugo Leisink
23 February 2015, 19:33
For that, take a look at the Header UrlToolkit option:
UrlToolkit {
    ToolkitID = ban_bots
    Header User-Agent googlebot DenyAccess
    Header User-Agent evilbot Ban 3600
}
Anton
23 February 2015, 19:58
Thanks for the answer.

As I understand it, this line

Header User-Agent googlebot DenyAccess


will ban all fake bots that use googlebot as their User-Agent.

And this line

BanlistMask = deny 66.249.64.0/19


will not ban real googlebots.

Hugo Leisink
24 February 2015, 13:19
The first line will ban ALL bots using that User-Agent, but the BanlistMask can make an exception for IP ranges.
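To see which client addresses a mask entry such as 66.249.64.0/19 covers, a minimal sketch using Python's standard ipaddress module may help. It only tests CIDR membership and does not model how Hiawatha itself evaluates BanlistMask. The helper name in_mask and the two test addresses are made up for illustration.

```python
import ipaddress

# Subnet taken from the BanlistMask line quoted in this thread.
MASKS = [ipaddress.ip_network("66.249.64.0/19")]

def in_mask(ip: str) -> bool:
    """Return True if ip falls inside any subnet in MASKS (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in MASKS)

# Arbitrary test addresses, chosen only to show both outcomes.
for ip in ("66.249.64.1", "66.249.96.1"):
    print(ip, "inside mask" if in_mask(ip) else "outside mask")
```

A /19 starting at 66.249.64.0 spans 66.249.64.0 through 66.249.95.255, so the first address is inside the range and the second is just past its end.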
Anton
8 March 2015, 13:14
Thanks for the answer.
I am now testing these rules on a new site.
Some spam bots and other bots are currently hitting the new site with Googlebot's user agent.
Some of them are listed in the stopforumspam database and in other databases.

Current setup:

Hiawatha v9.12, cache, IPv6, Monitor, reverse proxy, SSL (1.3.10), Tomahawk, URL toolkit, XSLT

The config includes Google subnets in BanlistMask:

ServerId = www-data
ServerString = Server
ConnectionsTotal = 1000
ConnectionsPerIP = 10
SystemLogfile = /var/log/hiawatha/system.log
GarbageLogfile = /var/log/hiawatha/garbage.log
ExploitLogfile = /var/log/hiawatha/exploit.log

Binding {
    Port = 80
}

BanlistMask = deny 66.249.64.0/19, deny 72.14.192.0/18
BanOnGarbage = 600
BanOnFlooding = 10/1:600
BanOnMaxPerIP = 600
ChallengeClient = 200, httpheader, 1800
KickOnBan = yes
RebanDuringBan = yes

Hostname = 111.111.111.111
WebsiteRoot = /var/www/html
StartFile = index.html
AccessLogfile = /var/log/hiawatha/access.log
ErrorLogfile = /var/log/hiawatha/error.log

UrlToolkit {
    ToolkitID = ban_bots
    Header User-Agent googlebot DenyAccess
}

VirtualHost {
    Hostname = domain.name
    WebsiteRoot = /var/www/html/domain.name
    UrlToolkit = ban_bots
}


error.log includes the banned IP of a real Googlebot as well as IPs of fake Googlebots:

36.81.174.132|Sun 08 Mar 2015 14:38:39 +0300|access denied via URL toolkit rule
222.124.149.178|Sun 08 Mar 2015 14:38:53 +0300|access denied via URL toolkit rule
66.249.79.51|Sun 08 Mar 2015 14:39:17 +0300|access denied via URL toolkit rule
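To sort entries like these quickly, here is a rough Python sketch that splits each line on the pipe character and reports whether the client address lies inside one of the subnets listed in BanlistMask. It assumes the three-field layout seen above (IP, timestamp, message). The names LogEntry, parse_line and covered_by_mask are invented for this example and are not part of Hiawatha.

```python
import ipaddress
from typing import NamedTuple

class LogEntry(NamedTuple):
    ip: str
    timestamp: str
    message: str

# Subnets copied from the BanlistMask line in the config above.
MASKS = [ipaddress.ip_network(n) for n in ("66.249.64.0/19", "72.14.192.0/18")]

def parse_line(line: str) -> LogEntry:
    """Split an 'ip|timestamp|message' line into its three fields."""
    ip, timestamp, message = line.strip().split("|", 2)
    return LogEntry(ip, timestamp, message)

def covered_by_mask(ip: str) -> bool:
    """Return True if ip lies inside any subnet in MASKS."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in MASKS)

# The three error.log lines quoted above.
LINES = [
    "36.81.174.132|Sun 08 Mar 2015 14:38:39 +0300|access denied via URL toolkit rule",
    "222.124.149.178|Sun 08 Mar 2015 14:38:53 +0300|access denied via URL toolkit rule",
    "66.249.79.51|Sun 08 Mar 2015 14:39:17 +0300|access denied via URL toolkit rule",
]

for entry in map(parse_line, LINES):
    where = "inside" if covered_by_mask(entry.ip) else "outside"
    print(f"{entry.ip}: {where} the listed subnets, {entry.message}")
```

Of the three addresses, only 66.249.79.51 falls inside a listed subnet (66.249.64.0/19), which matches the observation that one of the denied clients is in the Google range.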


Also, system.log includes strange lines like these with real Googlebot IPs:

Sun 08 Mar 2015 13:59:16 +0300|Hiawatha v9.12 stopped.
Sun 08 Mar 2015 13:59:17 +0300|Hiawatha v9.12 started.
66.249.79.67|Sun 08 Mar 2015 14:24:28 +0300|Client kicked
Sun 08 Mar 2015 14:24:28 +0300|Hiawatha v9.12 stopped.
Sun 08 Mar 2015 14:24:29 +0300|Hiawatha v9.12 started.
66.249.79.51|Sun 08 Mar 2015 14:35:52 +0300|Client kicked
Sun 08 Mar 2015 14:35:52 +0300|Hiawatha v9.12 stopped.
Sun 08 Mar 2015 14:35:52 +0300|Hiawatha v9.12 started.



Is this a misconfiguration?
Hugo Leisink
9 March 2015, 09:43
No, those lines in system.log mean that a client had multiple connections to the webserver. Because it 'misbehaved' in one connection, all its other connections are closed (kicked) as well. Those messages are written to system.log because it is not always possible to find out which (virtual) host the client was sending requests to (for example, if it had not sent one yet).
This topic has been closed.