Hiawatha and Pound interaction troubles

Rodney Hester
2 November 2010, 13:31
Hiawatha version: 7.3
Operating System: Tomato Linux K26

I'm using Pound as a front-end to Hiawatha. For some reason, every request that comes through Pound to Hiawatha appears to the latter as a new (and persistent) client connection, so that 150 requests later (the default maximum), _even if they are generated from the same host with the same wget in a looped script_, Hiawatha begins to report that it has reached max connections. This behavior is not exhibited if Pound is taken out of the loop.
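For concreteness, the looped test amounts to something like this (a minimal sketch; the hostname and port are placeholders for wherever Pound listens):

/* Minimal reproduction sketch: one fresh connection per request,
 * just like wget in a shell loop. Hostname and port are placeholders. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void) {
    const char *request =
        "GET / HTTP/1.0\r\n"
        "Host: tomato.multics.mooo.com\r\n"
        "Connection: close\r\n\r\n";
    struct addrinfo hints, *res;
    char buf[4096];

    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo("localhost", "80", &hints, &res) != 0)  /* Pound's address */
        return 1;

    for (int i = 0; i < 200; i++) {   /* more than the default of 150 connections */
        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
            perror("connect");        /* fails once Hiawatha reports max connections */
            break;
        }
        write(fd, request, strlen(request));
        while (read(fd, buf, sizeof(buf)) > 0)
            ;                         /* drain the response */
        close(fd);
    }
    freeaddrinfo(res);
    return 0;
}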

Any ideas what Pound might be doing to make every request look like a new client that never goes away? I could understand if Pound was reusing persistent connections, but based on lsof, that doesn't appear to be the case. I've toyed with various timeout and session parameters in Pound with no success.

I did _not_ see this behavior using Apache 2 as a backend.

Rodney
Rodney Hester
2 November 2010, 14:02
Wow. This only happens if I use the HideProxy directive. If I disable that, everything looks clean.

Any clue?

Rodney
Rodney Hester
2 November 2010, 14:48
As best I can tell, this has something to do with the fact that the actual socket is between Pound and Hiawatha. Once Hiawatha overrides the client IP (the one from Pound) with the one from X_FORWARDED_FOR, the socket can no longer be closed properly, because it was bound against one IP but closed against another (if that makes sense). I've looked through hiawatha.c and session.c and can see roughly how this could happen, but I can't follow the code well enough to figure out how to fix it. (I was thinking of adding another field to the session structure to hold the X_FORWARDED_FOR value and checking against it when writing to the weblog, but I'm not sure that would model the expected behavior 100% - is "hiding" intended for anything other than logging?)
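Something along these lines is what I had in mind (a rough sketch with placeholder names - these are not Hiawatha's actual structures):

/* Rough sketch with placeholder names, not Hiawatha's actual code:
 * keep the proxy's address for socket handling and the forwarded
 * address separately for logging/banning. */
#include <stdbool.h>
#include <sys/socket.h>

typedef struct {
    int  socket_fd;                       /* the Pound<->Hiawatha socket  */
    struct sockaddr_storage peer_addr;    /* real peer: Pound (127.0.0.1) */
    struct sockaddr_storage client_addr;  /* taken from X-Forwarded-For   */
    bool via_hidden_proxy;                /* true when HideProxy matched  */
} session_t;

/* Connection handling (close, keep-alive, client list) would key on
 * socket_fd/peer_addr; only logging and banning would look at this: */
static const struct sockaddr_storage *visible_address(const session_t *s) {
    return s->via_hidden_proxy ? &s->client_addr : &s->peer_addr;
}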

Rodney
Rodney Hester
2 November 2010, 14:50
Hrm...never mind. Hiding should also apply to banning, etc. so that the "real" (i.e. non-local) IP is also considered.

I have no idea how to fix this, then. :/

Rodney
Rodney Hester
2 November 2010, 15:01
FYI, this appears to be the same problem as http://www.hiawatha-webserver.org/forum/topic/487, but I can assure you the OP wasn't imagining it.

Rodney
Hugo Leisink
2 November 2010, 15:05
Can you test whether the same thing happens with this 7.4 beta version [www.leisink.org]? Please note that this is a beta version, so please do not distribute it.
Rodney Hester
2 November 2010, 15:23
Will check as soon as my toolchain finishes rebuilding. Here's what I'm seeing from the 'show clients' command when this is happening (127.0.0.1 is my HideProxy address):

Client ID  : 3
IP-address : 2001:470:1f07:b17:a00:27ff:fef0:cc9e
Hostname   : tomato.multics.mooo.com
Socket     : 8
Kept alive : 1

Client ID  : 2
IP-address : 172.16.0.152
Hostname   : multics.dynalias.com
Socket     : 8
Kept alive : 1

Client ID  : 1
IP-address : 2001:470:1f07:b17:a00:27ff:fef0:cc9e
Hostname   : tomato.multics.mooo.com
Socket     : 8
Kept alive : 1

Client ID  : 0
IP-address : 172.16.0.152
Hostname   : multics.dynalias.com
Socket     : 8
Kept alive : 1


Note that I'm seeing the same socket number over and over (as I should - it's a single socket between Pound and Hiawatha), but different IPs depending on the requesting client, which results in different client IDs. I imagine this is what's confusing client_list.
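In rough terms, here's a hypothetical sketch of what I suspect (not the real client_list code): if connection teardown searches by IP, an entry created under the X-Forwarded-For address can never be found again, because teardown only sees Pound's address:

#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not Hiawatha's actual code. */
typedef struct client {
    char ip[46];            /* after HideProxy: the forwarded IP   */
    int  socket_fd;         /* always the Pound<->Hiawatha socket  */
    struct client *next;
} client_t;

/* Teardown that matches on IP misses rewritten entries: it searches
 * for 127.0.0.1 (the proxy), but c->ip holds the forwarded address. */
client_t *find_by_ip(client_t *list, const char *peer_ip) {
    for (client_t *c = list; c != NULL; c = c->next)
        if (strcmp(c->ip, peer_ip) == 0)
            return c;
    return NULL;
}

/* Matching on the socket descriptor would still find them: */
client_t *find_by_socket(client_t *list, int fd) {
    for (client_t *c = list; c != NULL; c = c->next)
        if (c->socket_fd == fd)
            return c;
    return NULL;
}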

Rodney
Hugo Leisink
2 November 2010, 15:25
It looks like a bug in Hiawatha. I sent you the 7.4 beta because I had already fixed something in the client-list handling; it may very well solve your problem.
Rodney Hester
2 November 2010, 15:31
Same thing:

Client ID  : 4
IP-address : 2001:470:1f07:b17:a00:27ff:fef0:cc9e
Hostname   : tomato.multics.mooo.com
Socket     : 8
Kept alive : 1

Client ID  : 3
IP-address : 172.16.0.152
Hostname   : multics.dynalias.com
Socket     : 8
Kept alive : 1

Client ID  : 2
IP-address : 2001:470:1f07:b17:a00:27ff:fef0:cc9e
Hostname   : tomato.multics.mooo.com
Socket     : 8
Kept alive : 1

Client ID  : 1
IP-address : 172.16.0.152
Hostname   : multics.dynalias.com
Socket     : 8
Kept alive : 1

Total: 4 clients


Note that I am hitting a redirected URL (redirected via Hiawatha itself), which loops back through Pound, so every request is really 2. However, it doesn't seem to make any difference even if I hit the direct/underlying URL:

Client ID  : 8
IP-address : 2001:470:1f07:b17:a00:27ff:fef0:cc9e
Hostname   : tomato.multics.mooo.com
Socket     : 9
Kept alive : 1

Client ID  : 7
IP-address : 2001:470:1f07:b17:a00:27ff:fef0:cc9e
Hostname   : tomato.multics.mooo.com
Socket     : 9
Kept alive : 1


I'm sure I'm running the beta:

Tue 02 Nov 2010 10:29:29 -0400|Hiawatha v7.4 started

Rodney
Hugo Leisink
2 November 2010, 15:36
Do you see any messages in the error logfile?
Hugo Leisink
2 November 2010, 16:03
Can you please redownload the beta source file and show me the 'show clients' list again? I've added a timer field. Please wait at least 10 seconds between making the requests and printing the client list.
Rodney Hester
2 November 2010, 16:25
The last one (client 0) was 'regular' web traffic from the outside. Clients 1 and 2 were my tests, waiting 15 seconds between each request and the 'show clients' command:

Client ID  : 2
IP-address : 2001:470:1f07:b17:a00:27ff:fef0:cc9e
Hostname   : tomato.multics.mooo.com
Socket     : 8
Timer      : off
Kept alive : 1

Client ID  : 1
IP-address : 2001:470:1f07:b17:a00:27ff:fef0:cc9e
Hostname   : tomato.multics.mooo.com
Socket     : 7
Timer      : off
Kept alive : 1

Client ID  : 0
IP-address : 66.249.71.115
Hostname   : multics.dynalias.com
Socket     : 7
Timer      : off
Kept alive : 1

Total: 3 clients


Nothing in the error logs, and no "Client disconnected" messages in system.log.

Rodney
Hugo Leisink
2 November 2010, 16:33
The 'Timer' should not say 'off'. Can you send me your Hiawatha and Pound configurations? I'll have to do some debugging. Please e-mail them to hugo@hiawatha-webserver.org.

Which version of Pound are you using?
Rodney Hester
2 November 2010, 16:48
Mail sent.

Rodney
Rodney Hester
2 November 2010, 21:03
BTW, with HideProxy disabled, I'm unable to reproduce the issue. The number of active clients is virtually always zero, despite rapid pounding of the server for testing.

However, I did manage (in a *VERY* tight loop) to catch one of them (again with HideProxy disabled):

Client ID  : 62
IP-address : 127.0.0.1
Hostname   : tomato.multics.mooo.com
Socket     : 8
Timer      : off
Kept alive : 1

Total: 1 clients

Note that Timer still says off, yet Hiawatha (at least externally speaking) appears to be working as expected with no client buildup.

Interesting, no?

Rodney
Hugo Leisink
2 November 2010, 21:28
Can you redownload the beta package again? I think I've solved the problem.
Rodney Hester
2 November 2010, 22:12
Looks like that did it!

Any idea when 7.4 may be released? I want to respect your wish not to distribute the beta source, but at the same time, Hiawatha is emerging as the go-to web server on Tomato and similar embedded router platforms.

Rodney
Hugo Leisink
2 November 2010, 22:21
As you can read in the ChangeLog, 7.4 contains several other fixes and improvements. Especially with this important bugfix, my plan is to release it soon. I want to do a last check and test before actually releasing it. I'm quite busy this week, so it will probably be done next week. I'd appreciate it if you would share your test results with me.
Rodney Hester
3 November 2010, 19:00
So far, so good - 358 clients, no issues. I'll let it percolate for another 72 hours or so uninterrupted and let you know.

Rodney
Rodney Hester
4 November 2010, 12:45
Hrm.

I may need to open a separate thread on this. Yesterday I set up a test harness to fire one CGI request and one XSL/directory-listing request at Hiawatha every quarter second. While the client list remained sane through over 9,000 requests (yay!), the memory... well, that I'm having trouble explaining.

ps -eo pid,ppid,rss,vsz,cmd

11809 1 31932 46832 /opt/usr/sbin/hiawatha -c /opt/etc/hiawatha
11810 11809 31932 46832 /opt/usr/sbin/hiawatha -c /opt/etc/hiawatha
11812 11810 31932 46832 /opt/usr/sbin/hiawatha -c /opt/etc/hiawatha


So after the test, the combined three Hiawatha threads are using almost 32MB of resident RAM and over 46MB of virtual. Wow! Since the test (I left the daemon running), I've noticed that even lightweight requests coming in overnight from the 'net have continued to increase the memory footprint.

Is this a leak, or is there some internal cache/buffer that will reach a certain size and then stop growing? It blows my mind that it should need this much memory to serve a few small web sites. I'm not using FastCGI or anything else I can think of that would create this sort of growth, and I've confirmed the weblogs are being flushed to disk in a timely fashion, so they shouldn't be backlogged in RAM. Nothing at all in Hiawatha's system log past the "started" line from before testing began.

I unfortunately don't have valgrind on the MIPS side; I'm just wondering whether this is something to be concerned about (a memory footprint like this makes for a very difficult recommendation on embedded devices!).

Thoughts?

Rodney
Rodney Hester
4 November 2010, 12:47
BTW, just as a point of reference, after a clean start my RSS is at 848K and VSZ at 4.6MB.

Rodney
Rodney Hester
4 November 2010, 13:13
Looks like it's XSLT (or related).

500 PHP5 requests put me at a max (and fixed) RSS of 1.6MB and VSZ of 4.8MB - it remained unchanged after 200+ iterations, which looks reasonable.

However, when I started a battery of the same requests against a directory listing using XSLT, memory began to climb rapidly (approximately 350K of both RSS and VSZ per 50 requests), with no end in sight.
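For what it's worth, the usual libxslt cycle allocates several objects per transform, each of which must be freed every time; forgetting any one of these frees would leak at roughly this rate (a generic sketch of the library's usage pattern, not Hiawatha's actual code):

#include <libxml/parser.h>
#include <libxslt/xsltInternals.h>
#include <libxslt/transform.h>
#include <libxslt/xsltutils.h>

/* Generic libxslt usage pattern: everything created per request must
 * be freed per request, or memory grows with every directory listing. */
int transform(const char *xslt_file, xmlDocPtr doc,
              xmlChar **out, int *out_len) {
    xsltStylesheetPtr style =
        xsltParseStylesheetFile((const xmlChar *)xslt_file);
    if (style == NULL)
        return -1;

    int ok = -1;
    xmlDocPtr result = xsltApplyStylesheet(style, doc, NULL);
    if (result != NULL) {
        ok = xsltSaveResultToString(out, out_len, result, style);
        xmlFreeDoc(result);        /* leak candidate #1 if forgotten */
    }
    xsltFreeStylesheet(style);     /* leak candidate #2 if forgotten */
    return ok;
}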

Rodney
Rodney Hester
4 November 2010, 13:29
If it helps, I'm using:

libxml2 2.7.7 (current) with configure opts "--with-minimum --with-tree=yes --with-xpath=yes --with-html=yes --with-output=yes --with-xinclude=yes --with-xptr=yes"

and

libxslt 1.1.26 (current) with configure opts "--without-crypto --without-debug --without-debugger --without-plugins --with-libxml-src=../libxml2"

(just in case a leak is suspected in either - a 7K leak per directory listing is pretty harsh!)

Rodney
Hugo Leisink
4 November 2010, 19:13
Thanks for your bug report. I'll do some testing myself very soon. I always check new releases with valgrind, but I haven't seen anything that looks like a memory leak in 7.3.

Please note that the VSZ and RSS values are not the actual memory in use. They represent the amount of memory the application would use if it were the only application running on the machine, so shared libraries are also included in that amount.
Rodney Hester
4 November 2010, 19:26
Any idea how I can distinguish between the libraries' usage and Hiawatha's own? My first suspicion was that the real issue could be in one of the underlying libraries.

BTW, I repeated my torture test on standard HTML (i.e. not PHP and not XSLT) and saw no increase in memory footprint, so the issue is *definitely* limited to XSLT. If there's anything I can do to help narrow it further, I'm happy to... this is used a _lot_ on my site.
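In the meantime, to separate Hiawatha's own pages from the shared libraries, I'm going to watch the Private_Dirty totals in /proc/<pid>/smaps (a quick sketch, assuming the kernel on the router exposes smaps):

#include <stdio.h>

/* Sum the Private_Dirty lines of /proc/<pid>/smaps: memory that is
 * truly private to the process, excluding shared library pages. */
long private_dirty_kb(int pid) {
    char path[64], line[256];
    long total = 0, kb;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%d/smaps", pid);
    if ((fp = fopen(path, "r")) == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL)
        if (sscanf(line, "Private_Dirty: %ld kB", &kb) == 1)
            total += kb;
    fclose(fp);
    return total;
}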

Rodney
Hugo Leisink
4 November 2010, 19:29
I did some heavy requesting on directory listings myself and can confirm that there is indeed an increase in memory usage. I'll do my best to kill this bug.
Hugo Leisink
7 November 2010, 21:35
I'm happy to inform you that the memory leak has been found. The next version with this fix will be released within a week.
Rodney Hester
8 November 2010, 02:47
Would you like me to test beforehand, or are you fairly comfortable?

Rodney
Hugo Leisink
8 November 2010, 09:46
Yes, please test. You can find the latest one here [www.leisink.org].
Rodney Hester
8 November 2010, 15:27
So far, so good, I think.

12,000 requests later, my RSS is stable at 2MB and my VSZ is stable at 5MB, after being battered with standard HTML requests, PHP, and XSLT.

However, I did hit one interesting anomaly.

Just after I started the first round of testing, Hiawatha locked up. The CommandChannel hung, closing and restarting it resulted in connection refused, and trying to kill the Hiawatha process had no effect...a kill -9 did the trick.

I saw this in /var/log/messages:

Nov 8 08:04:37 cerberus daemon.warn pound: (136003) e414 headers: request URI too long
Nov 8 08:04:43 cerberus daemon.alert syslog: segmentation fault!
Nov 8 08:04:58 cerberus daemon.notice pound: (13a404) e500 response error read from 127.0.0.1:80/GET / HTTP/1.0: Connection timed out (15.004 secs)

I don't _think_ the "URI too long" message is related, since the actual hang happened right about 8:04:43. I'm not sure if the syslog segfault is related or not, but the timing is certainly suspicious. You can see that pound also lost communication with Hiawatha at exactly that time.

There was nothing interesting in dmesg. I saw this in Hiawatha's system.log:

Mon 08 Nov 2010 07:35:59 -0500|Hiawatha v7.4 started
Mon 08 Nov 2010 08:06:01 -0500|Hiawatha v7.4 started

which tells me that Hiawatha was unaware it was having a problem (the 7:35:59 session was the one that hung; the 8:06:01 session was after the kill -9 and restart). It looks like Hiawatha was working fine up to that point:

2001:470:1f07:b17:a00:27ff:fef0:cc9e|Mon 08 Nov 2010 08:04:42 -0500|200|11038||GET / HTTP/1.0|User-Agent: Wget/1.12 (linux-gnu)|Accept: */*|Host: blog.multics.mooo.com|Connection: keep-alive|X-Forwarded-For: 2001:470:1f07:b17:a00:27ff:fef0:cc9e

2001:470:1f07:b17:a00:27ff:fef0:cc9e|Mon 08 Nov 2010 08:04:43 -0500|200|2971||GET / HTTP/1.0|User-Agent: Wget/1.12 (linux-gnu)|Accept: */*|Host: kickstart.multics.mooo.com|Connection: keep-alive|X-Forwarded-For: 2001:470:1f07:b17:a00:27ff:fef0:cc9e

2001:470:1f07:b17:a00:27ff:fef0:cc9e|Mon 08 Nov 2010 08:04:42 -0500|200|37805||GET / HTTP/1.0|User-Agent: Wget/1.12 (linux-gnu)|Accept: */*|Host: tomato.multics.mooo.com|Connection: keep-alive|X-Forwarded-For: 2001:470:1f07:b17:a00:27ff:fef0:cc9e

(The above are the three URLs I was hammering at half-second intervals in the test.)

I can't find anything else of interest, and it's been rock-stable ever since, so I'm not quite sure where things went wrong. At this point I'm willing to assume that some sort of external event (noting the syslog failure above) was the real root cause and that there is no issue with Hiawatha, but I felt it warranted mentioning anyway.

I'll report in if I see any further issues, but at this point things are looking good. Thank you for all your help!

Rodney
Hugo Leisink
8 November 2010, 15:33
Could you please try to run Hiawatha under valgrind (see this HOWTO page for instructions) and redo the test to reproduce the segmentation fault? The 'segmentation fault!' is definitely a Hiawatha crash.
Rodney Hester
8 November 2010, 17:19
Unfortunately, valgrind isn't supported on mipsel, which is where the code is running. I've also been unable to reproduce the crash thus far, after more than 20,000 iterations.

Rodney
This topic has been closed.