
Reverse Proxy Intermittent 408 Timeout Errors

David Johnson
29 May 2014, 16:56
Hi Hugo,

Hiawatha version: 9.5
Operating System: debian 7.1

I've been trying to track down an intermittent 408 error problem which my reverse proxy setup is throwing. All the traffic is being served over TLS/SSL from the proxy.

The first issue this has thrown up is that these 408 errors generated in reverse proxy mode do not appear to be logged anywhere. So I can't tell you how often it is happening relative to the overall load on the server (which is fairly busy, serving over a million hits a month). Having traced specific requests that triggered the error in the browser, the requests did not appear to be logged at all: nothing in the reverse proxy logs, and nothing in the backend server logs.

I've managed to confirm that the error is being generated at the reverse proxy, not at the backend (changing the background colour of the standard error.xslt is useful!). Most often it occurs on secondary static image files, but I've seen it occasionally on the primary html/php file, js files, etc., so it doesn't look like a CGI or cache issue.

Watching requests go through with Chrome developer tools open, what is most strange is that the error is returned within a few tens or at most a couple of hundred msecs of the request being sent by the browser, which looks rather odd given that it is a timeout error. I've tinkered with the TimeForRequest settings, MaxKeepAlive, and the new reverse proxy keep-alive option. No change.
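
For reference, the relevant part of the proxy binding looks roughly like this (the values and certificate path are illustrative rather than my exact config, and SSLcertFile is the 9.x option name as far as I remember):

Binding {
    Port = 443
    # TLS certificate for this binding (9.x option name, from memory)
    SSLcertFile = /etc/hiawatha/example.pem
    # First value: time allowed for the first request on a connection,
    # second value: time allowed for subsequent keep-alive requests.
    TimeForRequest = 5, 10
    # Maximum number of keep-alive requests per connection.
    MaxKeepAlive = 100
}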

Any suggestions of where to look, or things to try? Any other info you need to help you with this?

Many thanks,

David Johnson
David Johnson
30 May 2014, 17:19
Hi Hugo,

I think I've solved my own problem!

Having gone away, read up on the RFCs covering persistent connections and keep-alive, and had a rummage in the source code, I now have a better understanding, and some configuration changes have brought about a fix.

First, the timeout errors ARE logged (sorry to suggest otherwise) - I just wasn't recognizing that the Timeout errors listed in the Hiawatha system.log were the 408 errors I was seeing in the browser. Would it be clearer if these could be logged in the access.log or error.log with the HTTP error code?

The fix was to add the following custom headers in the backend server hiawatha config for each vhost:

CustomHeader = Keep-Alive: timeout=<n>, max=<m>
CustomHeader = Connection: Keep-Alive

The values of n and m need to be equal to, or a little less than, the values set for TimeForRequest and MaxKeepAlive in the Binding section of the reverse proxy config - or at least that seems to work for me.
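
As a concrete example (hostname, path and values are illustrative, not my real config): if the proxy binding has TimeForRequest = 5, 10 and MaxKeepAlive = 100, the backend vhost gets something like:

VirtualHost {
    Hostname = www.example.com
    WebsiteRoot = /var/www/example
    # timeout sits just below the proxy's keep-alive TimeForRequest (10s),
    # max sits just below the proxy's MaxKeepAlive (100).
    CustomHeader = Keep-Alive: timeout=8, max=90
    CustomHeader = Connection: Keep-Alive
}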

So, if your system log is full of Timeout errors like this:

144.59.47.47|Sat 24 May 2014 23:46:33 +0000|Timeout while waiting for request

it suggests that your users may be getting 408 timeout errors, and you may need to add these custom headers to your config to tell browsers how long the server will hold idle connections open and how many requests will be served before the connection is closed.

For the next update, might I suggest adding a config option that enables these headers and sets their parameters, rather than users having to recognize the need to add the custom headers themselves?

Many thanks once again for producing and maintaining a great program. Since switching to Hiawatha I have had better performance, far fewer problems and no significant downtime for my sites.

David Johnson
Hugo Leisink
30 May 2014, 21:33
I can't log a 408 to the access or error log of a virtual host, because there is no request, so Hiawatha doesn't know which virtual host to select. Therefore, the entry goes into the general system log.

Thinking about everything you wrote, I don't understand the cause of the quick 408 errors, or how your solution works for you.
David Johnson
3 June 2014, 12:25
I think the problem I was seeing is the result of a latency overlap between the client sending a request and the server closing the socket as the TimeForRequest timeout kicks in. Having scanned through section 8 of RFC2616 on persistent connections, my understanding is that this can occur:

"A client, server, or proxy MAY close the transport connection at any time. For example, a client might have started to send a new request at the same time that the server has decided to close the "idle" connection. From the server's point of view, the connection is being closed while it was idle, but from the client's point of view, a request is in progress."

It goes on to state that the client SHOULD reopen a connection to request the object again. It looks to me as though this isn't always happening - I've seen the problem with both Chrome and IE10 on Windows. Usually this showed up as missing images on a webpage, but just now and again I got a 408 error page displayed in response to the initial page request.

By experimenting, I could make these problems occur more frequently by timing when I requested a page in the browser relative to the TimeForRequest setting in the config - e.g. with TimeForRequest set to 10s, clicking on a link about 10s after the last page had finished loading made the errors occur most frequently.

I was also getting LOTS of timeout errors listed in the Hiawatha system.log of the reverse proxy:

144.59.*.*|Sat 24 May 2014 23:45:22 +0000|Timeout while waiting for request

and these errors showed up in the log for my local IP at the point at which I was getting the errors in the browser.

This is my setup:

Hiawatha 9.5 acting as a reverse proxy connecting to a couple of Hiawatha 9.5 backends on OpenVZ VMs (although I think the problem has existed since at least 9.3.1, possibly earlier). All public traffic to the reverse proxy is HTTPS; all traffic between the reverse proxy and the backends is plain HTTP. Reverse proxy caching is enabled for common static files.
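
For what it's worth, the proxy-side vhosts are essentially just this (hostname and backend address are made up, and the exact ReverseProxy syntax is from memory, so check the manual; the trailing keep-alive flag is the new 9.5 option I mentioned above):

VirtualHost {
    Hostname = www.example.com
    WebsiteRoot = /var/www/empty   # not used for proxied requests
    # Forward everything to the backend over plain HTTP.
    ReverseProxy .* http://10.0.0.2:80 15 keep-alive
}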

The key question is: why is the client not opening a new connection and requesting the missing images again? I started to try to look at how the proxy was closing connections, but for HTTPS this appears to be implemented in the PolarSSL code and I quickly got lost!

It is my understanding of RFC2068 (19.7.1.1) that adding the Keep-Alive response header informs the client how long an idle connection will be kept open, and how many requests will be served over a connection before it is closed by the server. It is better explained here:

http://tools.ietf.org/id/draft-thomson-hybi-http-timeout-01.html
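
To make that concrete, with the custom headers in place a response arriving at the browser carries hints along these lines (values illustrative, matching the n and m above):

HTTP/1.1 200 OK
Connection: Keep-Alive
Keep-Alive: timeout=8, max=90
Content-Type: image/png
Content-Length: 4096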

It appears that common browsers do use this info to ensure that they don't send requests on connections which are about to time out. With these response headers sent by the server, the problem has gone and I haven't seen a single basic timeout error in the reverse proxy system log. I do still get a few of these:

98.21.*.*|Sun 01 Jun 2014 06:05:15 +0100|Timeout while waiting for first request
92.24.*.*|Sun 01 Jun 2014 06:05:16 +0100|Silent client disconnected

but I've assumed that these relate to the fact that most browsers open multiple connections and may on occasion end up not using one of them, so it times out without any request being sent.

Further, from RFC2068, the Keep-Alive response header is a "hop-by-hop" header which is not supposed to be forwarded by proxies. Does this mean that it should really be added at the reverse proxy, not the backend? I tried that and it didn't work, but by adding the headers at the backend they are passed through the reverse proxy and received by the browser.

As I've said before, I'm not an expert on HTTP, so my understanding could be wrong, but it does appear to have fixed the problem and my system.log files are a lot smaller.

Hope this is helpful. If you need any more information, copies of config files etc please let me know.

Regards, David
David Johnson
6 June 2014, 16:25
Hi Hugo,

Just another snippet of info on the issue - it appears that the problem may be more at the browser end than at the server. See the following report of this very problem in Chrome - the browser I tend to use most. I'm sure I've seen the same issue with IE10 as well, though.

https://code.google.com/p/chromium/issues/detail?id=377581

The work-around of using Keep-Alive headers may be useful to others who run into this problem with Hiawatha.

Regards, David
This topic has been closed.