Reverse Proxy Cache

Martijn
1 February 2013, 09:46
First of all thanks for making Hiawatha Webserver.

I've set up Hiawatha as a reverse proxy for my Apache + MySQL + WordPress setup.
It runs great. It's one VPS, so all the software runs on the same server.
Apache is set up on port 8080 and Hiawatha is running on port 80.

In the hiawatha.conf I've added these rules:

CacheSize = 50
CacheMaxFilesize = 1024
ReverseProxy .* http://127.0.0.1:8080

The proxy works. I can visit the sites.
However, I was wondering: does Hiawatha serve static files from its own cache?
Because I still see GET log entries for small images when requesting a page.

Is that possible, like you can do with Nginx?
Or do I need to set up mod_expires under Apache?
I've built Hiawatha with the make debian package.

One other thing (for Hugo). There's a typo on the manpage:

CacheMinFilesize = <size in bytes>
Minimum size of a file Hiawatha will store in its internal cache.
Default = 1, example: CacheMaxFilesize = 512
(requires that Hiawatha was not compiled with -DENABLE_CACHE=off)

CacheMaxFilesize = 512 should be CacheMinFilesize = 512 I believe.

Hiawatha version: 8.7
Operating System: Debian Squeeze
Hugo Leisink
1 February 2013, 09:50
When working as a reverse proxy, nothing is cached. And requests are always logged, whether files are loaded from disk or from cache.
Martijn
1 February 2013, 09:53
Ok.

The entries I was referring to are in the Apache log file, not the Hiawatha log file.

Can I add this as a feature request?
So you can accomplish something like this:

http://www.djm.org.uk/wordpress-nginx-reverse-proxy-caching-setup/
Hugo Leisink
1 February 2013, 15:44
I'll take a look at it.
Martijn
1 February 2013, 15:46
Tnx!
Martijn
1 February 2013, 15:49
Here's an example of how I've set up a reverse proxy with nginx.
Static files like js, css, jpg etc. are cached and served directly from nginx; only the first request for each is retrieved via Apache.
Dynamic pages are always retrieved via Apache.
proxy_cache_path  /tmp/nginx_cache  levels=1:2 keys_zone=STATIC:10m inactive=24h max_size=1g;

server {
    listen 80;
    client_max_body_size 4m;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }

    location ~* \.(js|css|png|jpg|jpeg|gif|ico)$ {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_cache STATIC;
        proxy_cache_valid 200 1d;
        proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
    }
}
Aquanet
2 February 2013, 03:00
Hello Hugo,

We are also interested in Reverse-proxy side caching and more control over it.

To Martijn: we have done the following to work around Hiawatha's lack of reverse proxy caching:

Hiawatha on port 80 as reverse proxy to port 8080
Nginx on port 8080 as reverse proxy to Apache on port 8081
Apache on port 8081

Regards
Andrew.
Aquanet
2 February 2013, 03:01
Alternative configuration that we have not tested yet is:

Hiawatha on port 80
Varnish Cache on port 8080
Apache on port 8081
Aquanet
2 February 2013, 04:18
Just tested the second configuration (Hiawatha + Varnish): everything works great, super fast.
Martijn
3 February 2013, 11:31
@Aquanet
Thanks for the tips. For now I've settled on mod_expires in Apache. Html5boilerplate has a pretty good .htaccess file with gzip and expire headers config.
I chose not to use .htaccess but added the rules to extra conf files under Apache.

@Hugo
Do options like BanOnGarbage and PreventXSS work with the ReverseProxy option? Or are these also disabled when Hiawatha is set up as a reverse proxy?
Hugo Leisink
3 February 2013, 14:53
Options like BanOnGarbage and PreventXSS also work for the Reverse Proxy.
Martijn
4 February 2013, 19:40
Great! Tnx.
Martijn
7 February 2013, 22:13
Hi Hugo.

As an alternative to reverse proxy cache, I would also be happy with a solution like this in Nginx:

# Serve static files directly
location ~ ^/(images|javascript|js|css|flash|media|static)/ {
    root /var/www/default_website;
    expires 1w;
}

# Serve dynamic content via Apache
location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_set_header Host $host;
}


In this setup nginx serves static files directly, and the dynamic part is sent to Apache.

I could not figure out whether this is already possible with reverse proxy and virtual hosts.
Hugo Leisink
7 February 2013, 23:35
The beta version of 8.8 which has caching for reverse proxy can be found here. It caches files with extensions set via the CacheRProxyExtensions option. Only responses with a 200 HTTP code are being cached. They will not be cached if the Cache-Control or Pragma HTTP header contains 'no-cache'.

Please let me know if it works.
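
For readers who want the three rules above in one place, here is a minimal Python sketch of the decision. It is not Hiawatha's code: the function name, the parameters and the case-insensitive matching are assumptions made for illustration only.

```python
def is_cacheable(extension, status, headers, cached_extensions):
    """Apply the three rules described for the 8.8 beta (illustrative only).

    extension          -- file extension of the requested URI, without the dot
    status             -- HTTP status code of the backend response
    headers            -- dict of header names to values
    cached_extensions  -- set of extensions, as listed in CacheRProxyExtensions
    """
    # Rule 1: the extension must be in the configured list.
    # (Case-insensitive matching is an assumption of this sketch.)
    if extension.lower() not in cached_extensions:
        return False

    # Rule 2: only responses with a 200 status code are cached.
    if status != 200:
        return False

    # Rule 3: a 'no-cache' in Cache-Control or Pragma prevents caching.
    for name, value in headers.items():
        if name.lower() in ("cache-control", "pragma") and "no-cache" in value.lower():
            return False

    return True
```

Under these rules a 304 response is never stored, even for an extension that is in the list.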
Aquanet
8 February 2013, 00:52
Very interesting, going to try it shortly and compare with the Hiawatha + Varnish speed.

Can you please add a description of "CacheRProxyExtensions" to the manual page, explaining exactly how it is used?

How extensions are specified...

CacheRProxyExtensions = png,css,js,gif

Like that?
Aquanet
8 February 2013, 00:56
BTW the "Cache-control" header is in many cases misleading, some websites seem to add it to all content...from our experience...

I think it would be great to have some control over it, like:

NoCacheHeaderCaching = no / 10 seconds / 20 seconds / etc...

(For example, we set our Varnish setup to 40 seconds of caching for pages with a "no-cache" header and no one has complained yet. Today we had 600,000 pages served via the Hiawatha/Varnish setup.)
Aquanet
8 February 2013, 01:00
Finally, it would be great to add some header to cached objects, like:

CachedObjectHeader: "Cache HIT| Cache MISS"

That way you know whether the object was served from cache or directly =)

Just giving you thoughts on this
Aquanet
8 February 2013, 01:22
Found the description in the man page =)
Aquanet
8 February 2013, 02:16
Got a few questions though:

1) Does the cache save files on DISK or in RAM?

2) Will it be possible to increase the MAX RAM size above 50 MB? I don't think this is enough for multi-website installs.
Martijn
8 February 2013, 10:05
Tnx for the beta release.
This weekend I will give it a try.
Hugo Leisink
8 February 2013, 10:56
1) Does the cache save files on DISK or in RAM?

RAM

2) Will it be possible to increase the MAX RAM size above 50 MB? I don't think this is enough for multi-website installs.

Yes, see CacheSize setting in the manual page.
Martijn
8 February 2013, 13:14
I've compiled the beta version.

Added this to hiawatha.conf:

CacheRProxyExtensions = js,css,png,jpg,jpeg,gif,ico
CacheSize = 50
CacheMaxFilesize = 1024
ReverseProxy .* http://127.0.0.1

And this to apache (+ mod_headers enabled):

Header set Cache-Control "max-age=604800, public"

However, I still see GET requests for the jpg and js/css files in the Apache log.

With nginx I did not have those entries once the files were cached by nginx. With Hiawatha they keep showing up.

What am I doing wrong?
Martijn
8 February 2013, 14:57
Done some debugging with Firebug.

First I clicked through my site in Safari, so the static files should be cached by now.
Then I opened Firefox with Firebug enabled.
Cleared the browser cache.
On the first run everything is received with HTTP 200 status. These entries also show up in the Apache log.
If I pick a jpg file, the request headers do not contain Cache-Control or Pragma.
The response header contains Cache-Control max-age=604800. So that should be OK.

Next I reload the page.
JS and CSS files are received with HTTP 304, but the jpg and png files with 200.
Now the request header contains Cache-Control max-age=0 and the response header max-age=604800. Weird.

On the third run all static files are received with http 304.
But still with max-age=0 in the request and max-age=604800 in the response.

Cache is enabled in firefox / firebug.

Martijn
8 February 2013, 20:09
Ok. Small update.

It's working, but a bit strangely.
First this:
JS files like jquery.js?ver=1.8.3 don't work. Understandable, because the URL does not end with the extension (.js). But this is very common these days with CMS systems.

The images are cached, but only very briefly.
When visiting my site the images are logged in Apache. Clear the browser cache, load the page again and voila: no image entries in Apache.
But when I clear the cache in my browser and wait a couple of minutes, the images are requested via Apache again.

So it looks like it's only caching for maybe a minute or so?

max-age is set far enough in the future, so that cannot be the problem.
Aquanet
8 February 2013, 21:25
We also tested, Martijn, and Varnish so far wins =) But it's a nice feature to have for people who don't serve millions of pages.

Varnish is much more flexible at the moment
Hugo Leisink
8 February 2013, 22:31
Hiawatha only caches responses with a 200 result code. 304 are not cached. Good point, this needs some more attention before the final release. Hiawatha caches a file for only a minute. If a file within that minute is requested again, 60 seconds are added to the cache timer. This goes to a maximum of one hour. Thanks all for the test feedback.
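
The timer behaviour described above can be modelled roughly as follows. This is a sketch in Python, not Hiawatha's implementation: the class and method names are made up, and it assumes the one-hour maximum means the expiry time is never pushed more than an hour past the current request.

```python
MINUTE = 60
HOUR = 60 * MINUTE


class CacheEntry:
    """Hypothetical model of one cached file and its expiry deadline."""

    def __init__(self, now):
        # A newly cached file is kept for one minute.
        self.deadline = now + MINUTE

    def is_fresh(self, now):
        return now < self.deadline

    def hit(self, now):
        """Register a request; return True if it was served from cache."""
        if not self.is_fresh(now):
            return False
        # Each request within the lifetime adds 60 seconds,
        # capped (by assumption) at one hour from now.
        self.deadline = min(self.deadline + MINUTE, now + HOUR)
        return True
```

With this model, a file requested once and then again a couple of minutes later has already expired, which matches the re-fetches seen in the Apache log.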
Hugo Leisink
9 February 2013, 09:22
I've updated the beta package. Please, redownload and test again.
Martijn
9 February 2013, 11:25
I've downloaded and compiled the new package.

What should be different? Because I still see the same behavior.

Yesterday I changed time_in_cache to HOUR instead of MINUTE. That worked.
Maybe it is also possible to 'strip' the request_uri in cache.c, in the proxy function at the bottom of the file.
Something like: request_uri_stripped = substr session_request_uri from 0 to first ?
Yeah, I know, a bit of lame pseudo code.
Hugo Leisink
10 February 2013, 08:58
I don't know what you mean by the 'strip the request_uri' remark.
Martijn
10 February 2013, 11:01
JS files like jquery.js?ver=1.8.1 are not cached.
Probably because strrchr looks for the last dot and sets the pointer there, and then in_charlist is performed on what follows.
So no match is made, because with the query string attached the text after that dot is not in the extensions list for caching.

If you do some sort of regex on the request uri to filter out stuff like this: ?ver=1.8.1, based on the ? character, then it would be possible to cache the js and css files also.
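
To make the idea concrete, here is a small Python sketch of stripping the query string before looking for the extension. It is only an illustration of the suggestion, not code from Hiawatha; the function name uri_extension is hypothetical.

```python
def uri_extension(uri):
    """Return the lowercase file extension of a request URI, or None.

    Everything from the first '?' onward is ignored, so a version
    suffix such as ?ver=1.8.1 does not hide the real extension.
    """
    # Drop the query string, if any.
    path = uri.split("?", 1)[0]

    # Only look at the last path segment, so dots in directory
    # names are not mistaken for an extension.
    filename = path.rsplit("/", 1)[-1]
    if "." not in filename:
        return None

    extension = filename.rsplit(".", 1)[1].lower()
    return extension or None
```

For example, uri_extension("/wp-includes/js/jquery/jquery.js?ver=1.8.1") gives "js", which could then be compared against the extensions list.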
Hugo Leisink
10 February 2013, 12:13
I do look for question marks. Look at the function extension_from_uri() in libstr.c, line 549. It all works fine here. Please note that the beta package has been updated since I first released it.
Martijn
10 February 2013, 19:54
Yes, I've downloaded the latest version.

Are you sure you updated the package on the site?

Because when I download the beta version and do a grep -inr 'extension_from_uri' . in the dir I get no results.
Also libstr.c ends at line 548.

Still my .js?ver files are not cached. So not sure if the latest version is on the site for download.
Hugo Leisink
10 February 2013, 21:34
Hmm, you're right. I thought I had updated it, but apparently not. I've updated it now, so please download it again.
Martijn
10 February 2013, 22:39
Yes, now I see line 549 and more.

Sorry to say, but unfortunately there is still no caching for js files with ? in the URL.
Images are not all cached on the first run: sometimes 1, then 3, then all.

I'll let it rest for a while. I don't have much time to test it on the website for a couple of weeks.

Thanks for the help.
This topic has been closed.