gzip compression

emmel
5 March 2017, 11:34
Hello Hugo,

I know I'm a bit late to the show, but I just realized that you removed GZfile in favor of on-the-fly compression in version 10. Now, on-the-fly compression is surely nice to have, but does that mean I can't benefit from pre-compressed files at all anymore?
I have a somewhat large set of static SVG files, roughly in the 750k to 1 meg range (uncompressed). Caching in memory isn't really practical, but I'd really like to avoid having to recompress them every time. So I was really looking forward to offloading the work to the disk and letting Hiawatha do its thing...

So, can I still use precompressed files somehow, or am I stuck with Hiawatha doing the work over and over again?

Greetings,
emmel
Hugo Leisink
5 March 2017, 11:44
You now get the best of both worlds. Upon the first request, Hiawatha will gzip the requested file and cache it on disk (/var/lib/hiawatha/gzipped). Every time the file is requested again, the already gzipped version from disk will be used. It notices file changes (by timestamp and size), and the cache is cleared upon restart. How does that sound?
emmel
5 March 2017, 21:57
Not bad. It sounds not bad at all. Unfortunately it still means double work, as I already have the gzipped version (the first call actually goes to a script that creates the file and delivers it, usually gzipped).
I guess content that comes from CGI/FCGI will not be gzipped by Hiawatha the same way static content is? If it is, then I can cut out a step.
Also, is there a way to manage the disk cache, or should I use a periodic purge through a cronjob?
Hugo Leisink
6 March 2017, 09:06
Well, you can now simply remove that script and let Hiawatha handle the gzipping.

No, CGI/FastCGI output will not be gzipped. But (assuming that's what you use) PHP can gzip its output as well. So, that's covered too.

Cache should be handled manually. You can delete any file you want. Hiawatha doesn't keep an index in memory. It simply looks up an existing gzipped version in the cache directory. Serve if found, generate a new one if not.
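
For the periodic purge asked about above, here is a minimal Python sketch that could be run from a cronjob. It is not part of Hiawatha: the function name `purge_cache` and the 30-day default are made up for illustration, and it assumes the cached files sit directly in the cache directory and end in `.gz`. Since a missing cache file is simply regenerated on the next request, deleting old entries should be harmless.

```python
import os
import time


def purge_cache(cache_dir, max_age_days=30.0, now=None):
    """Delete cached .gz files older than max_age_days; return the removed names.

    Hypothetical helper, not part of Hiawatha. It only looks at regular
    files directly inside cache_dir whose name ends in '.gz'.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for entry in os.scandir(cache_dir):
        # Skip subdirectories, symlinks and anything that is not a .gz file.
        if not entry.is_file(follow_symlinks=False):
            continue
        if not entry.name.endswith(".gz"):
            continue
        # Age is judged by modification time of the cached file.
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return removed
```

Called as `purge_cache("/var/lib/hiawatha/gzipped")`, this would need to run as a user with write access to that directory.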
emmel
6 March 2017, 23:27
The script doesn't gzip the file, it actually creates the file. And since I have to gzip it for delivery anyway, it would have been easy enough to write the gzipped version to disk as I already have it.

But you are giving me an idea. How exactly do you generate the file names for the cache files? I mean, I could probably dig through the source code, but you probably know it by heart.

(Also: You might want to consider adding a paragraph or two to the man page regarding the gzip and cache.)
Hugo Leisink
7 March 2017, 10:04
It's just the SHA256 hash of the complete path + filename, followed by '.gz'. The code for it can be found in src/target.c at line 120. My advice is to let Hiawatha handle the gzipping. Just offer the file for download and let Hiawatha handle the rest.
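
As a rough illustration of that naming scheme, the Python sketch below computes a candidate cache file name. Several details are assumptions not confirmed in this thread: that the hash is written as lowercase hex, that the path is hashed as UTF-8 bytes, and that "complete path" means the absolute filesystem path. The helper names `cache_name` and `cache_path` are hypothetical; src/target.c is the authority on the real behaviour.

```python
import hashlib
import os

# Cache directory as mentioned earlier in this thread.
CACHE_DIR = "/var/lib/hiawatha/gzipped"


def cache_name(file_path):
    """Return SHA-256 of the absolute path as hex, followed by '.gz'.

    Assumptions: lowercase hex digest, UTF-8 encoded path.
    """
    absolute = os.path.abspath(file_path)
    digest = hashlib.sha256(absolute.encode("utf-8")).hexdigest()
    return digest + ".gz"


def cache_path(file_path, cache_dir=CACHE_DIR):
    """Return where the cached gzip of file_path would live under cache_dir."""
    return os.path.join(cache_dir, cache_name(file_path))
```

Comparing the result of `cache_path(...)` for a file that has already been requested against the actual contents of the cache directory is a quick way to check whether these assumptions hold.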
This topic has been closed.