Forum

TYPO3 extension "crawler" for indexed search does not work

lvuser
23 December 2008, 13:44
Automatic page indexing, which is performed by the crawler extension for TYPO3, does not work.
This TYPO3 extension has been tested and works fine on Apache, Lighttpd and Nginx, but not on Hiawatha.
There are no errors in the Hiawatha logs.
We can't find the cause of this problem.
Any solutions?
Hugo Leisink
23 December 2008, 14:00
I don't know TYPO3 or its crawler extension. Take a look in the access logfile after you've run the crawler extension and see which URLs generate a 404 error.
lvuser
23 December 2008, 14:26
There are no 404 errors in the access logfile.

Here are some lines from the access logfile:

192.168.1.46|Tue 23 Dec 2008 14:20:41 +0000|200|18296||GET /index.php?id=228 HTTP/1.0|Host: 192.168.1.46|Connection: keep-alive|X-T3crawler: 6228:b14748f392e4e042a2b1dbfe6baafba1
192.168.1.46|Tue 23 Dec 2008 14:20:42 +0000|200|9679||GET /typo3/mod/web/info/index.php?&id=1&SET[crawlaction]=cli HTTP/1.1|Host: 192.168.1.46|User-Agent: Mozilla/5.0 (X11; U; Linux i686; lv-LV; rv:1.9.2a1pre) Gecko/20081218 Minefield/3.2a1pre|Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8|Accept-Language: en-us,en;q=0.5|Accept-Encoding: gzip,deflate|Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7|Keep-Alive: 300|Connection: keep-alive|Referer: http://192.168.1.46/typo3/mod/web/info/index.php?&id=1&SET[crawlaction]=log|If-Modified-Since: Tue, 23 Dec 2008 14:19:59 GMT
192.168.1.46|Tue 23 Dec 2008 14:20:42 +0000|200|17304||GET /index.php?id=228&L=1 HTTP/1.0|Host: 192.168.1.46|Connection: keep-alive|X-T3crawler: 6229:1b3de9750dc8bdc4df7fc4db0af751bf
192.168.1.46|Tue 23 Dec 2008 14:20:42 +0000|200|19910||GET /index.php?id=228&L=2 HTTP/1.0|Host: 192.168.1.46|Connection: keep-alive|X-T3crawler: 6230:ee10520fdc2ee8ad8a8a2741d553667c
Hugo Leisink
23 December 2008, 17:05
My guess is that the problem lies in the X-T3crawler HTTP header that is being sent. Hiawatha ignores it, since it's not a standard HTTP header. I think that converting that header line to an HTTP_X_T3CRAWLER environment variable will solve the problem. I'll see what I can do for the next release.
Hugo Leisink
24 December 2008, 09:59
A temporary solution for this problem is to edit the file envir.c and add the following line to the matching block of lines around lines 180-190:
headerfield_to_environment(session, fcgi_buffer, "X-T3crawler:", "HTTP_X_T3CRAWLER:");


Please, let me know if this solves your problem.
lvuser
29 December 2008, 11:56
We edited the file envir.c, recompiled Hiawatha and tested the crawler extension for TYPO3, but without positive results.
The access log looks the same as before...
Hugo Leisink
29 December 2008, 13:43
I've just released 6.11, which should handle non-standard X- headers better. Can you test whether that version works?

If it still doesn't work, please investigate the TYPO3 crawler extension. I'm sure it does something in a non-compliant way.
lvuser
30 December 2008, 09:57
Thank you very much!!!
Now everything works fine!
This topic has been closed.