Multiple instances of web server
Shashank
25 May 2009, 15:17
Hi
We have faced a strange issue. While running our application under load (actually not that much: 5 or 6 users accessing the website at the same time), we are loading fairly heavy images in an embedded environment.
We observed that sometimes there is more than one instance of the webserver. We have not created these instances (in fact, if we try to do it manually, it cannot bind the application to the same port twice, as expected).
After the problem happens, we are not able to access any resource and the site almost hangs.
Though we don't have exact steps to reproduce the issue, I was wondering if you have faced a similar problem.
Can you please help us out with this?
Regards
Shashank
Shashank
25 May 2009, 18:04
These are the processes running at that time:
bash-3.2# ps -w | grep nobody
27013 nobody 5888 S hiawatha -c ../etc/hiawatha/ -d
27253 nobody 1608 S /usr/sbin/pppoe -p /var/run/pppoe.conf-pppoe.pid.pppoe -I wan0 -T 80 -U -m 1412
27543 nobody 5744 S hiawatha -c ../etc/hiawatha/ -d
27550 root 3224 S grep nobody
Hugo Leisink
25 May 2009, 18:55
Can you tell me a little more about your OS and how Hiawatha is started? Why is the '-d' flag used?
Shashank
25 May 2009, 21:23
We are using Debian Linux with the 2.6.24 kernel.
When we tried to run it without the -d option, the server gave an error: 'Error redirecting stdin'. So we have used the -d option.
We are working on an ARM processor.
Regards
Sachin
Shashank
25 May 2009, 21:40
We are now hitting this problem very frequently.
Right now we have decreased the load on the system and only two people are testing simultaneously. Still, this problem occurs every 10 minutes. I am not sure if we are doing something wrong, as we have been testing our site with Hiawatha for the last two months and have only been facing this issue for the last week.
The only changes we have made: the backend is the same, but we have added heavy graphics and the level of simultaneous testing is higher.
Hugo Leisink
25 May 2009, 22:27
If you are using Debian, I suggest that you take the source, use './configure' and 'make deb' to build a Debian package. Install the created package (located one directory higher than the source directory) via dpkg -i <hiawatha.deb> and use /etc/init.d/hiawatha to start and stop Hiawatha. Make sure you start Hiawatha as user root.
Shashank
25 May 2009, 22:56
Hi
We are using cross compilation. Will the same steps work with that? Though we are using Debian, being in an embedded environment we don't have the dpkg utility.
We are using the source version only, with configure and make.
Also, I remember that in a previous update you suggested we should not run Hiawatha as root, as it is a security issue (in fact, we are starting it as root and the server itself switches to nobody).
Can you please suggest possible reasons for the multiple instances of the webserver? I am not sure how this is going to change things.
Hugo Leisink
25 May 2009, 23:08
Ah, cross compilation. That's always tricky.
About running as root: I meant starting it as root so Hiawatha is able to redirect stdin. It will indeed switch to the user as configured in httpd.conf. But since you're already doing that, ignore that remark.
About the multiple instances: Hiawatha will never start another instance of itself. It will only fork to start a CGI, and after that, the forked process will always quit. So, whatever is starting the second instance, it's not Hiawatha. Have you configured all the CGI handlers correctly? What does the system.log file say?
If you use the -f flag on ps (for example 'ps auxf'), does that say who's the parent of the second instance? What does the information in /proc/<PID> tell you about who started which instance?
Shashank
25 May 2009, 23:35
Well, unfortunately, busybox (the bundle of Linux utilities for embedded environments) does not support the -f option in ps.
But I have explored the /proc/PID option; it is supported. I hope the problem reproduces again so we can extract some parent information.
Shashank
26 May 2009, 10:39
Hi Hugo
Hiawatha is itself starting the extra instance.
Please see the log below. There are two Hiawatha instances, 3200 and 3345, and 3345 is being started by 3200.
Also, the second Hiawatha is ephemeral and disappears after some time. I think Hiawatha is starting itself as a CGI. This only happens when I am testing under load; I have not observed it happening in single-user mode.
**************************************************************************************
3200 nobody 6508 S sbin/hiawatha -c etc/hiawatha/ -d
3228 root 3220 S /usr/sbin/telnetd -i
3229 root 3220 S -sh
3237 root 2492 S bash
3345 nobody 6320 S sbin/hiawatha -c etc/hiawatha/ -d
3361 root 3220 S /bin/sh ./startgw
3366 root 94512 S ./ipgateway
3393 root 3220 R ps -w
bash-3.2# cat /proc/3200/stat
3200 (hiawatha) S 901 901 899 34818 901 4194560 2053 21988 0 0 106 404 715 708 20 0 6 0 206176 6664192 331 4294967295 32768 130748 2661219456 2661218800 890473968 0 0 528420 19970 4294967295 0 0 17 0 0 0 0 0 0
bash-3.2#
bash-3.2#
bash-3.2#
bash-3.2# cat /proc/3345/stat
3345 (hiawatha) S 3200 901 899 34818 901 4194368 75 0 0 0 0 0 0 0 20 0 1 0 232959 6471680 177 4294967295 32768 130748 2661219456 891948248 889618556 0 0 528420 19970 2684815756 0 0 17 0 0 0 0 0 0
bash-3.2# cat /proc/901/stat
901 (bash) S 899 901 899 34818 901 4194304 573 119227 0 26 7 8 4753 8757 20 0 1 0 18369 2568192 349 4294967295 32768 772264 2659577552 2659576688 890071664 0 0 16388 1266695931 2684679688 0 0 17 0 0 0 0 0 0
bash-3.2#
***************************************************************************************
Hugo Leisink
26 May 2009, 11:04
What version of Hiawatha are you using? What do the ErrorLog and SystemLog say? Can you give me some information about the hardware Hiawatha is running on?
Shashank
26 May 2009, 11:29
We are using Hiawatha 6.11. We are working on an ARM processor. The error log and system log are not providing much information.
Hugo Leisink
26 May 2009, 12:32
The only time Hiawatha forks is when it's about to run a CGI program. If the CGI execution fails, the forked process outputs an error and exits. If Hiawatha has forked but not yet run the CGI program or produced an error message, things went wrong between fork() and execvp(). The strange thing is that the code between those system calls is really straightforward. No loops. You can check it for yourself in cgi.c, lines 290 to 361. So, my guess is that some system call in between has gone wrong and not returned. The only reason I can think of is the cross compilation. Is it possible to compile Hiawatha on the target machine?
Is it true that the processor load of the extra Hiawatha process is zero?
Shashank
26 May 2009, 13:13
Well, I will check this.
About processor load: yes, we have a powerful processor and only 4-5 users are working on the web interface at the same time.
I would really like to understand how Hiawatha handles multiple requests. You are calling the select function but passing maxfd as the assigned fd plus one. Is it always certain that a newly assigned fd will be lower than the old one if multiple CGIs are running?
In the meantime, can you please tell me a way to run Hiawatha as root? I need to run some scripts from the webserver which can only be executed as root. Right now I am stuck and can't modify those scripts, as they have interdependencies.
Shashank
26 May 2009, 13:14
About compiling on the target machine: we have done that. Our initial testing was done on the target machine only. But now we are trying it on the real setup with more users.
Hugo Leisink
26 May 2009, 13:30
Hiawatha uses threads to handle multiple connections. For each connection, a thread is created. About select(), maxfd + 1 is used because that's how it is done according to the select() manual page.
Hiawatha cannot be run as root for security reasons. If you want to run a CGI job as root, you have to give that CGI program the setuid bit and make the program call setuid(0). If you really want to run Hiawatha as root, you have to change the code: remove the change-uid code in hiawatha.c starting on line 1733. Only do this when you know what you are doing.
If you are able to compile Hiawatha on the target machine, why did you choose to go for cross compiling?
Shashank
26 May 2009, 13:56
I do know the use of select. I was thinking there was only one process, and that a single process was running select on all open descriptors, as select is generally used for more than one fd.
Meanwhile, in the code I have the change-user lines at 1654-1664.
Hugo Leisink
26 May 2009, 14:03
There can be more than one fd. For every binding you create, there will be an fd Hiawatha has to listen to. CommandChannel connections are also handled via that select() call.
Shashank
26 May 2009, 15:26
Hmm, about the multiple instance issue: can we get some pointers from the fact that the second instance of the server appears at almost the same time as the CGI timeout?
It is deterministic in nature.
Hugo Leisink
26 May 2009, 18:27
You are saying 'almost'. How many seconds are there in between? Is it the same number of seconds every time? And is that number of seconds equal to the TimeForCGI setting?
Shashank
27 May 2009, 07:56
Yep, you are right, it's equal to the CGI timeout. This is one minute for our page.
Also, we are able to reproduce this on auto-refresh pages. We have set the CGI timeout to 50 seconds and the auto-refresh time to 1 minute, and it still happens. Today I am planning to debug this issue. If you can give me some pointers, maybe we can fix it.
Hugo Leisink
27 May 2009, 10:48
As I mentioned before, a CGI is executed via the code in cgi.c, lines 290 to 361. What you can do is use the log_error() function to check if a certain point in the code is reached:
log_error(session, "Hello world");
The text will appear in the file specified by ErrorLogfile. My guess is that the problem lies somewhere between fork() and execvp().
Btw, thanks for not giving up on Hiawatha and for helping me to check if this is indeed a bug in Hiawatha or not.
Shashank
27 May 2009, 13:06
Hmm, well, we are using Hiawatha and would be glad if we can contribute something back.
In the meanwhile, I have noticed that you do not check the return value of the dup call between fork and execvp. I need to look into this more. I hope we can resolve this soon, as it will boost our demonstration to management.
Shashank
27 May 2009, 21:57
Hi Hugo
The problem is occurring in close_binding; we are not getting prints after close_binding. There is a while loop in this function.
I think the binding list is getting corrupted somehow. Can you please help with this? How are you populating the binding list?
Shashank
27 May 2009, 22:16
Oops, I get the logic for populating bindings now.
To verify: you are creating a binding for every port the server listens to. When forking the process, the child process doesn't need them, so we close them. But then why does it hang there? Let me try with some more prints. Have you faced this before?
Hugo Leisink
27 May 2009, 22:51
Do you use HTTPS bindings? Maybe it's an OpenSSL issue. If you do, can you please try to reproduce the problem without using HTTPS?
Shashank
27 May 2009, 23:17
Sorry, the code is having a problem in close_log. No, we don't have SSL.
Shashank
27 May 2009, 23:18
Sorry, I mean log_close.
Shashank
27 May 2009, 23:20
I am facing some problems with logging in these functions, as a session instance is not available and I can't use printf (as it would then be treated as CGI output).
I have tried to use syslog, but it showed a problem in linking.
Can you suggest some way to log? Should I use log_file?
Shashank
27 May 2009, 23:27
I think we are not able to get the mutex lock.
Hugo Leisink
28 May 2009, 00:09
You can use log_string(char *logfile, char *mesg) to log a string to a random logfile.
Shashank
28 May 2009, 00:18
Well, we have replaced pthread_mutex_lock with pthread_mutex_trylock.
Also, we are now not using the access log file, and the problem disappeared.
As I am not aware of the complete design: is there any case in which the same thread calls pthread_mutex_lock more than once? If so, I think we could use a recursive lock in place of the default lock.
Please advise.
Hugo Leisink
28 May 2009, 00:21
What do you mean with "we now are not using the access log file" ? What code did you disable?
Shashank
28 May 2009, 00:26
Well, I did not disable the file, but I deleted the log directory, so the code was not able to create the access file.
Shashank
28 May 2009, 00:31
So when the webserver tries to open the access file in append mode, it fails because the directory is not present.
Hugo Leisink
28 May 2009, 00:48
Oke, close_binding() is not causing the problem, but log_close() is. It's failing because the mutex causes a lockup, right?
Shashank
28 May 2009, 00:49
On an off note: under somewhat heavy load, we are getting 503 a lot. It happens when 2-3 sessions try to retrieve the same page.
But should it happen so frequently?
Shashank
28 May 2009, 00:49
Yep, you are right, it's log_close which is causing the multiple instance issue.
Hugo Leisink
28 May 2009, 01:05
Oke, thanks. I will investigate it. I find this really strange, because I use Hiawatha on a heavily used webserver with more than 40 websites and lots of users. It runs with no problem at all. So, I'm really curious about what's going wrong on your server; any detailed information is welcome.
Hugo Leisink
28 May 2009, 01:07
Wait a minute. In one of your first posts you said you have to use '-d' because of an 'Error redirecting stdin' error. I wonder why that is. Can you investigate that one for me?
Hugo Leisink
28 May 2009, 01:16
I'm thinking about this issue... I wonder what happens when a thread executes pthread_mutex_lock(), another thread forks, and the first thread executes pthread_mutex_unlock(). In the forked process, the mutex record has not been set to 'unlocked'. That might cause some problems...
Shashank
28 May 2009, 01:21
Well, we are not facing this problem right now, after making the webserver run as the root user.
In our setup, only the superuser is allowed to access the /dev directory directly. I am not sure if that is what made it work, but right now it's working fine without the -d option.
I am really worried about the 503 issue. What does the copy_directory_settings function do? When will it return 503?
Hugo Leisink
28 May 2009, 01:25
copy_directory_settings() copies the settings from a directory configuration block to the current session record. It will return 503 when you configure that files in a certain directory can be accessed by a maximum amount of users at the same time and that amount of users is reached. Look for the UploadSpeed setting in the manual page.
Shashank
28 May 2009, 01:28
Aha, that may cause the problem, but I am not sure exactly. I think thread information is maintained by the OS and is specific to a task/thread. When we create a thread/process, a new process id is allocated, so thread ownership is not shared the way file descriptors are.
Hugo Leisink
28 May 2009, 01:30
I updated a previous post. Can you please try the code patch in it and let me know if that solves your problem?
Shashank
28 May 2009, 01:30
About the 503 issue: what is the amount of users? Is it specific to a user or a client? I think you mean client. If I log in from 10 different places with the same user, will it give a 503?
Hugo Leisink
28 May 2009, 01:32
Please try using an "if (force_close == false)" around the pthread_mutex functions in log_close():
if (force_close == false) {
    pthread_mutex_lock(&accesslog_mutex);
}
...
if (force_close == false) {
    pthread_mutex_unlock(&accesslog_mutex);
}
and let me know if that solves your problem.
Shashank
28 May 2009, 01:34
Well, just curious: how does force_close work?
My setup has been taken over by someone else for a while; I will try it in some time and let you know.
Can you please tell me about the 503, as I asked in my previous post?
Hugo Leisink
28 May 2009, 01:37
force_close is just a parameter of log_close(). Some calls of log_close() set it to 'true', others set it to 'false'.
About the 503 in copy_directory_settings(): if you don't use the UploadSpeed setting in a Directory configuration block, you can just ignore it. It won't cause you any trouble, since that code won't be executed.
Shashank
28 May 2009, 01:51
Well, we did not set UploadSpeed.
I am trying this for force_close:
if (force_close) {
    pthread_mutex_lock(&accesslog_mutex);
    while (host != NULL) {
        if ((now >= host->access_time + LOGFILE_OPEN_TIME) || force_close) {
            if (*(host->access_fp) != NULL) {
                fclose(*(host->access_fp));
                *(host->access_fp) = NULL;
            }
        }
        host = host->next;
    }
    pthread_mutex_unlock(&accesslog_mutex);
}
But I think it will impact your LOGFILE_OPEN_TIME feature, i.e. closing the file if it has not been accessed in the last 30 seconds.
Hugo Leisink
28 May 2009, 01:53
That code is not correct. Use this:
if (force_close == false) {
    pthread_mutex_lock(&accesslog_mutex);
}
while (host != NULL) {
    if ((now >= host->access_time + LOGFILE_OPEN_TIME) || force_close) {
        if (*(host->access_fp) != NULL) {
            fclose(*(host->access_fp));
            *(host->access_fp) = NULL;
        }
    }
    host = host->next;
}
if (force_close == false) {
    pthread_mutex_unlock(&accesslog_mutex);
}
Shashank
28 May 2009, 01:56
Of course it will solve our problem. I will test it, but I don't think testing is required, as now we are not taking the lock when calling it from fork_cgi_process.
Shashank
28 May 2009, 02:05
I was wrong; testing was required.
No, the problem is not solved with this patch.
The same problem occurs. I am myself not sure about the exact problem, but maybe we can try another type of mutex: errorcheck or recursive.
Shashank
28 May 2009, 02:10
log_request is also using pthread_mutex_lock. With multiple requests, this function will be heavily called. Can it cause some problem?
Hugo Leisink
28 May 2009, 02:15
The problem is that when a thread calls fork() while another thread holds a mutex lock, the lock in the child process still exists and won't be cleared, because the unlock function is called in the parent process.
I've made a test version of Hiawatha for you. You can download it via http://www.leisink.org/~hugo/hiawatha-6.14.tar.gz. Can you please test that one for me? Please do not distribute that tarball; it is not an official release.
Shashank
28 May 2009, 02:27
Well, Hugo, I will certainly do that, but can we please postpone it until Monday, as I have to work on the demo on Friday?
If the issue is as you suggested, then a recursive lock should solve the problem. I am really sorry, but I promise that by Monday I will do this and update you with the results.
Can we please do that?
Hugo Leisink
28 May 2009, 02:29
Sure, no problem. I'm almost sure that the fork-mutex issue is the problem that's bugging you. A lot can be found about it on the internet. Thanks a lot for not giving up on this problem and helping me improve Hiawatha. I really appreciate it!
Shashank
28 May 2009, 02:40
Well, the pleasure is mine. I am working with Hiawatha and would like to see this incorporated.
By the way, you might think of providing syslog capability in Hiawatha.
Also, can you please help us with the 503 issue? I have opened 10 sessions at the same time; all sessions access the same page, and the pages are auto-refresh pages. After some time it starts giving 503.
Hugo Leisink
28 May 2009, 02:43
Are you using FastCGI?
Shashank
28 May 2009, 02:46
Nope. In a single page we are loading some images and then calling a CGI.
Hugo Leisink
28 May 2009, 02:49
Well, the only two places where Hiawatha generates a 503 error are when the maximum amount of users for UploadSpeed is reached or when a FastCGI server is not available. Since neither case applies to your situation, the only remaining explanation is that your CGI application is generating the 503 error itself.
Shashank
28 May 2009, 02:59
What do you mean? I have not explicitly used the 503 error code.
About the multiple instance issue: if, instead of the tar, you can tell me the exact changes, I can try them right now. I am on an intranet and can't download files from external servers.
Shashank
28 May 2009, 03:28
In the meanwhile I have tried the following code:
pthread_mutex_t accesslog_mutex;
pthread_mutexattr_t accesslog_mutex_attr;
static int delay_timer = 0;
void init_log_module(void) {
    pthread_mutexattr_init(&accesslog_mutex_attr);
    pthread_mutexattr_settype(&accesslog_mutex_attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&accesslog_mutex, &accesslog_mutex_attr);
    delay_timer = 0;
}
I am getting another problem: dup is failing.
Shashank
28 May 2009, 03:57
By the way, thanks for the extensive help you have given us today.
Thanks a lot.
Shashank
1 June 2009, 15:35
Hi Hugo
The problem is resolved in the new version.
Thanks
This topic has been closed.