sigkill - Page 2 - Developer IT

Apache error: could not make child process 25105 exit, attempting to continue anyway

- by Temnovit

Hello! I have a web server based on Ubuntu Server 9.10 with this software: apache 2 PHP 5.3 MySQL 5 Python 2.5 Few of my websites are PHP based, few use python/django through mod_wsgi. For month or so, every day my apache server stops responding until I manually restart it. Error logs show: [Fri Mar 05 17:06:47 2010] [error] could not make child process 25059 exit, attempting to continue anyway [Fri Mar 05 17:06:47 2010] [error] could not make child process 25061 exit, attempting to continue anyway [Fri Mar 05 17:06:47 2010] [error] could not make child process 24930 exit, attempting to continue anyway [Fri Mar 05 17:06:47 2010] [error] could not make child process 25084 exit, attempting to continue anyway [Fri Mar 05 17:06:47 2010] [error] could not make child process 25105 exit, attempting to continue anyway and so on. I tried to google this problem but it seems, that I can't find a solution there. How can I determine the cause of this error and how do I fix it? Thank you for your help. UPDATE Updating mod-wsgi to version 3.1 didn't solve the problem Updating PHP to 5.3 also didn't solve it Here is a list of all installed modules: core mod_log_config mod_logio prefork http_core mod_so mod_alias mod_auth_basic mod_authn_file mod_authz_default mod_authz_groupfile mod_authz_host mod_authz_user mod_autoindex mod_cgi mod_deflate mod_dir mod_env mod_mime mod_negotiation mod_php5 mod_rewrite mod_setenvif mod_status mod_wsgi Here's how my virtual host with wsgi looks: <VirtualHost *:80> ServerName example.net DocumentRoot /var/www/example.net #wcgi script that serves all the thing WSGIScriptAlias / /var/www/example.net/index.wsgi WSGIDaemonProcess example user=wsgideamonuser group=root processes=1 threads=10 WSGIProcessGroup example Alias /static /var/www/example.net/static #serving admin files Alias /media/ /usr/local/lib/python2.6/dist-packages/django/contrib/admin/media/ <Location "/static"> SetHandler None </Location> <Location "/media"> SetHandler None </Location> ErrorLog /var/www/example.net/error.log </VirtualHost> Error log now contains two types of errors fallowed one by another: [error] child process 9486 still did not exit, sending a SIGKILL [error] could not make child process 9106 exit, attempting to continue anyway

Read the article

Why is my cron daemon is being killed every few minutes?

- by user113215

As of about a week ago, my cron daemon refuses to stay running. I'm using Debian 6 x64 on an OpenVZ virtual machine. Running something like pgrep cron shows that the daemon isn't running. I start the service with service cron start or /etc/init.d/cron start and it launches, but it disappears from the running process list after a few minutes (varying anywhere between 1 - 30 minutes before the process is killed again). Using strace -f service cron start, I can see that the process is being killed for some reason: nanosleep({60, 0}, <unfinished ...> +++ killed by SIGKILL +++ There's nothing relevant in /var/log/syslog, /var/log/messages, /var/log/auth.log, or /var/log/kern.log to explain why the the process is dying. The system has at least 800 MB of free memory, and cat /proc/loadavg returns 0.22 0.13 0.04 so resources shouldn't be the issue. With cron running, free -m reports: total used free shared buffers cached Mem: 1024 211 812 0 0 0 -/+ buffers/cache: 211 812 Swap: 0 0 0 I also tried removing and reinstalling the cron package using apt-get. Update: I initially thought the problem was a resource issues. I erased my entire VPS and started from a fresh Debian image. There is now nothing else running on the system, but even from a clean install my cron daemon is still being killed at random. What else should I check? How do I find out what's killing my crond?

Read the article

PPPD Is Locking the Modem and Not Releasing It

- by Skid

Got an issue with PPPD on one of our system, we have a PC that is used to talk to remote sites via a dial up connection, the modem can both connect out to the sites and the sites also dial back in. Currently I'm having an issue where some times a site ether dials in or we dial out, and it connects, but then blocks the modem and throws and error to kern.log. Aug 26 14:23:57 TM-SCADA kernel: [191233.503745] INFO: task pppd:8142 blocked for more than 120 seconds. Aug 26 14:23:57 TM-SCADA kernel: [191233.503750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 26 14:23:57 TM-SCADA kernel: [191233.503753] pppd D ffffffff8180cb40 0 8142 1 0x00000000 Aug 26 14:23:57 TM-SCADA kernel: [191233.503759] ffff8800ac1f5dc8 0000000000000086 ffff8800ac1f5fd8 00000000000137c0 Aug 26 14:23:57 TM-SCADA kernel: [191233.503765] ffff8800ac1f4010 00000000000137c0 00000000000137c0 00000000000137c0 Aug 26 14:23:57 TM-SCADA kernel: [191233.503770] ffff8800ac1f5fd8 00000000000137c0 ffffffff81c13020 ffff880135df5b80 Aug 26 14:23:57 TM-SCADA kernel: [191233.503775] Call Trace: Aug 26 14:23:57 TM-SCADA kernel: [191233.503784] [<ffffffff8166ba29>] schedule+0x29/0x70 Aug 26 14:23:57 TM-SCADA kernel: [191233.503790] [<ffffffff813db005>] tty_ldisc_ref_wait+0x65/0xb0 Aug 26 14:23:57 TM-SCADA kernel: [191233.503796] [<ffffffff813f3061>] ? uart_ioctl+0xd1/0x1c0 Aug 26 14:23:57 TM-SCADA kernel: [191233.503801] [<ffffffff81076960>] ? wake_up_bit+0x40/0x40 Aug 26 14:23:57 TM-SCADA kernel: [191233.503806] [<ffffffff813d3fa0>] tty_ioctl+0x2c0/0x9a0 Aug 26 14:23:57 TM-SCADA kernel: [191233.503810] [<ffffffff811d0549>] ? fcntl_setlk+0x69/0x200 Aug 26 14:23:57 TM-SCADA kernel: [191233.503815] [<ffffffff81195f79>] do_vfs_ioctl+0x99/0x330 Aug 26 14:23:57 TM-SCADA kernel: [191233.503820] [<ffffffff81195212>] ? do_fcntl+0x232/0x410 Aug 26 14:23:57 TM-SCADA kernel: [191233.503823] [<ffffffff811962b1>] sys_ioctl+0xa1/0xb0 Aug 26 14:23:57 TM-SCADA kernel: [191233.503829] [<ffffffff81674e69>] system_call_fastpath+0x16/0x1b The syslog trace stops at "Serial connection established". Aug 28 06:00:03 TM-SCADA pppd[10358]: pppd 2.4.5 started by root, uid 0 Aug 28 06:00:04 TM-SCADA chat[10360]: abort on (NO CARRIER) Aug 28 06:00:04 TM-SCADA chat[10360]: abort on (NO DIALTONE) Aug 28 06:00:04 TM-SCADA chat[10360]: abort on (ERROR) Aug 28 06:00:04 TM-SCADA chat[10360]: abort on (NO ANSWER) Aug 28 06:00:04 TM-SCADA chat[10360]: abort on (BUSY) Aug 28 06:00:04 TM-SCADA chat[10360]: abort on (Username/Password Incorrect) Aug 28 06:00:04 TM-SCADA chat[10360]: send (atz^M) Aug 28 06:00:04 TM-SCADA chat[10360]: expect (OK) Aug 28 06:00:04 TM-SCADA chat[10360]: atz^M^M Aug 28 06:00:04 TM-SCADA chat[10360]: OK Aug 28 06:00:04 TM-SCADA chat[10360]: -- got it Aug 28 06:00:04 TM-SCADA chat[10360]: send (atx0^M) Aug 28 06:00:04 TM-SCADA chat[10360]: expect (OK) Aug 28 06:00:04 TM-SCADA chat[10360]: ^M Aug 28 06:00:04 TM-SCADA chat[10360]: atx0^M^M Aug 28 06:00:04 TM-SCADA chat[10360]: OK Aug 28 06:00:04 TM-SCADA chat[10360]: -- got it Aug 28 06:00:04 TM-SCADA chat[10360]: send (atdt0123456789^M) Aug 28 06:00:04 TM-SCADA chat[10360]: expect (CONNECT/ARQ) Aug 28 06:00:04 TM-SCADA chat[10360]: ^M Aug 28 06:00:30 TM-SCADA chat[10360]: atdt0123456789^M^M Aug 28 06:00:30 TM-SCADA chat[10360]: CONNECT/ARQ Aug 28 06:00:30 TM-SCADA chat[10360]: -- got it Aug 28 06:00:30 TM-SCADA pppd[10358]: Serial connection established. I've only found two ways to release the modem in this condition, the first is to turn the modem off and on again, the second is to delete the serial lock file, and then SIGKILL pppd. Now I could write into our software to do the latter if the modem is locked, but I would rather stop it from locking in the first place if at all possible. The reason I put this issue in the askubuntu is because we used to use OpenSuse and never had an issue with it, admittedly that was version 11.2 or earlier so its still and old kernel, but I figured I would ask here first anyway. Any suggestions of places to look would be appreciated.

Read the article

can upstart expect/respawn be used on processes that fork more than twice?

- by johnjamesmiller

I am using upstart to start/stop/automatically restart daemons. One of the daemons forks 4 times. The upstart cookbook states that it only supports forking twice. Is there a workaround? how it fails If I try to use expect daemon or expect fork upstart uses the pid of the second fork. When I try to stop the job nobody responds to upstarts SIGKILL signal and it hangs until you exhaust the pid space and loop back around. It gets worse if you add respawn. Upstart thinks the job died and immediately starts another one. bug acknowledged by upstream A bug has been entered for upstart. The solutions presented are stick with the old sysvinit, rewrite your daemon, or wait for a re-write. rhel is close to 2 years behind the latest upstart package so by the time the rewrite is released and we get updated the wait will probably be 4 years. The daemon is written by a subcontractor of a subcontractor of a contractor so it will not be fixed any time soon either.

Read the article

Why is my cron daemon is being killed every few minutes? OpenVZ?

- by user113215

As of about a week ago, my cron daemon refuses to stay running. I'm using Debian 6. Running something like pgrep cron shows that the daemon isn't running. I start the service with service cron start or /etc/init.d/cron start and it launches, but it disappears from the running process list after a few minutes (varying anywhere between 1 - 30 minutes before the process is killed again). Using strace -f service cron start, I can see that the process is being killed for some reason: nanosleep({56, 0}, 0x7fffa7184c80) = 0 stat("crontabs", {st_mode=S_IFDIR|S_ISVTX|0730, st_size=4096, ...}) = 0 stat("/etc/crontab", {st_mode=S_IFREG|0644, st_size=1100, ...}) = 0 stat("/etc/cron.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 stat("/etc/cron.d/php5", {st_mode=S_IFREG|0644, st_size=475, ...}) = 0 stat("/etc/cron.d/anacron", {st_mode=S_IFREG|0644, st_size=244, ...}) = 0 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 rt_sigaction(SIGCHLD, NULL, {0x4036f0, [CHLD], SA_RESTORER|SA_RESTART, 0x2b0e8465f230}, 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 nanosleep({60, 0}, <unfinished ...> +++ killed by SIGKILL +++ There's nothing relevant in /var/log/syslog, /var/log/messages, /var/log/auth.log, or /var/log/kern.log to explain why the the process is dying. The system has about 500 MB of free memory, and cat /proc/loadavg returns 0.10 0.21 0.45 so resources shouldn't be the issue. I also tried removing and reinstalling the cron package using apt-get. What else should I check? How do I find out what's killing my crond? Edit: I'm on a virtual machine under OpenVZ (and as such, I have no swap). With cron running, free -m reports: total used free shared buffers cached Mem: 1024 465 558 0 0 0 -/+ buffers/cache: 465 558 Swap: 0 0 0 My OpenVZ User Beancounters via cat /proc/user_beancounters: Version: 2.5 uid resource held maxheld barrier limit failcnt 172087: kmemsize 8275718 25561636 51200000 51200000 0 lockedpages 0 968 2048 2048 0 privvmpages 113442 266465 262200 262200 3740757 shmpages 788 4004 128000 128000 0 dummy 0 0 0 0 0 numproc 39 98 600 600 0 physpages 50521 208434 0 9223372036854775807 0 vmguarpages 0 0 512000 512000 0 oomguarpages 50521 208447 512000 512000 0 numtcpsock 7 323 4096 4096 0 numflock 7 64 2048 2048 0 numpty 1 4 32 32 0 numsiginfo 0 23 1024 1024 0 tcpsndbuf 137984 17878480 20480000 20480000 0 tcprcvbuf 114688 6983504 20480000 20480000 0 othersockbuf 162960 1074440 20480000 20480000 0 dgramrcvbuf 0 24208 10240000 10240000 0 numothersock 101 353 2048 2048 0 dcachesize 459171 747444 10240000 10240000 0 numfile 1010 4221 50000 50000 0 dummy 0 0 0 0 0 dummy 0 0 0 0 0 dummy 0 0 0 0 0 numiptent 39 424 2048 2048 0

Read the article

Can't sync filesystem without reboot

- by Fabio

I'm having an issue with a linux server. Once a week the running mysql instance hangs and there is no way to fully stop it. If I kill it, it remains in zombie status and init does not reap its pid. The server is used for staging deployments and some internal tools, so it's not under heavy load. The only process constantly used id mysql and for this I think that it's the only process which suffer of this issue. I've searched system logs for errors and the only thing I found is this error (repeated a couple of times) in dmesg output: [706560.640085] INFO: task mysqld:31965 blocked for more than 120 seconds. [706560.640198] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [706560.640312] mysqld D ffff88032fd93f40 0 31965 1 0x00000000 [706560.640317] ffff880242a27d18 0000000000000086 ffff88031a50dd00 ffff880242a27fd8 [706560.640321] ffff880242a27fd8 ffff880242a27fd8 ffff88031e549740 ffff88031a50dd00 [706560.640325] ffff88031a50dd00 ffff88032fd947f8 0000000000000002 ffffffff8112f250 [706560.640328] Call Trace: [706560.640338] [<ffffffff8112f250>] ? __lock_page+0x70/0x70 [706560.640344] [<ffffffff816cb1b9>] schedule+0x29/0x70 [706560.640347] [<ffffffff816cb28f>] io_schedule+0x8f/0xd0 [706560.640350] [<ffffffff8112f25e>] sleep_on_page+0xe/0x20 [706560.640353] [<ffffffff816c9900>] __wait_on_bit+0x60/0x90 [706560.640356] [<ffffffff8112f390>] wait_on_page_bit+0x80/0x90 [706560.640360] [<ffffffff8107dce0>] ? autoremove_wake_function+0x40/0x40 [706560.640363] [<ffffffff8112f891>] filemap_fdatawait_range+0x101/0x190 [706560.640366] [<ffffffff81130975>] filemap_write_and_wait_range+0x65/0x70 [706560.640371] [<ffffffff8122e441>] ext4_sync_file+0x71/0x320 [706560.640376] [<ffffffff811c3e6d>] do_fsync+0x5d/0x90 [706560.640379] [<ffffffff811c40d0>] sys_fsync+0x10/0x20 [706560.640383] [<ffffffff816d495d>] system_call_fastpath+0x1a/0x1f When this happens the only way to make everything working again is a full reboot, but in order to do that I'm forced to use this command after I've manually stopped all running processes echo b > /proc/sysrq-trigger otherwise normal reboot process hangs forever. I've tracked reboots script and I've found out that also the reboot process hangs on a sync call, this one in /etc/init.d/sendsigs (I'm on ubuntu) # Flush the kernel I/O buffer before we start to kill # processes, to make sure the IO of already stopped services to # not slow down the remaining processes to a point where they # are accidentily killed with SIGKILL because they did not # manage to shut down in time. sync I'm almost sure that the cause of this is an hardware issue (the RAID controller???) also because I've other two machines with the same hardware and software configuration and they don't suffer of this, but I can't find any hint in syslog or dmesg. I've also installed smartmontools and mcelog packages but none of them did report any issue. What can I do to track the cause of this issue? Today is happened again, here is the status of system after triggering a reboot init---console-kit-dae---64*[{console-kit-dae}] +-dbus-daemon +-mcelog +-mysqld---{mysqld} +-newrelic-daemon---newrelic-daemon---11*[{newrelic-daemon}] +-ntpd +-polkitd---{polkitd} +-python3 +-rpc.idmapd +-rpc.statd +-rpcbind +-sh---rc---S20sendsigs---sync +-smartd +-snmpd +-sshd---sshd---zsh---sudo---zsh---pstree +-sshd---sshd---zsh---sudo---zsh And here is the status of sync process # ps aux | grep sync root 3637 0.1 0.0 4352 372 ? D 05:53 0:00 sync i.e. Uninterruptible sleep... Hardware specs as reported by lshw I think the raid controller is a fake raid. I usually don't deal with hardware (and for the record I don't have physical access to it) description: Computer product: X7DBP () vendor: Supermicro version: 0123456789 serial: 0123456789 width: 64 bits capabilities: smbios-2.4 dmi-2.4 vsyscall32 configuration: administrator_password=disabled boot=normal frontpanel_password=unknown keyboard_password=unknown power-on_password=disabled uuid=53D19F64-D663-A017-8922-0030487C1FEE *-core description: Motherboard product: X7DBP vendor: Supermicro physical id: 0 version: PCB Version serial: 0123456789 *-firmware description: BIOS vendor: Phoenix Technologies LTD physical id: 0 version: 6.00 date: 05/29/2007 size: 106KiB capacity: 960KiB capabilities: pci pnp upgrade shadowing escd cdboot bootselect edd int13floppy2880 acpi usb ls120boot zipboot biosbootspecification *-storage description: RAID bus controller product: 631xESB/632xESB SATA RAID Controller vendor: Intel Corporation physical id: 1f.2 bus info: pci@0000:00:1f.2 version: 09 width: 32 bits clock: 66MHz capabilities: storage pm bus_master cap_list configuration: driver=ahci latency=0 resources: irq:19 ioport:18a0(size=8) ioport:1874(size=4) ioport:1878(size=8) ioport:1870(size=4) ioport:1880(size=32) memory:d8500400-d85007ff

Read the article

Possible to give one connection to each IP?

- by Alice

I am having overloading problems. Too many connections, and some IP has more than 20 connection at once. I do this command. netstat -anp |grep 'tcp\|udp' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n To get total of connection and this is the output: 1 106.3.98.81 1 106.3.98.82 1 108.171.251.2 1 110.85.103.207 1 111.161.30.217 1 113.53.103.55 1 119.235.237.20 1 124.106.19.34 1 157.55.32.166 1 157.55.33.49 1 157.55.34.28 1 175.141.103.239 1 180.76.5.59 1 180.76.5.61 1 188.235.165.216 1 205.213.195.70 1 216.157.222.25 1 218.93.205.100 1 222.77.209.105 1 27.153.148.109 1 27.159.194.242 1 27.159.253.71 1 54.242.122.201 1 61.172.50.99 1 65.55.24.239 1 71.179.78.5 1 74.125.136.27 1 74.125.182.30 1 74.125.182.36 1 79.112.225.39 1 93.190.139.208 2 124.227.191.67 2 157.55.33.84 2 157.55.35.34 2 190.66.3.107 2 203.87.153.38 2 220.161.119.3 2 221.6.15.156 2 27.153.148.116 2 27.159.197.0 2 96.47.224.42 3 202.14.70.1 3 218.6.15.42 3 222.77.218.226 3 222.77.224.187 3 37.59.66.100 3 46.4.181.244 3 87.98.254.192 3 91.207.8.62 4 188.143.233.222 4 218.108.168.166 4 221.12.154.18 4 93.182.157.8 4 94.142.128.183 5 180.246.170.187 5 8.21.6.226 6 178.137.94.87 6 218.93.205.112 7 199.15.234.222 9 9 125.253.97.6 10 178.137.17.196 11 46.118.192.179 12 212.79.14.14 21 72.201.187.135 27 0.0.0.0 Can anyone give me some directions, my server crashed few times this week because of this. Thanks. EDIT: Alright, my error logs says: [Thu Oct 18 12:17:39 2012] [error] could not make child process 4842 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4843 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4855 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4856 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4861 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4869 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4872 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4873 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4874 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4875 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4876 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4880 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4882 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4885 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4897 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4900 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4901 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4906 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4907 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4925 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4926 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4927 exit, attempting to continue anyway [Thu Oct 18 12:17:39 2012] [error] could not make child process 4931 exit, attempting to continue anyway [Thu Oct 18 12:17:40 2012] [notice] caught SIGTERM, shutting down PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib/php5/20060613+lfs/curl.iso' - /usr/lib/php5/20060613+lfs/curl.iso: cannot open shared object file: No such file or directory in Unknown on line 0 [Thu Oct 18 12:17:45 2012] [notice] Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny10 with Suhosin-Patch configured -- resuming normal operations And I have over thousands of line saying:(each has different process id) [Thu Oct 18 12:17:38 2012] [error] child process 4906 still did not exit, sending a SIGKILL And I also have line saying: [Wed Oct 17 09:44:58 2012] [error] server reached MaxClients setting, consider raising the MaxClients setting <IfModule prefork.c> StartServers 8 MinSpareServers 5 MaxSpareServers 50 MaxClients 300 MaxRequestsPerChild 5000 </IfModule>

Developer IT