Why doesn't pppd over ssh work here? Why can't I kill pppd?
- by Peter V. Mørch
I'm trying to setup a simple ppp tunnel over ssh. It works on several machines just fine. But on one machine, pppd gets "stuck":
> pgrep pppd | xargs ps up
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 4178 0.0 0.1 3020 1088 pts/1 Ds+ 05:28 0:00 /usr/sbin/pppd
Any attempt to kill it (even sudo kill -9 4178) has no effect that I can see. strace -p 4178 also hangs similarly. After it has been started for a while, I start getting messages in dmesg like shown below.
It is started like so from another machine:
ssh -t root@server /usr/sbin/pppd passive noauth
When I do this to one of the machines that work, the remote end's pppd spits out garbage/binary data to the console (as expected). When I do it to the one that fails, I get no output from pppd, but the ssh session eventually times out. If I instead ssh to the machine, and then run /usr/sbin/pppd passive noauth in a separate step I also get the expected binary output.
I now have a couple of questions:
What could be up with the one machine where pppd fails? I don't even know where to start looking...
What could be the difference between ssh -t root@server /usr/sbin/pppd passive noauth in a single step and ssh root@server and /usr/sbin/pppd passive noauth in two steps?
How can it be that I can't kill the process even with sudo kill -9? The only way I know is to reboot.
(I've tried searching for something similar but didn't get anywhere so I'm sorry I don't have any more leads)
Any ideas?
The problem machine runs in debian on VMware "hardware" (as do the ones that work) and it exhibits the problem when cloned and on both debian lenny (original) and squeeze (after upgrade)
dmesg entries:
[ 1198.727248] INFO: task pppd:4178 blocked for more than 120 seconds.
[ 1198.727507] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1198.727904] pppd D ece2dc9c 0 4178 4174 0x00000004
[ 1198.727908] 00000098 00000082 f2503520 ece2dc9c 0000b1e7 00000000 c148d1c0 c148d1c0
[ 1198.727913] f2a06100 f6e071c0 00000000 ece2dc18 f5cd07e0 00000000 ece2d400 ece2dc9d
[ 1198.727918] 00c52300 ece2dcbc f67bfef8 ec98e480 f291cec0 00000000 c10cf5b0 c10dfd21
[ 1198.727923] Call Trace:
[ 1198.727926] [<c10cf5b0>] ? nameidata_to_filp+0x37/0x41
[ 1198.727929] [<c10dfd21>] ? dput+0x21/0xb7
[ 1198.727932] [<c11cfecc>] ? tty_ldisc_ref_wait+0x5f/0x76
[ 1198.727935] [<c104de7a>] ? wake_up_bit+0x5c/0x5c
[ 1198.727938] [<c11cb91b>] ? tty_ioctl+0x85f/0x8ba
[ 1198.727941] [<c10fec18>] ? do_lock_file_wait+0x3d/0xd9
[ 1198.727944] [<c1162c97>] ? _copy_from_user+0x2b/0x102
[ 1198.727946] [<c11cb0bc>] ? tty_check_change+0xb9/0xb9
[ 1198.727949] [<c10dbeb7>] ? do_vfs_ioctl+0x485/0x4c7
[ 1198.727952] [<c10db59a>] ? do_fcntl+0x24f/0x3a2
[ 1198.727954] [<c10dbf3a>] ? sys_ioctl+0x41/0x58
[ 1198.727957] [<c12c6a1f>] ? sysenter_do_call+0x12/0x28
[ 1318.457225] INFO: task sshd:4174 blocked for more than 120 seconds.
[ 1318.457500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1318.457896] sshd D f25024cc 0 4174 2393 0x00000000
[ 1318.457901] 00000098 00000086 f2a06940 f25024cc 0000b246 00000000 c148d1c0 c148d1c0
[ 1318.457906] f2503520 f6e071c0 00000000 3f056585 0000000f ece2d4bc 3f056585 f2503520
[ 1318.457911] ec98bb38 ec98bbdc 00000000 00000000 00000000 c12c09b5 f2503520 c10327cb
[ 1318.457916] Call Trace:
[ 1318.457926] [<c12c09b5>] ? schedule_hrtimeout_range_clock+0x3c/0xd9
[ 1318.457931] [<c10327cb>] ? try_to_wake_up+0x13f/0x13f
[ 1318.457935] [<c11cfecc>] ? tty_ldisc_ref_wait+0x5f/0x76
[ 1318.457940] [<c104de7a>] ? wake_up_bit+0x5c/0x5c
[ 1318.457943] [<c11c9ad3>] ? tty_poll+0x32/0x5e
[ 1318.457947] [<c10dd4d5>] ? do_select+0x2a1/0x42e
[ 1318.457950] [<c10dcb83>] ? poll_freewait+0x69/0x69
[ 1318.457953] [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457955] [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457958] [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457960] [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457963] [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457965] [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457968] [<c10dcc25>] ? __pollwait+0xa2/0xa2
[ 1318.457971] [<c10429c2>] ? lock_timer_base+0x19/0x35
[ 1318.457974] [<c1042eb5>] ? __mod_timer+0x10c/0x116
[ 1318.457977] [<c1042f89>] ? mod_timer+0x69/0x6e
[ 1318.457981] [<c121325d>] ? sk_reset_timer+0xc/0x16
[ 1318.457984] [<c1252f57>] ? tcp_event_new_data_sent+0x66/0x6b
[ 1318.457987] [<c1255b85>] ? tcp_write_xmit+0x7a7/0x86a
[ 1318.457990] [<c121760d>] ? __alloc_skb+0x50/0xfd
[ 1318.457994] [<c12c12bc>] ? _raw_spin_lock_bh+0x8/0x1e
[ 1318.457996] [<c1212e98>] ? release_sock+0x10/0xc4
[ 1318.457999] [<c124b543>] ? tcp_sendmsg+0x6dd/0x7b7
[ 1318.458003] [<c1162c97>] ? _copy_from_user+0x2b/0x102
[ 1318.458006] [<c10dd7a0>] ? core_sys_select+0x13e/0x1c3
[ 1318.458009] [<c12102a3>] ? sock_aio_write+0xc0/0xd4
[ 1318.458012] [<c10d0655>] ? do_sync_write+0xa0/0xe4
[ 1318.458016] [<c10b141c>] ? handle_mm_fault+0x222/0x238
[ 1318.458019] [<c10f6096>] ? fsnotify+0x1de/0x1f9
[ 1318.458022] [<c10dd9e8>] ? sys_select+0x6e/0x8f
[ 1318.458024] [<c10d105e>] ? sys_write+0x3c/0x63
[ 1318.458028] [<c12c6a1f>] ? sysenter_do_call+0x12/0x28