How should I clean up hung grandchild processes when an alarm trips in Perl?

Posted by brian d foy on Stack Overflow See other posts from Stack Overflow or by brian d foy
Published on 2010-05-15T10:54:35Z Indexed on 2010/05/15 11:04 UTC
Read the original article Hit count: 421

Filed under:
|
|
|
|

I have a parallelized automation script which needs to call many other scripts, some of which hang because they (incorrectly) wait for standard input. That's not a big deal because I catch those with alarm. The trick is to shut down those hung grandchild processes when the child shuts down. I thought various incantations of SIGCHLD, waiting, and process groups could do the trick, but they all block and the grandchildren aren't reaped.

My solution, which works, just doesn't seem like it is the right solution. I'm not especially interested in the Windows solution just yet, but I'll eventually need that too. Mine only works for Unix, which is fine for now.

I wrote a small script that takes the number of simultaneous parallel children to run and the total number of forks:

 $ fork_bomb <parallel jobs> <number of forks>

 $ fork_bomb 8 500

This will probably hit the per-user process limit within a couple of minutes. Many solutions I've found just tell you to increase the per-user process limit, but I need this to run about 300,000 times, so that isn't going to work. Similarly, suggestions to re-exec and so on to clear the process table aren't what I need. I'd like to actually fix the problem instead of slapping duct tape over it.

I crawl the process table looking for the child processes and shut down the hung processes individually in the SIGALRM handler, which needs to die because the rest of real code has no hope of success after that. The kludgey crawl through the process table doesn't bother me from a performance perspective, but I wouldn't mind not doing it:

use Parallel::ForkManager;
use Proc::ProcessTable;

my $pm = Parallel::ForkManager->new( $ARGV[0] );

my $alarm_sub = sub {
        kill 9,
            map  { $_->{pid} }
            grep { $_->{ppid} == $$ }
            @{ Proc::ProcessTable->new->table }; 

        die "Alarm rang for $$!\n";
        };

foreach ( 0 .. $ARGV[1] ) 
    {
    print ".";
    print "\n" unless $count++ % 50;

    my $pid = $pm->start and next; 

    local $SIG{ALRM} = $alarm_sub;

    eval {
        alarm( 2 );
        system "$^X -le '<STDIN>'"; # this will hang
        alarm( 0 );
        };

    $pm->finish;
    }

If you want to run out of processes, take out the kill.

I thought that setting a process group would work so I could kill everything together, but that blocks:

my $alarm_sub = sub {
        kill 9, -$$;    # blocks here
        die "Alarm rang for $$!\n";
        };

foreach ( 0 .. $ARGV[1] ) 
    {
    print ".";
    print "\n" unless $count++ % 50;

    my $pid = $pm->start and next; 
    setpgrp(0, 0);

    local $SIG{ALRM} = $alarm_sub;

    eval {
        alarm( 2 );
        system "$^X -le '<STDIN>'"; # this will hang
        alarm( 0 );
        };

    $pm->finish;
    }

The same thing with POSIX's setsid didn't work either, and I think that actually broke things in a different way since I'm not really daemonizing this.

Curiously, Parallel::ForkManager's run_on_finish happens too late for the same clean-up code: the grandchildren are apparently already disassociated from the child processes at that point.

© Stack Overflow or respective owner

Related posts about perl

Related posts about alarm