I have used GNU/Linux on systems from 4 MB RAM to 512 GB RAM. When
they start swapping, most of the time you can still log in and kill
off the offending process - you just have to be 100-1000 times more
patient.
On my new 32 GB system that has changed: the system blocks completely
when it starts swapping, sometimes with full disk activity and
sometimes with no disk activity at all.
To examine what might be the issue I have written this program. The
idea is:
1 grab 3% of the memory free right now
2 if that caused swap to increase: stop
3 keep the chunk used for 30 seconds by forking off
4 goto 1
-
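#!/usr/bin/perl
use strict;
use warnings;

# Free memory in kB, not counting buffers/cache.
# Needs a free(1) that prints the "-/+ buffers/cache" line.
sub freekb {
    chomp(my $free = `free|grep buffers/cache`);
    my @a = split / +/, $free;
    return $a[3];
}

# Swap currently in use, in kB.
sub swapkb {
    my $swap = `free|grep Swap:`;
    my @a = split / +/, $swap;
    return $a[2];
}

my $swap = swapkb();
my $lastswap = $swap;
my $free = freekb();
# Keep going until swap usage grows.
while ($lastswap >= $swap) {
    print "$swap $free\n";
    $lastswap = $swap;
    $swap = swapkb();
    $free = freekb();
    # Grab 3% of the memory that is free right now (kB -> bytes).
    my $used_mem = "x" x (1024 * $free * 0.03);
    # The forked child keeps the chunk around for 30 seconds.
    if (not fork()) {
        sleep 30;
        exit();
    }
}
print "Swap increased $swap $lastswap\n";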
Running the program forever ought to keep the system at the limit of
swapping, grabbing only a minimal amount of swap and doing so very
slowly (i.e. a few MB at a time at most).
If I run:
forever free | stdbuf -o0 timestamp > freelog
I ought to see swap slowly rising every second. (forever and timestamp
from https://github.com/ole-tange/tangetools).
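If forever and timestamp are not installed, a rough stand-in can be sketched in Perl. This rests on my assumption that the pipeline above samples free about once a second and prefixes each line with the seconds since start. The sketch reads SwapTotal and SwapFree from /proc/meminfo instead of calling free, and the function names and the --run switch are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Parse the text of /proc/meminfo; return (total, used, free) swap in kB,
# or an empty list if the swap fields are missing.
sub swap_from_meminfo {
    my ($text) = @_;
    my ($total) = $text =~ /^SwapTotal:\s+(\d+)\s+kB/m;
    my ($free)  = $text =~ /^SwapFree:\s+(\d+)\s+kB/m;
    return unless defined $total && defined $free;
    return ($total, $total - $free, $free);
}

# Build one log line: elapsed seconds, then "Swap: total used free".
sub swap_log_line {
    my ($elapsed, $text) = @_;
    my @swap = swap_from_meminfo($text) or return;
    return sprintf("%.3f Swap: %d %d %d", $elapsed, @swap);
}

# Sample once a second, but only when called with --run (hypothetical switch).
if (@ARGV && $ARGV[0] eq '--run') {
    $| = 1;    # unbuffered output, like stdbuf -o0
    my $start = time;
    while (1) {
        open my $fh, '<', '/proc/meminfo'
            or die "Cannot read /proc/meminfo: $!";
        my $text = do { local $/; <$fh> };
        close $fh;
        my $line = swap_log_line(time - $start, $text);
        print "$line\n" if defined $line;
        sleep 1;
    }
}
```

Saved under a name such as swaplog.pl (hypothetical), it would be started with: perl swaplog.pl --run > freelog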
But that is not the behaviour I see: swap increases in jumps, and the
system is completely blocked during these jumps. Here the system is
blocked for 30 seconds while swap usage increases by 1 GB:
secs    Swap: total used free (kB)
169.527 Swap: 18440184 154184 18286000
170.531 Swap: 18440184 154184 18286000
200.630 Swap: 18440184 1134240 17305944
210.259 Swap: 18440184 1076228 17363956
Blocked: 21 secs. Swap increase 1900 MB:
307.773 Swap: 18440184 581324 17858860
308.799 Swap: 18440184 597676 17842508
330.103 Swap: 18440184 2503020 15937164
331.106 Swap: 18440184 2502936 15937248
Blocked: 20 secs. Swap increase 2200 MB:
751.283 Swap: 18440184 885288 17554896
752.286 Swap: 18440184 911676 17528508
772.331 Swap: 18440184 3193532 15246652
773.333 Swap: 18440184 1404540 17035644
Blocked: 37 secs. Swap increase 2400 MB:
904.068 Swap: 18440184 613108 17827076
905.072 Swap: 18440184 610368 17829816
942.424 Swap: 18440184 3014668 15425516
942.610 Swap: 18440184 2073580 16366604
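The "Blocked" figures can be pulled out of freelog mechanically. The following is a minimal sketch, assuming every log line has the shape shown in the excerpts above (seconds, then the Swap: line from free). The function name find_stalls and the 10-second threshold are my own choices for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one hash per stall found in timestamped "Swap:" log lines.
# A stall is a gap of more than $max_gap seconds between two samples.
sub find_stalls {
    my ($max_gap, @lines) = @_;
    my (@stalls, $prev_t, $prev_used);
    for my $line (@lines) {
        # Expected shape: <secs> Swap: <total> <used> <free>
        my ($t, $used) = $line =~ /^\s*([\d.]+)\s+Swap:\s+\d+\s+(\d+)\s+\d+/
            or next;
        if (defined $prev_t && $t - $prev_t > $max_gap) {
            push @stalls, {
                start    => $prev_t,
                secs     => $t - $prev_t,
                swap_mib => ($used - $prev_used) / 1024,   # kB -> MiB
            };
        }
        ($prev_t, $prev_used) = ($t, $used);
    }
    return @stalls;
}

# Demo on the first excerpt from the log.
my @sample = (
    "169.527 Swap: 18440184 154184 18286000",
    "170.531 Swap: 18440184 154184 18286000",
    "200.630 Swap: 18440184 1134240 17305944",
    "210.259 Swap: 18440184 1076228 17363956",
);
for my $s (find_stalls(10, @sample)) {
    printf "Blocked: %.0f secs. Swap increase %.0f MiB\n",
        $s->{secs}, $s->{swap_mib};
}
```

A lower threshold also reports shorter hiccups, such as the gap of roughly 10 seconds between the last two sample lines.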
This is bad enough, but what is even worse is that the system sometimes
stops responding altogether, even if I wait for hours. I suspect this
is related to the swapping issue, but I cannot tell for sure.
My first idea was to tweak /proc/sys/vm/swappiness from 60 to 0 or
100, just to see if that had any effect at all. 0 did not have an
effect, but 100 did cause the problem to arise less often.
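For reference, the swappiness tweak can be scripted as well. This is only a sketch: the helper names are made up, writing the real /proc file needs root, and the setting lasts only until the next reboot.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the current value from a swappiness file.
sub read_swappiness {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    my $value = <$fh>;
    close $fh;
    chomp $value;
    return $value;
}

# Write a new value. The kernel rejects values outside the range it
# supports; only the basic shape of the value is checked here.
sub write_swappiness {
    my ($path, $value) = @_;
    die "swappiness must be a non-negative integer\n"
        unless $value =~ /^\d+$/;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "$value\n";
    close $fh or die "Cannot write $path: $!";
    return;
}

my $path = '/proc/sys/vm/swappiness';
print "swappiness is ", read_swappiness($path), "\n" if -r $path;
```

As root, write_swappiness($path, 100) would then apply the setting that made the problem rarer here.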
How can I prevent the system from blocking for such a long time?
Why does it decide to swap out 1-3 GB when less than 10 MB would suffice?