Search Results

Search found 30 results on 2 pages for 'cpu2'.

Page 2/2 | < Previous Page | 1 2 

  • Why are my Opteron cores running at only 75% capacity each? (25% CPU idle)

    - by Tim Cooper
    We've just taken delivery of a powerful 32-core AMD Opteron server with 128Gb. We have 2 x 6272 CPU's with 16 cores each. We are running a big long-running java task on 30 threads. We have the NUMA optimisations for Linux and java turned on. Our Java threads are mainly using objects that are private to that thread, sometimes reading memory that other threads will be reading, and very very occasionally writing or locking shared objects. We can't explain why the CPU cores are 25% idle. Below is a dump of "top": top - 23:06:38 up 1 day, 23 min, 3 users, load average: 10.84, 10.27, 9.62 Tasks: 676 total, 1 running, 675 sleeping, 0 stopped, 0 zombie Cpu(s): 64.5%us, 1.3%sy, 0.0%ni, 32.9%id, 1.3%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132138168k total, 131652664k used, 485504k free, 92340k buffers Swap: 5701624k total, 230252k used, 5471372k free, 13444344k cached ... top - 22:37:39 up 23:54, 3 users, load average: 7.83, 8.70, 9.27 Tasks: 678 total, 1 running, 677 sleeping, 0 stopped, 0 zombie Cpu0 : 75.8%us, 2.0%sy, 0.0%ni, 22.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 77.2%us, 1.3%sy, 0.0%ni, 21.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 77.3%us, 1.0%sy, 0.0%ni, 21.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 77.8%us, 1.0%sy, 0.0%ni, 21.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 76.9%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 76.3%us, 2.0%sy, 0.0%ni, 21.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 12.6%us, 3.0%sy, 0.0%ni, 84.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 8.6%us, 2.0%sy, 0.0%ni, 89.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu8 : 77.0%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu9 : 77.0%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu10 : 77.6%us, 1.7%sy, 0.0%ni, 20.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 75.7%us, 2.0%sy, 0.0%ni, 21.4%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu12 : 76.6%us, 2.3%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 76.6%us, 2.3%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 76.2%us, 2.6%sy, 0.0%ni, 15.9%id, 5.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 76.6%us, 2.0%sy, 0.0%ni, 21.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu16 : 73.6%us, 2.6%sy, 0.0%ni, 23.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu17 : 74.5%us, 2.3%sy, 0.0%ni, 23.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu18 : 73.9%us, 2.3%sy, 0.0%ni, 23.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu19 : 72.9%us, 2.6%sy, 0.0%ni, 24.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu20 : 72.8%us, 2.6%sy, 0.0%ni, 24.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu21 : 72.7%us, 2.3%sy, 0.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu22 : 72.5%us, 2.6%sy, 0.0%ni, 24.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu23 : 73.0%us, 2.3%sy, 0.0%ni, 24.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu24 : 74.7%us, 2.7%sy, 0.0%ni, 22.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu25 : 74.5%us, 2.6%sy, 0.0%ni, 22.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu26 : 73.7%us, 2.0%sy, 0.0%ni, 24.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu27 : 74.1%us, 2.3%sy, 0.0%ni, 23.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu28 : 74.1%us, 2.3%sy, 0.0%ni, 23.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu29 : 74.0%us, 2.0%sy, 0.0%ni, 24.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu30 : 73.2%us, 2.3%sy, 0.0%ni, 24.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu31 : 73.1%us, 2.0%sy, 0.0%ni, 24.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132138168k total, 131711704k used, 426464k free, 88336k buffers Swap: 5701624k total, 229572k used, 5472052k free, 13745596k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 13865 root 20 0 122g 112g 3.1g S 2334.3 89.6 20726:49 java 27139 jayen 20 0 15428 1728 952 S 2.6 0.0 0:04.21 top 27161 sysadmin 20 0 15428 1712 940 R 1.0 0.0 0:00.28 top 33 root 20 0 0 0 0 S 0.3 0.0 0:06.24 ksoftirqd/7 131 root 20 0 0 0 0 S 0.3 0.0 0:09.52 events/0 1858 root 20 0 0 0 0 S 0.3 0.0 1:35.14 kondemand/0 A dump of the java stack confirms that none of the threads are anywhere near the few places where locks are used, nor are they anywhere near any disk or network i/o. I had trouble finding a clear explanation of what 'top' means by "idle" versus "wait", but I get the impression that "idle" means "no more threads that need to be run" but this doesn't make sense in our case. We're using a "Executors.newFixedThreadPool(30)". There are a large number of tasks pending and each task lasts for 10 seconds or so. I suspect that the explanation requires a good understanding of NUMA. Is the "idle" state what you see when a CPU is waiting for a non-local access? If not, then what is the explanation?

    Read the article

  • What is causing the unusual high load average?

    - by James
    I noticed on Tuesday night of last week, the load average went up sharply and it seemed abnormal since the traffic is small. Usually, the numbers usually average around .40 or lower and my server stuff (mysql, php and apache) are optimized. I noticed that the IOWait is unusually high even though the processes is barely using any CPU. top - 01:44:39 up 1 day, 21:13, 1 user, load average: 1.41, 1.09, 0.86 Tasks: 60 total, 1 running, 59 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 0.0%sy, 0.0%ni, 91.5%id, 8.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1048576k total, 331944k used, 716632k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 15 0 2468 1376 1140 S 0 0.1 0:00.92 init 1656 root 15 0 13652 5212 664 S 0 0.5 0:00.00 apache2 9323 root 18 0 13652 5212 664 S 0 0.5 0:00.00 apache2 10079 root 18 0 3972 1248 972 S 0 0.1 0:00.00 su 10080 root 15 0 4612 1956 1448 S 0 0.2 0:00.01 bash 11298 root 15 0 13652 5212 664 S 0 0.5 0:00.00 apache2 11778 chikorit 15 0 2344 1092 884 S 0 0.1 0:00.05 top 15384 root 18 0 17544 13m 1568 S 0 1.3 0:02.28 miniserv.pl 15585 root 15 0 8280 2736 2168 S 0 0.3 0:00.02 sshd 15608 chikorit 15 0 8280 1436 860 S 0 0.1 0:00.02 sshd Here is the VMStat procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 0 0 768644 0 0 0 0 14 23 0 10 1 0 99 0 IOStat - Nothing unusal Total DISK READ: 67.13 K/s | Total DISK WRITE: 0.00 B/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND 19496 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19501 be/4 mysql 3.95 K/s 0.00 B/s 0.00 % 0.00 % mysqld 19568 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19569 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19570 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19571 be/4 chikorit 7.90 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19573 be/4 chikorit 7.90 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init 11778 be/4 chikorit 0.00 B/s 0.00 B/s 0.00 % 0.00 % top 19470 be/4 mysql 0.00 B/s 0.00 B/s 0.00 % 0.00 % mysqld Load Average Chart - http://i.stack.imgur.com/kYsD0.png I want to be sure if this is not a MySQL problem before making sure. Also, this is a Ubuntu 10.04 LTS Server on OpenVZ. Edit: This will probably give a good picture on the IO Wait top - 22:12:22 up 17:41, 1 user, load average: 1.10, 1.09, 0.93 Tasks: 33 total, 1 running, 32 sleeping, 0 stopped, 0 zombie Cpu(s): 0.6%us, 0.2%sy, 0.0%ni, 89.0%id, 10.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1048576k total, 260708k used, 787868k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 15 0 2468 1376 1140 S 0 0.1 0:00.88 init 5849 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 8063 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 9732 root 16 0 8280 2728 2168 S 0 0.3 0:00.02 sshd 9746 chikorit 18 0 8412 1444 864 S 0 0.1 0:01.10 sshd 9747 chikorit 18 0 4576 1960 1488 S 0 0.2 0:00.24 bash 13706 chikorit 15 0 2344 1088 884 R 0 0.1 0:00.03 top 15745 chikorit 15 0 12968 5108 1280 S 0 0.5 0:00.00 apache2 15751 chikorit 15 0 72184 25m 18m S 0 2.5 0:00.37 php5-fpm 15790 chikorit 18 0 12472 4640 1192 S 0 0.4 0:00.00 apache2 15797 chikorit 15 0 72888 23m 16m S 0 2.3 0:00.06 php5-fpm 16038 root 15 0 67772 2848 592 D 0 0.3 0:00.00 php5-fpm 16309 syslog 18 0 24084 1316 992 S 0 0.1 0:00.07 rsyslogd 16316 root 15 0 5472 908 500 S 0 0.1 0:00.00 sshd 16326 root 15 0 2304 908 712 S 0 0.1 0:00.02 cron 17464 root 15 0 10252 7560 856 D 0 0.7 0:01.88 psad 17466 root 18 0 1684 276 208 S 0 0.0 0:00.31 psadwatchd 17559 root 18 0 11444 2020 732 S 0 0.2 0:00.47 sendmail-mta 17688 root 15 0 10252 5388 1136 S 0 0.5 0:03.81 python 17752 teamspea 19 0 44648 7308 4676 S 0 0.7 1:09.70 ts3server_linux 18098 root 15 0 12336 6380 3032 S 0 0.6 0:00.47 apache2 18099 chikorit 18 0 10368 2536 464 S 0 0.2 0:00.00 apache2 18120 ntp 15 0 4336 1316 984 S 0 0.1 0:00.87 ntpd 18379 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 18387 mysql 15 0 62796 36m 5864 S 0 3.6 1:43.26 mysqld 19584 root 15 0 12336 4028 668 S 0 0.4 0:00.02 apache2 22498 root 16 0 12336 4028 668 S 0 0.4 0:00.00 apache2 24260 root 15 0 67772 3612 1356 S 0 0.3 0:00.22 php5-fpm 27712 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 27730 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 30343 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 30366 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 This is the free ram as of today. total used free shared buffers cached Mem: 1024 302 721 0 0 0 -/+ buffers/cache: 302 721 Swap: 0 0 0 Update: Looking into the logs, particularly the PHP5-FPM, which is causing the CPU spike. I found that its segment faulting for some apparent reason. [03-Jun-2012 06:11:20] NOTICE: [pool www] child 14132 started [03-Jun-2012 06:11:25] WARNING: [pool www] child 13664 exited on signal 11 (SIGSEGV) after 53.686322 seconds from start [03-Jun-2012 06:11:25] NOTICE: [pool www] child 14328 started [03-Jun-2012 06:11:25] WARNING: [pool www] child 14132 exited on signal 11 (SIGSEGV) after 4.708681 seconds from start [03-Jun-2012 06:11:25] NOTICE: [pool www] child 14329 started [03-Jun-2012 06:11:58] WARNING: [pool www] child 14328 exited on signal 11 (SIGSEGV) after 32.981228 seconds from start [03-Jun-2012 06:11:58] NOTICE: [pool www] child 15745 started [03-Jun-2012 06:12:25] WARNING: [pool www] child 15745 exited on signal 11 (SIGSEGV) after 27.442864 seconds from start [03-Jun-2012 06:12:25] NOTICE: [pool www] child 17446 started [03-Jun-2012 06:12:25] WARNING: [pool www] child 14329 exited on signal 11 (SIGSEGV) after 60.411278 seconds from start [03-Jun-2012 06:12:25] NOTICE: [pool www] child 17447 started [03-Jun-2012 06:13:02] WARNING: [pool www] child 17446 exited on signal 11 (SIGSEGV) after 36.746793 seconds from start [03-Jun-2012 06:13:02] NOTICE: [pool www] child 18133 started [03-Jun-2012 06:13:48] WARNING: [pool www] child 17447 exited on signal 11 (SIGSEGV) after 82.710107 seconds from start I'm thinking that this might be causing the problem. If that is the cause, probably switching it off that to fastcgi/fcgid might resolve it... but still, I want to see if something else might be causing it to do this.

    Read the article

  • IRQ problem with 2.6.32/2.6.39 kernel on Debian Squeeze x86_64

    - by MasterM
    I recently assembled a new computer so that all hardware is pretty new. Since then I've been experiencing some problem with IRQs when running Debian 6.0. On random occasions, usually after an hour or so of running I hear a beep and this shows up in dmesg: [ 3537.762795] irq 16: nobody cared (try booting with the "irqpoll" option) [ 3537.762797] Pid: 0, comm: swapper Tainted: P W O 2.6.39-2-amd64 #1 [ 3537.762798] Call Trace: [ 3537.762799] <IRQ> [<ffffffff810924d4>] ? __report_bad_irq+0x3a/0xa2 [ 3537.762803] [<ffffffff810926a4>] ? note_interrupt+0x168/0x1da [ 3537.762805] [<ffffffff81090dd4>] ? handle_irq_event_percpu+0x171/0x18f [ 3537.762807] [<ffffffff8100e0e2>] ? read_tsc+0x5/0x16 [ 3537.762809] [<ffffffff8106b8a2>] ? update_ts_time_stats+0x32/0x6b [ 3537.762810] [<ffffffff81090e26>] ? handle_irq_event+0x34/0x52 [ 3537.762812] [<ffffffff81063fb7>] ? sched_clock_idle_wakeup_event+0x12/0x1c [ 3537.762813] [<ffffffff81092df2>] ? handle_fasteoi_irq+0x82/0xa4 [ 3537.762815] [<ffffffff8100aadb>] ? handle_irq+0x1a/0x23 [ 3537.762816] [<ffffffff8100a384>] ? do_IRQ+0x45/0xaa [ 3537.762818] [<ffffffff81332c93>] ? common_interrupt+0x13/0x13 [ 3537.762818] <EOI> [<ffffffff81332c8e>] ? common_interrupt+0xe/0x13 [ 3537.762821] [<ffffffff81026800>] ? native_safe_halt+0x2/0x3 [ 3537.762829] [<ffffffffa016ed58>] ? acpi_idle_do_entry+0x39/0x62 [processor] [ 3537.762831] [<ffffffffa016edde>] ? acpi_idle_enter_c1+0x5d/0xad [processor] [ 3537.762834] [<ffffffff81261033>] ? cpuidle_idle_call+0x11f/0x1cc [ 3537.762835] [<ffffffff81008dd2>] ? cpu_idle+0xab/0xe1 [ 3537.762837] [<ffffffff8169fc60>] ? start_kernel+0x3e0/0x3eb [ 3537.762838] [<ffffffff8169f3c8>] ? x86_64_start_kernel+0x102/0x10f [ 3537.762839] handlers: [ 3537.762840] [<ffffffffa0358d5a>] (rtl8169_interrupt+0x0/0x2d7 [r8169]) [ 3537.762842] [<ffffffffa08ff2ca>] (nv_kern_isr+0x0/0x54 [nvidia]) [ 3537.762902] Disabling IRQ #16 After that Xorg either hogs on CPU or is unstable (up to hanging the system completely). When I restart Xorg everything is fine again and the problem doesn't occur until next reboot. I tried to upgrade the kernel from stock 2.6.32 to 2.6.39 from unstable repository but that didn't help. Booting with irqpoll option only seems to prolong the initial time period after which the problem occurs. I'm using latest NVIDIA drivers and Realtek firmware from firmware-realtek package. I have two GTX 560Ti that run in SLI. Disabling SLI or taking out one card completely doesn't solve the problem either. Output of uname -a is: Linux whitestar 2.6.39-2-amd64 #1 SMP Wed Jun 8 11:01:04 UTC 2011 x86_64 GNU/Linux Output of lspci is: 00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09) 00:01.0 PCI bridge: Intel Corporation Sandy Bridge PCI Express Root Port (rev 09) 00:01.1 PCI bridge: Intel Corporation Sandy Bridge PCI Express Root Port (rev 09) 00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04) 00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05) 00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 05) 00:1b.0 Audio device: Intel Corporation Cougar Point High Definition Audio Controller (rev 05) 00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b5) 00:1c.1 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 2 (rev b5) 00:1c.2 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 3 (rev b5) 00:1c.4 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 5 (rev b5) 00:1c.6 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5) 00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 05) 00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 05) 00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 05) 00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 05) 01:00.0 VGA compatible controller: nVidia Corporation Device 1200 (rev a1) 01:00.1 Audio device: nVidia Corporation Device 0e0c (rev a1) 02:00.0 VGA compatible controller: nVidia Corporation Device 1200 (rev a1) 02:00.1 Audio device: nVidia Corporation Device 0e0c (rev a1) 04:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04) 06:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04) 07:00.0 PCI bridge: Device 1b21:1080 (rev 01) 08:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10) 08:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0) Contents of /proc/interrupts: CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 0: 77 0 0 0 0 0 0 0 IO-APIC-edge timer 1: 2 0 0 0 0 0 0 0 IO-APIC-edge i8042 8: 1 0 0 0 0 0 0 0 IO-APIC-edge rtc0 9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi 12: 4 0 0 0 0 0 0 0 IO-APIC-edge i8042 16: 699083 0 0 0 0 0 0 0 IO-APIC-fasteoi nvidia, eth0 17: 87810 0 0 0 0 0 0 0 IO-APIC-fasteoi firewire_ohci, hda_intel, nvidia 18: 242 0 0 0 0 0 0 0 IO-APIC-fasteoi hda_intel 23: 85925 0 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb5, ehci_hcd:usb6 40: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME 41: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME 42: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME 43: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME 44: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME 45: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME 46: 79853 0 0 0 0 0 0 0 PCI-MSI-edge ahci 48: 1 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 49: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 50: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 51: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 52: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 53: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 54: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 55: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 56: 1 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 57: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 58: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 59: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 60: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 61: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 62: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 63: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd 64: 173506 0 0 0 0 0 0 0 PCI-MSI-edge hda_intel NMI: 482 89 25 13 277 24 11 10 Non-maskable interrupts LOC: 783857 194752 114133 70577 372438 179065 117179 162016 Local timer interrupts SPU: 0 0 0 0 0 0 0 0 Spurious interrupts PMI: 482 89 25 13 277 24 11 10 Performance monitoring interrupts IWI: 0 0 0 0 0 0 0 0 IRQ work interrupts RES: 131917 46750 7432 3291 150003 9576 3435 3067 Rescheduling interrupts CAL: 2759 6563 7150 6997 5387 7140 7269 6678 Function call interrupts TLB: 4396 2038 1336 492 5434 1896 1121 606 TLB shootdowns TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts MCE: 0 0 0 0 0 0 0 0 Machine check exceptions MCP: 37 37 37 37 37 37 37 37 Machine check polls ERR: 0 MIS: 0 Last but not least, right after boot-up those lines are usually present in dmesg: [ 18.367094] hda-intel: IRQ timing workaround is activated for card #1. Suggest a bigger bdl_pos_adj. [ 18.458859] hda-intel: IRQ timing workaround is activated for card #2. Suggest a bigger bdl_pos_adj. I'm not sure if it's related or a symptom of a bigger problem so I'm posting it just in case. I don't really know what other information might be of relevance here. Don't hesitate to ask for more in the comments.

    Read the article

  • More interruptions than cpu context switches

    - by Christopher Valles
    I have a machine running Debian GNU/Linux 5.0.8 (lenny) 8 cores and 12Gb of RAM. We have one core permanently around 40% ~ 60% wait time and trying to spot what is happening I realized that we have more interruptions than cpu context switches. I found that the normal ratio between context switch and interruptions is around 10x more context switching than interruptions but on my server the values are completely different. backend1:~# vmstat -s 12330788 K total memory 12221676 K used memory 3668624 K active memory 6121724 K inactive memory 109112 K free memory 3929400 K buffer memory 4095536 K swap cache 4194296 K total swap 7988 K used swap 4186308 K free swap 44547459 non-nice user cpu ticks 702408 nice user cpu ticks 13346333 system cpu ticks 1607583668 idle cpu ticks 374043393 IO-wait cpu ticks 4144149 IRQ cpu ticks 3994255 softirq cpu ticks 0 stolen cpu ticks 4445557114 pages paged in 2910596714 pages paged out 128642 pages swapped in 267400 pages swapped out 3519307319 interrupts 2464686911 CPU context switches 1306744317 boot time 11555115 forks Any ideas if that is an issue? And in that case, how can I spot the cause and fix it? Update Following the instructions of the comments and focusing on the core stuck in wait I checked the processes attached to that core and below you can find the list: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND 24 root RT -5 0 0 0 S 0 0.0 0:03.42 7 migration/7 25 root 15 -5 0 0 0 S 0 0.0 0:04.78 7 ksoftirqd/7 26 root RT -5 0 0 0 S 0 0.0 0:00.00 7 watchdog/7 34 root 15 -5 0 0 0 S 0 0.0 1:18.90 7 events/7 83 root 15 -5 0 0 0 S 0 0.0 1:10.68 7 kblockd/7 291 root 15 -5 0 0 0 S 0 0.0 0:00.00 7 aio/7 569 root 15 -5 0 0 0 S 0 0.0 0:00.00 7 ata/7 1545 root 15 -5 0 0 0 S 0 0.0 0:00.00 7 ksnapd 1644 root 15 -5 0 0 0 S 0 0.0 0:36.73 7 kjournald 1725 root 16 -4 16940 1152 488 S 0 0.0 0:00.00 7 udevd 2342 root 20 0 8828 1140 956 S 0 0.0 0:00.00 7 sh 2375 root 20 0 8848 1220 1016 S 0 0.0 0:00.00 7 locate 2421 root 30 10 8896 1268 1016 S 0 0.0 0:00.00 7 updatedb.findut 2430 root 30 10 58272 49m 616 S 0 0.4 0:17.44 7 sort 2431 root 30 10 3792 448 360 S 0 0.0 0:00.00 7 frcode 2682 root 15 -5 0 0 0 S 0 0.0 3:25.98 7 kjournald 2683 root 15 -5 0 0 0 S 0 0.0 0:00.64 7 kjournald 2687 root 15 -5 0 0 0 S 0 0.0 1:31.30 7 kjournald 3261 root 15 -5 0 0 0 S 0 0.0 2:30.56 7 kondemand/7 3364 root 20 0 3796 596 476 S 0 0.0 0:00.00 7 acpid 3575 root 20 0 8828 1140 956 S 0 0.0 0:00.00 7 sh 3597 root 20 0 8848 1216 1016 S 0 0.0 0:00.00 7 locate 3603 root 30 10 8896 1268 1016 S 0 0.0 0:00.00 7 updatedb.findut 3612 root 30 10 58272 49m 616 S 0 0.4 0:27.04 7 sort 3655 root 20 0 11056 2852 516 S 0 0.0 5:36.46 7 redis-server 3706 root 20 0 19832 1056 816 S 0 0.0 0:01.64 7 cron 3746 root 20 0 3796 580 484 S 0 0.0 0:00.00 7 getty 3748 root 20 0 3796 580 484 S 0 0.0 0:00.00 7 getty 7674 root 20 0 28376 1000 736 S 0 0.0 0:00.00 7 cron 7675 root 20 0 8828 1140 956 S 0 0.0 0:00.00 7 sh 7708 root 30 10 58272 49m 616 S 0 0.4 0:03.36 7 sort 22049 root 20 0 8828 1136 956 S 0 0.0 0:00.00 7 sh 22095 root 20 0 8848 1220 1016 S 0 0.0 0:00.00 7 locate 22099 root 30 10 8896 1264 1016 S 0 0.0 0:00.00 7 updatedb.findut 22108 root 30 10 58272 49m 616 S 0 0.4 0:44.55 7 sort 22109 root 30 10 3792 452 360 S 0 0.0 0:00.00 7 frcode 26927 root 20 0 8828 1140 956 S 0 0.0 0:00.00 7 sh 26947 root 20 0 8848 1216 1016 S 0 0.0 0:00.00 7 locate 26951 root 30 10 8896 1268 1016 S 0 0.0 0:00.00 7 updatedb.findut 26960 root 30 10 58272 49m 616 S 0 0.4 0:10.24 7 sort 26961 root 30 10 3792 452 360 S 0 0.0 0:00.00 7 frcode 27952 root 20 0 65948 3028 2400 S 0 0.0 0:00.00 7 sshd 30731 root 20 0 0 0 0 S 0 0.0 0:01.34 7 pdflush 31204 root 20 0 0 0 0 S 0 0.0 0:00.24 7 pdflush 21857 deploy 20 0 1227m 2240 868 S 0 0.0 2:44.22 7 nginx 21858 deploy 20 0 1228m 2784 868 S 0 0.0 2:42.45 7 nginx 21862 deploy 20 0 1228m 2732 868 S 0 0.0 2:43.90 7 nginx 21869 deploy 20 0 1228m 2840 868 S 0 0.0 2:44.14 7 nginx 27994 deploy 20 0 19372 2216 1380 S 0 0.0 0:00.00 7 bash 28493 deploy 20 0 331m 32m 16m S 4 0.3 0:00.40 7 apache2 21856 deploy 20 0 1228m 2844 868 S 0 0.0 2:43.64 7 nginx 3622 nobody 30 10 21156 10m 916 D 0 0.1 4:42.31 7 find 7716 nobody 30 10 12268 1280 888 D 0 0.0 0:43.50 7 find 22116 nobody 30 10 12612 1696 916 D 0 0.0 6:32.26 7 find 26968 nobody 30 10 12268 1284 888 D 0 0.0 1:56.92 7 find Update As suggested I take a look at /proc/interrupts and below the info there: CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 0: 35 0 0 1469085485 0 0 0 0 IO-APIC-edge timer 1: 0 0 0 8 0 0 0 0 IO-APIC-edge i8042 8: 0 0 0 1 0 0 0 0 IO-APIC-edge rtc0 9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi 12: 0 0 0 105 0 0 0 0 IO-APIC-edge i8042 16: 0 0 0 0 0 0 0 580212114 IO-APIC-fasteoi 3w-9xxx, uhci_hcd:usb1 18: 0 0 142 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb6, ehci_hcd:usb7 19: 9 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3, uhci_hcd:usb5 21: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2 23: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4, ehci_hcd:usb8 1273: 0 0 1600400502 0 0 0 0 0 PCI-MSI-edge eth0 1274: 0 0 0 0 0 0 0 0 PCI-MSI-edge ahci NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts LOC: 214252181 69439018 317298553 21943690 72562482 56448835 137923978 407514738 Local timer interrupts RES: 27516446 16935944 26430972 44957009 24935543 19881887 57746906 24298747 Rescheduling interrupts CAL: 10655 10705 10685 10567 10689 10669 10667 396 function call interrupts TLB: 529548 462587 801138 596193 922202 747313 2027966 946594 TLB shootdowns TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts SPU: 0 0 0 0 0 0 0 0 Spurious interrupts ERR: 0 All the values seems more or less the same for all the cores but this one IO-APIC-fasteoi 3w-9xxx, uhci_hcd:usb1 only affects to the core 7 (the same with the wait time of 40% ~ 60%) could be something attached to the usb port causing the issue? Thanks in advanced

    Read the article

  • unexplainable packet drops with 5 ethernet NICs and low traffic on Ubuntu

    - by jon
    I'm stuck on problem where my machine started to drops packets with no sign of ANY system load or high interrupt usage after an upgrade to Ubuntu 12.04. My server is a network monitoring sensor, running Ubuntu LTS 12.04, it passively collects packets from 5 interfaces doing network intrusion type stuff. Before the upgrade I managed to collect 200+GB of packets a day while writing them to disk with around 0% packet loss depending on the day with the help of CPU affinity and NIC IRQ to CPU bindings. Now I lose a great deal of packets with none of my applications running and at very low PPS rate which a modern workstation NIC would have no trouble with. Specs: x64 Xeon 4 cores 3.2 Ghz 16 GB RAM NICs: 5 Intel Pro NICs using the e1000 driver (NAPI). [1] eth0 and eth1 are integrated NICs (in the motherboard) There are 2 other PCI-X network cards, each with 2 Ethernet ports. 3 of the interfaces are running at Gigabit Ethernet, the others are not because they're attached to hubs. Specs: [2] http://support.dell.com/support/edocs/systems/pe2850/en/ug/t1390aa.htm uptime 17:36:00 up 1:43, 2 users, load average: 0.00, 0.01, 0.05 # uname -a Linux nms 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux I also have the CPU governor set to performance mode and irqbalance off. The problem still occurs with them on. # lspci -t -vv -[0000:00]-+-00.0 Intel Corporation E7520 Memory Controller Hub +-02.0-[01-03]--+-00.0-[02]----0e.0 Dell PowerEdge Expandable RAID controller 4 | \-00.2-[03]-- +-04.0-[04]-- +-05.0-[05-07]--+-00.0-[06]----07.0 Intel Corporation 82541GI Gigabit Ethernet Controller | \-00.2-[07]----08.0 Intel Corporation 82541GI Gigabit Ethernet Controller +-06.0-[08-0a]--+-00.0-[09]--+-04.0 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) | | \-04.1 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) | \-00.2-[0a]--+-02.0 Digium, Inc. Wildcard TE210P/TE212P dual-span T1/E1/J1 card 3.3V | +-03.0 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) | \-03.1 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) +-1d.0 Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 +-1d.1 Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 +-1d.2 Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 +-1d.7 Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller +-1e.0-[0b]----0d.0 Advanced Micro Devices [AMD] nee ATI RV100 QY [Radeon 7000/VE] +-1f.0 Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge \-1f.1 Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller I believe the NIC nor the NIC drivers are dropping the packets because ethtool reports 0 under rx_missed_errors and rx_no_buffer_count for each interface. On the old system, if it couldn't keep up this is where the drops would be. I drop packets on multiple interfaces just about every second, usually in small increments of 2-4. I tried all these sysctl values, I'm currently using the uncommented ones. # cat /etc/sysctl.conf # high net.core.netdev_max_backlog = 3000000 net.core.rmem_max = 16000000 net.core.rmem_default = 8000000 # defaults #net.core.netdev_max_backlog = 1000 #net.core.rmem_max = 131071 #net.core.rmem_default = 163480 # moderate #net.core.netdev_max_backlog = 10000 #net.core.rmem_max = 33554432 #net.core.rmem_default = 33554432 Here's an example of an interface stats report with ethtool. They are all the same, nothing is out of the ordinary ( I think ), so I'm only going to show one: ethtool -S eth2 NIC statistics: rx_packets: 7498 tx_packets: 0 rx_bytes: 2722585 tx_bytes: 0 rx_broadcast: 327 tx_broadcast: 0 rx_multicast: 1504 tx_multicast: 0 rx_errors: 0 tx_errors: 0 tx_dropped: 0 multicast: 1504 collisions: 0 rx_length_errors: 0 rx_over_errors: 0 rx_crc_errors: 0 rx_frame_errors: 0 rx_no_buffer_count: 0 rx_missed_errors: 0 tx_aborted_errors: 0 tx_carrier_errors: 0 tx_fifo_errors: 0 tx_heartbeat_errors: 0 tx_window_errors: 0 tx_abort_late_coll: 0 tx_deferred_ok: 0 tx_single_coll_ok: 0 tx_multi_coll_ok: 0 tx_timeout_count: 0 tx_restart_queue: 0 rx_long_length_errors: 0 rx_short_length_errors: 0 rx_align_errors: 0 tx_tcp_seg_good: 0 tx_tcp_seg_failed: 0 rx_flow_control_xon: 0 rx_flow_control_xoff: 0 tx_flow_control_xon: 0 tx_flow_control_xoff: 0 rx_long_byte_count: 2722585 rx_csum_offload_good: 0 rx_csum_offload_errors: 0 alloc_rx_buff_failed: 0 tx_smbus: 0 rx_smbus: 0 dropped_smbus: 01 # ifconfig eth0 Link encap:Ethernet HWaddr 00:11:43:e0:e2:8c UP BROADCAST RUNNING NOARP PROMISC ALLMULTI MULTICAST MTU:1500 Metric:1 RX packets:373348 errors:16 dropped:95 overruns:0 frame:16 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:356830572 (356.8 MB) TX bytes:0 (0.0 B) eth1 Link encap:Ethernet HWaddr 00:11:43:e0:e2:8d UP BROADCAST RUNNING NOARP PROMISC ALLMULTI MULTICAST MTU:1500 Metric:1 RX packets:13616 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:8690528 (8.6 MB) TX bytes:0 (0.0 B) eth2 Link encap:Ethernet HWaddr 00:04:23:e1:77:6a UP BROADCAST RUNNING NOARP PROMISC ALLMULTI MULTICAST MTU:1500 Metric:1 RX packets:7750 errors:0 dropped:471 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:2780935 (2.7 MB) TX bytes:0 (0.0 B) eth3 Link encap:Ethernet HWaddr 00:04:23:e1:77:6b UP BROADCAST RUNNING NOARP PROMISC ALLMULTI MULTICAST MTU:1500 Metric:1 RX packets:5112 errors:0 dropped:206 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:639472 (639.4 KB) TX bytes:0 (0.0 B) eth4 Link encap:Ethernet HWaddr 00:04:23:b6:35:6c UP BROADCAST RUNNING NOARP PROMISC ALLMULTI MULTICAST MTU:1500 Metric:1 RX packets:961467 errors:0 dropped:935 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:958561305 (958.5 MB) TX bytes:0 (0.0 B) eth5 Link encap:Ethernet HWaddr 00:04:23:b6:35:6d inet addr:192.168.1.6 Bcast:192.168.1.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:4264 errors:0 dropped:16 overruns:0 frame:0 TX packets:699 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:572228 (572.2 KB) TX bytes:124456 (124.4 KB) I tried the defaults, then started to play around with settings. I wasn't using any flow control and I increased the RxDescriptor count to 4096 before the upgrade as well without any problems. # cat /etc/modprobe.d/e1000.conf options e1000 XsumRX=0,0,0,0,0 RxDescriptors=4096,4096,4096,4096,4096 FlowControl=0,0,0,0,0 debug=16 Here's my network configuration file, I turned off checksumming and various offloading mechanisms along with setting CPU affinity with heavy use interfaces getting an entire CPU and light use interfaces sharing a CPU. I used these settings prior to the upgrade without problems. # cat /etc/network/interfaces # The loopback network interface auto lo iface lo inet loopback # The primary network interface auto eth0 iface eth0 inet manual pre-up /sbin/ethtool -G eth0 rx 4096 tx 0 pre-up /sbin/ethtool -K eth0 gro off gso off rx off pre-up /sbin/ethtool -A eth0 rx off autoneg off up ifconfig eth0 0.0.0.0 -arp promisc mtu 1500 allmulti txqueuelen 0 up post-up echo "4" > /proc/irq/48/smp_affinity down ifconfig eth0 down post-down /sbin/ethtool -G eth0 rx 256 tx 256 post-down /sbin/ethtool -K eth0 gro on gso on rx on post-down /sbin/ethtool -A eth0 rx on autoneg on auto eth1 iface eth1 inet manual pre-up /sbin/ethtool -G eth1 rx 4096 tx 0 pre-up /sbin/ethtool -K eth1 gro off gso off rx off pre-up /sbin/ethtool -A eth1 rx off autoneg off up ifconfig eth1 0.0.0.0 -arp promisc mtu 1500 allmulti txqueuelen 0 up post-up echo "4" > /proc/irq/49/smp_affinity down ifconfig eth1 down post-down /sbin/ethtool -G eth1 rx 256 tx 256 post-down /sbin/ethtool -K eth1 gro on gso on rx on post-down /sbin/ethtool -A eth1 rx on autoneg on auto eth2 iface eth2 inet manual pre-up /sbin/ethtool -G eth2 rx 4096 tx 0 pre-up /sbin/ethtool -K eth2 gro off gso off rx off pre-up /sbin/ethtool -A eth2 rx off autoneg off up ifconfig eth2 0.0.0.0 -arp promisc mtu 1500 allmulti txqueuelen 0 up post-up echo "1" > /proc/irq/82/smp_affinity down ifconfig eth2 down post-down /sbin/ethtool -G eth2 rx 256 tx 256 post-down /sbin/ethtool -K eth2 gro on gso on rx on post-down /sbin/ethtool -A eth2 rx on autoneg on auto eth3 iface eth3 inet manual pre-up /sbin/ethtool -G eth3 rx 4096 tx 0 pre-up /sbin/ethtool -K eth3 gro off gso off rx off pre-up /sbin/ethtool -A eth3 rx off autoneg off up ifconfig eth3 0.0.0.0 -arp promisc mtu 1500 allmulti txqueuelen 0 up post-up echo "2" > /proc/irq/83/smp_affinity down ifconfig eth3 down post-down /sbin/ethtool -G eth3 rx 256 tx 256 post-down /sbin/ethtool -K eth3 gro on gso on rx on post-down /sbin/ethtool -A eth3 rx on autoneg on auto eth4 iface eth4 inet manual pre-up /sbin/ethtool -G eth4 rx 4096 tx 0 pre-up /sbin/ethtool -K eth4 gro off gso off rx off pre-up /sbin/ethtool -A eth4 rx off autoneg off up ifconfig eth4 0.0.0.0 -arp promisc mtu 1500 allmulti txqueuelen 0 up post-up echo "4" > /proc/irq/77/smp_affinity down ifconfig eth4 down post-down /sbin/ethtool -G eth4 rx 256 tx 256 post-down /sbin/ethtool -K eth4 gro on gso on rx on post-down /sbin/ethtool -A eth4 rx on autoneg on auto eth5 iface eth5 inet static pre-up /etc/fw.conf address 192.168.1.1 netmask 255.255.255.0 broadcast 192.168.1.255 gateway 192.168.1.1 dns-nameservers 192.168.1.2 192.168.1.3 up ifconfig eth5 up post-up echo "8" > /proc/irq/77/smp_affinity down ifconfig eth5 down Here's a few examples of packet drops, i ran one after another, probabling totaling 3 or 4 seconds. You can see increases in the drops from the 1st and 3rd. This was a non-busy time, very little traffic. # awk '{ print $1,$5 }' /proc/net/dev Inter-| face drop eth3: 225 lo: 0 eth2: 505 eth1: 0 eth5: 17 eth0: 105 eth4: 1034 # awk '{ print $1,$5 }' /proc/net/dev Inter-| face drop eth3: 225 lo: 0 eth2: 507 eth1: 0 eth5: 17 eth0: 105 eth4: 1034 # awk '{ print $1,$5 }' /proc/net/dev Inter-| face drop eth3: 227 lo: 0 eth2: 512 eth1: 0 eth5: 17 eth0: 105 eth4: 1039 I tried the pci=noacpi options. With and without, it's the same. This is what my interrupt stats looked like before the upgrade, after, with ACPI on PCI it showed multiple NICs bound to an interrupt and shared with other devices such as USB drives which I didn't like so I think i'm going to keep it with ACPI off as it's easier to designate sole purpose interrupts. Is there any advantage I would have using the default i.e. ACPI w/ PCI. ? # cat /etc/default/grub | grep CMD_LINE GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 noacpi pci=noacpi" GRUB_CMDLINE_LINUX="" # cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 0: 45 0 0 16 IO-APIC-edge timer 1: 1 0 0 7936 IO-APIC-edge i8042 2: 0 0 0 0 XT-PIC-XT-PIC cascade 6: 0 0 0 3 IO-APIC-edge floppy 8: 0 0 0 1 IO-APIC-edge rtc0 9: 0 0 0 0 IO-APIC-edge acpi 12: 0 0 0 1809 IO-APIC-edge i8042 14: 1 0 0 4498 IO-APIC-edge ata_piix 15: 0 0 0 0 IO-APIC-edge ata_piix 16: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2 18: 0 0 0 1350 IO-APIC-fasteoi uhci_hcd:usb4, radeon 19: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3 23: 0 0 0 4099 IO-APIC-fasteoi ehci_hcd:usb1 38: 0 0 0 61963 IO-APIC-fasteoi megaraid 48: 0 0 1002319 4 IO-APIC-fasteoi eth0 49: 0 0 38772 3 IO-APIC-fasteoi eth1 77: 0 0 130076 432159 IO-APIC-fasteoi eth4 78: 0 0 0 23917 IO-APIC-fasteoi eth5 82: 1329033 0 0 4 IO-APIC-fasteoi eth2 83: 0 4886525 0 6 IO-APIC-fasteoi eth3 NMI: 5 6 4 5 Non-maskable interrupts LOC: 61409 57076 64257 114764 Local timer interrupts SPU: 0 0 0 0 Spurious interrupts IWI: 0 0 0 0 IRQ work interrupts RES: 17956 25333 13436 14789 Rescheduling interrupts CAL: 22436 607 539 478 Function call interrupts TLB: 1525 1458 4600 4151 TLB shootdowns TRM: 0 0 0 0 Thermal event interrupts THR: 0 0 0 0 Threshold APIC interrupts MCE: 0 0 0 0 Machine check exceptions MCP: 16 16 16 16 Machine check polls ERR: 0 MIS: 0 Here's sample output of vmstat, showing the system. Barebones system right now. root@nms:~# vmstat -S m 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 14992 192 1029 0 0 56 2 419 29 1 0 99 0 0 0 0 14992 192 1029 0 0 0 0 922 27 0 0 100 0 0 0 0 14991 192 1029 0 0 0 36 763 50 0 0 100 0 0 0 0 14991 192 1029 0 0 0 0 646 35 0 0 100 0 0 0 0 14991 192 1029 0 0 0 0 722 54 0 0 100 0 0 0 0 14991 192 1029 0 0 0 0 793 27 0 0 100 0 ^C Here's dmesg output. I can't figure out why my PCI-X slots are negotiated as PCI. The network cards are all PCI-X with the exception of the integrated NICs that came with the server. In the output below it looks as if eth3 and eth2 negotiated at PCI-X speeds rather than PCI:66Mhz. Wouldn't they all drop to PCI:66Mhz? If your integrated NICs are PCI, as labeled below (eth0,eth1), then wouldn't all devices on your bus speed drop down to that slower bus speed? If not, I still don't know why only one of my NICs ( each has two ethernet ports) is labeled as PCI-X in the output below. Does that mean it is running at PCI-X speeds are is it showing that it's capable? # dmesg | grep e1000 [ 3678.349337] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI [ 3678.349342] e1000: Copyright (c) 1999-2006 Intel Corporation. [ 3678.349394] e1000 0000:06:07.0: PCI->APIC IRQ transform: INT A -> IRQ 48 [ 3678.409725] e1000 0000:06:07.0: Receive Descriptors set to 4096 [ 3678.409730] e1000 0000:06:07.0: Checksum Offload Disabled [ 3678.409734] e1000 0000:06:07.0: Flow Control Disabled [ 3678.586409] e1000 0000:06:07.0: eth0: (PCI:66MHz:32-bit) 00:11:43:e0:e2:8c [ 3678.586419] e1000 0000:06:07.0: eth0: Intel(R) PRO/1000 Network Connection [ 3678.586642] e1000 0000:07:08.0: PCI->APIC IRQ transform: INT A -> IRQ 49 [ 3678.649854] e1000 0000:07:08.0: Receive Descriptors set to 4096 [ 3678.649859] e1000 0000:07:08.0: Checksum Offload Disabled [ 3678.649863] e1000 0000:07:08.0: Flow Control Disabled [ 3678.826436] e1000 0000:07:08.0: eth1: (PCI:66MHz:32-bit) 00:11:43:e0:e2:8d [ 3678.826444] e1000 0000:07:08.0: eth1: Intel(R) PRO/1000 Network Connection [ 3678.826627] e1000 0000:09:04.0: PCI->APIC IRQ transform: INT A -> IRQ 82 [ 3679.093266] e1000 0000:09:04.0: Receive Descriptors set to 4096 [ 3679.093271] e1000 0000:09:04.0: Checksum Offload Disabled [ 3679.093275] e1000 0000:09:04.0: Flow Control Disabled [ 3679.130239] e1000 0000:09:04.0: eth2: (PCI-X:133MHz:64-bit) 00:04:23:e1:77:6a [ 3679.130246] e1000 0000:09:04.0: eth2: Intel(R) PRO/1000 Network Connection [ 3679.130449] e1000 0000:09:04.1: PCI->APIC IRQ transform: INT B -> IRQ 83 [ 3679.397312] e1000 0000:09:04.1: Receive Descriptors set to 4096 [ 3679.397318] e1000 0000:09:04.1: Checksum Offload Disabled [ 3679.397321] e1000 0000:09:04.1: Flow Control Disabled [ 3679.434350] e1000 0000:09:04.1: eth3: (PCI-X:133MHz:64-bit) 00:04:23:e1:77:6b [ 3679.434360] e1000 0000:09:04.1: eth3: Intel(R) PRO/1000 Network Connection [ 3679.434553] e1000 0000:0a:03.0: PCI->APIC IRQ transform: INT A -> IRQ 77 [ 3679.704072] e1000 0000:0a:03.0: Receive Descriptors set to 4096 [ 3679.704077] e1000 0000:0a:03.0: Checksum Offload Disabled [ 3679.704081] e1000 0000:0a:03.0: Flow Control Disabled [ 3679.738364] e1000 0000:0a:03.0: eth4: (PCI:33MHz:64-bit) 00:04:23:b6:35:6c [ 3679.738371] e1000 0000:0a:03.0: eth4: Intel(R) PRO/1000 Network Connection [ 3679.738538] e1000 0000:0a:03.1: PCI->APIC IRQ transform: INT B -> IRQ 78 [ 3680.046060] e1000 0000:0a:03.1: eth5: (PCI:33MHz:64-bit) 00:04:23:b6:35:6d [ 3680.046067] e1000 0000:0a:03.1: eth5: Intel(R) PRO/1000 Network Connection [ 3682.132415] e1000: eth0 NIC Link is Up 100 Mbps Half Duplex, Flow Control: None [ 3682.224423] e1000: eth1 NIC Link is Up 100 Mbps Half Duplex, Flow Control: None [ 3682.316385] e1000: eth2 NIC Link is Up 100 Mbps Half Duplex, Flow Control: None [ 3682.408391] e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None [ 3682.500396] e1000: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None [ 3682.708401] e1000: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX At first I thought it was the NIC drivers but I'm not so sure. I really have no idea where else to look at the moment. Any help is greatly appreciated as I'm struggling with this. If you need more information just ask. Thanks! [1]http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/Documentation/networking/e1000.txt?v=2.6.11.8 [2] http://support.dell.com/support/edocs/systems/pe2850/en/ug/t1390aa.htm

    Read the article

< Previous Page | 1 2