Search Results

Search found 2 results on 1 pages for 'danneh3826'.

Page 1/1 | 1 

  • Slow filesystem access

    - by danneh3826
    I'm trying to diagnose a slow filesystem issue on a server I look after. It's been ongoing for quite some time, and I've run out of ideas as to what I can try. Here's the thick of it. The server itself is a Dell Poweredge T310. It has 4 SAS hard drives in it, configured at RAID5, and is running Citrix XenServer 5.6. The VM is a (relatively) old Debian 5.0.6 installation. It's given 4 cores, and 4Gb's of RAM. It has 3 volumes. A 10Gb volume (ext3) for the system, 980Gb volume (xfs) for data (~94% full), and another 200Gb volume (xfs) for data (~13% full). Now here's the weird thing. Read/write access to the 980Gb volume is really slow. I might get 5Mb/s out of it if I'm lucky. At first I figured it was actually disk access in the system, or at a hypervisor level, but ruled that out entirely as other VMs on the same host are running perfectly fine (a good couple hundred Mb/s disk r/w access). So then I started to target this particular VM. I started thinking it was XFS, but to prove it I wasn't going to attempt to change the filesystem on the 980Gb drive, with years and years of billions of files on there. So I provisioned the 200Gb drive, and did the same read/write test (basically dd), and got a good couple hundred Mb/s r/w access to it. So that ruled out the VM, the hardware, and the filesystem type. There's also a lot of these in /var/log/kern.log; (sorry, this is quite long) Sep 4 10:16:59 uriel kernel: [32571790.564689] httpd: page allocation failure. order:5, mode:0x4020 Sep 4 10:16:59 uriel kernel: [32571790.564693] Pid: 7318, comm: httpd Not tainted 2.6.32-4-686-bigmem #1 Sep 4 10:16:59 uriel kernel: [32571790.564696] Call Trace: Sep 4 10:16:59 uriel kernel: [32571790.564705] [<c1092a4d>] ? __alloc_pages_nodemask+0x476/0x4e0 Sep 4 10:16:59 uriel kernel: [32571790.564711] [<c1092ac3>] ? __get_free_pages+0xc/0x17 Sep 4 10:16:59 uriel kernel: [32571790.564716] [<c10b632e>] ? __kmalloc+0x30/0x128 Sep 4 10:16:59 uriel kernel: [32571790.564722] [<c11dd774>] ? pskb_expand_head+0x4f/0x157 Sep 4 10:16:59 uriel kernel: [32571790.564727] [<c11ddbbf>] ? __pskb_pull_tail+0x41/0x1fb Sep 4 10:16:59 uriel kernel: [32571790.564732] [<c11e4882>] ? dev_queue_xmit+0xe4/0x38e Sep 4 10:16:59 uriel kernel: [32571790.564738] [<c1205902>] ? ip_finish_output+0x0/0x5c Sep 4 10:16:59 uriel kernel: [32571790.564742] [<c12058c7>] ? ip_finish_output2+0x187/0x1c2 Sep 4 10:16:59 uriel kernel: [32571790.564747] [<c1204dc8>] ? ip_local_out+0x15/0x17 Sep 4 10:16:59 uriel kernel: [32571790.564751] [<c12055a9>] ? ip_queue_xmit+0x31e/0x379 Sep 4 10:16:59 uriel kernel: [32571790.564758] [<c1279a90>] ? _spin_lock_bh+0x8/0x1e Sep 4 10:16:59 uriel kernel: [32571790.564767] [<eda15a8d>] ? __nf_ct_refresh_acct+0x66/0xa4 [nf_conntrack] Sep 4 10:16:59 uriel kernel: [32571790.564773] [<c103bf42>] ? _local_bh_enable_ip+0x16/0x6e Sep 4 10:16:59 uriel kernel: [32571790.564779] [<c1214593>] ? tcp_transmit_skb+0x595/0x5cc Sep 4 10:16:59 uriel kernel: [32571790.564785] [<c1005c4f>] ? xen_restore_fl_direct_end+0x0/0x1 Sep 4 10:16:59 uriel kernel: [32571790.564791] [<c12165ea>] ? tcp_write_xmit+0x7a3/0x874 Sep 4 10:16:59 uriel kernel: [32571790.564796] [<c121203a>] ? tcp_ack+0x1611/0x1802 Sep 4 10:16:59 uriel kernel: [32571790.564801] [<c10055ec>] ? xen_force_evtchn_callback+0xc/0x10 Sep 4 10:16:59 uriel kernel: [32571790.564806] [<c121392f>] ? tcp_established_options+0x1d/0x8b Sep 4 10:16:59 uriel kernel: [32571790.564811] [<c1213be4>] ? tcp_current_mss+0x38/0x53 Sep 4 10:16:59 uriel kernel: [32571790.564816] [<c1216701>] ? __tcp_push_pending_frames+0x1e/0x50 Sep 4 10:16:59 uriel kernel: [32571790.564821] [<c1212246>] ? tcp_data_snd_check+0x1b/0xd2 Sep 4 10:16:59 uriel kernel: [32571790.564825] [<c1212de3>] ? tcp_rcv_established+0x5d0/0x626 Sep 4 10:16:59 uriel kernel: [32571790.564831] [<c121902c>] ? tcp_v4_do_rcv+0x15f/0x2cf Sep 4 10:16:59 uriel kernel: [32571790.564835] [<c1219561>] ? tcp_v4_rcv+0x3c5/0x5c0 Sep 4 10:16:59 uriel kernel: [32571790.564841] [<c120197e>] ? ip_local_deliver_finish+0x10c/0x18c Sep 4 10:16:59 uriel kernel: [32571790.564846] [<c12015a4>] ? ip_rcv_finish+0x2c4/0x2d8 Sep 4 10:16:59 uriel kernel: [32571790.564852] [<c11e3b71>] ? netif_receive_skb+0x3bb/0x3d6 Sep 4 10:16:59 uriel kernel: [32571790.564864] [<ed823efc>] ? xennet_poll+0x9b8/0xafc [xen_netfront] Sep 4 10:16:59 uriel kernel: [32571790.564869] [<c11e40ee>] ? net_rx_action+0x96/0x194 Sep 4 10:16:59 uriel kernel: [32571790.564874] [<c103bd4c>] ? __do_softirq+0xaa/0x151 Sep 4 10:16:59 uriel kernel: [32571790.564878] [<c103be24>] ? do_softirq+0x31/0x3c Sep 4 10:16:59 uriel kernel: [32571790.564883] [<c103befa>] ? irq_exit+0x26/0x58 Sep 4 10:16:59 uriel kernel: [32571790.564890] [<c118ff9f>] ? xen_evtchn_do_upcall+0x12c/0x13e Sep 4 10:16:59 uriel kernel: [32571790.564896] [<c1008c3f>] ? xen_do_upcall+0x7/0xc Sep 4 10:16:59 uriel kernel: [32571790.564899] Mem-Info: Sep 4 10:16:59 uriel kernel: [32571790.564902] DMA per-cpu: Sep 4 10:16:59 uriel kernel: [32571790.564905] CPU 0: hi: 0, btch: 1 usd: 0 Sep 4 10:16:59 uriel kernel: [32571790.564908] CPU 1: hi: 0, btch: 1 usd: 0 Sep 4 10:16:59 uriel kernel: [32571790.564911] CPU 2: hi: 0, btch: 1 usd: 0 Sep 4 10:16:59 uriel kernel: [32571790.564914] CPU 3: hi: 0, btch: 1 usd: 0 Sep 4 10:16:59 uriel kernel: [32571790.564916] Normal per-cpu: Sep 4 10:16:59 uriel kernel: [32571790.564919] CPU 0: hi: 186, btch: 31 usd: 175 Sep 4 10:16:59 uriel kernel: [32571790.564922] CPU 1: hi: 186, btch: 31 usd: 165 Sep 4 10:16:59 uriel kernel: [32571790.564925] CPU 2: hi: 186, btch: 31 usd: 30 Sep 4 10:16:59 uriel kernel: [32571790.564928] CPU 3: hi: 186, btch: 31 usd: 140 Sep 4 10:16:59 uriel kernel: [32571790.564931] HighMem per-cpu: Sep 4 10:16:59 uriel kernel: [32571790.564933] CPU 0: hi: 186, btch: 31 usd: 159 Sep 4 10:16:59 uriel kernel: [32571790.564936] CPU 1: hi: 186, btch: 31 usd: 22 Sep 4 10:16:59 uriel kernel: [32571790.564939] CPU 2: hi: 186, btch: 31 usd: 24 Sep 4 10:16:59 uriel kernel: [32571790.564942] CPU 3: hi: 186, btch: 31 usd: 13 Sep 4 10:16:59 uriel kernel: [32571790.564947] active_anon:485974 inactive_anon:121138 isolated_anon:0 Sep 4 10:16:59 uriel kernel: [32571790.564948] active_file:75215 inactive_file:79510 isolated_file:0 Sep 4 10:16:59 uriel kernel: [32571790.564949] unevictable:0 dirty:516 writeback:15 unstable:0 Sep 4 10:16:59 uriel kernel: [32571790.564950] free:230770 slab_reclaimable:36661 slab_unreclaimable:21249 Sep 4 10:16:59 uriel kernel: [32571790.564952] mapped:20016 shmem:29450 pagetables:5600 bounce:0 Sep 4 10:16:59 uriel kernel: [32571790.564958] DMA free:2884kB min:72kB low:88kB high:108kB active_anon:0kB inactive_anon:0kB active_file:5692kB inactive_file:724kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15872kB mlocked:0kB dirty:8kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:5112kB slab_unreclaimable:156kB kernel_stack:56kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Sep 4 10:16:59 uriel kernel: [32571790.564964] lowmem_reserve[]: 0 698 4143 4143 Sep 4 10:16:59 uriel kernel: [32571790.564977] Normal free:143468kB min:3344kB low:4180kB high:5016kB active_anon:56kB inactive_anon:2068kB active_file:131812kB inactive_file:131728kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:715256kB mlocked:0kB dirty:156kB writeback:0kB mapped:308kB shmem:4kB slab_reclaimable:141532kB slab_unreclaimable:84840kB kernel_stack:1928kB pagetables:22400kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Sep 4 10:16:59 uriel kernel: [32571790.564983] lowmem_reserve[]: 0 0 27559 27559 Sep 4 10:16:59 uriel kernel: [32571790.564995] HighMem free:776728kB min:512kB low:4636kB high:8760kB active_anon:1943840kB inactive_anon:482484kB active_file:163356kB inactive_file:185588kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3527556kB mlocked:0kB dirty:1900kB writeback:60kB mapped:79756kB shmem:117796kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Sep 4 10:16:59 uriel kernel: [32571790.565001] lowmem_reserve[]: 0 0 0 0 Sep 4 10:16:59 uriel kernel: [32571790.565011] DMA: 385*4kB 16*8kB 3*16kB 9*32kB 6*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2900kB Sep 4 10:16:59 uriel kernel: [32571790.565032] Normal: 21505*4kB 6508*8kB 273*16kB 24*32kB 3*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 143412kB Sep 4 10:16:59 uriel kernel: [32571790.565054] HighMem: 949*4kB 8859*8kB 7063*16kB 6186*32kB 4631*64kB 727*128kB 6*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 776604kB Sep 4 10:16:59 uriel kernel: [32571790.565076] 198980 total pagecache pages Sep 4 10:16:59 uriel kernel: [32571790.565079] 14850 pages in swap cache Sep 4 10:16:59 uriel kernel: [32571790.565082] Swap cache stats: add 2556273, delete 2541423, find 82961339/83153719 Sep 4 10:16:59 uriel kernel: [32571790.565085] Free swap = 250592kB Sep 4 10:16:59 uriel kernel: [32571790.565087] Total swap = 385520kB Sep 4 10:16:59 uriel kernel: [32571790.575454] 1073152 pages RAM Sep 4 10:16:59 uriel kernel: [32571790.575458] 888834 pages HighMem Sep 4 10:16:59 uriel kernel: [32571790.575461] 11344 pages reserved Sep 4 10:16:59 uriel kernel: [32571790.575463] 1090481 pages shared Sep 4 10:16:59 uriel kernel: [32571790.575465] 737188 pages non-shared Now, I've no idea what this means. There's plenty of free memory; total used free shared buffers cached Mem: 4247232 3455904 791328 0 5348 736412 -/+ buffers/cache: 2714144 1533088 Swap: 385520 131004 254516 Though now I see the swap is relatively low in size, but would that matter? I've been starting to think about fragmentation, or inode usage on that large partition, but a recent fsck on it showed is as only like 0.5% fragmented. Which leaves me with inode usage, but how much of an effect really would a large inode table or filesystem TOC have? I've love to hear people's opinions on this. It's driving me potty! df -h output; Filesystem Size Used Avail Use% Mounted on /dev/xvda1 9.5G 6.6G 2.4G 74% / tmpfs 2.1G 0 2.1G 0% /lib/init/rw udev 10M 520K 9.5M 6% /dev tmpfs 2.1G 0 2.1G 0% /dev/shm /dev/xvdb 980G 921G 59G 94% /data

    Read the article

  • Nagios DNX plugins

    - by danneh3826
    I'm toying with the idea of multiple Nagios instances setup to monitor our infrastructure. I've looked at all the various methods of distributed Nagios checks, and I think DNX comes out the closest. DNX handles failure of worker nodes, that's fine. What happens if the main DNX server fails though? Is there a way to replicate the server too? I'm using AWS EC2 primarily, so I can utilise Elastic Load Balancing for the web UI, but I need to be able to handle the AZ where the monitoring server is to fail over, and essentially for a second to pick up the checking load (active/passive, active/active, so long as it doesn't fail completely) The other thing I'm trying to solve is an issue with routing. What I'd like is to have multiple nodes report a fault before Nagios confirms it as critical. Not the NRPE checks, as they're pretty self explanitory, but things more like check_ping. I often have routing issues out of AWS to certain datacenters, so Nagios can often report bad/no ping/timeout as a critical issue, even though the machine in question is working fine. Would it be possible to have a setup where a worker complains a service check is critical, and have a second worker node (positioned in another datacenter/AZ) also report the service as critical before the Nagios central server issues a critical alert? I realise I might be asking a bit much (how far down the line do you go setting up failover systems before it starts to get ridiculous), however surely someone must have thought of this scenario when developing DNX?

    Read the article

1