raid0 - Page 5 - Developer IT

Intel Rapid Storage / Smart Response SSD caching issue

- by goober

Background Recently built my own PC. It works! Almost. It's been a while since getting into the guts of these things, so I'm familiar but may be missing something simple. FYI, I don't care about blowing the OS away -- it's brand new and we can go back from scratch as many times as necessary. Goal / Issue I'd like to use the SSD to take advantage of Intel's Smart Response technology (allows the SSD to act as a cache for HDDs) I would like the SSD cache to act as a cache for my HDDs, which I would like to be in a RAID1 array (so I get the speed from the SSD and the redundancy from the RAID1) However, Windows only sees the drive in device manager (not as a drive), so I'm unsure what to do about that. Related: as far as I know, for this to work, the drives all have to be in a single RAID array (i.e. a RAID0 pairing of the SSD and the RAID1 HDD array). However, when attempting this at the BIOS level, I am told there is not enough space for an array. Steps so Far Moved the SSD onto the Intel controller (I'd had it on the Marvel 6.0 controller instead of the Intel controller, so the BIOS was only seeing it in a strange way) Updated the BIOS of the motherboard to the latest version Reinstalled Intel's RST (iRST?) software several times, as some forums reported it working after reinstalling 3 times (which does not inspire confidence). Checked Intel storage: it does see the SSD as a physical, non-RAID device. However, it says no space exists if I try to create an array. Checked the BIOS: it does not show up in the boot order, but is an option that can be selected under boot options. Tried the firmware update for that model. Issue: the firmware CD doesn't detect a drive; maybe the Intel storage controller is making it difficult? moved the ssd to the marvel controller. The firmware update cd appeared to hang while searching for drives. swapped out the SATA cable for the manufacturer's and moved back to the intel storage controller. Noticed at this point that in the Intel RST software, a device DOES show up in addition to the RAID set -- only shown as a "60 GB internal disk". Windows doesn't appear to see it as a drive, but it does still show in device manager. Move SSD to port from 0-3 on MOBO and set SATA mode to IDE (after disconnecting RAID1 config) to allow the firmware update to work. Firmware was already at the latest version. Next Steps ? Components involved ASUS P8Z68-V PRO motherboard (Intel Z68 Chipset) Intel i7 2600k Processor 2 x 1TB 7200 RPM HDDs 64 GB Crucial M4 SSD (M4-CT064M4SSD2) For Reference -- Storage Configuration Intel 3 gbps Intel 3gbps Intel 6gbps Marvel 6gbps +----------+ +----------+ +----------+ +----------+ | | <----+ | | +-+ | | | |----------| | |----------| |-|--------| |----------| | | | | + | | | | | | +----------+ | +--|-------+ +-|--------+ +----------+ | | | + v v | 1 TB HDD 64 GB SSD + +> 1 TB HDD For Reference -- Intel RST (v10.8.0.1003) Screenshot Don't mind the "rebuilding" -- knocked a power cable out at one point; it's doing its job, not an indicator of a bad HDD. Any thoughts? Thanks in advance for any help!

Read the article

Raid 1 array won't assemble after power outage. How do I fix this ext4 mirror?

- by Forkrul Assail

Two ext4 drives on Raid 1 with mdadm won't reassemble after the power went out for an extended period (UPS drained). After turning the machine back on, mdadm said that the array was degraded, after which it took about 2 days for a full resync, which completed without problems. On trying to remount the array I get: mount: you must specify the filesystem type cat /etc/fstab lines relevant to setup: /dev/md127 /media/mediapool ext4 defaults 0 0 dmesg | tail (on trying to mount) says: [ 1050.818782] EXT3-fs (md127): error: can't find ext3 filesystem on dev md127. [ 1050.849214] EXT4-fs (md127): VFS: Can't find ext4 filesystem [ 1050.944781] FAT-fs (md127): invalid media value (0x00) [ 1050.944782] FAT-fs (md127): Can't find a valid FAT filesystem [ 1058.272787] EXT2-fs (md127): error: can't find an ext2 filesystem on dev md127. cat /proc/mdstat says: Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md127 : active (auto-read-only) raid1 sdj[2] sdi[0] 2930135360 blocks super 1.2 [2/2] [UU] unused devices: <none> fsck /dev/md127 says: fsck from util-linux 2.20.1 e2fsck 1.42 (29-Nov-2011) fsck.ext2: Superblock invalid, trying backup blocks... fsck.ext2: Bad magic number in super-block while trying to open /dev/md127 The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device> mdadm -E /dev/sdi gives me: /dev/sdi: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 37ac1824:eb8a21f6:bd5afd6d:96da6394 Name : sojourn:33 Creation Time : Sat Nov 10 10:43:52 2012 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 5860271016 (2794.40 GiB 3000.46 GB) Array Size : 2930135360 (2794.39 GiB 3000.46 GB) Used Dev Size : 5860270720 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors State : clean Device UUID : 3e6e9a4f:6c07ab3d:22d47fce:13cecfd0 Update Time : Tue Nov 13 20:34:18 2012 Checksum : f7d10db9 - correct Events : 27 Device Role : Active device 0 Array State : AA ('A' == active, '.' == missing) boot@boot ~ $ sudo mdadm -E /dev/sdj /dev/sdj: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 37ac1824:eb8a21f6:bd5afd6d:96da6394 Name : sojourn:33 Creation Time : Sat Nov 10 10:43:52 2012 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 5860271016 (2794.40 GiB 3000.46 GB) Array Size : 2930135360 (2794.39 GiB 3000.46 GB) Used Dev Size : 5860270720 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors State : clean Device UUID : 7fb84af4:e9295f7b:ede61f27:bec0cb57 Update Time : Tue Nov 13 20:34:18 2012 Checksum : b9d17fef - correct Events : 27 Device Role : Active device 1 Array State : AA ('A' == active, '.' == missing) machine@user ~ dmesg | tail [ 61.785866] init: alsa-restore main process (2736) terminated with status 99 [ 68.433548] eth0: no IPv6 routers present [ 534.142511] EXT4-fs (sdi): ext4_check_descriptors: Block bitmap for group 0 not in group (block 2838187772)! [ 534.142518] EXT4-fs (sdi): group descriptors corrupted! [ 546.418780] EXT2-fs (sdi): error: couldn't mount because of unsupported optional features (240) [ 549.654127] EXT3-fs (sdi): error: couldn't mount because of unsupported optional features (240) Since this is Raid 1 it was suggested that I try and mount or fsck the drives separately. After a long fsck on one drive, it ended with this as tail: Illegal double indirect block (2298566437) in inode 39717736. CLEARED. Illegal block #4231180 (2611866932) in inode 39717736. CLEARED. Error storing directory block information (inode=39717736, block=0, num=1092368): Memory allocation failed Recreate journal? yes Creating journal (32768 blocks): Done. *** journal has been re-created - filesystem is now ext3 again *** The drive however still doesn't want to mount: dmesg | tail [ 170.674659] md: export_rdev(sdc) [ 170.675152] md: export_rdev(sdc) [ 195.275288] md: export_rdev(sdc) [ 195.275876] md: export_rdev(sdc) [ 1338.540092] CE: hpet increased min_delta_ns to 30169 nsec [26125.734105] EXT4-fs (sdc): ext4_check_descriptors: Checksum for group 0 failed (43502!=37987) [26125.734115] EXT4-fs (sdc): group descriptors corrupted! [26182.325371] EXT3-fs (sdc): error: couldn't mount because of unsupported optional features (240) [27083.316519] EXT4-fs (sdc): ext4_check_descriptors: Checksum for group 0 failed (43502!=37987) [27083.316530] EXT4-fs (sdc): group descriptors corrupted! Please help me fix this. I never in my wildest nightmares thought a complete mirror would die this badly. Am I missing something? Suggestions on fixing this? Could someone explain why it would resync after the powerout, only to seemingly nuke the drive? Thanks for reading. Any help much appreciated. I've tried everything I can think of, including booting and filesystem checking with SystemRescue and Ubuntu liveboot discs.

Read the article

XFS disk becomes unavailable after a while

- by Guard

Ubuntu 12.04 (but the same was on 11.10 before upgrading) WD MyBook, 2TB, no RAID (or RAID0, not completely sure, anyway no mirroring, both 1TB disks are in use, mounted as a single device). Formatted to XFS, normally used for big movie files. Connected to Firewire 800. At some point the LED started going up and down as when constantly reading/writing. The device gives access error. When unplugged (cable, then holding the power button for a while, then unplugging the power) and re-connected becomes available. xfs_check with no results. xfs_repair did something, but looks like didn't fix any error. Then after a massive read (checking 1.5GB torrent file for integrity) becomes unavailable again. Any ideas what's wrong? Drives? Cables? Motherboard? OS? UPD: not sure how relevant this is, but here are dmesg output [14380.632816] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled [14380.633356] SGI XFS Quota Management subsystem [14421.812220] firewire_core: phy config: card 0, new root=ffc1, gap_count=5 [14441.890596] firewire_core: phy config: card 0, new root=ffc1, gap_count=5 [14441.896858] firewire_core: phy config: card 0, new root=ffc1, gap_count=5 [14453.895347] firewire_core: created device fw1: GUID 0090a99500a35518, S400, 9 config ROM retries [14453.904818] scsi6 : SBP-2 IEEE-1394 [14453.905014] scsi7 : SBP-2 IEEE-1394 [14454.139993] firewire_sbp2: fw1.0: logged in to LUN 0000 (0 retries) [14454.158769] scsi 6:0:0:0: Direct-Access WD My Book 1015 PQ: 0 ANSI: 4 [14454.159251] sd 6:0:0:0: Attached scsi generic sg3 type 0 [14454.162391] firewire_sbp2: fw1.1: logged in to LUN 0001 (0 retries) [14454.167453] sd 6:0:0:0: [sdc] 3907017568 512-byte logical blocks: (2.00 TB/1.81 TiB) [14454.178822] sd 6:0:0:0: [sdc] Write Protect is off [14454.178826] sd 6:0:0:0: [sdc] Mode Sense: 10 00 00 00 [14454.186830] scsi 7:0:0:1: Enclosure WD My Book Device 1015 PQ: 0 ANSI: 4 [14454.186995] scsi 7:0:0:1: Attached scsi generic sg4 type 13 [14454.190078] sd 6:0:0:0: [sdc] Cache data unavailable [14454.190087] sd 6:0:0:0: [sdc] Assuming drive cache: write through [14454.202176] sd 6:0:0:0: [sdc] Cache data unavailable [14454.202185] sd 6:0:0:0: [sdc] Assuming drive cache: write through [14454.239940] sdc: [mac] sdc1 sdc2 sdc3 sdc4 [14454.271262] sd 6:0:0:0: [sdc] Cache data unavailable [14454.271270] sd 6:0:0:0: [sdc] Assuming drive cache: write through [14454.271354] sd 6:0:0:0: [sdc] Attached SCSI disk [14454.272149] ses 7:0:0:1: Attached Enclosure device [14606.090024] XFS (sdc3): Mounting Filesystem [14612.048343] XFS (sdc3): Starting recovery (logdev: internal) [14620.697636] XFS (sdc3): Ending recovery (logdev: internal) [14748.120957] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx [14748.120963] e1000e 0000:00:19.0: eth0: 10/100 speed: disabling TSO [14752.568382] uhci_hcd 0000:00:1a.0: PCI INT A disabled [14752.568579] uhci_hcd 0000:00:1a.1: PCI INT B disabled [14752.568738] ehci_hcd 0000:00:1a.7: PCI INT C disabled [14752.568779] ehci_hcd 0000:00:1a.7: PME# enabled [14752.584526] uhci_hcd 0000:00:1d.1: PCI INT B disabled [14752.584689] uhci_hcd 0000:00:1d.2: PCI INT C disabled [14752.680079] ehci_hcd 0000:00:1a.7: BAR 0: set to [mem 0xe4641000-0xe46413ff] (PCI address [0xe4641000-0xe46413ff]) [14752.680104] ehci_hcd 0000:00:1a.7: restoring config space at offset 0xf (was 0x300, writing 0x30b) [14752.680136] ehci_hcd 0000:00:1a.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002) [14752.680170] ehci_hcd 0000:00:1a.7: PME# disabled [14752.680182] ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18 [14752.680190] ehci_hcd 0000:00:1a.7: setting latency timer to 64 [14752.710334] uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 [14752.710342] uhci_hcd 0000:00:1a.0: setting latency timer to 64 [14752.749186] uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17 [14752.749194] uhci_hcd 0000:00:1a.1: setting latency timer to 64 [14752.790231] uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 22 (level, low) -> IRQ 22 [14752.790239] uhci_hcd 0000:00:1d.1: setting latency timer to 64 [14752.829170] uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18 [14752.829178] uhci_hcd 0000:00:1d.2: setting latency timer to 64

Read the article

e2fsck / resize2fs problems

- by BlakBat

I've got 6 drives (each 1.5T, all same model and firmware revision) that are part of a RAID5 array. The RAID5 makes a LVM volume group and a logical group. The latter contains only one ext3 partition. I've recently ran: e2fsck -f /dev/vg03/lv01 && resize2fs -M /dev/vg03/lv01 which exited without an error. Now when I try to mount /dev/vg03/lv01 I get: EXT3-fs error (device dm-0): ext3_check_descriptors: Block bitmap for group 30533 not in group (block 1000532368)! EXT3-fs: group descriptors corrupted! How do I get out of this predicament? This is all the info I can currently give you: fdisk -l /dev/sd[cdefgh] shows (correctly) that they are "Linux raid autodetect" but fdisk now shows: fdisk -l /dev/md0 Disk /dev/md0: 7501.5 GB, 7501495664640 bytes ... Disk identifier: 0x00000000 Disk /dev/md0 doesn't contain a valid partition table (instead of a LVM type partition) fdisk -l /dev/vg03/lv01 Disk /dev/vg03/lv01: 7501.5 GB, 7501491732480 bytes ... Disk identifier: 0x00000000 Disk /dev/vg03/lv01 doesn't contain a valid partition table (instead of a ext3 type partition) I've tried: e2fsck -fy /dev/vg03/lv01 e2fsck 1.41.12 (17-May-2010) e2fsck: Group descriptors look bad... trying backup blocks... Block bitmap for group 30533 is not in group. (block 1000532368) Relocate? yes Inode bitmap for group 30533 is not in group. (block 1000532369) Relocate? yes Pass 1: Checking inodes, blocks, and sizes Relocating group 30533's block bitmap to 1000524246... Error allocating 1 contiguous block(s) in block group 30533 for inode bitmap: Could not allocate block in ext2 filesystem e2fsck: aborted Extra information I can give you: cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active (auto-read-only) raid5 sdg1[0] sdh1[5] sdf1[4] sde1[3] sdc1[2] sdd1[1] 7325679360 blocks level 5, 128k chunk, algorithm 2 [6/6] [UUUUUU] bitmap: 1/175 pages [4KB], 4096KB chunk unused devices: Lastly, all smartctl tests (short and extendend) showed no errors on any of the disks. Should I try to resize2fs to grow /dev/vg03/lv01 and redo a e2fsck ? Should I cfdisk /dev/md0 and /dev/vg03/lv01 back to their real types? Thanks in advance for all and any help. 2011-09-20 UPDATE I issued the following commands and was able to remount the partition, but by viewing the size (df) of before and after, it seems that 1Tb of data have gone missing. By checking the MD5SUMS (from an old backup) of some files with the "same" files from the remounted partition, some errors have been detected. Commands issued to remount the partition were: dumpe2fs /dev/vg03/lv01 Block count: 1000491435<br /> Block size: 4096<br /> tune2fs -O ^has_journal /dev/vg03/lv01 resize2fs -p /dev/vg03/lv01 dumpe2fs /dev/vg03/lv01 Block count: 1831418880<br /> Block size: 4096<br /> mount -o ro,noatime /dev/vg03/lv01 /mnt/raid OK... but files have been damaged / gone missing.

Read the article

Reconstructing the disk order in RAID 6 with 7 disks

- by rkotulla

a little background to this question first: I am running a RAID-6 within a QNAP TS869L external RAID/NAS system. I started with 5 disks of 3 TB each back in the day, and later added another 2 disks of 3TB to the RAID. The QNAP internals handled the growing and re-syncing etc, and everything seemd to be perfectly fine. About 2 weeks ago, I had one of the disks (disk #5, disk #2 has gone bad in the mean time) fail, and somehow (I have no idea why), also disks 1 and 2 got kicked out of the array. I replaced disk #5, but the RAID didn't start working again. After some calls to QNAP technical support, they re-created the array (using mdadm --create --force --assume-clean ...), but the resulting array couldn't find a filesystem, and I was kindly referred to contact a data recovery company that I can't afford. After some digging through old log files, resetting the disk to factory default, etc, I found a few errors that were made during this re-create - I wish I still had some of the original metadata, but unfortunately i don't (I definitely learned that lesson). I'm currently at the point where I know the correct chunk-size (64K), metadata-version (1.0; factory default was 0.9, but from what I read 0.9 doesn't handle disks over 2 TB, mine are 3 TB), and I now find the ext4 filesystem that should be on the disks. Only variable left to determine is the right disk order! I started using the description found in answer #4 of "Recover RAID 5 data after created new array instead of re-using" but am a little confused on what the order should be for a proper RAID-6. RAID-5 is pretty well documented in a number of places, but RAID-6 much less so. Also, does the layout, i.e. distribution of parity and data chunks across the disks, change after the growing of the array from 5 to 7 disks, or does the re-sync re-organize them in such a way a native 7-disk RAID-6 would have been? Thanks some more mdadm output that might be helpful: mdadm version: [~] # mdadm --version mdadm - v2.6.3 - 20th August 2007 mdadm details from one of the disks in the array: [~] # mdadm --examine /dev/sda3 /dev/sda3: Magic : a92b4efc Version : 1.0 Feature Map : 0x0 Array UUID : 1c1614a5:e3be2fbb:4af01271:947fe3aa Name : 0 Creation Time : Tue Jun 10 10:27:58 2014 Raid Level : raid6 Raid Devices : 7 Used Dev Size : 5857395112 (2793.02 GiB 2998.99 GB) Array Size : 29286975360 (13965.12 GiB 14994.93 GB) Used Size : 5857395072 (2793.02 GiB 2998.99 GB) Super Offset : 5857395368 sectors State : clean Device UUID : 7c572d8f:20c12727:7e88c888:c2c357af Update Time : Tue Jun 10 13:01:06 2014 Checksum : d275c82d - correct Events : 7036 Chunk Size : 64K Array Slot : 0 (0, 1, failed, 3, failed, 5, 6) Array State : Uu_u_uu 2 failed mdadm details for the array in the current disk-order (based on my best guess reconstructed from old log-files) [~] # mdadm --detail /dev/md0 /dev/md0: Version : 01.00.03 Creation Time : Tue Jun 10 10:27:58 2014 Raid Level : raid6 Array Size : 14643487680 (13965.12 GiB 14994.93 GB) Used Dev Size : 2928697536 (2793.02 GiB 2998.99 GB) Raid Devices : 7 Total Devices : 5 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Tue Jun 10 13:01:06 2014 State : clean, degraded Active Devices : 5 Working Devices : 5 Failed Devices : 0 Spare Devices : 0 Chunk Size : 64K Name : 0 UUID : 1c1614a5:e3be2fbb:4af01271:947fe3aa Events : 7036 Number Major Minor RaidDevice State 0 8 3 0 active sync /dev/sda3 1 8 19 1 active sync /dev/sdb3 2 0 0 2 removed 3 8 51 3 active sync /dev/sdd3 4 0 0 4 removed 5 8 99 5 active sync /dev/sdg3 6 8 83 6 active sync /dev/sdf3 output from /proc/mdstat (md8, md9, and md13 are internally used RAIDs holding swap, etc; the one I'm after is md0) [~] # more /proc/mdstat Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] md0 : active raid6 sdf3[6] sdg3[5] sdd3[3] sdb3[1] sda3[0] 14643487680 blocks super 1.0 level 6, 64k chunk, algorithm 2 [7/5] [UU_U_UU] md8 : active raid1 sdg2[2](S) sdf2[3](S) sdd2[4](S) sdc2[5](S) sdb2[6](S) sda2[1] sde2[0] 530048 blocks [2/2] [UU] md13 : active raid1 sdg4[3] sdf4[4] sde4[5] sdd4[6] sdc4[2] sdb4[1] sda4[0] 458880 blocks [8/7] [UUUUUUU_] bitmap: 21/57 pages [84KB], 4KB chunk md9 : active raid1 sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sda1[0] sdb1[1] 530048 blocks [8/7] [UUUUUUU_] bitmap: 37/65 pages [148KB], 4KB chunk unused devices: <none>

Read the article

linux raid 1: right after replacing and syncing one drive, the other disk fails - understanding what is going on with mdstat/mdadm

- by devicerandom

We have an old RAID 1 Linux server (Ubuntu Lucid 10.04), with four partitions. A few days ago /dev/sdb failed, and today we noticed /dev/sda had pre-failure ominous SMART signs (~4000 reallocated sector count). We replaced /dev/sdb this morning and rebuilt the RAID on the new drive, following this guide: http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array Everything went smooth until the very end. When it looked like it was finishing to synchronize the last partition, the other old one failed. At this point I am very unsure of the state of the system. Everything seems working and the files seem to be all accessible, just as if it synchronized everything, but I'm new to RAID and I'm worried about what is going on. The /proc/mdstat output is: Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md3 : active raid1 sdb4[2](S) sda4[0] 478713792 blocks [2/1] [U_] md2 : active raid1 sdb3[1] sda3[2](F) 244140992 blocks [2/1] [_U] md1 : active raid1 sdb2[1] sda2[2](F) 244140992 blocks [2/1] [_U] md0 : active raid1 sdb1[1] sda1[2](F) 9764800 blocks [2/1] [_U] unused devices: <none> The order of [_U] vs [U_]. Why aren't they consistent along all the array? Is the first U /dev/sda or /dev/sdb? (I tried looking on the web for this trivial information but I found no explicit indication) If I read correctly for md0, [_U] should be /dev/sda1 (down) and /dev/sdb1 (up). But if /dev/sda has failed, how can it be the opposite for md3 ? I understand /dev/sdb4 is now spare because probably it failed to synchronize it 100%, but why does it show /dev/sda4 as up? Shouldn't it be [__]? Or [_U] anyway? The /dev/sda drive now cannot even be accessed by SMART anymore apparently, so I wouldn't expect it to be up. What is wrong with my interpretation of the output? I attach also the outputs of mdadm --detail for the four partitions: /dev/md0: Version : 00.90 Creation Time : Fri Jan 21 18:43:07 2011 Raid Level : raid1 Array Size : 9764800 (9.31 GiB 10.00 GB) Used Dev Size : 9764800 (9.31 GiB 10.00 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Tue Nov 5 17:27:33 2013 State : clean, degraded Active Devices : 1 Working Devices : 1 Failed Devices : 1 Spare Devices : 0 UUID : a3b4dbbd:859bf7f2:bde36644:fcef85e2 Events : 0.7704 Number Major Minor RaidDevice State 0 0 0 0 removed 1 8 17 1 active sync /dev/sdb1 2 8 1 - faulty spare /dev/sda1 /dev/md1: Version : 00.90 Creation Time : Fri Jan 21 18:43:15 2011 Raid Level : raid1 Array Size : 244140992 (232.83 GiB 250.00 GB) Used Dev Size : 244140992 (232.83 GiB 250.00 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 1 Persistence : Superblock is persistent Update Time : Tue Nov 5 17:39:06 2013 State : clean, degraded Active Devices : 1 Working Devices : 1 Failed Devices : 1 Spare Devices : 0 UUID : 8bcd5765:90dc93d5:cc70849c:224ced45 Events : 0.1508280 Number Major Minor RaidDevice State 0 0 0 0 removed 1 8 18 1 active sync /dev/sdb2 2 8 2 - faulty spare /dev/sda2 /dev/md2: Version : 00.90 Creation Time : Fri Jan 21 18:43:19 2011 Raid Level : raid1 Array Size : 244140992 (232.83 GiB 250.00 GB) Used Dev Size : 244140992 (232.83 GiB 250.00 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Persistence : Superblock is persistent Update Time : Tue Nov 5 17:46:44 2013 State : clean, degraded Active Devices : 1 Working Devices : 1 Failed Devices : 1 Spare Devices : 0 UUID : 2885668b:881cafed:b8275ae8:16bc7171 Events : 0.2289636 Number Major Minor RaidDevice State 0 0 0 0 removed 1 8 19 1 active sync /dev/sdb3 2 8 3 - faulty spare /dev/sda3 /dev/md3: Version : 00.90 Creation Time : Fri Jan 21 18:43:22 2011 Raid Level : raid1 Array Size : 478713792 (456.54 GiB 490.20 GB) Used Dev Size : 478713792 (456.54 GiB 490.20 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 3 Persistence : Superblock is persistent Update Time : Tue Nov 5 17:19:20 2013 State : clean, degraded Active Devices : 1 Working Devices : 2 Failed Devices : 0 Spare Devices : 1 Number Major Minor RaidDevice State 0 8 4 0 active sync /dev/sda4 1 0 0 1 removed 2 8 20 - spare /dev/sdb4 The active sync on /dev/sda4 baffles me. I am worried because if tomorrow morning I have to replace /dev/sda, I want to be sure what should I sync with what and what is going on. I am also quite baffled by the fact /dev/sda decided to fail exactly when the raid finished resyncing. I'd like to understand what is really happening. Thanks a lot for your patience and help. Massimo

Read the article

Hard drive write speed - finding a lighter antivirus?

- by Shingetsu

I recently have been getting a lot of system lag here (for example, the mouse and the display in general take about 15 seconds to react in the worst cases). After a lot of monitoring the resources, I found that the problem mainly happens when too much Disk I/O is being done. Three culprits have been identified: My browser had the highest write I/O with 35,000,000 I/O Write Bytes. Steam had the highest read I/O (when IDLE!!!) with 106,000,000 I/O Read Bytes. My antivirus (in both cases I will soon mention) was the runner up in both cases with: 30,000,000ish write and 80,000,000ish read. The first AV I had was Avast! which I had liked on my previous system. After noticing it taking so much I/O I switched to Panda (supposing it wouldn't use TOO much during idle phase). However it only used a bit less I/O. Just a lot less memory and cpu and somewhat more network. My browser at the moment is Maxthon 3 (which I like a lot). Before this I was running chrome which had similar data and much higher cpu when running in the background was enabled. I'm not going to be running steam all the time and there aren't many alternatives to it. I like my browser very much, but I AM willing to switch if there's an obvious problem (I'm in programming, however I'm not a very good sysadmin, especially not when it comes to windows). Finally, my system almost stops lagging when I turn off the antivirus (and preferably steam) (some remains but once in every 5-6 hours for a few seconds so it isn't a big problem). My question (has a few parts): Is it possible to configure steam to lower it's I/O usage? (and maybe network while we're at it?) Which antivirus (very preferably free) uses lowest I/O while idle (I leave PC alone during active scans so that isn't a problem). Is there an obvious problem with my current browser and, if so, is there a way to fix it or should I switch and, if so, to what? (P.S. I've been on FFox for some time too). Info on system: Windows 7 (32 bit T_T, I am getting a new one in a few months but I want to keep using system during that time though). Hard Drive (main) is a Raid0. (Also have an external 1TB one which contains steam (and steam alone). As such it doesn't get used by much anything other than steam and isn't a very large problem. However steam still uses some I/O of registry) CPU: Intel(R) Core(TM)2 CPU [email protected] RAM: 6GB (3.25GB usable) (this and CPU have little effect as shown in next section) Additional info: Memory usage during problematic times: 44% CPU usage during problematic times: 35% Page File: main drive: system managed. 1TB drive: none. The current system I'm using is about 6 years old and is mainly a place holder while I await the new one in a few months. Final words: this is my 1st post on Super User (this question wouldn't feel right on Stack Overflow where I usually stay). If it doesn't have it's place here please tell me. If anything is wrong with it, same. Edit Technically I'm looking for a live thread detection program with minimal IO usage. I already have good active scan capability: Kaspersky (the free scanner uses the paid database) and MalwareBytes. Edit 2 Noticed another one, it seems that windows media player has been using stuff even when off! Turning it off and restarting now. If the problem is fixed I'll tell you guys. The reason I didn't notice it before was because I didn't have resource manager in front of me at the MOMENT of the problem. Now I did and it was at the very top of the list!

Read the article

Recover RAID 5 data after created new array instead of re-using

- by Brigadieren

Folks please help - I am a newb with a major headache at hand (perfect storm situation). I have a 3 1tb hdd on my ubuntu 11.04 configured as software raid 5. The data had been copied weekly onto another separate off the computer hard drive until that completely failed and was thrown away. A few days back we had a power outage and after rebooting my box wouldn't mount the raid. In my infinite wisdom I entered mdadm --create -f... command instead of mdadm --assemble and didn't notice the travesty that I had done until after. It started the array degraded and proceeded with building and syncing it which took ~10 hours. After I was back I saw that that the array is successfully up and running but the raid is not I mean the individual drives are partitioned (partition type f8 ) but the md0 device is not. Realizing in horror what I have done I am trying to find some solutions. I just pray that --create didn't overwrite entire content of the hard driver. Could someone PLEASE help me out with this - the data that's on the drive is very important and unique ~10 years of photos, docs, etc. Is it possible that by specifying the participating hard drives in wrong order can make mdadm overwrite them? when I do mdadm --examine --scan I get something like ARRAY /dev/md/0 metadata=1.2 UUID=f1b4084a:720b5712:6d03b9e9:43afe51b name=<hostname>:0 Interestingly enough name used to be 'raid' and not the host hame with :0 appended. Here is the 'sanitized' config entries: DEVICE /dev/sdf1 /dev/sde1 /dev/sdd1 CREATE owner=root group=disk mode=0660 auto=yes HOMEHOST <system> MAILADDR root ARRAY /dev/md0 metadata=1.2 name=tanserv:0 UUID=f1b4084a:720b5712:6d03b9e9:43afe51b Here is the output from mdstat cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid5 sdd1[0] sdf1[3] sde1[1] 1953517568 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU] unused devices: <none> fdisk shows the following: fdisk -l Disk /dev/sda: 80.0 GB, 80026361856 bytes 255 heads, 63 sectors/track, 9729 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000bf62e Device Boot Start End Blocks Id System /dev/sda1 * 1 9443 75846656 83 Linux /dev/sda2 9443 9730 2301953 5 Extended /dev/sda5 9443 9730 2301952 82 Linux swap / Solaris Disk /dev/sdb: 750.2 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000de8dd Device Boot Start End Blocks Id System /dev/sdb1 1 91201 732572001 8e Linux LVM Disk /dev/sdc: 500.1 GB, 500107862016 bytes 255 heads, 63 sectors/track, 60801 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00056a17 Device Boot Start End Blocks Id System /dev/sdc1 1 60801 488384001 8e Linux LVM Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000ca948 Device Boot Start End Blocks Id System /dev/sdd1 1 121601 976760001 fd Linux raid autodetect Disk /dev/dm-0: 1250.3 GB, 1250254913536 bytes 255 heads, 63 sectors/track, 152001 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/dm-0 doesn't contain a valid partition table Disk /dev/sde: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x93a66687 Device Boot Start End Blocks Id System /dev/sde1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0xe6edc059 Device Boot Start End Blocks Id System /dev/sdf1 1 121601 976760001 fd Linux raid autodetect Disk /dev/md0: 2000.4 GB, 2000401989632 bytes 2 heads, 4 sectors/track, 488379392 cylinders Units = cylinders of 8 * 512 = 4096 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 524288 bytes / 1048576 bytes Disk identifier: 0x00000000 Disk /dev/md0 doesn't contain a valid partition table Per suggestions I did clean up the superblocks and re-created the array with --assume-clean option but with no luck at all. Is there any tool that will help me to revive at least some of the data? Can someone tell me what and how the mdadm --create does when syncs to destroy the data so I can write a tool to un-do whatever was done? After the re-creating of the raid I run fsck.ext4 /dev/md0 and here is the output root@tanserv:/etc/mdadm# fsck.ext4 /dev/md0 e2fsck 1.41.14 (22-Dec-2010) fsck.ext4: Superblock invalid, trying backup blocks... fsck.ext4: Bad magic number in super-block while trying to open /dev/md0 The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 Per Shanes' suggestion I tried root@tanserv:/home/mushegh# mkfs.ext4 -n /dev/md0 mke2fs 1.41.14 (22-Dec-2010) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=128 blocks, Stripe width=256 blocks 122101760 inodes, 488379392 blocks 24418969 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=0 14905 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 102400000, 214990848 and run fsck.ext4 with every backup block but all returned the following: root@tanserv:/home/mushegh# fsck.ext4 -b 214990848 /dev/md0 e2fsck 1.41.14 (22-Dec-2010) fsck.ext4: Invalid argument while trying to open /dev/md0 The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device> Any suggestions? Regards!

Read the article

Why is my mdadm raid-1 recovery so slow?

- by dimmer

On a system I'm running Ubuntu 10.04. My raid-1 restore started out fast but quickly became ridiculously slow (at this rate the restore will take 150 days!): dimmer@paimon:~$ cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid1 sdc1[2] sdb1[1] 1953513408 blocks [2/1] [_U] [====>................] recovery = 24.4% (477497344/1953513408) finish=217368.0min speed=113K/sec unused devices: <none> Eventhough I have set the kernel variables to reasonably quick values: dimmer@paimon:~$ cat /proc/sys/dev/raid/speed_limit_min 1000000 dimmer@paimon:~$ cat /proc/sys/dev/raid/speed_limit_max 100000000 I am using 2 2.0TB Western Digital Hard Disks, WDC WD20EARS-00M and WDC WD20EARS-00J. I believe they have been partitioned such that their sectors are aligned. dimmer@paimon:/sys$ sudo parted /dev/sdb GNU Parted 2.2 Using /dev/sdb Welcome to GNU Parted! Type 'help' to view a list of commands. (parted) p Model: ATA WDC WD20EARS-00M (scsi) Disk /dev/sdb: 2000GB Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 1049kB 2000GB 2000GB ext4 (parted) unit s (parted) p Number Start End Size File system Name Flags 1 2048s 3907028991s 3907026944s ext4 (parted) q dimmer@paimon:/sys$ sudo parted /dev/sdc GNU Parted 2.2 Using /dev/sdc Welcome to GNU Parted! Type 'help' to view a list of commands. (parted) p Model: ATA WDC WD20EARS-00J (scsi) Disk /dev/sdc: 2000GB Sector size (logical/physical): 512B/4096B Partition Table: gpt Number Start End Size File system Name Flags 1 1049kB 2000GB 2000GB ext4 I am beginning to think that I have a hardware problem, otherwise I can't imagine why the mdadm restore should be so slow. I have done a benchmark on /dev/sdc using Ubuntu's disk utility GUI app, and the results looked normal so I know that sdc has the capability to write faster than this. I also had the same problem on a similar WD drive that I RMAd because of bad sectors. I suppose it's possible they sent me a replacement with bad sectors too, although there are no SMART values showing them yet. Any ideas? Thanks. As requested, output of top sorted by cpu usage (notice there is ~0 cpu usage). iowait is also zero which seems strange: top - 11:35:13 up 2 days, 9:40, 3 users, load average: 2.87, 2.58, 2.30 Tasks: 142 total, 1 running, 141 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3096304k total, 1482164k used, 1614140k free, 617672k buffers Swap: 1526132k total, 0k used, 1526132k free, 535416k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 45 root 20 0 0 0 0 S 0 0.0 2:17.02 scsi_eh_0 1 root 20 0 2808 1752 1204 S 0 0.1 0:00.46 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root RT 0 0 0 0 S 0 0.0 0:00.02 migration/0 4 root 20 0 0 0 0 S 0 0.0 0:00.17 ksoftirqd/0 5 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 6 root RT 0 0 0 0 S 0 0.0 0:00.02 migration/1 ... dmesg errors, definitely looking like hardware: [202884.000157] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [202884.007015] ata5.00: failed command: FLUSH CACHE EXT [202884.013728] ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 [202884.013730] res 40/00:00:ff:59:2e/00:00:35:00:00/e0 Emask 0x4 (timeout) [202884.033667] ata5.00: status: { DRDY } [202884.040329] ata5: hard resetting link [202889.400050] ata5: link is slow to respond, please be patient (ready=0) [202894.048087] ata5: COMRESET failed (errno=-16) [202894.054663] ata5: hard resetting link [202899.412049] ata5: link is slow to respond, please be patient (ready=0) [202904.060107] ata5: COMRESET failed (errno=-16) [202904.066646] ata5: hard resetting link [202905.840056] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [202905.849178] ata5.00: configured for UDMA/133 [202905.849188] ata5: EH complete [203899.000292] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [203899.007096] ata5.00: failed command: IDENTIFY DEVICE [203899.013841] ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in [203899.013843] res 40/00:00:ff:f9:f6/00:00:38:00:00/e0 Emask 0x4 (timeout) [203899.041232] ata5.00: status: { DRDY } [203899.048133] ata5: hard resetting link [203899.816134] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [203899.826062] ata5.00: configured for UDMA/133 [203899.826079] ata5: EH complete [204375.000200] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [204375.007421] ata5.00: failed command: IDENTIFY DEVICE [204375.014799] ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in [204375.014800] res 40/00:00:ff:0c:0f/00:00:39:00:00/e0 Emask 0x4 (timeout) [204375.044374] ata5.00: status: { DRDY } [204375.051842] ata5: hard resetting link [204380.408049] ata5: link is slow to respond, please be patient (ready=0) [204384.440076] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [204384.449938] ata5.00: configured for UDMA/133 [204384.449955] ata5: EH complete [204395.988135] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [204395.988140] ata5.00: failed command: IDENTIFY DEVICE [204395.988147] ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in [204395.988149] res 40/00:00:ff:0c:0f/00:00:39:00:00/e0 Emask 0x4 (timeout) [204395.988151] ata5.00: status: { DRDY } [204395.988156] ata5: hard resetting link [204399.320075] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [204399.330487] ata5.00: configured for UDMA/133 [204399.330503] ata5: EH complete

Read the article

webserver horrible slow, sometimes incredible fast

- by dhanke

i am running a small community ( 6000+ Members ) on a non-virtual 64-bit ubuntu 11.04 system. I am not a Linux-pro, not even advanced, i just tried to setup a webserver, which does nothing special actually. Delivering some dynamic PHP and RoR websites is its task. So it might be that my configuration files do look horrible bad. Also, i might use the wrong vocabulary, so in doubt, please ask. Having a current all-time record of 520 registered users (board-accounts, no system-users) online at same time, average server-load is about 2.0 - 5.0. Meantime (~250 users) average server load value is at about 0.4 - 0.8, sometimes, on some expensive searches a bit higher. everything fine. From time to time however, the load increases up to 120 (120.0, not 12.0 ;) ). In this time, its hard to even connect via SSH, but when i reach the server, and use top/htop/iotop to see whats happening, i cannot identify any process causing high CPU load. iotop tells me about a current reading/writing speed of about approx. 70kb/s, which is quite equal to power-off i think. Memory-Usage is max. at ~ 12GB of 16GB, so swap remains empty. now the odd (at least for me:) waiting some minutes ( since i always get a bit into a panic when this happens, it feels like 5 minutes, but i suppose its more like 20-30 minutes) and the server is back to normal. everything continues as normal. another odd fact: when i run hdparm -tT /dev/sda, i get answer like: /dev/sda: Timing cached reads: 7180 MB in 2.00 seconds = 3591.13 MB/sec Timing buffered disk reads: 348 MB in 3.02 seconds = 115.41 MB/sec when i run the same command while the server is "frozen", the answer is like /dev/sda: <- takes about 5 minutes until this line appears Timing cached reads: 7180 MB in 2.00 seconds = 3591.13 MB/sec <- 5 more minutes Timing buffered disk reads: 348 MB in 3.02 seconds = 115.41 MB/sec <- another 5 minutes so the values are the same, but the quoted time is completely wrong. using time command as prefix also tells me that ~ 15 minutes were used. I searched in dmesg, /var/log/[messages|syslog] - nothing found. /var/log/errors however tells me that: Jul 4 20:28:30 localhost kernel: [19080.671415] INFO: task php5-fpm:27728 blocked for more than 120 seconds. Jul 4 20:28:30 localhost kernel: [19080.671419] "echo 0 /proc/sys/kernel/hung_task_timeout_secs" disables this message. multiple times. now that message does tell me that php5-fpm task was blocked or did block ? - but not if that is the cause or just one of the results of that "freeze". Anyone? to cut the long story short, i dont know where even to start analyzing. So if you can give me any advice by looking at following specs and configs, or ask me to provide more information, i`d be glad. Specs: 6 Core AMD Phenom(tm) II X6 1055T Processor * 16 Gigabyte Ram 2x 1.5 TB Seagate ST1500DL003-9VT16L via SATA 3 via SoftwareRaid (i suppose) Services: (due to service --status-all, those with [ + ]) nginx Webserver 1.0.14 mySQL 5.1.63 Server Ruby on Rails 2.3.11 ( passenger-nginx-module ) php5-fpm 5.3.6-13ubuntu3.7 SSH ido2db Further services: default crontab + nightly backup. syslog-ng Website consists of 2 subdomains, forum. and www. where forum is a phpBB3.x PHP-Board, and www a Ruby on Rails 2.3.11 application (portal). Mini-Note: sometimes i notice that the forum is pretty slow, in contrast to the always-fast (except for this "freeze") portal. Both share the same Database, but the portal is using it read-only. The Webserver is nginx, using phusion passenger module to communicate with the ruby-application. Also, for the forum it communicates with php5-fpm via socket: relevant nginx configuration parts ( with comments/questions starting by ; ) ; in case of freeze due to too high Filesystem activity, maybe adding a limit? #worker_rlimit_nofile 50000; user www-data; ; 6 cores, so i read 6 fits. maybe already wrong? worker_processes 6; pid /var/run/nginx.pid; events { worker_connections 1024; } http { passenger_root /var/lib/gems/1.8/gems/passenger-3.0.11; passenger_ruby /usr/bin/ruby1.8; ; the forum once featured a chat, which was working w/o websockets. ; so it was a hell of pull requests (deactivated now, freeze still happening) keepalive_timeout 65; keepalive_requests 50; gzip on; server { listen 80; server_name www.domain.tld; root /var/www/domain/rails/public; passenger_enabled on; } server { listen 80; server_name forum.domain.tld; location / { root /var/www/domain/forum; index index.php; } ; satic stuff to be handled by nginx location ~* ^/style/.+.(jpg|jpeg|gif|css|png|js|ico|xml)$ { access_log off; expires 30d; root /var/www/domain/forum/; } ; now the php magic, note the "backend"-fcgi_pass location ~ .php$ { fastcgi_split_path_info ^(.+\.php)(.*)$; fastcgi_pass backend; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME /var/www/domain/forum$fastcgi_script_name; include fastcgi_params; fastcgi_param QUERY_STRING $query_string; fastcgi_param REQUEST_METHOD $request_method; fastcgi_param CONTENT_TYPE $content_type; fastcgi_param CONTENT_LENGTH $content_length; fastcgi_intercept_errors on; fastcgi_ignore_client_abort off; fastcgi_connect_timeout 60; fastcgi_send_timeout 180; fastcgi_read_timeout 180; fastcgi_buffer_size 128k; fastcgi_buffers 256 16k; fastcgi_busy_buffers_size 256k; fastcgi_temp_file_write_size 256k; fastcgi_max_temp_file_size 0; } location ~ /\.ht { deny all; } } ;the php5-fpm socket. i read that /dev/shm/ whould be the fastes place for this. bad idea in general? upstream backend { server unix:/dev/shm/phpfpm; } ... } php5-fpm settings (i changed this values due to php5-fpm error log messages higher and higher.. (freeze-problem was there before as well)* listen = /dev/shm/phpfpm user = www-data group = www-data pm = dynamic ; holy, 4000! well, shinking this value to earth-level gave me ; 100s of 502 bad gateway commands. this values were quite stable. ; since there are only max 520 users online i dont get it, why i would need ; as many children as configured here. due to keep-alive maybe? ; asking questions is easier for me since restarting server will make ; my community-members angry ;) pm.max_children = 4000 pm.start_servers = 100 pm.min_spare_servers = 50 pm.max_spare_servers = 150 pm.max_requests = 10 pm.status_path = /status ping.path = /ping ping.response = pong slowlog = log/$pool.log.slow ;should i use rlimit? ;rlimit_files = 1024 chdir = / mysql/my.cnf [client] port = 3306 socket = /var/run/mysqld/mysqld.sock [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking bind-address = 127.0.0.1 key_buffer = 16M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 myisam-recover = BACKUP ; high number, but less gives some phpBB errors. max_connections = 450 table_cache = 512 ; i read twice the cpu cores, bad? thread_concurrency = 12 join_buffer_size = 2084K concurrent_insert = 3 query_cache_limit = 64M query_cache_size = 512M query_cache_type = 1 log_error = /var/log/mysql/error.log log_slow_queries = /var/log/mysql/mysql-slow.log long_query_time = 2 expire_logs_days = 10 max_binlog_size = 100M low_priority_updates=1 [mysqldump] quick quote-names max_allowed_packet = 16M [isamchk] key_buffer = 16M !includedir /etc/mysql/conf.d/ I used smartctl already, hdds seem to be fine. /proc/mdstatus quotes: Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md3 : active raid1 sda3[1] 1459264192 blocks [2/1] [_U] md1 : active raid1 sda1[0] 3911680 blocks [2/1] [U_] unused devices: ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 127727 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 127727 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited I quote some questions in my configuration files, these are not (intentional) directly problem-related, but would be nice for me to know wether they are indeed questionable or done right. One additional Fact: my MYSQL-database is at 12GB size. i dont know if that does matter, but mytop sometimes shows me 4-5 seconds long insert queries, some are 20-30 seconds long. Its just a feeling that i am unable to prove (because i dont know how), but when i disable the database, the freeze seems not to happen. Example: i created a dummy rails application to see the development log. the app made some sql-queries, reads and inserts. the log quite often was like: DbTest Load (0.3ms) SELECT * FROM `db_test` WHERE (`db_test`.`id` = 31722) LIMIT 1 SQL (0.1ms) BEGIN DbTest Update (0.3ms) UPDATE `db_test` SET `updated_at` = '2012-07-04 23:32:34' WHERE `id` = 31722 - now the log stands still for 5-60 seconds. SQL (49.1ms) COMMIT - SQL-Update time in the log does not include freeze time Rendering test/index Completed in 96ms (View: 16, DB: 59) | 200 OK [http://localhost:9000/test] Bad part is: this mini-freeze here only happens from time to time as well. note: meanwhile i cannot even upload files via scp. I currently feel like running form bad to worse and back by googling for my server-problem due to immense lack of knowledge regarding server configurations. It still makes me wonder, why those problems even appear, since 250 users a time is not such a high amount, right? So my questions: whats wrong and how to fix? ;) or: what information can i provide to make the situation more clear? can you point at some critical bad configuration-line which i should consider to catch up in the documentation? are there any tools i can run to see some possible bottlenecks? any further advice? (next to: "pay someone who knows what he does" - its a private project, server costs enough already. :)) Thanks for your time and help. Best Regards, Daniel P.S.: i renamed the configfiles to domain.tld since i dont want to have any % more load to the server until its fixed. might be a exaggeratedly thought.. P.P.S: if i asked a complete duplicate question, sorry. my search results seemed to be quite specific in their own way.

Read the article

Did I lose my RAID again?

- by BarsMonster

Hi! A little history: 2 years ago I was really excited to find out that mdadm is so powerful that it even can reshape arrays, so you can start with a smaller array and then grow it as you need. I've bought 3x1Tb drives and made a RAID-5. It was fine for a year. Then I bought 2x more, and tried to reshape to RAID-6 out of 5 drives, and due to some mess with superblock versions, lost all content. Had to rebuild it from scratch, but 2Tb of data were gone. Yesterday I bought 2 more drives, and this time I had everything: properly built array, UPS. I've disabled write intent map, added 2 new drives as spares and run a command to grow array to 7-disks. It started working, but speed was ridiculously slow, ~100kb/sec. After processing first 37Mb at such an amazing speed, one of old HDDs fails. I properly shutdown the PC and disconnected the failed drive. After bootup it appeared that it recreated the intent map as it was still in mdadm config, so I removed it from config and rebooted again. Now all I see is that all mdadm processes deadlock, and don't do anything. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1937 root 20 0 12992 608 444 D 0 0.1 0:00.00 mdadm 2283 root 20 0 12992 852 704 D 0 0.1 0:00.01 mdadm 2287 root 20 0 0 0 0 D 0 0.0 0:00.01 md0_reshape 2288 root 18 -2 12992 820 676 D 0 0.1 0:00.01 mdadm And all I see in mdstat is: $ cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid6 sdb1[1] sdg1[4] sdf1[7] sde1[6] sdd1[0] sdc1[5] 2929683456 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [7/6] [UU_UUUU] [>....................] reshape = 0.0% (37888/976561152) finish=567604147.2min speed=0K/sec I've already tried mdadm 2.6.7, 3.1.4 and 3.2 - nothing helps. Did I lose my data again? Any suggestions on how can I make this work? OS is Ubuntu Server 10.04.2. PS. Needless to say, the data is inaccessible - I cannot mount /dev/md0 to save the most valuable data. You can see my disappointment - the very specific thing I was excited about failed twice taking 5Tb of my data with it. Update: It appears there is some nice info in kern.log: 21:38:48 ...: [ 166.522055] raid5: reshape will continue 21:38:48 ...: [ 166.522085] raid5: device sdb1 operational as raid disk 1 21:38:48 ...: [ 166.522091] raid5: device sdg1 operational as raid disk 4 21:38:48 ...: [ 166.522097] raid5: device sdf1 operational as raid disk 5 21:38:48 ...: [ 166.522102] raid5: device sde1 operational as raid disk 6 21:38:48 ...: [ 166.522107] raid5: device sdd1 operational as raid disk 0 21:38:48 ...: [ 166.522111] raid5: device sdc1 operational as raid disk 3 21:38:48 ...: [ 166.523942] raid5: allocated 7438kB for md0 21:38:48 ...: [ 166.524041] 1: w=1 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524050] 4: w=2 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524056] 5: w=3 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524062] 6: w=4 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524068] 0: w=5 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524073] 3: w=6 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0 21:38:48 ...: [ 166.524079] raid5: raid level 6 set md0 active with 6 out of 7 devices, algorithm 2 21:38:48 ...: [ 166.524519] RAID5 conf printout: 21:38:48 ...: [ 166.524523] --- rd:7 wd:6 21:38:48 ...: [ 166.524528] disk 0, o:1, dev:sdd1 21:38:48 ...: [ 166.524532] disk 1, o:1, dev:sdb1 21:38:48 ...: [ 166.524537] disk 3, o:1, dev:sdc1 21:38:48 ...: [ 166.524541] disk 4, o:1, dev:sdg1 21:38:48 ...: [ 166.524545] disk 5, o:1, dev:sdf1 21:38:48 ...: [ 166.524550] disk 6, o:1, dev:sde1 21:38:48 ...: [ 166.524553] ...ok start reshape thread 21:38:48 ...: [ 166.524727] md0: detected capacity change from 0 to 2999995858944 21:38:48 ...: [ 166.524735] md: reshape of RAID array md0 21:38:48 ...: [ 166.524740] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. 21:38:48 ...: [ 166.524745] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape. 21:38:48 ...: [ 166.524756] md: using 128k window, over a total of 976561152 blocks. 21:39:05 ...: [ 166.525013] md0: 21:42:04 ...: [ 362.520063] INFO: task mdadm:1937 blocked for more than 120 seconds. 21:42:04 ...: [ 362.520068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:42:04 ...: [ 362.520073] mdadm D 00000000ffffffff 0 1937 1 0x00000000 21:42:04 ...: [ 362.520083] ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0 21:42:04 ...: [ 362.520092] ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0 21:42:04 ...: [ 362.520100] 0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198 21:42:04 ...: [ 362.520107] Call Trace: 21:42:04 ...: [ 362.520133] [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456] 21:42:04 ...: [ 362.520148] [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20 21:42:04 ...: [ 362.520159] [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456] 21:42:04 ...: [ 362.520169] [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456] 21:42:04 ...: [ 362.520179] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:42:04 ...: [ 362.520188] [<ffffffff81414df0>] md_make_request+0xc0/0x130 21:42:04 ...: [ 362.520194] [<ffffffff81414df0>] ? md_make_request+0xc0/0x130 21:42:04 ...: [ 362.520205] [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0 21:42:04 ...: [ 362.520214] [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20 21:42:04 ...: [ 362.520222] [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60 21:42:04 ...: [ 362.520230] [<ffffffff8129fc80>] submit_bio+0x80/0x110 21:42:04 ...: [ 362.520236] [<ffffffff8116c849>] submit_bh+0xf9/0x140 21:42:04 ...: [ 362.520244] [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0 21:42:04 ...: [ 362.520251] [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70 21:42:04 ...: [ 362.520258] [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40 21:42:04 ...: [ 362.520265] [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160 21:42:04 ...: [ 362.520272] [<ffffffff81173d78>] blkdev_readpage+0x18/0x20 21:42:04 ...: [ 362.520279] [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0 21:42:04 ...: [ 362.520285] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:42:04 ...: [ 362.520290] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:42:04 ...: [ 362.520297] [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120 21:42:04 ...: [ 362.520304] [<ffffffff810f5909>] read_cache_page_async+0x19/0x20 21:42:04 ...: [ 362.520310] [<ffffffff810f591e>] read_cache_page+0xe/0x20 21:42:04 ...: [ 362.520317] [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0 21:42:04 ...: [ 362.520324] [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460 21:42:04 ...: [ 362.520331] [<ffffffff811a7938>] check_partition+0x138/0x190 21:42:04 ...: [ 362.520338] [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0 21:42:04 ...: [ 362.520344] [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0 21:42:04 ...: [ 362.520350] [<ffffffff81174650>] ? blkdev_open+0x0/0xc0 21:42:04 ...: [ 362.520356] [<ffffffff81174640>] blkdev_get+0x10/0x20 21:42:04 ...: [ 362.520362] [<ffffffff811746c1>] blkdev_open+0x71/0xc0 21:42:04 ...: [ 362.520369] [<ffffffff811419f3>] __dentry_open+0x113/0x370 21:42:04 ...: [ 362.520377] [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30 21:42:04 ...: [ 362.520385] [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0 21:42:04 ...: [ 362.520391] [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70 21:42:04 ...: [ 362.520398] [<ffffffff8115207a>] do_filp_open+0x2da/0xba0 21:42:04 ...: [ 362.520406] [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310 21:42:04 ...: [ 362.520414] [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150 21:42:04 ...: [ 362.520421] [<ffffffff81141769>] do_sys_open+0x69/0x170 21:42:04 ...: [ 362.520428] [<ffffffff811418b0>] sys_open+0x20/0x30 21:42:04 ...: [ 362.520437] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:42:04 ...: [ 362.520446] INFO: task mdadm:2283 blocked for more than 120 seconds. 21:42:04 ...: [ 362.520450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:42:04 ...: [ 362.520454] mdadm D 0000000000000000 0 2283 2212 0x00000000 21:42:04 ...: [ 362.520462] ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0 21:42:04 ...: [ 362.520470] ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0 21:42:04 ...: [ 362.520478] 0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78 21:42:04 ...: [ 362.520485] Call Trace: 21:42:04 ...: [ 362.520495] [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180 21:42:04 ...: [ 362.520502] [<ffffffff8154397b>] mutex_lock+0x2b/0x50 21:42:04 ...: [ 362.520508] [<ffffffff8117404d>] __blkdev_put+0x3d/0x190 21:42:04 ...: [ 362.520514] [<ffffffff811741b0>] blkdev_put+0x10/0x20 21:42:04 ...: [ 362.520520] [<ffffffff811741f3>] blkdev_close+0x33/0x60 21:42:04 ...: [ 362.520527] [<ffffffff81145375>] __fput+0xf5/0x210 21:42:04 ...: [ 362.520534] [<ffffffff811454b5>] fput+0x25/0x30 21:42:04 ...: [ 362.520540] [<ffffffff811415ad>] filp_close+0x5d/0x90 21:42:04 ...: [ 362.520546] [<ffffffff81141697>] sys_close+0xb7/0x120 21:42:04 ...: [ 362.520553] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:42:04 ...: [ 362.520559] INFO: task md0_reshape:2287 blocked for more than 120 seconds. 21:42:04 ...: [ 362.520563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:42:04 ...: [ 362.520567] md0_reshape D ffff88003aee96f0 0 2287 2 0x00000000 21:42:04 ...: [ 362.520575] ffff88003cf05a70 0000000000000046 0000000000015bc0 0000000000015bc0 21:42:04 ...: [ 362.520582] ffff88003aee9aa8 ffff88003cf05fd8 0000000000015bc0 ffff88003aee96f0 21:42:04 ...: [ 362.520590] 0000000000015bc0 ffff88003cf05fd8 0000000000015bc0 ffff88003aee9aa8 21:42:04 ...: [ 362.520597] Call Trace: 21:42:04 ...: [ 362.520608] [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456] 21:42:04 ...: [ 362.520616] [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20 21:42:04 ...: [ 362.520626] [<ffffffffa0226f80>] reshape_request+0x4c0/0x9a0 [raid456] 21:42:04 ...: [ 362.520634] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:42:04 ...: [ 362.520644] [<ffffffffa022777a>] sync_request+0x31a/0x3a0 [raid456] 21:42:04 ...: [ 362.520651] [<ffffffff81052713>] ? __wake_up+0x53/0x70 21:42:04 ...: [ 362.520658] [<ffffffff814156b1>] md_do_sync+0x621/0xbb0 21:42:04 ...: [ 362.520668] [<ffffffff810387b9>] ? default_spin_lock_flags+0x9/0x10 21:42:04 ...: [ 362.520675] [<ffffffff8141640c>] md_thread+0x5c/0x130 21:42:04 ...: [ 362.520681] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:42:04 ...: [ 362.520688] [<ffffffff814163b0>] ? md_thread+0x0/0x130 21:42:04 ...: [ 362.520694] [<ffffffff81084416>] kthread+0x96/0xa0 21:42:04 ...: [ 362.520701] [<ffffffff810131ea>] child_rip+0xa/0x20 21:42:04 ...: [ 362.520707] [<ffffffff81084380>] ? kthread+0x0/0xa0 21:42:04 ...: [ 362.520713] [<ffffffff810131e0>] ? child_rip+0x0/0x20 21:42:04 ...: [ 362.520718] INFO: task mdadm:2288 blocked for more than 120 seconds. 21:42:04 ...: [ 362.520721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:42:04 ...: [ 362.520725] mdadm D 0000000000000000 0 2288 1 0x00000000 21:42:04 ...: [ 362.520733] ffff88002cca9c18 0000000000000086 0000000000015bc0 0000000000015bc0 21:42:04 ...: [ 362.520741] ffff88003aee83b8 ffff88002cca9fd8 0000000000015bc0 ffff88003aee8000 21:42:04 ...: [ 362.520748] 0000000000015bc0 ffff88002cca9fd8 0000000000015bc0 ffff88003aee83b8 21:42:04 ...: [ 362.520755] Call Trace: 21:42:04 ...: [ 362.520763] [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180 21:42:04 ...: [ 362.520771] [<ffffffff812a6d50>] ? exact_match+0x0/0x10 21:42:04 ...: [ 362.520777] [<ffffffff8154397b>] mutex_lock+0x2b/0x50 21:42:04 ...: [ 362.520783] [<ffffffff811742c8>] __blkdev_get+0x68/0x3d0 21:42:04 ...: [ 362.520790] [<ffffffff81174650>] ? blkdev_open+0x0/0xc0 21:42:04 ...: [ 362.520795] [<ffffffff81174640>] blkdev_get+0x10/0x20 21:42:04 ...: [ 362.520801] [<ffffffff811746c1>] blkdev_open+0x71/0xc0 21:42:04 ...: [ 362.520808] [<ffffffff811419f3>] __dentry_open+0x113/0x370 21:42:04 ...: [ 362.520815] [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30 21:42:04 ...: [ 362.520821] [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0 21:42:04 ...: [ 362.520828] [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70 21:42:04 ...: [ 362.520834] [<ffffffff8115207a>] do_filp_open+0x2da/0xba0 21:42:04 ...: [ 362.520841] [<ffffffff810ff0e1>] ? lru_cache_add_lru+0x21/0x40 21:42:04 ...: [ 362.520848] [<ffffffff8111109c>] ? do_anonymous_page+0x11c/0x330 21:42:04 ...: [ 362.520855] [<ffffffff81115d5f>] ? handle_mm_fault+0x31f/0x3c0 21:42:04 ...: [ 362.520862] [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150 21:42:04 ...: [ 362.520868] [<ffffffff81141769>] do_sys_open+0x69/0x170 21:42:04 ...: [ 362.520874] [<ffffffff811418b0>] sys_open+0x20/0x30 21:42:04 ...: [ 362.520882] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:44:04 ...: [ 482.520065] INFO: task mdadm:1937 blocked for more than 120 seconds. 21:44:04 ...: [ 482.520071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:44:04 ...: [ 482.520077] mdadm D 00000000ffffffff 0 1937 1 0x00000000 21:44:04 ...: [ 482.520087] ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0 21:44:04 ...: [ 482.520096] ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0 21:44:04 ...: [ 482.520104] 0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198 21:44:04 ...: [ 482.520112] Call Trace: 21:44:04 ...: [ 482.520139] [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456] 21:44:04 ...: [ 482.520154] [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20 21:44:04 ...: [ 482.520165] [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456] 21:44:04 ...: [ 482.520175] [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456] 21:44:04 ...: [ 482.520185] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:44:04 ...: [ 482.520194] [<ffffffff81414df0>] md_make_request+0xc0/0x130 21:44:04 ...: [ 482.520201] [<ffffffff81414df0>] ? md_make_request+0xc0/0x130 21:44:04 ...: [ 482.520212] [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0 21:44:04 ...: [ 482.520221] [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20 21:44:04 ...: [ 482.520229] [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60 21:44:04 ...: [ 482.520237] [<ffffffff8129fc80>] submit_bio+0x80/0x110 21:44:04 ...: [ 482.520244] [<ffffffff8116c849>] submit_bh+0xf9/0x140 21:44:04 ...: [ 482.520252] [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0 21:44:04 ...: [ 482.520258] [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70 21:44:04 ...: [ 482.520266] [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40 21:44:04 ...: [ 482.520273] [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160 21:44:04 ...: [ 482.520280] [<ffffffff81173d78>] blkdev_readpage+0x18/0x20 21:44:04 ...: [ 482.520286] [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0 21:44:04 ...: [ 482.520293] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:44:04 ...: [ 482.520299] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:44:04 ...: [ 482.520306] [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120 21:44:04 ...: [ 482.520313] [<ffffffff810f5909>] read_cache_page_async+0x19/0x20 21:44:04 ...: [ 482.520319] [<ffffffff810f591e>] read_cache_page+0xe/0x20 21:44:04 ...: [ 482.520327] [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0 21:44:04 ...: [ 482.520334] [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460 21:44:04 ...: [ 482.520341] [<ffffffff811a7938>] check_partition+0x138/0x190 21:44:04 ...: [ 482.520348] [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0 21:44:04 ...: [ 482.520355] [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0 21:44:04 ...: [ 482.520361] [<ffffffff81174650>] ? blkdev_open+0x0/0xc0 21:44:04 ...: [ 482.520367] [<ffffffff81174640>] blkdev_get+0x10/0x20 21:44:04 ...: [ 482.520373] [<ffffffff811746c1>] blkdev_open+0x71/0xc0 21:44:04 ...: [ 482.520380] [<ffffffff811419f3>] __dentry_open+0x113/0x370 21:44:04 ...: [ 482.520388] [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30 21:44:04 ...: [ 482.520396] [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0 21:44:04 ...: [ 482.520403] [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70 21:44:04 ...: [ 482.520410] [<ffffffff8115207a>] do_filp_open+0x2da/0xba0 21:44:04 ...: [ 482.520417] [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310 21:44:04 ...: [ 482.520426] [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150 21:44:04 ...: [ 482.520432] [<ffffffff81141769>] do_sys_open+0x69/0x170 21:44:04 ...: [ 482.520438] [<ffffffff811418b0>] sys_open+0x20/0x30 21:44:04 ...: [ 482.520447] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:44:04 ...: [ 482.520458] INFO: task mdadm:2283 blocked for more than 120 seconds. 21:44:04 ...: [ 482.520462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:44:04 ...: [ 482.520467] mdadm D 0000000000000000 0 2283 2212 0x00000000 21:44:04 ...: [ 482.520475] ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0 21:44:04 ...: [ 482.520483] ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0 21:44:04 ...: [ 482.520490] 0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78 21:44:04 ...: [ 482.520498] Call Trace: 21:44:04 ...: [ 482.520508] [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180 21:44:04 ...: [ 482.520515] [<ffffffff8154397b>] mutex_lock+0x2b/0x50 21:44:04 ...: [ 482.520521] [<ffffffff8117404d>] __blkdev_put+0x3d/0x190 21:44:04 ...: [ 482.520527] [<ffffffff811741b0>] blkdev_put+0x10/0x20 21:44:04 ...: [ 482.520533] [<ffffffff811741f3>] blkdev_close+0x33/0x60 21:44:04 ...: [ 482.520541] [<ffffffff81145375>] __fput+0xf5/0x210 21:44:04 ...: [ 482.520547] [<ffffffff811454b5>] fput+0x25/0x30 21:44:04 ...: [ 482.520554] [<ffffffff811415ad>] filp_close+0x5d/0x90 21:44:04 ...: [ 482.520560] [<ffffffff81141697>] sys_close+0xb7/0x120 21:44:04 ...: [ 482.520568] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:44:04 ...: [ 482.520574] INFO: task md0_reshape:2287 blocked for more than 120 seconds. 21:44:04 ...: [ 482.520578] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:44:04 ...: [ 482.520582] md0_reshape D ffff88003aee96f0 0 2287 2 0x00000000 21:44:04 ...: [ 482.520590] ffff88003cf05a70 0000000000000046 0000000000015bc0 0000000000015bc0 21:44:04 ...: [ 482.520597] ffff88003aee9aa8 ffff88003cf05fd8 0000000000015bc0 ffff88003aee96f0 21:44:04 ...: [ 482.520605] 0000000000015bc0 ffff88003cf05fd8 0000000000015bc0 ffff88003aee9aa8 21:44:04 ...: [ 482.520612] Call Trace: 21:44:04 ...: [ 482.520623] [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456] 21:44:04 ...: [ 482.520633] [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20 21:44:04 ...: [ 482.520643] [<ffffffffa0226f80>] reshape_request+0x4c0/0x9a0 [raid456] 21:44:04 ...: [ 482.520651] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:44:04 ...: [ 482.520661] [<ffffffffa022777a>] sync_request+0x31a/0x3a0 [raid456] 21:44:04 ...: [ 482.520668] [<ffffffff81052713>] ? __wake_up+0x53/0x70 21:44:04 ...: [ 482.520675] [<ffffffff814156b1>] md_do_sync+0x621/0xbb0 21:44:04 ...: [ 482.520685] [<ffffffff810387b9>] ? default_spin_lock_flags+0x9/0x10 21:44:04 ...: [ 482.520692] [<ffffffff8141640c>] md_thread+0x5c/0x130 21:44:04 ...: [ 482.520699] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:44:04 ...: [ 482.520705] [<ffffffff814163b0>] ? md_thread+0x0/0x130 21:44:04 ...: [ 482.520711] [<ffffffff81084416>] kthread+0x96/0xa0 21:44:04 ...: [ 482.520718] [<ffffffff810131ea>] child_rip+0xa/0x20 21:44:04 ...: [ 482.520725] [<ffffffff81084380>] ? kthread+0x0/0xa0 21:44:04 ...: [ 482.520730] [<ffffffff810131e0>] ? child_rip+0x0/0x20 21:44:04 ...: [ 482.520735] INFO: task mdadm:2288 blocked for more than 120 seconds. 21:44:04 ...: [ 482.520739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:44:04 ...: [ 482.520743] mdadm D 0000000000000000 0 2288 1 0x00000000 21:44:04 ...: [ 482.520751] ffff88002cca9c18 0000000000000086 0000000000015bc0 0000000000015bc0 21:44:04 ...: [ 482.520759] ffff88003aee83b8 ffff88002cca9fd8 0000000000015bc0 ffff88003aee8000 21:44:04 ...: [ 482.520767] 0000000000015bc0 ffff88002cca9fd8 0000000000015bc0 ffff88003aee83b8 21:44:04 ...: [ 482.520774] Call Trace: 21:44:04 ...: [ 482.520782] [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180 21:44:04 ...: [ 482.520790] [<ffffffff812a6d50>] ? exact_match+0x0/0x10 21:44:04 ...: [ 482.520797] [<ffffffff8154397b>] mutex_lock+0x2b/0x50 21:44:04 ...: [ 482.520804] [<ffffffff811742c8>] __blkdev_get+0x68/0x3d0 21:44:04 ...: [ 482.520810] [<ffffffff81174650>] ? blkdev_open+0x0/0xc0 21:44:04 ...: [ 482.520816] [<ffffffff81174640>] blkdev_get+0x10/0x20 21:44:04 ...: [ 482.520822] [<ffffffff811746c1>] blkdev_open+0x71/0xc0 21:44:04 ...: [ 482.520829] [<ffffffff811419f3>] __dentry_open+0x113/0x370 21:44:04 ...: [ 482.520837] [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30 21:44:04 ...: [ 482.520843] [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0 21:44:04 ...: [ 482.520850] [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70 21:44:04 ...: [ 482.520857] [<ffffffff8115207a>] do_filp_open+0x2da/0xba0 21:44:04 ...: [ 482.520864] [<ffffffff810ff0e1>] ? lru_cache_add_lru+0x21/0x40 21:44:04 ...: [ 482.520871] [<ffffffff8111109c>] ? do_anonymous_page+0x11c/0x330 21:44:04 ...: [ 482.520878] [<ffffffff81115d5f>] ? handle_mm_fault+0x31f/0x3c0 21:44:04 ...: [ 482.520885] [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150 21:44:04 ...: [ 482.520891] [<ffffffff81141769>] do_sys_open+0x69/0x170 21:44:04 ...: [ 482.520897] [<ffffffff811418b0>] sys_open+0x20/0x30 21:44:04 ...: [ 482.520905] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:46:04 ...: [ 602.520053] INFO: task mdadm:1937 blocked for more than 120 seconds. 21:46:04 ...: [ 602.520059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:46:04 ...: [ 602.520065] mdadm D 00000000ffffffff 0 1937 1 0x00000000 21:46:04 ...: [ 602.520075] ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0 21:46:04 ...: [ 602.520084] ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0 21:46:04 ...: [ 602.520091] 0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198 21:46:04 ...: [ 602.520099] Call Trace: 21:46:04 ...: [ 602.520127] [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456] 21:46:04 ...: [ 602.520142] [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20 21:46:04 ...: [ 602.520153] [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456] 21:46:04 ...: [ 602.520162] [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456] 21:46:04 ...: [ 602.520171] [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40 21:46:04 ...: [ 602.520180] [<ffffffff81414df0>] md_make_request+0xc0/0x130 21:46:04 ...: [ 602.520187] [<ffffffff81414df0>] ? md_make_request+0xc0/0x130 21:46:04 ...: [ 602.520197] [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0 21:46:04 ...: [ 602.520206] [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20 21:46:04 ...: [ 602.520215] [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60 21:46:04 ...: [ 602.520222] [<ffffffff8129fc80>] submit_bio+0x80/0x110 21:46:04 ...: [ 602.520229] [<ffffffff8116c849>] submit_bh+0xf9/0x140 21:46:04 ...: [ 602.520237] [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0 21:46:04 ...: [ 602.520244] [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70 21:46:04 ...: [ 602.520252] [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40 21:46:04 ...: [ 602.520259] [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160 21:46:04 ...: [ 602.520266] [<ffffffff81173d78>] blkdev_readpage+0x18/0x20 21:46:04 ...: [ 602.520273] [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0 21:46:04 ...: [ 602.520279] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:46:04 ...: [ 602.520285] [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20 21:46:04 ...: [ 602.520292] [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120 21:46:04 ...: [ 602.520300] [<ffffffff810f5909>] read_cache_page_async+0x19/0x20 21:46:04 ...: [ 602.520306] [<ffffffff810f591e>] read_cache_page+0xe/0x20 21:46:04 ...: [ 602.520314] [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0 21:46:04 ...: [ 602.520321] [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460 21:46:04 ...: [ 602.520328] [<ffffffff811a7938>] check_partition+0x138/0x190 21:46:04 ...: [ 602.520335] [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0 21:46:04 ...: [ 602.520342] [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0 21:46:04 ...: [ 602.520348] [<ffffffff81174650>] ? blkdev_open+0x0/0xc0 21:46:04 ...: [ 602.520354] [<ffffffff81174640>] blkdev_get+0x10/0x20 21:46:04 ...: [ 602.520359] [<ffffffff811746c1>] blkdev_open+0x71/0xc0 21:46:04 ...: [ 602.520367] [<ffffffff811419f3>] __dentry_open+0x113/0x370 21:46:04 ...: [ 602.520375] [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30 21:46:04 ...: [ 602.520383] [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0 21:46:04 ...: [ 602.520390] [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70 21:46:04 ...: [ 602.520397] [<ffffffff8115207a>] do_filp_open+0x2da/0xba0 21:46:04 ...: [ 602.520404] [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310 21:46:04 ...: [ 602.520413] [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150 21:46:04 ...: [ 602.520419] [<ffffffff81141769>] do_sys_open+0x69/0x170 21:46:04 ...: [ 602.520425] [<ffffffff811418b0>] sys_open+0x20/0x30 21:46:04 ...: [ 602.520434] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b 21:46:04 ...: [ 602.520443] INFO: task mdadm:2283 blocked for more than 120 seconds. 21:46:04 ...: [ 602.520447] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:46:04 ...: [ 602.520451] mdadm D 0000000000000000 0 2283 2212 0x00000000 21:46:04 ...: [ 602.520460] ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0 21:46:04 ...: [ 602.520468] ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0 21:46:04 ...: [ 602.520475] 0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78 21:46:04 ...: [ 602.520483] Call Trace: 21:46:04 ...: [ 602.520492] [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180 21:46:04 ...: [ 602.520500] [<ffffffff8154397b>] mutex_lock+0x2b/0x50 21:46:04 ...: [ 602.520506] [<ffffffff8117404d>] __blkdev_put+0x3d/0x190 21:46:04 ...: [ 602.520512] [<ffffffff811741b0>] blkdev_put+0x10/0x20 21:46:04 ...: [ 602.520518] [<ffffffff811741f3>] blkdev_close+0x33/0x60 21:46:04 ...: [ 602.520526] [<ffffffff81145375>] __fput+0xf5/0x210 21:46:04 ...: [ 602.520533] [<ffffffff811454b5>] fput+0x25/0x30 21:46:04 ...: [ 602.520539] [<ffffffff811415ad>] filp_close+0x5d/0x90 21:46:04 ...: [ 602.520545] [<ffffffff81141697>] sys_close+0xb7/0x120 21:46:04 ...: [ 602.520552] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b

Read the article

LSI 9285-8e and Supermicro SC837E26-RJBOD1 duplicate enclosure ID and slot numbers

- by Andy Shinn

I am working with 2 x Supermicro SC837E26-RJBOD1 chassis connected to a single LSI 9285-8e card in a Supermicro 1U host. There are 28 drives in each chassis for a total of 56 drives in 28 RAID1 mirrors. The problem I am running in to is that there are duplicate slots for the 2 chassis (the slots list twice and only go from 0 to 27). All the drives also show the same enclosure ID (ID 36). However, MegaCLI -encinfo lists the 2 enclosures correctly (ID 36 and ID 65). My question is, why would this happen? Is there an option I am missing to use 2 enclosures effectively? This is blocking me rebuilding a drive that failed in slot 11 since I can only specify enclosure and slot as parameters to replace a drive. When I do this, it picks the wrong slot 11 (device ID 46 instead of device ID 19). Adapter #1 is the LSI 9285-8e, adapter #0 (which I removed due to space limitations) is the onboard LSI. Adapter information: Adapter #1 ============================================================================== Versions ================ Product Name : LSI MegaRAID SAS 9285-8e Serial No : SV12704804 FW Package Build: 23.1.1-0004 Mfg. Data ================ Mfg. Date : 06/30/11 Rework Date : 00/00/00 Revision No : 00A Battery FRU : N/A Image Versions in Flash: ================ BIOS Version : 5.25.00_4.11.05.00_0x05040000 WebBIOS Version : 6.1-20-e_20-Rel Preboot CLI Version: 05.01-04:#%00001 FW Version : 3.140.15-1320 NVDATA Version : 2.1106.03-0051 Boot Block Version : 2.04.00.00-0001 BOOT Version : 06.253.57.219 Pending Images in Flash ================ None PCI Info ================ Vendor Id : 1000 Device Id : 005b SubVendorId : 1000 SubDeviceId : 9285 Host Interface : PCIE ChipRevision : B0 Number of Frontend Port: 0 Device Interface : PCIE Number of Backend Port: 8 Port : Address 0 5003048000ee8e7f 1 5003048000ee8a7f 2 0000000000000000 3 0000000000000000 4 0000000000000000 5 0000000000000000 6 0000000000000000 7 0000000000000000 HW Configuration ================ SAS Address : 500605b0038f9210 BBU : Present Alarm : Present NVRAM : Present Serial Debugger : Present Memory : Present Flash : Present Memory Size : 1024MB TPM : Absent On board Expander: Absent Upgrade Key : Absent Temperature sensor for ROC : Present Temperature sensor for controller : Absent ROC temperature : 70 degree Celcius Settings ================ Current Time : 18:24:36 3/13, 2012 Predictive Fail Poll Interval : 300sec Interrupt Throttle Active Count : 16 Interrupt Throttle Completion : 50us Rebuild Rate : 30% PR Rate : 30% BGI Rate : 30% Check Consistency Rate : 30% Reconstruction Rate : 30% Cache Flush Interval : 4s Max Drives to Spinup at One Time : 2 Delay Among Spinup Groups : 12s Physical Drive Coercion Mode : Disabled Cluster Mode : Disabled Alarm : Enabled Auto Rebuild : Enabled Battery Warning : Enabled Ecc Bucket Size : 15 Ecc Bucket Leak Rate : 1440 Minutes Restore HotSpare on Insertion : Disabled Expose Enclosure Devices : Enabled Maintain PD Fail History : Enabled Host Request Reordering : Enabled Auto Detect BackPlane Enabled : SGPIO/i2c SEP Load Balance Mode : Auto Use FDE Only : No Security Key Assigned : No Security Key Failed : No Security Key Not Backedup : No Default LD PowerSave Policy : Controller Defined Maximum number of direct attached drives to spin up in 1 min : 10 Any Offline VD Cache Preserved : No Allow Boot with Preserved Cache : No Disable Online Controller Reset : No PFK in NVRAM : No Use disk activity for locate : No Capabilities ================ RAID Level Supported : RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span Supported Drives : SAS, SATA Allowed Mixing: Mix in Enclosure Allowed Mix of SAS/SATA of HDD type in VD Allowed Status ================ ECC Bucket Count : 0 Limitations ================ Max Arms Per VD : 32 Max Spans Per VD : 8 Max Arrays : 128 Max Number of VDs : 64 Max Parallel Commands : 1008 Max SGE Count : 60 Max Data Transfer Size : 8192 sectors Max Strips PerIO : 42 Max LD per array : 16 Min Strip Size : 8 KB Max Strip Size : 1.0 MB Max Configurable CacheCade Size: 0 GB Current Size of CacheCade : 0 GB Current Size of FW Cache : 887 MB Device Present ================ Virtual Drives : 28 Degraded : 0 Offline : 0 Physical Devices : 59 Disks : 56 Critical Disks : 0 Failed Disks : 0 Supported Adapter Operations ================ Rebuild Rate : Yes CC Rate : Yes BGI Rate : Yes Reconstruct Rate : Yes Patrol Read Rate : Yes Alarm Control : Yes Cluster Support : No BBU : No Spanning : Yes Dedicated Hot Spare : Yes Revertible Hot Spares : Yes Foreign Config Import : Yes Self Diagnostic : Yes Allow Mixed Redundancy on Array : No Global Hot Spares : Yes Deny SCSI Passthrough : No Deny SMP Passthrough : No Deny STP Passthrough : No Support Security : No Snapshot Enabled : No Support the OCE without adding drives : Yes Support PFK : Yes Support PI : No Support Boot Time PFK Change : Yes Disable Online PFK Change : No PFK TrailTime Remaining : 0 days 0 hours Support Shield State : Yes Block SSD Write Disk Cache Change: Yes Supported VD Operations ================ Read Policy : Yes Write Policy : Yes IO Policy : Yes Access Policy : Yes Disk Cache Policy : Yes Reconstruction : Yes Deny Locate : No Deny CC : No Allow Ctrl Encryption: No Enable LDBBM : No Support Breakmirror : No Power Savings : Yes Supported PD Operations ================ Force Online : Yes Force Offline : Yes Force Rebuild : Yes Deny Force Failed : No Deny Force Good/Bad : No Deny Missing Replace : No Deny Clear : No Deny Locate : No Support Temperature : Yes Disable Copyback : No Enable JBOD : No Enable Copyback on SMART : No Enable Copyback to SSD on SMART Error : Yes Enable SSD Patrol Read : No PR Correct Unconfigured Areas : Yes Enable Spin Down of UnConfigured Drives : Yes Disable Spin Down of hot spares : No Spin Down time : 30 T10 Power State : Yes Error Counters ================ Memory Correctable Errors : 0 Memory Uncorrectable Errors : 0 Cluster Information ================ Cluster Permitted : No Cluster Active : No Default Settings ================ Phy Polarity : 0 Phy PolaritySplit : 0 Background Rate : 30 Strip Size : 64kB Flush Time : 4 seconds Write Policy : WB Read Policy : Adaptive Cache When BBU Bad : Disabled Cached IO : No SMART Mode : Mode 6 Alarm Disable : Yes Coercion Mode : None ZCR Config : Unknown Dirty LED Shows Drive Activity : No BIOS Continue on Error : No Spin Down Mode : None Allowed Device Type : SAS/SATA Mix Allow Mix in Enclosure : Yes Allow HDD SAS/SATA Mix in VD : Yes Allow SSD SAS/SATA Mix in VD : No Allow HDD/SSD Mix in VD : No Allow SATA in Cluster : No Max Chained Enclosures : 16 Disable Ctrl-R : Yes Enable Web BIOS : Yes Direct PD Mapping : No BIOS Enumerate VDs : Yes Restore Hot Spare on Insertion : No Expose Enclosure Devices : Yes Maintain PD Fail History : Yes Disable Puncturing : No Zero Based Enclosure Enumeration : No PreBoot CLI Enabled : Yes LED Show Drive Activity : Yes Cluster Disable : Yes SAS Disable : No Auto Detect BackPlane Enable : SGPIO/i2c SEP Use FDE Only : No Enable Led Header : No Delay during POST : 0 EnableCrashDump : No Disable Online Controller Reset : No EnableLDBBM : No Un-Certified Hard Disk Drives : Allow Treat Single span R1E as R10 : No Max LD per array : 16 Power Saving option : Don't Auto spin down Configured Drives Max power savings option is not allowed for LDs. Only T10 power conditions are to be used. Default spin down time in minutes: 30 Enable JBOD : No TTY Log In Flash : No Auto Enhanced Import : No BreakMirror RAID Support : No Disable Join Mirror : No Enable Shield State : Yes Time taken to detect CME : 60s Exit Code: 0x00 Enclosure information: # /opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a1 Number of enclosures on adapter 1 -- 3 Enclosure 0: Device ID : 36 Number of Slots : 28 Number of Power Supplies : 2 Number of Fans : 3 Number of Temperature Sensors : 1 Number of Alarms : 1 Number of SIM Modules : 0 Number of Physical Drives : 28 Status : Normal Position : 1 Connector Name : Port B Enclosure type : SES VendorId is LSI CORP and Product Id is SAS2X36 VendorID and Product ID didnt match FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : 65 Inquiry data : Vendor Identification : LSI CORP Product Identification : SAS2X36 Product Revision Level : 0718 Vendor Specific : x36-55.7.24.1 Number of Voltage Sensors :2 Voltage Sensor :0 Voltage Sensor Status :OK Voltage Value :5020 milli volts Voltage Sensor :1 Voltage Sensor Status :OK Voltage Value :11820 milli volts Number of Power Supplies : 2 Power Supply : 0 Power Supply Status : OK Power Supply : 1 Power Supply Status : OK Number of Fans : 3 Fan : 0 Fan Speed :Low Speed Fan Status : OK Fan : 1 Fan Speed :Low Speed Fan Status : OK Fan : 2 Fan Speed :Low Speed Fan Status : OK Number of Temperature Sensors : 1 Temp Sensor : 0 Temperature : 48 Temperature Sensor Status : OK Number of Chassis : 1 Chassis : 0 Chassis Status : OK Enclosure 1: Device ID : 65 Number of Slots : 28 Number of Power Supplies : 2 Number of Fans : 3 Number of Temperature Sensors : 1 Number of Alarms : 1 Number of SIM Modules : 0 Number of Physical Drives : 28 Status : Normal Position : 1 Connector Name : Port A Enclosure type : SES VendorId is LSI CORP and Product Id is SAS2X36 VendorID and Product ID didnt match FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : 36 Inquiry data : Vendor Identification : LSI CORP Product Identification : SAS2X36 Product Revision Level : 0718 Vendor Specific : x36-55.7.24.1 Number of Voltage Sensors :2 Voltage Sensor :0 Voltage Sensor Status :OK Voltage Value :5020 milli volts Voltage Sensor :1 Voltage Sensor Status :OK Voltage Value :11760 milli volts Number of Power Supplies : 2 Power Supply : 0 Power Supply Status : OK Power Supply : 1 Power Supply Status : OK Number of Fans : 3 Fan : 0 Fan Speed :Low Speed Fan Status : OK Fan : 1 Fan Speed :Low Speed Fan Status : OK Fan : 2 Fan Speed :Low Speed Fan Status : OK Number of Temperature Sensors : 1 Temp Sensor : 0 Temperature : 47 Temperature Sensor Status : OK Number of Chassis : 1 Chassis : 0 Chassis Status : OK Enclosure 2: Device ID : 252 Number of Slots : 8 Number of Power Supplies : 0 Number of Fans : 0 Number of Temperature Sensors : 0 Number of Alarms : 0 Number of SIM Modules : 1 Number of Physical Drives : 0 Status : Normal Position : 1 Connector Name : Unavailable Enclosure type : SGPIO Failed in first Inquiry commnad FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : Unavailable Inquiry data : Vendor Identification : LSI Product Identification : SGPIO Product Revision Level : N/A Vendor Specific : Exit Code: 0x00 Now, notice that each slot 11 device shows an enclosure ID of 36, I think this is where the discrepancy happens. One should be 36. But the other should be on enclosure 65. Drives in slot 11: Enclosure Device ID: 36 Slot Number: 11 Drive's postion: DiskGroup: 5, Span: 0, Arm: 1 Enclosure position: 0 Device Id: 48 WWN: Sequence Number: 11 Media Error Count: 0 Other Error Count: 0 Predictive Failure Count: 0 Last Predictive Failure Event Seq Number: 0 PD Type: SATA Raw Size: 2.728 TB [0x15d50a3b0 Sectors] Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors] Coerced Size: 2.728 TB [0x15d400000 Sectors] Firmware state: Online, Spun Up Is Commissioned Spare : YES Device Firmware Level: A5C0 Shield Counter: 0 Successful diagnostics completion on : N/A SAS Address(0): 0x5003048000ee8a53 Connected Port Number: 1(path0) Inquiry Data: MJ1311YNG6YYXAHitachi HDS5C3030ALA630 MEAOA5C0 FDE Enable: Disable Secured: Unsecured Locked: Unlocked Needs EKM Attention: No Foreign State: None Device Speed: 6.0Gb/s Link Speed: 6.0Gb/s Media Type: Hard Disk Device Drive Temperature :30C (86.00 F) PI Eligibility: No Drive is formatted for PI information: No PI: No PI Drive's write cache : Disabled Drive's NCQ setting : Enabled Port-0 : Port status: Active Port's Linkspeed: 6.0Gb/s Drive has flagged a S.M.A.R.T alert : No Enclosure Device ID: 36 Slot Number: 11 Drive's postion: DiskGroup: 19, Span: 0, Arm: 1 Enclosure position: 0 Device Id: 19 WWN: Sequence Number: 4 Media Error Count: 0 Other Error Count: 0 Predictive Failure Count: 0 Last Predictive Failure Event Seq Number: 0 PD Type: SATA Raw Size: 2.728 TB [0x15d50a3b0 Sectors] Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors] Coerced Size: 2.728 TB [0x15d400000 Sectors] Firmware state: Online, Spun Up Is Commissioned Spare : NO Device Firmware Level: A580 Shield Counter: 0 Successful diagnostics completion on : N/A SAS Address(0): 0x5003048000ee8e53 Connected Port Number: 0(path0) Inquiry Data: MJ1313YNG1VA5CHitachi HDS5C3030ALA630 MEAOA580 FDE Enable: Disable Secured: Unsecured Locked: Unlocked Needs EKM Attention: No Foreign State: None Device Speed: 6.0Gb/s Link Speed: 6.0Gb/s Media Type: Hard Disk Device Drive Temperature :30C (86.00 F) PI Eligibility: No Drive is formatted for PI information: No PI: No PI Drive's write cache : Disabled Drive's NCQ setting : Enabled Port-0 : Port status: Active Port's Linkspeed: 6.0Gb/s Drive has flagged a S.M.A.R.T alert : No Update 06/28/12: I finally have some new information about (what we think) the root cause of this problem so I thought I would share. After getting in contact with a very knowledgeable Supermicro tech, they provided us with a tool called Xflash (doesn't appear to be readily available on their FTP). When we gathered some information using this utility, my colleague found something very strange: root@mogile2 test]# ./xflash.dat -i get avail Initializing Interface. Expander: SAS2X36 (SAS2x36) 1) SAS2X36 (SAS2x36) (50030480:00EE917F) (0.0.0.0) 2) SAS2X36 (SAS2x36) (50030480:00E9D67F) (0.0.0.0) 3) SAS2X36 (SAS2x36) (50030480:0112D97F) (0.0.0.0) This lists the connected enclosures. You see the 3 connected (we have since added a 3rd and a 4th which is not yet showing up) with their respective SAS address / WWN (50030480:00EE917F). Now we can use this address to get information on the individual enclosures: [root@mogile2 test]# ./xflash.dat -i 5003048000EE917F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:00EE917F Enclosure Logical Id: 50030480:0000007F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 [root@mogile2 test]# ./xflash.dat -i 5003048000E9D67F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:00E9D67F Enclosure Logical Id: 50030480:0000007F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 [root@mogile2 test]# ./xflash.dat -i 500304800112D97F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:0112D97F Enclosure Logical Id: 50030480:0112D97F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 Did you catch it? The first 2 enclosures logical ID is partially masked out where the 3rd one (which has a correct unique enclosure ID) is not. We pointed this out to Supermicro and were able to confirm that this address is supposed to be set during manufacturing and there was a problem with a certain batch of these enclosures where the logical ID was not set. We believe that the RAID controller is determining the ID based on the logical ID and since our first 2 enclosures have the same logical ID, they get the same enclosure ID. We also confirmed that 0000007F is the default which comes from LSI as an ID. The next pointer that helps confirm this could be a manufacturing problem with a run of JBODs is the fact that all 6 of the enclosures that have this problem begin with 00E. I believe that between 00E8 and 00EE Supermicro forgot to program the logical IDs correctly and neglected to recall or fix the problem post production. Fortunately for us, there is a tool to manage the WWN and logical ID of the devices from Supermicro: ftp://ftp.supermicro.com/utility/ExpanderXtools_Lite/. Our next step is to schedule a shutdown of these JBODs (after data migration) and reprogram the logical ID and see if it solves the problem. Update 06/28/12 #2: I just discovered this FAQ at Supermicro while Google searching for "lsi 0000007f": http://www.supermicro.com/support/faqs/faq.cfm?faq=11805. I still don't understand why, in the last several times we contacted Supermicro, they would have never directed us to this article :\

Search Results

Search found 112 results on 5 pages for 'raid0'.

Page 5/5 | < Previous Page | 1 2 3 4 5

- by goober

- by Forkrul Assail

- by Guard

- by BlakBat

- by rkotulla

- by devicerandom

- by Shingetsu

- by Brigadieren

- by dimmer

- by dhanke

- by BarsMonster

- by Andy Shinn

< Previous Page | 1 2 3 4 5