raid5 - Page 8 - Developer IT

Bad Blocks Exist in Virtual Device PERC H700 Integrated

- by neoX

I have a Dell server with a PERC H700 Integrated controller. I've built a RAID5 from 12 hard drives and the virtual device is in Optimal state, but I receive errors like these under Linux:
sd 0:2:0:0: [sda] Unhandled error code
sd 0:2:0:0: [sda] Result: hostbyte=0x07 driverbyte=0x00
sd 0:2:0:0: [sda] CDB: cdb[0]=0x88: 88 00 00 00 00 07 22 50 bd 98 00 00 00 08 00 00
end_request: I/O error, dev sda, sector 30640487832
sd 0:2:0:0: [sda] Unhandled error code
sd 0:2:0:0: [sda] Result: hostbyte=0x07 driverbyte=0x00
sd 0:2:0:0: [sda] CDB: cdb[0]=0x88: 88 00 00 00 00 07 22 50 bd 98 00 00 00 08 00 00
end_request: I/O error, dev sda, sector 30640487832
sd 0:2:0:0: [sda] Unhandled error code
sd 0:2:0:0: [sda] Result: hostbyte=0x07 driverbyte=0x00
sd 0:2:0:0: [sda] CDB: cdb[0]=0x88: 88 00 00 00 00 07 22 50 bc e0 00 00 01 00 00 00
end_request: I/O error, dev sda, sector 30640487648
But all disks are in Firmware state: Online, Spun Up. There is also not a single ATA read or write error on any disk in the RAID (I checked them with smartctl -a -d sat+megaraid,N -H /dev/sda). The only strange thing is in the megacli output:
megacli -LDInfo -L0 -a0
...
Bad Blocks Exist: Yes
How can there be bad blocks in a Virtual Drive that is in Optimal state, when no disk is broken or shows even a single error? I tried a "Consistency Check", but it finished successfully and the errors still appear in dmesg. Could someone help me figure out what is wrong with my RAID?
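
The CDB bytes in the log above can be checked against the reported sector numbers. The following is a minimal sketch, assuming the opcode 0x88 is a standard SCSI READ(16) command (8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13); the function name decode_read16 is made up for illustration.

```python
def decode_read16(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, transfer_length) from a READ(16) CDB written as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # logical block address
    length = int.from_bytes(cdb[10:14], "big")  # number of blocks requested
    return lba, length


# The two distinct CDBs quoted in the post
first = decode_read16("88 00 00 00 00 07 22 50 bd 98 00 00 00 08 00 00")
second = decode_read16("88 00 00 00 00 07 22 50 bc e0 00 00 01 00 00 00")
print(first)   # (30640487832, 8)
print(second)  # (30640487648, 256)
```

The decoded LBAs match the sectors in the end_request lines, so the failing reads all fall within the same small region of the virtual drive.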

NVidia ION and /dev/mapper/nvidia_... issues.

- by Ritsaert Hornstra

I have an NVidia ION board with 4 SATA ports and want to use it to run a Linux server (CentOS 5.4). I first hooked up 3 HDs (that will be a RAID5 array) and a fourth, small boot HD. I started out using the onboard RAID capability, but that does not work correctly under Linux: the RAID capability is not a real RAID but uses LVM to define some arrays. After setting the BIOS back to normal SATA mode and wiping the HDs, the first boot hard disk (/dev/sda) is seen as /dev/sda BEFORE mounting and as /dev/mapper/nvidia_ after mounting. CentOS is unable to install on it (and grub cannot be installed on it either). So somehow the hard disk is still seen as if it belongs to some LVM volume. I tried to clean out the HD by issuing a few dd if=/dev/zero of=/dev/sda commands to wipe the starting and final cylinders, but to no avail. Has anyone seen this problem and found a solution?
UPDATE: When I create only a single ext3 partition on the first HD (/dev/mapper/nvidia_...), no LVM partitions are seen and I can boot from /dev/mapper/nvidia_.... Now the next step is to see how I can get rid of this folly.

Ubuntu raid 1 write errors

- by Micah

I have an Ubuntu server set up with two SATA drives in a RAID 1 configuration with mdadm. The machine is used to record raw video, which involves a lot of writing to the disk. Sometimes during video recording the computer will crash, with the following errors in kern.log:
Mar 15 10:39:41 video kernel: [414501.629864] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6
Mar 15 10:39:41 video kernel: [414501.629870] ata2.00: BMDMA stat 0x26
Mar 15 10:39:41 video kernel: [414501.629875] ata2.00: SError: { UnrecovData Handshk }
Mar 15 10:39:41 video kernel: [414501.629880] ata2.00: failed command: WRITE DMA EXT
Mar 15 10:39:41 video kernel: [414501.629889] ata2.00: cmd 35/00:00:28:6d:f6/00:04:06:00:00/e0 tag 0 dma 524288 out
Mar 15 10:39:41 video kernel: [414501.629891] res 51/84:b1:77:6e:f6/84:02:06:00:00/e0 Emask 0x30 (host bus error)
Mar 15 10:39:41 video kernel: [414501.629896] ata2.00: status: { DRDY ERR }
Mar 15 10:39:41 video kernel: [414501.629899] ata2.00: error: { ICRC ABRT }
Mar 15 10:39:41 video kernel: [414501.629910] ata2.00: hard resetting link
Mar 15 10:39:41 video kernel: [414501.973009] ata2.01: hard resetting link
Mar 15 10:39:41 video kernel: [414502.482642] ata2.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 15 10:39:41 video kernel: [414502.482658] ata2.01: SATA link down (SStatus 0 SControl 300)
Mar 15 10:39:41 video kernel: [414502.546160] ata2.00: configured for UDMA/133
Mar 15 10:39:41 video kernel: [414502.546203] ata2: EH complete
Is this the result of faulty drives? Is software RAID just not performant enough for data rates of ~15 MB/s, even with a quad-core i7? Thanks for your help.
Edit: cat /proc/mdstat returns this:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[1] sda1[0]
      976760768 blocks [2/2] [UU]
unused devices: <none>

What benchmark tool to use to benchmark hardware for VM server?

- by Mark0978

We are setting up a new piece of hardware to virtualize several of our servers on. The choices are RAID 5, RAID 6, and RAID 0+1. We want to benchmark all three before we go live with the machine, but I'm not sure how to test the speed. Since we will be using it to host VMs, what will the actual disk traffic look like? What can I use to see if RAID 6 is too slow? Short of setting up the system with all the VMs on it, running it that way, and then redoing all the work, I'm not sure how to test it. It then becomes more of a subjective test than an objective one. I'm worried that RAID6 will have too much overhead and that RAID5 will be too fragile with 3TB drives, and I've never worked with 0+1 at all. So in short, I'd like to set up the base machine (which will be running Linux) and then test the underlying SW RAID for speed. What kind of tool exists to simulate this kind of load? Failing a specific tool, how about a generic FS testing tool that can simulate different loads?

How do I diagnose a bottleneck in an Intel Atom based Ubuntu server?

- by Jon Cage

I have a small media server at home which has software RAID and a gigabit link to the rest of my network. For some reason, though, I only get ~10MB/s transfers when copying to/from the server. I use software RAID5 (mdadm) over four 1TB disks. On top of that I use LVM to give me a huge pool of disk space, which is then split into multiple partitions that can be resized as and when needed. I'm guessing this is the most likely cause, but I'd like to know for sure where the root cause is. So, how can I benchmark network throughput (Windows 7 desktop <- Ubuntu server) and hard disk performance to try to identify where my bottleneck might be?
[Edit] If anyone's interested, the motherboard is an Intel Desktop Board D945GCLF2. So that's a 300 series Atom processor with the Intel® 945GC Express Chipset.
[Edit2] I feel like such a fool! I just checked my desktop and I had the slower of the two onboard NICs plugged in, so the server is probably not at fault here. Transferring a copy of Ubuntu off the server I now get ~35-40MB/s according to Windows 7. I'll do those HD tests when I get a chance though (just for completeness).
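
For a rough sanity check of figures like these, link speed can be converted to a best-case transfer rate. This is only a sketch: the post does not say how fast the slower NIC is, so the 100 Mbit/s value below is an assumption, and real transfers land below these ceilings because of protocol overhead.

```python
def line_rate_mb_per_s(link_mbit: float) -> float:
    """Best-case payload rate in MB/s for a link speed given in Mbit/s."""
    return link_mbit / 8  # 8 bits per byte, ignoring protocol overhead


# Assumed link speeds: 100 Mbit/s is a guess for the "slower" NIC in the post.
for label, mbit in [("100 Mbit/s", 100), ("1 Gbit/s", 1000)]:
    ceiling = line_rate_mb_per_s(mbit)
    print(f"{label}: at most {ceiling:.1f} MB/s")
```

Under that assumption, ~10MB/s sits right at the ceiling of a 100 Mbit/s link, while 35-40MB/s on gigabit leaves the disks or CPU as the more likely limit.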

32bit SQLServer with AWE NOT enabled. Buffer Cache Hit Ratio High, Disk Read Queue VERY HIGH, WHY?

- by chenwq

We have "SQLServer 2005 SP3 32bit Enterprise Edition" running on 32-bit Windows 2003 Enterprise Edition with 12GB RAM, with AWE enabled, using RAID5 (5 physical disks). We set AWE to enabled and restarted SQL Server this afternoon after work, hoping the performance will be better than before. But there is something that confuses us. On working days, SQL Server performs very badly. While looking for the reason, we checked the Windows Performance counters:
Avg. Disk Read Queue Length > 140
Avg. Disk Write Queue Length < 1
SQL Server Buffer Cache Hit Ratio > 96%
%Processor Time < 30%
SQL Server Total Server Memory < 1.8G
Obviously, without AWE enabled, SQL Server can use less than 2G of memory. My questions are: why is "SQL Server Total Server Memory" less than 2G? I thought SQL Server would use all of its 2G process address space. Does this counter leave anything out? We know that SQL Server is suffering from a lack of memory, so why is "Buffer Cache Hit Ratio" as high as 96%? Any advice is welcome!

DIR $file "File Not Found" vs DIR $filedir shows it....not permissions, not USB

- by Kev

I was having this problem before on a USB drive, but now it's happening on my main RAID5-backed hard disk:
2013-10-17 9:37 C:\>dir "C:\Shares\Shared\Reference\Safety Management System\Video CD\AutoPlay\Docs\Manuel*"
 Volume in drive C has no label.
 Volume Serial Number is 3C18-E114
 Directory of C:\Shares\Shared\Reference\Safety Management System\Video CD\AutoPlay\Docs
2003-09-09 11:29 PM 1,056,768 Manuel d'intervention d'urgence MFC.doc
2004-06-20 10:36 PM 139,849 Manuel d'intervention d'urgence MFC.pdf
 2 File(s) 1,196,617 bytes
 0 Dir(s) 196,068,691,968 bytes free
2013-10-17 9:38 C:\>dir "C:\Shares\Shared\Reference\Safety Management System\Video CD\AutoPlay\Docs\Manuel d'intervention d'urgence MFC.doc"
 Volume in drive C has no label.
 Volume Serial Number is 3C18-E114
 Directory of C:\Shares\Shared\Reference\Safety Management System\Video CD\AutoPlay\Docs
File Not Found
2013-10-17 9:38 C:\>
This is from a Command Prompt window where I went to Properties and told it I wanted to modify who it ran as. I opened it, had it run as me with "restricted access" unchecked, then ran the above. The file in question has the following ACLs: Administrators, SYSTEM, and OurCompanyUsers. All three have full control of everything. Nobody has any Deny bits set. I am a member of Administrators. So I don't believe it's a permissions issue. It's not a USB drive, so this time there is no question of USB hardware. Windows Server 2003 Standard Edition SP2. What does this mean? Is this more likely a hardware or software problem?

RAID Volume is no longer showing in Raid Controller BIOS and in Windows

- by Gordon

Hi all, I installed some critical Windows Updates yesterday and now my external RAID volume no longer shows up in Windows Vista x64. All updates went through successfully. From their descriptions I cannot see how they would relate to the issue, but this is the only change that happened, so who knows. Anyway, here are the details: I have an external eSATA enclosure that is running on a SiI4726 controller. I can connect to the controller with its management utility from the computer the enclosure is connected to. The three drives in the enclosure show up as JBODs. I had those drives configured as one logical RAID5 drive. RAID management is done through a SiI3132 SoftRaid controller. The RAID management utility just shows empty channels where it usually shows the RAID group. In the Windows Disk Manager I can see an unknown, uninitialized device. This is fine according to the setup manual. What it doesn't show is my RAID drive. It's gone. Also, when booting Windows, the BIOS of the controller used to show the RAID volume before booting the OS. This no longer happens. Updating drivers and firmware did not help. I have made sure the drivers and firmware are compatible with each other. And like I said, it used to work before. Any clues?

NFS high CPU usage

- by user269836

Hello, I have a very strange issue. I have the following server:
CPU: Intel(R) Xeon(TM) MP CPU 3.16GHz
cat /proc/cpuinfo | grep proce | wc -l
8
free -m
total used free shared buffers cached
Mem: 28203 27606 596 0 10789 9714
-/+ buffers/cache: 7103 21100
Swap: 24695 0 24695
RAID card:
*-storage
description: RAID bus controller
product: MegaRAID
vendor: LSI Logic / Symbios Logic
physical id: 7
bus info: pci@0000:13:07.0
logical name: scsi2
version: 01
width: 32 bits
clock: 66MHz
capabilities: storage pm bus_master cap_list rom
configuration: driver=megaraid latency=32
resources: irq:134 memory:d8ff0000-d8ffffff(prefetchable) memory:df600000-df60ffff(prefetchable)
HDD: 10x148Gb SCSI U320 15k - RAID5
/dev/sdb1 807G 674G 93G 88% /storage
/dev/sdb1 /storage ext4 defaults,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,noatime,nodiratime,noacl,errors=remount-ro 0 1
Network cards:
ethtool -i eth0
driver: tg3
version: 3.116
firmware-version: 5704-v3.36, ASFIPMIc v2.36
bus-info: 0000:10:02.0
ethtool -i eth1
driver: tg3
version: 3.116
firmware-version: 5704-v3.36, ASFIPMIc v2.36
bus-info: 0000:10:02.0
ifconfig
bond0 Link encap:Ethernet HWaddr 00:0f:1f:ff:d6:4d
inet addr:192.168.15.71 Bcast:192.168.15.255 Mask:255.255.255.0
inet6 addr: fe80::20f:1fff:feff:d64d/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:1062818202 errors:0 dropped:3918 overruns:0 frame:0
TX packets:1041317321 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:10000
RX bytes:258867684559 (241.0 GiB) TX bytes:396569192650 (369.3 GiB)
This server runs only nfs-kernel-server.
uname -a
Linux nas2-backup 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64 GNU/Linux
Debian 6.
The problem: once every day or two the load average (LA) goes up, reaching around 40, but if I do: nfs-kernel-server restart, everything is OK. Then the next day, or a little later, the LA goes up again. The servers are connected to a d-link dgs 1016d with 24 GBits ports.
I have tried everything to find out what the problem is and why it's happening, but I still cannot resolve this issue. Any ideas on what is happening here?

Slow File Copy observed copying 40GB files across network to iSCSI device

- by Rick

Here's a curious one for the gurus.
Setup:
Source Machine: Windows Server 2003 R2 machine with local hard drive. VHD file of 40GB. 1 x 1Gbps network card, Cat6 cable, switch.
Target Machine: Windows Server 2008 R2 machine with iSCSI connection to an iSCSI target on a separate machine (1TB, RAID5). 1 x 1Gbps network card, Cat6 cable, connected to the same switch as the Source Machine. Second 1Gbps network card, Cat6 cable, connected via an isolated switch to the iSCSI target.
Switches are Netgear JGS524 model (web managed).
If I copy from the Win2003R2 machine to the Win2008R2 machine's local drive, I get 40GB in 45 minutes, 36 seconds.
If I copy from the Win2008R2 machine to the iSCSI target (local drive to iSCSI target), I get 40GB in 37 minutes, 56 seconds.
If I copy from the Win2003R2 machine to the iSCSI target via the Win2008R2 machine, I get 40GB in 3 hours, 50 minutes, 24 seconds.
All copies were done via the following command issued on the Win2008R2 box:
XCOPY <source> <target> /J
XCOPY /J - Copies using unbuffered I/O. Recommended for very large files.
So, what's the bit I'm missing here? Why does a back-to-back copy take 1 hour, 23 minutes, 32 seconds in total when a "straight through" copy takes almost 3 times as long? Switches show no errors, and the network hovers around the 3% utilisation mark for the duration of the copy (whereas the "back-to-back" copies are around the 25% utilisation mark). What have I missed?
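
The timings quoted above can be turned into average throughput to make the gap easier to see. This is a small sketch, assuming 40GB means 40,000 MB (decimal units); the helper name seconds and the dictionary labels are made up for illustration.

```python
def seconds(h: int = 0, m: int = 0, s: int = 0) -> int:
    """Convert hours/minutes/seconds to seconds."""
    return h * 3600 + m * 60 + s


SIZE_MB = 40 * 1000  # assumption: 40GB in decimal megabytes

copies = {
    "Win2003R2 -> Win2008R2 local": seconds(m=45, s=36),
    "Win2008R2 local -> iSCSI": seconds(m=37, s=56),
    "Win2003R2 -> iSCSI (straight through)": seconds(h=3, m=50, s=24),
}

for label, secs in copies.items():
    print(f"{label}: {SIZE_MB / secs:.1f} MB/s average")

back_to_back = (copies["Win2003R2 -> Win2008R2 local"]
                + copies["Win2008R2 local -> iSCSI"])
straight = copies["Win2003R2 -> iSCSI (straight through)"]
ratio = straight / back_to_back
print(f"back-to-back total: {back_to_back} s, straight through: {straight} s")
print(f"straight through takes {ratio:.2f}x as long")
```

The two-step total works out to 5012 seconds (1 hour, 23 minutes, 32 seconds), which agrees with the figure in the post.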

Why are SMART error rates going down?

- by Jeff Shattock

I have a hard drive that's part of a Linux software raid5 array. SMART has reported that its multi_zone_error_rate was 0, then 1, then 3. So I figured I had better start backing up more frequently and prepare to replace the drive. Now, today, the multi_zone_error_rate of that very same drive is back down to 1. It seems that 2 errors unhappened while I wasn't looking. I've also seen similar behaviour by inspecting the syslog on the server:
Jun 7 21:01:17 FS1 smartd[25593]: Device: /dev/sdc, SMART Usage Attribute: 7 Seek_Error_Rate changed from 200 to 100
Jun 7 21:01:17 FS1 smartd[25593]: Device: /dev/sde, SMART Usage Attribute: 7 Seek_Error_Rate changed from 200 to 100
Jun 7 21:01:18 FS1 smartd[25593]: Device: /dev/sdg, SMART Usage Attribute: 7 Seek_Error_Rate changed from 200 to 100
Jun 8 02:31:18 FS1 smartd[25593]: Device: /dev/sdg, SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
Jun 8 03:01:17 FS1 smartd[25593]: Device: /dev/sdc, SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
Jun 8 03:01:17 FS1 smartd[25593]: Device: /dev/sde, SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200
These are raw values, not the human-useful values that smartctl -a produces, but the behaviour is similar: error rates changing, then undoing the change. None of these are the drive that had the multi_zone weirdness. I haven't seen any problems from the RAID; its most recent scrub (< 24 hours ago) came back totally clean. The only thing I can think of is that the SMART reporting circuitry on the drive isn't working properly all the time. The cables are in tight on the drive and board. What's going on here?

Software mirroring (RAID1) versus "Fake Raid" for new Windows 7 install

- by kquinn

I've just ordered two new hard drives for my main desktop and a copy of Windows 7 Professional 64-bit. I'd like to do a clean install of Win7 onto the new drives (leaving my old XP Pro boot partition around for a while in case something goes disastrously wrong, etc.). I want to have them set up in mirrored (RAID-1) mode. My understanding is that Win7 Pro can do software mirroring, but can I set this up directly at install time? If so, how? Note that I'd like the disk to be split into three partitions (OS/Apps&Data/Bulk data), all of which should be mirrored. Would it be better (more reliable or faster) to use my motherboard's hardware RAID support? My motherboard is an older nVidia nForce 680i SLI, which is not the most stable of motherboards, and I'm not sure how trustworthy its RAID1 configuration might be (or if Win7 could even detect and install onto a hardware-mirrored volume). Also, the performance characteristics of RAID1 are rather different than RAID0 or RAID5, and I'm wondering if Win7's software mirroring might actually be faster than hardware RAID1 (for example, I'm more of a Unix admin when I have to wear the sysadmin hat, and I've had great success deploying ZFS; most hardware RAID1 implementations have to read both disks and compare results to look for data errors, but ZFS can read from only one disk in the mirror and just use the built-in checksum, meaning it can have up to 2x the number of reads in-flight, as long as there's no data corruption). Edit: Okay, my question about whether Windows 7 can do software mirroring has been answered, and it can. I'm still unsure whether Windows software RAID or my motherboard's hardware "fake RAID" function is a better choice, though. Remember, I'm only interested in mirroring -- not the more complicated striping or parity operations that generally show the poor performance of crappy motherboard RAID solutions.

ESX 4.0 space: DASD, NAS, or ?

- by thormj

I put together an ESX box for better management, but its performance is a WTF item; I'm a noob at dealing with ESX, so I'm looking for a laundry list of reading material to help me straighten this out so I can go back to .NET programming.
Current storage system: We're running RAID5 + hot spare (8x500 GB spindles) on a PERC6i on a Dell 2910. Due to ESX limitations, the PERC is showing the storage as 1x2TB + 1x800GB "partitions." I'm not sure of the setup's configuration (stride / stripe / ???) at all.
Our applications: We have an SBS server as well as a minor (2x50 GB, but growing at 10GB/month) database server... Our application that lives on the database VM is CPU and I/O intensive; it's a database churning exercise mixed in with a lot of computation on the data (fixing that performance is what I'm supposed to be working on)...
Performance issue: When I do a backup, restore, or worse (copy a backup from one VM to another to move it to the QA VM), the entire system slows to a crawl (even "unrelated" VMs). I originally thought a DASD setup would be quite good since you have PCI-X bandwidth, but the systemwide slowdown is killing productivity.
Questions:
What should I do to make an intelligent decision about NAS vs RAID vs SAN vs DASD?
Are there sweet spots/ugly spots in the storage setup?
Can you use an SSD PCI-X card in ESX for the tempdb? Good/bad idea?
Is there any way to "share" some image in a copy-on-write fashion? Most of the "Backup-Copy-Restore" is to "put a clean image on the dev boxes"; if I could have them "share" the master image, the "big copy" (2x50 GB) would only need to be done once per week instead of once per dev per week... [runtime performance isn't a concern with the dev boxes, but the backup/copy/restore kills production, SBS, and everything else on the box]

Looking to replace Ghost with FSArchiver or Clonezilla, few questions about capabilities

- by Daniel Wright

I work for a PC repair company and we are looking into setting up a dedicated machine with externally accessible SATA bays to clone hard drives as a safety net in case something goes wrong during a repair. We currently use a SATA/PATA to USB bridge called MagicBridge and Norton Ghost on any workstation, but we're looking to move away from Ghost. We have a computer with a large RAID5 array with Windows Server 2008 Standard currently installed, but this can be replaced with a flavour of *nix. I have some experience with Clonezilla, but FSArchiver also seems like a suitable replacement. My Head Technician wants to know if my chosen solution (probably Clonezilla or FSArchiver, but I'm open to free suggestions) is capable of:
Cloning a degraded RAID, such as a single drive from a RAID1 mirror, without complaining
Producing images that are easily mountable (he'd prefer them to be mountable in Windows, but if there is no other easy way, *nix should be fine), akin to Ghost Explorer, so individual files can be restored as well as being able to do bare metal restores.
My apologies for the wordiness, but I wanted to be thorough in my explanation. Thanks for any suggestions or tips :)
EDIT: I've just found out that Clonezilla has a workaround for cloning RAID1 drives.
EDIT2: Found the answer to both of my questions; apparently I wasn't phrasing my searches right. Could this question be deleted please?

MSSQL error: consistency-based I/O error - can it be caused by an MSSQL or OS problem?

- by Philipp Keller

This is what I saw in the windows error log: SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x19fedd20; actual: 0x19fed5e3). It occurred during a read of page (1:1764) in database ID 6 at offset 0x00000000dc8000 in file 'D:\mssql\local_repository_pbdiffimport.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online. I ran dbcc checkdb which told me I should restore with option REPAIR_ALLOW_DATA_LOSS, so I eventually ran DBCC CHECKDB (my_db_name, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS But that resulted in about 2'000 rows being lost. I restored a backup but now I'm afraid this will happen again since we already had a consistency problem in the same database about 2 weeks ago but then it happened in an index (recreated indexes solved the problem). We have investigated the discs - RAID5 looks good, no errors, and also none of the disc-check-utilities have revealed any hardware problem. Can this be caused by OS (Windows Server 2003) or by MSSQL (MSSQL Server 2005)?

How to get an inactive RAID device working again?

- by Jonik

After booting, my RAID1 device (/dev/md_d0 *) sometimes goes into some funny state and I cannot mount it.
* Originally I created /dev/md0 but it has somehow changed itself into /dev/md_d0.
# mount /opt
mount: wrong fs type, bad option, bad superblock on /dev/md_d0,
missing codepage or helper program, or other error
(could this be the IDE device where you in fact use
ide-scsi so that sr0 or sda or so is needed?)
In some cases useful info is found in syslog - try
dmesg | tail or so
The RAID device appears to be inactive somehow:
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md_d0 : inactive sda4[0](S)
      241095104 blocks
# mdadm --detail /dev/md_d0
mdadm: md device /dev/md_d0 does not appear to be active.
The question is, how do I make the device active again (using mdadm, I presume)?
(Other times it's alright (active) after boot, and I can mount it manually without problems. But it still won't mount automatically even though I have it in /etc/fstab:
/dev/md_d0 /opt ext4 defaults 0 0
So a bonus question: what should I do to make the RAID device automatically mount at /opt at boot time?)
This is an Ubuntu 9.10 workstation. Background info about my RAID setup in this question.

My client's solution of a Windows SBS 2011 VM on an Ubuntu host and VirtualBox is pinning the host CPU

- by Scott Stamp

Here's my situation: I've got a client hosting two servers (one a VM), with the host providing VMware Zimbra and the other running Windows Small Business Server 2011. Unfortunately, the person before me had configured this setup as follows.
Host:
Ubuntu Desktop Edition 10.04 (I know, again, not my choice) running VMware Zimbra
8GB of RAM
On-board RAID1 of two 320GB Seagate Barracuda drives for the OS
Software RAID5 of four 500GB WD Caviar Black drives on MDADM for bulk storage (sorry, I don't know the model #)
A relatively competent quad-core Intel Core i7 CPU from the Nehalem architecture (not suspicious of this as the bottleneck)
Guest:
Windows Small Business Server 2011
4GB of RAM
Host-equivalent CPU allocation
VDI file for OS hosted on the on-board RAID, VDI file for storage hosted on the on-board RAID
For some reason when running, the VM locks up when sitting nearly idle, and the VirtualBox process reports values of 240%+ in top (how is that even possible?!). Anyone have any ideas or suggestions? I'm totally stumped on this one. Happy to provide whatever logs you'd like to take a look at. Ideally I'd drop VirtualBox and provision this with VMware Workstation, but the client has objected to the (very nominal) costs involved. If hardware needs to be purchased to help, it will be, but we're considering upgrades a last resort at this time. Thanks in advance! *fingers crossed*

How to make a Linux software RAID1 detect disc corruption?

- by Paul

This is one of those nightmare days: a virtualized server running on a Linux SW-RAID1 hosts a VM that exhibits random segfaults in seemingly random chunks of code. While debugging I find that a file gives a different md5sum on each and every run. Digging deeper I find this: the raw disc partitions that make up the RAID1 mirror contain 2 bit differences, and about 9 sectors are completely empty on one disc and filled with data on the other. Obviously Linux returns a sector from a nondeterministically chosen disc of the mirror set. So sometimes the good copy of a sector is returned, and sometimes the corrupted one. The docs say: RAID cannot and is not supposed to guard against data corruption on the media. Therefore, it doesn't make any sense either, to purposely corrupt data (using dd for example) on a disk to see how the RAID system will handle that. It is most likely (unless you corrupt the RAID superblock) that the RAID layer will never find out about the corruption, but your filesystem on the RAID device will be corrupted. Thanks. That will help me sleep. :-/ Is there a way to have Linux at least detect this corruption by using sector checksumming or something like that? Would this be detected in a RAID5 setup? Is this the moment I wish I had used ZFS or btrfs (once it becomes usable without uber-admin capabilities)?
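
The kind of comparison described above (finding which sectors differ between the two halves of a mirror) can be sketched as follows. This is an illustration only, not the poster's actual method: it assumes 512-byte sectors and two equally sized images, and it uses small temporary files in place of real partitions. The names differing_sectors and demo are made up.

```python
import os
import tempfile

SECTOR = 512  # assumed logical sector size


def differing_sectors(path_a, path_b, sector_size=SECTOR):
    """Return the indices of sectors whose contents differ between two images."""
    diffs = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        index = 0
        while True:
            block_a = a.read(sector_size)
            block_b = b.read(sector_size)
            if not block_a and not block_b:
                break  # both images exhausted
            if block_a != block_b:
                diffs.append(index)
            index += 1
    return diffs


def demo():
    """Build two 8-sector images, corrupt one, and report the differing sectors."""
    good = bytearray(bytes(range(256)) * 16)   # 4096 bytes = 8 sectors
    bad = bytearray(good)
    bad[3 * SECTOR + 10] ^= 0x01               # single bit flip in sector 3
    bad[5 * SECTOR:6 * SECTOR] = bytes(SECTOR)  # sector 5 zeroed out
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "half_a.img")
        path_b = os.path.join(tmp, "half_b.img")
        with open(path_a, "wb") as f:
            f.write(good)
        with open(path_b, "wb") as f:
            f.write(bad)
        return differing_sectors(path_a, path_b)


print(demo())  # [3, 5]
```

A comparison like this only shows that the copies disagree; without a checksum it cannot say which copy is the correct one, which is the gap the question is about.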

How to re-add a RAID-10 failed drive on Ubuntu?

- by thiesdiggity

I have a problem that I can't seem to solve. We have an Ubuntu server set up with RAID-10, and two of the drives dropped out of the array. When I try to re-add them using the following command:
mdadm --manage --re-add /dev/md2 /dev/sdc1
I get the following error message:
mdadm: Cannot open /dev/sdc1: Device or resource busy
When I do a "cat /proc/mdstat" I get the following:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [r$
md2 : active raid10 sdb1[0] sdd1[3]
      1953519872 blocks 64K chunks 2 near-copies [4/2] [U__U]
md1 : active raid1 sda2[0] sdc2[1]
      468853696 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdc1[1]
      19530688 blocks [2/2] [UU]
unused devices: <none>
When I run "/sbin/mdadm --detail /dev/md2" I get the following:
/dev/md2:
Version : 00.90
Creation Time : Mon Sep 5 23:41:13 2011
Raid Level : raid10
Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 4
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Thu Oct 25 09:25:08 2012
State : active, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : near=2, far=1
Chunk Size : 64K
UUID : c6d87d27:aeefcb2e:d4453e2e:0b7266cb
Events : 0.6688691
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 0 0 1 removed
2 0 0 2 removed
3 8 49 3 active sync /dev/sdd1
Output of df -h is:
Filesystem Size Used Avail Use% Mounted on
/dev/md1 441G 2.0G 416G 1% /
none 32G 236K 32G 1% /dev
tmpfs 32G 0 32G 0% /dev/shm
none 32G 112K 32G 1% /var/run
none 32G 0 32G 0% /var/lock
none 32G 0 32G 0% /lib/init/rw
tmpfs 64G 215M 63G 1% /mnt/vmware
none 441G 2.0G 416G 1% /var/lib/ureadahead/debugfs
/dev/mapper/RAID10VG-RAID10LV 1.8T 139G 1.6T 8% /mnt/RAID10
When I do a "fdisk -l" I can see all the drives needed for the RAID-10. The RAID-10 is part of the /dev/mapper; could that be the reason why the device is coming back as busy?
Does anyone have any suggestions on what I can try to get the drives back into the array? Any help would be greatly appreciated. Thanks!
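
Status lines such as [4/2] [U__U] in the /proc/mdstat output above can be read programmatically. The sketch below is a minimal parser, assuming the usual two-line layout per array (the line breaks in the sample are reconstructed from the post); the function name md_status is made up for illustration.

```python
import re

# Sample reconstructed from the post; the line layout is an assumption.
MDSTAT = """\
md2 : active raid10 sdb1[0] sdd1[3]
      1953519872 blocks 64K chunks 2 near-copies [4/2] [U__U]
md1 : active raid1 sda2[0] sdc2[1]
      468853696 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdc1[1]
      19530688 blocks [2/2] [UU]
"""


def md_status(mdstat_text):
    """Map array name -> (expected_devices, active_devices, flags)."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active, flags = status.groups()
            result[current] = (int(expected), int(active), flags)
    return result


status = md_status(MDSTAT)
degraded = [name for name, (expected, active, _) in status.items()
            if active < expected]
print(status)
print("degraded:", degraded)  # degraded: ['md2']
```

Each underscore in the flags marks a missing member, so U__U means slots 1 and 2 of the four-device array are absent.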

Load is 0, yet site crawls (sometimes). What gives?

- by Yegor

I have a site with ~1.5-2 million page views per day running on 2 servers: one for MySQL, the other for everything else. The MySQL box has a load of 3; the frontend is usually at 0.0-0.1. Both are dual quad-core with 8GB RAM running SAS drives in raid5. The CPU is idle the majority of the time, and iowait is non-existent. I'm running nginx and memcache, and the site is built on PHP. Half the time everything runs perfectly, while at other times it lags severely, taking 10-15 seconds for a page to load. Page execution time is always super low, but it seems to hang, waiting for something before it actually loads the page. What's even weirder is that it only happens to 1 file on the site (but it's the one that's most commonly accessed, and that actually loads the content on the site). Other pages are super fast at all times, even when it takes 15 seconds to load actual content. I have the nginx_stats plugin installed, and if I monitor it, the lag spikes happen when the write column starts going above 100, and it frequently does... all the way to 500-1000. It does so at totally random times... not when traffic is heavy... it can do this in the middle of the night, and work perfectly at 5pm when traffic is at its highest. Any ideas?

Failing to load rootfs: Ubuntu 10 + grub2 + rootfs ext4 w/ RAID1

- by James

I am having problems booting a new Ubuntu 10 (server) install. My primary HD (/dev/sda) is laid out as follows:

    Device     Boot  Start     End       Blocks  Id  System
    /dev/sda1  *         1      18      144553+  83  Linux                  <-- /BOOT
    /dev/sda2           19  182401  1464991447+   5  Extended
    /dev/sda5           19    2207    17583111   fd  Linux raid autodetect
    /dev/sda6         2208   11934    78132096   fd  Linux raid autodetect  <-- / (ROOTFS)
    /dev/sda7        11935  182401  1369276146   fd  Linux raid autodetect

The rootfs is part of a RAID1 (software) array (currently degraded):

    # cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md2 : active raid1 sda6[1]
          78132032 blocks [2/1] [_U]

The UUIDs for the partitions are as follows:

    # blkid /dev/sda1
    /dev/sda1: UUID="b25dd301-41b9-4f4d-9b0a-0e31713dd74c" TYPE="ext2"
    # blkid /dev/sda6
    /dev/sda6: UUID="af7b9ede-fa53-c0c1-74be-31ec752c5cd5" TYPE="linux_raid_member"
    # blkid /dev/md2
    /dev/md2: UUID="a0602d42-6855-482f-870c-6f6ecdcdae3f" TYPE="ext4"

Finally, I have my grub2 menuentry set up as follows:

    ### BEGIN /etc/grub.d/10_linux ###
    menuentry 'Ubuntu, with Linux 2.6.32-25-server' --class ubuntu --class gnu-linux --class gnu --class os {
        insmod ext2
        insmod raid
        insmod mdraid
        set root='(hd0,1)'
        search --no-floppy --fs-uuid --set b25dd301-41b9-4f4d-9b0a-0e31713dd74c
        linux /vmlinuz-2.6.32-25-server root=UUID=a0602d42-6855-482f-870c-6f6ecdcdae3f ro nosplash noplymouth
        initrd /initrd.img-2.6.32-25-server
    }

When I attempt to boot, grub loads OK, but I eventually get the following error message:

    Gave up waiting for root device.
    ALERT /dev/disk/by-uuid/a0602d42-6855-482f-870c-6f6ecdcdae3f does not exist. Dropping to a shell!

If I open a grub command line from the grub bootloader, I can ls (hd0,) and it lists the correct partitions with the UUIDs as shown above - sda6 shows 'a0602d42-6855-482f-870c-6f6ecdcdae3f' (the RAID UUID). If I ls (md2)/ it properly lists all the files on the RAID1 filesystem (ext4), so it doesn't appear to be an issue accessing the raid device.

Does anyone have any suggestions as to what the problem might be? I can't figure this one out.
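
The `[2/1] [_U]` notation in the mdstat output above means the array expects two members and only one is active. As a rough illustration, the sketch below picks out such degraded arrays from mdstat-style text. The sample string is reconstructed from the output quoted in the question (its line breaks are an assumption), and the function name `degraded_arrays` is hypothetical.

```python
import re

# Sample reconstructed from the /proc/mdstat output quoted in the question;
# the exact line breaks are an assumption about the original layout.
SAMPLE_MDSTAT = """\
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda6[1]
      78132032 blocks [2/1] [_U]
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for arrays with missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
        # The "[expected/active]" counter follows the array's header line.
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                result[current] = (expected, active)
            current = None
    return result


print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live system the same function could be fed the contents of /proc/mdstat instead of the sample string.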

Read the article

Recover data from Dynamic Disk (MBR) bigger than 2TB

- by Helder

Here is the situation: Promise Array FastTrak TX4310 with 3 disks (750 GB each) in RAID5. This comes to around 1500 GB of data. Last week I had the idea of expanding the RAID with an additional 750 GB disk. This would bring the volume to around 2250 GB. I plugged in the disk and used the WebPAM software to do the RAID expansion. However, I didn't account for the MBR 2 TB limit: I didn't remember that the disk was using MBR instead of GPT, and I didn't check it prior to the expansion. After a couple of days of expansion, when I got home today, the disk in Windows disk manager showed the message "Invalid disk", and when I try to activate it, it says "The operation is not allowed on the Invalid pack". From what I figured, the logical volume on the RAID expanded and passed that info to the Windows layer, and I ended up with a "larger than 2TB" MBR disk. I'm hoping that somehow I can still recover some data from this, and I was wondering if I can "rewrite" the MBR structure back to the 1500 GB partition size, so I can access the partition in Windows. Right now I'm doing an "Analyse" with TestDisk, as I hope the program will pick up the old 1500 GB structure and allow me to somehow revert back to it. I think that even though the Logical Drive in the RAID is bigger than 2 TB, I can somehow correct the MBR to show the 1500 GB partition again. I had a similar problem once, and I was able to recover the data using a similar method. What do you guys think? Is it a dead end? Am I totally screwed because there is the extra RAID layer that I'm not counting? Or is there another way to move forward with this? Thanks all!
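
The arithmetic behind the 2 TB limit can be sketched as follows. This assumes 512-byte logical sectors and decimal gigabytes for the 750 GB disks; neither is stated in the question, and the helper names are hypothetical. MBR partition entries store sector addresses and counts in 32-bit fields, which is where the limit comes from.

```python
SECTOR_SIZE = 512      # bytes; assumed logical sector size
MAX_SECTORS = 2 ** 32  # MBR uses 32-bit fields for start LBA and sector count


def mbr_limit_bytes(sector_size=SECTOR_SIZE):
    """Largest size addressable through a 32-bit sector count."""
    return MAX_SECTORS * sector_size


def raid5_usable_gb(disks, disk_gb):
    """RAID5 usable capacity: one disk's worth of space goes to parity."""
    return (disks - 1) * disk_gb


limit = mbr_limit_bytes()
for disks in (3, 4):
    usable_gb = raid5_usable_gb(disks, 750)
    over = usable_gb * 10**9 > limit  # decimal GB assumed
    print(f"{disks} disks: {usable_gb} GB usable, exceeds MBR limit: {over}")
```

Under these assumptions the 3-disk array (1500 GB) fits below the limit and the 4-disk array (2250 GB) does not, which matches the symptoms described.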

Read the article

Should I use "Raid 5 + spare" or "Raid 6"?

- by Trevor Boyd Smith

What is "Raid 5 + Spare" (excerpt from User Manual, Sect 4.17.2, P.54): RAID5+Spare: RAID 5+Spare is a RAID 5 array in which one disk is used as spare to rebuild the system as soon as a disk fails (Fig. 79). At least four disks are required. If one physical disk fails, the data remains available because it is read from the parity blocks. Data from a failed disk is rebuilt onto the hot spare disk. When a failed disk is replaced, the replacement becomes the new hot spare. No data is lost in the case of a single disk failure, but if a second disk fails before the system can rebuild data to the hot spare, all data in the array will be lost. What is "Raid 6" (excerpt from User Manual, Sect 4.17.2, P.54): RAID6: In RAID 6, data is striped across all disks (minimum of four) and a two parity blocks for each data block (p and q in Fig. 80) is written on the same stripe. If one physical disk fails, the data from the failed disk can be rebuilt onto a replacement disk. This Raid mode can support up to two disk failures with no data loss. RAID 6 provides for faster rebuilding of data from a failed disk. "Raid 5 + spare" and "Raid 6" are SO similar ... I can't tell the difference. When would "Raid 5 + Spare" be optimal? And when would "Raid 6" be optimal? The manual dumbs down the different RAID modes to 5-star ratings. "Raid 5 + Spare" only gets 4 stars but "Raid 6" gets 5 stars. If I were to blindly trust the manual I would conclude that "Raid 6" is always better. Is "Raid 6" always better?
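
The two manual excerpts can be put side by side with a small sketch. It assumes equal-sized disks and counts capacity in whole disks; the function names and dictionary keys are hypothetical, and the tolerance figures are taken directly from the excerpts above (one failure at a time for RAID 5 + spare, up to two for RAID 6).

```python
def raid5_plus_spare(disks):
    """RAID 5 + hot spare: one disk's worth of parity plus one idle spare."""
    if disks < 4:
        raise ValueError("RAID 5 + spare needs at least four disks")
    return {"data_disks": disks - 2, "simultaneous_failures_tolerated": 1}


def raid6(disks):
    """RAID 6: two parity blocks (p and q) per stripe."""
    if disks < 4:
        raise ValueError("RAID 6 needs at least four disks")
    return {"data_disks": disks - 2, "simultaneous_failures_tolerated": 2}


for n in (4, 6, 8):
    print(n, raid5_plus_spare(n), raid6(n))
```

With the same number of disks, both layouts leave the same usable capacity; the difference visible in this model is how many disks may be failed at the same moment. It says nothing about write performance or rebuild time, which the sketch does not model.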

Read the article

Photoshop CS5 performance over network drive (cifs)

- by grub

Hello everyone, I installed a QNAP NAS TS410 for a customer (a professional photographer) with 3 Hitachi Deskstar 7200rpm 2TB disks configured as RAID5. The NAS and the workstations are connected over a Gigabit network. He and his co-worker access the photos (about 1TB of photos) over a mapped network drive from their Windows machines (Windows XP - 32bit and Windows 7 Ultimate - 32bit). Both use Photoshop CS5 to edit the photos. The problem is that saving an edited photo takes a really long time: about 3 times as long as opening it. After some tests I can exclude the network, the NAS and the Windows machines as the source of the issue. I think the problem is the Photoshop software and its handling of network drives. Officially, network drives are not supported by Adobe. I do not have any experience with Adobe products, especially with Adobe Photoshop CS5. What are your recommendations for solving the performance issue? Should my customer copy the photos to the local drive, edit them and upload them again to the network drive, or is Adobe Drive or Adobe Version Cue the answer? One requirement is that the photos need to be accessible / editable from both computers even when one of them is offline. Adobe Version Cue needs a dedicated service running to be usable, so this solution is not possible as far as I understand the Cue software. Thank you for your input on this issue and have a nice day :-) Greetings grub

Read the article

Reusing slot numbers in Linux software RAID arrays

- by thkala

When a hard disk drive in one of my Linux machines failed, I took the opportunity to migrate from RAID5 to a 6-disk software RAID6 array. At the time of the migration I did not have all 6 drives - more specifically the fourth and fifth (slots 3 and 4) drives were already in use in the originating array, so I created the RAID6 array with a couple of missing devices. I now need to add those drives in those empty slots. Using mdadm --add does result in a proper RAID6 configuration, with one glitch - the new drives are placed in new slots, which results in this /proc/mdstat snippet: ... md0 : active raid6 sde1[7] sdd1[6] sda1[0] sdf1[5] sdc1[2] sdb1[1] 25185536 blocks super 1.0 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU] ... mdadm -E verifies that the actual slot numbers in the device superblocks are correct, yet the numbers shown in /proc/mdstat are still weird. I would like to fix this glitch, both to satisfy my inner perfectionist and to avoid any potential sources of future confusion in a crisis. Is there a way to specify which slot a new device should occupy in a RAID array? UPDATE: I have verified that the slot number persists in the component device superblock. For the version 1.0 superblocks that I am using that would be the dev_number field as defined in include/linux/raid/md_p.h of the Linux kernel source. I am now considering direct modification of said field to change the slot number - I don't suppose there is some standard way to manipulate the RAID superblock?
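
To make the glitch concrete, the bracketed numbers in the /proc/mdstat snippet above can be extracted and compared with the range 0 to 5 that a 6-disk array would ideally show. This is only a sketch for inspecting the output; the helper names are hypothetical, and it does not modify any superblock.

```python
import re

# Device list copied from the /proc/mdstat snippet quoted in the question.
MDSTAT_LINE = "md0 : active raid6 sde1[7] sdd1[6] sda1[0] sdf1[5] sdc1[2] sdb1[1]"


def bracket_numbers(line):
    """Map each component device to the number shown in brackets after it."""
    return {dev: int(num) for dev, num in re.findall(r"(\w+)\[(\d+)\]", line)}


def unused_numbers(numbers, raid_disks):
    """Numbers in 0..raid_disks-1 that no listed device currently shows."""
    return sorted(set(range(raid_disks)) - set(numbers.values()))


numbers = bracket_numbers(MDSTAT_LINE)
print(numbers)
print(unused_numbers(numbers, 6))
```

For the quoted line, the unused numbers are 3 and 4, the two slots left empty when the array was created, while the newly added sdd1 and sde1 show 6 and 7.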

Search Results

Search found 227 results on 10 pages for 'raid5'.
