xserve raid - Page 38 - Developer IT

Need hard disk recommendation for linux home server.

- by neotracker

Hello, I'm planing to build a little linux homeserver. It will mainly be used for storage and maybe as an media pc. I plan to build a software raid5 with 4 1.5TB or 2TB hard drives. I already decided to use the Western Digital Caviar Green 1.5 TB drive, but then I read about some problems with the WD green series about many drives failing and that they are not recommended for raid anyway. Of course, I couldn't find much facts on the issues so I thought I just ask here ;-) What hard drives would you recommended for a software raid5 setup? As I only need it for storage, the whole thing doesn't have to be too fast. So I prefer a cheap price and silence to great performance.

Read the article

SQL server availability issue: large query stops other connections from connecting

- by Carlos

I've got a high-spec (multicore, RAID) server running MS SQL 2008, with several databases on it. I have a low throughput process that periodically needs a small amount of information from one of the DBs, and the code seems to work fine. However, sometimes when one of my colleagues does a huge query against one of the other DBs, I see full CPU usage on the machine, and connections from my app time out. Why does this happen? I would have thought the many cores and harddisks would somehow (together with cleverly written DB server) be able to keep at least some of the resources free for other apps? I'm pretty sure he doesn't use multiple connections for his query. What can I do to prevent this?

Read the article

RocketRaid gives me studdering computer

- by Dan

I have a RocketRaid SanDigital 8U TowerRaid with 8x5900 RPM 2 TB harddrives in a raid 5. When I'm writing data to the drives my computer studders (meaning freezes for a 1/2 second every 2 seconds or so). My screen freezes, my music goes into a loop, its really annoying. I get the same thing in windows 7 as I do in linux. The only difference is it seems to happen less in linux, but occassionally linux will crash (I've never had linux crash for any other reason, so I'm assuming they have poor linux drivers and kernal mod). Any tips for how to deal with this? Thanks

Read the article

Can I mix drive types in an HP server?

- by Generic Error

I currently have the following setup: HP DL380 G4 Server 6 x 73GB, Ultra160, 10k, SCSI 80 Pin Drives Smart Array 6i Controller RAID 5 One of the drives is failing and needs to be replaced. I have on hand drives that are the same size and type, but are Ultra320, 15k instead. I have verified that these drives work in another system with the same type of drive controller. When I plug one of these in the system simply reports the drive as being offline and has nothing further to do with it. From what I have read these drives should be compatable. Should this work at all and if so, what might be preventing it?

Read the article

Unreadable sectors reported by smartd, is it serious?

- by stribika

I have a RAID 5 array of 4 disks. In the last 2 days I began to see these messages in the log: Jun 13 23:01:05 localhost smartd[4537]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors Jun 13 23:01:05 localhost smartd[4537]: Device: /dev/sdb [SAT], 2 Currently unreadable (pending) sectors If I have 2 faulty disks then the array should not show all disks OK: md0 : active raid1 sdd1[3] sdb1[1] sdc1[2] sda1[0] 64128 blocks [4/4] [UUUU] Strangely there are no other problems just the log messages. I am worried because sda is new and I previously had problems with sdb. (Completely died but the guy who sold it to me fixed it somehow.) Am I in danger of losing data? What should I do now?

Read the article

Hyper-V R2: Need help with disk structure

- by MojoDK

Hi all, I'm going to use the free (non gui) version of Hyper-V R2. In my new server I have 8 disks in total (for Hyper-V R2 installation and virtual machine). Atm I'm going to run a single virtual machine, with following tasks: Windows Server 2008 R2 x64 File/Print SQL Server My question is ... with my 8 disks in the server, which disks should contain wich data? Should I install "Hyper-V R2" and VM's drive c on same physical disks? Should I use raid 1 or 5? With the above tasks, how would you structure the disks? Hope you know what I mean (I'm not english, so it's difficult to explain). Thanks!!! Mojo

Read the article

Failing Sata HDD

- by DaveCol

I think my HDD is fried... Could someone confirm or help me restore it? I was using Hardware RAID 1 Configuration [2 x 160GB SATA HDD] on a CentOS 4 Installation. All of a sudden I started seeing bad sectors on the second HDD which stopped being mirrored. I have removed the RAID array and have tested with SMART which showed the following error: 187 Unknown_Attribute 0x003a 001 001 051 Old_age Always FAILING_NOW 4645 I have no clue what this means, or if I can recover from it. Could someone give me some ideas on how to fix this, or what HDD to get to replace this? Complete SMART report: Smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: GB0160CAABV Serial Number: 6RX58NAA Firmware Version: HPG1 User Capacity: 160,041,885,696 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 7 ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a Local Time is: Tue Oct 19 13:42:42 2010 COT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED See vendor-specific Attribute list for marginal Attributes. General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 433) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 54) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 100 253 006 Pre-fail Always - 0 3 Spin_Up_Time 0x0002 097 097 000 Old_age Always - 0 4 Start_Stop_Count 0x0033 100 100 020 Pre-fail Always - 152 5 Reallocated_Sector_Ct 0x0033 095 095 036 Pre-fail Always - 214 7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 73109713 9 Power_On_Hours 0x0032 083 083 000 Old_age Always - 15133 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0033 100 100 020 Pre-fail Always - 154 184 Unknown_Attribute 0x0032 038 038 000 Old_age Always - 62 187 Unknown_Attribute 0x003a 001 001 051 Old_age Always FAILING_NOW 4645 189 Unknown_Attribute 0x0022 100 100 000 Old_age Always - 0 190 Unknown_Attribute 0x001a 061 055 000 Old_age Always - 656408615 194 Temperature_Celsius 0x0000 039 045 000 Old_age Offline - 39 (Lifetime Min/Max 0/22) 195 Hardware_ECC_Recovered 0x0032 070 059 000 Old_age Always - 12605265 197 Current_Pending_Sector 0x0000 100 100 000 Old_age Offline - 1 198 Offline_Uncorrectable 0x0000 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0000 200 200 000 Old_age Offline - 62 SMART Error Log Version: 1 ATA Error Count: 4645 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 4645 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 02 7b 86 b1 ea 00 00:38:52.796 READ DMA ec 03 45 00 00 00 a0 00 00:38:52.796 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:52.794 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:49.991 IDENTIFY DEVICE c8 00 04 79 86 b1 ea 00 00:38:49.935 READ DMA Error 4644 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 04 79 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:49.991 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:49.935 READ DMA Error 4643 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA Error 4642 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA Error 4641 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 15131 - # 2 Short offline Completed without error 00% 15131 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.

Read the article

Ubuntu server has slow performance

- by Rich

I have a custom built Ubuntu 11.04 server with a 6 disk software RAID 10 primary drive. On it I'm primarily running a PostgreSQL and a few other utilities that stream data from the web. I often find after a few hours of uptime the server starts to lag with all kinds of processes. For example, it may take 10-15 seconds after log-in to get a shell prompt. It might take 5-10 seconds for top to come up. An ls might take a second or two. When I look at top there is almost no CPU usage. There's a fair amount of memory used by the PostgreSQL server but not enough to bleed into swap. I have no idea where to go from here, other than to suspect the RAID10 (I've only ever had software RAID 1's before). Edit: Output from top: top - 11:56:03 up 1:46, 3 users, load average: 0.89, 0.73, 0.72 Tasks: 119 total, 1 running, 118 sleeping, 0 stopped, 0 zombie Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 93.5%id, 6.2%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 16325596k total, 3478248k used, 12847348k free, 20880k buffers Swap: 19534176k total, 0k used, 19534176k free, 3041992k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1747 woodsp 20 0 109m 10m 4888 S 1 0.1 0:42.70 python 357 root 20 0 0 0 0 S 0 0.0 0:00.40 jbd2/sda3-8 1 root 20 0 24324 2284 1344 S 0 0.0 0:00.84 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root 20 0 0 0 0 S 0 0.0 0:00.24 ksoftirqd/0 6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0 7 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/0 8 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1 10 root 20 0 0 0 0 S 0 0.0 0:00.02 ksoftirqd/1 12 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/1 13 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/2 14 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/2:0 15 root 20 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 16 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/2 17 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/3 18 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/3:0 19 root 20 0 0 0 0 S 0 0.0 0:00.02 ksoftirqd/3 20 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/3 21 root 0 -20 0 0 0 S 0 0.0 0:00.00 cpuset 22 root 0 -20 0 0 0 S 0 0.0 0:00.00 khelper 23 root 20 0 0 0 0 S 0 0.0 0:00.00 kdevtmpfs 24 root 0 -20 0 0 0 S 0 0.0 0:00.00 netns 26 root 20 0 0 0 0 S 0 0.0 0:00.00 sync_supers df -h rpsharp@ncp-skookum:~$ df -h Filesystem Size Used Avail Use% Mounted on /dev/sda3 1.8T 549G 1.2T 32% / udev 7.8G 4.0K 7.8G 1% /dev tmpfs 3.2G 492K 3.2G 1% /run none 5.0M 0 5.0M 0% /run/lock none 7.8G 0 7.8G 0% /run/shm /dev/sda2 952M 128K 952M 1% /boot/efi /dev/md0 5.5T 562G 4.7T 11% /usr/local free -m psharp@ncp-skookum:~$ free -m total used free shared buffers cached Mem: 15942 3409 12533 0 20 2983 -/+ buffers/cache: 405 15537 Swap: 19076 0 19076 tail -50 /var/log/syslog Jul 3 06:31:32 ncp-skookum rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="1070" x-info="http://www.rsyslog.com"] rsyslogd was HUPed Jul 3 06:39:01 ncp-skookum CRON[14211]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 06:40:01 ncp-skookum CRON[14223]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 07:00:01 ncp-skookum CRON[14328]: (woodsp) CMD (/home/woodsp/bin/mail_tweetupdate # email an update) Jul 3 07:00:01 ncp-skookum CRON[14327]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 07:00:28 ncp-skookum sendmail[14356]: q63E0SoZ014356: from=woodsp, size=2328, class=0, nrcpts=2, msgid=<[email protected]>, relay=woodsp@localhost Jul 3 07:00:29 ncp-skookum sm-mta[14357]: q63E0Si6014357: from=<[email protected]>, size=2569, class=0, nrcpts=2, msgid=<[email protected]>, proto=ESMTP, daemon=MTA-v4, relay=localhost [127.0.0.1] Jul 3 07:00:29 ncp-skookum sendmail[14356]: q63E0SoZ014356: to=Spencer Wood <[email protected]>,Martin Lacayo <[email protected]>, ctladdr=woodsp (1004/1005), delay=00:00:01, xdelay=00:00:01, mailer=relay, pri=62328, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (q63E0Si6014357 Message accepted for delivery) Jul 3 07:00:29 ncp-skookum sm-mta[14359]: STARTTLS=client, relay=mx3.stanford.edu., version=TLSv1/SSLv3, verify=FAIL, cipher=DHE-RSA-AES256-SHA, bits=256/256 Jul 3 07:00:29 ncp-skookum sm-mta[14359]: q63E0Si6014357: to=<[email protected]>,<[email protected]>, ctladdr=<[email protected]> (1004/1005), delay=00:00:01, xdelay=00:00:00, mailer=esmtp, pri=152569, relay=mx3.stanford.edu. [171.67.219.73], dsn=2.0.0, stat=Sent (Ok: queued as 8F3505802AC) Jul 3 07:09:08 ncp-skookum CRON[14396]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 07:17:01 ncp-skookum CRON[14438]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Jul 3 07:20:01 ncp-skookum CRON[14453]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 07:39:01 ncp-skookum CRON[14551]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 07:40:01 ncp-skookum CRON[14562]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 08:00:01 ncp-skookum CRON[14668]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 08:09:01 ncp-skookum CRON[14724]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 08:17:01 ncp-skookum CRON[14766]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Jul 3 08:20:01 ncp-skookum CRON[14781]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 08:39:01 ncp-skookum CRON[14881]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 08:40:01 ncp-skookum CRON[14892]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Output of hdparm -t /dev/sd{a,b,c,d,e,f} This looks suspicious? /dev/sda: Timing buffered disk reads: 2 MB in 4.84 seconds = 423.39 kB/sec /dev/sdb: Timing buffered disk reads: 420 MB in 3.01 seconds = 139.74 MB/sec /dev/sdc: Timing buffered disk reads: 390 MB in 3.00 seconds = 129.87 MB/sec /dev/sdd: Timing buffered disk reads: 416 MB in 3.00 seconds = 138.51 MB/sec /dev/sde: Timing buffered disk reads: 422 MB in 3.00 seconds = 140.50 MB/sec /dev/sdf: Timing buffered disk reads: 416 MB in 3.01 seconds = 138.26 MB/sec

Read the article

Rebuilding array on 3ware 9690SA-8I

- by Tim Jones

I have a RAID10 (8x1TB) array on a 3ware 9690 card running on an Ubuntu 1110 server. There was a kernel update so I scheduled a reboot after which the array was inaccessible. I checked the status a drive has died in the array, but the controller has thrown the entire array into an 'inoperable' state instead of simply degraded (what's the point of the RAID now ;-). After taking out the 'dead' drive I run a quick test to find it completely functional without a bad sector to be found. I try to put the drive back in but the array still marks the disk as degraded (remembering serial number or something??) and the entire array as inoperable... So I swap it out for a known working drive (not the same capacity but higher - should still work) and initiate a rebuild with the the new drive as a replacement. This fails instantly with the error "(0x0B:0x0033): Unit busy : Failed to start Rebuild on Unit 0". The unit shouldn't be busy as it is not mounted (the card itself is listed with lshw but the array it provides is not). I'm pretty much at an impasse now, I don't understand how I can have a single drive failure on a RAID10 that makes the entire array inaccessible, degraded I could understand but inaccessible?? I don't think the controller as prior to the reboot it was completely functional.

Read the article

HP P212 hangs on Initializing

- by user927586

I'm in trouble... Server: HP Microserver N36L Storage: HP P212/256 with BBWC Drives: 4 WD 2 Tb While expanding the RAID 5 array from 3 to 4 drives, the server rebooted itself. Expansion was at 76%. Now at boot the P212 seems to hang on initializing. After some time screen resets and the server boots, but in the os (Windows Server 2008R2) the P212 has errors (error code 10) and it's not seen by the HP ACU. Whats' going on? It's trying to complete the expansion? It's trying to make all over the expansion? It's simply stuck? What should I do? I NEED the data on the array! PS In the server BIOS, on the hd boot priority, the server still reports the array logical drive... maybe it's a good sign... PPS I'm not at the server physical location, so I cannot tell if the drives in the array are being accessed (that would be the case if the controller it's redoing/completing the expansion process) or not right now. Bye Dario

Read the article

Megacli is killing me, any help appreciated

- by Stefan

I run a server with 2 drives in raid0 configured through BIOS. I just added 2 more drives using hotplug (the server is dell r610 with RHEL 5.4 64bit) and I would like to configure a separate raid0 partition on these drives. I am getting the following error: /opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd r0[32:2, 32:3] -a0 The specified physical disk does not have the appropriate attributes to complete the requested command. Exit Code: 0x26 All the parameters are correct and there is just no reason why this command could not work, see this (fujitsu is current raid, seagate is the new one I want to create): /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | egrep 'Adapter|Enclosure|Slot|Inquiry' Adapter #0 Enclosure Device ID: 32 Slot Number: 0 Enclosure position: 0 Inquiry Data: FUJITSU MBD2147RC D807D0A4PA101174 Enclosure Device ID: 32 Slot Number: 1 Enclosure position: 0 Inquiry Data: FUJITSU MBD2147RC D807D0A4PA10115T Enclosure Device ID: 32 Slot Number: 2 Enclosure position: 0 Inquiry Data: SEAGATE ST9300603SS FS033SE0TF5K Enclosure Device ID: 32 Slot Number: 3 Enclosure position: 0 Inquiry Data: SEAGATE ST9300603SS FS023SE070FK I also tried to set up the drive as hotspare, also some strange error: /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -physdrv[32:3] -a0 Adapter: 0: Set Physical Drive at EnclId-32 SlotId-3 as Hot Spare Failed. FW error description: The specified device is in a state that doesn't support the requested command. Exit Code: 0x32 As you can see the disk is in Unconfigured, Good state: Enclosure Device ID: 32 Slot Number: 3 Enclosure position: 0 Device Id: 3 Sequence Number: 1 Media Error Count: 0 Other Error Count: 0 Predictive Failure Count: 0 Last Predictive Failure Event Seq Number: 0 PD Type: SAS Raw Size: 279.396 GB [0x22ecb25c Sectors] Non Coerced Size: 278.896 GB [0x22dcb25c Sectors] Coerced Size: 278.875 GB [0x22dc0000 Sectors] Firmware state: Unconfigured(good), Spun Up SAS Address(0): 0x5000c50005cd20b1 SAS Address(1): 0x0 Connected Port Number: 3(path0) Inquiry Data: SEAGATE ST9300603SS FS023SE070FK FDE Capable: Not Capable FDE Enable: Disable Secured: Unsecured Locked: Unlocked Needs EKM Attention: No Foreign State: Foreign Foreign Secure: Drive is not secured by a foreign lock key Device Speed: Unknown Link Speed: Unknown Media Type: Hard Disk Device Drive Temperature :30C (86.00 F)

Read the article

mdadm+zfs vs mdadm+lvm

- by Alex

This may be a naive question since I'm new to this and I cannot find any results about mdadm+zfs, but after some testing it seems it might work: The use case is a server with RAID6 for some data that is backed-up somewhat infrequently. I think I'm well served by any of ZFS or RAID6. Platform is Linux. Performance is secondary. So the two setups I am considering are: A RAID6 array plus regular LVM and ext4 A RAID6 array plus ZFS (without redundancy). Is this second option that I don't see discussed at all. Why ZFS+RAID6? It's mainly because the inability of ZFS to grow a raidz2 with new disks. You can replace disks with larger ones, I know, but not add another disk. You can accomplish 2-disk redundancy and ZFS disk growth using mdadm as the redundancy layer. Besides that main point (otherwise I could go directly to raidz2 without RAID under it), these are the pros-cons that I see for each option: ZFS has snapshots without preallocated space. LVM requires preallocation (might be no longer true). ZFS has checksumming (very interested in this) and compression (nice bonus). LVM has online filesystem growth (ZFS can do it offline with export/mdadm --grow/import). LVM has encryption (ZFS-on-Linux has not). This is the only major con of this combo I see. I guess I could go RAID6+LVM+ZFS... seems too heavy, or not? So, to close with a proper question: 1) Is there anything that inherently discourages or precludes RAID6+ZFS? Anyone has experience with a setup like this? 2) Are there possibilities for checksumming and compression that would make ZFS unnecessary (maintaining the possibility of filesystem growth)? Because the RAID6+LVM combo seems the sanctioned, tested way.

Read the article

How do I diagnose a bottleneck in an Intel Atom based Ubuntu server?

- by Jon Cage

I have a small media server at home which has software raid and a gigabit link to the rest of my network. For some reason though, I only get ~10MB/s transfers when copying to/from the server. I use software RAID5 (mdadm) over 4 1TB disks. On top of that I then use LVM to give me a huge pool of disk space which is then split up into multiple partitions which can be resized as and when they need it. I'm guessing this it most likely the cause, but I'd like to know for sure where the root cause is. So, how can I benchmark network throughput (Windows 7 desktop <- Ubuntu server) and hard disk performance to try and identify where my bottleneck might be? [Edit] If anyone's interested, the motherboard is an Intel Desktop Board D945GCLF2. So that's a 300 series Atom processor with the Intel® 945GC Express Chipset [Edit2] I feel like such a fool! I just checked my desktop and I had the slower of the two onboard NICs plugged in so the server is probably not at fault here. Transferring a copy of ubuntu off the server I get ~35-40MB/s according to Windows 7. I'll do those HD tests when I get a chance though (just for completeness).

Read the article

Server nearly unusable when doing disk writes

- by Wikser

My question closely relates to my last question here on serverfault. I was copying about 5GB from a 10 year old desktop computer to the server. The copy was done in Windows Explorer. In this situation I would assume the server to be bored by the dataflow. But as usual with this server, it really slowed down. At least I could work with the remote session, even there was some serious latency. The copy took its time (20min?). In this time I went to a colleague and he tried to log in in the same server via remote desktop (for some other reason). It took about a minute to get to the login screen, a minute to open the control panel, a minute to open the performance monitor, ... Icons were loading maybe one per second. We saw the following (from memory): CPU: 2% Avg. Queue Length: 50 Pages/sec: 115 (?) There was no other considerable activity on the server. The server seldom serves some ASP.NET pages, which became also very slow in this time. The relevant configuration is as follows: Windows 2003 SEAGATE ST3500631NS (7200 rpm, 500 GB) LSI MegaRAID based RAID 5 4 disks, 1 hot spare Write Through No read-ahead Direct Cache Mode Harddisk-Cache-Mode: off Is this normal behaviour for such a configuration? What measurements could give further clues? Is it reasonable to reduce the priority of such copy I/O and favour other processes like the remote desktop? How would you do that? Many thanks!

Read the article

Sparing level on HP EVA 4000

- by Samuel

One of the disks of our EVA4000 died today. This diskgroup (all volumes vraid5 with sparing level 1 and almost no space left for more volumes, 1TiB drives) is being rebuilt with "spare space" right now, and it will take at least 15 hours to do the leveling/rebuilding. We can't get a new disk until Friday. So, the question is, what would happen if another disk dies before the leveling completes? Would we lose data? And after that, how many aditional disks could die before losing data? 1 or 2? In "usual" RAID, we would be vulnerable to data loss while the rebuild takes place, but in this case the space reserved for sparing is two times the size of the bigger disk, so at the very least the effect should be the same of having two spares. Thanks in advance. Update: I have found some interesting threads about this question but still can't answer to this question, so I'm starting a bounty. http://blog.thestoragearchitect.com/2008/10/27/understanding-eva/ http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&url=http%3A%2F%2Fwww.experts-exchange.com%2FStorage%2FStorage_Technology%2FQ_25548177.html (Expert Exchange question from google).

Read the article

Create a partition table on a hardware RAID1 drive with [c]fdisk

- by Lev Levitsky

My question is, is there a reason for this not to work? Details: I have two 500 Gb drives, and my motherboard RAID support, so I created a RAID1 array and booted from a Linux live medium. I then listed the disks and, apart from the obvious /dev/sda, /dev/sdb, etc. there was /dev/md126 which, I figured, was the mirrored "virtual" drive. Its size was 475 Gb; I had seen that the size of the array would be smaller than 500 Gb when I was creating it, so no surprise there. I did cfdisk /dev/md126, created the necessary partitions and chose write. It's been about half an hour now, I think. It doesn't seem like it's ever going to finish. The only thing about cfdisk in dmesg is that it's "blocked for more than 120 seconds". Doing fdisk -l /dev/md126 in another terminal I see all three partitions I created and a note that "Partition 1 does not start on a physical sector boundary". The table is lost after reboot, though. I tried to partition /dev/sda individually, and it worked, the table was written in about a second. The "not on a physical sector boundary" message is there, too. EDIT: I tried fdisk on /dev/sda, then there were no messages about sector boundaries. After a reboot, I am able to use mkfs on /dev/dm126p1, etc. fdisk shows that /dev/md126 has the same partitions as /dev/sda (but /dev/sdb doesn't have any). But at some point ("writing superblock and filesystem accounting information") mkfs is also blocked. Using it on sda1 results in a "partition is used by the system" error. What can be the problem? EDIT 2: I booted a freshly updated system from a pendrive and was able to create partition table and filesystems on /dev/md126 without any apparent problems. Was it an issue with the support of the hardware? My MB is Asus P9X79.

Read the article

Solid State Drive Occasionally Freezes For A Minute While OS Is "Beach Balling"

- by Boris_yo

Almost a month ago I bought Intel 330 128GB solid state drive, migrated my data with Intel branded limited-feature Acronis from old HDD to new SSD, optimized with Intel Toolbox and started using it. Occassionally I get close to 1 minute freezes while seeing operating system "beach balls" and animations still work, I can interact and click on something but nothing responds, nothing loads. Recently a couple of such freezes occurred in shorter amount of time in a row. I have noticed that if I stop interacting with laptop, the freeze lasts less time than if I was interacting with laptop. But the bigger problem is when freeze just does not end and computer keeps being stuck unlil it is more than half hour and I run out of patience to keep waiting and feeling I need to restart the system because I am not getting anywhere. Such freeze happened while laptop was cold booting into Windows 7. This is when freeze hang occurred and I had to restart, only later to be greeted with Windows recovery screen stating something about faiure of boot sector and asking to insert Windows repair CD. But after I restarted, Windows booted successfully and all was well. I have filmed video of freeze hang occurring in cold boot which you can see here (on video page look below for description): http://www.youtube.com/watch?v=8b7MQlcDTUs As I have mentioned in the beginning, the SSD is less than a month old but here is S.M.A.R.T statistics just in case (TRIM is enabled btw according to CrystalDiskInfo): I want to emphasize that this SSD is the only drive I have, yet it is working in RAID mode (it was enabled initially in BIOS by previous laptop's owner) on Intel Rapid Storage drivers. I am contemplating about switching to AHCI mode but want to be sure this won't cause data loss. Additionally, the stock firmware is the only firmware available currently, yet Intel does not respond to my posts in their community board. If anyone here has this SSD model or generally has experience with SSD drives, I would love to know your thoughts.

Read the article

Recover data from a Thecus N4100 NAS jbod partition

- by TimothyP

I have a Thecus N4100 NAS wich had a 2TB drive in it configured as jbod partition. Later I tried adding a second drive to expand the available space. I added it to the jbod configuration but I did not get any extra space. Then I tried removing the second drive but then the NAS system indicated the jbod configuration was damaged. After a reboot it told me there is no configuration and I need to create a new RAID/jbod configuration and that I would lose all data on the drive. Of course I did not do this, I took the 2TB drive and checked with Linux if the partition was still there and it is... completely intact. I found that linux recognizes it as a /dev/mdX device and that I should be able to mount it but I don't know the file system type. Anyway.... is there a way to recover the data? Since everything has always been on a single drive it should all be there right? I can connect the drive to Windows , Linux or MacOS so whatever gets the job done.

Read the article

Scaling databases with cheap SSD hard drives

- by Dennis Kashkin

Hey guys! I hope that many of you are working with high traffic database-driven websites, and chances are that your main scalability issues are in the database. I noticed a couple of things lately: Most large databases require a team of DBAs in order to scale. They constantly struggle with limitations of hard drives and end up with very expensive solutions (SANs or large RAIDs, frequent maintenance windows for defragging and repartitioning, etc.) The actual annual cost of maintaining such databases is in $100K-$1M range which is too steep for me :) Finally, we got several companies like Intel, Samsung, FusionIO, etc. that just started selling extremely fast yet affordable SSD hard drives based on SLC Flash technology. These drives are 100 times faster in random read/writes than the best spinning hard drives on the market (up to 50,000 random writes per second). Their seek time is pretty much zero, so the cost of random I/O is the same as sequential I/O, which is awesome for databases. These SSD drives cost around $10-$20 per gigabyte, and they are relatively small (64GB). So, there seems to be an opportunity to avoid the HUGE costs of scaling databases the traditional way by simply building a big enough RAID 5 array of SSD drives (which would cost only a few thousand dollars). Then we don't care if the database file is fragmented, and we can afford 100 times more disk writes per second without having to spread the database across 100 spindles. . Is anybody else interested in this? I've been testing a few SSD drives and can share my results. If anybody on this site has already solved their I/O bottleneck with SSDs, I would love to hear your war stories! PS. I know that there are plenty of expensive solutions out there that help with scalability, for example the time proven RAM-based SANs. I want to be clear that even $50K is too expensive for my project. I have to find a solution that costs no more than $10K and does not take much time to implement.

Read the article

Using udev to create a character device based on a driver being loaded

- by SteveCB

I'm in the process of setting up RAID monitoring for a number of Dell servers that use the PERC 6i integrated card. We're using Nagios at present and the check_megasasctl plugin seems to fit the bill. However, the plugin relies upon the existence of: /dev/megaraid_sas_ioctl_node This device node doesn't exist by default, you have to create it by hand using something like: mknod /dev/megaraid_sas_ioctl_node c 253 0 Now, to make the existence of this device node persistent across reboots, I thought I could write a udev rule, but as usual, I'm missing something. I thought I could create a file such as /etc/udev/rules.d/10-local/rules that contained: DRIVER=="megasas" NAME="megaraid_sas_ioctl_node" MODE="0600" But this doesn't work - no device node after a reboot. Dmesg output indicates the megasas driver is loaded and functional: megasas: 00.00.04.01-RH1 Thu July 10 09:41:51 PST 2008 megasas: 0x1000:0x0060:0x1028:0x1f0c: bus 1:slot 0:func 0 megasas: FW now in Ready state Further, I don't see any means to instruct udev on which type of device node to create: character or block. I suspect I'm failing to understand exactly how udev is meant to work. I realise I could just cheat and run MegaCLI in /etc/rc.local, redirecting output to /dev/null; it creates the megaraid_sas_ioctl_node device node as part of its execution. I just thought using udev rules would be a) cleaner and b) a useful learning exercise. Perhaps I should just dump the above mknod command in /etc/rc.local... So how do I get udev to create the /dev/megaraid_sas_ioctl_node device node based on the presence of the megasas driver? Cheers Steve

Read the article

Flash Backed Write Cache (FBWC) without capacitor pack

- by Martyn

I brought a HP Smart Array P410 controller and it is installed and working fine in a HP Prolient Microserver with 4 drives in two RAID 1 arrays. I didn’t realise however that it came without any cache so would only work by directly writing straight to the disk, and the performance was horrible. So I then brought the 512MB Flash Backed Write Cache (FBWC) memory module as I was under the impression that with FBWC I would not need a battery. I got this idea from a forum post. "What do you guys think of the choice between 'BBWC' (battery backed write cache) and 'FBWC' (flash backed write cache)? The flashed based ones use non-volitile memory so need no battery." After installing the cache module however the server pretty much won’t boot. The P410 has a flashing amber light on it, and from the manual that doesn’t sound good. I’ve managed to get to the on board BIOS once and even managed to get to boot to the HP Array Configuration Utility (ACU) CD once, but every other time the Server continuingly reboots once it get to the POST screen and reads ARRAY INITILIZING %%%. The one time I reached the ACU, it reported a problem with the Cache Module. To me, it seems like the cache module is faulty, however the supplier tells me “Do you have an FBWC battery pack, p/n 587324-001, because that is required for the cache to work. If you have it, please complete an RMA form and we'll send a replacement / credit.” Does this sound right to you? I’ve been ordering the parts from the US and I don’t want to spend $77 + $40 p&p on a battery, wait a week for the shipping to find the card is faulty, and I don’t want to send back a working card?

Read the article

Would an array of SSD drives be able to succesfully substitute the system memory?

- by Florin Mircea

I watched a few videos trying to answer this. This video (youtube.com/watch?v=eULFf6F5Ri8) shows a bunch of guys stacking 24 SSD's reaching a peak of around 2GBps r/w. That's under the limit of the worst DDR3 in this list (memorybenchmark.net/write_ddr3_amd.html) - that shows DDR3 memory performance varying from 2.78 to 6.55 Gb per second, but that video is over 3 years old. This video (youtube.com/watch?v=27GmBzQWwP0) shows a more optimistic situation, but for PCI-E SSD drives: 5 drives peaking at around 4Gb. And this other video shows that stacking up more than 3 SSD's doesn't realistically offer a substantial added performance. This and the fact that in all benchmarks the drives act quite poorly when dealing with small files (5k file read/write averaging from 10MB to around 30-40MBps) as opposed to how native memory handles such files, seems to indicate a definite NO to this question. Also, the write life cycle is indeed limited and the drives might wear out quickly, as kindly pointed out by paddy. However, I wanted to get more opinions on this. Would it be possible to at least obtain current memory performance with SSD's in RAID 0? And if so, in what circumstances? I am assuming using this configuration with a Windows OS that has a memory pagefile resident to that stack of SSD's, thus making it very fast to work with.

Read the article

Can a non-redundant RAID5 cause any serious problems (compared to RAID0)?

- by leemes

I used to have a three-disc RAID5 (mdadm) in my computer for personal media storage (music, videos, photos, programs, games, ...). It had three discs with 750 GB each, resulting in an array capacity of 1.5 TB. One day (one year ago), I needed one of those discs to install another operating system. I thought, I don't need the redundancy anymore since I backup the most important stuff (personal photos e.g.) on an external disc anyway. So I decided to remove one of the three discs without converting the RAID to RAID0 or even two separate discs, because I had no temporary storage (since one cannot simply convert the RAID5 to RAID0 AFAIK). So now, for about one year, I have a non-redundant RAID5 with 2 of 3 discs running. Sometimes, one of the discs has a defective contact at the power cable or something similar causing the drive to stop working temporarily (I don't know exactly what it is). Since it still works when rebooting the computer and in most cases by calling some mdadm commands, it wasn't that problematic. Note that the data is not very critical, since I still have a backup of the most important stuff. But in the last few weeks, one of the drives fails very frequently (every few hours), so it gets really annoying to manage this. My questions are: Is there any disadvantage (apart from the annoying management) of a non-redundant RAID5 (with one drive less than typical) over a RAID0? If I understand it correctly, both have no redundancy and the same capacity. On a temporary drive failure, I can restart the array in both cases, assuming that the drive itself still works after the failure. Can it happen that the drive contents alter on a drive failure, making the array inconsistent? If so, can I tell mdadm to check the array for failures (without a file system level checking tool)? Since the drive most probably only has a defective contact causing it to fail for a second only, can I tell mdadm to automatically restart the array, so I will not even notice the failure if no application wanted to access the file system during the failure?

Read the article

Why database partitioning didn't work? Extract from thedailywtf.com

- by questzen

Original link. http://thedailywtf.com/Articles/The-Certified-DBA.aspx. Article summary: The DBA suggests an approach involving rigorous partitioning, 10 partitions per disk (3 actual disks and 3 raid). The stats show that the performance is non-optimal. Then the DBA suggests an alternative of 1 partition per disk (with more added disks). This also fails. The sys-admin then sets up a single disk, single partition and saves the day. The size of disks was not mentioned but given today,s typical disk sizes (of the order of 100 GB), the partitions ; would be huge, it surprises me that a single disk with all partitions outperformed. Initially I suspect that the data was segregated and hence faster reads. But how come the performance didn't degrade as time went by with all the inserts and updates happening? Saw this on reddit, but the explanation was by far spindle/platter centered. There was no mention in the article about this. Is there any other reason? I can only guess that the tables were using a incorrect hash distribution causing non-uniform allocation across disks (wrong partitioning); this would increase fetch times. Any thoughts?

Read the article

file read performance degrades as number of files increases

- by bfallik-bamboom

We're observing poor file read IO results that we'd like to better understand. We can use fio to write 100 files with a sustained aggregate throughput of ~700MB/s. When we switch the test to read instead of write, the aggregate throughput is only ~55MB/s. The drop seems related to the number of files since the throughput for read and write are comparable for a single file then diverge proportionally as we increase the number of files. The test server has 24 CPU cores, 48GB of memory, and is running CentOS 6.0. The disk hardware is a RAID 6 array with 12 disks and a Dell H800 controller. This device is partitioned with ext4 using the default settings. Increasing the readahead (using blockdev) improves the read throughput significantly but it still doesn't match write speed. For instance, increasing the readahead from 128KB to 1M improved the read throughput to ~145MB/s. Is this a known performance issue in our OS/disk/filesystem configuration? If so, how can we tell? If not, what tools or tests can we use to further isolate the issue? Thanks.

Search Results

Search found 1900 results on 76 pages for 'xserve raid'.

Page 38/76 | < Previous Page | 34 35 36 37 38 39 40 41 42 43 44 45 | Next Page >

- by neotracker

- by Carlos

- by Dan

- by Generic Error

- by stribika

- by MojoDK

- by DaveCol

- by Rich

- by Tim Jones

- by user927586

- by Stefan

- by Alex

- by Jon Cage

- by Wikser

- by Samuel

- by Lev Levitsky

- by Boris_yo

- by TimothyP

- by Dennis Kashkin

- by SteveCB

- by Martyn

- by Florin Mircea

- by leemes

- by questzen

- by bfallik-bamboom

< Previous Page | 34 35 36 37 38 39 40 41 42 43 44 45 | Next Page >