degraded - Page 7 - Developer IT

How can a single disk in a hardware SATA RAID-10 array bring the entire array to a screeching halt?

- by Stu Thompson

Prelude: I'm a code-monkey that's increasingly taken on SysAdmin duties for my small company. My code is our product, and increasingly we provide the same app as SaaS. About 18 months ago I moved our servers from a premium hosting centric vendor to a barebones rack pusher in a tier IV data center. (Literally across the street.) This ment doing much more ourselves--things like networking, storage and monitoring. As part the big move, to replace our leased direct attached storage from the hosting company, I built a 9TB two-node NAS based on SuperMicro chassises, 3ware RAID cards, Ubuntu 10.04, two dozen SATA disks, DRBD and . It's all lovingly documented in three blog posts: Building up & testing a new 9TB SATA RAID10 NFSv4 NAS: Part I, Part II and Part III. We also setup a Cacit monitoring system. Recently we've been adding more and more data points, like SMART values. I could not have done all this without the awesome boffins at ServerFault. It's been a fun and educational experience. My boss is happy (we saved bucket loads of $$$), our customers are happy (storage costs are down), I'm happy (fun, fun, fun). Until yesterday. Outage & Recovery: Some time after lunch we started getting reports of sluggish performance from our application, an on-demand streaming media CMS. About the same time our Cacti monitoring system sent a blizzard of emails. One of the more telling alerts was a graph of iostat await. Performance became so degraded that Pingdom began sending "server down" notifications. The overall load was moderate, there was not traffic spike. After logging onto the application servers, NFS clients of the NAS, I confirmed that just about everything was experiencing highly intermittent and insanely long IO wait times. And once I hopped onto the primary NAS node itself, the same delays were evident when trying to navigate the problem array's file system. Time to fail over, that went well. Within 20 minuts everything was confirmed to be back up and running perfectly. Post-Mortem: After any and all system failures I perform a post-mortem to determine the cause of the failure. First thing I did was ssh back into the box and start reviewing logs. It was offline, completely. Time for a trip to the data center. Hardware reset, backup an and running. In /var/syslog I found this scary looking entry: Nov 15 06:49:44 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_00], 6 Currently unreadable (pending) sectors Nov 15 06:49:44 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_07], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 171 to 170 Nov 15 06:49:45 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_10], 16 Currently unreadable (pending) sectors Nov 15 06:49:45 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_10], 4 Offline uncorrectable sectors Nov 15 06:49:45 umbilo smartd[2827]: Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error Nov 15 06:49:45 umbilo smartd[2827]: # 1 Short offline Completed: read failure 90% 6576 3421766910 Nov 15 06:49:45 umbilo smartd[2827]: # 2 Short offline Completed: read failure 90% 6087 3421766910 Nov 15 06:49:45 umbilo smartd[2827]: # 3 Short offline Completed: read failure 10% 5901 656821791 Nov 15 06:49:45 umbilo smartd[2827]: # 4 Short offline Completed: read failure 90% 5818 651637856 Nov 15 06:49:45 umbilo smartd[2827]: So I went to check the Cacti graphs for the disks in the array. Here we see that, yes, disk 7 is slipping away just like syslog says it is. But we also see that disk 8's SMART Read Erros are fluctuating. There are no messages about disk 8 in syslog. More interesting is that the fluctuating values for disk 8 directly correlate to the high IO wait times! My interpretation is that: Disk 8 is experiencing an odd hardware fault that results in intermittent long operation times. Somehow this fault condition on the disk is locking up the entire array Maybe there is a more accurate or correct description, but the net result has been that the one disk is impacting the performance of the whole array. The Question(s) How can a single disk in a hardware SATA RAID-10 array bring the entire array to a screeching halt? Am I being naïve to think that the RAID card should have dealt with this? How can I prevent a single misbehaving disk from impacting the entire array? Am I missing something?

Read the article

What is bondib1 used for on SPARC SuperCluster with InfiniBand, Solaris 11 networking & Oracle RAC?

- by user12620111

A co-worker asked the following question about a SPARC SuperCluster InfiniBand network: > on the database nodes the RAC nodes communicate over the cluster_interconnect. This is the > 192.168.10.0 network on bondib0. (according to ./crs/install/crsconfig_params NETWORKS> setting) > What is bondib1 used for? Is it a HA counterpart in case bondib0 dies? This is my response: Summary: bondib1 is currently only being used for outbound cluster interconnect interconnect traffic. Details: bondib0 is the cluster_interconnect $ oifcfg getif bondeth0 10.129.184.0 global public bondib0 192.168.10.0 global cluster_interconnect ipmpapp0 192.168.30.0 global public bondib0 and bondib1 are on 192.168.10.1 and 192.168.10.2 respectively. # ipadm show-addr | grep bondi bondib0/v4static static ok 192.168.10.1/24 bondib1/v4static static ok 192.168.10.2/24 Hostnames tied to the IPs are node1-priv1 and node1-priv2 # grep 192.168.10 /etc/hosts 192.168.10.1 node1-priv1.us.oracle.com node1-priv1 192.168.10.2 node1-priv2.us.oracle.com node1-priv2 For the 4 node RAC interconnect: Each node has 2 private IP address on the 192.168.10.0 network. Each IP address has an active InfiniBand link and a failover InfiniBand link. Thus, the 4 node RAC interconnect is using a total of 8 IP addresses and 16 InfiniBand links. bondib1 isn't being used for the Virtual IP (VIP): $ srvctl config vip -n node1 VIP exists: /node1-ib-vip/192.168.30.25/192.168.30.0/255.255.255.0/ipmpapp0, hosting node node1 VIP exists: /node1-vip/10.55.184.15/10.55.184.0/255.255.255.0/bondeth0, hosting node node1 bondib1 is on bondib1_0 and fails over to bondib1_1: # ipmpstat -g GROUP GROUPNAME STATE FDT INTERFACES ipmpapp0 ipmpapp0 ok -- ipmpapp_0 (ipmpapp_1) bondeth0 bondeth0 degraded -- net2 [net5] bondib1 bondib1 ok -- bondib1_0 (bondib1_1) bondib0 bondib0 ok -- bondib0_0 (bondib0_1) bondib1_0 goes over net24 # dladm show-link | grep bond LINK CLASS MTU STATE OVER bondib0_0 part 65520 up net21 bondib0_1 part 65520 up net22 bondib1_0 part 65520 up net24 bondib1_1 part 65520 up net23 net24 is IB Partition FFFF # dladm show-ib LINK HCAGUID PORTGUID PORT STATE PKEYS net24 21280001A1868A 21280001A1868C 2 up FFFF net22 21280001CEBBDE 21280001CEBBE0 2 up FFFF,8503 net23 21280001A1868A 21280001A1868B 1 up FFFF,8503 net21 21280001CEBBDE 21280001CEBBDF 1 up FFFF On Express Module 9 port 2: # dladm show-phys -L LINK DEVICE LOC net21 ibp4 PCI-EM1/PORT1 net22 ibp5 PCI-EM1/PORT2 net23 ibp6 PCI-EM9/PORT1 net24 ibp7 PCI-EM9/PORT2 Outbound traffic on the 192.168.10.0 network will be multiplexed between bondib0 & bondib1 # netstat -rn Routing Table: IPv4 Destination Gateway Flags Ref Use Interface -------------------- -------------------- ----- ----- ---------- --------- 192.168.10.0 192.168.10.2 U 16 6551834 bondib1 192.168.10.0 192.168.10.1 U 9 5708924 bondib0 There is a lot more traffic on bondib0 than bondib1 # /bin/time snoop -I bondib0 -c 100 > /dev/null Using device ipnet/bondib0 (promiscuous mode) 100 packets captured real 4.3 user 0.0 sys 0.0 (100 packets in 4.3 seconds = 23.3 pkts/sec) # /bin/time snoop -I bondib1 -c 100 > /dev/null Using device ipnet/bondib1 (promiscuous mode) 100 packets captured real 13.3 user 0.0 sys 0.0 (100 packets in 13.3 seconds = 7.5 pkts/sec) Half of the packets on bondib0 are outbound (from self). The remaining packet are split evenly, from the other nodes in the cluster. # snoop -I bondib0 -c 100 | awk '{print $1}' | sort | uniq -c Using device ipnet/bondib0 (promiscuous mode) 100 packets captured 49 node1-priv1.us.oracle.com 24 node2-priv1.us.oracle.com 14 node3-priv1.us.oracle.com 13 node4-priv1.us.oracle.com 100% of the packets on bondib1 are outbound (from self), but the headers in the packets indicate that they are from the IP address associated with bondib0: # snoop -I bondib1 -c 100 | awk '{print $1}' | sort | uniq -c Using device ipnet/bondib1 (promiscuous mode) 100 packets captured 100 node1-priv1.us.oracle.com The destination of the bondib1 outbound packets are split evenly, to node3 and node 4. # snoop -I bondib1 -c 100 | awk '{print $3}' | sort | uniq -c Using device ipnet/bondib1 (promiscuous mode) 100 packets captured 51 node3-priv1.us.oracle.com 49 node4-priv1.us.oracle.com Conclusion: bondib1 is currently only being used for outbound cluster interconnect interconnect traffic.

Read the article

Fixing predicated NSFetchedResultsController/NSFetchRequest performance with SQLite backend?

- by Jaanus

I have a series of NSFetchedResultsControllers powering some table views, and their performance on device was abysmal, on the order of seconds. Since it all runs on main thread, it's blocking my app at startup, which is not great. I investigated and turns out the predicate is the problem: NSPredicate *somePredicate = [NSPredicate predicateWithFormat:@"ANY somethings == %@", something]; [fetchRequest setPredicate:somePredicate]; I.e the fetch entity, call it "things", has a many-to-many relation with entity "something". This predicate is a filter that limits the results to only things that have a relation with a particular "something". When I removed the predicate for testing, fetch time (the initial performFetch: call) dropped (for some extreme cases) from 4 seconds to around 100ms or less, which is acceptable. I am troubled by this, though, as it negates a lot of the benefit I was hoping to gain with Core Data and NSFRC, which otherwise seems like a powerful tool. So, my question is, how can I optimize this performance? Am I using the predicate wrong? Should I modify the model/schema somehow? And what other ways there are to fix this? Is this kind of degraded performance to be expected? (There are on the order of hundreds of <1KB objects.) EDIT WITH DETAILS: Here's the code: [fetchRequest setFetchLimit:200]; NSLog(@"before fetch"); BOOL success = [frc performFetch:&error]; if (!success) { NSLog(@"Fetch request error: %@", error); } NSLog(@"after fetch"); Updated logs (previously, I had some application inefficiencies degrading the performance here. These are the updated logs that should be as close to optimal as you can get under my current environment): 2010-02-05 12:45:22.138 Special Ppl[429:207] before fetch 2010-02-05 12:45:22.144 Special Ppl[429:207] CoreData: sql: SELECT DISTINCT 0, t0.Z_PK, t0.Z_OPT, <model fields> FROM ZTHING t0 LEFT OUTER JOIN Z_1THINGS t1 ON t0.Z_PK = t1.Z_2THINGS WHERE t1.Z_1SOMETHINGS = ? ORDER BY t0.ZID DESC LIMIT 200 2010-02-05 12:45:22.663 Special Ppl[429:207] CoreData: annotation: sql connection fetch time: 0.5094s 2010-02-05 12:45:22.668 Special Ppl[429:207] CoreData: annotation: total fetch execution time: 0.5240s for 198 rows. 2010-02-05 12:45:22.706 Special Ppl[429:207] after fetch If I do the same fetch without predicate (by commenting out the two lines in the beginning of the question): 2010-02-05 12:44:10.398 Special Ppl[414:207] before fetch 2010-02-05 12:44:10.405 Special Ppl[414:207] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, <model fields> FROM ZTHING t0 ORDER BY t0.ZID DESC LIMIT 200 2010-02-05 12:44:10.426 Special Ppl[414:207] CoreData: annotation: sql connection fetch time: 0.0125s 2010-02-05 12:44:10.431 Special Ppl[414:207] CoreData: annotation: total fetch execution time: 0.0262s for 200 rows. 2010-02-05 12:44:10.457 Special Ppl[414:207] after fetch 20-fold difference in times. 500ms is not that great, and there does not seem to be a way to do it in background thread or otherwise optimize that I can think of. (Apart from going to a binary store where this becomes a non-issue, so I might do that. Binary store performance is consistently ~100ms for the above 200-object predicated query.) (I nested another question here previously, which I now moved away).

Read the article

How to stop OpenGL from applying blending to certain content? (cocos2d/iPhone/OpenGL)

- by RexOnRoids

Supporting Info: I use cocos2d to draw a sprite (graph background) on the screen (z:-1). I then use cocos2d to draw lines/points (z:0) on top of the background -- and make some calls to OpenGL blending functions before the drawing to SMOOTH out the lines. Problem: The problem is that: aside from producing smooth lines/points, calling these OpenGL blending functions seems to degrade the underlying sprite (graph background). So there is a tradeoff: I can either have (Case 1) a nice background and choppy lines/points, or I can have (Case 2) nice smooth lines/points and a degraded background. But obviously I need both. The Code: I have included code of the draw() method of the CCLayer for both cases explained above. As you can see, the code producing the difference between Case 1 and Case 2 seems to be 1 or 2 lines involving OpenGL Blending. Case 1 -- MainScene.h (CCLayer): -(void)draw{ int lastPointX = 0; int lastPointY = 0; GLfloat colorMAX = 255.0f; GLfloat valR; GLfloat valG; GLfloat valB; if([self.myGraphManager ready]){ valR = (255.0f/colorMAX)*1.0f; valG = (255.0f/colorMAX)*1.0f; valB = (255.0f/colorMAX)*1.0f; NSEnumerator *enumerator = [[self.myGraphManager.currentCanvas graphPoints] objectEnumerator]; GraphPoint* object; while ((object = [enumerator nextObject])) { if(object.filled){ /*Commenting out the following two lines induces a problem of making it impossible to have smooth lines/points, but has merit in that it does not degrade the background sprite.*/ //glEnable (GL_BLEND); //glBlendFunc (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA); glHint (GL_LINE_SMOOTH_HINT, GL_DONT_CARE); glEnable (GL_LINE_SMOOTH); glLineWidth(1.5f); glColor4f(valR, valG, valB, 1.0); ccDrawLine(ccp(lastPointX, lastPointY), ccp(object.position.x, object.position.y)); lastPointX = object.position.x; lastPointY = object.position.y; glPointSize(3.0f); glEnable(GL_POINT_SMOOTH); glHint(GL_POINT_SMOOTH_HINT, GL_NICEST); ccDrawPoint(ccp(lastPointX, lastPointY)); } } } } Case 2 -- MainScene.h (CCLayer): -(void)draw{ int lastPointX = 0; int lastPointY = 0; GLfloat colorMAX = 255.0f; GLfloat valR; GLfloat valG; GLfloat valB; if([self.myGraphManager ready]){ valR = (255.0f/colorMAX)*1.0f; valG = (255.0f/colorMAX)*1.0f; valB = (255.0f/colorMAX)*1.0f; NSEnumerator *enumerator = [[self.myGraphManager.currentCanvas graphPoints] objectEnumerator]; GraphPoint* object; while ((object = [enumerator nextObject])) { if(object.filled){ /*Enabling the following two lines gives nice smooth lines/points, but has a problem in that it degrades the background sprite.*/ glEnable (GL_BLEND); glBlendFunc (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA); glHint (GL_LINE_SMOOTH_HINT, GL_DONT_CARE); glEnable (GL_LINE_SMOOTH); glLineWidth(1.5f); glColor4f(valR, valG, valB, 1.0); ccDrawLine(ccp(lastPointX, lastPointY), ccp(object.position.x, object.position.y)); lastPointX = object.position.x; lastPointY = object.position.y; glPointSize(3.0f); glEnable(GL_POINT_SMOOTH); glHint(GL_POINT_SMOOTH_HINT, GL_NICEST); ccDrawPoint(ccp(lastPointX, lastPointY)); } } } }

Read the article

Java algorithm for normalizing audio

- by Marty Pitt

I'm trying to normalize an audio file of speech. Specifically, where an audio file contains peaks in volume, I'm trying to level it out, so the quiet sections are louder, and the peaks are quieter. I know very little about audio manipulation, beyond what I've learnt from working on this task. Also, my math is embarrassingly weak. I've done some research, and the Xuggle site provides a sample which shows reducing the volume using the following code: (full version here) @Override public void onAudioSamples(IAudioSamplesEvent event) { // get the raw audio byes and adjust it's value ShortBuffer buffer = event.getAudioSamples().getByteBuffer().asShortBuffer(); for (int i = 0; i < buffer.limit(); ++i) buffer.put(i, (short)(buffer.get(i) * mVolume)); super.onAudioSamples(event); } Here, they modify the bytes in getAudioSamples() by a constant of mVolume. Building on this approach, I've attempted a normalisation modifies the bytes in getAudioSamples() to a normalised value, considering the max/min in the file. (See below for details). I have a simple filter to leave "silence" alone (ie., anything below a value). I'm finding that the output file is very noisy (ie., the quality is seriously degraded). I assume that the error is either in my normalisation algorithim, or the way I manipulate the bytes. However, I'm unsure of where to go next. Here's an abridged version of what I'm currently doing. Step 1: Find peaks in file: Reads the full audio file, and finds this highest and lowest values of buffer.get() for all AudioSamples @Override public void onAudioSamples(IAudioSamplesEvent event) { IAudioSamples audioSamples = event.getAudioSamples(); ShortBuffer buffer = audioSamples.getByteBuffer().asShortBuffer(); short min = Short.MAX_VALUE; short max = Short.MIN_VALUE; for (int i = 0; i < buffer.limit(); ++i) { short value = buffer.get(i); min = (short) Math.min(min, value); max = (short) Math.max(max, value); } // assign of min/max ommitted for brevity. super.onAudioSamples(event); } Step 2: Normalize all values: In a loop similar to step1, replace the buffer with normalized values, calling: buffer.put(i, normalize(buffer.get(i)); public short normalize(short value) { if (isBackgroundNoise(value)) return value; short rawMin = // min from step1 short rawMax = // max from step1 short targetRangeMin = 1000; short targetRangeMax = 8000; int abs = Math.abs(value); double a = (abs - rawMin) * (targetRangeMax - targetRangeMin); double b = (rawMax - rawMin); double result = targetRangeMin + ( a/b ); // Copy the sign of value to result. result = Math.copySign(result,value); return (short) result; } Questions: Is this a valid approach for attempting to normalize an audio file? Is my math in normalize() valid? Why would this cause the file to become noisy, where a similar approach in the demo code doesn't?

Read the article

What is stopping data flow with .NET 3.5 asynchronous System.Net.Sockets.Socket?

- by TonyG

I have a .NET 3.5 client/server socket interface using the asynchronous methods. The client connects to the server and the connection should remain open until the app terminates. The protocol consists of the following pattern: send stx receive ack send data1 receive ack send data2 (repeat 5-6 while more data) receive ack send etx So a single transaction with two datablocks as above would consist of 4 sends from the client. After sending etx the client simply waits for more data to send out, then begins the next transmission with stx. I do not want to break the connection between individual exchanges or after each stx/data/etx payload. Right now, after connection, the client can send the first stx, and get a single ack, but I can't put more data onto the wire after that. Neither side disconnects, the socket is still intact. The client code is seriously abbreviated as follows - I'm following the pattern commonly available in online code samples. private void SendReceive(string data) { // ... SocketAsyncEventArgs completeArgs; completeArgs.Completed += new EventHandler<SocketAsyncEventArgs>(OnSend); clientSocket.SendAsync(completeArgs); // two AutoResetEvents, one for send, one for receive if ( !AutoResetEvent.WaitAll(autoSendReceiveEvents , -1) ) Log("failed"); else Log("success"); // ... } private void OnSend( object sender , SocketAsyncEventArgs e ) { // ... Socket s = e.UserToken as Socket; byte[] receiveBuffer = new byte[ 4096 ]; e.SetBuffer(receiveBuffer , 0 , receiveBuffer.Length); e.Completed += new EventHandler<SocketAsyncEventArgs>(OnReceive); s.ReceiveAsync(e); // ... } private void OnReceive( object sender , SocketAsyncEventArgs e ) {} // ... if ( e.BytesTransferred > 0 ) { Int32 bytesTransferred = e.BytesTransferred; String received = Encoding.ASCII.GetString(e.Buffer , e.Offset , bytesTransferred); dataReceived += received; } autoSendReceiveEvents[ SendOperation ].Set(); // could be moved elsewhere autoSendReceiveEvents[ ReceiveOperation ].Set(); // releases mutexes } The code on the server is very similar except that it receives first and then sends a response - the server is not doing anything (that I can tell) to modify the connection after it sends a response. The problem is that the second time I hit SendReceive in the client, the connection is already in a weird state. Do I need to do something in the client to preserve the SocketAsyncEventArgs, and re-use the same object for the lifetime of the socket/connection? I'm not sure which eventargs object should hang around during the life of the connection or a given exchange. Do I need to do something, or Not do something in the server to ensure it continues to Receive data? The server setup and response processing looks like this: void Start() { // ... listenSocket.Bind(...); listenSocket.Listen(0); StartAccept(null); // note accept as soon as we start. OK? mutex.WaitOne(); } void StartAccept(SocketAsyncEventArgs acceptEventArg) { if ( acceptEventArg == null ) { acceptEventArg = new SocketAsyncEventArgs(); acceptEventArg.Completed += new EventHandler<SocketAsyncEventArgs>(OnAcceptCompleted); } Boolean willRaiseEvent = this.listenSocket.AcceptAsync(acceptEventArg); if ( !willRaiseEvent ) ProcessAccept(acceptEventArg); // ... } private void OnAcceptCompleted( object sender , SocketAsyncEventArgs e ) { ProcessAccept(e); } private void ProcessAccept( SocketAsyncEventArgs e ) { // ... SocketAsyncEventArgs readEventArgs = new SocketAsyncEventArgs(); readEventArgs.SetBuffer(dataBuffer , 0 , Int16.MaxValue); readEventArgs.Completed += new EventHandler<SocketAsyncEventArgs>(OnIOCompleted); readEventArgs.UserToken = e.AcceptSocket; dataReceived = ""; // note server is degraded for single client/thread use // As soon as the client is connected, post a receive to the connection. Boolean willRaiseEvent = e.AcceptSocket.ReceiveAsync(readEventArgs); if ( !willRaiseEvent ) this.ProcessReceive(readEventArgs); // Accept the next connection request. this.StartAccept(e); } private void OnIOCompleted( object sender , SocketAsyncEventArgs e ) { // switch ( e.LastOperation ) case SocketAsyncOperation.Receive: ProcessReceive(e); // similar to client code // operate on dataReceived here case SocketAsyncOperation.Send: ProcessSend(e); // similar to client code } // execute this when a data has been processed into a response (ack, etc) private SendResponseToClient(string response) { // create buffer with response // currentEventArgs has class scope and is re-used currentEventArgs.SetBuffer(sendBuffer , 0 , sendBuffer.Length); Boolean willRaiseEvent = currentClient.SendAsync(currentEventArgs); if ( !willRaiseEvent ) ProcessSend(currentEventArgs); } A .NET trace shows the following when sending ABC\r\n: Socket#7588182::SendAsync() Socket#7588182::SendAsync(True#1) Data from Socket#7588182::FinishOperation(SendAsync) 00000000 : 41 42 43 0D 0A Socket#7588182::ReceiveAsync() Exiting Socket#7588182::ReceiveAsync() - True#1 And it stops there. It looks just like the first send from the client but the server shows no activity. I think that could be info overload for now but I'll be happy to provide more details as required. Thanks!

Read the article

mdadm: Win7-install created a boot partition on one of my RAID6 drives. How to rebuild?

- by EXIT_FAILURE

My problem happened when I attempted to install Windows 7 on it's own SSD. The Linux OS I used which has knowledge of the software RAID system is on a SSD that I disconnected prior to the install. This was so that windows (or I) wouldn't inadvertently mess it up. However, and in retrospect, foolishly, I left the RAID disks connected, thinking that windows wouldn't be so ridiculous as to mess with a HDD that it sees as just unallocated space. Boy was I wrong! After copying over the installation files to the SSD (as expected and desired), it also created an ntfs partition on one of the RAID disks. Both unexpected and totally undesired! . I changed out the SSDs again, and booted up in linux. mdadm didn't seem to have any problem assembling the array as before, but if I tried to mount the array, I got the error message: mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so dmesg: EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)! EXT4-fs (md0): group descriptors corrupted! I then used qparted to delete the newly created ntfs partition on /dev/sdd so that it matched the other three /dev/sd{b,c,e}, and requested a resync of my array with echo repair > /sys/block/md0/md/sync_action This took around 4 hours, and upon completion, dmesg reports: md: md0: requested-resync done. A bit brief after a 4-hour task, though I'm unsure as to where other log files exist (I also seem to have messed up my sendmail configuration). In any case: No change reported according to mdadm, everything checks out. mdadm -D /dev/md0 still reports: Version : 1.2 Creation Time : Wed May 23 22:18:45 2012 Raid Level : raid6 Array Size : 3907026848 (3726.03 GiB 4000.80 GB) Used Dev Size : 1953513424 (1863.02 GiB 2000.40 GB) Raid Devices : 4 Total Devices : 4 Persistence : Superblock is persistent Update Time : Mon May 26 12:41:58 2014 State : clean Active Devices : 4 Working Devices : 4 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 4K Name : okamilinkun:0 UUID : 0c97ebf3:098864d8:126f44e3:e4337102 Events : 423 Number Major Minor RaidDevice State 0 8 16 0 active sync /dev/sdb 1 8 32 1 active sync /dev/sdc 2 8 48 2 active sync /dev/sdd 3 8 64 3 active sync /dev/sde Trying to mount it still reports: mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so and dmesg: EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)! EXT4-fs (md0): group descriptors corrupted! I'm a bit unsure where to proceed from here, and trying stuff "to see if it works" is a bit too risky for me. This is what I suggest I should attempt to do: Tell mdadm that /dev/sdd (the one that windows wrote into) isn't reliable anymore, pretend it is newly re-introduced to the array, and reconstruct its content based on the other three drives. I also could be totally wrong in my assumptions, that the creation of the ntfs partition on /dev/sdd and subsequent deletion has changed something that cannot be fixed this way. My question: Help, what should I do? If I should do what I suggested , how do I do that? From reading documentation, etc, I would think maybe: mdadm --manage /dev/md0 --set-faulty /dev/sdd mdadm --manage /dev/md0 --remove /dev/sdd mdadm --manage /dev/md0 --re-add /dev/sdd However, the documentation examples suggest /dev/sdd1, which seems strange to me, as there is no partition there as far as linux is concerned, just unallocated space. Maybe these commands won't work without. Maybe it makes sense to mirror the partition table of one of the other raid devices that weren't touched, before --re-add. Something like: sfdisk -d /dev/sdb | sfdisk /dev/sdd Bonus question: Why would the Windows 7 installation do something so st...potentially dangerous? Update I went ahead and marked /dev/sdd as faulty, and removed it (not physically) from the array: # mdadm --manage /dev/md0 --set-faulty /dev/sdd # mdadm --manage /dev/md0 --remove /dev/sdd However, attempting to --re-add was disallowed: # mdadm --manage /dev/md0 --re-add /dev/sdd mdadm: --re-add for /dev/sdd to /dev/md0 is not possible --add, was fine. # mdadm --manage /dev/md0 --add /dev/sdd mdadm -D /dev/md0 now reports the state as clean, degraded, recovering, and /dev/sdd as spare rebuilding. /proc/mdstat shows the recovery progress: md0 : active raid6 sdd[4] sdc[1] sde[3] sdb[0] 3907026848 blocks super 1.2 level 6, 4k chunk, algorithm 2 [4/3] [UU_U] [>....................] recovery = 2.1% (42887780/1953513424) finish=348.7min speed=91297K/sec nmon also shows expected output: ¦sdb 0% 87.3 0.0| > |¦ ¦sdc 71% 109.1 0.0|RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR > |¦ ¦sdd 40% 0.0 87.3|WWWWWWWWWWWWWWWWWWWW > |¦ ¦sde 0% 87.3 0.0|> || It looks good so far. Crossing my fingers for another five+ hours :) Update 2 The recovery of /dev/sdd finished, with dmesg output: [44972.599552] md: md0: recovery done. [44972.682811] RAID conf printout: [44972.682815] --- level:6 rd:4 wd:4 [44972.682817] disk 0, o:1, dev:sdb [44972.682819] disk 1, o:1, dev:sdc [44972.682820] disk 2, o:1, dev:sdd [44972.682821] disk 3, o:1, dev:sde Attempting mount /dev/md0 reports: mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so And on dmesg: [44984.159908] EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)! [44984.159912] EXT4-fs (md0): group descriptors corrupted! I'm not sure what do do now. Suggestions? Output of dumpe2fs /dev/md0: dumpe2fs 1.42.8 (20-Jun-2013) Filesystem volume name: Atlas Last mounted on: /mnt/atlas Filesystem UUID: e7bfb6a4-c907-4aa0-9b55-9528817bfd70 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 244195328 Block count: 976756712 Reserved block count: 48837835 Free blocks: 92000180 Free inodes: 243414877 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 791 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 8192 Inode blocks per group: 512 RAID stripe width: 2 Flex block group size: 16 Filesystem created: Thu May 24 07:22:41 2012 Last mount time: Sun May 25 23:44:38 2014 Last write time: Sun May 25 23:46:42 2014 Mount count: 341 Maximum mount count: -1 Last checked: Thu May 24 07:22:41 2012 Check interval: 0 (<none>) Lifetime writes: 4357 GB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 Default directory hash: half_md4 Directory Hash Seed: e177a374-0b90-4eaa-b78f-d734aae13051 Journal backup: inode blocks dumpe2fs: Corrupt extent header while reading journal super block

Read the article

Rounded Corners and Shadows – Dialogs with CSS

- by Rick Strahl

Well, it looks like we’ve finally arrived at a place where at least all of the latest versions of main stream browsers support rounded corners and box shadows. The two CSS properties that make this possible are box-shadow and box-radius. Both of these CSS Properties now supported in all the major browsers as shown in this chart from QuirksMode: In it’s simplest form you can use box-shadow and border radius like this: .boxshadow { -moz-box-shadow: 3px 3px 5px #535353; -webkit-box-shadow: 3px 3px 5px #535353; box-shadow: 3px 3px 5px #535353; } .roundbox { -moz-border-radius: 6px 6px 6px 6px; -webkit-border-radius: 6px; border-radius: 6px 6px 6px 6px; } box-shadow: horizontal-shadow-pixels vertical-shadow-pixels blur-distance shadow-color box-shadow attributes specify the the horizontal and vertical offset of the shadow, the blur distance (to give the shadow a smooth soft look) and a shadow color. The spec also supports multiple shadows separated by commas using the attributes above but we’re not using that functionality here. box-radius: top-left-radius top-right-radius bottom-right-radius bottom-left-radius border-radius takes a pixel size for the radius for each corner going clockwise. CSS 3 also specifies each of the individual corner elements such as border-top-left-radius, but support for these is much less prevalent so I would recommend not using them for now until support improves. Instead use the single box-radius to specify all corners. Browser specific Support in older Browsers Notice that there are two variations: The actual CSS 3 properties (box-shadow and box-radius) and the browser specific ones (-moz, –webkit prefixes for FireFox and Chrome/Safari respectively) which work in slightly older versions of modern browsers before official CSS 3 support was added. The goal is to spread support as widely as possible and the prefix versions extend the range slightly more to those browsers that provided early support for these features. Notice that box-shadow and border-radius are used after the browser specific versions to ensure that the latter versions get precedence if the browser supports both (last assignment wins). Use the .boxshadow and .roundbox Styles in HTML To use these two styles create a simple rounded box with a shadow you can use HTML like this:  <div class="roundbox boxshadow" style="width: 550px; border: solid 2px steelblue"> <div class="boxcontenttext"> Simple Rounded Corner Box. </div> </div> which looks like this in the browser: This works across browsers and it’s pretty sweet and simple. Watch out for nested Elements! There are a couple of things to be aware of however when using rounded corners. Specifically, you need to be careful when you nest other non-transparent content into the rounded box. For example check out what happens when I change the inside <div> to have a colored background:  <div class="roundbox boxshadow" style="width: 550px; border: solid 2px steelblue"> <div class="boxcontenttext" style="background: khaki;"> Simple Rounded Corner Box. </div> </div> which renders like this: If you look closely you’ll find that the inside <div>’s corners are not rounded and so ‘poke out’ slightly over the rounded corners. It looks like the rounded corners are ‘broken’ up instead of a solid rounded line around the corner, which his pretty ugly. The bigger the radius the more drastic this effect becomes . To fix this issue the inner <div> also has have rounded corners at the same or slightly smaller radius than the outer <div>. The simple fix for this is to simply also apply the roundbox style to the inner <div> in addition to the boxcontenttext style already applied: <div class="boxcontenttext roundbox" style="background: khaki;"> The fixed display now looks proper: Separate Top and Bottom Elements This gets even a little more tricky if you have an element at the top or bottom only of the rounded box. What if you need to add something like a header or footer <div> that have non-transparent backgrounds which is a pretty common scenario? In those cases you want only the top or bottom corners rounded and not both. To make this work a couple of additional styles to round only the top and bottom corners can be created: .roundbox-top { -moz-border-radius: 4px 4px 0 0; -webkit-border-radius: 4px 4px 0 0; border-radius: 4px 4px 0 0; } .roundbox-bottom { -moz-border-radius: 0 0 4px 4px; -webkit-border-radius: 0 0 4px 4px; border-radius: 0 0 4px 4px; } Notice that radius used for the ‘inside’ rounding is smaller (4px) than the outside radius (6px). This is so the inner radius fills into the outer border – if you use the same size you may have some white space showing between inner and out rounded corners. Experiment with values to see what works – in my experimenting the behavior across browsers here is consistent (thankfully). These styles can be applied in addition to other styles to make only the top or bottom portions of an element rounded. For example imagine I have styles like this: .gridheader, .gridheaderbig, .gridheaderleft, .gridheaderright { padding: 4px 4px 4px 4px; background: #003399 url(images/vertgradient.png) repeat-x; text-align: center; font-weight: bold; text-decoration: none; color: khaki; } .gridheaderleft { text-align: left; } .gridheaderright { text-align: right; } .gridheaderbig { font-size: 135%; } If I just apply say gridheader by itself in HTML like this: <div class="roundbox boxshadow" style="width: 550px; border: solid 2px steelblue"> <div class="gridheaderleft">Box with a Header</div> <div class="boxcontenttext" style="background: khaki;"> Simple Rounded Corner Box. </div> </div> This results in a pretty funky display – again due to the fact that the inner elements render square rather than rounded corners: If you look close again you can see that both the header and the main content have square edges which jumps out at the eye. To fix this you can now apply the roundbox-top and roundbox-bottom to the header and content respectively: <div class="roundbox boxshadow" style="width: 550px; border: solid 2px steelblue"> <div class="gridheaderleft roundbox-top">Box with a Header</div> <div class="boxcontenttext roundbox-bottom" style="background: khaki;"> Simple Rounded Corner Box. </div> </div> Which now gives the proper display with rounded corners both on the top and bottom: All of this is sweet to be supported – at least by the newest browser – without having to resort to images and nasty JavaScripts solutions. While this is still not a mainstream feature yet for the majority of actually installed browsers, the majority of browser users are very likely to have this support as most browsers other than IE are actively pushing users to upgrade to newer versions. Since this is a ‘visual display only feature it degrades reasonably well in non-supporting browsers: You get an uninteresting square and non-shadowed browser box, but the display is still overall functional. The main sticking point – as always is Internet Explorer versions 8.0 and down as well as older versions of other browsers. With those browsers you get a functional view that is a little less interesting to look at obviously: but at least it’s still functional. Maybe that’s just one more incentive for people using older browsers to upgrade to a more modern browser :-) Creating Dialog Related Styles In a lot of my AJAX based applications I use pop up windows which effectively work like dialogs. Using the simple CSS behaviors above, it’s really easy to create some fairly nice looking overlaid windows with nothing but CSS. Here’s what a typical ‘dialog’ I use looks like: The beauty of this is that it’s plain CSS – no plug-ins or images (other than the gradients which are optional) required. Add jQuery-ui draggable (or ww.jquery.js as shown below) and you have a nice simple inline implementation of a dialog represented by a simple <div> tag. Here’s the HTML for this dialog: <div id="divDialog" class="dialog boxshadow" style="width: 450px;"> <div class="dialog-header"> <div class="closebox"></div> User Sign-in </div> <div class="dialog-content"> <label>Username:</label> <input type="text" name="txtUsername" value=" " /> <label>Password</label> <input type="text" name="txtPassword" value=" " /> <hr /> <input type="button" id="btnLogin" value="Login" /> </div> <div class="dialog-statusbar">Ready</div> </div> Most of this behavior is driven by the ‘dialog’ styles which are fairly basic and easy to understand. They do use a few support images for the gradients which are provided in the sample I’ve provided. Here’s what the CSS looks like: .dialog { background: White; overflow: hidden; border: solid 1px steelblue; -moz-border-radius: 6px 6px 4px 4px; -webkit-border-radius: 6px 6px 4px 4px; border-radius: 6px 6px 3px 3px; } .dialog-header { background-image: url(images/dialogheader.png); background-repeat: repeat-x; text-align: left; color: cornsilk; padding: 5px; padding-left: 10px; font-size: 1.02em; font-weight: bold; position: relative; -moz-border-radius: 4px 4px 0px 0px; -webkit-border-radius: 4px 4px 0px 0px; border-radius: 4px 4px 0px 0px; } .dialog-top { -moz-border-radius: 4px 4px 0px 0px; -webkit-border-radius: 4px 4px 0px 0px; border-radius: 4px 4px 0px 0px; } .dialog-bottom { -moz-border-radius: 0 0 3px 3px; -webkit-border-radius: 0 0 3px 3px; border-radius: 0 0 3px 3px; } .dialog-content { padding: 15px; } .dialog-statusbar, .dialog-toolbar { background: #eeeeee; background-image: url(images/dialogstrip.png); background-repeat: repeat-x; padding: 5px; padding-left: 10px; border-top: solid 1px silver; border-bottom: solid 1px silver; font-size: 0.8em; } .dialog-statusbar { -moz-border-radius: 0 0 3px 3px; -webkit-border-radius: 0 0 3px 3px; border-radius: 0 0 3px 3px; padding-right: 10px; } .closebox { position: absolute; right: 2px; top: 2px; background-image: url(images/close.gif); background-repeat: no-repeat; width: 14px; height: 14px; cursor: pointer; opacity: 0.60; filter: alpha(opacity="80"); } .closebox:hover { opacity: 1; filter: alpha(opacity="100"); } The main style is the dialog class which is the outer box. It has the rounded border that serves as the outline. Note that I didn’t add the box-shadow to this style because in some situations I just want the rounded box in an inline display that doesn’t have a shadow so it’s still applied separately. dialog-header, then has the rounded top corners and displays a typical dialog heading format. dialog-bottom and dialog-top then provide the same functionality as roundbox-top and roundbox-bottom described earlier but are provided mainly in the stylesheet for consistency to match the dialog’s round edges and making it easier to remember and find in Intellisense as it shows up in the same dialog- group. dialog-statusbar and dialog-toolbar are two elements I use a lot for floating windows – the toolbar serves for buttons and options and filters typically, while the status bar provides information specific to the floating window. Since the the status bar is always on the bottom of the dialog it automatically handles the rounding of the bottom corners. Finally there’s closebox style which is to be applied to an empty <div> tag in the header typically. What this does is render a close image that is by default low-lighted with a low opacity value, and then highlights when hovered over. All you’d have to do handle the close operation is handle the onclick of the <div>. Note that the <div> right aligns so typically you should specify it before any other content in the header. Speaking of closable – some time ago I created a closable jQuery plug-in that basically automates this process and can be applied against ANY element in a page, automatically removing or closing the element with some simple script code. Using this you can leave out the <div> tag for closable and just do the following: To make the above dialog closable (and draggable) which makes it effectively and overlay window, you’d add jQuery.js and ww.jquery.js to the page: <script type="text/javascript" src="../../scripts/jquery.min.js"></script> <script type="text/javascript" src="../../scripts/ww.jquery.min.js"></script> and then simply call: <script type="text/javascript"> $(document).ready(function () { $("#divDialog") .draggable({ handle: ".dialog-header" }) .closable({ handle: ".dialog-header", closeHandler: function () { alert("Window about to be closed."); return true; // true closes - false leaves open } }); }); </script> * ww.jquery.js emulates base features in jQuery-ui’s draggable. If jQuery-ui is loaded its draggable version will be used instead and voila you have now have a draggable and closable window – here in mid-drag: The dragging and closable behaviors are of course optional, but it’s the final touch that provides dialog like window behavior. Relief for older Internet Explorer Versions with CSS Pie If you want to get these features to work with older versions of Internet Explorer all the way back to version 6 you can check out CSS Pie. CSS Pie provides an Internet Explorer behavior file that attaches to specific CSS rules and simulates these behavior using script code in IE (mostly by implementing filters). You can simply add the behavior to each CSS style that uses box-shadow and border-radius like this: .boxshadow { -moz-box-shadow: 3px 3px 5px #535353; -webkit-box-shadow: 3px 3px 5px #535353; box-shadow: 3px 3px 5px #535353; behavior: url(scripts/PIE.htc); } .roundbox { -moz-border-radius: 6px 6px 6px 6px; -webkit-border-radius: 6px; border-radius: 6px 6px 6px 6px; behavior: url(scripts/PIE.htc); } CSS Pie requires the PIE.htc on your server and referenced from each CSS style that needs it. Note that the url() for IE behaviors is NOT CSS file relative as other CSS resources, but rather PAGE relative , so if you have more than one folder you probably need to reference the HTC file with a fixed path like this: behavior: url(/MyApp/scripts/PIE.htc); in the style. Small price to pay, but a royal pain if you have a common CSS file you use in many applications. Once the PIE.htc file has been copied and you have applied the behavior to each style that uses these new features Internet Explorer will render rounded corners and box shadows! Yay! Hurray for box-shadow and border-radius All of this functionality is very welcome natively in the browser. If you think this is all frivolous visual candy, you might be right :-), but if you take a look on the Web and search for rounded corner solutions that predate these CSS attributes you’ll find a boatload of stuff from image files, to custom drawn content to Javascript solutions that play tricks with a few images. It’s sooooo much easier to have this functionality built in and I for one am glad to see that’s it’s finally becoming standard in the box. Still remember that when you use these new CSS features, they are not universal, and are not going to be really soon. Legacy browsers, especially old versions of Internet Explorer that can’t be updated will continue to be around and won’t work with this shiny new stuff. I say screw ‘em: Let them get a decent recent browser or see a degraded and ugly UI. We have the luxury with this functionality in that it doesn’t typically affect usability – it just doesn’t look as nice. Resources Download the Sample The sample includes the styles and images and sample page as well as ww.jquery.js for the draggable/closable example. Online Sample Check out the sample described in this post online. Closable and Draggable Documentation Documentation for the closeable and draggable plug-ins in ww.jquery.js. You can also check out the full documentation for all the plug-ins contained in ww.jquery.js here. © Rick Strahl, West Wind Technologies, 2005-2011Posted in HTML CSS

Read the article

LSI 9285-8e and Supermicro SC837E26-RJBOD1 duplicate enclosure ID and slot numbers

- by Andy Shinn

I am working with 2 x Supermicro SC837E26-RJBOD1 chassis connected to a single LSI 9285-8e card in a Supermicro 1U host. There are 28 drives in each chassis for a total of 56 drives in 28 RAID1 mirrors. The problem I am running in to is that there are duplicate slots for the 2 chassis (the slots list twice and only go from 0 to 27). All the drives also show the same enclosure ID (ID 36). However, MegaCLI -encinfo lists the 2 enclosures correctly (ID 36 and ID 65). My question is, why would this happen? Is there an option I am missing to use 2 enclosures effectively? This is blocking me rebuilding a drive that failed in slot 11 since I can only specify enclosure and slot as parameters to replace a drive. When I do this, it picks the wrong slot 11 (device ID 46 instead of device ID 19). Adapter #1 is the LSI 9285-8e, adapter #0 (which I removed due to space limitations) is the onboard LSI. Adapter information: Adapter #1 ============================================================================== Versions ================ Product Name : LSI MegaRAID SAS 9285-8e Serial No : SV12704804 FW Package Build: 23.1.1-0004 Mfg. Data ================ Mfg. Date : 06/30/11 Rework Date : 00/00/00 Revision No : 00A Battery FRU : N/A Image Versions in Flash: ================ BIOS Version : 5.25.00_4.11.05.00_0x05040000 WebBIOS Version : 6.1-20-e_20-Rel Preboot CLI Version: 05.01-04:#%00001 FW Version : 3.140.15-1320 NVDATA Version : 2.1106.03-0051 Boot Block Version : 2.04.00.00-0001 BOOT Version : 06.253.57.219 Pending Images in Flash ================ None PCI Info ================ Vendor Id : 1000 Device Id : 005b SubVendorId : 1000 SubDeviceId : 9285 Host Interface : PCIE ChipRevision : B0 Number of Frontend Port: 0 Device Interface : PCIE Number of Backend Port: 8 Port : Address 0 5003048000ee8e7f 1 5003048000ee8a7f 2 0000000000000000 3 0000000000000000 4 0000000000000000 5 0000000000000000 6 0000000000000000 7 0000000000000000 HW Configuration ================ SAS Address : 500605b0038f9210 BBU : Present Alarm : Present NVRAM : Present Serial Debugger : Present Memory : Present Flash : Present Memory Size : 1024MB TPM : Absent On board Expander: Absent Upgrade Key : Absent Temperature sensor for ROC : Present Temperature sensor for controller : Absent ROC temperature : 70 degree Celcius Settings ================ Current Time : 18:24:36 3/13, 2012 Predictive Fail Poll Interval : 300sec Interrupt Throttle Active Count : 16 Interrupt Throttle Completion : 50us Rebuild Rate : 30% PR Rate : 30% BGI Rate : 30% Check Consistency Rate : 30% Reconstruction Rate : 30% Cache Flush Interval : 4s Max Drives to Spinup at One Time : 2 Delay Among Spinup Groups : 12s Physical Drive Coercion Mode : Disabled Cluster Mode : Disabled Alarm : Enabled Auto Rebuild : Enabled Battery Warning : Enabled Ecc Bucket Size : 15 Ecc Bucket Leak Rate : 1440 Minutes Restore HotSpare on Insertion : Disabled Expose Enclosure Devices : Enabled Maintain PD Fail History : Enabled Host Request Reordering : Enabled Auto Detect BackPlane Enabled : SGPIO/i2c SEP Load Balance Mode : Auto Use FDE Only : No Security Key Assigned : No Security Key Failed : No Security Key Not Backedup : No Default LD PowerSave Policy : Controller Defined Maximum number of direct attached drives to spin up in 1 min : 10 Any Offline VD Cache Preserved : No Allow Boot with Preserved Cache : No Disable Online Controller Reset : No PFK in NVRAM : No Use disk activity for locate : No Capabilities ================ RAID Level Supported : RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span Supported Drives : SAS, SATA Allowed Mixing: Mix in Enclosure Allowed Mix of SAS/SATA of HDD type in VD Allowed Status ================ ECC Bucket Count : 0 Limitations ================ Max Arms Per VD : 32 Max Spans Per VD : 8 Max Arrays : 128 Max Number of VDs : 64 Max Parallel Commands : 1008 Max SGE Count : 60 Max Data Transfer Size : 8192 sectors Max Strips PerIO : 42 Max LD per array : 16 Min Strip Size : 8 KB Max Strip Size : 1.0 MB Max Configurable CacheCade Size: 0 GB Current Size of CacheCade : 0 GB Current Size of FW Cache : 887 MB Device Present ================ Virtual Drives : 28 Degraded : 0 Offline : 0 Physical Devices : 59 Disks : 56 Critical Disks : 0 Failed Disks : 0 Supported Adapter Operations ================ Rebuild Rate : Yes CC Rate : Yes BGI Rate : Yes Reconstruct Rate : Yes Patrol Read Rate : Yes Alarm Control : Yes Cluster Support : No BBU : No Spanning : Yes Dedicated Hot Spare : Yes Revertible Hot Spares : Yes Foreign Config Import : Yes Self Diagnostic : Yes Allow Mixed Redundancy on Array : No Global Hot Spares : Yes Deny SCSI Passthrough : No Deny SMP Passthrough : No Deny STP Passthrough : No Support Security : No Snapshot Enabled : No Support the OCE without adding drives : Yes Support PFK : Yes Support PI : No Support Boot Time PFK Change : Yes Disable Online PFK Change : No PFK TrailTime Remaining : 0 days 0 hours Support Shield State : Yes Block SSD Write Disk Cache Change: Yes Supported VD Operations ================ Read Policy : Yes Write Policy : Yes IO Policy : Yes Access Policy : Yes Disk Cache Policy : Yes Reconstruction : Yes Deny Locate : No Deny CC : No Allow Ctrl Encryption: No Enable LDBBM : No Support Breakmirror : No Power Savings : Yes Supported PD Operations ================ Force Online : Yes Force Offline : Yes Force Rebuild : Yes Deny Force Failed : No Deny Force Good/Bad : No Deny Missing Replace : No Deny Clear : No Deny Locate : No Support Temperature : Yes Disable Copyback : No Enable JBOD : No Enable Copyback on SMART : No Enable Copyback to SSD on SMART Error : Yes Enable SSD Patrol Read : No PR Correct Unconfigured Areas : Yes Enable Spin Down of UnConfigured Drives : Yes Disable Spin Down of hot spares : No Spin Down time : 30 T10 Power State : Yes Error Counters ================ Memory Correctable Errors : 0 Memory Uncorrectable Errors : 0 Cluster Information ================ Cluster Permitted : No Cluster Active : No Default Settings ================ Phy Polarity : 0 Phy PolaritySplit : 0 Background Rate : 30 Strip Size : 64kB Flush Time : 4 seconds Write Policy : WB Read Policy : Adaptive Cache When BBU Bad : Disabled Cached IO : No SMART Mode : Mode 6 Alarm Disable : Yes Coercion Mode : None ZCR Config : Unknown Dirty LED Shows Drive Activity : No BIOS Continue on Error : No Spin Down Mode : None Allowed Device Type : SAS/SATA Mix Allow Mix in Enclosure : Yes Allow HDD SAS/SATA Mix in VD : Yes Allow SSD SAS/SATA Mix in VD : No Allow HDD/SSD Mix in VD : No Allow SATA in Cluster : No Max Chained Enclosures : 16 Disable Ctrl-R : Yes Enable Web BIOS : Yes Direct PD Mapping : No BIOS Enumerate VDs : Yes Restore Hot Spare on Insertion : No Expose Enclosure Devices : Yes Maintain PD Fail History : Yes Disable Puncturing : No Zero Based Enclosure Enumeration : No PreBoot CLI Enabled : Yes LED Show Drive Activity : Yes Cluster Disable : Yes SAS Disable : No Auto Detect BackPlane Enable : SGPIO/i2c SEP Use FDE Only : No Enable Led Header : No Delay during POST : 0 EnableCrashDump : No Disable Online Controller Reset : No EnableLDBBM : No Un-Certified Hard Disk Drives : Allow Treat Single span R1E as R10 : No Max LD per array : 16 Power Saving option : Don't Auto spin down Configured Drives Max power savings option is not allowed for LDs. Only T10 power conditions are to be used. Default spin down time in minutes: 30 Enable JBOD : No TTY Log In Flash : No Auto Enhanced Import : No BreakMirror RAID Support : No Disable Join Mirror : No Enable Shield State : Yes Time taken to detect CME : 60s Exit Code: 0x00 Enclosure information: # /opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a1 Number of enclosures on adapter 1 -- 3 Enclosure 0: Device ID : 36 Number of Slots : 28 Number of Power Supplies : 2 Number of Fans : 3 Number of Temperature Sensors : 1 Number of Alarms : 1 Number of SIM Modules : 0 Number of Physical Drives : 28 Status : Normal Position : 1 Connector Name : Port B Enclosure type : SES VendorId is LSI CORP and Product Id is SAS2X36 VendorID and Product ID didnt match FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : 65 Inquiry data : Vendor Identification : LSI CORP Product Identification : SAS2X36 Product Revision Level : 0718 Vendor Specific : x36-55.7.24.1 Number of Voltage Sensors :2 Voltage Sensor :0 Voltage Sensor Status :OK Voltage Value :5020 milli volts Voltage Sensor :1 Voltage Sensor Status :OK Voltage Value :11820 milli volts Number of Power Supplies : 2 Power Supply : 0 Power Supply Status : OK Power Supply : 1 Power Supply Status : OK Number of Fans : 3 Fan : 0 Fan Speed :Low Speed Fan Status : OK Fan : 1 Fan Speed :Low Speed Fan Status : OK Fan : 2 Fan Speed :Low Speed Fan Status : OK Number of Temperature Sensors : 1 Temp Sensor : 0 Temperature : 48 Temperature Sensor Status : OK Number of Chassis : 1 Chassis : 0 Chassis Status : OK Enclosure 1: Device ID : 65 Number of Slots : 28 Number of Power Supplies : 2 Number of Fans : 3 Number of Temperature Sensors : 1 Number of Alarms : 1 Number of SIM Modules : 0 Number of Physical Drives : 28 Status : Normal Position : 1 Connector Name : Port A Enclosure type : SES VendorId is LSI CORP and Product Id is SAS2X36 VendorID and Product ID didnt match FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : 36 Inquiry data : Vendor Identification : LSI CORP Product Identification : SAS2X36 Product Revision Level : 0718 Vendor Specific : x36-55.7.24.1 Number of Voltage Sensors :2 Voltage Sensor :0 Voltage Sensor Status :OK Voltage Value :5020 milli volts Voltage Sensor :1 Voltage Sensor Status :OK Voltage Value :11760 milli volts Number of Power Supplies : 2 Power Supply : 0 Power Supply Status : OK Power Supply : 1 Power Supply Status : OK Number of Fans : 3 Fan : 0 Fan Speed :Low Speed Fan Status : OK Fan : 1 Fan Speed :Low Speed Fan Status : OK Fan : 2 Fan Speed :Low Speed Fan Status : OK Number of Temperature Sensors : 1 Temp Sensor : 0 Temperature : 47 Temperature Sensor Status : OK Number of Chassis : 1 Chassis : 0 Chassis Status : OK Enclosure 2: Device ID : 252 Number of Slots : 8 Number of Power Supplies : 0 Number of Fans : 0 Number of Temperature Sensors : 0 Number of Alarms : 0 Number of SIM Modules : 1 Number of Physical Drives : 0 Status : Normal Position : 1 Connector Name : Unavailable Enclosure type : SGPIO Failed in first Inquiry commnad FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : Unavailable Inquiry data : Vendor Identification : LSI Product Identification : SGPIO Product Revision Level : N/A Vendor Specific : Exit Code: 0x00 Now, notice that each slot 11 device shows an enclosure ID of 36, I think this is where the discrepancy happens. One should be 36. But the other should be on enclosure 65. Drives in slot 11: Enclosure Device ID: 36 Slot Number: 11 Drive's postion: DiskGroup: 5, Span: 0, Arm: 1 Enclosure position: 0 Device Id: 48 WWN: Sequence Number: 11 Media Error Count: 0 Other Error Count: 0 Predictive Failure Count: 0 Last Predictive Failure Event Seq Number: 0 PD Type: SATA Raw Size: 2.728 TB [0x15d50a3b0 Sectors] Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors] Coerced Size: 2.728 TB [0x15d400000 Sectors] Firmware state: Online, Spun Up Is Commissioned Spare : YES Device Firmware Level: A5C0 Shield Counter: 0 Successful diagnostics completion on : N/A SAS Address(0): 0x5003048000ee8a53 Connected Port Number: 1(path0) Inquiry Data: MJ1311YNG6YYXAHitachi HDS5C3030ALA630 MEAOA5C0 FDE Enable: Disable Secured: Unsecured Locked: Unlocked Needs EKM Attention: No Foreign State: None Device Speed: 6.0Gb/s Link Speed: 6.0Gb/s Media Type: Hard Disk Device Drive Temperature :30C (86.00 F) PI Eligibility: No Drive is formatted for PI information: No PI: No PI Drive's write cache : Disabled Drive's NCQ setting : Enabled Port-0 : Port status: Active Port's Linkspeed: 6.0Gb/s Drive has flagged a S.M.A.R.T alert : No Enclosure Device ID: 36 Slot Number: 11 Drive's postion: DiskGroup: 19, Span: 0, Arm: 1 Enclosure position: 0 Device Id: 19 WWN: Sequence Number: 4 Media Error Count: 0 Other Error Count: 0 Predictive Failure Count: 0 Last Predictive Failure Event Seq Number: 0 PD Type: SATA Raw Size: 2.728 TB [0x15d50a3b0 Sectors] Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors] Coerced Size: 2.728 TB [0x15d400000 Sectors] Firmware state: Online, Spun Up Is Commissioned Spare : NO Device Firmware Level: A580 Shield Counter: 0 Successful diagnostics completion on : N/A SAS Address(0): 0x5003048000ee8e53 Connected Port Number: 0(path0) Inquiry Data: MJ1313YNG1VA5CHitachi HDS5C3030ALA630 MEAOA580 FDE Enable: Disable Secured: Unsecured Locked: Unlocked Needs EKM Attention: No Foreign State: None Device Speed: 6.0Gb/s Link Speed: 6.0Gb/s Media Type: Hard Disk Device Drive Temperature :30C (86.00 F) PI Eligibility: No Drive is formatted for PI information: No PI: No PI Drive's write cache : Disabled Drive's NCQ setting : Enabled Port-0 : Port status: Active Port's Linkspeed: 6.0Gb/s Drive has flagged a S.M.A.R.T alert : No Update 06/28/12: I finally have some new information about (what we think) the root cause of this problem so I thought I would share. After getting in contact with a very knowledgeable Supermicro tech, they provided us with a tool called Xflash (doesn't appear to be readily available on their FTP). When we gathered some information using this utility, my colleague found something very strange: root@mogile2 test]# ./xflash.dat -i get avail Initializing Interface. Expander: SAS2X36 (SAS2x36) 1) SAS2X36 (SAS2x36) (50030480:00EE917F) (0.0.0.0) 2) SAS2X36 (SAS2x36) (50030480:00E9D67F) (0.0.0.0) 3) SAS2X36 (SAS2x36) (50030480:0112D97F) (0.0.0.0) This lists the connected enclosures. You see the 3 connected (we have since added a 3rd and a 4th which is not yet showing up) with their respective SAS address / WWN (50030480:00EE917F). Now we can use this address to get information on the individual enclosures: [root@mogile2 test]# ./xflash.dat -i 5003048000EE917F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:00EE917F Enclosure Logical Id: 50030480:0000007F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 [root@mogile2 test]# ./xflash.dat -i 5003048000E9D67F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:00E9D67F Enclosure Logical Id: 50030480:0000007F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 [root@mogile2 test]# ./xflash.dat -i 500304800112D97F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:0112D97F Enclosure Logical Id: 50030480:0112D97F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 Did you catch it? The first 2 enclosures logical ID is partially masked out where the 3rd one (which has a correct unique enclosure ID) is not. We pointed this out to Supermicro and were able to confirm that this address is supposed to be set during manufacturing and there was a problem with a certain batch of these enclosures where the logical ID was not set. We believe that the RAID controller is determining the ID based on the logical ID and since our first 2 enclosures have the same logical ID, they get the same enclosure ID. We also confirmed that 0000007F is the default which comes from LSI as an ID. The next pointer that helps confirm this could be a manufacturing problem with a run of JBODs is the fact that all 6 of the enclosures that have this problem begin with 00E. I believe that between 00E8 and 00EE Supermicro forgot to program the logical IDs correctly and neglected to recall or fix the problem post production. Fortunately for us, there is a tool to manage the WWN and logical ID of the devices from Supermicro: ftp://ftp.supermicro.com/utility/ExpanderXtools_Lite/. Our next step is to schedule a shutdown of these JBODs (after data migration) and reprogram the logical ID and see if it solves the problem. Update 06/28/12 #2: I just discovered this FAQ at Supermicro while Google searching for "lsi 0000007f": http://www.supermicro.com/support/faqs/faq.cfm?faq=11805. I still don't understand why, in the last several times we contacted Supermicro, they would have never directed us to this article :\

Developer IT