Search Results

Search found 159 results on 7 pages for 'degraded'.

Page 6/7 | < Previous Page | 2 3 4 5 6 7 | Next Page >

Performance degrades for more than 2 threads on Xeon X5355

- by zoolii

Hi All, I am writing an application using boost threads and using boost barriers to synchronize the threads. I have two machines to test the application. Machine 1 is a core2 duo (T8300) cpu machine (windows XP professional - 4GB RAM) where I am getting following performance figures : Number of threads :1 , TPS :21 Number of threads :2 , TPS :35 (66 % improvement) further increase in number of threads decreases the TPS but that is understandable as the machine has only two cores. Machine 2 is a 2 quad core ( Xeon X5355) cpu machine (windows 2003 server with 4GB RAM) and has 8 effective cores. Number of threads :1 , TPS :21 Number of threads :2 , TPS :27 (28 % improvement) Number of threads :4 , TPS :25 Number of threads :8 , TPS :24 As you can see, performance is degrading after 2 threads (though it has 8 cores). If the program has some bottle neck , then for 2 thread also it should have degraded. Any idea? , Explanations ? , Does the OS has some role in performance ? - It seems like the Core2duo (2.4GHz) scales better than Xeon X5355 (2.66GHz) though it has better clock speed. Thank you -Zoolii

Read the article
Response time increasing (worsening) over time with consistent load

- by NJ

Ok. I know I don't have a lot of information. That is, essentially, the reason for my question. I am building a game using Flash/Flex and Rails on the back-end. Communication between the two is via WebORB. Here is what is happening. When I start the client an operation calls the server every 60 seconds (not much, right?) which results in two database SELECTS and an UPDATE and a resulting response to the client. This repeats every 60 seconds. I deployed a test version on heroku and NewRelic's RPM told me that response time degraded over time. One client with one task every 60 seconds. Over several hours the response time drifted from 150ms to over 900ms in response time. I have been able to reproduce this in my development environment (my Macbook Pro) so it isn't a problem on Heroku's side. I am not doing anything sophisticated (by design) in the server app. An action gets called, gets some data from the database, performs an AR update and then returns a response. No caching, etc. Any thoughts? Anyone? I'd really appreciate it.

Read the article
detection of 'flush tables with read lock' in php

- by theduke0

I would like to know from my application if a myisam table can accept writes (i.e. not locked). If an exception is thrown, everything is fine as I can catch this and log the failed statement to a file. However, if a 'flush tables with read lock' command has been issued (possibly for backup), the query I send will pretty much hang out forever. If one table is locked at a time, insert delayed works well. But when this global lock is applied, my query just waits. The query I run is an insert statement. If this statement fails or hangs, user experience is degraded. I need a way to send the query to the server and forget about it (pretty much). Does anyone have any suggestions on how to deal with this? -set a query timeout? -run asyncronous request and allow for the lock to expire while application continues? -fork my php process? Please let me know if I can provide and clarification or details.

Read the article
Split a string by comma, quote and full-stop.. with a few exceptions

- by dunc

I've got a lot of text, similar to the following paragraph, which I'd like to split into words without punctuation (', ", ,, ., newline etc).. with a few exceptions. Initially considered endemic to the Chalakudy River system in Kerala state, southern India, but now recognised to have a wider distribution in surrounding drainages including the Periyar, Manimala, and Pamba river though the Manimala data may be questionable given it seems to be the type locality of P. denisonii. In the Achankovil River basin it occurs sympatrically, and sometimes syntopically, with P. denisonii. Wild stocks may have dwindled by as much as 50% in the last 15 years or so with collection for the aquarium trade largely held responsible although habitats are also being degraded by pollution from agricultural and domestic sources, plus destructive fishing methods involving explosives or organic toxins. The text refers to P. denisonii which is a species of fish. It's an abbreviation of Genus species. I would like this reference to be one word. So, for instance, this is the kind of array I'd like to see: Array ( ... [44] given [45] it [46] seems [47] to [48] be [49] the [50] type [51] locality [52] of [53] P. denisonii [54] In [55] the ... ) The only things that distinguish these species references such as P. denisonii from a new sentence like end. New are: The P (for Puntius, as in the P. in the aforementioned example) is only ever one letter, always a capital the d (as in . denisonii) is always either a lower case letter or an apostrophe (') What regexp can I use with preg_split to give me such an array? I've tried a simple explode( " ", $array ) but it doesn't do the job at all. Thanks in advance,

Read the article
Failed MDADM Array With Ext.4 Partition - "e2fsck: unable to set superblock flags on /dev/md0"

- by Matthew Hodgkins

Had a power failure and now my mdadm array is having problems. sudo mdadm -D /dev/md0 [hodge@hodge-fs ~]$ sudo mdadm -D /dev/md0 /dev/md0: Version : 0.90 Creation Time : Sun Apr 25 01:39:25 2010 Raid Level : raid5 Array Size : 8790815232 (8383.57 GiB 9001.79 GB) Used Dev Size : 1465135872 (1397.26 GiB 1500.30 GB) Raid Devices : 7 Total Devices : 7 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Sat Aug 7 19:10:28 2010 State : clean, degraded, recovering Active Devices : 6 Working Devices : 7 Failed Devices : 0 Spare Devices : 1 Layout : left-symmetric Chunk Size : 128K Rebuild Status : 10% complete UUID : 44a8f730:b9bea6ea:3a28392c:12b22235 (local to host hodge-fs) Events : 0.1307608 Number Major Minor RaidDevice State 0 8 81 0 active sync /dev/sdf1 1 8 97 1 active sync /dev/sdg1 2 8 113 2 active sync /dev/sdh1 3 8 65 3 active sync /dev/sde1 4 8 49 4 active sync /dev/sdd1 7 8 33 5 spare rebuilding /dev/sdc1 6 8 16 6 active sync /dev/sdb sudo mount -a [hodge@hodge-fs ~]$ sudo mount -a mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so sudo fsck.ext4 /dev/md0 [hodge@hodge-fs ~]$ sudo fsck.ext4 /dev/md0 e2fsck 1.41.12 (17-May-2010) fsck.ext4: Group descriptors look bad... trying backup blocks... /dev/md0: recovering journal fsck.ext4: unable to set superblock flags on /dev/md0 sudo dumpe2fs /dev/md0 | grep -i superblock [hodge@hodge-fs ~]$ sudo dumpe2fs /dev/md0 | grep -i superblock dumpe2fs 1.41.12 (17-May-2010) Primary superblock at 0, Group descriptors at 1-524 Backup superblock at 32768, Group descriptors at 32769-33292 Backup superblock at 98304, Group descriptors at 98305-98828 Backup superblock at 163840, Group descriptors at 163841-164364 Backup superblock at 229376, Group descriptors at 229377-229900 Backup superblock at 294912, Group descriptors at 294913-295436 Backup superblock at 819200, Group descriptors at 819201-819724 Backup superblock at 884736, Group descriptors at 884737-885260 Backup superblock at 1605632, Group descriptors at 1605633-1606156 Backup superblock at 2654208, Group descriptors at 2654209-2654732 Backup superblock at 4096000, Group descriptors at 4096001-4096524 Backup superblock at 7962624, Group descriptors at 7962625-7963148 Backup superblock at 11239424, Group descriptors at 11239425-11239948 Backup superblock at 20480000, Group descriptors at 20480001-20480524 Backup superblock at 23887872, Group descriptors at 23887873-23888396 Backup superblock at 71663616, Group descriptors at 71663617-71664140 Backup superblock at 78675968, Group descriptors at 78675969-78676492 Backup superblock at 102400000, Group descriptors at 102400001-102400524 Backup superblock at 214990848, Group descriptors at 214990849-214991372 Backup superblock at 512000000, Group descriptors at 512000001-512000524 Backup superblock at 550731776, Group descriptors at 550731777-550732300 Backup superblock at 644972544, Group descriptors at 644972545-644973068 Backup superblock at 1934917632, Group descriptors at 1934917633-1934918156 sudo e2fsck -b 32768 /dev/md0 [hodge@hodge-fs ~]$ sudo e2fsck -b 32768 /dev/md0 e2fsck 1.41.12 (17-May-2010) /dev/md0: recovering journal e2fsck: unable to set superblock flags on /dev/md0 sudo dmesg | tail [hodge@hodge-fs ~]$ sudo dmesg | tail EXT4-fs (md0): ext4_check_descriptors: Checksum for group 0 failed (59837!=29115) EXT4-fs (md0): group descriptors corrupted! EXT4-fs (md0): ext4_check_descriptors: Checksum for group 0 failed (59837!=29115) EXT4-fs (md0): group descriptors corrupted! Please Help!!!

Read the article
Linux boot on a raid1 software raid ?

- by azera

Hello I am trying to convert my single disk boot to a raid1 boot So far here is what i have: I sucessfully create the raid 1 as degraded with the new drive alone, I copied all the data on it I can mount that raid 1, see its files etc I already have a raid5 that is working on the same box (although not booting on it) I have installed grub on both drive When grub boot, it loads the kernel alright, but during the kernel boot it fails to load the "root block device" The kernel tells me : 1 - detected that root device is an md device 2 - determining root devices 3 - mounting root 4 - mounting /dev/md125 on /newroot failed: input/output error. Please enter another root device: ... At this point, if I enter /dev/sda3 (my "old" root device that isn't converted to raid yet) everything boots fine without the root. The /dev/md125 device is indeed created but it seems to be created after the error happens, as in it creates it after loading the device, when mdadm is loaded. Somehow it looks like it can't/doesn't load the raid array before it needs to mount it, and I don't know how I can solve that. My config files (taken from the system once it boots with sda3 as root device): $ cat /etc/mdadm.conf ARRAY /dev/md/md0-r5 metadata=0.90 UUID=1a118934:c831bdb3:64188b84:66721085 ARRAY /dev/md125 metadata=0.90 UUID=48ec4190:a80d4dde:64188b84:66721085 $ cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] [raid0] [raid10] md125 : active raid1 sdc3[1] 477853312 blocks [2/1] [_U] md127 : active raid5 sdd[0] sdf[3] sdb[2] sde[1] 4395415488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] unused devices: <none> $ cat /boot/grub/menu.lst default 0 timeout 8 splashimage=(hd0,0)/boot/grub/splash.xpm.gz title Gentoo Linux 2.6.31-r10 root (hd0,0) #kernel /boot/kernel-genkernel-x86_64-2.6.31-gentoo-r10 root=/dev/ram0 real_root=/dev/sda3 kernel /boot/kernel-genkernel-x86_64-2.6.31-gentoo-r10 root=/dev/md125 md=125,/dev/sdc3,/dev/sda3 initrd /boot/initramfs-genkernel-x86_64-2.6.31-gentoo-r10 # blkid /dev/sda1: UUID="89fee223-b845-4e0a-8a0b-e6cf695d5bcf" TYPE="ext2" /dev/sda2: UUID="a72296a8-d7d4-447f-a34b-ee920fd1a767" TYPE="swap" /dev/sda3: UUID="97eb0a6a-c385-4a9d-bf74-c0bab1fa4dc1" TYPE="ext3" /dev/sdb: UUID="1a118934-c831-bdb3-6418-8b8466721085" TYPE="linux_raid_member" /dev/sdc1: UUID="d36537fd-19a0-b8a3-6418-8b8466721085" TYPE="linux_raid_member" /dev/sdd: UUID="1a118934-c831-bdb3-6418-8b8466721085" TYPE="linux_raid_member" /dev/sde: UUID="1a118934-c831-bdb3-6418-8b8466721085" TYPE="linux_raid_member" /dev/md127: UUID="13a41589-4cf1-4c04-91ca-37484182c783" TYPE="ext4" /dev/sdf: UUID="1a118934-c831-bdb3-6418-8b8466721085" TYPE="linux_raid_member" /dev/sdc2: UUID="a1916397-1b48-45d7-9f98-73aa521e882f" TYPE="swap" /dev/sdc3: UUID="48ec4190-a80d-4dde-6418-8b8466721085" TYPE="linux_raid_member" /dev/md125: UUID="c947ed64-1d4d-4d1d-b4d2-24669fff916e" SEC_TYPE="ext2" TYPE="ext3" # mdadm -E mdadm: No devices to examine # fdisk -l Disk /dev/sda: 500.1 GB, 500107862016 bytes 255 heads, 63 sectors/track, 60801 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xe975e9fc Device Boot Start End Blocks Id System /dev/sda1 1 5 40131 83 Linux /dev/sda2 6 1311 10490445 82 Linux swap / Solaris /dev/sda3 1312 60801 477853425 83 Linux Disk /dev/sdc: 500.1 GB, 500107862016 bytes 255 heads, 63 sectors/track, 60801 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xe975e9fc Device Boot Start End Blocks Id System /dev/sdc1 1 5 40131 83 Linux /dev/sdc2 6 1311 10490445 82 Linux swap / Solaris /dev/sdc3 1312 60801 477853425 83 Linux Disk /dev/md125: 489.3 GB, 489321791488 bytes 2 heads, 4 sectors/track, 119463328 cylinders Units = cylinders of 8 * 512 = 4096 bytes Disk identifier: 0x00000000 Disk /dev/md125 doesn't contain a valid partition table

Read the article
Why Does Wireless Gear Degrade Over Time?

- by bahamat

I saw this originally posted on slashdot, but their comment format is not conducive to actually getting a correct answer. Having directly experienced this phenomenon myself, I'm now asking here where I think I can actually get an educated answer. Here's the original question verbatim: Lately I have replaced several home wireless routers because the signal strength has been found to be degraded. These devices, when new (2+ years ago) would cover an entire house. Over the years, the strength seems to decrease to a point where it might only cover one or two rooms. Of the three that I have replaced for friends, I have not found a common brand, age, etc. It just seems that after time, the signal strength decreases. I know that routers are cheap and easy to replace but I'm curious what actually causes this. I would have assumed that the components would either work or not work; we would either have a full signal or have no signal. I am not an electrical engineer and I can't find the answer online so I'm reaching out to you. Can someone explain how a transmitter can slowly go bad? Common (incorrect, but repeated) answers from slashdot include: Back then your neighbors didn't have wifi, now they do. They drowning you out. I don't think this is likely because replacing the access point with a new one and using the same frequencies solves the problem. Older devices had low transmit power. Crank that baby. As mentioned by a FreeBSD wireless developer this violates regulations and can physically damage the equipment. It was also mentioned that higher power in one direction is not necessarily reciprocated. This shows higher bars, but not necessarily a better connection. Manufacturers make cheap crap designed to wear out. This one actually may be legitimate although it is overly broad. What specifically causes damage over time? Heat? Excessive power? So can anyone provide an informed answer on this? Is there any way to fix these older access points?

Read the article
Moving Windows XP from ICH10R RAID 5 to single disk using Linux [migrated]

- by tudor

A friend's machine running Windows XP refused to boot recently which is running 3 SATA disks on RAID 5 (which was previously upgraded from RAID 1 not by me). I have determined there to be a disk failure. The disks have been replaced many times in the past few years. I wish to backup the RAID5 partition before I try anything to fix it. The RAID chipset used is ICH10R/DO. So, I plugged in an extra IDE drive and an Ubuntu USB key and looked at the RAID. The partitioning is a mess, but I did find at least one degraded but working RAID array with two partitions, one 79GB and the other 86GB. Then I: 1) Partitioned my IDE disk using fdisk to have a partition of 80GB and bootable, and marked as NTFS. 2) dd the contents of the array to the partition 3) disconnected everything else 4) inserted a Windows XP CD and ran fixboot, fixmbr, and bootcfg. They all run ok and claim that they worked. (e.g. bootcfg detects the Windows partition, fixboot returns saying that it was written correctly.) However, I'm still getting an error like "DISK FAILURE, BOOT DISK NOT FOUND". I have tried running the GRUB rescue disk, which also runs ok, but won't boot into Windows. It just stops with a flashing cursor after chainloader +1, boot. One clue may be that the partitions appear to be wack. One disk has a 79GB RAID partition on a 500GB drive with a offset, the second disk has a 320GB RAID partition across the whole drive. Additionally, the BIOS lists the RAID size as being 149GB. I don't see how this works. How are they even assembling the array when the partitions are so different? I have also tried running the Windows XP automated repair tool, but that didn't work either. I'm presuming this is something simple. Perhaps Windows is attempting to boot into RAID and, upon not finding it, simply crashing? Perhaps the 79GB partitions offset means that it's looking into the disk by that much? Please help!! To clarify: I want to make the single IDE disk bootable with a copy of the array so that I can prove/disprove that it's just that Windows has become corrupted, and use windows tools to correct it before attempting the same thing on the RAID array. That way I have a working backup and can show the process I used to fix it.

Read the article
Raid1 with active and spare partition

- by Daniel Baron

I am having the following problem with a RAID1 software raid partition on my Ubuntu system (10.04 LTS, 2.6.32-24-server in case it matters). One of my disks (sdb5) reported I/O errors and was therefore marked faulty in the array. The array was then degraded with one active device. Hence, I replaced the harddisk, cloned the partition table and added all new partitions to my raid arrays. After syncing all partitions ended up fine, having 2 active devices - except one of them. The partition which reported the faulty disk before, however, did not include the new partition as an active device but as a spare disk: md3 : active raid1 sdb5[2] sda5[1] 4881344 blocks [2/1] [_U] A detailed look reveals: root@server:~# mdadm --detail /dev/md3 [...] Number Major Minor RaidDevice State 2 8 21 0 spare rebuilding /dev/sdb5 1 8 5 1 active sync /dev/sda5 So here is the question: How do I tell my raid to turn the spare disk into an active one? And why has it been added as a spare device? Recreating or reassembling the array is not an option, because it is my root partition. And I can not find any hints to that subject in the Software Raid HOWTO. Any help would be appreciated. Current Solution I found a solution to my problem, but I am not sure that this is the actual way to do it. Having a closer look at my raid I found that sdb5 was always listed as a spare device: mdadm --examine /dev/sdb5 [...] Number Major Minor RaidDevice State this 2 8 21 2 spare /dev/sdb5 0 0 0 0 0 removed 1 1 8 5 1 active sync /dev/sda5 2 2 8 21 2 spare /dev/sdb5 so readding the device sdb5 to the array md3 always ended up in adding the device as a spare. Finally I just recreated the array mdadm --create /dev/md3 --level=1 -n2 -x0 /dev/sda5 /dev/sdb5 which worked. But the question remains open for me: Is there a better way to manipulate the summaries in the superblock and to tell the array to turn sdb5 from a spare disk to an active disk? I am still curious for an answer.

Read the article
How do I tell mdadm to start using a missing disk in my RAID5 array again?

- by Jon Cage

I have a 3-disk RAID array running in my Ubuntu server. This has been running flawlessly for over a year but I was recently forced to strip, move and rebuild the machine. When I had it all back together and ran up Ubuntu, I had some problems with disks not being detected. A couple of reboots later and I'd solved that issue. The problem now is that the 3-disk array is showing up as degraded every time I boot up. For some reason it seems that Ubuntu has made a new array and added the missing disk to it. I've tried stopping the new 1-disk array and adding the missing disk, but I'm struggling. On startup I get this: root@uberserver:~# cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md_d1 : inactive sdf1[2](S) 1953511936 blocks md0 : active raid5 sdg1[2] sdc1[3] sdb1[1] sdh1[0] 2930279808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] I have two RAID arrays and the one that normally pops up as md1 isn't appearing. I read somewhere that calling mdadm --assemble --scan would re-assemble the missing array so I've tried first stopping the existing array that ubuntu started: root@uberserver:~# mdadm --stop /dev/md_d1 mdadm: stopped /dev/md_d1 ...and then tried to tell ubuntu to pick the disks up again: root@uberserver:~# mdadm --assemble --scan mdadm: /dev/md/1 has been started with 2 drives (out of 3). So that's started md1 again but it's not picking up the disk from md_d1: root@uberserver:~# cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md1 : active raid5 sde1[1] sdf1[2] 3907023872 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU] md_d1 : inactive sdd1[0](S) 1953511936 blocks md0 : active raid5 sdg1[2] sdc1[3] sdb1[1] sdh1[0] 2930279808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] What's going wrong here? Why is Ubuntu trying to pick up sdd into a different array? How do I get that missing disk back home again? [Edit] - After adding the md1 to mdadm.conf it now tries to mount the array on startup but it's still missing the disk. If I tell it to try and assemble automatically I get the impression it know it needs sdd but can't use it: root@uberserver:~# mdadm --assemble --scan /dev/md1: File exists mdadm: /dev/md/1 already active, cannot restart it! mdadm: /dev/md/1 needed for /dev/sdd1... What am I missing?

Read the article
mdadm raid1 fails to resync

- by JuanD

Hello, I'm trying to solve this problem I'm having with an mdadm raid1. I have an ubuntu 9.04 server running on a software 2-drive raid1 with mdadm. Yesterday, one of the drives failed, and so I replaced it with a brand new drive of the same size. I removed the faulty drive, copied the partition from the remaining good drive to the new drive and then added it to the raid. It re-synced and the system worked fine, until the drive that hadn't failed, was also labeled failed. Now I had the raid running solely on the new drive. So I purchased another drive and repeated the procedure above. So now I had 2 brand new drives and the raid was syncing. However, after a few minutes I checked /proc/mdstat and the raid was no longer syncing. mdadm --detail /dev/md1 shows: (sdb is the first new drive, and sdc is the second new drive) root@dola:/home/jjaramillo# mdadm --detail /dev/md1 /dev/md1: Version : 00.90 Creation Time : Sat Dec 20 00:42:05 2008 Raid Level : raid1 Array Size : 974711680 (929.56 GiB 998.10 GB) Used Dev Size : 974711680 (929.56 GiB 998.10 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 1 Persistence : Superblock is persistent Update Time : Wed Jun 2 10:09:35 2010 State : clean, degraded Active Devices : 1 Working Devices : 2 Failed Devices : 0 Spare Devices : 1 UUID : bba497c6:5029ba0b:bfa4f887:c0dc8f3d Events : 0.5395594 Number Major Minor RaidDevice State 2 8 35 0 spare rebuilding /dev/sdc3 1 8 19 1 active sync /dev/sdb3 I've tried removing and re-adding the drive a few times, but the same thing happens. The raid fails to resync. I've looked at /var/log/messages, and found the following: Jun 2 07:57:36 dola kernel: [35708.917337] sd 5:0:0:0: [sdb] Unhandled sense code Jun 2 07:57:36 dola kernel: [35708.917339] sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jun 2 07:57:36 dola kernel: [35708.917342] sd 5:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor] Jun 2 07:57:36 dola kernel: [35708.917346] Descriptor sense data with sense descriptors (in hex): Jun 2 07:57:36 dola kernel: [35708.917348] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 Jun 2 07:57:36 dola kernel: [35708.917357] 00 43 9e 47 Jun 2 07:57:36 dola kernel: [35708.917360] sd 5:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed So it looks like there's some kind of error on sdb (the first new drive). My question is, what would be the best approach to get the raid up and running again? I've thought about dd'ing the /dev/md1 to a blank hard drive, then re-doing the raid from scratch and loading the data back, but there could be an easier solution.. Any help would be appreciated.

Read the article
4 Ways Your Brand Can Jump From the Edge of Space

- by Mike Stiles

Can your brand’s social media content captivate the world and make it hold its collective breath? Can you put something on the screen that’s so compelling that your audience can’t look away? Will they want to make sure their friends see it so they can talk about it? If not, you’re probably not with Red Bull. I was impressed with Red Bull’s approach to social content even before Felix Baumgartner’s stunning skydive from the edge of space. And then they did this. According to Visible Measures, videos of the jump scored 50 million views in 4 days. 1,700 clips were generated from both official and organic sources. The live stream was the most watched YouTube Stream of all time (8 million concurrent viewers). The 2nd most watched live stream was…Felix’ first attempt Oct. 9. Are you ready to compete with that? I ask that question because some brands are still out there tying themselves up in knots about whether or not they should tweet. The public’s time and attention are scarce commodities, commodities they value greatly. The competition amongst brands for that time and attention is intense and going up like Felix’s capsule. If you still view your press releases as “content,” you won’t even be counted as being among the competition. Here are 5 lessons learned from Red Bull’s big leap: 1. They have a total understanding of their target market and audience. Not only do they have an understanding of it, they do something about it. They act on it. They fill the majority of their thoughts with what the audience wants. They hunger for wild applause from that audience. They want to do things that embrace the audience’s lifestyle and immerse in it so the target will identify the brand as “one of them.” Takeaway: BE your target market. 2. They deliver content that strikes the audience right where they emotionally live. If you want your content to have impact, you have to make your audience’s heart race, or make them tear up, or make them laugh. Label them “data points” all you want, but humans are emotional creatures. No message connects that’s not carried in on an emotion. Takeaway: You’re on the inside. If your content doesn’t make you say “wow,” it’s unlikely it will register with fans. 3. They put aside old school marketing and don’t let their content be degraded into a commercial. Their execs seem to understand the value in keeping a lid on the hard sell. So many brands just can’t bring themselves to disconnect advertising and social content. The result is, otherwise decent content gets contaminated with a desperation the viewer can smell a mile away. Think the Baumgartner skydive didn’t do Red Bull any good since he wasn’t drinking one on the way down while singing a jingle? Analysis company Taykey discovered that at the peak of the skydive buzz, about 1% of all online conversation was about the jump. Mentions of Red Bull constituted 1/3 of 1% of all Internet activity. Views of other Red Bull videos also shot up. Takeaway: Chill out with the ads. Your brand will get full credit for entertaining/informing fans in a relevant way, provided you do it. 4. They don’t hesitate to ask, “What can we do next”? Most corporate cultures are a virtual training facility for “we can’t do that.” Few are encouraged to innovate or think big, if think at all. Thinking big involves faith, and work. It means freedom and letting employees run a little wild with their ideas. There will always be the opportunity to let fear of everything that moves creep in and kill grand visions dead in their tracks. Experimenting must be allowed. Failure must be allowed. Red Bull didn’t think big. They thought mega. They tried to outdo themselves. Felix could have gone ahead and jumped halfway up, thinking, “This is still relatively high up. Good enough.” But that wouldn’t have left us breathless. Takeaway: Go for it. Jump. In putting up social properties and gathering fans of your brand, you’ve basically invited people to a party. A good host doesn’t just set out warm beer and stale chips because that’s inexpensive and easy. Be on the lookout for ways to make your guests walk away saying, “That was epic.”

Read the article
How do you unit test the real world?

- by Kim Sun-wu

I'm primarily a C++ coder, and thus far, have managed without really writing tests for all of my code. I've decided this is a Bad Idea(tm), after adding new features that subtly broke old features, or, depending on how you wish to look at it, introduced some new "features" of their own. But, unit testing seems to be an extremely brittle mechanism. You can test for something in "perfect" conditions, but you don't get to see how your code performs when stuff breaks. A for instance is a crawler, let's say it crawls a few specific sites, for data X. Do you simply save sample pages, test against those, and hope that the sites never change? This would work fine as regression tests, but, what sort of tests would you write to constantly check those sites live and let you know when the application isn't doing it's job because the site changed something, that now causes your application to crash? Wouldn't you want your test suite to monitor the intent of the code? The above example is a bit contrived, and something I haven't run into (in case you haven't guessed). Let me pick something I have, though. How do you test an application will do its job in the face of a degraded network stack? That is, say you have a moderate amount of packet loss, for one reason or the other, and you have a function DoSomethingOverTheNetwork() which is supposed to degrade gracefully when the stack isn't performing as it's supposed to; but does it? The developer tests it personally by purposely setting up a gateway that drops packets to simulate a bad network when he first writes it. A few months later, someone checks in some code that modifies something subtly, so the degradation isn't detected in time, or, the application doesn't even recognize the degradation, this is never caught, because you can't run real world tests like this using unit tests, can you? Further, how about file corruption? Let's say you're storing a list of servers in a file, and the checksum looks okay, but the data isn't really. You want the code to handle that, you write some code that you think does that. How do you test that it does exactly that for the life of the application? Can you? Hence, brittleness. Unit tests seem to test the code only in perfect conditions(and this is promoted, with mock objects and such), not what they'll face in the wild. Don't get me wrong, I think unit tests are great, but a test suite composed only of them seems to be a smart way to introduce subtle bugs in your code while feeling overconfident about it's reliability. How do I address the above situations? If unit tests aren't the answer, what is? Thanks!

Read the article
HP SmartArray P400: How to repair failed logical drive?

- by TegtmeierDE

I have a HP Server with SmartArray P400 controller (incl. 256 MB Cache/Battery Backup) with a logicaldrive with replaced failed physicaldrive that does not rebuild. This is how it looked when I detected the error: ~# /usr/sbin/hpacucli ctrl slot=0 show config Smart Array P400 in Slot 0 (Embedded) (sn: XXXX) array A (SATA, Unused Space: 0 MB) logicaldrive 1 (698.6 GB, RAID 1, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 750 GB, OK) physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 750 GB, OK) array B (SATA, Unused Space: 0 MB) logicaldrive 2 (2.7 TB, RAID 5, Failed) physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 750 GB, OK) physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 750 GB, OK) physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 750 GB, OK) physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 750 GB, Failed) physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 750 GB, OK) unassigned physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SATA, 750 GB, OK) ~# I thought that I had drive 2I:1:8 configured as a spare for Array A and Array B, but it seems this was not the case :-(. I noticed the problem due to I/O errors on the host, even if only 1 physicaldrive of the RAID5 is failed. Does someone know why this could happen? The logicaldrive should go into "Degraded" mode but still be fully accessible from the host os!? I first tried to add the unassigned drive 2I:1:8 as a spare to logicaldrive 2, but this was not possible: ~# /usr/sbin/hpacucli ctrl slot=0 array B add spares=2I:1:8 Error: This operation is not supported with the current configuration. Use the "show" command on devices to show additional details about the configuration. ~# Interestingly it is possible to add the unassigned drive to the first array without problems. I thought maybe the controller put the array into "failed" state due to the missing spare and protects failed arrays from modification. So I tried was to reenable the logicaldrive (to add the spare afterwards): ~# /usr/sbin/hpacucli ctrl slot=0 ld 2 modify reenable Warning: Any previously existing data on the logical drive may not be valid or recoverable. Continue? (y/n) y Error: This operation is not supported with the current configuration. Use the "show" command on devices to show additional details about the configuration. ~# But as you can see, re-enabling the logicaldrive this was not possible. Now I replaced the failed drive by hotswapping it with the unassigned drive. The status now looks like this: ~# /usr/sbin/hpacucli ctrl slot=0 show config Smart Array P400 in Slot 0 (Embedded) (sn: XXXX) array A (SATA, Unused Space: 0 MB) logicaldrive 1 (698.6 GB, RAID 1, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 750 GB, OK) physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 750 GB, OK) array B (SATA, Unused Space: 0 MB) logicaldrive 2 (2.7 TB, RAID 5, Failed) physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 750 GB, OK) physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 750 GB, OK) physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 750 GB, OK) physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 750 GB, OK) physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 750 GB, OK) ~# The logical drive is still not accessible. Why is it not rebuilding? What can I do? FYI, this is the configuration of my controller: ~# /usr/sbin/hpacucli ctrl slot=0 show Smart Array P400 in Slot 0 (Embedded) Bus Interface: PCI Slot: 0 Serial Number: XXXX Cache Serial Number: XXXX RAID 6 (ADG) Status: Enabled Controller Status: OK Chassis Slot: Hardware Revision: Rev E Firmware Version: 5.22 Rebuild Priority: Medium Expand Priority: Medium Surface Scan Delay: 15 secs Surface Analysis Inconsistency Notification: Disabled Raid1 Write Buffering: Disabled Post Prompt Timeout: 0 secs Cache Board Present: True Cache Status: OK Accelerator Ratio: 25% Read / 75% Write Drive Write Cache: Disabled Total Cache Size: 256 MB No-Battery Write Cache: Disabled Cache Backup Power Source: Batteries Battery/Capacitor Count: 1 Battery/Capacitor Status: OK SATA NCQ Supported: True ~# Thanks for you help in advance.

Read the article
Splitting a raidctl mirror safely

- by milkfilk

I have a Sun T5220 server with the onboard LSI card and two disks that were in a RAID 1 mirror. The data is not important right now but we had a failed disk and are trying to understand how to do this for real if we had to recover from a failure. The initial situation looked like this: # raidctl -l c1t0d0 Volume Size Stripe Status Cache RAID Sub Size Level Disk ---------------------------------------------------------------- c1t0d0 136.6G N/A DEGRADED OFF RAID1 0.1.0 136.6G GOOD N/A 136.6G FAILED Green light on the 0.0.0 disk. Find / lights up the 0.1.0 disk. So I know I have a bad drive and which one it is. Server still boots obviously. First, we tried putting a new disk in. This disk came from an unknown source. Format would not see it, cfgadm -al would not see it so raidctl -l would not see it. I figure it's bad. We tried another disk from another spare server: # raidctl -c c1t1d0 c1t0d0 (where t1 is my good disk - 0.1.0) Disk has occupied space. Also the different syntax options don't change anything: # raidctl -C "0.1.0 0.0.0" -r 1 1 Disk has occupied space. # raidctl -C "0.1.0 0.0.0" 1 Disk has occupied space. Ok. Maybe this is because the disk from the spare server had a RAID 1 on it already. Aha, I can see another volume in raidctl: # raidctl -l Controller: 1 Volume:c1t1d0 (this is my server's root mirror) Volume:c1t132d0 (this is the foreign root mirror) Disk: 0.0.0 Disk: 0.1.0 ... No problem. I don't care about the data, I'll just delete the foreign mirror. # raidctl -d c1t132d0 (warning about data deletion but it works) At this point, /usr/bin/ binaries freak out. By that I mean, ls -l /usr/bin/which shows 1.4k but cat /usr/bin/which gives me a newline. Great, I just blew away the binaries (ie: binaries in mem still work)? I bounce the box. It all comes back fine. WTF. Anyway, back to recreating my mirror. # raidctl -l Controller: 1 Volume:c1t1d0 (this is my server's root mirror) Disk: 0.0.0 Disk: 0.1.0 ... Man says that you can delete a mirror and it will split it. Ok, I'll delete the root mirror. # raidctl -d c1t0d0 Array in use. (this might not be the exact error) I googled this and found of course you can't do this (even with -f) while booted off the mirror. Ok. I boot cdrom -s and deleted the volume. Now I have one disk that has a type of "LSI-Logical-Volume" on c1t1d0 (where my data is) and a brand new "Hitachi 146GB" on c1t0d0 (what I'm trying to mirror to): (booted off the CD) # raidctl -c c1t1d0 c1t0d0 (man says it's source destination for mirroring) Illegal Array Layout. # raidctl -C "0.1.0 0.0.0" -r 1 1 (alt syntax per man) Illegal Array Layout. # raidctl -C "0.1.0 0.0.0" 1 (assumes raid1, no help) Illegal Array Layout. Same size disks, same manufacturer but I did delete the volume instead of throwing in a blank disk and waiting for it to resync. Maybe this was a critical error. I tried selecting the type in format for my good disk to be a plain 146gb disk but it resets the partition table which I'm pretty sure would wipe the data (bad if this was production). Am I boned? Anyone have experience with breaking and resyncing a mirror? There's nothing on Google about "Illegal Array Layout" so here's my contrib to the search gods.

Read the article
How to get rid of a stubborn 'removed' device in mdadm

- by T.J. Crowder

One of my server's drives failed and so I removed the failed drive from all three relevant arrays, had the drive swapped out, and then added the new drive to the arrays. Two of the arrays worked perfectly. The third added the drive back as a spare, and there's an odd "removed" entry in the mdadm details. I tried both mdadm /dev/md2 --remove failed and mdadm /dev/md2 --remove detached as suggested here and here, neither of which complained, but neither of which had any effect, either. Does anyone know how I can get rid of that entry and get the drive added back properly? (Ideally without resyncing a third time, I've already had to do it twice and it takes hours. But if that's what it takes, that's what it takes.) The new drive is /dev/sda, the relevant partition is /dev/sda3. Here's the detail on the array: # mdadm --detail /dev/md2 /dev/md2: Version : 0.90 Creation Time : Wed Oct 26 12:27:49 2011 Raid Level : raid1 Array Size : 729952192 (696.14 GiB 747.47 GB) Used Dev Size : 729952192 (696.14 GiB 747.47 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Persistence : Superblock is persistent Update Time : Tue Nov 12 17:48:53 2013 State : clean, degraded Active Devices : 1 Working Devices : 2 Failed Devices : 0 Spare Devices : 1 UUID : 2fdbf68c:d572d905:776c2c25:004bd7b2 (local to host blah) Events : 0.34665 Number Major Minor RaidDevice State 0 0 0 0 removed 1 8 19 1 active sync /dev/sdb3 2 8 3 - spare /dev/sda3 If it's relevant, it's a 64-bit server. It normally runs Ubuntu, but right now I'm in the data centre's "rescue" OS, which is Debian 7 (wheezy). The "removed" entry was there the last time I was in Ubuntu (it won't, currently, boot from the disk), so I don't think that's not some Ubuntu/Debian conflict (and they are, of course, closely related). Update: Having done extensive tests with test devices on a local machine, I'm just plain getting anomalous behavior from mdadm with this array. For instance, with /dev/sda3 removed from the array again, I did this: mdadm /dev/md2 --grow --force --raid-devices=1 And that got rid of the "removed" device, leaving me just with /dev/sdb3. Then I nuked /dev/sda3 (wrote a file system to it, so it didn't have the raid fs anymore), then: mdadm /dev/md2 --grow --raid-devices=2 ...which gave me an array with /dev/sdb3 in slot 0 and "removed" in slot 1 as you'd expect. Then mdadm /dev/md2 --add /dev/sda3 ...added it — as a spare again. (Another 3.5 hours down the drain.) So with the rebuilt spare in the array, given that mdadm's man page says RAID-DEVICES CHANGES ... When the number of devices is increased, any hot spares that are present will be activated immediately. ...I grew the array to three devices, to try to activate the "spare": mdadm /dev/md2 --grow --raid-devices=3 What did I get? Two "removed" devices, and the spare. And yet when I do this with a test array, I don't get this behavior. So I nuked /dev/sda3 again, used it to create a brand-new array, and am copying the data from the old array to the new one: rsync -r -t -v --exclude 'lost+found' --progress /mnt/oldarray/* /mnt/newarray This will, of course, take hours. Hopefully when I'm done, I can stop the old array entirely, nuke /dev/sdb3, and add it to the new array. Hopefully, it won't get added as a spare!

Read the article
Raid 1 array won't assemble after power outage. How do I fix this ext4 mirror?

- by Forkrul Assail

Two ext4 drives on Raid 1 with mdadm won't reassemble after the power went out for an extended period (UPS drained). After turning the machine back on, mdadm said that the array was degraded, after which it took about 2 days for a full resync, which completed without problems. On trying to remount the array I get: mount: you must specify the filesystem type cat /etc/fstab lines relevant to setup: /dev/md127 /media/mediapool ext4 defaults 0 0 dmesg | tail (on trying to mount) says: [ 1050.818782] EXT3-fs (md127): error: can't find ext3 filesystem on dev md127. [ 1050.849214] EXT4-fs (md127): VFS: Can't find ext4 filesystem [ 1050.944781] FAT-fs (md127): invalid media value (0x00) [ 1050.944782] FAT-fs (md127): Can't find a valid FAT filesystem [ 1058.272787] EXT2-fs (md127): error: can't find an ext2 filesystem on dev md127. cat /proc/mdstat says: Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md127 : active (auto-read-only) raid1 sdj[2] sdi[0] 2930135360 blocks super 1.2 [2/2] [UU] unused devices: <none> fsck /dev/md127 says: fsck from util-linux 2.20.1 e2fsck 1.42 (29-Nov-2011) fsck.ext2: Superblock invalid, trying backup blocks... fsck.ext2: Bad magic number in super-block while trying to open /dev/md127 The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device> mdadm -E /dev/sdi gives me: /dev/sdi: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 37ac1824:eb8a21f6:bd5afd6d:96da6394 Name : sojourn:33 Creation Time : Sat Nov 10 10:43:52 2012 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 5860271016 (2794.40 GiB 3000.46 GB) Array Size : 2930135360 (2794.39 GiB 3000.46 GB) Used Dev Size : 5860270720 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors State : clean Device UUID : 3e6e9a4f:6c07ab3d:22d47fce:13cecfd0 Update Time : Tue Nov 13 20:34:18 2012 Checksum : f7d10db9 - correct Events : 27 Device Role : Active device 0 Array State : AA ('A' == active, '.' == missing) boot@boot ~ $ sudo mdadm -E /dev/sdj /dev/sdj: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 37ac1824:eb8a21f6:bd5afd6d:96da6394 Name : sojourn:33 Creation Time : Sat Nov 10 10:43:52 2012 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 5860271016 (2794.40 GiB 3000.46 GB) Array Size : 2930135360 (2794.39 GiB 3000.46 GB) Used Dev Size : 5860270720 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors State : clean Device UUID : 7fb84af4:e9295f7b:ede61f27:bec0cb57 Update Time : Tue Nov 13 20:34:18 2012 Checksum : b9d17fef - correct Events : 27 Device Role : Active device 1 Array State : AA ('A' == active, '.' == missing) machine@user ~ dmesg | tail [ 61.785866] init: alsa-restore main process (2736) terminated with status 99 [ 68.433548] eth0: no IPv6 routers present [ 534.142511] EXT4-fs (sdi): ext4_check_descriptors: Block bitmap for group 0 not in group (block 2838187772)! [ 534.142518] EXT4-fs (sdi): group descriptors corrupted! [ 546.418780] EXT2-fs (sdi): error: couldn't mount because of unsupported optional features (240) [ 549.654127] EXT3-fs (sdi): error: couldn't mount because of unsupported optional features (240) Since this is Raid 1 it was suggested that I try and mount or fsck the drives separately. After a long fsck on one drive, it ended with this as tail: Illegal double indirect block (2298566437) in inode 39717736. CLEARED. Illegal block #4231180 (2611866932) in inode 39717736. CLEARED. Error storing directory block information (inode=39717736, block=0, num=1092368): Memory allocation failed Recreate journal? yes Creating journal (32768 blocks): Done. *** journal has been re-created - filesystem is now ext3 again *** The drive however still doesn't want to mount: dmesg | tail [ 170.674659] md: export_rdev(sdc) [ 170.675152] md: export_rdev(sdc) [ 195.275288] md: export_rdev(sdc) [ 195.275876] md: export_rdev(sdc) [ 1338.540092] CE: hpet increased min_delta_ns to 30169 nsec [26125.734105] EXT4-fs (sdc): ext4_check_descriptors: Checksum for group 0 failed (43502!=37987) [26125.734115] EXT4-fs (sdc): group descriptors corrupted! [26182.325371] EXT3-fs (sdc): error: couldn't mount because of unsupported optional features (240) [27083.316519] EXT4-fs (sdc): ext4_check_descriptors: Checksum for group 0 failed (43502!=37987) [27083.316530] EXT4-fs (sdc): group descriptors corrupted! Please help me fix this. I never in my wildest nightmares thought a complete mirror would die this badly. Am I missing something? Suggestions on fixing this? Could someone explain why it would resync after the powerout, only to seemingly nuke the drive? Thanks for reading. Any help much appreciated. I've tried everything I can think of, including booting and filesystem checking with SystemRescue and Ubuntu liveboot discs.

Read the article
Can't re-mount existing RAID10 on Ubuntu

- by Zoran

I saw similar questions, but didn't find what solution to my problem. After power-cut, one of RAID10 (4 disks were) appears to be malfunctioning. I make tha array active one, but can not mount it. Always the same error: mount: you must specify the filesystem type So, here is what I have when type mdadm --detail /dev/md0 /dev/md0: Version : 00.90.03 Creation Time : Tue Sep 1 11:00:40 2009 Raid Level : raid10 Array Size : 1465148928 (1397.27 GiB 1500.31 GB) Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 4 Total Devices : 3 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Mon Jun 11 09:54:27 2012 State : clean, degraded Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Layout : near=2, far=1 Chunk Size : 64K UUID : 1a02e789:c34377a1:2e29483d:f114274d Events : 0.166 Number Major Minor RaidDevice State 0 8 16 0 active sync /dev/sdb 1 0 0 1 removed 2 8 48 2 active sync /dev/sdd 3 8 64 3 active sync /dev/sde At the /etc/mdadm/mdadm.conf I have by default, scan all partitions (/proc/partitions) for MD superblocks. alternatively, specify devices to scan, using wildcards if desired. DEVICE partitions auto-create devices with Debian standard permissions CREATE owner=root group=disk mode=0660 auto=yes automatically tag new arrays as belonging to the local system HOMEHOST <system> instruct the monitoring daemon where to send mail alerts MAILADDR root definitions of existing MD arrays ARRAY /dev/md0 level=raid10 num-devices=4 UUID=1a02e789:c34377a1:2e29483d:f114274d ARRAY /dev/md1 level=raid1 num-devices=2 UUID=9b592be7:c6a2052f:2e29483d:f114274d This file was auto-generated... So, my question is, how can I mount md0 array (md1 has been mounted without problem) in order to preserve existing data? One more thing, fdisk -l command gives the following result: Disk /dev/sdb: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x660a6799 Device Boot Start End Blocks Id System /dev/sdb1 * 1 88217 708603021 83 Linux /dev/sdb2 88218 91201 23968980 5 Extended /dev/sdb5 88218 91201 23968948+ 82 Linux swap / Solaris Disk /dev/sdc: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x0008f8ae Device Boot Start End Blocks Id System /dev/sdc1 1 88217 708603021 83 Linux /dev/sdc2 88218 91201 23968980 5 Extended /dev/sdc5 88218 91201 23968948+ 82 Linux swap / Solaris Disk /dev/sdd: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x4be1abdb Device Boot Start End Blocks Id System Disk /dev/sde: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xa4d5632e Device Boot Start End Blocks Id System Disk /dev/sdf: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xdacb141c Device Boot Start End Blocks Id System Disk /dev/sdg: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xdacb141c Device Boot Start End Blocks Id System Disk /dev/md1: 750.1 GB, 750156251136 bytes 2 heads, 4 sectors/track, 183143616 cylinders Units = cylinders of 8 * 512 = 4096 bytes Disk identifier: 0xdacb141c Device Boot Start End Blocks Id System Warning: ignoring extra data in partition table 5 Warning: ignoring extra data in partition table 5 Warning: ignoring extra data in partition table 5 Warning: invalid flag 0x7b6e of partition table 5 will be corrected by w(rite) Disk /dev/md0: 1500.3 GB, 1500312502272 bytes 255 heads, 63 sectors/track, 182402 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x660a6799 Device Boot Start End Blocks Id System /dev/md0p1 * 1 88217 708603021 83 Linux /dev/md0p2 88218 91201 23968980 5 Extended /dev/md0p5 ? 121767 155317 269488144 20 Unknown And one more thing. When using mdadm --examine command, here ise result: mdadm -v --examine --scan /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sd ARRAY /dev/md1 level=raid1 num-devices=2 UUID=9b592be7:c6a2052f:2e29483d:f114274d devices=/dev/sdf ARRAY /dev/md0 level=raid10 num-devices=4 UUID=1a02e789:c34377a1:2e29483d:f114274d devices=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde md0 has 3 devices which are active. Can someone instruct me how to solve this issue? If it is possible, I would like not to removing faulty HDD. Please advise

Read the article
Reconstructing the disk order in RAID 6 with 7 disks

- by rkotulla

a little background to this question first: I am running a RAID-6 within a QNAP TS869L external RAID/NAS system. I started with 5 disks of 3 TB each back in the day, and later added another 2 disks of 3TB to the RAID. The QNAP internals handled the growing and re-syncing etc, and everything seemd to be perfectly fine. About 2 weeks ago, I had one of the disks (disk #5, disk #2 has gone bad in the mean time) fail, and somehow (I have no idea why), also disks 1 and 2 got kicked out of the array. I replaced disk #5, but the RAID didn't start working again. After some calls to QNAP technical support, they re-created the array (using mdadm --create --force --assume-clean ...), but the resulting array couldn't find a filesystem, and I was kindly referred to contact a data recovery company that I can't afford. After some digging through old log files, resetting the disk to factory default, etc, I found a few errors that were made during this re-create - I wish I still had some of the original metadata, but unfortunately i don't (I definitely learned that lesson). I'm currently at the point where I know the correct chunk-size (64K), metadata-version (1.0; factory default was 0.9, but from what I read 0.9 doesn't handle disks over 2 TB, mine are 3 TB), and I now find the ext4 filesystem that should be on the disks. Only variable left to determine is the right disk order! I started using the description found in answer #4 of "Recover RAID 5 data after created new array instead of re-using" but am a little confused on what the order should be for a proper RAID-6. RAID-5 is pretty well documented in a number of places, but RAID-6 much less so. Also, does the layout, i.e. distribution of parity and data chunks across the disks, change after the growing of the array from 5 to 7 disks, or does the re-sync re-organize them in such a way a native 7-disk RAID-6 would have been? Thanks some more mdadm output that might be helpful: mdadm version: [~] # mdadm --version mdadm - v2.6.3 - 20th August 2007 mdadm details from one of the disks in the array: [~] # mdadm --examine /dev/sda3 /dev/sda3: Magic : a92b4efc Version : 1.0 Feature Map : 0x0 Array UUID : 1c1614a5:e3be2fbb:4af01271:947fe3aa Name : 0 Creation Time : Tue Jun 10 10:27:58 2014 Raid Level : raid6 Raid Devices : 7 Used Dev Size : 5857395112 (2793.02 GiB 2998.99 GB) Array Size : 29286975360 (13965.12 GiB 14994.93 GB) Used Size : 5857395072 (2793.02 GiB 2998.99 GB) Super Offset : 5857395368 sectors State : clean Device UUID : 7c572d8f:20c12727:7e88c888:c2c357af Update Time : Tue Jun 10 13:01:06 2014 Checksum : d275c82d - correct Events : 7036 Chunk Size : 64K Array Slot : 0 (0, 1, failed, 3, failed, 5, 6) Array State : Uu_u_uu 2 failed mdadm details for the array in the current disk-order (based on my best guess reconstructed from old log-files) [~] # mdadm --detail /dev/md0 /dev/md0: Version : 01.00.03 Creation Time : Tue Jun 10 10:27:58 2014 Raid Level : raid6 Array Size : 14643487680 (13965.12 GiB 14994.93 GB) Used Dev Size : 2928697536 (2793.02 GiB 2998.99 GB) Raid Devices : 7 Total Devices : 5 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Tue Jun 10 13:01:06 2014 State : clean, degraded Active Devices : 5 Working Devices : 5 Failed Devices : 0 Spare Devices : 0 Chunk Size : 64K Name : 0 UUID : 1c1614a5:e3be2fbb:4af01271:947fe3aa Events : 7036 Number Major Minor RaidDevice State 0 8 3 0 active sync /dev/sda3 1 8 19 1 active sync /dev/sdb3 2 0 0 2 removed 3 8 51 3 active sync /dev/sdd3 4 0 0 4 removed 5 8 99 5 active sync /dev/sdg3 6 8 83 6 active sync /dev/sdf3 output from /proc/mdstat (md8, md9, and md13 are internally used RAIDs holding swap, etc; the one I'm after is md0) [~] # more /proc/mdstat Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] md0 : active raid6 sdf3[6] sdg3[5] sdd3[3] sdb3[1] sda3[0] 14643487680 blocks super 1.0 level 6, 64k chunk, algorithm 2 [7/5] [UU_U_UU] md8 : active raid1 sdg2[2](S) sdf2[3](S) sdd2[4](S) sdc2[5](S) sdb2[6](S) sda2[1] sde2[0] 530048 blocks [2/2] [UU] md13 : active raid1 sdg4[3] sdf4[4] sde4[5] sdd4[6] sdc4[2] sdb4[1] sda4[0] 458880 blocks [8/7] [UUUUUUU_] bitmap: 21/57 pages [84KB], 4KB chunk md9 : active raid1 sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sda1[0] sdb1[1] 530048 blocks [8/7] [UUUUUUU_] bitmap: 37/65 pages [148KB], 4KB chunk unused devices: <none>

Read the article
Keeping track of File System Utilization in Ops Center 12c

- by S Stelting

Enterprise Manager Ops Center 12c provides significant monitoring capabilities, combined with very flexible incident management. These capabilities even extend to monitoring the file systems associated with Solaris or Linux assets. Depending on your needs you can monitor and manage incidents, or you can fine tune alert monitoring rules to specific file systems. This article will show you how to use Ops Center 12c to Track file system utilization Adjust file system monitoring rules Disable file system rules Create custom monitoring rules If you're interested in this topic, please join us for a WebEx presentation! Date: Thursday, November 8, 2012 Time: 11:00 am, Eastern Standard Time (New York, GMT-05:00) Meeting Number: 598 796 842 Meeting Password: oracle123 To join the online meeting ------------------------------------------------------- 1. Go to https://oracleconferencing.webex.com/oracleconferencing/j.php?ED=209833597&UID=1512095432&PW=NOWQ3YjJlMmYy&RT=MiMxMQ%3D%3D 2. If requested, enter your name and email address. 3. If a password is required, enter the meeting password: oracle123 4. Click "Join". To view in other time zones or languages, please click the link: https://oracleconferencing.webex.com/oracleconferencing/j.php?ED=209833597&UID=1512095432&PW=NOWQ3YjJlMmYy&ORT=MiMxMQ%3D%3D Monitoring File Systems for OS Assets The Libraries tab provides basic, device-level information about the storage associated with an OS instance. This tab shows you the local file system associated with the instance and any shared storage libraries mounted by Ops Center. More detailed information about file system storage is available under the Analytics tab under the sub-tab named Charts. Here, you can select and display the individual mount points of an OS, and export the utilization data if desired: In this example, the OS instance has a basic root file partition and several NFS directories. Each file system mount point can be independently chosen for display in the Ops Center chart. File Systems and Incident Reporting Every asset managed by Ops Center has a "monitoring policy", which determines what represents a reportable issue with the asset. The policy is made up of a bunch of monitoring rules, where each rule describes An attribute to monitor The conditions which represent an issue The level or levels of severity for the issue When the conditions are met, Ops Center sends a notification and creates an incident. By default, OS instances have three monitoring rules associated with file systems: File System Reachability: Triggers an incident if a file system is not reachable NAS Library Status: Triggers an incident for a value of "WARNING" or "DEGRADED" for a NAS-based file system File System Used Space Percentage: Triggers an incident when file system utilization grows beyond defined thresholds You can view these rules in the Monitoring tab for an OS: Of course, the default monitoring rules is that they apply to every file system associated with an OS instance. As a result, any issue with NAS accessibility or disk utilization will trigger an incident. This can cause incidents for file systems to be reported multiple times if the same shared storage is used by many assets, as shown in this screen shot: Depending on the level of control you'd like, there are a number of ways to fine tune incident reporting. Note that any changes to an asset's monitoring policy will detach it from the default, creating a new monitoring policy for the asset. If you'd like, you can extract a monitoring policy from an asset, which allows you to save it and apply the customized monitoring profile to other OS assets. Solution #1: Modify the Reporting Thresholds In some cases, you may want to modify the basic conditions for incident reporting in your file system. The changes you make to a default monitoring rule will apply to all of the file systems associated with your operating system. Selecting the File Systems Used Space Percentage entry and clicking the "Edit Alert Monitoring Rule Parameters" button opens a pop-up dialog which allows you to modify the rule. The first screen lets you decide when you will check for file system usage, and how long you will wait before opening an incident in Ops Center. By default, Ops Center monitors continuously and reports disk utilization issues which exist for more than 15 minutes. The second screen lets you define actual threshold values. By default, Ops Center opens a Warning level incident is utilization rises above 80%, and a Critical level incident for utilization above 95% Solution #2: Disable Incident Reporting for File System If you'd rather not report file system incidents, you can disable the monitoring rules altogether. In this case, you can select the monitoring rules and click the "Disable Alert Monitoring Rule(s)" button to open the pop-up confirmation dialog. Like the first solution, this option affects all file system monitoring. It allows you to completely disable incident reporting for NAS library status or file system space consumption. Solution #3: Create New Monitoring Rules for Specific File Systems If you'd like to have the greatest flexibility when monitoring file systems, you can create entirely new rules. Clicking the "Add Alert Monitoring Rule" (the icon with the green plus sign) opens a wizard which allows you to define a new rule. This rule will be based on a threshold, and will be used to monitor operating system assets. We'd like to add a rule to track disk utilization for a specific file system - the /nfs-guest directory. To do this, we specify the following attribute FileSystemUsages.name=/nfs-guest.usedSpacePercentage The value of name in the attribute allows us to define a specific NFS shared directory or file system... in the case of this OS, we could have chosen any of the values shown in the File Systems Utilization chart at the beginning of this article. usedSpacePercentage lets us define a threshold based on the percentage of total disk space used. There are a number of other values that we could use for threshold-based monitoring of FileSystemUsages, including freeSpace freeSpacePercentage totalSpace usedSpace usedSpacePercentage The final sections of the screen allow us to determine when to monitor for disk usage, and how long to wait after utilization reaches a threshold before creating an incident. The next screen lets us define the threshold values and severity levels for the monitoring rule: If historical data is available, Ops Center will display it in the screen. Clicking the Apply button will create the new monitoring rule and active it in your monitoring policy. If you combine this with one of the previous solutions, you can precisely define which file systems will generate incidents and notifications. For example, this monitoring policy has the default "File System Used Space Percentage" rule disabled, but the new rule reports ONLY on utilization for the /nfs-guest directory. Stay Connected: Twitter | Facebook | YouTube | Linkedin | Newsletter

Read the article
OS Analytics - Deep Dive Into Your OS

- by Eran_Steiner

Enterprise Manager Ops Center provides a feature called "OS Analytics". This feature allows you to get a better understanding of how the Operating System is being utilized. You can research the historical usage as well as real time data. This post will show how you can benefit from OS Analytics and how it works behind the scenes. We will have a call to discuss this blog - please join us!Date: Thursday, November 1, 2012Time: 11:00 am, Eastern Daylight Time (New York, GMT-04:00)1. Go to https://oracleconferencing.webex.com/oracleconferencing/j.php?ED=209833067&UID=1512092402&PW=NY2JhMmFjMmFh&RT=MiMxMQ%3D%3D2. If requested, enter your name and email address.3. If a password is required, enter the meeting password: oracle1234. Click "Join". To join the teleconference:Call-in toll-free number: 1-866-682-4770 (US/Canada) Other countries: https://oracle.intercallonline.com/portlets/scheduling/viewNumbers/viewNumber.do?ownerNumber=5931260&audioType=RP&viewGa=true&ga=ONConference Code: 7629343#Security code: 7777# Here is quick summary of what you can do with OS Analytics in Ops Center: View historical charts and real time value of CPU, memory, network and disk utilization Find the top CPU and Memory processes in real time or at a certain historical day Determine proper monitoring thresholds based on historical data View Solaris services status details Drill down into a process details View the busiest zones if applicable Where to start To start with OS Analytics, choose the OS asset in the tree and click the Analytics tab. You can see the CPU utilization, Memory utilization and Network utilization, along with the current real time top 5 processes in each category (click the image to see a larger version): In the above screen, you can click each of the top 5 processes to see a more detailed view of that process. Here is an example of one of the processes: One of the cool things is that you can see the process tree for this process along with some port binding and open file descriptors. On Solaris machines with zones, you get an extra level of tabs, allowing you to get more information on the different zones: This is a good way to see the busiest zones. For example, one zone may not take a lot of CPU but it can consume a lot of memory, or perhaps network bandwidth. To see the detailed Analytics for each of the zones, simply click each of the zones in the tree and go to its Analytics tab. Next, click the "Processes" tab to see real time information of all the processes on the machine: An interesting column is the "Target" column. If you configured Ops Center to work with Enterprise Manager Cloud Control, then the two products will talk to each other and Ops Center will display the correlated target from Cloud Control in this table. If you are only using Ops Center - this column will remain empty. Next, if you view a Solaris machine, you will have a "Services" tab: By default, all services will be displayed, but you can choose to display only certain states, for example, those in maintenance or the degraded ones. You can highlight a service and choose to view the details, where you can see the Dependencies, Dependents and also the location of the service log file (not shown in the picture as you need to scroll down to see the log file). The "Threshold" tab is particularly helpful - you can view historical trends of different monitored values and based on the graph - determine what the monitoring values should be: You can ask Ops Center to suggest monitoring levels based on the historical values or you can set your own. The different colors in the graph represent the current set levels: Red for critical, Yellow for warning and Blue for Information, allowing you to quickly see how they're positioned against real data. It's important to note that when looking at longer periods, Ops Center smooths out the data and uses averages. So when looking at values such as CPU Usage, try shorter time frames which are more detailed, such as one hour or one day. Applying new monitoring values When first applying new values to monitored attributes - a popup will come up asking if it's OK to get you out of the current Monitoring Policy. This is OK if you want to either have custom monitoring for a specific machine, or if you want to use this current machine as a "Gold image" and extract a Monitoring Policy from it. You can later apply the new Monitoring Policy to other machines and also set it as a default Monitoring Profile. Once you're done with applying the different monitoring values, you can review and change them in the "Monitoring" tab. You can also click the "Extract a Monitoring Policy" in the actions pane on the right to save all the new values to a new Monitoring Policy, which can then be found under "Plan Management" -> "Monitoring Policies". Visiting the past Under the "History" tab you can "go back in time". This is very helpful when you know that a machine was busy a few hours ago (perhaps in the middle of the night?), but you were not around to take a look at it in real time. Here's a view into yesterday's data on one of the machines: You can see an interesting CPU spike happening at around 3:30 am along with some memory use. In the bottom table you can see the top 5 CPU and Memory consumers at the requested time. Very quickly you can see that this spike is related to the Solaris 11 IPS repository synchronization process using the "pkgrecv" command. The "time machine" doesn't stop here - you can also view historical data to determine which of the zones was the busiest at a given time: Under the hood The data collected is stored on each of the agents under /var/opt/sun/xvm/analytics/historical/ An "os.zip" file exists for the main OS. Inside you will find many small text files, named after the Epoch time stamp in which they were taken If you have any zones, there will be a file called "guests.zip" containing the same small files for all the zones, as well as a folder with the name of the zone along with "os.zip" in it If this is the Enterprise Controller or the Proxy Controller, you will have folders called "proxy" and "sat" in which you will find the "os.zip" for that controller The actual script collecting the data can be viewed for debugging purposes as well: On Linux, the location is: /opt/sun/xvmoc/private/os_analytics/collect On Solaris, the location is /opt/SUNWxvmoc/private/os_analytics/collect If you would like to redirect all the standard error into a file for debugging, touch the following file and the output will go into it: # touch /tmp/.collect.stderr The temporary data is collected under /var/opt/sun/xvm/analytics/.collectdb until it is zipped. If you would like to review the properties for the Analytics, you can view those per each agent in /opt/sun/n1gc/lib/XVM.properties. Find the section "Analytics configurable properties for OS and VSC" to view the Analytics specific values. I hope you find this helpful! Please post questions in the comments below. Eran Steiner

Read the article
SQL Sentry First Impressions

- by AjarnMark

After struggling to defend my SQL Servers from a political attack recently, I realized that I needed better tools to back me up, and SQL Sentry is the leading candidate. A couple of weeks ago, seemingly from out of nowhere, complaints from the business users started coming in that one of the core internal applications was running dramatically slower than normal, and fingers were being pointed at the SQL Server. Unfortunately, we don’t have a production DBA whose entire job is to monitor and maintain our SQL Servers. The responsibility falls to me to do the best I can, investing only a small portion of my time, because there are so many other responsibilities to take care of, and our industry is still deep in recession. I inherited these SQL Servers and have made significant improvements in process and procedure, but I had not yet made the time to take real baseline measurements or keep a really close eye on the performance. Like many DBAs, I wrote several of my own tools and used the “built-in tools” like Profiler, PerfMon, and sp_who2 (did I mention most of our instances are SQL Server 2000?). These have all served me well for in-the-moment troubleshooting and maintenance, but they really fell down on the job when I was called upon to “prove” that SQL Server performance was acceptable and more importantly had not degraded recently (i.e. historical comparisons). I really didn’t have anything from a historical comparison perspective, but I was able to show that current performance was acceptable, and deflect attention back onto other components (which in fact turned out to be the real culprit). That experience dramatically illustrated the need for better monitoring tools. Coincidentally, I had been talking recently to my boss about the mini nightmare of monitoring several critical and interdependent overnight jobs that operate on separate instances of SQL Server. Among other tools, I had been using Idera’s SQL Job Manager which is a free tool and did a nice job of showing me job schedules and histories in a nice calendar view. This worked fairly well, and for the money (did I mention it was free?) it couldn’t be beat. But it is based on the stored job history in MSDB, and there were other performance problems that we ran into when we started changing the settings for how much job history to retain, in order to be able to look back a month or more in the calendar view. Another coincidence (if you believe in such things) was that when we had some of those performance challenges, I posted a couple of questions to the #sqlhelp hashtag on Twitter and Greg Gonzalez (@SQLSensei) suggested I check out SQL Sentry’s Event Manager. At the time, I just thought he worked there, but later found out that he founded the company. When I took a quick look at the features & benefits, the one that really jumped out at me is Chaining and Queueing which sounded like it would really help with our “interdependent jobs on different servers” issue. I know that is a lot of background story and coincidences, but hopefully you have stuck with me so far, and now we have arrived at the point where last week I downloaded and installed the 30-day trial of the SQL Sentry Power Suite, which is Event Manager plus Performance Advisor. And I must say that I really like what I see so far. Here are a few highlights: Great Support. I had two issues getting the trial setup and monitoring a handful of our servers. One of which was entirely my fault (missed a security setting in SQL 2008) and the other was mostly my fault (late change to some config settings that were apparently cached and did not get refreshed properly). In both cases, the support staff at SQL Sentry were very responsive and rather quickly figured out what the cause and fix was for each of them. This left me with a great impression of the company. Kudos to them! Chaining and Queueing. While I have not yet activated this feature, I am very excited about the possibilities. We have jobs on three different instances of SQL Server that have to be run in a certain order, and each has to finish before the next can successfully begin, and I believe this feature will ensure just that. It has been a real pain in the backside when one of those jobs runs just a little too long and does not finish before the job on another instance starts, thus triggering a chain reaction of either outright job failures, or worse, successful completion of completely invalid processing. Calendar View. I really, really like the Event Manager calendar view where I can see all jobs and events across all instances and identify potential resource contention as well as windows of opportunity for maintenance activity. Very well done, and based on Event Manager’s own database of accumulated historical information rather than querying the source instances every time. Performance Advisor Dashboard History View. This view let’s me quickly select a date and time range and it displays graphs of key SQL Server and Windows metrics. This is exactly the thing I needed to answer the “has performance changed recently” question at the beginning of this post. Reporting Services Subscription Jobs with Report Name. This was a big and VERY pleasant surprise. If you have ever looked at the list of SQL Server jobs that SQL Server Reporting Services creates when you make a Subscription, you will notice that they all have some sort of GUID as the name of the job. This is really ugly, and really annoying because when you are just looking at the SQL Agent and Job Activity Monitor, if you see that Job X failed, you really do not have any indication in the name or the properties of the Job itself, as to what Report that was for. But with SQL Sentry Event Manager you do. The Jobs list in the Navigator pane in SQL Sentry, amazingly, displays the name of the Report that the Subscription Job is for. And when you open it to see more details, it shows you the full Reporting Services path to that Report, so you can immediately track it down in the Report Manager in case you want to identify/notify the owner or edit the Subscription information. I did not expect this at all, but I sure do like it. HOORAY! That is just my first impressions from using the tools for a few days. And I haven’t even gotten into how it showed me where I was completely mistaken about one aspect of my SQL Server disk configurations. I’ll share that lesson in another blog entry. But I have to say it again, the combination of Event Manager and Performance Advisor working together have really made me a fan.

Read the article
Super constructor must be a first statement in Java constructor [closed]

- by Val

I know the answer: "we need rules to prevent shooting into your own foot". Ok, I make millions of programming mistakes every day. To be prevented, we need one simple rule: prohibit all JLS and do not use Java. If we explain everything by "not shooting your foot", this is reasonable. But there is not much reason is such reason. When I programmed in Delphy, I always wanted the compiler to check me if I read uninitializable. I have discovered myself that is is stupid to read uncertain variable because it leads unpredictable result and is errorenous obviously. By just looking at the code I could see if there is an error. I wished if compiler could do this job. It is also a reliable signal of programming error if function does not return any value. But I never wanted it do enforce me the super constructor first. Why? You say that constructors just initialize fields. Super fields are derived; extra fields are introduced. From the goal point of view, it does not matter in which order you initialize the variables. I have studied parallel architectures and can say that all the fields can even be assigned in parallel... What? Do you want to use the unitialized fields? Stupid people always want to take away our freedoms and break the JLS rules the God gives to us! Please, policeman, take away that person! Where do I say so? I'm just saying only about initializing/assigning, not using the fields. Java compiler already defends me from the mistake of accessing notinitialized. Some cases sneak but this example shows how this stupid rule does not save us from the read-accessing incompletely initialized in construction: public class BadSuper { String field; public String toString() { return "field = " + field; } public BadSuper(String val) { field = val; // yea, superfirst does not protect from accessing // inconstructed subclass fields. Subclass constr // must be called before super()! System.err.println(this); } } public class BadPost extends BadSuper { Object o; public BadPost(Object o) { super("str"); this. o = o; } public String toString() { // superconstructor will boom here, because o is not initialized! return super.toString() + ", obj = " + o.toString(); } public static void main(String[] args) { new BadSuper("test 1"); new BadPost(new Object()); } } It shows that actually, subfields have to be inilialized before the supreclass! Meantime, java requirement "saves" us from writing specializing the class by specializing what the super constructor argument is, public class MyKryo extends Kryo { class MyClassResolver extends DefaultClassResolver { public Registration register(Registration registration) { System.out.println(MyKryo.this.getDepth()); return super.register(registration); } } MyKryo() { // cannot instantiate MyClassResolver in super super(new MyClassResolver(), new MapReferenceResolver()); } } Try to make it compilable. It is always pain. Especially, when you cannot assign the argument later. Initialization order is not important for initialization in general. I could understand that you should not use super methods before initializing super. But, the requirement for super to be the first statement is different. It only saves you from the code that does useful things simply. I do not see how this adds safety. Actually, safety is degraded because we need to use ugly workarounds. Doing post-initialization, outside the constructors also degrades safety (otherwise, why do we need constructors?) and defeats the java final safety reenforcer. To conclude Reading not initialized is a bug. Initialization order is not important from the computer science point of view. Doing initalization or computations in different order is not a bug. Reenforcing read-access to not initialized is good but compilers fail to detect all such bugs Making super the first does not solve the problem as it "Prevents" shooting into right things but not into the foot It requires to invent workarounds, where, because of complexity of analysis, it is easier to shoot into the foot doing post-initialization outside the constructors degrades safety (otherwise, why do we need constructors?) and that degrade safety by defeating final access modifier When there was java forum alive, java bigots attecked me for these thoughts. Particularly, they dislaked that fields can be initialized in parallel, saying that natural development ensures correctness. When I replied that you could use an advanced engineering to create a human right away, without "developing" any ape first, and it still be an ape, they stopped to listen me. Cos modern technology cannot afford it. Ok, Take something simpler. How do you produce a Renault? Should you construct an Automobile first? No, you start by producing a Renault and, once completed, you'll see that this is an automobile. So, the requirement to produce fields in "natural order" is unnatural. In case of alarmclock or armchair, which are still chair and clock, you may need first develop the base (clock and chair) and then add extra. So, I can have examples where superfields must be initialized first and, oppositely, when they need to be initialized later. The order does not exist in advance. So, the compiler cannot be aware of the proper order. Only programmer/constructor knows is. Compiler should not take more responsibility and enforce the wrong order onto programmer. Saying that I cannot initialize some fields because I did not ininialized the others is like "you cannot initialize the thing because it is not initialized". This is a kind of argument we have. So, to conclude once more, the feature that "protects" me from doing things in simple and right way in order to enforce something that does not add noticeably to the bug elimination at that is a strongly negative thing and it pisses me off, altogether with the all the arguments to support it I've seen so far. It is "a conceptual question about software development" Should there be the requirement to call super() first or not. I do not know. If you do or have an idea, you have place to answer. I think that I have provided enough arguments against this feature. Lets appreciate the ones who benefit form it. Let it just be something more than simple abstract and stupid "write your own language" or "protection" kind of argument. Why do we need it in the language that I am going to develop?

Read the article
RAID1 rebuild fails due to disk errors

- by overlord_tm

Quick info: Dell R410 with 2x500GB drives in RAID1 on H700 Adapter Recently one of drives in RAID1 array on server failed, lets call it Drive 0. RAID controller marked it as fault and put it offline. I replaced faulty disk with new one (same series and manufacturer, just bigger) and configured new disk as hot spare. Rebuild from Drive1 started immediately and after 1.5 hour I got message that Drive 1 failed. Server was unresponsive (kernel panic) and required reboot. Given that half hour before this error rebuild was at about 40%, I estimated that new drive is not in sync yet and tried to reboot just with Drive 1. RAID controller complained a bit about missing RAID arrays, but it found foreign RAID array on Drive 1 and I imported it. Server booted and it runs (from degraded RAID). Here is SMART data for disks. Drive 0 (the one that failed first) ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE 1 Raw_Read_Error_Rate POSR-K 200 200 051 - 1 3 Spin_Up_Time POS--K 142 142 021 - 3866 4 Start_Stop_Count -O--CK 100 100 000 - 12 5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0 7 Seek_Error_Rate -OSR-K 200 200 000 - 0 9 Power_On_Hours -O--CK 086 086 000 - 10432 10 Spin_Retry_Count -O--CK 100 253 000 - 0 11 Calibration_Retry_Count -O--CK 100 253 000 - 0 12 Power_Cycle_Count -O--CK 100 100 000 - 11 192 Power-Off_Retract_Count -O--CK 200 200 000 - 10 193 Load_Cycle_Count -O--CK 200 200 000 - 1 194 Temperature_Celsius -O---K 112 106 000 - 31 196 Reallocated_Event_Count -O--CK 200 200 000 - 0 197 Current_Pending_Sector -O--CK 200 200 000 - 0 198 Offline_Uncorrectable ----CK 200 200 000 - 0 199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0 200 Multi_Zone_Error_Rate ---R-- 200 198 000 - 3 And Drive 1 (the drive which was reported healthy from controller until rebuild was attempted) ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE 1 Raw_Read_Error_Rate POSR-K 200 200 051 - 35 3 Spin_Up_Time POS--K 143 143 021 - 3841 4 Start_Stop_Count -O--CK 100 100 000 - 12 5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0 7 Seek_Error_Rate -OSR-K 200 200 000 - 0 9 Power_On_Hours -O--CK 086 086 000 - 10455 10 Spin_Retry_Count -O--CK 100 253 000 - 0 11 Calibration_Retry_Count -O--CK 100 253 000 - 0 12 Power_Cycle_Count -O--CK 100 100 000 - 11 192 Power-Off_Retract_Count -O--CK 200 200 000 - 10 193 Load_Cycle_Count -O--CK 200 200 000 - 1 194 Temperature_Celsius -O---K 114 105 000 - 29 196 Reallocated_Event_Count -O--CK 200 200 000 - 0 197 Current_Pending_Sector -O--CK 200 200 000 - 3 198 Offline_Uncorrectable ----CK 100 253 000 - 0 199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0 200 Multi_Zone_Error_Rate ---R-- 100 253 000 - 0 In extended error logs from SMART I found: Drive 0 has only one error Error 1 [0] occurred at disk power-on lifetime: 10282 hours (428 days + 10 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 10 -- 51 00 18 00 00 00 6a 24 20 40 00 Error: IDNF at LBA = 0x006a2420 = 6956064 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 61 00 60 00 f8 00 00 00 6a 24 20 40 00 17d+20:25:18.105 WRITE FPDMA QUEUED 61 00 18 00 60 00 00 00 6a 24 00 40 00 17d+20:25:18.105 WRITE FPDMA QUEUED 61 00 80 00 58 00 00 00 6a 23 80 40 00 17d+20:25:18.105 WRITE FPDMA QUEUED 61 00 68 00 50 00 00 00 6a 23 18 40 00 17d+20:25:18.105 WRITE FPDMA QUEUED 61 00 10 00 10 00 00 00 6a 23 00 40 00 17d+20:25:18.104 WRITE FPDMA QUEUED But Drive 1 has 883 errors. I see only few last ones and all errors I can see look like this: Error 883 [18] occurred at disk power-on lifetime: 10454 hours (435 days + 14 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 01 -- 51 00 80 00 00 39 97 19 c2 40 00 Error: AMNF at LBA = 0x399719c2 = 966203842 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 60 00 80 00 00 00 00 39 97 19 80 40 00 1d+00:25:57.802 READ FPDMA QUEUED 2f 00 00 00 01 00 00 00 00 00 10 40 00 1d+00:25:57.779 READ LOG EXT 60 00 80 00 00 00 00 39 97 19 80 40 00 1d+00:25:55.704 READ FPDMA QUEUED 2f 00 00 00 01 00 00 00 00 00 10 40 00 1d+00:25:55.681 READ LOG EXT 60 00 80 00 00 00 00 39 97 19 80 40 00 1d+00:25:53.606 READ FPDMA QUEUED Given those errors, is there any way I can rebuild RAID back, or should I make backup, shutdown server, replace disks with new ones and restore it? What about if I dd faulty disk to new one from linux running on USB/CD? Also, if anyone have more experiences, what could be causes for those errors? Crappy controller or disks? Disks are about 1 year old, but it is pretty unbelievable to me that both would die within so short timespan.

Read the article
Recover RAID 5 data after created new array instead of re-using

- by Brigadieren

Folks please help - I am a newb with a major headache at hand (perfect storm situation). I have a 3 1tb hdd on my ubuntu 11.04 configured as software raid 5. The data had been copied weekly onto another separate off the computer hard drive until that completely failed and was thrown away. A few days back we had a power outage and after rebooting my box wouldn't mount the raid. In my infinite wisdom I entered mdadm --create -f... command instead of mdadm --assemble and didn't notice the travesty that I had done until after. It started the array degraded and proceeded with building and syncing it which took ~10 hours. After I was back I saw that that the array is successfully up and running but the raid is not I mean the individual drives are partitioned (partition type f8 ) but the md0 device is not. Realizing in horror what I have done I am trying to find some solutions. I just pray that --create didn't overwrite entire content of the hard driver. Could someone PLEASE help me out with this - the data that's on the drive is very important and unique ~10 years of photos, docs, etc. Is it possible that by specifying the participating hard drives in wrong order can make mdadm overwrite them? when I do mdadm --examine --scan I get something like ARRAY /dev/md/0 metadata=1.2 UUID=f1b4084a:720b5712:6d03b9e9:43afe51b name=<hostname>:0 Interestingly enough name used to be 'raid' and not the host hame with :0 appended. Here is the 'sanitized' config entries: DEVICE /dev/sdf1 /dev/sde1 /dev/sdd1 CREATE owner=root group=disk mode=0660 auto=yes HOMEHOST <system> MAILADDR root ARRAY /dev/md0 metadata=1.2 name=tanserv:0 UUID=f1b4084a:720b5712:6d03b9e9:43afe51b Here is the output from mdstat cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid5 sdd1[0] sdf1[3] sde1[1] 1953517568 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU] unused devices: <none> fdisk shows the following: fdisk -l Disk /dev/sda: 80.0 GB, 80026361856 bytes 255 heads, 63 sectors/track, 9729 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000bf62e Device Boot Start End Blocks Id System /dev/sda1 * 1 9443 75846656 83 Linux /dev/sda2 9443 9730 2301953 5 Extended /dev/sda5 9443 9730 2301952 82 Linux swap / Solaris Disk /dev/sdb: 750.2 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000de8dd Device Boot Start End Blocks Id System /dev/sdb1 1 91201 732572001 8e Linux LVM Disk /dev/sdc: 500.1 GB, 500107862016 bytes 255 heads, 63 sectors/track, 60801 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00056a17 Device Boot Start End Blocks Id System /dev/sdc1 1 60801 488384001 8e Linux LVM Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000ca948 Device Boot Start End Blocks Id System /dev/sdd1 1 121601 976760001 fd Linux raid autodetect Disk /dev/dm-0: 1250.3 GB, 1250254913536 bytes 255 heads, 63 sectors/track, 152001 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/dm-0 doesn't contain a valid partition table Disk /dev/sde: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x93a66687 Device Boot Start End Blocks Id System /dev/sde1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0xe6edc059 Device Boot Start End Blocks Id System /dev/sdf1 1 121601 976760001 fd Linux raid autodetect Disk /dev/md0: 2000.4 GB, 2000401989632 bytes 2 heads, 4 sectors/track, 488379392 cylinders Units = cylinders of 8 * 512 = 4096 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 524288 bytes / 1048576 bytes Disk identifier: 0x00000000 Disk /dev/md0 doesn't contain a valid partition table Per suggestions I did clean up the superblocks and re-created the array with --assume-clean option but with no luck at all. Is there any tool that will help me to revive at least some of the data? Can someone tell me what and how the mdadm --create does when syncs to destroy the data so I can write a tool to un-do whatever was done? After the re-creating of the raid I run fsck.ext4 /dev/md0 and here is the output root@tanserv:/etc/mdadm# fsck.ext4 /dev/md0 e2fsck 1.41.14 (22-Dec-2010) fsck.ext4: Superblock invalid, trying backup blocks... fsck.ext4: Bad magic number in super-block while trying to open /dev/md0 The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 Per Shanes' suggestion I tried root@tanserv:/home/mushegh# mkfs.ext4 -n /dev/md0 mke2fs 1.41.14 (22-Dec-2010) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=128 blocks, Stripe width=256 blocks 122101760 inodes, 488379392 blocks 24418969 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=0 14905 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 102400000, 214990848 and run fsck.ext4 with every backup block but all returned the following: root@tanserv:/home/mushegh# fsck.ext4 -b 214990848 /dev/md0 e2fsck 1.41.14 (22-Dec-2010) fsck.ext4: Invalid argument while trying to open /dev/md0 The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device> Any suggestions? Regards!

Read the article

Search Results

Search found 159 results on 7 pages for 'degraded'.

Page 6/7 | < Previous Page | 2 3 4 5 6 7 | Next Page >

- by zoolii

- by NJ

- by theduke0

- by dunc

- by Matthew Hodgkins

- by azera

- by bahamat

- by tudor

- by Daniel Baron

- by Jon Cage

- by JuanD

- by Mike Stiles

- by Kim Sun-wu

- by TegtmeierDE

- by milkfilk

- by T.J. Crowder

- by Forkrul Assail

- by Zoran

- by rkotulla

- by S Stelting

- by Eran_Steiner

- by AjarnMark

- by Val

- by overlord_tm

- by Brigadieren

< Previous Page | 2 3 4 5 6 7 | Next Page >