Search Results

Search found 346 results on 14 pages for 'faulty'.

Page 13/14 | < Previous Page | 9 10 11 12 13 14 | Next Page >

PHP - error when insert date into MySQL

- by Michael Mao

Hello everyone: I've got a typical problem when trying to insert a date into MySQL. The column defined in MySQL is of type DATE. My PHP version is 5.3.0 Apart from this date-related issue, the rest of my code works just fine. And this is my PHP script to do this: $tablename = BOOKS_TABLE; $insert = mysql_query("INSERT INTO $tablename (barcode, book_name, volume_num,". " author, publisher, item_type, buy_price, buy_date) VALUES ". "(". "'" . $barcode . "', ". "'" . $bookname . "', ". "'" . $volumenum . "', ". "'" . $author . "', ". "'" . $publisher . "', ". "'" . $itemtype . "', ". "'" . $buyprice . "', ". "'" . getMySQLDateString($buydate). //"'STR_TO_DATE('".$buydate ."', '%d/%m/%Y'))'". //nothing changes in MySQL ")"); And this is the faulty function : function getMySQLDateString($buydate) //typical buydate : 04/21/2009 { $mysqlDateString = date('Y-m-d H:i:s', $strtotime($buydate)); return $mysqlDateString; } The first commented out line wouldn't do anything, the script is executed with no error, however, there is nothing changed in datebase after this. The current approach will cause a Fatal error saying function name must be a string in this line. Actually I followed this thread on SO, but just cannot pass the date into MySQL... Can anyone help me figure out which part is not right? How would you do it, in this case, to get it right? Sorry about such a journeyman-like question, thanks a lot in advance.

Read the article
add space to every word's end in a string in C

- by hlx98007

Here I have a string: *line = "123 567 890 "; with 2 spaces at the end. I wish to add those 2 spaces to 3's end and 7's end to make it like this: "123 567 890" I was trying to achieve the following steps: parse the string into words by words list (array of strings). From upstream function I will get values of variables word_count, *line and remain. concatenate them with a space at the end. add space distributively, with left to right priority, so when a fair division cannot be done, the second to last word's end will have (no. of spaces) spaces, the previous ones will get (spaces + 1) spaces. concatenate everything together to make it a new *line. Here is a part of my faulty code: int add_space(char *line, int remain, int word_count) { if (remain == 0.0) return 0; // Don't need to operate. int ret; char arr[word_count][line_width]; memset(arr, 0, word_count * line_width * sizeof(char)); char *blank = calloc(line_width, sizeof(char)); if (blank == NULL) { fprintf(stderr, "calloc for arr error!\n"); return -1; } for (int i = 0; i < word_count; i++) { ret = sscanf(line, "%s", arr[i]); // gdb shows somehow it won't read in. if (ret != 1) { fprintf(stderr, "Error occured!\n"); return -1; } arr[i] = strcat(arr[i], " "); // won't compile. } size_t spaces = remain / (word_count * 1.0); memset(blank, ' ', spaces + 1); for (int i = 0; i < word_count - 1; i++) { arr[0] = strcat(arr[i], blank); // won't compile. } memset(blank, ' ', spaces); arr[word_count-1] = strcat(arr[word_count-1], blank); for (int i = 1; i < word_count; i++) { arr[0] = strcat(arr[0], arr[i]); } free(blank); return 0; } It is not working, could you help me find the parts that do not work and fix them please? Thank you guys.

Read the article
How do I stop and repair a RAID 5 array that has failed and has I/O pending?

- by Ben Hymers

The short version: I have a failed RAID 5 array which has a bunch of processes hung waiting on I/O operations on it; how can I recover from this? The long version: Yesterday I noticed Samba access was being very sporadic; accessing the server's shares from Windows would randomly lock up explorer completely after clicking on one or two directories. I assumed it was Windows being a pain and left it. Today the problem is the same, so I did a little digging; the first thing I noticed was that running ps aux | grep smbd gives a lot of lines like this: ben 969 0.0 0.2 96088 4128 ? D 18:21 0:00 smbd -F root 1708 0.0 0.2 93468 4748 ? Ss 18:44 0:00 smbd -F root 1711 0.0 0.0 93468 1364 ? S 18:44 0:00 smbd -F ben 3148 0.0 0.2 96052 4160 ? D Mar07 0:00 smbd -F ... There are a lot of processes stuck in the "D" state. Running ps aux | grep " D" shows up some other processes including my nightly backup script, all of which need to access the volume mounted on my RAID array at some point. After some googling, I found that it might be down to the RAID array failing, so I checked /proc/mdstat, which shows this: ben@jack:~$ cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid5 sdb1[3](F) sdc1[1] sdd1[2] 2930271872 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU] unused devices: <none> And running mdadm --detail /dev/md0 gives this: ben@jack:~$ sudo mdadm --detail /dev/md0 /dev/md0: Version : 00.90 Creation Time : Sat Oct 31 20:53:10 2009 Raid Level : raid5 Array Size : 2930271872 (2794.53 GiB 3000.60 GB) Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB) Raid Devices : 3 Total Devices : 3 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Mon Mar 7 03:06:35 2011 State : active, degraded Active Devices : 2 Working Devices : 2 Failed Devices : 1 Spare Devices : 0 Layout : left-symmetric Chunk Size : 64K UUID : f114711a:c770de54:c8276759:b34deaa0 Events : 0.208245 Number Major Minor RaidDevice State 3 8 17 0 faulty spare rebuilding /dev/sdb1 1 8 33 1 active sync /dev/sdc1 2 8 49 2 active sync /dev/sdd1 I believe this says that sdb1 has failed, and so the array is running with two drives out of three 'up'. Some advice I found said to check /var/log/messages for notices of failures, and sure enough there are plenty: ben@jack:~$ grep sdb /var/log/messages ... Mar 7 03:06:35 jack kernel: [4525155.384937] md/raid:md0: read error NOT corrected!! (sector 400644912 on sdb1). Mar 7 03:06:35 jack kernel: [4525155.389686] md/raid:md0: read error not correctable (sector 400644920 on sdb1). Mar 7 03:06:35 jack kernel: [4525155.389686] md/raid:md0: read error not correctable (sector 400644928 on sdb1). Mar 7 03:06:35 jack kernel: [4525155.389688] md/raid:md0: read error not correctable (sector 400644936 on sdb1). Mar 7 03:06:56 jack kernel: [4525176.231603] sd 0:0:1:0: [sdb] Unhandled sense code Mar 7 03:06:56 jack kernel: [4525176.231605] sd 0:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Mar 7 03:06:56 jack kernel: [4525176.231608] sd 0:0:1:0: [sdb] Sense Key : Medium Error [current] [descriptor] Mar 7 03:06:56 jack kernel: [4525176.231623] sd 0:0:1:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed Mar 7 03:06:56 jack kernel: [4525176.231627] sd 0:0:1:0: [sdb] CDB: Read(10): 28 00 17 e1 5f bf 00 01 00 00 To me it is clear that device sdb has failed, and I need to stop the array, shutdown, replace it, reboot, then repair the array, bring it back up and mount the filesystem. I cannot hot-swap a replacement drive in, and don't want to leave the array running in a degraded state. I believe I am supposed to unmount the filesystem before stopping the array, but that is failing, and that is where I'm stuck now: ben@jack:~$ sudo umount /storage umount: /storage: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) It is indeed busy; there are some 30 or 40 processes waiting on I/O. What should I do? Should I kill all these processes and try again? Is that a wise move when they are 'uninterruptable'? What would happen if I tried to reboot? Please let me know what you think I should do. And please ask if you need any extra information to diagnose the problem or to help!

Read the article
Graphics card initialisation problems when booting - requires a "double" boot

- by DMA57361

Problem Outline When booting from cold (and my machine is disconnected from main power when off, but leaving it connected doesn't help) the graphics card (single PCI-e card GeForce 460) will not initialise on the first boot, leaving me with the motherboards on-board graphics (which kick in automatically if no PCI-e card is found). However, if I restart the computer - normally I do this by powering it off just after the numlock lights up on the keyboard (ie, just after POST/BIOS and before Windows takes over), wait for the system to whirr down, and power up again - the graphics card will work correctly. Once double-booted in this matter the system seems to work correctly - with no noticeable problems. This is reproducible every time I try to boot - it has been working like this for about a month now. Background Information Sept 2010 - I suffered a hardware malfunction (crashes in Windows and graphics corruption on BIOS screens). By way of spare hardware I determined that replacing the PSU removed the issue, so I replaced the PSU with a brand new one of slightly higher power (460W replaced with 500W). Oct 2010 - The problem resurfaced. I purchased a new graphics card (GeForce 460), which removed the problem. The new graphics card immediately started having the boot initialisation problems mentioned. I presumed there was a motherboard fault all along, but because the system worked once booted, and I was temporarily out of spare money, I left the system alone and continued to use it. Early/Mid Dec 2010 - In the space of 5 days I recieved 3 instances of hard drive corruption (seemlingly fixed by chkdsk and sfc in each case...). Since I was already under the impression the motherboard was faulty, I purchased a new one ASAP, this also required new RAM (as I dropped from 4 slots to 2 and didn't want to drop mem quantity). Past 3-4 weeks - With a brand new PSU, Graphics Card, Motherboard and RAM I'm suffering the problem outlined above. So, what could be causing this and how do I can resolve it? Additional Notes Once double-booted the system seems to work entirely correctly. The graphics card problem has occured on two entirely different motherboards. I do not have the opportunity to test the graphics card in a different computer (I've only the old motherboard, which is dubious, or a really old desktop that still has an AGP port). Under load (ie, modern games for long enough for temperatures to plateau) the system remains stable and performs as expected. The software that came with the new motherboard and SpeenFan both report all voltages and temperatures are within nominal bounds, when idle and when under load. I've looking over the BIOS settings for my motherboard multiple times and can find nothing that helps. This system is configured to run with everything at standard levels - no overclocking. I've tried booting the system with only the mobo and graphics card connected (thinking maybe my new PSU was too weak for the new gfx card, even though it meets the quoted PSU requirements for the card) but the same problem persists (and really if the PSU was weak I'd have problems with the system under load). When the gfx card does not initialise the fan on its cooling unit is running, possibly slower than otherwise - but this measurement is by eye and so unreliable.

Read the article
Why do GPUs overheat?

- by JAD

About a year ago, I added a 9800GT (1 GB version) and a Corsair CX500 PSU to an HP M8000N computer. A few weeks ago, the HDD overheated and I decided to transfer the GPU & PSU to a new build, which consists of: i3 @ 3.3Ghz Gigabyte H61 Micro ATX Mobo 4GB RAM 500GB WD HDD DVD RW Drive Cooler Master Elite 430 Tower Once I had Win7 up and running, I installed all the essential drivers that came with the Gigabyte Mobo CD. However, whenever I tried installing the Graphics Media Accelerator driver, the computer would crash and enter an endless boot sequence on the next startup. I skipped installing this driver and installed the CD driver for the 9800GT, which by now is a year old. Everything was working fine, WEI rated my GPU at 6.6 graphics & aero performance. However, after updating my Nvidia drivers to the latest, the WEI dropped my rating to 3.3 for Aero, and 4.7 for graphics performance. Just to make sure that everything was ok, I ran Bad Company 2 on medium settings. The first few minutes ran just fine at a smooth framerate, so I dismissed this as Windows being Windows. About 6 hours later, I ran BC2 again. This time I averaged anywhere from 2-5 FPS. I checked the GPU temperature through GPU-Z, and it came back as 120C. The problem with this, is that the computer was on for six hours up to that point. Wouldn't the card have experienced a reactor core meltdown a lot sooner than that? Granted, the computer was "sleeping" some of the time, but still... The next day I took out a temperature gun and ran some tests. I would point the laser at a very specific area on the reverse side of the card (not the fan or "front"), and compare the temp reading with GPU-Z. After leaving the system on idle on idle for a few minutes, I ran BC2 twice. Here are the results: GPU-Z Reading / Temp Gun Reading / Time Null / 22.3°C / Comp is Off 53°C / 33.5°C / 1:49 78°C / 46°C / 1:53 - (First BC2 run; good framerate) 102°C / 64.6°C / 2:01 - (System is again on idle) 113°C / 64.8°C / 2:10 119°C / 71.8°C / 2:17 - (Second BC2 run; poor framerate) I should also mention that I also took a temp recording of another part of the GPU from 2:01-2:17. The temp in this area jumped from 75°C to 82.9°C in that time frame. This pretty much confirms that GPU-Z is reporting the temperature accurately, and the card is overheating. But I'd like to know why; the cars is doing nothing and still the temperature climbs at a steady rate. I thoroughly cleaned the GPU and PSU when I salvaged them from the old HP M8000N computer with a can of compressed air, dust cant be the issue. Similarly, the rest of the computer is brand new. I installed various Nvidia drivers, but no luck. It seems strange to me that a year-old card is suddenly failing on me; aren't they supposed to last at least two years? Could this be a driver issue? Is the motherboard faulty? Could the PSU be overfeeding the card on voltage? Neither case seems likely, as the CPU, RAM and otherwise the rest of the comp has worked flawlessly and has stayed well within respectable temp ranges (the i3 lingers around 50C, the HDD stays at 30C, so does the PSU). How can I pinpoint the issue?

Read the article
Computer makes hissing noise, turns off after few seconds

- by Kaustubh P

I have a problem similar to the questions posted here and here. This is my config: Asus M3N78-EM, with AMD Phenom X3 720 2800 Black Edition, 4GB Transcend DDR2 RAM, Nvidia 9400GT. HD is a 160 GB IDE, and a LG IDE DVD-ROM. The power button is a bit off, I have removed the cover of the switch, and the only way it turns on is just giving the "stick" under the cover a gentle press. It turns on sometimes, and at other times, I have to cut-off the power from the PSU, and try again. I will describe my problem in as detail as possible, please bear with me: The problem has started in the last week, a few months after I changed the to the powerswitch arrangement as described above. The PC makes a hissing noise, and I wasn't able to pin-point the noise source, because of the various other fans. At first, removing the HD, rebooting w/o the HD, turning it off, reconnecting and booting made the problem go away. But of late, it doesn't happen. As suggested in the other questions, I tried reducing the load by disconnecting both the IDE drives, and the problem (noise + turn-off) still occurs. I also connected another 80G IDE HD,today morning, adn it still made that noise, and turned off. I also opened up the PSU, but I couldn't see any fault in that, I tried rotating the fan by blowing into the blades, and with my fingers, but the hissing noise didn't come from there. Or maybe the speed wasn't enough to evoke that noise. A few weeks ago: I had cleaned the Cabinet and had repasted the processor and its fan using some thermal paste. Could that be at fault? I also used a vacuum to blow the dust out of the PSU, could the power have been too much, to maybe offset the fan or something? A label on the PSU says it uses a ball-bearing fan. That only leaves me with the Processor fan and the processor itself. I didn't try removing the processor fan and processor from the motherboard, and then turning the PC on, fearing damage. Will doing so cause any damage? What can I do to localize and pin-point the problem? Also, after a few tries, the Computer starts up. Sometimes it turns of within 2 seconds, sometimes after the POST. Once it turned off at the grub. Another time it booted completely and then turned off. The only way to ensure that the PC wont turn off, is if the hissing noise stops. EDIT: I suspect it to be the Processor/Processor fan, owing to the source of noise. All the config, except for the Cabinet, is just over a year old. EDIT2: I also just remembered, that I had set the "On-power resume" to turn on, i.e. If I supply he PC with power, it will turn itself on, w/o me needing to press the switch. I had done that to workaround the faulty power-switch, as noted above. EDIT3: I calculated the power my system needs, from the antec site, and I just arrived at 292W

Read the article
Can't re-mount existing RAID10 on Ubuntu

- by Zoran

I saw similar questions, but didn't find what solution to my problem. After power-cut, one of RAID10 (4 disks were) appears to be malfunctioning. I make tha array active one, but can not mount it. Always the same error: mount: you must specify the filesystem type So, here is what I have when type mdadm --detail /dev/md0 /dev/md0: Version : 00.90.03 Creation Time : Tue Sep 1 11:00:40 2009 Raid Level : raid10 Array Size : 1465148928 (1397.27 GiB 1500.31 GB) Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 4 Total Devices : 3 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Mon Jun 11 09:54:27 2012 State : clean, degraded Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Layout : near=2, far=1 Chunk Size : 64K UUID : 1a02e789:c34377a1:2e29483d:f114274d Events : 0.166 Number Major Minor RaidDevice State 0 8 16 0 active sync /dev/sdb 1 0 0 1 removed 2 8 48 2 active sync /dev/sdd 3 8 64 3 active sync /dev/sde At the /etc/mdadm/mdadm.conf I have by default, scan all partitions (/proc/partitions) for MD superblocks. alternatively, specify devices to scan, using wildcards if desired. DEVICE partitions auto-create devices with Debian standard permissions CREATE owner=root group=disk mode=0660 auto=yes automatically tag new arrays as belonging to the local system HOMEHOST <system> instruct the monitoring daemon where to send mail alerts MAILADDR root definitions of existing MD arrays ARRAY /dev/md0 level=raid10 num-devices=4 UUID=1a02e789:c34377a1:2e29483d:f114274d ARRAY /dev/md1 level=raid1 num-devices=2 UUID=9b592be7:c6a2052f:2e29483d:f114274d This file was auto-generated... So, my question is, how can I mount md0 array (md1 has been mounted without problem) in order to preserve existing data? One more thing, fdisk -l command gives the following result: Disk /dev/sdb: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x660a6799 Device Boot Start End Blocks Id System /dev/sdb1 * 1 88217 708603021 83 Linux /dev/sdb2 88218 91201 23968980 5 Extended /dev/sdb5 88218 91201 23968948+ 82 Linux swap / Solaris Disk /dev/sdc: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x0008f8ae Device Boot Start End Blocks Id System /dev/sdc1 1 88217 708603021 83 Linux /dev/sdc2 88218 91201 23968980 5 Extended /dev/sdc5 88218 91201 23968948+ 82 Linux swap / Solaris Disk /dev/sdd: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x4be1abdb Device Boot Start End Blocks Id System Disk /dev/sde: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xa4d5632e Device Boot Start End Blocks Id System Disk /dev/sdf: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xdacb141c Device Boot Start End Blocks Id System Disk /dev/sdg: 750.1 GB, 750156374016 bytes 255 heads, 63 sectors/track, 91201 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xdacb141c Device Boot Start End Blocks Id System Disk /dev/md1: 750.1 GB, 750156251136 bytes 2 heads, 4 sectors/track, 183143616 cylinders Units = cylinders of 8 * 512 = 4096 bytes Disk identifier: 0xdacb141c Device Boot Start End Blocks Id System Warning: ignoring extra data in partition table 5 Warning: ignoring extra data in partition table 5 Warning: ignoring extra data in partition table 5 Warning: invalid flag 0x7b6e of partition table 5 will be corrected by w(rite) Disk /dev/md0: 1500.3 GB, 1500312502272 bytes 255 heads, 63 sectors/track, 182402 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x660a6799 Device Boot Start End Blocks Id System /dev/md0p1 * 1 88217 708603021 83 Linux /dev/md0p2 88218 91201 23968980 5 Extended /dev/md0p5 ? 121767 155317 269488144 20 Unknown And one more thing. When using mdadm --examine command, here ise result: mdadm -v --examine --scan /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sd ARRAY /dev/md1 level=raid1 num-devices=2 UUID=9b592be7:c6a2052f:2e29483d:f114274d devices=/dev/sdf ARRAY /dev/md0 level=raid10 num-devices=4 UUID=1a02e789:c34377a1:2e29483d:f114274d devices=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde md0 has 3 devices which are active. Can someone instruct me how to solve this issue? If it is possible, I would like not to removing faulty HDD. Please advise

Read the article
Router 2wire, Slackware desktop in DMZ mode, iptables policy aginst ping, but still pingable

- by user135501

I'm in DMZ mode, so I'm firewalling myself, stealthy all ok, but I get faulty test results from Shields Up that there are pings. Yesterday I couldn't make a connection to game servers work, because ping block was enabled (on the router). I disabled it, but this persists even due to my firewall. What is the connection between me and my router in DMZ mode (for my machine, there is bunch of others too behind router firewall)? When it allows router affecting if I'm pingable or not and if router has setting not blocking ping, rules in my iptables for this scenario do not work. Please ignore commented rules, I do uncomment them as I want. These two should do the job right? iptables -A INPUT -p icmp --icmp-type echo-request -j DROP echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all Here are my iptables: #!/bin/sh # Begin /bin/firewall-start # Insert connection-tracking modules (not needed if built into the kernel). #modprobe ip_tables #modprobe iptable_filter #modprobe ip_conntrack #modprobe ip_conntrack_ftp #modprobe ipt_state #modprobe ipt_LOG # allow local-only connections iptables -A INPUT -i lo -j ACCEPT # free output on any interface to any ip for any service # (equal to -P ACCEPT) iptables -A OUTPUT -j ACCEPT # permit answers on already established connections # and permit new connections related to established ones (eg active-ftp) iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT #Gamespy&NWN #iptables -A INPUT -p tcp -m tcp -m multiport --ports 5120:5129 -j ACCEPT #iptables -A INPUT -p tcp -m tcp --dport 6667 --tcp-flags SYN,RST,ACK SYN -j ACCEPT #iptables -A INPUT -p tcp -m tcp --dport 28910 --tcp-flags SYN,RST,ACK SYN -j ACCEPT #iptables -A INPUT -p tcp -m tcp --dport 29900 --tcp-flags SYN,RST,ACK SYN -j ACCEPT #iptables -A INPUT -p tcp -m tcp --dport 29901 --tcp-flags SYN,RST,ACK SYN -j ACCEPT #iptables -A INPUT -p tcp -m tcp --dport 29920 --tcp-flags SYN,RST,ACK SYN -j ACCEPT #iptables -A INPUT -p udp -m udp -m multiport --ports 5120:5129 -j ACCEPT #iptables -A INPUT -p udp -m udp --dport 6500 -j ACCEPT #iptables -A INPUT -p udp -m udp --dport 27900 -j ACCEPT #iptables -A INPUT -p udp -m udp --dport 27901 -j ACCEPT #iptables -A INPUT -p udp -m udp --dport 29910 -j ACCEPT # Log everything else: What's Windows' latest exploitable vulnerability? iptables -A INPUT -j LOG --log-prefix "FIREWALL:INPUT" # set a sane policy: everything not accepted > /dev/null iptables -P INPUT DROP iptables -P FORWARD DROP iptables -P OUTPUT DROP iptables -A INPUT -p icmp --icmp-type echo-request -j DROP # be verbose on dynamic ip-addresses (not needed in case of static IP) echo 2 > /proc/sys/net/ipv4/ip_dynaddr # disable ExplicitCongestionNotification - too many routers are still # ignorant echo 0 > /proc/sys/net/ipv4/tcp_ecn #ping death echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all # If you are frequently accessing ftp-servers or enjoy chatting you might # notice certain delays because some implementations of these daemons have # the feature of querying an identd on your box for your username for # logging. Although there's really no harm in this, having an identd # running is not recommended because some implementations are known to be # vulnerable. # To avoid these delays you could reject the requests with a 'tcp-reset': #iptables -A INPUT -p tcp --dport 113 -j REJECT --reject-with tcp-reset #iptables -A OUTPUT -p tcp --sport 113 -m state --state RELATED -j ACCEPT # To log and drop invalid packets, mostly harmless packets that came in # after netfilter's timeout, sometimes scans: #iptables -I INPUT 1 -p tcp -m state --state INVALID -j LOG --log-prefix \ "FIREWALL:INVALID" #iptables -I INPUT 2 -p tcp -m state --state INVALID -j DROP # End /bin/firewall-start

Read the article
Vista 64-bit, DISK BOOT FAILURE

- by weka

So I have this Acer Aspire AX3200-U3600A with Windows Vista (64-bit). Every night I turn it off and turn it back on in the morning. Around three weeks ago, I did a fresh factory reimage. Good as new. Then around two days ago, when I turned it on, I noticed it was running extremly slow. As in, it would often freeze up while I had multiple applications open when it usually never froze up. So I decided to restart my computer. Big mistake. My computer froze right after I clicked shut-down. I waited a while. Nothing. Waited some minutes. Nope. I decided to shut it down by pressing the power button. Here is where the problems begin. When I turned it back on, I saw the Windows logo and loading bar and then it loaded to black. I turned it off again forcefully by power button and then once more... then I got: AMD Data Change... Update New Data to DMI! then later the screen clears and I get: AHCI Option ROM BIOS Revision: 01.05.92 Date: 02-19-2008 Copyright (c) 2006-2008 Phoenix Technologies, LTD Port 01: Reset Port Error!! Port 02: then the screen clears again but this time, this loads from the bottom: Nvidia Boot Agent 249.0542 (copyright stuff... blah blah) PXE-E61: Media test failure, check cable. PXE-M0F: Exiting Nvidia Boot Agent DISK BOOT FAILURE, INSERT SYSTEM DISK AND PRESS ENTER. So I try to go into Safe Mode. Well, first of all it doesn't load as fast. After it loads disk.sys from windows/drivers, it will wait a while (2-3 mins) THEN load. However it loads the Acer eRecovery Management Tool. I have three options: Reset computer to factory default, Restore computer from user's backup, or Exit. However, the top two options are gray and disabled where as the Exit is in blue and definitely clickable. So obviously safe mode is not there... A strong thing to note: In the beginning when all of this started, I did a Boot Windows Normal from pressing f8 and I got to my desktop! It logged me in. I could see the icons on my files. However my desktop was extremely slow as in when I clicked on the Start menu, it would wait a while, then load up the menu with JUST the gradient, no text or icons... so as you can see... it saw my HDD? Also, before anyone says, I have NO USB plugged in. My mouse and keyboard are not USB inputs, I assure you. And this came without a recovery CD AND when I went in BIOS, to change the BOOT ORDER, I did NOT see a CD-ROM option. And when I tried pressing ALT+F10 to get into Acer eRecovery Management, the top two options were disabled as well. But sometimes on start-up, I get: Windows has encountered a problem communicating with a device connected to your computer. This error can be caused by unplugging a removable storage device such as an external USB drive while the device is in use, or by faulty hardware such as a hard drive or CD-ROM drive that is failing. Make sure any removeable storage is properly connected and then restart your computer. If you continue to receive this error message, contact the hardware manufacturer. Status: 0xc00000e9 Info: An unexpected I/O error has occured. Then I tried Last Known Good Configuration Settings, that gives me a BSOD. What should I do/

Read the article
Source-control 'wet-work'?

- by Phil Factor

When a design or creative work is flawed beyond remedy, it is often best to destroy it and start again. The other day, I lost the code to a long and intricate SQL batch I was working on. I’d thought it was impossible, but it happened. With all the technology around that is designed to prevent this occurring, this sort of accident has become a rare event. If it weren’t for a deranged laptop, and my distraction, the code wouldn’t have been lost this time. As always, I sighed, had a soothing cup of tea, and typed it all in again. The new code I hastily tapped in was much better: I’d held in my head the essence of how the code should work rather than the details: I now knew for certain the start point, the end, and how it should be achieved. Instantly the detritus of half-baked thoughts fell away and I was able to write logical code that performed better. Because I could work so quickly, I was able to hold the details of all the columns and variables in my head, and the dynamics of the flow of data. It was, in fact, easier and quicker to start from scratch rather than tidy up and refactor the existing code with its inevitable fumbling and half-baked ideas. What a shame that technology is now so good that developers rarely experience the cleansing shock of losing one’s code and having to rewrite it from scratch. If you’ve never accidentally lost your code, then it is worth doing it deliberately once for the experience. Creative people have, until Technology mistakenly prevented it, torn up their drafts or sketches, threw them in the bin, and started again from scratch. Leonardo’s obsessive reworking of the Mona Lisa was renowned because it was so unusual: Most artists have been utterly ruthless in destroying work that didn’t quite make it. Authors are particularly keen on writing afresh, and the results are generally positive. Lawrence of Arabia actually lost the entire 250,000 word manuscript of ‘The Seven Pillars of Wisdom’ by accidentally leaving it on a train at Reading station, before rewriting a much better version. Now, any writer or artist is seduced by technology into altering or refining their work rather than casting it dramatically in the bin or setting a light to it on a bonfire, and rewriting it from the blank page. It is easy to pick away at a flawed work, but the real creative process is far more brutal. Once, many years ago whilst running a software house that supplied commercial software to local businesses, I’d been supervising an accounting system for a farming cooperative. No packaged system met their needs, and it was all hand-cut code. For us, it represented a breakthrough as it was for a government organisation, and success would guarantee more contracts. As you’ve probably guessed, the code got mangled in a disk crash just a week before the deadline for delivery, and the many backups all proved to be entirely corrupted by a faulty tape drive. There were some fragments left on individual machines, but they were all of different versions. The developers were in despair. Strangely, I managed to re-write the bulk of a three-month project in a manic and caffeine-soaked weekend. Sure, that elegant universally-applicable input-form routine was‘nt quite so elegant, but it didn’t really need to be as we knew what forms it needed to support. Yes, the code lacked architectural elegance and reusability. By dawn on Monday, the application passed its integration tests. The developers rose to the occasion after I’d collapsed, and tidied up what I’d done, though they were reproachful that some of the style and elegance had gone out of the application. By the delivery date, we were able to install it. It was a smaller, faster application than the beta they’d seen and the user-interface had a new, rather Spartan, appearance that we swore was done to conform to the latest in user-interface guidelines. (we switched to Helvetica font to look more ‘Bauhaus’ ). The client was so delighted that he forgave the new bugs that had crept in. I still have the disk that crashed, up in the attic. In IT, we have had mixed experiences from complete re-writes. Lotus 123 never really recovered from a complete rewrite from assembler into C, Borland made the mistake with Arago and Quattro Pro and Netscape’s complete rewrite of their Navigator 4 browser was a white-knuckle ride. In all cases, the decision to rewrite was a result of extreme circumstances where no other course of action seemed possible. The rewrite didn’t come out of the blue. I prefer to remember the rewrite of Minix by young Linus Torvalds, or the rewrite of Bitkeeper by a slightly older Linus. The rewrite of CP/M didn’t do too badly either, did it? Come to think of it, the guy who decided to rewrite the windowing system of the Xerox Star never regretted the decision. I’ll agree that one should often resist calls for a rewrite. One of the worst habits of the more inexperienced programmer is to denigrate whatever code he or she inherits, and then call loudly for a complete rewrite. They are buoyed up by the mistaken belief that they can do better. This, however, is a different psychological phenomenon, more related to the idea of some motorcyclists that they are operating on infinite lives, or the occasional squaddies that if they charge the machine-guns determinedly enough all will be well. Grim experience brings out the humility in any experienced programmer. I’m referring to quite different circumstances here. Where a team knows the requirements perfectly, are of one mind on methodology and coding standards, and they already have a solution, then what is wrong with considering a complete rewrite? Rewrites are so painful in the early stages, until that point where one realises the payoff, that even I quail at the thought. One needs a natural disaster to push one over the edge. The trouble is that source-control systems, and disaster recovery systems, are just too good nowadays. If I were to lose this draft of this very blog post, I know I’d rewrite it much better. However, if you read this, you’ll know I didn’t have the nerve to delete it and start again. There was a time that one prayed that unreliable hardware would deliver you from an unmaintainable mess of a codebase, but now technology has made us almost entirely immune to such a merciful act of God. An old friend of mine with long experience in the software industry has long had the idea of the ‘source-control wet-work’, where one hires a malicious hacker in some wild eastern country to hack into one’s own source control system to destroy all trace of the source to an application. Alas, backup systems are just too good to make this any more than a pipedream. Somehow, it would be difficult to promote the idea. As an alternative, could one construct a source control system that, on doing all the code-quality metrics, would systematically destroy all trace of source code that failed the quality test? Alas, I can’t see many managers buying into the idea. In reading the full story of the near-loss of Toy Story 2, it set me thinking. It turned out that the lucky restoration of the code wasn’t the happy ending one first imagined it to be, because they eventually came to the conclusion that the plot was fundamentally flawed and it all had to be rewritten anyway. Was this an early case of the ‘source-control wet-job’?’ It is very hard nowadays to do a rapid U-turn in a development project because we are far too prone to cling to our existing source-code.

Read the article
How to Identify Which Hardware Component is Failing in Your Computer

- by Chris Hoffman

Concluding that your computer has a hardware problem is just the first step. If you’re dealing with a hardware issue and not a software issue, the next step is determining what hardware problem you’re actually dealing with. If you purchased a laptop or pre-built desktop PC and it’s still under warranty, you don’t need to care about this. Have the manufacturer fix the PC for you — figuring it out is their problem. If you’ve built your own PC or you want to fix a computer that’s out of warranty, this is something you’ll need to do on your own. Blue Screen 101: Search for the Error Message This may seem like obvious advice, but searching for information about a blue screen’s error message can help immensely. Most blue screens of death you’ll encounter on modern versions of Windows will likely be caused by hardware failures. The blue screen of death often displays information about the driver that crashed or the type of error it encountered. For example, let’s say you encounter a blue screen that identified “NV4_disp.dll” as the driver that caused the blue screen. A quick Google search will reveal that this is the driver for NVIDIA graphics cards, so you now have somewhere to start. It’s possible that your graphics card is failing if you encounter such an error message. Check Hard Drive SMART Status Hard drives have a built in S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) feature. The idea is that the hard drive monitors itself and will notice if it starts to fail, providing you with some advance notice before the drive fails completely. This isn’t perfect, so your hard drive may fail even if SMART says everything is okay. If you see any sort of “SMART error” message, your hard drive is failing. You can use SMART analysis tools to view the SMART health status information your hard drives are reporting. Test Your RAM RAM failure can result in a variety of problems. If the computer writes data to RAM and the RAM returns different data because it’s malfunctioning, you may see application crashes, blue screens, and file system corruption. To test your memory and see if it’s working properly, use Windows’ built-in Memory Diagnostic tool. The Memory Diagnostic tool will write data to every sector of your RAM and read it back afterwards, ensuring that all your RAM is working properly. Check Heat Levels How hot is is inside your computer? Overheating can rsult in blue screens, crashes, and abrupt shut downs. Your computer may be overheating because you’re in a very hot location, it’s ventilated poorly, a fan has stopped inside your computer, or it’s full of dust. Your computer monitors its own internal temperatures and you can access this information. It’s generally available in your computer’s BIOS, but you can also view it with system information utilities such as SpeedFan or Speccy. Check your computer’s recommended temperature level and ensure it’s within the appropriate range. If your computer is overheating, you may see problems only when you’re doing something demanding, such as playing a game that stresses your CPU and graphics card. Be sure to keep an eye on how hot your computer gets when it performs these demanding tasks, not only when it’s idle. Stress Test Your CPU You can use a utility like Prime95 to stress test your CPU. Such a utility will fore your computer’s CPU to perform calculations without allowing it to rest, working it hard and generating heat. If your CPU is becoming too hot, you’ll start to see errors or system crashes. Overclockers use Prime95 to stress test their overclock settings — if Prime95 experiences errors, they throttle back on their overclocks to ensure the CPU runs cooler and more stable. It’s a good way to check if your CPU is stable under load. Stress Test Your Graphics Card Your graphics card can also be stress tested. For example, if your graphics driver crashes while playing games, the games themselves crash, or you see odd graphical corruption, you can run a graphics benchmark utility like 3DMark. The benchmark will stress your graphics card and, if it’s overheating or failing under load, you’ll see graphical problems, crashes, or blue screens while running the benchmark. If the benchmark seems to work fine but you have issues playing a certain game, it may just be a problem with that game. Swap it Out Not every hardware problem is easy to diagnose. If you have a bad motherboard or power supply, their problems may only manifest through occasional odd issues with other components. It’s hard to tell if these components are causing problems unless you replace them completely. Ultimately, the best way to determine whether a component is faulty is to swap it out. For example, if you think your graphics card may be causing your computer to blue screen, pull the graphics card out of your computer and swap in a new graphics card. If everything is working well, it’s likely that your previous graphics card was bad. This isn’t easy for people who don’t have boxes of components sitting around, but it’s the ideal way to troubleshoot. Troubleshooting is all about trial and error, and swapping components out allows you to pin down which component is actually causing the problem through a process of elimination. This isn’t a complete guide to everything that could likely go wrong and how to identify it — someone could write a full textbook on identifying failing components and still not cover everything. But the tips above should give you some places to start dealing with the more common problems. Image Credit: Justin Marty on Flickr

Read the article
Source-control 'wet-work'?

- by Phil Factor

When a design or creative work is flawed beyond remedy, it is often best to destroy it and start again. The other day, I lost the code to a long and intricate SQL batch I was working on. I’d thought it was impossible, but it happened. With all the technology around that is designed to prevent this occurring, this sort of accident has become a rare event. If it weren’t for a deranged laptop, and my distraction, the code wouldn’t have been lost this time. As always, I sighed, had a soothing cup of tea, and typed it all in again. The new code I hastily tapped in was much better: I’d held in my head the essence of how the code should work rather than the details: I now knew for certain the start point, the end, and how it should be achieved. Instantly the detritus of half-baked thoughts fell away and I was able to write logical code that performed better. Because I could work so quickly, I was able to hold the details of all the columns and variables in my head, and the dynamics of the flow of data. It was, in fact, easier and quicker to start from scratch rather than tidy up and refactor the existing code with its inevitable fumbling and half-baked ideas. What a shame that technology is now so good that developers rarely experience the cleansing shock of losing one’s code and having to rewrite it from scratch. If you’ve never accidentally lost your code, then it is worth doing it deliberately once for the experience. Creative people have, until Technology mistakenly prevented it, torn up their drafts or sketches, threw them in the bin, and started again from scratch. Leonardo’s obsessive reworking of the Mona Lisa was renowned because it was so unusual: Most artists have been utterly ruthless in destroying work that didn’t quite make it. Authors are particularly keen on writing afresh, and the results are generally positive. Lawrence of Arabia actually lost the entire 250,000 word manuscript of ‘The Seven Pillars of Wisdom’ by accidentally leaving it on a train at Reading station, before rewriting a much better version. Now, any writer or artist is seduced by technology into altering or refining their work rather than casting it dramatically in the bin or setting a light to it on a bonfire, and rewriting it from the blank page. It is easy to pick away at a flawed work, but the real creative process is far more brutal. Once, many years ago whilst running a software house that supplied commercial software to local businesses, I’d been supervising an accounting system for a farming cooperative. No packaged system met their needs, and it was all hand-cut code. For us, it represented a breakthrough as it was for a government organisation, and success would guarantee more contracts. As you’ve probably guessed, the code got mangled in a disk crash just a week before the deadline for delivery, and the many backups all proved to be entirely corrupted by a faulty tape drive. There were some fragments left on individual machines, but they were all of different versions. The developers were in despair. Strangely, I managed to re-write the bulk of a three-month project in a manic and caffeine-soaked weekend. Sure, that elegant universally-applicable input-form routine was‘nt quite so elegant, but it didn’t really need to be as we knew what forms it needed to support. Yes, the code lacked architectural elegance and reusability. By dawn on Monday, the application passed its integration tests. The developers rose to the occasion after I’d collapsed, and tidied up what I’d done, though they were reproachful that some of the style and elegance had gone out of the application. By the delivery date, we were able to install it. It was a smaller, faster application than the beta they’d seen and the user-interface had a new, rather Spartan, appearance that we swore was done to conform to the latest in user-interface guidelines. (we switched to Helvetica font to look more ‘Bauhaus’ ). The client was so delighted that he forgave the new bugs that had crept in. I still have the disk that crashed, up in the attic. In IT, we have had mixed experiences from complete re-writes. Lotus 123 never really recovered from a complete rewrite from assembler into C, Borland made the mistake with Arago and Quattro Pro and Netscape’s complete rewrite of their Navigator 4 browser was a white-knuckle ride. In all cases, the decision to rewrite was a result of extreme circumstances where no other course of action seemed possible. The rewrite didn’t come out of the blue. I prefer to remember the rewrite of Minix by young Linus Torvalds, or the rewrite of Bitkeeper by a slightly older Linus. The rewrite of CP/M didn’t do too badly either, did it? Come to think of it, the guy who decided to rewrite the windowing system of the Xerox Star never regretted the decision. I’ll agree that one should often resist calls for a rewrite. One of the worst habits of the more inexperienced programmer is to denigrate whatever code he or she inherits, and then call loudly for a complete rewrite. They are buoyed up by the mistaken belief that they can do better. This, however, is a different psychological phenomenon, more related to the idea of some motorcyclists that they are operating on infinite lives, or the occasional squaddies that if they charge the machine-guns determinedly enough all will be well. Grim experience brings out the humility in any experienced programmer. I’m referring to quite different circumstances here. Where a team knows the requirements perfectly, are of one mind on methodology and coding standards, and they already have a solution, then what is wrong with considering a complete rewrite? Rewrites are so painful in the early stages, until that point where one realises the payoff, that even I quail at the thought. One needs a natural disaster to push one over the edge. The trouble is that source-control systems, and disaster recovery systems, are just too good nowadays. If I were to lose this draft of this very blog post, I know I’d rewrite it much better. However, if you read this, you’ll know I didn’t have the nerve to delete it and start again. There was a time that one prayed that unreliable hardware would deliver you from an unmaintainable mess of a codebase, but now technology has made us almost entirely immune to such a merciful act of God. An old friend of mine with long experience in the software industry has long had the idea of the ‘source-control wet-work’, where one hires a malicious hacker in some wild eastern country to hack into one’s own source control system to destroy all trace of the source to an application. Alas, backup systems are just too good to make this any more than a pipedream. Somehow, it would be difficult to promote the idea. As an alternative, could one construct a source control system that, on doing all the code-quality metrics, would systematically destroy all trace of source code that failed the quality test? Alas, I can’t see many managers buying into the idea. In reading the full story of the near-loss of Toy Story 2, it set me thinking. It turned out that the lucky restoration of the code wasn’t the happy ending one first imagined it to be, because they eventually came to the conclusion that the plot was fundamentally flawed and it all had to be rewritten anyway. Was this an early case of the ‘source-control wet-job’?’ It is very hard nowadays to do a rapid U-turn in a development project because we are far too prone to cling to our existing source-code.

Read the article
General Policies and Procedures for Maintaining the Value of Data Assets

Here is a general list for policies and procedures regarding maintaining the value of data assets. Data Backup Policies and Procedures Backups are very important when dealing with data because there is always the chance of losing data due to faulty hardware or a user activity. So the need for a strategic backup system should be mandatory for all companies. This being said, in the real world some companies that I have worked for do not really have a good data backup plan. Typically when companies tend to take this kind of approach in data backups usually the data is not really recoverable. Unfortunately when companies do not regularly test their backup plans they get a false sense of security because they think that they are covered. However, I can tell you from personal and professional experience that a backup plan/system is never fully implemented until it is regularly tested prior to the time when it actually needs to be used. Disaster Recovery Plan Expanding on Backup Policies and Procedures, a company needs to also have a disaster recovery plan in order to protect its data in case of a catastrophic disaster. Disaster recovery plans typically encompass how to restore all of a company’s data and infrastructure back to a restored operational status. Most Disaster recovery plans also include time estimates on how long each step of the disaster recovery plan should take to be executed. It is important to note that disaster recovery plans are never fully implemented until they have been tested just like backup plans. Disaster recovery plans should be tested regularly so that the business can be confident in not losing any or minimal data due to a catastrophic disaster. Firewall Policies and Content Filters One way companies can protect their data is by using a firewall to separate their internal network from the outside. Firewalls allow for enabling or disabling network access as data passes through it by applying various defined restrictions. Furthermore firewalls can also be used to prevent access from the internal network to the outside by these same factors. Common Firewall Restrictions Destination/Sender IP Address Destination/Sender Host Names Domain Names Network Ports Companies can also desire to restrict what their network user’s view on the internet through things like content filters. Content filters allow a company to track what webpages a person has accessed and can also restrict user’s access based on established rules set up in the content filter. This device and/or software can block access to domains or specific URLs based on a few factors. Common Content Filter Criteria Known malicious sites Specific Page Content Page Content Theme Anti-Virus/Mal-ware Polices Fortunately, most companies utilize antivirus programs on all computers and servers for good reason, virus have been known to do the following: Corrupt/Invalidate Data, Destroy Data, and Steal Data. Anti-Virus applications are a great way to prevent any malicious application from being able to gain access to a company’s data. However, anti-virus programs must be constantly updated because new viruses are always being created, and the anti-virus vendors need to distribute updates to their applications so that they can catch and remove them. Data Validation Policies and Procedures Data validation is very important to ensure that only accurate information is stored. The existence of invalid data can cause major problems when businesses attempt to use data for knowledge based decisions and for performance reporting. Data Scrubbing Policies and Procedures Data scrubbing is valuable to companies in one of two ways. The first can be used to clean data prior to being analyzed for report generation. The second is that it allows companies to remove things like personally Identifiable information from its data prior to transmit it between multiple environments or if the information is sent to an external location. An example of this can be seen with medical records in regards to HIPPA laws that prohibit the storage of specific personal and medical information. Additionally, I have professionally run in to a scenario where the Canadian government does not allow any Canadian’s personal information to be stored on a server not located in Canada. Encryption Practices The use of encryption is very valuable when a company needs to any personal information. This allows users with the appropriated access levels to view or confirm the existence or accuracy of data within a system by either decrypting the information or encrypting a piece of data and comparing it to the stored version. Additionally, if for some unforeseen reason the data got in to the wrong hands then they would have to first decrypt the data before they could even be able to read it. Encryption just adds and additional layer of protection around data itself. Standard Normalization Practices The use of standard data normalization practices is very important when dealing with data because it can prevent allot of potential issues by eliminating the potential for unnecessary data duplication. Issues caused by data duplication include excess use of data storage, increased chance for invalidated data, and over use of data processing. Network and Database Security/Access Policies Every company has some form of network/data access policy even if they have none. These policies help secure data from being seen by inappropriate users along with preventing the data from being updated or deleted by users. In addition, without a good security policy there is a large potential for data to be corrupted by unassuming users or even stolen. Data Storage Policies Data storage polices are very important depending on how they are implemented especially when a company is trying to utilize them in conjunction with other policies like Data Backups. I have worked at companies where all network user folders are constantly backed up, and if a user wanted to ensure the existence of a piece of data in the form of a file then they had to store that file in their network folder. Conversely, I have also worked in places where when a user logs on or off of the network there entire user profile is backed up. Training Policies One of the biggest ways to prevent data loss and ensure that data will remain a company asset is through training. The practice of properly train employees on how to work with in systems that access data is crucial when trying to ensure a company’s data will remain an asset. Users need to be trained on how to manipulate a company’s data in order to perform their tasks to reduce the chances of invalidating data.

Read the article
To SYNC or not to SYNC – Part 3

- by AshishRay

I can't believe it has been almost a year since my last blog post. I know, that's an absolute no-no in the blogosphere. And I know that "I have been busy" is not a good excuse. So - without trying to come up with an excuse - let me state this - my apologies for taking such a long time to write the next Part. Without further ado, here goes. This is Part 3 of a multi-part blog article where we are discussing various aspects of setting up Data Guard synchronous redo transport (SYNC). In Part 1 of this article, I debunked the myth that Data Guard SYNC is similar to a two-phase commit operation. In Part 2, I discussed the various ways that network latency may or may not impact a Data Guard SYNC configuration. In this article, I will talk in details regarding why Data Guard SYNC is a good thing. I will also talk about distance implications for setting up such a configuration. So, Why Good? Why is Data Guard SYNC a good thing? Because, at the end of the day, this gives you the assurance of zero data loss - it doesn’t matter what outage may befall your primary system. Befall! Boy, that sounds theatrical. But seriously - think about this - it minimizes your data risks. That’s a big deal. Whether you have an outage due to bad disks, faulty hardware components, hardware / software bugs, physical data corruptions, power failures, lightning that takes out significant part of your data center, fire that melts your assets, water leakage from the cooling system, human errors such as accidental deletion of online redo log files - it doesn’t matter - you can have that “Om - peace” look on your face and then you can failover to the standby system, without losing a single bit of data in your Oracle database. You will be a hero, as shown in this not so imaginary conversation: IT Manager: Well, what’s the status? You: John is doing the trace analysis on the storage array. IT Manager: So? How long is that gonna take? You: Well, he is stuck, waiting for a response from <insert your not-so-favorite storage vendor here>. IT Manager: So, no root cause yet? You: I told you, he is stuck. We have escalated with their Support, but you know how long these things take. IT Manager: Darn it - the site is down! You: Not really … IT Manager: What do you mean? You: John is stuck, but Sreeni has already done a failover to the Data Guard standby. IT Manager: Whoa, whoa - wait! Failover means we lost some data, why did you do this without letting the Business group know? You: We didn’t lose any data. Remember, we had set up Data Guard with SYNC? So now, any problems on the production – we just failover. No data loss, and we are up and running in minutes. The Business guys don’t need to know. IT Manager: Wow! Are we great or what!! You: I guess … Ok, so you get it - SYNC is good. But as my dear friend Larry Carpenter says, “TANSTAAFL”, or "There ain't no such thing as a free lunch". Yes, of course - investing in Data Guard SYNC means that you have to invest in a low-latency network, you have to monitor your applications and database especially in peak load conditions, and you cannot under-provision your standby systems. But all these are good and necessary things, if you are supporting mission-critical apps that are supposed to be running 24x7. The peace of mind that this investment will give you is priceless, especially if you are serious about HA. How Far Can We Go? Someone may say at this point - well, I can’t use Data Guard SYNC over my coast-to-coast deployment. Most likely - true. So how far can you go? Well, we have customers who have deployed Data Guard SYNC over 300+ miles! Does this mean that you can also deploy over similar distances? Duh - no! I am going to say something here that most IT managers don’t like to hear - “It depends!” It depends on your application design, application response time / throughput requirements, network topology, etc. However, because of the optimal way we do SYNC, customers have been able to stretch Data Guard SYNC deployments over longer distances compared to traditional, storage-centric ways of doing this. The MAA Database 10.2 best practices paper Data Guard Redo Transport & Network Configuration, and Oracle Database 11.2 High Availability Best Practices Manual talk about some of these SYNC-related metrics. For example, a test deployment of Data Guard SYNC over 330 miles with 10ms latency showed an impact less than 5% for a busy OLTP application. Even if you can’t deploy Data Guard SYNC over your WAN distance, or if you already have an ASYNC standby located 1000-s of miles away, here’s another nifty way to boost your HA. Have a local standby, configured SYNC. How local is “local”? Again - it depends. One customer runs a local SYNC standby across the campus. Another customer runs it across 15 miles in another data center. Both of these customers are running Data Guard SYNC as their HA standard. If a localized outage affects their primary system, no problem! They have all the data available on the standby, to which they can failover. Very fast. In seconds. Wait - did I say “seconds”? Yes, Virginia, there is a Santa Claus. But you have to wait till the next blog article to find out more. I assure you tho’ that this time you won’t have to wait for another year for this.

Read the article
RHEL - blocked FC remote port time out: saving binding

- by Dev G

My Server went into a faulty state since the database could not write on the partition. I found out that the partition went into Read Only mode. Finally to fix it, I had to do a hard reboot. Linux 2.6.18-164.el5PAE #1 SMP Tue Aug 18 15:59:11 EDT 2009 i686 i686 i386 GNU/Linux /var/log/messages Oct 31 00:56:45 ota3g1 Had[17275]: VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sg_network Oct 31 00:57:05 ota3g1 Had[17275]: VCS CRITICAL V-16-1-50086 CPU usage on ota3g1.mtsallstream.com is 100% Oct 31 01:01:47 ota3g1 Had[17275]: VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sg_network Oct 31 01:06:50 ota3g1 Had[17275]: VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sg_network Oct 31 01:11:52 ota3g1 Had[17275]: VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group sg_network Oct 31 01:12:10 ota3g1 kernel: lpfc 0000:29:00.1: 1:1305 Link Down Event x2 received Data: x2 x20 x80000 x0 x0 Oct 31 01:12:10 ota3g1 kernel: lpfc 0000:29:00.1: 1:1303 Link Up Event x3 received Data: x3 x1 x10 x1 x0 x0 0 Oct 31 01:12:12 ota3g1 kernel: lpfc 0000:29:00.1: 1:1305 Link Down Event x4 received Data: x4 x20 x80000 x0 x0 Oct 31 01:12:40 ota3g1 kernel: rport-8:0-0: blocked FC remote port time out: saving binding Oct 31 01:12:40 ota3g1 kernel: lpfc 0000:29:00.1: 1:(0):0203 Devloss timeout on WWPN 20:25:00:a0:b8:74:f5:65 NPort x0000e4 Data: x0 x7 x0 Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 38617577 Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 283532153 Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 90825 Oct 31 01:12:40 ota3g1 kernel: Aborting journal on device dm-16. Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 868841 Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: Aborting journal on device dm-10. Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 37759889 Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 283349449 Oct 31 01:12:40 ota3g1 kernel: printk: 6 messages suppressed. Oct 31 01:12:40 ota3g1 kernel: Aborting journal on device dm-12. Oct 31 01:12:40 ota3g1 kernel: EXT3-fs error (device dm-12) in ext3_reserve_inode_write: Journal has aborted Oct 31 01:12:40 ota3g1 kernel: Buffer I/O error on device dm-16, logical block 1545 Oct 31 01:12:40 ota3g1 kernel: lost page write due to I/O error on dm-16 Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 12745 Oct 31 01:12:40 ota3g1 kernel: Buffer I/O error on device dm-10, logical block 1545 Oct 31 01:12:40 ota3g1 kernel: EXT3-fs error (device dm-16) in ext3_reserve_inode_write: Journal has aborted Oct 31 01:12:40 ota3g1 kernel: lost page write due to I/O error on dm-10 Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 37749121 Oct 31 01:12:40 ota3g1 kernel: Buffer I/O error on device dm-12, logical block 0 Oct 31 01:12:40 ota3g1 kernel: lost page write due to I/O error on dm-12 Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: EXT3-fs error (device dm-12) in ext3_dirty_inode: Journal has aborted Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 37757897 Oct 31 01:12:40 ota3g1 kernel: Buffer I/O error on device dm-12, logical block 1097 Oct 31 01:12:40 ota3g1 kernel: lost page write due to I/O error on dm-12 Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 283337089 Oct 31 01:12:40 ota3g1 kernel: Buffer I/O error on device dm-16, logical block 0 Oct 31 01:12:40 ota3g1 kernel: lost page write due to I/O error on dm-16 Oct 31 01:12:40 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:40 ota3g1 kernel: EXT3-fs error (device dm-16) in ext3_dirty_inode: Journal has aborted Oct 31 01:12:40 ota3g1 kernel: end_request: I/O error, dev sdi, sector 37749121 Oct 31 01:12:40 ota3g1 kernel: Buffer I/O error on device dm-12, logical block 0 Oct 31 01:12:41 ota3g1 kernel: lost page write due to I/O error on dm-12 Oct 31 01:12:41 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 Oct 31 01:12:41 ota3g1 kernel: end_request: I/O error, dev sdi, sector 283337089 Oct 31 01:12:41 ota3g1 kernel: Buffer I/O error on device dm-16, logical block 0 Oct 31 01:12:41 ota3g1 kernel: lost page write due to I/O error on dm-16 Oct 31 01:12:41 ota3g1 kernel: sd 8:0:0:4: SCSI error: return code = 0x00010000 df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/cciss-root 4.9G 730M 3.9G 16% / /dev/mapper/cciss-home 9.7G 1.2G 8.1G 13% /home /dev/mapper/cciss-var 9.7G 494M 8.8G 6% /var /dev/mapper/cciss-usr 15G 2.6G 12G 19% /usr /dev/mapper/cciss-tmp 3.9G 153M 3.6G 5% /tmp /dev/sda1 996M 43M 902M 5% /boot tmpfs 5.9G 0 5.9G 0% /dev/shm /dev/mapper/cciss-product 25G 16G 7.4G 68% /product /dev/mapper/cciss-opt 20G 4.5G 14G 25% /opt /dev/mapper/dg_db1-vol_db1_system 18G 2.2G 15G 14% /database/OTADB/sys /dev/mapper/dg_db1-vol_db1_undo 18G 5.8G 12G 35% /database/OTADB/undo /dev/mapper/dg_db1-vol_db1_redo 8.9G 4.3G 4.2G 51% /database/OTADB/redo /dev/mapper/dg_db1-vol_db1_sgbd 8.9G 654M 7.8G 8% /database/OTADB/admin /dev/mapper/dg_db1-vol_db1_arch 98G 24G 69G 26% /database/OTADB/arch /dev/mapper/dg_db1-vol_db1_indexes 240G 14G 214G 6% /database/OTADB/index /dev/mapper/dg_db1-vol_db1_data 275G 47G 215G 18% /database/OTADB/data /dev/mapper/dg_dbrman-vol_db_rman 8.9G 351M 8.1G 5% /database/RMAN /dev/mapper/dg_app1-vol_app1 151G 113G 31G 79% /files/ota /etc/fstab /dev/cciss/root / ext3 defaults 1 1 /dev/cciss/home /home ext3 defaults 1 2 /dev/cciss/var /var ext3 defaults 1 2 /dev/cciss/usr /usr ext3 defaults 1 2 /dev/cciss/tmp /tmp ext3 defaults 1 2 LABEL=/boot /boot ext3 defaults 1 2 tmpfs /dev/shm tmpfs defaults 0 0 devpts /dev/pts devpts gid=5,mode=620 0 0 sysfs /sys sysfs defaults 0 0 proc /proc proc defaults 0 0 /dev/cciss/swap swap swap defaults 0 0 /dev/cciss/product /product ext3 defaults 1 2 /dev/cciss/opt /opt ext3 defaults 1 2 /dev/dg_db1/vol_db1_system /database/OTADB/sys ext3 defaults 1 2 /dev/dg_db1/vol_db1_undo /database/OTADB/undo ext3 defaults 1 2 /dev/dg_db1/vol_db1_redo /database/OTADB/redo ext3 defaults 1 2 /dev/dg_db1/vol_db1_sgbd /database/OTADB/admin ext3 defaults 1 2 /dev/dg_db1/vol_db1_arch /database/OTADB/arch ext3 defaults 1 2 /dev/dg_db1/vol_db1_indexes /database/OTADB/index ext3 defaults 1 2 /dev/dg_db1/vol_db1_data /database/OTADB/data ext3 defaults 1 2 /dev/dg_dbrman/vol_db_rman /database/RMAN ext3 defaults 1 2 /dev/dg_app1/vol_app1 /files/ota ext3 defaults 1 2 Thanks for all the help.

Read the article
Linux networking crash: best steps to find out the cause?

- by Aron Rotteveel

One of our Linux (CentOS) servers was unreachable last night. The server was not reachable in any way except for the remote console. After logging in with the remote console, it turned out I could not ping any outside hosts either. A simple service network restart solved the issue, but I am still wondering what could have caused this. My log files seem to indicate no error at all (except for the various daemons that need a network connection and failed after the network failure). Are there any additional steps I can take to find out the cause of this problem? EDIT: this just happened again. The server was completely unresponsive until I issued a networking service restart. Any advise is welcome. Could this be caused by a faulty hardware component? As per Madhatters request, here are some excerpts from the log at the time (the network crashed at 20:13): /var/log/messages: Dec 2 20:01:05 graviton kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=<stripped> SRC=<stripped> DST=<stripped> LEN=40 TOS=0x00 PREC=0x00 TTL=101 ID=256 PROTO=TCP SPT=6000 DPT=3306 WINDOW=16384 RES=0x00 SYN URGP=0 Dec 2 20:01:05 graviton kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=<stripped> SRC=<stripped> DST=<stripped> LEN=40 TOS=0x00 PREC=0x00 TTL=100 ID=256 PROTO=TCP SPT=6000 DPT=3306 WINDOW=16384 RES=0x00 SYN URGP=0 Dec 2 20:01:05 graviton kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=<stripped> SRC=<stripped> DST=<stripped> LEN=40 TOS=0x00 PREC=0x00 TTL=101 ID=256 PROTO=TCP SPT=6000 DPT=3306 WINDOW=16384 RES=0x00 SYN URGP=0 Dec 2 20:13:34 graviton junglediskserver: Connection to gateway failed: xGatewayTransport - Connection to gateway failed. The first three messages are simple responses to iptables rules I have set up through the LFD firewall. The last message indicates that JungleDisk, which I use for backups can no longer connect to the gateway. Apart from this, there are no interesting messages around this time. EDIT 4 dec: as per Mattdm's request, here is the output of ethtool eth0: (Please not that these are the settings that currently work. If things go wrong again, I will be sure to post this again if necessary. Settings for eth0: Supported ports: [ TP ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Advertised auto-negotiation: Yes Speed: 1000Mb/s Duplex: Full Port: Twisted Pair PHYAD: 1 Transceiver: internal Auto-negotiation: on Supports Wake-on: g Wake-on: d Link detected: yes As per Joris' request, here is also the output of route -n: aron@graviton [~]# route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface xx.xx.xx.58 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.42 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.43 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.41 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.46 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.47 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.44 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.45 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.50 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.51 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.48 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.49 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.54 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.52 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.53 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.0 0.0.0.0 255.255.255.192 U 0 0 0 eth0 xx.xx.xx.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 0.0.0.0 xx.xx.xx.62 0.0.0.0 UG 0 0 0 eth0 The bottom xx.62 is my gateway. EDIT december 28th: the problem occurred again and I got the chance to compare some of the outputs of the above tests. What I found out is that arp -an returns an incomplete MAC address for my gateway (which is not under my control; the server is in a shared rack): During failure: ? (xx.xx.xx.62) at <incomplete> on eth0 After service network restart: ? (xx.xx.xx.62) at 00:00:0C:9F:F0:30 [ether] on eth0 Is this something I can fix or is it time for me to contact the data centre?

Read the article
Confirm disk is broken when it passes all diagnostics

- by Halfgaar

I have a system with a potentially broken disk, but the disk passes all manner of diagnostics. I have been unable to confirm that the disk is broken. What are my options? I could just replace the disk, but because this situation is very similar to another more severe situation I have (long story), I'd like to actually make a proper diagnosis as opposed to randomly binning hardware. The issue and history is this: I had a Debian Linux PC (500 MHz P3) acting as router, nagios and munin. It crashed every couple of weeks. No logs or dmesg could be obtained (because it's an old Compaq that only boots when you configure it as keyboardless, making connecting a keyboard later, once it's booted, impossible). At the time, I just replaced the computer with another Compaq (P4 2.4 GHz) because I thought the hardware was faulty. However, it still crashed every couple of weeks. the difference is that on this computer, I can still SSH into it. It gives all kinds of errors on hda. I'd like to confirm that the disk is broken, but nothing I do confirms this: SMART error logs shows no errors. Normally when a disk starts acting up, SMART my pass, but it still records a read-error in the error log. SMART self-test (smartctl -t long /dev/sda) completes without errors. re-allocated sector count (a tell-tale parameter) has been 31 all its life, even when the disk was still in use in my desktop PC years ago, and it still is. The figure never changed. dd if=/dev/sda of=/dev/null bs=4096 passes with flying colors. What else can I do to assess the health of the drive? Again, this is not about making this router fully functional again, this is a disk forensic question, because it just so happens that I have another server that potentially has the same problem, and knowing the answer to this will possibly help me greatly. For the record, below are logs and such. This is the smartctl -a output: smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family Device Model: ST3120026A Serial Number: 5JT1CLQM Firmware Version: 3.06 User Capacity: 120,034,123,776 bytes Device is: In smartctl database [for details use: -P show] ATA Version is: 6 ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2 Local Time is: Mon Jul 1 21:18:33 2013 CEST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 24) The self-test routine was aborted by the host. Total time to complete Offline data collection: ( 430) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. No General Purpose Logging support. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 85) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 050 046 006 Pre-fail Always - 47766662 3 Spin_Up_Time 0x0003 097 096 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 10 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 31 7 Seek_Error_Rate 0x000f 084 060 030 Pre-fail Always - 820305 9 Power_On_Hours 0x0032 048 048 000 Old_age Always - 46373 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 605 194 Temperature_Celsius 0x0022 036 065 000 Old_age Always - 36 195 Hardware_ECC_Recovered 0x001a 050 046 000 Old_age Always - 47766662 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x003e 200 196 000 Old_age Always - 6 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0 202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Aborted by host 80% 46361 - # 2 Extended offline Completed without error 00% 46358 - # 3 Short offline Completed without error 00% 12046 - # 4 Extended offline Completed without error 00% 10472 - # 5 Short offline Completed without error 00% 10471 - # 6 Short offline Completed without error 00% 10471 - # 7 Short offline Completed without error 00% 6770 - # 8 Extended offline Aborted by host 90% 5958 - # 9 Extended offline Aborted by host 90% 5951 - #10 Short offline Completed without error 00% 5024 - #11 Extended offline Aborted by host 80% 5024 - #12 Short offline Completed without error 00% 3697 - #13 Short offline Completed without error 00% 237 - #14 Short offline Completed without error 00% 145 - #15 Short offline Completed without error 00% 69 - #16 Extended offline Completed without error 00% 68 - #17 Short offline Completed without error 00% 66 - #18 Short offline Completed without error 00% 49 - #19 Short offline Completed without error 00% 29 - #20 Short offline Completed without error 00% 29 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. And this is the dmesg error when it has crashed (which repeats for a bunch of different sectors): [1755091.211136] sd 0:0:0:0: [sda] Unhandled error code [1755091.211144] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK [1755091.211151] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 08 fe ad 38 00 00 08 00 [1755091.211166] end_request: I/O error, dev sda, sector 150908216

Read the article
mdadm raid5 recover double disk failure - with a twist (drive order)

- by Peter Bos

Let me acknowledge first off that I have made mistakes, and that I have a backup for most but not all of the data on this RAID. I still have hope of recovering the rest of the data. I don't have the kind of money to take the drives to a recovery expert company. Mistake #0, not having a 100% backup. I know. I have a mdadm RAID5 system of 4x3TB. Drives /dev/sd[b-e], all with one partition /dev/sd[b-e]1. I'm aware that RAID5 on very large drives is risky, yet I did it anyway. Recent events The RAID become degraded after a two drive failure. One drive [/dev/sdc] is really gone, the other [/dev/sde] came back up after a power cycle, but was not automatically re-added to the RAID. So I was left with a 4 device RAID with only 2 active drives [/dev/sdb and /dev/sdd]. Mistake #1, not using dd copies of the drives for restoring the RAID. I did not have the drives or the time. Mistake #2, not making a backup of the superblock and mdadm -E of the remaining drives. Recovery attempt I reassembled the RAID in degraded mode with mdadm --assemble --force /dev/md0, using /dev/sd[bde]1. I could then access my data. I replaced /dev/sdc with a spare; empty; identical drive. I removed the old /dev/sdc1 from the RAID mdadm --fail /dev/md0 /dev/sdc1 Mistake #3, not doing this before replacing the drive I then partitioned the new /dev/sdc and added it to the RAID. mdadm --add /dev/md0 /dev/sdc1 It then began to restore the RAID. ETA 300 mins. I followed the process via /proc/mdstat to 2% and then went to do other stuff. Checking the result Several hours (but less then 300 mins) later, I checked the process. It had stopped due to a read error on /dev/sde1. Here is where the trouble really starts I then removed /dev/sde1 from the RAID and re-added it. I can't remember why I did this; it was late. mdadm --manage /dev/md0 --remove /dev/sde1 mdadm --manage /dev/md0 --add /dev/sde1 However, /dev/sde1 was now marked as spare. So I decided to recreate the whole array using --assume-clean using what I thought was the right order, and with /dev/sdc1 missing. mdadm --create /dev/md0 --assume-clean -l5 -n4 /dev/sdb1 missing /dev/sdd1 /dev/sde1 That worked, but the filesystem was not recognized while trying to mount. (It should have been EXT4). Device order I then checked a recent backup I had of /proc/mdstat, and I found the drive order. md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1] 8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU] I then remembered this RAID had suffered a drive loss about a year ago, and recovered from it by replacing the faulty drive with a spare one. That may have scrambled the device order a bit...so there was no drive [3] but only [0],[1],[2], and [4]. I tried to find the drive order with the Permute_array script: https://raid.wiki.kernel.org/index.php/Permute_array.pl but that did not find the right order. Questions I now have two main questions: I screwed up all the superblocks on the drives, but only gave: mdadm --create --assume-clean commands (so I should not have overwritten the data itself on /dev/sd[bde]1. Am I right that in theory the RAID can be restored [assuming for a moment that /dev/sde1 is ok] if I just find the right device order? Is it important that /dev/sde1 be given the device number [4] in the RAID? When I create it with mdadm --create /dev/md0 --assume-clean -l5 -n4 \ /dev/sdb1 missing /dev/sdd1 /dev/sde1 it is assigned the number [3]. I wonder if that is relevant to the calculation of the parity blocks. If it turns out to be important, how can I recreate the array with /dev/sdb1[0] missing[1] /dev/sdd1[2] /dev/sde1[4]? If I could get that to work I could start it in degraded mode and add the new drive /dev/sdc1 and let it resync again. It's OK if you would like to point out to me that this may not have been the best course of action, but you'll find that I realized this. It would be great if anyone has any suggestions.

Read the article
Can someone explain the "use-cases" for the default munin graphs?

- by exhuma

When installing munin, it activates a default set of plugins (at least on ubuntu). Alternatively, you can simply run munin-node-configure to figure out which plugins are supported on your system. Most of these plugins plot straight-forward data. My question is not to explain the nature of the data (well... maybe for some) but what is it that you look for in these graphs? It is easy to install munin and see fancy graphs. But having the graphs and not being able to "read" them renders them totally useless. I am going to list standard plugins which are enabled by default on my system. So it's going to be a long list. For completeness, I am also going to list plugins which I think to understand and give a short explanation as to what I think it's used for. Pleas correct if I am wrong with any of them. So let me split this questions in three parts: Plugins where I don't even understand the data Plugins where I understand the data but don't know what I should look out for Plugins which I think to understand Plugins where I don't even understand the data These may contain questions that are not necessarily aimed at munin alone. Not understanding the data usually mean a gap in fundamental knowledge on operating systems/hardware.... ;) Feel free to respond with a "giyf" answer. These are plugins where I can only guess what's going on... I hardly want to look at these "guessing"... Disk IOs per device (IOs/second)What's an IO. I know it stands for input/output. But that's as far as it goes. Disk latency per device (Average IO wait)Not a clue what an "IO wait" is... IO Service TimeThis one is a huge mess, and it's near impossible to see something in the graph at all. Plugins where I understand the data but don't know what I should look out for IOStat (blocks/second read/written)I assume, the thing to look out for in here are spikes? Which would mean that the device is in heavy use? Available entropy (bytes)I assume that this is important for random number generation? Why would I graph this? So far the value has always been near constant. VMStat (running/I/O sleep processes)What's the difference between this one and the "processes" graph? Both show running/sleeping processes, whereas the "Processes" graph seems to have more details. Disk throughput per device (bytes/second read/written) What's thedifference between this one and the "IOStat" graph? inode table usageWhat should I look for in this graph? Plugins which I think to understand I'll be guessing some things here... correct me if I am wrong. Disk usage in percent (percent)How much disk space is used/remaining. As this is approaching 100%, you should consider cleaning up or extend the partition. This is extremely important for the root partition. Firewall Throughput (packets/second)The number of packets passing through the firewall. If this is spiking for a longer period of time, it could be a sign of a DOS attack (or we are simply recieving a large file). It can also give you an idea about your firewall performance. If it's levelling out and you need more "power" you should consider load balancing. If it's levelling out and see a correlation with your CPU load, it could also mean that your hardware is not fast enough. Correlations with disk usage could point to excessive LOG targets in you FW config. eth0 errors (packets in/out)Network errors. If this value is increasing, it could be a sign of faulty hardware. eth0 traffic (bits/second in/out)Raw network traffic. This should correlate with Firewall throughput. number of threadsAn ever-increasing value might point to a process not properly closing threads. Investigate! processesBreakdown of active processes (including sleeping). A quick spike in here might point to a fork-bomb. A slowly, but ever-increasing value might point to an application spawning sub-processes but not properly closing them. Investigate using ps faux. process priorityThis shows the distribution of process priorities. Having only high-priority processes is not of much use. Consider de-prioritizing some. cpu usageFairly straight-forward. If this is spiking, you may have an attack going on, or a process is hogging the CPU. Idf it's slowly increasing and approaching max in normal operations, you should consider upgrading your hardware (or load-balancing). file table usageNumber of actively open files. If this is reaching max, you may have a process opening, but not properly releasing files. load averageShows an summarized value for the system load. Should correlate with CPU usage. Increasing values can come from a number of sources. Look for correlations with other graphs. memory usageA graphical representation of you memory. As long as you have a lot of unused+cache+buffers you are fine. swap in/outShows the activity on your swap partition. This should always be 0. If you see activity on this, you should add more memory to your machine!

Read the article
Motherboard/PSU crippling USB and Sata

- by celebdor

I very recently bought a new desktop computer. The motherboard is: Z77MX-D3H and the power supply is ocz zs series 550w. The issue I have is that once I boot to the operating system (I have tried with fedora and Ubuntu with kernels 2.6.38 - 3.4.0), my hard drive (2.5" Magnetic) occasionally makes a power switch noise and it resets. Needless to say, when this drive is the OS drive, the OS crashes. I also have a SSD that works fine with the same OS configurations, but if I have the magnetic hard drive attached as second drive, it works erratically and the reconnects result in corrupted data. I also noticed that whenever I plug an external hard drive USB2.0 or USB3.0 to the computer the issue with the reconnects is even worse: [ 52.198441] sd 7:0:0:0: [sdc] Spinning up disk... [ 57.955811] usb 4-3: USB disconnect, device number 3 [ 58.023687] .ready [ 58.023914] sd 7:0:0:0: [sdc] READ CAPACITY(16) failed [ 58.023919] sd 7:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 58.023932] sd 7:0:0:0: [sdc] Sense not available. [ 58.024061] sd 7:0:0:0: [sdc] READ CAPACITY failed [ 58.024063] sd 7:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 58.024064] sd 7:0:0:0: [sdc] Sense not available. [ 58.024099] sd 7:0:0:0: [sdc] Write Protect is off [ 58.024101] sd 7:0:0:0: [sdc] Mode Sense: 00 00 00 00 [ 58.024135] sd 7:0:0:0: [sdc] Asking for cache data failed [ 58.024137] sd 7:0:0:0: [sdc] Assuming drive cache: write through [ 58.024400] sd 7:0:0:0: [sdc] READ CAPACITY(16) failed [ 58.024402] sd 7:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 58.024405] sd 7:0:0:0: [sdc] Sense not available. [ 58.024448] sd 7:0:0:0: [sdc] READ CAPACITY failed [ 58.024450] sd 7:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 58.024451] sd 7:0:0:0: [sdc] Sense not available. [ 58.024469] sd 7:0:0:0: [sdc] Asking for cache data failed [ 58.024471] sd 7:0:0:0: [sdc] Assuming drive cache: write through [ 58.024472] sd 7:0:0:0: [sdc] Attached SCSI disk [ 58.407725] usb 4-3: new SuperSpeed USB device number 4 using xhci_hcd [ 58.424921] scsi8 : usb-storage 4-3:1.0 [ 59.424185] scsi 8:0:0:0: Direct-Access WD My Passport 0740 1003 PQ: 0 ANSI: 6 [ 59.424406] scsi 8:0:0:1: Enclosure WD SES Device 1003 PQ: 0 ANSI: 6 [ 59.425098] sd 8:0:0:0: Attached scsi generic sg2 type 0 [ 59.425176] ses 8:0:0:1: Attached Enclosure device [ 59.425248] ses 8:0:0:1: Attached scsi generic sg3 type 13 [ 61.845836] sd 8:0:0:0: [sdc] 976707584 512-byte logical blocks: (500 GB/465 GiB) [ 61.845838] sd 8:0:0:0: [sdc] 4096-byte physical blocks [ 61.846336] sd 8:0:0:0: [sdc] Write Protect is off [ 61.846338] sd 8:0:0:0: [sdc] Mode Sense: 47 00 10 08 [ 61.846718] sd 8:0:0:0: [sdc] No Caching mode page present [ 61.846720] sd 8:0:0:0: [sdc] Assuming drive cache: write through [ 61.848105] sd 8:0:0:0: [sdc] No Caching mode page present [ 61.848106] sd 8:0:0:0: [sdc] Assuming drive cache: write through [ 61.857147] sdc: sdc1 [ 61.858915] sd 8:0:0:0: [sdc] No Caching mode page present [ 61.858916] sd 8:0:0:0: [sdc] Assuming drive cache: write through [ 61.858918] sd 8:0:0:0: [sdc] Attached SCSI disk [ 69.875809] usb 4-3: USB disconnect, device number 4 [ 70.275816] usb 4-3: new SuperSpeed USB device number 5 using xhci_hcd [ 70.293063] scsi9 : usb-storage 4-3:1.0 [ 71.292257] scsi 9:0:0:0: Direct-Access WD My Passport 0740 1003 PQ: 0 ANSI: 6 [ 71.292505] scsi 9:0:0:1: Enclosure WD SES Device 1003 PQ: 0 ANSI: 6 [ 71.293527] sd 9:0:0:0: Attached scsi generic sg2 type 0 [ 71.293668] ses 9:0:0:1: Attached Enclosure device [ 71.293758] ses 9:0:0:1: Attached scsi generic sg3 type 13 [ 73.323804] usb 4-3: USB disconnect, device number 5 [ 101.868078] ses 9:0:0:1: Device offlined - not ready after error recovery [ 101.868124] ses 9:0:0:1: Failed to get diagnostic page 0x50000 [ 101.868131] ses 9:0:0:1: Failed to bind enclosure -19 [ 101.868288] sd 9:0:0:0: [sdc] READ CAPACITY(16) failed [ 101.868292] sd 9:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 101.868296] sd 9:0:0:0: [sdc] Sense not available. [ 101.868428] sd 9:0:0:0: [sdc] READ CAPACITY failed [ 101.868434] sd 9:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 101.868439] sd 9:0:0:0: [sdc] Sense not available. [ 101.868468] sd 9:0:0:0: [sdc] Write Protect is off [ 101.868473] sd 9:0:0:0: [sdc] Mode Sense: 00 00 00 00 [ 101.868580] sd 9:0:0:0: [sdc] Asking for cache data failed [ 101.868584] sd 9:0:0:0: [sdc] Assuming drive cache: write through [ 101.868845] sd 9:0:0:0: [sdc] READ CAPACITY(16) failed [ 101.868849] sd 9:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 101.868854] sd 9:0:0:0: [sdc] Sense not available. [ 101.868894] sd 9:0:0:0: [sdc] READ CAPACITY failed [ 101.868898] sd 9:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 101.868903] sd 9:0:0:0: [sdc] Sense not available. [ 101.868961] sd 9:0:0:0: [sdc] Asking for cache data failed [ 101.868966] sd 9:0:0:0: [sdc] Assuming drive cache: write through [ 101.868969] sd 9:0:0:0: [sdc] Attached SCSI disk Now, if I plug the same drive to the powered usb 2.0 hub of my monitor, the issue is not reproduced (at least on a 20h long operation). Also the issue of the usb reconnects is less frequent if the hard drive is plugged before I switch on the computer. Does anybody have some advice as to what I could do? Which is the faulty part/s that I should replace? As for me, I really don't know if to point my finger to the PSU or the Motherboard (I have updated to the latest firmware and checked the BIOS settings several times). EDIT: The reconnects are happening both in the Sata connected drives and the USBX connected drives.

Read the article
fd partitions gone from 2 discs, md happy with it and resyncs. How to recover ?

- by d0nd

Hey gurus, need some help badly with this one. I run a server with a 6Tb md raid5 volume built over 7*1Tb disks. I've had to shut down the server lately and when it went back up, 2 out of the 7 disks used for the raid volume had lost its conf : dmesg : [ 10.184167] sda: sda1 sda2 sda3 // System disk [ 10.202072] sdb: sdb1 [ 10.210073] sdc: sdc1 [ 10.222073] sdd: sdd1 [ 10.229330] sde: sde1 [ 10.239449] sdf: sdf1 [ 11.099896] sdg: unknown partition table [ 11.255641] sdh: unknown partition table All 7 disks have same geometry and were configured alike : dmesg : Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x1e7481a5 Device Boot Start End Blocks Id System /dev/sdb1 1 121601 976760001 fd Linux raid autodetect All 7 disks (sdb1, sdc1, sdd1, sde1, sdf1, sdg1, sdh1) were used in a md raid5 xfs volume. When booting, md, which was (obviously) out of sync kicked in and automatically started rebuilding over the 7 disks, including the two "faulty" ones; xfs tried to do some shenanigans as well: dmesg : [ 19.566941] md: md0 stopped. [ 19.817038] md: bind<sdc1> [ 19.817339] md: bind<sdd1> [ 19.817465] md: bind<sde1> [ 19.817739] md: bind<sdf1> [ 19.817917] md: bind<sdh> [ 19.818079] md: bind<sdg> [ 19.818198] md: bind<sdb1> [ 19.818248] md: md0: raid array is not clean -- starting background reconstruction [ 19.825259] raid5: device sdb1 operational as raid disk 0 [ 19.825261] raid5: device sdg operational as raid disk 6 [ 19.825262] raid5: device sdh operational as raid disk 5 [ 19.825264] raid5: device sdf1 operational as raid disk 4 [ 19.825265] raid5: device sde1 operational as raid disk 3 [ 19.825267] raid5: device sdd1 operational as raid disk 2 [ 19.825268] raid5: device sdc1 operational as raid disk 1 [ 19.825665] raid5: allocated 7334kB for md0 [ 19.825667] raid5: raid level 5 set md0 active with 7 out of 7 devices, algorithm 2 [ 19.825669] RAID5 conf printout: [ 19.825670] --- rd:7 wd:7 [ 19.825671] disk 0, o:1, dev:sdb1 [ 19.825672] disk 1, o:1, dev:sdc1 [ 19.825673] disk 2, o:1, dev:sdd1 [ 19.825675] disk 3, o:1, dev:sde1 [ 19.825676] disk 4, o:1, dev:sdf1 [ 19.825677] disk 5, o:1, dev:sdh [ 19.825679] disk 6, o:1, dev:sdg [ 19.899787] PM: Starting manual resume from disk [ 28.663228] Filesystem "md0": Disabling barriers, not supported by the underlying device [ 28.663228] XFS mounting filesystem md0 [ 28.884433] md: resync of RAID array md0 [ 28.884433] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [ 28.884433] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync. [ 28.884433] md: using 128k window, over a total of 976759936 blocks. [ 29.025980] Starting XFS recovery on filesystem: md0 (logdev: internal) [ 32.680486] XFS: xlog_recover_process_data: bad clientid [ 32.680495] XFS: log mount/recovery failed: error 5 [ 32.682773] XFS: log mount failed I ran fdisk and flagged sdg1 and sdh1 as fd. I tried to reassemble the array but it didnt work: no matter what was in mdadm.conf, it still uses sdg and sdh instead of sdg1 and sdh1. I checked in /dev and I see no sdg1 and and sdh1, shich explains why it wont use it. I just don't know why those partitions are gone from /dev and how to readd those... blkid : /dev/sda1: LABEL="boot" UUID="519790ae-32fe-4c15-a7f6-f1bea8139409" TYPE="ext2" /dev/sda2: TYPE="swap" /dev/sda3: LABEL="root" UUID="91390d23-ed31-4af0-917e-e599457f6155" TYPE="ext3" /dev/sdb1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sdc1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sdd1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sde1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sdf1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sdg: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sdh: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" fdisk -l : Disk /dev/sda: 40.0 GB, 40020664320 bytes 255 heads, 63 sectors/track, 4865 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x8c878c87 Device Boot Start End Blocks Id System /dev/sda1 * 1 12 96358+ 83 Linux /dev/sda2 13 134 979965 82 Linux swap / Solaris /dev/sda3 135 4865 38001757+ 83 Linux Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x1e7481a5 Device Boot Start End Blocks Id System /dev/sdb1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xc9bdc1e9 Device Boot Start End Blocks Id System /dev/sdc1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xcc356c30 Device Boot Start End Blocks Id System /dev/sdd1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sde: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xe87f7a3d Device Boot Start End Blocks Id System /dev/sde1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xb17a2d22 Device Boot Start End Blocks Id System /dev/sdf1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x8f3bce61 Device Boot Start End Blocks Id System /dev/sdg1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xa98062ce Device Boot Start End Blocks Id System /dev/sdh1 1 121601 976760001 fd Linux raid autodetect I really dont know what happened nor how to recover from this mess. Needless to say the 5TB or so worth of data sitting on those disks are very valuable to me... Any idea any one? Did anybody ever experienced a similar situation or know how to recover from it ? Can someone help me? I'm really desperate... :x

Read the article
Build and migrated to software raid (mdadm) on GPT disk, now can't assemble array

- by John H

mdadm, gpt issues, unrecognized partitions. Simplified question: How do I get mdadm to recognize GPT partitions? I have been attempting to convert/copy my Ubuntu 11.10 OS from a single drive to software raid 1. I have done similar in the past, but in this case, I was adding in a drive that has been configured for GPT and I tried to work with that without fully looking into the implications. Currently, I have a non-booting mdadm RAID 1 array of /dev/md127 (the OS assigned that and it keeps picking up). I am booting off of live USB keys, currently System Rescue CD from sysresccd. While gdisk and parted can see all the partitions, most of the OS utilities do not, including mdadm. My main goal is just to make the raid array accessible so I can get pull the data and start fresh (without using GPT). /dev/md127 /dev/sda /dev/sda1 <- GPT type partition /dev/sda1 <- exists within the GPT part, member of md127 /dev/sda2 <- exists within the GPT part, empty /dev/sdb /dev/sdb1 <- GPT type partition /dev/sdb1 <- exists within the GPT part, member of md127 History: POINT A: The original OS was install on sda (actually /dev/sda6). I used a the Ubuntu live usb to add sdb. I got warning from fdisk about GPT so I used gdisk to create a raid partition (sdb1) and mdadm to create a raid1 mirror with a missing drive. I had many issues getting this working (including being unable to get grub to install) but I eventually got it to boot using grub on sda and /dev/md127 off of sdb. So at point A, I had copied my OS from sda6 to md127 on sdb. I then booted into a rescue mode and attempted to get a bootloader onto sdb, which failed. I then discovered my mistake: I had installed the raid onto sdb instead of sdb1, essentially overwriting the sdb1 partition. POINT B: I now had two copies of my data- one on md127/sdb, and one on sda. I destroyed data on sda and created a new GPT table on sda. I then created sda1 for the raid array, and sda2 for a scratch partition. I added sda1 into the raid array and let it rebuild. md127 now covered /dev/sdb and /dev/sda1 as fully active and synced. POINT C: I rebooted onto linux rescue again and was still able to access the raid array. I then removed /dev/sdb from the array and created /dev/sdb1 for the raid. I added sdb1 to the array and let it sync. I was able to mount and access /dev/md127 without issues. Once it completed, both /dev/sda1 and /dev/sdb1 were GPT partitions and actively syncing. POINT D (current): I rebooted again to test if the array would boot and grub failed to load. I booted off of my live thumb drive and found that I can no longer assemble the raid array. mdadm doesn't see the required partitions. -- root@freshdesk /root % uname -a Linux freshdesk 3.0.24-std251-amd64 #2 SMP Sat Mar 17 12:08:55 UTC 2012 x86_64 AMD Athlon(tm) II X4 645 Processor AuthenticAMD GNU/Linux === /proc/partitions and parted look good: root@freshdesk /root % cat /proc/partitions major minor #blocks name 7 0 301788 loop0 8 0 976762584 sda 8 1 732579840 sda1 8 2 244181703 sda2 8 16 732574584 sdb 8 17 732573543 sdb1 8 32 7876607 sdc 8 33 7873349 sdc1 (parted) print all Model: ATA ST31000528AS (scsi) Disk /dev/sda: 1000GB Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 1049kB 750GB 750GB ext4 2 750GB 1000GB 250GB Linux/Windows data Model: ATA SAMSUNG HD753LJ (scsi) Disk /dev/sdb: 750GB Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 1049kB 750GB 750GB ext4 Linux RAID raid Model: SanDisk SanDisk Cruzer (scsi) Disk /dev/sdc: 8066MB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 31.7kB 8062MB 8062MB primary fat32 boot, lba === # no sda2, and I double the sdb1 is the one shown in parted root@freshdesk /root % blkid /dev/loop0: TYPE="squashfs" /dev/sda1: UUID="75dd6c2d-f0a8-4302-9da4-792cc7d72355" TYPE="ext4" /dev/sdc1: LABEL="PENDRIVE" UUID="1102-3720" TYPE="vfat" /dev/sdb1: UUID="2dd89f15-65bb-ff88-e368-bf24bd0fce41" TYPE="linux_raid_member" root@freshdesk /root % mdadm -E /dev/sda1 mdadm: No md superblock detected on /dev/sda1. # this is probably a result of me attempting to force the array up, putting superblocks on the GPT partition root@freshdesk /root % mdadm -E /dev/sdb1 /dev/sdb1: Magic : a92b4efc Version : 0.90.00 UUID : 2dd89f15:65bbff88:e368bf24:bd0fce41 Creation Time : Fri Mar 30 19:25:30 2012 Raid Level : raid1 Used Dev Size : 732568320 (698.63 GiB 750.15 GB) Array Size : 732568320 (698.63 GiB 750.15 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 127 Update Time : Sat Mar 31 12:39:38 2012 State : clean Active Devices : 1 Working Devices : 2 Failed Devices : 1 Spare Devices : 1 Checksum : a7d038b3 - correct Events : 20195 Number Major Minor RaidDevice State this 2 8 17 2 spare /dev/sdb1 0 0 8 1 0 active sync /dev/sda1 1 1 0 0 1 faulty removed 2 2 8 17 2 spare /dev/sdb1 === root@freshdesk /root % mdadm -A /dev/md127 /dev/sda1 /dev/sdb1 mdadm: no recogniseable superblock on /dev/sda1 mdadm: /dev/sda1 has no superblock - assembly aborted root@freshdesk /root % mdadm -A /dev/md127 /dev/sdb1 mdadm: cannot open device /dev/sdb1: Device or resource busy mdadm: /dev/sdb1 has no superblock - assembly aborted

Read the article
Router 2wire, Slackware desktop in DMZ mode, iptables policy aginst ping, but still pingable

- by skriatok

I'm in DMZ mode, so I'm firewalling myself, stealthy all ok, but I get faulty test results from Shields Up that there are pings. Yesterday I couldn't make a connection to game servers work, because ping block was enabled (on the router). I disabled it, but this persists even due to my firewall. What is the connection between me and my router in DMZ mode (for my machine, there is bunch of others too behind router firewall)? When it allows router affecting if I'm pingable or not and if router has setting not blocking ping, rules in my iptables for this scenario do not work. Please ignore commented rules, I do uncomment them as I want. These two should do the job right? iptables -A INPUT -p icmp --icmp-type echo-request -j DROP echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all Here are my iptables: #!/bin/sh # Begin /bin/firewall-start # Insert connection-tracking modules (not needed if built into the kernel). #modprobe ip_tables #modprobe iptable_filter #modprobe ip_conntrack #modprobe ip_conntrack_ftp #modprobe ipt_state #modprobe ipt_LOG # allow local-only connections iptables -A INPUT -i lo -j ACCEPT # free output on any interface to any ip for any service # (equal to -P ACCEPT) iptables -A OUTPUT -j ACCEPT # permit answers on already established connections # and permit new connections related to established ones (eg active-ftp) iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT #Gamespy&NWN #iptables -A INPUT -p tcp -m tcp -m multiport --ports 5120:5129 -j ACCEPT #iptables -A INPUT -p tcp -m tcp --dport 6667 --tcp-flags SYN,RST,ACK SYN -j ACCEPT #iptables -A INPUT -p tcp -m tcp --dport 28910 --tcp-flags SYN,RST,ACK SYN -j ACCEPT #iptables -A INPUT -p tcp -m tcp --dport 29900 --tcp-flags SYN,RST,ACK SYN -j ACCEPT #iptables -A INPUT -p tcp -m tcp --dport 29901 --tcp-flags SYN,RST,ACK SYN -j ACCEPT #iptables -A INPUT -p tcp -m tcp --dport 29920 --tcp-flags SYN,RST,ACK SYN -j ACCEPT #iptables -A INPUT -p udp -m udp -m multiport --ports 5120:5129 -j ACCEPT #iptables -A INPUT -p udp -m udp --dport 6500 -j ACCEPT #iptables -A INPUT -p udp -m udp --dport 27900 -j ACCEPT #iptables -A INPUT -p udp -m udp --dport 27901 -j ACCEPT #iptables -A INPUT -p udp -m udp --dport 29910 -j ACCEPT # Log everything else: What's Windows' latest exploitable vulnerability? iptables -A INPUT -j LOG --log-prefix "FIREWALL:INPUT" # set a sane policy: everything not accepted > /dev/null iptables -P INPUT DROP iptables -P FORWARD DROP iptables -P OUTPUT DROP iptables -A INPUT -p icmp --icmp-type echo-request -j DROP # be verbose on dynamic ip-addresses (not needed in case of static IP) echo 2 > /proc/sys/net/ipv4/ip_dynaddr # disable ExplicitCongestionNotification - too many routers are still # ignorant echo 0 > /proc/sys/net/ipv4/tcp_ecn #ping death echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all # If you are frequently accessing ftp-servers or enjoy chatting you might # notice certain delays because some implementations of these daemons have # the feature of querying an identd on your box for your username for # logging. Although there's really no harm in this, having an identd # running is not recommended because some implementations are known to be # vulnerable. # To avoid these delays you could reject the requests with a 'tcp-reset': #iptables -A INPUT -p tcp --dport 113 -j REJECT --reject-with tcp-reset #iptables -A OUTPUT -p tcp --sport 113 -m state --state RELATED -j ACCEPT # To log and drop invalid packets, mostly harmless packets that came in # after netfilter's timeout, sometimes scans: #iptables -I INPUT 1 -p tcp -m state --state INVALID -j LOG --log-prefix \ "FIREWALL:INVALID" #iptables -I INPUT 2 -p tcp -m state --state INVALID -j DROP # End /bin/firewall-start Active ruleset: bash-4.1# iptables -L -n -v Chain INPUT (policy DROP 38 packets, 2228 bytes) pkts bytes target prot opt in out source destination 0 0 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0 844 542K ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED 38 2228 LOG all -- * * 0.0.0.0/0 0.0.0.0/0 LOG flags 0 level 4 prefix `FIREWALL:INPUT' 0 0 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0 0 0 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED 38 2228 LOG all -- * * 0.0.0.0/0 0.0.0.0/0 LOG flags 0 level 4 prefix `FIREWALL:INPUT' Chain FORWARD (policy DROP 0 packets, 0 bytes) pkts bytes target prot opt in out source destination Chain OUTPUT (policy DROP 0 packets, 0 bytes) pkts bytes target prot opt in out source destination 1158 111K ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 0 0 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 Active ruleset: (after editing iptables into below sugested form) bash-4.1# iptables -L -n -v Chain INPUT (policy DROP 2567 packets, 172K bytes) pkts bytes target prot opt in out source destination 49 4157 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0 412K 441M ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED 2567 172K LOG all -- * * 0.0.0.0/0 0.0.0.0/0 LOG flags 0 level 4 prefix `FIREWALL:INPUT' 0 0 DROP icmp -- * * 0.0.0.0/0 0.0.0.0/0 icmp type 8 Chain FORWARD (policy DROP 0 packets, 0 bytes) pkts bytes target prot opt in out source destination Chain OUTPUT (policy ACCEPT 312K packets, 25M bytes) pkts bytes target prot opt in out source destination ping and syslog simultaneous screenshots from phone (pinger) and from laptop (being pinged) http://dl.dropbox.com/u/4160051/slckwr/pingfrom%20mobile.jpg http://dl.dropbox.com/u/4160051/slckwr/tailsyslog.jpg

Read the article
Button click does not start Service in Android App Widget

- by Feanor

I'm having trouble starting a Service to update an AppWidget that I'm creating as an exercise. I'm trying to get the latitude and longitude of spoofed location data from DDMS to display in the widget. The widget uses a service to update the TextView, which may be slightly overkill, but I wanted to follow the template that seems to be common in AppWidgets that do more work (like the Forecast widget or the Wiktionary widget). Right now, I'm not getting any error messages or strange behavior; nothing at all happens when the button is pressed. I'm a bit mystified as to what might be wrong. Could anyone out there point me in the right direction? Additionally, if my logic for location is faulty, I'd love recommendations on that too. I've looked at several blogs, the Google examples, and the documentation, but I feel a little fuzzy on how it works. Here is the current state of the widget: public class Widget extends AppWidgetProvider { static final String TAG = "Widget"; /** * {@inheritDoc} */ public void onUpdate(Context context, AppWidgetManager appWidgetManager, int[] appWidgetIds) { // Create an intent to launch the service Intent serviceIntent = new Intent(context, UpdateService.class); // PendingIntent is required for the onClickPendingIntent that actually // starts the service from a button click PendingIntent pendingServiceIntent = PendingIntent.getService(context, 0, serviceIntent, 0); // Get the layout for the App Widget and attach a click listener to the // button RemoteViews views = new RemoteViews(context.getPackageName(), R.layout.main); views.setOnClickPendingIntent(R.id.address_button, pendingServiceIntent); super.onUpdate(context, appWidgetManager, appWidgetIds); } // To prevent any ANR timeouts, we perform the update in a service; // really should have its own thread too public static class UpdateService extends Service { static final String TAG = "UpdateService"; private LocationManager locationManager; private Location currentLocation; private double latitude; private double longitude; public void onStart(Intent intent, int startId) { // Get a LocationManager from the system services locationManager = (LocationManager) getSystemService(Context.LOCATION_SERVICE); // Register for updates from spoofed GPS locationManager.requestLocationUpdates("gps", 30000L, 0.0f, new LocationListener() { @Override public void onLocationChanged(Location location) { currentLocation = location; } @Override public void onProviderDisabled(String provider) {} @Override public void onProviderEnabled(String provider) {} @Override public void onStatusChanged(String provider, int status, Bundle extras) {} }); // Get the last known location from GPS currentLocation = locationManager.getLastKnownLocation("gps"); // Build the widget update RemoteViews updateViews = buildUpdate(this); // Push update for this widget to the home screen ComponentName thisWidget = new ComponentName(this, Widget.class); // AppWidgetManager updates AppWidget state; gets information about // installed AppWidget providers and other AppWidget related state AppWidgetManager manager = AppWidgetManager.getInstance(this); // Updates the views based on the RemoteView returned from the // buildUpdate method (stored in updateViews) manager.updateAppWidget(thisWidget, updateViews); } public RemoteViews buildUpdate(Context context) { latitude = currentLocation.getLatitude(); longitude = currentLocation.getLongitude(); RemoteViews updateViews = new RemoteViews(context.getPackageName(), R.layout.main); updateViews.setTextViewText(R.id.latitude_text, "" + latitude); updateViews.setTextViewText(R.id.longitude_text, "" + longitude); return updateViews; } @Override public IBinder onBind(Intent intent) { // We don't need to bind to this service return null; } } }

Read the article
SqlBulkCopy is slow, doesn't utilize full network speed

- by Alex

Hi, for that past couple of weeks I have been creating generic script that is able to copy databases. The goal is to be able to specify any database on some server and copy it to some other location, and it should only copy the specified content. The exact content to be copied over is specified in a configuration file. This script is going to be used on some 10 different databases and run weekly. And in the end we are copying only about 3%-20% of databases which are as large as 500GB. I have been using the SMO assemblies to achieve this. This is my first time working with SMO and it took a while to create generic way to copy the schema objects, filegroups ...etc. (Actually helped find some bad stored procs). Overall I have a working script which is lacking on performance (and at times times out) and was hoping you guys would be able to help. When executing the WriteToServer command to copy large amount of data ( 6GB) it reaches my timeout period of 1hr. Here is the core code for copying table data. The script is written in PowerShell. $query = ("SELECT * FROM $selectedTable " + $global:selectiveTables.Get_Item($selectedTable)).Trim() Write-LogOutput "Copying $selectedTable : '$query'" $cmd = New-Object Data.SqlClient.SqlCommand -argumentList $query, $source $cmd.CommandTimeout = 120; $bulkData = ([Data.SqlClient.SqlBulkCopy]$destination) $bulkData.DestinationTableName = $selectedTable; $bulkData.BulkCopyTimeout = $global:tableCopyDataTimeout # = 3600 $reader = $cmd.ExecuteReader(); $bulkData.WriteToServer($reader); # Takes forever here on large tables The source and target databases are located on different servers so I kept track of the network speed as well. The network utilization never went over 1% which was quite surprising to me. But when I just transfer some large files between the servers, the network utilization spikes up to 10%. I have tried setting the $bulkData.BatchSize to 5000 but nothing really changed. Increasing the BulkCopyTimeout to an even greater amount would only solve the timeout. I really would like to know why the network is not being used fully. Anyone else had this problem? Any suggestions on networking or bulk copy will be appreciated. And please let me know if you need more information. Thanks. UPDATE I have tweaked several options that increase the performance of SqlBulkCopy, such as setting the transaction logging to simple and providing a table lock to SqlBulkCopy instead of the default row lock. Also some tables are better optimized for certain batch sizes. Overall, the duration of the copy was decreased by some 15%. And what we will do is execute the copy of each database simultaneously on different servers. But I am still having a timeout issue when copying one of the databases. When copying one of the larger databases, there is a table for which I consistently get the following exception: System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. It is thrown about 16 after it starts copying the table which is no where near my BulkCopyTimeout. Even though I get the exception that table is fully copied in the end. Also, if I truncate that table and restart my process for that table only, the tables is copied over without any issues. But going through the process of copying that entire database fails always for that one table. I have tried executing the entire process and reseting the connection before copying that faulty table, but it still errored out. My SqlBulkCopy and Reader are closed after each table. Any suggestions as to what else could be causing the script to fail at the point each time?

Read the article

Search Results

Search found 346 results on 14 pages for 'faulty'.

Page 13/14 | < Previous Page | 9 10 11 12 13 14 | Next Page >

- by Michael Mao

- by hlx98007

- by Ben Hymers

- by DMA57361

- by JAD

- by Kaustubh P

- by Zoran

- by user135501

- by weka

- by Phil Factor

- by Chris Hoffman

- by Phil Factor

- by AshishRay

- by Dev G

- by Aron Rotteveel

- by Halfgaar

- by Peter Bos

- by exhuma

- by celebdor

- by d0nd

- by John H

- by skriatok

- by Feanor

- by Alex

< Previous Page | 9 10 11 12 13 14 | Next Page >