Search Results

Search found 8037 results on 322 pages for 'hardware hacking'.

Page 314/322 | < Previous Page | 310 311 312 313 314 315 316 317 318 319 320 321 | Next Page >

What does it mean when ARP shows <incomplete> on eth1

- by Geoff Dalgas

We have been using HAProxy along with heartbeat from the Linux-HA project. We are using two linux instances to provide a failover. Each server has with their own public IP and a single IP which is shared between the two using a virtual interface (eth1:1) at IP: 69.59.196.211 The virtual interface (eth1:1) IP 69.59.196.211 is configured as the gateway for the windows servers behind them and we use ip_forwarding to route traffic. We are experiencing an occasional network outage on one of our windows servers behind our linux gateways. HAProxy will detect the server is offline which we can verify by remoting to the failed server and attempting to ping the gateway: Pinging 69.59.196.211 with 32 bytes of data: Reply from 69.59.196.220: Destination host unreachable. Running arp -a on this failed server shows that there is no entry for the gateway address (69.59.196.211): Interface: 69.59.196.220 --- 0xa Internet Address Physical Address Type 69.59.196.161 00-26-88-63-c7-80 dynamic 69.59.196.210 00-15-5d-0a-3e-0e dynamic 69.59.196.212 00-21-5e-4d-45-c9 dynamic 69.59.196.213 00-15-5d-00-b2-0d dynamic 69.59.196.215 00-21-5e-4d-61-1a dynamic 69.59.196.217 00-21-5e-4d-2c-e8 dynamic 69.59.196.219 00-21-5e-4d-38-e5 dynamic 69.59.196.221 00-15-5d-00-b2-0d dynamic 69.59.196.222 00-15-5d-0a-3e-09 dynamic 69.59.196.223 ff-ff-ff-ff-ff-ff static 224.0.0.22 01-00-5e-00-00-16 static 224.0.0.252 01-00-5e-00-00-fc static 225.0.0.1 01-00-5e-00-00-01 static On our linux gateway instances arp -a shows: peak-colo-196-220.peak.org (69.59.196.220) at <incomplete> on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-222.peak.org (69.59.196.222) at 00:15:5d:0a:3e:09 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 Why would arp occasionally set the entry for this failed server as <incomplete>? Should we be defining our arp entries statically? I've always left arp alone since it works 99% of the time, but in this one instance it appears to be failing. Are there any additional troubleshooting steps we can take help resolve this issue? THINGS WE HAVE TRIED I added a static arp entry for testing on one of the linux gateways which still didn't help. root@haproxy2:~# arp -a peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-221.peak.org (69.59.196.221) at 00:15:5d:00:b2:0d [ether] on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 peak-colo-196-220.peak.org (69.59.196.220) at 00:21:5e:4d:30:8d [ether] PERM on eth1 root@haproxy2:~# arp -i eth1 -s 69.59.196.220 00:21:5e:4d:30:8d root@haproxy2:~# ping 69.59.196.220 PING 69.59.196.220 (69.59.196.220) 56(84) bytes of data. --- 69.59.196.220 ping statistics --- 7 packets transmitted, 0 received, 100% packet loss, time 6006ms Rebooting the windows web server solves this issue temporarily with no other changes to the network but our experience shows this issue will come back. Swapping network cards and switches I noticed the link light on the port of the switch for the failed windows server was running at 100Mb instead of 1Gb on the failed interface. I moved the cable to several other open ports and the link indicated 100Mb for each port that I tried. I also swapped the cable with the same result. I tried changing the properties of the network card in windows and the server locked up and required a hard reset after clicking apply. This windows server has two physical network interfaces so I have swapped the cables and network settings on the two interfaces to see if the problem follows the interface. If the public interface goes down again we will know that it is not an issue with the network card. (We also tried another switch we have on hand, no change) Changing network hardware driver versions We've had the same problem with the latest Broadcom driver, as well as the built-in driver that ships in Windows Server 2008 R2. Replacing network cables As a last ditch effort we remembered another change that occurred was the replacement of all of the patch cords between our servers / switch. We had purchased two sets, one green of lengths 1ft - 3ft for the private interfaces and another set of red cables for the public interfaces. We swapped out all of the public interface patch cables with a different brand and ran our servers without issue for a full week ... aaaaaand then the problem recurred. Disable checksum offload, remove TProxy We also tried disabling TCP/IP checksum offload in the driver, no change. We're now pulling out TProxy and moving to a more traditional x-forwarded-for network arrangement without any fancy IP address rewriting. We'll see if that helps.

Read the article
Linux software raid fails to include one device for one RAID1 array

- by user1389890

One of my four Linux software raid arrays drops one of its two devices when I reboot my system. The other three arrays work fine. I am running RAID1 on kernel version 2.6.32-5-amd64. Every time I reboot, /dev/md2 comes up with only one device. I can manually add the device by saying $ sudo mdadm /dev/md2 --add /dev/sdc1. This works fine, and mdadm confirms that the device has been re-added as follows: mdadm: re-added /dev/sdc1 After adding the device and and allowing the array time to resynch, this is what the output of $ cat /proc/mdstat looks like: Personalities : [raid1] md3 : active raid1 sda4[0] sdb4[1] 244186840 blocks super 1.2 [2/2] [UU] md2 : active raid1 sdc1[0] sdd1[1] 732574464 blocks [2/2] [UU] md1 : active raid1 sda3[0] sdb3[1] 722804416 blocks [2/2] [UU] md0 : active raid1 sda1[0] sdb1[1] 6835520 blocks [2/2] [UU] unused devices: <none> Then after I reboot, this is what the output of $ cat /proc/mdstat looks like: Personalities : [raid1] md3 : active raid1 sda4[0] sdb4[1] 244186840 blocks super 1.2 [2/2] [UU] md2 : active raid1 sdd1[1] 732574464 blocks [2/1] [_U] md1 : active raid1 sda3[0] sdb3[1] 722804416 blocks [2/2] [UU] md0 : active raid1 sda1[0] sdb1[1] 6835520 blocks [2/2] [UU] unused devices: <none> During reboot, here is the output of $ sudo cat /var/log/syslog | grep mdadm : Jun 22 19:00:08 rook mdadm[1709]: RebuildFinished event detected on md device /dev/md2 Jun 22 19:00:08 rook mdadm[1709]: SpareActive event detected on md device /dev/md2, component device /dev/sdc1 Jun 22 19:00:20 rook kernel: [ 7819.446412] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446415] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446782] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446785] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.515844] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.515847] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.606829] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.606832] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855616] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855620] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855950] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855952] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8027.962169] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8027.962171] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8028.054365] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8028.054368] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.588662] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.588664] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.601990] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.601991] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.602693] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.602695] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.605981] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.605983] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.606138] mdadm: sending ioctl 800c0910 to a partition! Jun 22 19:10:23 rook kernel: [ 9.606139] mdadm: sending ioctl 800c0910 to a partition! Jun 22 19:10:48 rook mdadm[1737]: DegradedArray event detected on md device /dev/md2 Here is the mdadm.conf file: ARRAY /dev/md0 metadata=0.90 UUID=92121d42:37f46b82:926983e9:7d8aad9b ARRAY /dev/md1 metadata=0.90 UUID=9c1bafc3:1762d51d:c1ae3c29:66348110 ARRAY /dev/md2 metadata=0.90 UUID=98cea6ca:25b5f305:49e8ec88:e84bc7f0 ARRAY /dev/md3 metadata=1.2 name=rook:3 UUID=ca3fce37:95d49a09:badd0ddc:b63a4792 I also ran $ sudo smartctl -t long /dev/sdc and no hardware issues were detected. As long as I do not reboot, /dev/md2 seems to work fine. Does anyone have any suggestions? Here is the output of $ sudo mdadm -E /dev/sdc1 after re-adding the device and letting it resync: /dev/sdc1: Magic : a92b4efc Version : 0.90.00 UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook) Creation Time : Sun Jul 13 08:05:55 2008 Raid Level : raid1 Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Array Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Update Time : Mon Jun 24 07:42:49 2013 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : 5fd6cc13 - correct Events : 180998 Number Major Minor RaidDevice State this 0 8 33 0 active sync /dev/sdc1 0 0 8 33 0 active sync /dev/sdc1 1 1 8 49 1 active sync /dev/sdd1 Here is the output of $ sudo mdadm -D /dev/md2 after re-adding the device and letting it resync: /dev/md2: Version : 0.90 Creation Time : Sun Jul 13 08:05:55 2008 Raid Level : raid1 Array Size : 732574464 (698.64 GiB 750.16 GB) Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Persistence : Superblock is persistent Update Time : Mon Jun 24 07:42:49 2013 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook) Events : 0.180998 Number Major Minor RaidDevice State 0 8 33 0 active sync /dev/sdc1 1 8 49 1 active sync /dev/sdd1

Read the article
Upon reboot, Linux software raid fails to include one device of a RAID1 array

- by user1389890

One of my four Linux software raid arrays drops one of its two devices when I reboot my system. The other three arrays work fine. I am running RAID1 on kernel version 2.6.32-5-amd64 (Debian Squeeze). Every time I reboot, /dev/md2 comes up with only one device. I can manually add the device by saying $ sudo mdadm /dev/md2 --add /dev/sdc1. This works fine, and mdadm confirms that the device has been re-added as follows: mdadm: re-added /dev/sdc1 After adding the device and allowing the array time to resynch, this is what the output of $ cat /proc/mdstat looks like: Personalities : [raid1] md3 : active raid1 sda4[0] sdb4[1] 244186840 blocks super 1.2 [2/2] [UU] md2 : active raid1 sdc1[0] sdd1[1] 732574464 blocks [2/2] [UU] md1 : active raid1 sda3[0] sdb3[1] 722804416 blocks [2/2] [UU] md0 : active raid1 sda1[0] sdb1[1] 6835520 blocks [2/2] [UU] unused devices: <none> Then after I reboot, this is what the output of $ cat /proc/mdstat looks like: Personalities : [raid1] md3 : active raid1 sda4[0] sdb4[1] 244186840 blocks super 1.2 [2/2] [UU] md2 : active raid1 sdd1[1] 732574464 blocks [2/1] [_U] md1 : active raid1 sda3[0] sdb3[1] 722804416 blocks [2/2] [UU] md0 : active raid1 sda1[0] sdb1[1] 6835520 blocks [2/2] [UU] unused devices: <none> During reboot, here is the output of $ sudo cat /var/log/syslog | grep mdadm : Jun 22 19:00:08 rook mdadm[1709]: RebuildFinished event detected on md device /dev/md2 Jun 22 19:00:08 rook mdadm[1709]: SpareActive event detected on md device /dev/md2, component device /dev/sdc1 Jun 22 19:00:20 rook kernel: [ 7819.446412] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446415] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446782] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446785] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.515844] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.515847] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.606829] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.606832] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855616] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855620] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855950] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855952] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8027.962169] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8027.962171] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8028.054365] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8028.054368] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.588662] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.588664] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.601990] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.601991] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.602693] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.602695] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.605981] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.605983] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.606138] mdadm: sending ioctl 800c0910 to a partition! Jun 22 19:10:23 rook kernel: [ 9.606139] mdadm: sending ioctl 800c0910 to a partition! Jun 22 19:10:48 rook mdadm[1737]: DegradedArray event detected on md device /dev/md2 Here is the result of $ cat /etc/mdadm/mdadm.conf: ARRAY /dev/md0 metadata=0.90 UUID=92121d42:37f46b82:926983e9:7d8aad9b ARRAY /dev/md1 metadata=0.90 UUID=9c1bafc3:1762d51d:c1ae3c29:66348110 ARRAY /dev/md2 metadata=0.90 UUID=98cea6ca:25b5f305:49e8ec88:e84bc7f0 ARRAY /dev/md3 metadata=1.2 name=rook:3 UUID=ca3fce37:95d49a09:badd0ddc:b63a4792 Here is the output of $ sudo mdadm -E /dev/sdc1 after re-adding the device and letting it resync: /dev/sdc1: Magic : a92b4efc Version : 0.90.00 UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook) Creation Time : Sun Jul 13 08:05:55 2008 Raid Level : raid1 Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Array Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Update Time : Mon Jun 24 07:42:49 2013 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : 5fd6cc13 - correct Events : 180998 Number Major Minor RaidDevice State this 0 8 33 0 active sync /dev/sdc1 0 0 8 33 0 active sync /dev/sdc1 1 1 8 49 1 active sync /dev/sdd1 Here is the output of $ sudo mdadm -D /dev/md2 after re-adding the device and letting it resync: /dev/md2: Version : 0.90 Creation Time : Sun Jul 13 08:05:55 2008 Raid Level : raid1 Array Size : 732574464 (698.64 GiB 750.16 GB) Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Persistence : Superblock is persistent Update Time : Mon Jun 24 07:42:49 2013 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook) Events : 0.180998 Number Major Minor RaidDevice State 0 8 33 0 active sync /dev/sdc1 1 8 49 1 active sync /dev/sdd1 I also ran $ sudo smartctl -t long /dev/sdc and no hardware issues were detected. As long as I do not reboot, /dev/md2 seems to work fine. Does anyone have any suggestions?

Read the article
Windows 7 Machine Makes Router Drop -All- Wireless Connections

- by Hammer Bro.

Some background: My home network consists of my Desktop, a two-month old Windows 7 (x64) machine which is online most frequently (N-spec), as well as three other Windows XP laptops (all G) that only connect every now and then (one for work, one for Netflix, and the other for infrequent regular laptop uses). I used to have a Belkin F5D8236-4 wireless router, and everything worked great. A week ago, however, I found out that the Belkin absolutely in no way would establish a VPN connection, something that has become important for work. So I bought a Netgear WNR3500v2/U/L. The wireless was acting a little sketchy at first for just the Windows 7 machine, but I thought it had something to do with 802.11N and I was in a hurry so I just fished up an ethernet cable and disabled the computer's wireless. It has now become apparent, though, that whenever the Windows 7 machine is connected to the router, all wireless connections become unstable. I was using my work laptop for a solid six hours today with no trouble, having multiple SSH connections open over VPN and streaming internet radio in the background. Then, within two minutes of turning on this Windows 7 box, I had lost all connectivity over the wireless. And I was two feet away from the router. The same sort of thing happens on all of the other laptops -- Netflix can be playing stuff all weekend, but if I come up here and do things on this (W7) computer, the streaming will be dead within ten minutes. So here are my basic observations: If the Windows 7 machine is off, then all connections will have a Signal Strength of Very Good or Excellent and a Speed of 48-54 Mbps for an indefinite amount of time. Shortly after the Windows 7 machine is turned on, all wireless connections will experience a consistent decline in Speed down to 1.0 Mbps, eventually losing their connection entirely. These machines will continue to maintain 70% signal strength, as observed by themselves and router. Once dropped, a wireless connection will have difficulty reconnecting. And, if a connection manages to become established, it will quickly drop off again. The Windows 7 machine itself will continue to function just fine if it's using a wired connection, although it will experience these same issues over the wireless. All of the drivers and firmwares are up to date, and this happened both with the stock Netgear firmware as well as the (current) DD-WRT. What I've tried: Making sure each computer is being assigned a distinct IP. (They are.) Disabling UPnP and Stateful Packet Inspection on the router. Disabling Network Sharing, SSDP Discovery, TCP/IP NetBios Helper and Computer Browser services on the Windows 7 machine. Disabling QoS Packet Scheduler, IPv6, and Link Layer Topology Discovery options on my ethernet controller (leaving only Client for Microsoft Networks, File and Printer Sharing, and IPv4 enabled). What I think: It seems awfully similar to the problems discussed in detail at http://social.msdn.microsoft.com/Forums/en/wsk/thread/1064e397-9d9b-4ae2-bc8e-c8798e591915 (which was both the most relevant and concrete information I could dig up on the internet). I still think that something the Windows 7 IP stack (or just Operating System itself) is doing is giving the router fits. However, I could be wrong, because I have two key differences. One is that most instances of this problem are reported as the entire router dying or restarting, and mine still works just fine over the wired connection. The other is that it's a new router, tested with both the factory firmware and the (I assume) well-maintained DD-WRT project. Even if Windows 7 is still secretly sending IPv6 packets or the TCP Window Scaling implementation that I hear Vista caused some trouble with (even though I've tried my best to disable anything fancy), this router should support those functions. I don't want to get a new or a replacement router unless someone can convince me that this is a defective unit. But the problem seems too specific and predictable by my instincts to be a hardware hiccup. And I don't want to deal with the inevitable problems that always seem to take half a day to resolve when getting a new router, since I'm frantically working (including tomorrow) to complete a project by next week's deadline. Plus, I think in the worst case scenario, I could keep this router connected directly to the modem, disable its wireless entirely, and connect the old Belkin to it directly. That should allow me to still use VPN (although I'll have to plug my work laptop directly into that router), and then maintain wireless connections for all of the other computers. But that feels so wrong to me. Anyone have any ideas what the cause and possible solution could be? Clarifications: The Windows 7 machine is directly connected via an ethernet cable to the router for everything above. But while it is online, all other computers' wireless connections become unusable. It is not an issue of signal strength or interference -- no other devices within scanning range are using Channel 1, and the problem will affect computers that are literally feet away from the router with 95% signal strength.

Read the article
Hyper-V File Server Clustering - at my wit’s end

- by René Kåbis

I am at my wit’s end with File Server clustering under Hyper-V. I am hoping that someone might be able to help me figure out this Gordian Knot of a technology that seems to have dead ends (like forcing cluster VMs to use iSCSI drives where normally-attached VHDX drives could suffice) where logic and reason would normally provide a logical solution. My hardware: I will be running three servers (in the end), but right now everything is taking place on one server. One of the secondary servers will exist purely as a witness/quorum, and another slightly more powerful one will be acting as an emergency backup (with additional storage, just not redundant) to hold the secondary AD VM and the other halves of a set of clustered VMs: the SQL VM and the file system VM. Please note, these each are the depreciated nodes of a cluster, the main nodes will be on the most powerful first machine. My heavy lifter is a machine that also contains all of the truly redundant storage on the network. If this gives anyone the heebie-geebies, too bad. It has a 6TB (usable) RAID-10 array, and will (in the end) hold the primary nodes of both aforementioned clusters, but is right now holding all VMs. This is, right now: DC01, DC02, SQL01, SQL02, FS01 & FS02. Eventually, I will be adding additional VMs to handle Exchange, Sharepoint and Lync, but only to this main server (the secondary server won't be able to handle more than three or four VMs, so why burden it? The AD, SQL & FS VMs are the most critical for the business). If anyone is now saying, “wait, what about a SAN or a NAS for the file servers?”, well too bad. What exists on the main machine is what I have to deal with. I followed these instructions, but I seem to be unable to get things to work. In order to make the file server truly redundant, I cannot trust any one machine to hold the only data store on the network. Therefore, I have created a set of iSCSI drives on the VM-host of the main machine, and attached one to each file server VM. The end result is that I want my FS01 to sit on the heavy lifter, along with its iSCSI “drive”, and FS02 will sit on the secondary machine with its own iSCSI “drive” there as well. That is, neither iSCSI drive will end up sitting on the same machine as the other. As such, the clustered FS will utterly duplicate the contents of the iSCSI drives between each other, so that if one physical machine (or the FS VM) goes toes-up, the other has got a full copy of the data on its own iSCSI drive. My problem occurs when I try to apply the file server role within the failover cluster manager. Actually, it is even before that -- it occurs when adding the disks. Since I have added each disk preferentially to a specific VM (by limiting the initiator by DNS hostname, and by adding two-way CHAP authentication), this forces each VM to be in control of its own iSCSI disk. However, when I try to add the disks to the Disks section of Storage within Failover Cluster Manager, the entire process fails for a random disk of the pair. That is, one will get online, but the other will remain offline because it does not have the correct “owner node”. I mean, really -- WTF? Of course it doesn’t have the right owner node, both drives are showing the same node name!! I cannot seem to have one drive show up with one node name as owner, and the other drive show up with the other node name as owner. And because both drives are not “online”, I cannot create a pool to apply to a cluster role. Talk about getting stuck between a rock and a hard place! I’ve got more to add, but my work is closing for the day and I have to wrap things up. I will try to add more tomorrow morning when I get in. My main objective is to have a file server VM on each machine, the storage on each machine, but a transparent failover in case one physical machine fails. Essentially, a failover FS that doesn’t care which machine fails -- the storage contents are replicated equally on each machine. Am I even heading in the right direction?

Read the article
LDAP query on linux against AD returns groups with no members

- by SethG

I am using LDAP+kerberos to authenticate against Active Directory on Windows 2003 R2. My krb5.conf and ldap.conf appear to be correct (according to pretty much every sample I found on the 'net). I can login to the host with both password and ssh keys. When I run getent passwd, all my ldap user accounts are listed with all the important attributes. When I run getent group, all the ldap groups and their gid's are listed, but no group members. If I run ldapsearch and filter on any group, the members are all listed with the "member" attribute. So the data is there for the taking, it's just not being parsed properly. It would appear that I simply am using an incorrect mapping in ldap.conf, but I can't see it. I've tried several variations and all give the same result. Here is my current ldap.conf: host <ad-host1-ip> <ad-host2-ip> base dc=my,dc=full,dc=dn uri ldap://<ad-host1> ldap://<ad-host2> ldap_version 3 binddn <mybinddn> bindpw <mybindpw> scope sub bind_policy hard nss_reconnect_tries 3 nss_reconnect_sleeptime 1 nss_reconnect_maxsleeptime 8 nss_reconnect_maxconntries 3 nss_map_objectclass posixAccount User nss_map_objectclass posixGroup Group nss_map_attribute uid sAMAccountName nss_map_attribute gidNumber msSFU30GidNumber nss_map_attribute uidNumber msSFU30UidNumber nss_map_attribute cn cn nss_map_attribute gecos displayName nss_map_attribute homeDirectory msSFU30HomeDirectory nss_map_attribute loginShell msSFU30LoginShell nss_map_attribute uniqueMember member pam_filter objectcategory=User pam_login_attribute sAMAccountName pam_member_attribute member pam_password ad Here's the kicker: this config works 100% fine on a different linux box with a different distro. It does not work on the distro I am planning on switching to. I have installed from source the versions of pam_ldap and nss_ldap on the new box to match the old box, which fixed another problem I was having with this setup. Other relevant info is the original AD box was Windows 2003. It's mirror died a horrible hardware death so I'm trying to add two more 2003-R2 servers to the mirror tree and ultimately drop the old 2003 box. The new R2 boxes appear to have joined the DC forest properly. What do I need to do to get groups working? I've exhausted all the resources I could find and need a different angle. Any input is appreciated. Status update, 7/31/09 I have managed to tweak my config file to get full info from the AD and performance is nice and snappy. I replaced the back-rev'd copies of pam_ldap and nss_ldap with the current ones for the distro I'm using, so it's back to a standard out-of-the-box install. Here's my current config: host <ad-host1-ip> <ad-host2-ip> base dc=my,dc=full,dc=dn uri ldap://<ad-host1> ldap://<ad-host2> ldap_version 3 binddn <mybinddn> bindpw <mybindpw> scope sub bind_policy soft nss_reconnect_tries 3 nss_reconnect_sleeptime 1 nss_reconnect_maxsleeptime 8 nss_reconnect_maxconntries 3 nss_connect_policy oneshot referrals no nss_map_objectclass posixAccount User nss_map_objectclass posixGroup Group nss_map_attribute uid sAMAccountName nss_map_attribute gidNumber msSFU30GidNumber nss_map_attribute uidNumber msSFU30UidNumber nss_map_attribute cn cn nss_map_attribute gecos displayName nss_map_attribute homeDirectory msSFU30HomeDirectory nss_map_attribute loginShell msSFU30LoginShell nss_map_attribute uniqueMember member pam_filter objectcategory=CN=Person,CN=Schema,CN=Configuration,DC=w2k,DC=cis,DC=ksu,DC=edu pam_login_attribute sAMAccountName pam_member_attribute member pam_password ad ssl off tls_checkpeer no sasl_secprops maxssf=0 The remaining problem now is when you run the groups command, not all subscribed groups are listed. Some are (one or two), but not all. Group memberships are still honored, such as file and printer access. getent group foo still shows that the user is a member of group foo. So it appears to be a presentation bug, and does not interfere with normal operation. It also appears that some (I have not determined exactly how many) group searches do not resolve correctly, even though the group is listed. eg, when you run "getent group bar", nothing is returned, but if you run "getent group|grep bar" or "getent group|grep <bar_gid>" you can see that it indeed listed and your group name and gid are correct. This still seems like an LDAP search or mapping error, but I can't figure out what it is. I'm a heckuva lot closer than earlier in the week, but I'd really like to get this last detail ironed out.

Read the article
Can't Remove Logical Drive/Array from HP P400

- by Myles

This is my first post here. Thank you in advance for any assistance with this matter. I'm trying to remove a logical drive (logical drive 2) and an array (array "B") from my Smart Array P400. The host is a DL580 G5 running 64-bit Red Hat Enterprise Linux Server release 5.7 (Tikanga). I am unable to remove the array using either hpacucli or cpqacuxe. I believe it is because of "OS Status: LOCKED". The file system that lives on this array has been unmounted. I do not want to reboot the host. Is there some way to "release" this logical drive so I can remove the array? Note that I do not need to preserve the data on logical drive 2. I intend to physically remove the drives from the machine and replace them with larger drives. I'm using the cciss kernel module that ships with Red Hat 5.7. Here is some information pertaining to the host and the P400 configuration: [root@gort ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.7 (Tikanga) [root@gort ~]# uname -a Linux gort 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux [root@gort ~]# rpm -qa | egrep '^(hp|cpq)' cpqacuxe-9.30-15.0 hp-health-9.25-1551.7.rhel5 hpsmh-7.1.2-3 hpdiags-9.3.0-466 hponcfg-3.1.0-0 hp-snmp-agents-9.25-2384.8.rhel5 hpacucli-9.30-15.0 [root@gort ~]# hpacucli HP Array Configuration Utility CLI 9.30.15.0 Detecting Controllers...Done. Type "help" for a list of supported commands. Type "exit" to close the console. => ctrl all show config detail Smart Array P400 in Slot 0 (Embedded) Bus Interface: PCI Slot: 0 Cache Serial Number: PA82C0J9SVW34U RAID 6 (ADG) Status: Enabled Controller Status: OK Hardware Revision: D Firmware Version: 7.22 Rebuild Priority: Medium Expand Priority: Medium Surface Scan Delay: 15 secs Surface Scan Mode: Idle Wait for Cache Room: Disabled Surface Analysis Inconsistency Notification: Disabled Post Prompt Timeout: 0 secs Cache Board Present: True Cache Status: OK Cache Ratio: 25% Read / 75% Write Drive Write Cache: Disabled Total Cache Size: 256 MB Total Cache Memory Available: 208 MB No-Battery Write Cache: Disabled Cache Backup Power Source: Batteries Battery/Capacitor Count: 1 Battery/Capacitor Status: OK SATA NCQ Supported: True Logical Drive: 1 Size: 136.7 GB Fault Tolerance: RAID 1 Heads: 255 Sectors Per Track: 32 Cylinders: 35132 Strip Size: 128 KB Full Stripe Size: 128 KB Status: OK Caching: Enabled Unique Identifier: 600508B100184A395356573334550002 Disk Name: /dev/cciss/c0d0 Mount Points: /boot 101 MB, /tmp 7.8 GB, /usr 3.9 GB, /usr/local 2.0 GB, /var 3.9 GB, / 2.0 GB, /local 113.2 GB OS Status: LOCKED Logical Drive Label: A0027AA78DEE Mirror Group 0: physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK) Mirror Group 1: physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK) Drive Type: Data Array: A Interface Type: SAS Unused Space: 0 MB Status: OK Array Type: Data physicaldrive 1I:1:1 Port: 1I Box: 1 Bay: 1 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM57RF40000983878FX Model: HP DG146BB976 Current Temperature (C): 29 Maximum Temperature (C): 35 PHY Count: 2 PHY Transfer Rate: Unknown, Unknown physicaldrive 1I:1:2 Port: 1I Box: 1 Bay: 2 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM55VQC000098388524 Model: HP DG146BB976 Current Temperature (C): 29 Maximum Temperature (C): 36 PHY Count: 2 PHY Transfer Rate: Unknown, Unknown Logical Drive: 2 Size: 546.8 GB Fault Tolerance: RAID 5 Heads: 255 Sectors Per Track: 32 Cylinders: 65535 Strip Size: 64 KB Full Stripe Size: 256 KB Status: OK Caching: Enabled Parity Initialization Status: Initialization Completed Unique Identifier: 600508B100184A395356573334550003 Disk Name: /dev/cciss/c0d1 Mount Points: None OS Status: LOCKED Logical Drive Label: A5C9C6F81504 Drive Type: Data Array: B Interface Type: SAS Unused Space: 0 MB Status: OK Array Type: Data physicaldrive 1I:1:3 Port: 1I Box: 1 Bay: 3 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM2H5PE00009802NK19 Model: HP DG146ABAB4 Current Temperature (C): 30 Maximum Temperature (C): 37 PHY Count: 1 PHY Transfer Rate: Unknown physicaldrive 1I:1:4 Port: 1I Box: 1 Bay: 4 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM28YY400009750MKPJ Model: HP DG146ABAB4 Current Temperature (C): 31 Maximum Temperature (C): 36 PHY Count: 1 PHY Transfer Rate: 3.0Gbps physicaldrive 2I:1:5 Port: 2I Box: 1 Bay: 5 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM2FGYV00009802N3GN Model: HP DG146ABAB4 Current Temperature (C): 30 Maximum Temperature (C): 38 PHY Count: 1 PHY Transfer Rate: Unknown physicaldrive 2I:1:6 Port: 2I Box: 1 Bay: 6 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM8AFAK00009920MMV1 Model: HP DG146BB976 Current Temperature (C): 31 Maximum Temperature (C): 41 PHY Count: 2 PHY Transfer Rate: Unknown, Unknown physicaldrive 2I:1:7 Port: 2I Box: 1 Bay: 7 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM2FJQD00009801MSHQ Model: HP DG146ABAB4 Current Temperature (C): 29 Maximum Temperature (C): 39 PHY Count: 1 PHY Transfer Rate: Unknown

Read the article
Can connect to Samba server but cannot access shares?

- by jlego

I have setup a stand-alone box running Fedora 16 to use as a file-sharing and web development server. Needs to be able to share files with a PC running Windows 7 and a Mac running OSX Snow Leopard. I've setup Samba using the Samba configuration GUI tool. Added users to Fedora and connected them as Samba users (which are the same as the Windows and Mac usernames and passwords). The workgroup name is the same as the Windows workgroup. Authentication is set to User. I've allowed Samba and Samba client through the firewall and set the ethernet to a trusted port in the firewall. Both the Windows and Mac machines can connect to the server and view the shares, however when trying to access the shares, Windows throws error 0x80070035 " Windows cannot access \SERVERNAME\ShareName." Windows user is not prompted for a username or password when accessing the server (found under "Network Places"). This also happens when connecting with the IP rather than the server name. The Mac can also connect to the server and see the shares but when choosing a share gives the error "The original item for ShareName cannot be found." When connecting via IP, the Mac user is prompted for username and password, which when authenticated gives a list of shares, however when choosing a share to connect to, the error is displayed and the user cannot access the share. Since both machines are acting similarly when trying to access the shares, I assume it is an issue with how Samba is configured. smb.conf: [global] workgroup = workgroup server string = Server log file = /var/log/samba/log.%m max log size = 50 security = user load printers = yes cups options = raw printcap name = lpstat printing = cups [homes] comment = Home Directories browseable = no writable = yes [printers] comment = All Printers path = /var/spool/samba browseable = yes printable = yes [FileServ] comment = FileShare path = /media/FileServ read only = no browseable = yes valid users = user1, user2 [webdev] comment = Web development path = /var/www/html/webdev read only = no browseable = yes valid users = user1 How do I get samba sharing working? UPDATE: Before this box I had another box with the same version of fedora installed (16) and samba working for these same computers. I started up the old machine and copied the smb.conf file from the old machine to the new one (editing the share definitions for the new shares of course) and I still get the same errors on both client machines. The only difference in environment is the hardware and the router. On the old machine the router received a dynamic public IP and assigned dynamic private IPs to each device on the network while the new machine is connected to a router that has a static public IP (still dynamic internal IPs though.) Could either one of these be affecting Samba? UPDATE 2: As the directory I am trying to share is actually an entire internal disk, I have tried to things: 1.) changing the owner of the mounted disk from root to my user (which is the same username as on the Windows machine) 2.) made a share that only included one of the folders on the disk instead of the entire disk with my user again as the owner. Both tests failed giving me the same errors regarding the network address. UPDATE 3: Not sure exactly what I did, but now whenever I try to connect to the share on the Windows 7 client I am prompted for my username and password. When I enter the correct credentials I get an access denied message. However I did notice that under the login box "domain: WINDOWS-PC-NAME" is listed. I believe this could very well be the problem. Any suggestions? UPDATE 4: So I've completely reinstalled Fedora and Samba now. I've created a share on the first harddrive (one fedora is installed on) and I can access that fine from Windows. However when I try to share any data on the second disk, I am receiving the same error. This I believe is the problem. I think I need to change some things in fstab or fdisk or something. UPDATE 5: So in fstab I mapped the drive to automount in a folder which works correctly. I also added the samba_share_t SElinux label to the mountpoint directory which now allows me to access the shares on the Windows machine, however I cannot see any of the files in the directory on the windows machine. (They are there, I can see them in the fedora file browser locally) UPDATE 6: Figured it out. See answer below

Read the article
Windows 7 Pro sysprep not working

- by Callum D

Hello, I'm trying to sysprep a Windows 7 Professional machine, prior to grabbing an image for mass deployment on identical hardware, and am having a hard time getting sysprep to work (at all). I've created an XML answer file with WSIM, and have a basic setupcomplete.cmd file, but none of the configurations in the answer file seem to be applied. I've read technet articles and googled, and I still have no idea why this is happening. Is someone able to have a look at the answer file I've attached and let me know where I'm going wrong? thanks, Callum AutoUnattend.XML <?xml version="1.0" encoding="utf-8"?> <unattend xmlns="urn:schemas-microsoft-com:unattend"> <settings pass="specialize"> <component name="Microsoft-Windows-Shell-Setup" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <AutoLogon> <Password> <Value>**********************************</Value> <PlainText>false</PlainText> </Password> <Username>administrator</Username> <LogonCount>1</LogonCount> <Enabled>true</Enabled> </AutoLogon> <WindowsFeatures> <ShowMediaCenter>false</ShowMediaCenter> <ShowWindowsMediaPlayer>false</ShowWindowsMediaPlayer> </WindowsFeatures> <CopyProfile>true</CopyProfile> <DoNotCleanTaskBar>true</DoNotCleanTaskBar> <RegisteredOrganization>SomeCompany (UK) Ltd.</RegisteredOrganization> <RegisteredOwner>SomeCompany User</RegisteredOwner> <ShowWindowsLive>false</ShowWindowsLive> <TimeZone>GMT Standard Time</TimeZone> </component> <component name="Security-Malware-Windows-Defender" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <DisableAntiSpyware>true</DisableAntiSpyware> </component> </settings> <settings pass="oobeSystem"> <component name="Microsoft-Windows-International-Core" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <SystemLocale>en-UK</SystemLocale> <UserLocale>en-UK</UserLocale> <UILanguage>en-US</UILanguage> <InputLocale>0809:00000809</InputLocale> </component> <component name="Microsoft-Windows-Shell-Setup" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <OOBE> <HideEULAPage>true</HideEULAPage> <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE> <NetworkLocation>Work</NetworkLocation> <ProtectYourPC>1</ProtectYourPC> </OOBE> <UserAccounts> <AdministratorPassword> <Value>*************************************************=</Value> <PlainText>false</PlainText> </AdministratorPassword> </UserAccounts> </component> <component name="Microsoft-Windows-Deployment" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <Reseal> <Mode>OOBE</Mode> </Reseal> </component> </settings> <settings pass="generalize"> <component name="Microsoft-Windows-Security-SPP" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <SkipRearm>0</SkipRearm> </component> </settings> <settings pass="windowsPE"> <component name="Microsoft-Windows-Setup" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <UseConfigurationSet>true</UseConfigurationSet> </component> </settings> <cpi:offlineImage cpi:source="wim:c:/wim/install.wim#Windows 7 PROFESSIONAL" xmlns:cpi="urn:schemas-microsoft-com:cpi" /> </unattend>

Read the article
How do I make Linux recognize a new SATA /dev/sda drive I hot swapped in without rebooting?

- by Philip Durbin

Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized: [root@fs-2 ~]# tail -18 /var/log/messages May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake } May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset May 5 16:54:45 fs-2 kernel: ata1: soft resetting link May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:54:55 fs-2 kernel: ata1: soft resetting link May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:05 fs-2 kernel: ata1: soft resetting link May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps May 5 16:55:40 fs-2 kernel: ata1: soft resetting link May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up May 5 16:55:45 fs-2 kernel: ata1: EH complete I tried a couple things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh but they didn't work: [root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan -bash: echo: write error: Invalid argument [root@fs-2 ~]# [root@fs-2 ~]# /root/rescan-scsi-bus.sh -l [snip] 0 new device(s) found. 0 device(s) removed. [root@fs-2 ~]# [root@fs-2 ~]# ls /dev/sda ls: /dev/sda: No such file or directory I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting? The operating system in question is RHEL5.3: [root@fs-2 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.3 (Tikanga) The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS. Here is the lscpi output: [root@fs-2 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers to look more into the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding what exactly isn't supported hardware-wise in terms of hot swap for that system. Please let me know what other output besides lspci might be useful. The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output: [root@www-1 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)

Read the article
Linux networking crash: best steps to find out the cause?

- by Aron Rotteveel

One of our Linux (CentOS) servers was unreachable last night. The server was not reachable in any way except for the remote console. After logging in with the remote console, it turned out I could not ping any outside hosts either. A simple service network restart solved the issue, but I am still wondering what could have caused this. My log files seem to indicate no error at all (except for the various daemons that need a network connection and failed after the network failure). Are there any additional steps I can take to find out the cause of this problem? EDIT: this just happened again. The server was completely unresponsive until I issued a networking service restart. Any advise is welcome. Could this be caused by a faulty hardware component? As per Madhatters request, here are some excerpts from the log at the time (the network crashed at 20:13): /var/log/messages: Dec 2 20:01:05 graviton kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=<stripped> SRC=<stripped> DST=<stripped> LEN=40 TOS=0x00 PREC=0x00 TTL=101 ID=256 PROTO=TCP SPT=6000 DPT=3306 WINDOW=16384 RES=0x00 SYN URGP=0 Dec 2 20:01:05 graviton kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=<stripped> SRC=<stripped> DST=<stripped> LEN=40 TOS=0x00 PREC=0x00 TTL=100 ID=256 PROTO=TCP SPT=6000 DPT=3306 WINDOW=16384 RES=0x00 SYN URGP=0 Dec 2 20:01:05 graviton kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=<stripped> SRC=<stripped> DST=<stripped> LEN=40 TOS=0x00 PREC=0x00 TTL=101 ID=256 PROTO=TCP SPT=6000 DPT=3306 WINDOW=16384 RES=0x00 SYN URGP=0 Dec 2 20:13:34 graviton junglediskserver: Connection to gateway failed: xGatewayTransport - Connection to gateway failed. The first three messages are simple responses to iptables rules I have set up through the LFD firewall. The last message indicates that JungleDisk, which I use for backups can no longer connect to the gateway. Apart from this, there are no interesting messages around this time. EDIT 4 dec: as per Mattdm's request, here is the output of ethtool eth0: (Please not that these are the settings that currently work. If things go wrong again, I will be sure to post this again if necessary. Settings for eth0: Supported ports: [ TP ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Advertised auto-negotiation: Yes Speed: 1000Mb/s Duplex: Full Port: Twisted Pair PHYAD: 1 Transceiver: internal Auto-negotiation: on Supports Wake-on: g Wake-on: d Link detected: yes As per Joris' request, here is also the output of route -n: aron@graviton [~]# route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface xx.xx.xx.58 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.42 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.43 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.41 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.46 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.47 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.44 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.45 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.50 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.51 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.48 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.49 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.54 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.52 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.53 0.0.0.0 255.255.255.255 UH 0 0 0 eth0 xx.xx.xx.0 0.0.0.0 255.255.255.192 U 0 0 0 eth0 xx.xx.xx.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 0.0.0.0 xx.xx.xx.62 0.0.0.0 UG 0 0 0 eth0 The bottom xx.62 is my gateway. EDIT december 28th: the problem occurred again and I got the chance to compare some of the outputs of the above tests. What I found out is that arp -an returns an incomplete MAC address for my gateway (which is not under my control; the server is in a shared rack): During failure: ? (xx.xx.xx.62) at <incomplete> on eth0 After service network restart: ? (xx.xx.xx.62) at 00:00:0C:9F:F0:30 [ether] on eth0 Is this something I can fix or is it time for me to contact the data centre?

Read the article
Trouble recovering MySQL InnoDB database after server crash

- by Andy Shinn

I had a server crash due to a broken iSCSI link (the filesystem went into read-only mode). After repairing the link and rebooting the machine (CentOS 5 / MySQL 5.1), the MySQL server would not start and gave the following error: 100603 19:11:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 100603 19:11:46 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... InnoDB: Warning: database page corruption or a failed InnoDB: file read of page 112541. InnoDB: Trying to recover it from the doublewrite buffer. InnoDB: Dump of the page: 100603 19:11:46 InnoDB: Page dump in ascii and hex (16384 bytes): lots of binary data 100603 19:11:46 InnoDB: Page checksum 953720272, prior-to-4.0.14-form checksum 2641912043 InnoDB: stored checksum 617821918, prior-to-4.0.14-form stored checksum 2080617765 InnoDB: Page lsn 115 2632899642, low 4 bytes of lsn at page end 2641594600 InnoDB: Page number (if stored to page already) 112541, InnoDB: space id (if created with = MySQL-4.1.1 and stored already) 0 InnoDB: Page may be an index page where index id is 0 616 InnoDB: Dump of corresponding page in doublewrite buffer: 100603 19:11:46 InnoDB: Page dump in ascii and hex (16384 bytes): more binary data 100603 19:11:46 InnoDB: Page checksum 908374788, prior-to-4.0.14-form checksum 824841363 InnoDB: stored checksum 912869634, prior-to-4.0.14-form stored checksum 2210927931 InnoDB: Page lsn 115 2635312169, low 4 bytes of lsn at page end 2633173354 InnoDB: Page number (if stored to page already) 112541, InnoDB: space id (if created with = MySQL-4.1.1 and stored already) 0 InnoDB: Page may be an index page where index id is 0 616 InnoDB: Also the page in the doublewrite buffer is corrupt. InnoDB: Cannot continue operation. InnoDB: You can try to recover the database with the my.cnf InnoDB: option: InnoDB: innodb_force_recovery=6 100603 19:11:46 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended Per the error message, I have tried setting set-variable=innodb_force_recovery=6 in the my.cnf to get to the data. This allows the MySQL server to start. But when I try to do a mysqldump of the database or a SELECT * INTO OUTFILE "filename" FROM broken_table; it seems to endlessly just export the same line over and over again. I have also tried http://code.google.com/p/innodb-tools/. But this tool fails with an error that 'blob' type is not supported. If I try to access the data using the PHP application it crashes MySQL: `100603 19:19:19 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8384512 read_buffer_size=131072 max_used_connections=2 max_threads=151 threads_connected=2 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338317 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x15d33f0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x453aff00 thread_stack 0x40000 /usr/libexec/mysqld(my_print_stacktrace+0x24) [0x874364] /usr/libexec/mysqld(handle_segfault+0x346) [0x5c9166] /lib64/libpthread.so.0 [0x3a6e40eb10] /usr/libexec/mysqld(rec_get_offsets_func+0x30) [0x7cc310] /usr/libexec/mysqld [0x7674d8] /usr/libexec/mysqld(btr_search_info_update_slow+0x638) [0x768d48] /usr/libexec/mysqld(btr_cur_search_to_nth_level+0xc7d) [0x75f86d] /usr/libexec/mysqld [0x7dd1c1] /usr/libexec/mysqld(row_search_for_mysql+0x18b0) [0x7e03d0] /usr/libexec/mysqld(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int)+0x7c) [0x7526fc] /usr/libexec/mysqld(handler::read_multi_range_next(st_key_multi_range**)+0x29) [0x6aed09] /usr/libexec/mysqld(QUICK_RANGE_SELECT::get_next()+0x194) [0x69a964] /usr/libexec/mysqld [0x6aafe9] /usr/libexec/mysqld(sub_select(JOIN*, st_join_table*, bool)+0x56) [0x635196] /usr/libexec/mysqld [0x63f9cd] /usr/libexec/mysqld(JOIN::exec()+0x950) [0x6497c0] /usr/libexec/mysqld(mysql_select(THD*, Item*, TABLE_LIST*, unsigned int, List&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsign ed long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x17b) [0x64b34b] /usr/libexec/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x169) [0x64bc79] /usr/libexec/mysqld [0x5d34b6] /usr/libexec/mysqld(mysql_execute_command(THD*)+0x4e5) [0x5d6b45] /usr/libexec/mysqld(mysql_parse(THD*, char const*, unsigned int, char const)+0x211) [0x5dc321] /usr/libexec/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x10b8) [0x5dd3f8] /usr/libexec/mysqld(do_command(THD*)+0xe6) [0x5dd9e6] /usr/libexec/mysqld(handle_one_connection+0x73d) [0x5d036d] /lib64/libpthread.so.0 [0x3a6e40673d] /lib64/libc.so.6(clone+0x6d) [0x3a6d4d3d1d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd-query at 0x15de5e0 = SELECT DISTINCT count(DISTINCT i.itemid) as rowscount,i.hostid FROM items i WHERE ((i.itemid BETWEEN 000000000000000 AND 0999999 99999999)) AND i.type<9 AND (i.hostid IN (10017,10047,10050,10054,10056,10059,10062,10063,10064,10065,10066,10067,10068,10069,10070,10071,10072,10073,100 74,10075,10076,10077,10078,10079,10080,10081,10082,10084,10088,10089,10090,10091,10092,10093,10094,10095,10096,10097,10098,10099,10100,10101,10102,10103,10 104,10105,10106,10107,10108,10109)) GROUP BY i.hostid thd-thread_id=3 thd-killed=NOT_KILLED The manual page at contains information that should help you find out what is causing the crash. 100603 19:19:19 mysqld_safe Number of processes running now: 0 100603 19:19:19 mysqld_safe mysqld restarted` Before recovering form an older backup as a last resort I am looking for anymore suggestions. Thanks!

Read the article
GRUB-2 Bootloader fails to load for lack of floppy drive. Ubuntu 10.4 & Windows XP

- by kammer

2010.07.21 while trying to install Ubuntu 10.4 Hello all, I've been trying to install Ubuntu 10.04 on my Dell workstation and am unable to get the Grub-2 bootloader to load properly. It seems to be failing for lack of a floppy drive on the system resulting in an error message that reads : error: fd0 cannot get C/H/S values. I've gone through the Grub-2 page at https://help.ubuntu.com/community/Grub2 to no avail and other sources having similar problems have likewise turned up no solutions. I would certainly appreciate any insight, here's the background: A while back I was trying to install a different version of Linux and had the same problems, then had to set the project aside for a bit. I don't think this has anything to do with Linux or Ubuntu per se, but rather Grub. The system is an old (4-5 years) Dell workstation that has one drive (128 GB) set up for Windows XP and a second new drive (500GB) which I installed for Linux. There is a DVD/CD drive and the system contains no floppy drive at all. In one attempt to get this working I tried modifying the BIOS to indicate there was a floppy drive - this created a failure earlier in the chain with the BIOS failing to load properly, not unexpected, just a shot in the dark at that point. At the moment I am considering just running out to buy and install a cheap floppy drive to see if that helps. I'll never use the thing though so I'd rather find a solution that doesn't require me to spend money on useless hardware. In any case, here's the /boot/grub/grub.cfg contents: # # DO NOT EDIT THIS FILE # # It is automatically generated by /usr/sbin/grub-mkconfig using templates # from /etc/grub.d and settings from /etc/default/grub # ### BEGIN /etc/grub.d/00_header ### if [ -s $prefix/grubenv ]; then load_env fi set default="0" if [ ${prev_saved_entry} ]; then set saved_entry=${prev_saved_entry} save_env saved_entry set prev_saved_entry= save_env prev_saved_entry set boot_once=true fi function savedefault { if [ -z ${boot_once} ]; then saved_entry=${chosen} save_env saved_entry fi } function recordfail { set recordfail=1 if [ -n ${have_grubenv} ]; then if [ -z ${boot_once} ]; then save_env recordfail; fi; fi } insmod ext2 set root='(hd1,1)' search --no-floppy --fs-uuid --set fbebde47-f488-41b0-9480-337802ecb988 if loadfont /usr/share/grub/unicode.pf2 ; then set gfxmode=640x480 insmod gfxterm insmod vbe if terminal_output gfxterm ; then true ; else # For backward compatibility with versions of terminal.mod that don't # understand terminal_output terminal gfxterm fi fi insmod ext2 set root='(hd1,1)' search --no-floppy --fs-uuid --set fbebde47-f488-41b0-9480-337802ecb988 set locale_dir=($root)/boot/grub/locale set lang=en insmod gettext if [ ${recordfail} = 1 ]; then set timeout=-1 else set timeout=10 fi insmod play play 480 440 1 ### END /etc/grub.d/00_header ### ### BEGIN /etc/grub.d/05_debian_theme ### set menu_color_normal=white/black set menu_color_highlight=black/light-gray ### END /etc/grub.d/05_debian_theme ### ### BEGIN /etc/grub.d/10_linux ### menuentry 'Ubuntu, with Linux 2.6.32-21-generic' --class ubuntu --class gnu-linux --class gnu --class os { recordfail insmod ext2 set root='(hd1,1)' search --no-floppy --fs-uuid --set fbebde47-f488-41b0-9480-337802ecb988 linux /boot/vmlinuz-2.6.32-21-generic root=UUID=fbebde47-f488-41b0-9480-337802ecb988 ro quiet splash initrd /boot/initrd.img-2.6.32-21-generic } menuentry 'Ubuntu, with Linux 2.6.32-21-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os { recordfail insmod ext2 set root='(hd1,1)' search --no-floppy --fs-uuid --set fbebde47-f488-41b0-9480-337802ecb988 echo 'Loading Linux 2.6.32-21-generic ...' linux /boot/vmlinuz-2.6.32-21-generic root=UUID=fbebde47-f488-41b0-9480-337802ecb988 ro single echo 'Loading initial ramdisk ...' initrd /boot/initrd.img-2.6.32-21-generic } ### END /etc/grub.d/10_linux ### ### BEGIN /etc/grub.d/20_memtest86+ ### menuentry "Memory test (memtest86+)" { insmod ext2 set root='(hd1,1)' search --no-floppy --fs-uuid --set fbebde47-f488-41b0-9480-337802ecb988 linux16 /boot/memtest86+.bin } menuentry "Memory test (memtest86+, serial console 115200)" { insmod ext2 set root='(hd1,1)' search --no-floppy --fs-uuid --set fbebde47-f488-41b0-9480-337802ecb988 linux16 /boot/memtest86+.bin console=ttyS0,115200n8 } ### END /etc/grub.d/20_memtest86+ ### ### BEGIN /etc/grub.d/30_os-prober ### menuentry "Microsoft Windows XP Home Edition (on /dev/sda1)" { insmod ntfs set root='(hd0,1)' search --no-floppy --fs-uuid --set 6ef0d4b4f0d4842d drivemap -s (hd0) ${root} chainloader +1 } ### END /etc/grub.d/30_os-prober ### ### BEGIN /etc/grub.d/40_custom ### # This file provides an easy way to add custom menu entries. Simply type the # menu entries you want to add after this comment. Be careful not to change # the 'exec tail' line above. ### END /etc/grub.d/40_custom ### Thoughts anyone? Thanks in advance.

Read the article
Spotlight Infinite Indexing issue (external data drive)

- by Manca Weeks

This is an external drive, formerly a boot drive which is now in use only to access music files (sibelius, audio, midi, live, logic etc.) without transferring the data into a new boot system, partly because of the issue I am about to describe, but mostly because the majority of the data is mainly there for archival purposes. The user is a composer and prominent musician and needs to be able to rehash the data at will. I have tried several things - here is a list: - make complete filesystem clone with antonio diaz's ddrescue - run Disk Warrior on copy, repair whatever errors occurred - wipe out all ACLs on entire drive - set all permissions to the same value - wide open 777 - remove any system data (applications, system files, including hidden files to the best of my knowledge) by selecting only non-system/app data and using Carbon Copy Cloner to put only the data of interest onto a newly formatted drive - transfer data to newly formatted drive folder by folder, resetting the spotlight index in between adding each to observe for issues (interesting here is that no issues occurred except for in Documents folder - when I transferred only the Documents folder to a newly formatted drive on its own - no trouble. It appears almost as thought it may not be the content but the quantity or specific combination of data that results in problems) - use DataRescue to transfer the data to yet another newly formatted drive to expose any missed hidden files Between each of the above steps I stopped Spotlight (search for anything beginning with md in Activity Monitor - All Processes and quitting it), deleted the .Spotlight-V100 directory from the affected drive. Restart Splotlight indexing by adding drive to Spotlight privacy list and removing it. In each case the same issue occurs - Spotlight begins indexing normally (or so it seems), then the index estimated time increases, usually to 4 hours remaining. This is where it gets stuck and continues to predict 4 hours remaining but never finishes. Sometimes I can't eject the drive and have to quit the md.. processes from Activity Monitor to be able to eject the drive without Force Eject. Once I disconnect the drive after the 4 hours remaining situation - if I reattach it, Spotlight forever estimates remaining time and never gets going again. So there it is. It is apparently not a filesystem issue, not a permissions issue and not tied to any particular piece of hardware or protocol (used USB and FW drives). I have tried this on several machines (3 to be precise) and in 10.5.8 and 10.6.5. Simply disabling Spotlight on this volume is not an option because the owner has no clue where things are as the data on the volume dates back to music projects and compositions from 2003 and before. He needs to be able to query for results. Anyone got any ideas? ---update 2-6-11 Since I have not received any responses except the one below which appears to misunderstand my point, I am updating this post hoping to get more responses. I have used the terminal command sudo opensnoop -p PID where PID is the mdworker process ID to try and determine what Spotlight is doing and hopefully find the files it's having trouble with. Here's what happens: After indexing for a few hours, mdworker is gone. It no longer shows up in Activity Monitor under "All Processes" and the Terminal window with the opensnoop result stops moving. I then proceeded to execute the same command on mds to see what it was doing and here's what I get, repeatedly: 501 57 mds 21 / 501 57 mds 21 /Volumes/Sno Leppard 501 57 mds 21 /Volumes/Tiger 501 57 mds 21 /Volumes/Leppard 501 57 mds 21 /Volumes/Disk Warrior 501 57 mds 21 /Volumes/ONM Data These represent all the volumes currently mounted in the system. All except ONM Data, which is the one I am trying to index, are excluded from SPotlight indexing at the moment. The sequence above repeats over and over, with slight variation, sometimes skipping one of the volumes. Questions - what happened to mdworker? What is mds doing? I will let this run until tomorrow morning and throughout the day and monitor for any changes. Any input would be very much appreciated. Even if you're not sure what the ultimate answer is, please alert me to anything you think I may be missing. Hopefully at some point we will figure this out... Thanks, M __final edit__ I finally resolved the issue and here is how I did it. I used the terminal command "sudo opensnoop -p PID" where the PID is the process id of the processes I was monitoring. I was looking at all instances of mds and mdworker running in the system. After the third time through indexing the same data set (see info above), I contacted Apple and got to their highest level of support - they were flabbergasted as well. They advised me to install yet another default 10.6.6 system and try again. The same pattern repeated - mds and mdworker(s) would start indexing and eventually the spotlight icon would say 6 hours remaining and all mdworkers were gone, mds at 90% or so of CPU. But I did finally figure out that the first time mdworker stopped like that, the last file it touched was always in the same folder. I excluded that folder from spotlight search and the rest of the data set indexed within about 2 hours with no strange behavior or failures. I copied that folder to another machine and Spotlight barfed immediately. Exclude that folder and all is well again. I have no clue what is causing this behavior, still, but I did find a functional solution to the problem. Anyone with a similar problem - run opensnoop on all instances of mds and mdworker and wait patiently for wdworker to exit. Look at the last file it touched and exclude the enclosing folder from being indexed. I was able to repeat the issue and solution on 2 different installs and 2 different copies of the data set. Hope this helps. If we find an actual cause of the folder being such a problem (it is called MICHAEL BRECKER RECORD SOLOS and contains almost 1 GB of audio related files - performer, live, SD2 - things like that), I will edit again to let you all know. Thanks for ay attempts to help, M

Read the article
Unicast traffic between hosts on a switch leaving the switch by its uplink. Why?

- by Rich Lafferty

I have a weird thing happening on our network at my office which I can't quite get my head around. In particular I can't tell if it's a problem with a switch, or a problem with configuration. We have a Cisco SG300-52 switch (sw01) in the top of a rack in our server room, connected to another SG300-28 that acts as our core switch (core01). Both run layer 2 only, our firewalls do routing between VLANs. They have a dozen or so VLANs between them. Gi1 on sw01 is a trunk port connected to gi1 on core01. (Disclosure: There are other switches in our environment but I'm pretty sure I've isolated the problem down to these two. Happy to provide more info if necessary.) The behaviour I'm seeing is limited to one VLAN, vlan 12 -- or, at least, it's not happening on the other ones I checked (It's hard to guarantee the absence of packets), and it is: sw01 is forwarding, to core01, traffic which is between two hosts which are both plugged into sw01. (I noticed this because the IDS in our firewall gave a false positive on traffic which should not reach the firewall.) We noticed this mostly between our two dhcp/dns servers, net01 (10.12.0.10) and net02 (10.12.0.11). net01 is physical hardware and net02 is on a VMware ESX server. net01 is connected to gi44 on sw01 and net02's ESX server to gi11. [net01]----gi44-[sw01]-gi1----gi1-[core01] [net02]----gi11/ Let's see some interfaces! Remember, vlan 12 is the problem vlan. Of the others I explicitly verified that vlan 27 was not affected. Here's the two hosts' ports: esx01 contains net02. sw01#sh run int gi11 interface gigabitethernet11 description esx01 lldp med disable switchport trunk allowed vlan add 5-7,11-13,100 switchport trunk native vlan 27 ! sw01#sh run int gi44 interface gigabitethernet44 description net01-1 lldp med disable switchport mode access switchport access vlan 12 ! Here's the trunk on sw01. sw01#sh run int gi1 interface gigabitethernet1 description "trunk to core01" lldp med disable switchport trunk allowed vlan add 4-7,11-13,27,100 ! And the other end of the trunk on core01. interface gigabitethernet1 description sw01 macro description switch switchport trunk allowed vlan add 2-7,11-16,27,100 ! I have a monitor port on core01, thus: core01#sh run int gi12 interface gigabitethernet12 description "monitor port" port monitor GigabitEthernet 1 ! And the monitor port on core01 sees unicast traffic going between net01 and net02, both of which are on sw01! I've verified this with a monitor port on sw01 that sees the net01-net02 unicast traffic leaving via gi1 too. sw01 knows that both of those hosts are on ports that are not its trunk port: :) ratchet$ arp -a | grep net net02.2ndsiteinc.com (10.12.0.11) at 00:0C:29:1A:66:15 [ether] on eth0 net01.2ndsiteinc.com (10.12.0.10) at 00:11:43:D8:9F:94 [ether] on eth0 sw01#sh mac addr addr 00:0C:29:1A:66:15 Aging time is 300 sec Vlan Mac Address Port Type -------- --------------------- ---------- ---------- 12 00:0c:29:1a:66:15 gi11 dynamic sw01#sh mac addr addr 00:11:43:D8:9F:94 Aging time is 300 sec Vlan Mac Address Port Type -------- --------------------- ---------- ---------- 12 00:11:43:d8:9f:94 gi44 dynamic I also brought up an unused port on sw01 on vlan 12, but the unicast traffic was (as best as I could tell) not coming out that port. So it doesn't look like sw01 is pushing it out all its ports, just the right ports and also gi1! I've verified that sw01 is not filling up its address-table: sw01#sh mac addr count This may take some time. Capacity : 8192 Free : 7983 Used : 208 The full configs for both core01 and sw01 are available: core01, sw01. Finally, versions: sw01#sh ver SW version 1.1.2.0 ( date 12-Nov-2011 time 23:34:26 ) Boot version 1.0.0.4 ( date 08-Apr-2010 time 16:37:57 ) HW version V01 core01#sh ver SW version 1.1.2.0 ( date 12-Nov-2011 time 23:34:26 ) Boot version 1.1.0.6 ( date 11-May-2011 time 18:31:00 ) HW version V01 So my understanding is this: sw01 should take unicast traffic for net01 and send it only out net02's port, and vice versa; none of it should go out sw01's uplink. But core01, receiving traffic on gi1 for a host it knows is on gi1, is right in sending it out all of its ports. (That is: sw01 is misbehaving, but core01 is doing what it should given the circumstances.) My question is: Why is sw01 sending that unicast traffic out its uplink, gi1? (And pre-emptively: yes, I know SG300s leave much to be desired, and yes, we should have spanning-tree enabled, but that's where I'm at right now.)

Read the article
Windows 7 Machine Makes Router Drop -All- Wireless Connections [closed]

- by Hammer Bro.

Note: I accidentally originally posted this question over at SuperUser, and I still think the issue is caused by some low-level networking practice of Windows 7, but I think the expertise here would be more apt to figuring it out. Apologies for the cross-post. Some background: My home network consists of my Desktop, a two-month old Windows 7 (x64) machine which is online most frequently (N-spec), as well as three other Windows XP laptops (all G) that only connect every now and then (one for work, one for Netflix, and the other for infrequent regular laptop uses). I used to have a Belkin F5D8236-4 wireless router, and everything worked great. A week ago, however, I found out that the Belkin absolutely in no way would establish a VPN connection, something that has become important for work. So I bought a Netgear WNR3500v2/U/L. The wireless was acting a little sketchy at first for just the Windows 7 machine, but I thought it had something to do with 802.11N and I was in a hurry so I just fished up an ethernet cable and disabled the computer's wireless. It has now become apparent, though, that whenever the Windows 7 machine is connected to the router, all wireless connections become unstable. I was using my work laptop for a solid six hours today with no trouble, having multiple SSH connections open over VPN and streaming internet radio in the background. Then, within two minutes of turning on this Windows 7 box, I had lost all connectivity over the wireless. And I was two feet away from the router. The same sort of thing happens on all of the other laptops -- Netflix can be playing stuff all weekend, but if I come up here and do things on this (W7) computer, the streaming will be dead within ten minutes. So here are my basic observations: If the Windows 7 machine is off, then all connections will have a Signal Strength of Very Good or Excellent and a Speed of 48-54 Mbps for an indefinite amount of time. Shortly after the Windows 7 machine is turned on, all wireless connections will experience a consistent decline in Speed down to 1.0 Mbps, eventually losing their connection entirely. These machines will continue to maintain 70% signal strength, as observed by themselves and router. Once dropped, a wireless connection will have difficulty reconnecting. And, if a connection manages to become established, it will quickly drop off again. The Windows 7 machine itself will continue to function just fine if it's using a wired connection, although it will experience these same issues over the wireless. All of the drivers and firmwares are up to date, and this happened both with the stock Netgear firmware as well as the (current) DD-WRT. What I've tried: Making sure each computer is being assigned a distinct IP. (They are.) Disabling UPnP and Stateful Packet Inspection on the router. Disabling Network Sharing, SSDP Discovery, TCP/IP NetBios Helper and Computer Browser services on the Windows 7 machine. Disabling QoS Packet Scheduler, IPv6, and Link Layer Topology Discovery options on my ethernet controller (leaving only Client for Microsoft Networks, File and Printer Sharing, and IPv4 enabled). What I think: It seems awfully similar to the problems discussed in detail at http://social.msdn.microsoft.com/Forums/en/wsk/thread/1064e397-9d9b-4ae2-bc8e-c8798e591915 (which was both the most relevant and concrete information I could dig up on the internet). I still think that something the Windows 7 IP stack (or just Operating System itself) is doing is giving the router fits. However, I could be wrong, because I have two key differences. One is that most instances of this problem are reported as the entire router dying or restarting, and mine still works just fine over the wired connection. The other is that it's a new router, tested with both the factory firmware and the (I assume) well-maintained DD-WRT project. Even if Windows 7 is still secretly sending IPv6 packets or the TCP Window Scaling implementation that I hear Vista caused some trouble with (even though I've tried my best to disable anything fancy), this router should support those functions. I don't want to get a new or a replacement router unless someone can convince me that this is a defective unit. But the problem seems too specific and predictable by my instincts to be a hardware hiccup. And I don't want to deal with the inevitable problems that always seem to take half a day to resolve when getting a new router, since I'm frantically working (including tomorrow) to complete a project by next week's deadline. Plus, I think in the worst case scenario, I could keep this router connected directly to the modem, disable its wireless entirely, and connect the old Belkin to it directly. That should allow me to still use VPN (although I'll have to plug my work laptop directly into that router), and then maintain wireless connections for all of the other computers. But that feels so wrong to me. Anyone have any ideas what the cause and possible solution could be? Clarifications: The Windows 7 machine is directly connected via an ethernet cable to the router for everything above. But while it is online, all other computers' wireless connections become unusable. It is not an issue of signal strength or interference -- no other devices within scanning range are using Channel 1, and the problem will affect computers that are literally feet away from the router with 95% signal strength.

Read the article
How do I make Linux recognize a new SATA /dev/sda drive I hot swapped in without rebooting?

- by Philip Durbin

Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized: [root@fs-2 ~]# tail -18 /var/log/messages May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake } May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset May 5 16:54:45 fs-2 kernel: ata1: soft resetting link May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:54:55 fs-2 kernel: ata1: soft resetting link May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:05 fs-2 kernel: ata1: soft resetting link May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps May 5 16:55:40 fs-2 kernel: ata1: soft resetting link May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up May 5 16:55:45 fs-2 kernel: ata1: EH complete I tried a couple things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh but they didn't work: [root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan -bash: echo: write error: Invalid argument [root@fs-2 ~]# [root@fs-2 ~]# /root/rescan-scsi-bus.sh -l [snip] 0 new device(s) found. 0 device(s) removed. [root@fs-2 ~]# [root@fs-2 ~]# ls /dev/sda ls: /dev/sda: No such file or directory I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting? The operating system in question is RHEL5.3: [root@fs-2 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.3 (Tikanga) The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS. Here is the lscpi output: [root@fs-2 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers to look more into the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding what exactly isn't supported hardware-wise in terms of hot swap for that system. Please let me know what other output besides lspci might be useful. The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output: [root@www-1 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)

Read the article
Ubuntu, No wireless networks found after correctly installed madwifi

- by Peter

Hi, I just installed madwifi on my MSI laptop with an Atheros AR5001 wifi card & Lucid. As far as I can see and according to System - Administration - Hardware drivers the install was successful and the card + driver is up and running. However, I don't see any wireless network (my windows PC can see about 5 wireless networks). I tried it with the network manager applet as well as with wicd. If I try to connect to "Hidden Wireless Network" via nm-applet, it will start to connect for a while but is unable too (although I supply it with the correct WEP settings & key) So, I'm unable to use my wireless network. What am i doing wrong? Some information about my system: iwconfig lo no wireless extensions. eth0 no wireless extensions. wifi0 no wireless extensions. ath0 IEEE 802.11g ESSID:"" Mode:Managed Frequency:2.437 GHz Access Point: Not-Associated Bit Rate:0 kb/s Tx-Power:17 dBm Sensitivity=1/1 Retry:off RTS thr:off Fragment thr:off Power Management:off Link Quality=0/70 Signal level=-96 dBm Noise level=-96 dBm Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0 Tx excessive retries:0 Invalid misc:0 Missed beacon:0 pan0 no wireless extensions. ifconfig ath0 Link encap:Ethernet HWaddr 00:15:af:cf:e2:ca inet6 addr: fe80::215:afff:fecf:e2ca/64 Scope:Link UP BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) eth0 Link encap:Ethernet HWaddr 00:21:85:4d:82:78 inet addr:192.168.2.101 Bcast:192.168.2.255 Mask:255.255.255.0 inet6 addr: fe80::221:85ff:fe4d:8278/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:3800 errors:0 dropped:0 overruns:0 frame:0 TX packets:2944 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3940261 (3.9 MB) TX bytes:525218 (525.2 KB) Interrupt:27 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:12 errors:0 dropped:0 overruns:0 frame:0 TX packets:12 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:720 (720.0 B) TX bytes:720 (720.0 B) wifi0 Link encap:UNSPEC HWaddr 00-15-AF-CF-E2-CA-00-00-00-00-00-00-00-00-00-00 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:3497 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:280 RX bytes:0 (0.0 B) TX bytes:179947 (179.9 KB) Interrupt:16 lshw -C network *-network description: Wireless interface product: AR5001 Wireless Network Adapter vendor: Atheros Communications Inc. physical id: 0 bus info: pci@0000:02:00.0 logical name: wifi0 version: 01 serial: 00:15:af:cf:e2:ca width: 64 bits clock: 33MHz capabilities: pm msi pciexpress msix bus_master cap_list logical ethernet physical wireless configuration: broadcast=yes driver=ath_pci latency=0 multicast=yes wireless=IEEE 802.11g resources: irq:16 memory:fd7f0000-fd7fffff *-network description: Ethernet interface product: RTL8111/8168B PCI Express Gigabit Ethernet controller vendor: Realtek Semiconductor Co., Ltd. physical id: 0 bus info: pci@0000:05:00.0 logical name: eth0 version: 01 serial: 00:21:85:4d:82:78 size: 100MB/s capacity: 1GB/s width: 64 bits clock: 33MHz capabilities: pm vpd msi pciexpress bus_master cap_list rom ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=2.3LK-NAPI duplex=full ip=192.168.2.101 latency=0 link=yes multicast=yes port=MII speed=100MB/s resources: irq:27 ioport:c800(size=256) memory:fe2ff000-fe2fffff memory:fe2c0000-fe2dffff(prefetchable) lspci 00:00.0 Host bridge: ATI Technologies Inc RS690 Host Bridge 00:01.0 PCI bridge: ATI Technologies Inc RS690 PCI to PCI Bridge (Internal gfx) 00:04.0 PCI bridge: ATI Technologies Inc Device 7914 00:06.0 PCI bridge: ATI Technologies Inc RS690 PCI to PCI Bridge (PCI Express Port 2) 00:07.0 PCI bridge: ATI Technologies Inc RS690 PCI to PCI Bridge (PCI Express Port 3) 00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA 00:13.0 USB Controller: ATI Technologies Inc SB600 USB (OHCI0) 00:13.1 USB Controller: ATI Technologies Inc SB600 USB (OHCI1) 00:13.2 USB Controller: ATI Technologies Inc SB600 USB (OHCI2) 00:13.3 USB Controller: ATI Technologies Inc SB600 USB (OHCI3) 00:13.4 USB Controller: ATI Technologies Inc SB600 USB (OHCI4) 00:13.5 USB Controller: ATI Technologies Inc SB600 USB Controller (EHCI) 00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 14) 00:14.1 IDE interface: ATI Technologies Inc SB600 IDE 00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA) 00:14.3 ISA bridge: ATI Technologies Inc SB600 PCI to LPC Bridge 00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 01:05.0 VGA compatible controller: ATI Technologies Inc RS690M [Radeon X1200 Series] 01:05.2 Audio device: ATI Technologies Inc Radeon X1200 Series Audio Controller 02:00.0 Ethernet controller: Atheros Communications Inc. AR5001 Wireless Network Adapter (rev 01) 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01) 06:04.0 CardBus bridge: O2 Micro, Inc. OZ711MP1/MS1 MemoryCardBus Controller (rev 21) 06:04.2 SD Host controller: O2 Micro, Inc. Integrated MMC/SD Controller (rev 01) 06:04.3 Bridge: O2 Micro, Inc. Integrated MS/xD Controller (rev 01) 06:04.4 FireWire (IEEE 1394): O2 Micro, Inc. Firewire (IEEE 1394) (rev 02) less /proc/modules | grep ath ath_rate_sample 11476 1 - Live 0xf812b000 ath_pci 193197 0 - Live 0xf85c3000 wlan 222892 5 wlan_wep,wlan_scan_sta,ath_rate_sample,ath_pci, Live 0xf8537000 ath_hal 398604 3 ath_rate_sample,ath_pci, Live 0xf8480000 I've been at this for hours now, also tried ndiswrapper and ath5k drivers with no luck, and really could use some help. Cheers.

Read the article
Can't configure PAM + LDAP on Debian Lenny - Getting error=49 on server logs

- by Jorge Suárez de Lis

I've been migrating some servers and desktops using Ubuntu 10.04 from getting the users from an old OpenLDAP implementation to a newer Centos Active Directory. I haven't had any problems so far, until I reached a Debian Lenny server. I've set up the server as the others, setting /etc/ldap.conf and /etc/ldap/ldap.conf. However, when I issue "getent passwd", I get nothing from the LDAP server. Reading the pam_ldap manpage, I realized that /etc/ldap.conf was not an accepted file by pam_ldap -it worked with Ubuntu though-, so I renamed it to /etc/pam_ldap.conf. Same result. However, once I've changed the name of this file, when I login using SSH I get this on the LDAP server logs: [20/Jul/2012:11:19:40 +0200] conn=16501 fd=155 slot=155 connection from x.x.x.50 to 10.1.176.237 [20/Jul/2012:11:19:40 +0200] conn=16501 op=0 BIND dn="uid=ubuntu,ou=Applications,ou=CITIUS,dc=inv,dc=usc,dc=es" method=128 version=3 [20/Jul/2012:11:19:40 +0200] conn=16501 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=ubuntu,ou=applications,ou=citius,dc=inv,dc=usc,dc=es" [20/Jul/2012:11:19:40 +0200] conn=16501 op=1 SRCH base="ou=People,ou=CITIUS,dc=inv,dc=usc,dc=es" scope=2 filter="(uid=jorge.suarez)" attrs=ALL [20/Jul/2012:11:19:40 +0200] conn=16501 op=1 RESULT err=0 tag=101 nentries=1 etime=0 notes=U [20/Jul/2012:11:19:40 +0200] conn=16501 op=2 BIND dn="uid=jorge.suarez,ou=People,ou=CITIUS,dc=inv,dc=usc,dc=es" method=128 version=3 [20/Jul/2012:11:19:40 +0200] conn=16501 op=2 RESULT err=49 tag=97 nentries=0 etime=0 The password isn't working. I don't know that could be wrong, anything else seems to be OK. That user/password is working from another clients: [20/Jul/2012:11:29:39 +0200] conn=16528 fd=188 slot=188 connection from x.x.x.224 to 10.1.176.237 [20/Jul/2012:11:29:39 +0200] conn=16528 op=0 BIND dn="uid=ubuntu,ou=Applications,ou=CITIUS,dc=inv,dc=usc,dc=es" method=128 version=3 [20/Jul/2012:11:29:39 +0200] conn=16528 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=ubuntu,ou=applications,ou=citius,dc=inv,dc=usc,dc=es" [20/Jul/2012:11:29:39 +0200] conn=16528 op=1 SRCH base="ou=People,ou=CITIUS,dc=inv,dc=usc,dc=es" scope=2 filter="(uid=jorge.suarez)" attrs=ALL [20/Jul/2012:11:29:39 +0200] conn=16528 op=1 RESULT err=0 tag=101 nentries=1 etime=0 notes=U [20/Jul/2012:11:29:39 +0200] conn=16528 op=2 BIND dn="uid=jorge.suarez,ou=People,ou=CITIUS,dc=inv,dc=usc,dc=es" method=128 version=3 [20/Jul/2012:11:29:39 +0200] conn=16528 op=2 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=jorge.suarez,ou=people,ou=citius,dc=inv,dc=usc,dc=es" I'm using SSHA for storing passwords on the LDAP server. Maybe this is not supported by Debian Lenny? On pam_ldap.conf, I've set up this, as in all the other servers: # Do not hash the password at all; presume # the directory server will do it, if # necessary. This is the default. pam_password md5 Also tried clear, but it didn't work. Anyways, it's weird that issuing getent passwd still gets me no users. However, if I use pamtest from the package libpam-dotfile to test login, it works. # pamtest ssh jorge.suarez Trying to authenticate <jorge.suarez> for service <ssh>. Password: Authentication successful. # pamtest foo jorge.suarez Trying to authenticate <jorge.suarez> for service <foo>. Password: Authentication successful. But "su" won't work also: # su jorge.suarez Id. descoñecido: jorge.suarez Just the output from getent passwd : # getent passwd root:x:0:0:root:/root:/bin/bash daemon:x:1:1:daemon:/usr/sbin:/bin/sh bin:x:2:2:bin:/bin:/bin/sh sys:x:3:3:sys:/dev:/bin/sh sync:x:4:65534:sync:/bin:/bin/sync games:x:5:60:games:/usr/games:/bin/sh man:x:6:12:man:/var/cache/man:/bin/sh lp:x:7:7:lp:/var/spool/lpd:/bin/sh mail:x:8:8:mail:/var/mail:/bin/sh news:x:9:9:news:/var/spool/news:/bin/sh uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh proxy:x:13:13:proxy:/bin:/bin/sh www-data:x:33:33:www-data:/var/www:/bin/sh backup:x:34:34:backup:/var/backups:/bin/sh list:x:38:38:Mailing List Manager:/var/list:/bin/sh irc:x:39:39:ircd:/var/run/ircd:/bin/sh gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh nobody:x:65534:65534:nobody:/nonexistent:/bin/sh libuuid:x:100:101::/var/lib/libuuid:/bin/sh Debian-exim:x:101:103::/var/spool/exim4:/bin/false statd:x:102:65534::/var/lib/nfs:/bin/false sshd:x:104:65534::/var/run/sshd:/usr/sbin/nologin luser:x:1000:1000:Usuario local de Burdeos,,,:/home/luser:/bin/bash messagebus:x:105:107::/var/run/dbus:/bin/false sge-admin:x:1001:1001:Administrador do SGE,,,:/home/cluster/sge-admin:/bin/bash ntp:x:107:110::/home/ntp:/bin/false haldaemon:x:108:111:Hardware abstraction layer,,,:/var/run/hald:/bin/false vde2-net:x:109:114::/var/run/vde2:/bin/false uml-net:x:110:115::/home/uml-net:/bin/false polkituser:x:111:116:PolicyKit,,,:/var/run/PolicyKit:/bin/false Debian-pxe:x:113:65534:Dummy user for Debian pxe package,,,:/home/Debian-pxe:/bin/false Nscd was stopped from the beginning.

Read the article
Wireless internet connection connects but internet does not work (no packets received). Wired does.

- by Rodney

When I connect my PC via ethernet cable to my ADSL router it works fine. When I connect via Wireless it connects and the internet will work for a random amount of time and then stop working. It stays connected with a strong signal but no packets are received. My laptop/iphone are right next to it and wireless works fine. If I open the Wireless USB status, it says it is connected to my SSID with full strength (54 mps - I am 3 meteres away from my router) and the activty shows as Packets 594 SENT and 105 RECEIVED (this goes up VERY slowly) I have tried the following: Turned off anitvirus and firewall completely. Tested the wifi signal- I am writing this on my laptop which is next to my PC and also has full wifi strength. Tried a different wireless adapter - I dug out an old PCI wireless card - it does the exact same thing. Compared all wireless settings to my laptop. I can ping google.com and it replies (sometimes with packet loss) When I reboot the PC it will connect for a minute or two (random time) and then just stops again. I tried Firefox, IE etc. no joy I have updated all latest versions (Netgear WG111v2) and drivers Checked Event Log - nothing unusual Ping the router (and even connect as admin for the few minutes when the internet does work) Changed the MTU down to 1200 using DrTCP Checked Device Manager for conflicts - none. I ping the router from the PC (192.168.0.10 - 192.168.0.1) and it replies with 4 packets. BUT, on my router admin page (which I access via http on my laptop wirelessly) - if I ping 192.168.0.10 all packets timeout (pinging my laptop 192.168.0.12 works fine) My router admin page shows the leased IP address for 192.168.0.10 (ie it is definitely talking to the router initially) Now I am out of ideas - please help. I think it is an OS/Software issue as I have tried 2 different wireless adapaters (PCI and USB) with the same result but all other wireless devices work fine around mine). It's not the firewall. It is getting assigned an IP address correctly (my PC gets 192.168.0.10, my laptop is .12) It is assigned by DHCP. As soon as I plug in the ethernet cable it all works fine. Repairing the adapter sometimes helps but it will always stop working after a random time. The wireless adapter always shows as connected with Excellent signal but the internet does not work. I am running Windows XP SP3 and have tried a Netgear WG111v2 USB adapter. Thanks in advance! UPDATE: The internet seems to be working, it is just either sending packets too small or slow to work (some small pages load bits of them very slowly but then hang). XP seems to have a networking diagnostic app - here is the output: Last diagnostic run time: 08/30/10 08:16:38 IP Configuration Diagnostic Invalid IP address info Valid IP address detected: 192.168.0.10 IP Layer Diagnostic Corrupted IP routing table info The default route is valid info The loopback route is valid info The local host route is valid info The local subnet route is valid Invalid ARP cache entries action The ARP cache has been flushed Gateway Diagnostic Gateway info The following proxy configuration is being used by IE: Automatically Detect Settings:Disabled Automatic Configuration Script: Proxy Server: Proxy Bypass list: info This computer has the following default gateway entry(ies): 192.168.0.1 info This computer has the following IP address(es): 192.168.0.10 info The default gateway is in the same subnet as this computer info The default gateway entry is a valid unicast address info The default gateway address was resolved via ARP in 1 try(ies) info The default gateway was reached via ICMP Ping in 1 try(ies) info TCP port 80 on host 65.55.12.249 was successfully reached info The Internet host www.microsoft.com was successfully reached info The default gateway is OK DNS Client Diagnostic DNS - Not a home user scenario info Using Web Proxy: no info Resolving name ok for (www.microsoft.com): yes No DNS servers DNS failure HTTP, HTTPS, FTP Diagnostic HTTP, HTTPS, FTP connectivity info FTP (Passive): Successfully connected to ftp.microsoft.com. info HTTP: Successfully connected to www.microsoft.com. warn HTTPS: Error 12002 connecting to www.microsoft.com: The operation timed out warn HTTPS: Error 12002 connecting to www.passport.net: The operation timed out error Could not make an HTTPS connection. info Redirecting user to support call WinSock Diagnostic WinSock status info All base service provider entries are present in the Winsock catalog. info The Winsock Service provider chains are valid. info Provider entry MSAFD Tcpip [TCP/IP] passed the loopback communication test. info Provider entry MSAFD Tcpip [UDP/IP] passed the loopback communication test. info Provider entry RSVP UDP Service Provider passed the loopback communication test. info Provider entry RSVP TCP Service Provider passed the loopback communication test. info Connectivity is valid for all Winsock service providers. Wireless Diagnostic Wireless - Service disabled Wireless - User SSID action User input required: Specify network name or SSID Wireless - First time setup info The Wireless Network name (SSID) to which the user would like to connect = RodSof Wifi. Wireless - Radio off info Valid IP address detected: 192.168.0.10 Wireless - Out of range Wireless - Hardware issue Wireless - Novice user Wireless - Ad-hoc network Wireless - Less preferred Wireless - 802.1x enabled Wireless - Configuration mismatch Wireless - Low SNR Network Adapter Diagnostic Network location detection info Using home Internet connection Network adapter identification info Network connection: Name=Local Area Connection 2, Device=Realtek RTL8168C(P)/8111C(P) PCI-E Gigabit Ethernet NIC, MediaType=LAN, SubMediaType=LAN info Network connection: Name=Wireless USB, Device=NETGEAR WG111v2 54Mbps Wireless USB 2.0 Adapter, MediaType=LAN, SubMediaType=WIRELESS info Both Ethernet and Wireless connections available, prompting user for selection action User input required: Select network connection info Wireless connection selected Network adapter status info Network connection status: Connected HTTP, HTTPS, FTP Diagnostic HTTP, HTTPS, FTP connectivity info FTP (Active): Successfully connected to ftp.microsoft.com. warn HTTP: Error 12007 connecting to www.microsoft.com: The server name or address could not be resolved warn HTTP: Error 12002 connecting to www.hotmail.com: The operation timed out warn HTTPS: Error 12002 connecting to www.passport.net: The operation timed out warn HTTPS: Error 12002 connecting to www.microsoft.com: The operation timed out error Could not make an HTTP connection. error Could not make an HTTPS connection.

Read the article
Weighted round robins via TTL - possible?

- by Joe Hopfgartner

I currently use DNS round robin for load balancing, which works great. The records look like this (I have a ttl of 120 seconds) ;; ANSWER SECTION: orion.2x.to. 116 IN A 80.237.201.41 orion.2x.to. 116 IN A 87.230.54.12 orion.2x.to. 116 IN A 87.230.100.10 orion.2x.to. 116 IN A 87.230.51.65 I learned that not every ISP / device treats such a response the same way. For example some DNS servers rotate the addresses randomly or always cycle them through. Some just propagate the first entry, others try to determine which is best (regionally near) by looking at the ip address. However if the userbase is big enough (spreads over multiple ISPs etc) it balances pretty well. The discrepancies from highest to lowest loaded server hardly every exceeds 15%. However now I have the problem that I am introducing more servers into the systems, that not all have the same capacities. I currently only have 1gbps servers, but I want to work with 100mbit and also 10gbps servers too. So what I want is I want to introduce a server with 10 GBps with a weight of 100, a 1 gbps server with a weight of 10 and a 100 mbit server with a weight of 1. I used to add servers twice to bring more traffic to them (which worked nice. the bandwidth doubled almost.) But adding a 10gbit server 100 times to DNS is a bit rediculous. So I thought about using the TTL. If I give server A 240 seconds ttl and server B only 120 seconds (which is about about the minimum to use for round robin, as a lot of dns servers set to 120 if a lower ttl is specified.. so i have heard) I think something like this should occour in an ideal scenario: first 120 seconds 50% of requests get server A -> keep it for 240 seconds. 50% of requests get server B -> keep it for 120 seconds second 120 seconds 50% of requests still have server A cached -> keep it for another 120 seconds. 25% of requests get server A -> keep it for 240 seconds 25% of requests get server B -> keep it for 120 seconds third 120 seconds 25% will get server A (from the 50% of Server A that now expired) -> cache 240 sec 25% will get server B (from the 50% of Server A that now expired) -> cache 120 sec 25% will have server A cached for another 120 seconds 12.5% will get server B (from the 25% of server B that now expired) -> cache 120sec 12.5% will get server A (from the 25% of server B that now expired) -> cache 240 sec fourth 120 seconds 25% will have server A cached -> cache for another 120 secs 12.5% will get server A (from the 25% of b that now expired) -> cache 240 secs 12.5% will get server B (from the 25% of b that now expired) -> cache 120 secs 12.5% will get server A (from the 25% of a that now expired) -> cache 240 secs 12.5% will get server B (from the 25% of a that now expired) -> cache 120 secs 6.25% will get server A (from the 12.5% of b that now expired) -> cache 240 secs 6.25% will get server B (from the 12.5% of b that now expired) -> cache 120 secs 12.5% will have server A cached -> cache another 120 secs ... i think i lost something at this point but i think you get the idea.... As you can see this gets pretty complicated to predict and it will for sure not work out like this in practice. But it should definitely have an effect on the distribution! I know that weighted round robin exists and is just controlled by the root server. It just cycles through dns records when responding and returns dns records with a set propability that corresponds to the weighting. My DNS server does not support this, and my requirements are not that precise. If it doesnt weight perfectly its okay, but it should go into the right direction. I think using the TTL field could be a more elegant and easier solution - and it deosnt require a dns server that controls this dynamically, which saves resources - which is in my opinion the whole point of dns load balancing vs hardware load balancers. My question now is... are there any best prectices / methos / rules of thumb to weight round robin distribution using the TTL attribute of DNS records? Edit: The system is a forward proxy server system. The amount of Bandwidth (not requests) exceeds what one single server with ethernet can handle. So I need a balancing solution that distributes the bandwidth to several servers. Are there any alternative methods than using DNS? Of course I can use a load balancer with fibre channel etc, but the costs are rediciulous and it also increases only the width of the bottleneck and does not eliminate it. The only thing i can think of are anycast (is it anycast or multicast?) ip addresses, but I don't have the means to set up such a system.

Read the article
Mount TMPFS instead of ro /dev

- by schiggn

I am working on a ARM-Based embedded system with a custom Debian Linux based on kernel 2.6.31. In the final system, the Root file system is stored as squashfs on flash. Now, the folder /dev is created by udev, but since there is no hot plugging functionality needed and booting time is critical, I wanted to delete udev and "hard code" the /dev folder (read here, page 5). because i still need to change parameters of the devices (with ioctl /sysfs) this does not work for me in this case. so i thought of mounting a tmpfs on /dev and change the parameters there. is this possible? and how to do best? my approach would be: delete /dev from RFS create tar containing basic devices mount tmpfs /dev untar tar-file into /dev change parameters Could this work? Do you see any problems? I found out, that you can mount on top of already mounted mount point, is it somehow possible just to take data with while mounting the new file system? if so that would be very convenient! Thanks Update: I just tried that out, but I'm stuck at a certain point. I packed all my devices into devices.tar, packed it into /usr of my squashfs and added the following lines to mountkernfs.sh, which is executed right after INIT. #mount /dev on tmpfs echo -n "Mounting /dev on tmpfs..." mount -o size=5M,mode=0755 -t tmpfs tmpfs /dev mknod -m 600 /dev/console c 5 1 mknod -m 600 /dev/null c 1 3 echo "done." echo -n "Populating /dev..." tar -xf /usr/devices.tar -C /dev echo "done." This works fine on the version over NFS, if I place printf's in the code, I can see it executing, if I comment out the extracting part, its complaining about missing devices. Booting OK mmc0: new high speed SDHC card at address 0007 mmcblk0: mmc0:0007 SD04G 3.67 GiB mmcblk0: p1 IP-Config: Unable to set interface netmask (-22). Looking up port of RPC 100003/2 on 192.168.1.234 Looking up port of RPC 100005/1 on 192.168.1.234 VFS: Mounted root (nfs filesystem) on device 0:14. Freeing init memory: 136K INIT: version 2.86 booting Mounting /dev on tmpfs...done. Populating /dev...done. Initializing /var...done. Setting the system clock. System Clock set to: Thu Sep 13 11:26:23 UTC 2012. INIT: Entering runlevel: 2 UBI: attaching mtd8 to ubi0 Commenting out the extraction of the tar mmc0: new high speed SDHC card at address 0007 mmcblk0: mmc0:0007 SD04G 3.67 GiB mmcblk0: p1 IP-Config: Unable to set interface netmask (-22). Looking up port of RPC 100003/2 on 192.168.1.234 Looking up port of RPC 100005/1 on 192.168.1.234 VFS: Mounted root (nfs filesystem) on device 0:14. Freeing init memory: 136K INIT: version 2.86 booting Mounting /dev on tmpfs...done. Populating /dev...done. Initializing /var...done. Setting the system clock. Cannot access the Hardware Clock via any known method. Use the --debug option to see the details of our search for an access method. Unable to set System Clock to: Thu Sep 13 12:24:00 UTC 2012 ... (warning). INIT: Entering runlevel: 2 libubi: error!: cannot open "/dev/ubi_ctrl" So far so good. But if I pack the whole story into a squashfs and boot from there, it is acting strange. It's telling me while booting that it is unable to open an initial console and its throwing errors on mounting the UBIFS devices, but finally provides a login anyway. Over that my echo's are not executed. If I then log in, /dev is mounted as TMPFS as desired and all the devices reside inside. When I redo the "mount" command to mount the UBIFS partitions it is executed whitout problem and useable. From squashfs VFS: Mounted root (squashfs filesystem) readonly on device 31:15. Freeing init memory: 136K Warning: unable to open an initial console. mmc0: new high speed SDHC card at address 0007 mmcblk0: mmc0:0007 SD04G 3.67 GiB mmcblk0: p1 UBIFS error (pid 484): ubifs_get_sb: cannot open "ubi1_0", error -19 Additionally, a part of the rest of the bootscripts is still exexuted, but not all of them. Does anyone has a clue why? Other question, is 5MB enough/too much for /dev?

Read the article
Problem upgrading kernel on debian 3.1

- by exhuma

Hi, I have a quite old box in a remote server farm. So I have no direct access. Only remote SSH (and via SSH to a serial console). I haven't updated this box in ages. Now, whenever I want to install a new package, a dependency to glibc appears. Unfortunately, the install of glibc depends on a 2.6 kernel and I am running a venerable 2.4 kernel (one more reason to upgrade). The problem is, that the install of a new kernel has an indirect (over locales) dependency to glibc. So, to install glibc, I need a new kernel. For a new kernel, I need to upgrade glibc. Essentially I am blocked. What's the best way to proceed considering I have no "hardware" access? Here's a quick transcript of the upgrade process: [green:~]% sudo aptitude install linux-image-686 Reading Package Lists... Done Building Dependency Tree Reading extended state information Initializing package states... Done Reading task descriptions... Done The following packages are unused and will be REMOVED: gcc-4.3-base The following NEW packages will be automatically installed: dash libc6-i686 libparse-recdescent-perl linux-image-2.6-686 linux-image-2.6.18-6-686 module-init-tools yaird The following packages have been kept back: adduser apache2 apache2-mpm-prefork apache2-utils apache2.2-common apt apt-utils aptitude autoconf autotools-dev awstats base-files base-passwd [...snip...] util-linux vacation vim vim-common wamerican wbritish wget whiptail whois wwwconfig-common zlib1g The following NEW packages will be installed: dash libc6-i686 libparse-recdescent-perl linux-image-2.6-686 linux-image-2.6.18-6-686 linux-image-686 module-init-tools yaird The following packages will be upgraded: hotplug libc6 2 packages upgraded, 8 newly installed, 1 to remove and 277 not upgraded. Need to get 0B/22.7MB of archives. After unpacking 52.1MB will be used. Do you want to continue? [Y/n/?] Writing extended state information... Done Preconfiguring packages ... (Reading database ... 34065 files and directories currently installed.) Preparing to replace libc6 2.3.6.ds1-13 (using .../libc6_2.7-18lenny2_i386.deb) ... Checking for services that may need to be restarted... Checking init scripts... WARNING: init script for postgresql not found. [ --- libc6 config screen appears here --- ] WARNING: POSIX threads library NPTL requires kernel version 2.6.8 or later. If you use a kernel 2.4, please upgrade it before installing glibc. The installation of a 2.6 kernel _could_ ask you to install a new libc first, this is NOT a bug, and should *NOT* be reported. In that case, please add etch sources to your /etc/apt/sources.list and run: apt-get install -t etch linux-image-2.6 Then reboot into this new kernel, and proceed with your upgrade dpkg: error processing /var/cache/apt/archives/libc6_2.7-18lenny2_i386.deb (--unpack): subprocess pre-installation script returned error exit status 1 Errors were encountered while processing: /var/cache/apt/archives/libc6_2.7-18lenny2_i386.deb E: Sub-process /usr/bin/dpkg returned an error code (1) Ack! Something bad happened while installing packages. Trying to recover: dpkg: dependency problems prevent configuration of locales: locales depends on glibc-2.7-1; however: Package glibc-2.7-1 is not installed. dpkg: error processing locales (--configure): dependency problems - leaving unconfigured Errors were encountered while processing: locales Reading Package Lists... Done Building Dependency Tree Reading extended state information Initializing package states... Done Reading task descriptions... Done Now, if I follow the instrunctions as promted I get the following. Note that I am using aptitude instead of apt-get to benefit from the better dependency tracking. I did try with apt-get first. But that let me to the same problem. [green:~]% sudo aptitude install -t etch linux-image-2.6.26-2-686 Reading Package Lists... Done Building Dependency Tree Reading extended state information Initializing package states... Done Reading task descriptions... Done E: Unable to correct problems, you have held broken packages. E: Unable to correct dependencies, some packages cannot be installed E: Unable to resolve some dependencies! Some packages had unmet dependencies. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following packages have unmet dependencies: linux-image-2.6.26-2-686: Depends: initramfs-tools (>= 0.55) but it is not installable or yaird (>= 0.0.13) but it is not installable or linux-initramfs-tool which is a virtual package. Any ideas?

Read the article
Deployment Workbench no longer available after PXE boot

- by Patrick

Our build process revolves around windows Deployment Workbench. Unfortunately this was setup by someone who is no longer with the company, and no-one has ever dared/needed to make any changes. The other day it stopped working. It turns out that one of our build guys started thinking about changing some stuff in it, clicked something and now it no longer works (He is saying now that he right clicked on the 'LAB' entry in 'Deployment points' and hit 'Update', which took some time to run through apparently). The job has fallen on me to resolve and frankly I'm not sure what I'm doing. I was wondering if someone with more experience than me can provide some pointers as to troubleshooting cos I'm feeling quite a lot in the dark here. On the server I have Deployment Workbench up and running (MMC snapin) version 3.0. There is a WDS service that appears to be running ok, as does the tFTPd service. Nothing specific to this in event logs. From the client side; PXE boot works and gets you to the Win PE launch, and it has the correct company logo as the background (proving to me that its loading win PE from the network). WPEINIT runs, and asks for domain credentials, here the team simply put User/Pass/Domain in the boxes and click ok. Normally the build would kick off. Instead they get an error message saying that the \NATBLU01\Distribution$ share isn't available. Checking \NATBLU01\Distribution$ shows that its there and accessible over the network. Security/permissions seem ok, even 'ANONYMOUS LOGON' has read access to that share so I don't see that being a problem. Digging the trace files from C:\MININT\SMSOSD\OSDLOGS\ after an attempt to run the build I can see an error saying much the same - <![LOG[Validating connection to \\NATBLU01\Distribution$]LOG]!><time="16:42:14.000+000" date="03-15-2012" component="LiteTouch" context="" type="1" thread="" file="LiteTouch"> <![LOG[FindFile: The file OSDConnectToUNC.exe could not be found in any standard locations.]LOG]!><time="16:42:14.000+000" date="03-15-2012" component="LiteTouch" context="" type="1" thread="" file="LiteTouch"> <![LOG[The network location cannot be reached. For information about network troubleshooting, see Windows Help.]LOG]!><time="16:42:24.000+000" date="03-15-2012" component="LiteTouch" context="" type="3" thread="" file="LiteTouch"> <![LOG[ERROR - Unable to map a network drive to \\NATBLU01\Distribution$.]LOG]!><time="16:42:24.000+000" date="03-15-2012" component="LiteTouch" context="" type="3" thread="" file="LiteTouch"> BDD.LOG shows much the same. Full copies of the .LOG files can both files be found here : BDD.LOG LITETOUCH.LOG I can get to a command prompt from the Win PE that boots from PXE, however there isn't any network stuff there. IPCONFIG returns nothing so none of the tests I would usually run resolve anything. I'm at a loss frankly. I did wonder if I could perhaps start a new build process but if the change to the DeploymentWorkbench has knocked it offline I don't think I'm going to be able to create a new deployment. Failing that; we do have a deployment point labeled type 'Media' which appears to be a DVD ISO image of one of the builds, but its dated 2008, is it possible to export the network build to .ISO and build from DVD? We are looking at new hardware to run this from anyway (for the impending Windows 7 roll out) so a temporary work round isn't going to be too much of a problem. All assistance is appreciated! EDIT : OK. Got it working again. Solution was close to Newmanth's idea. The problem was that our PE image didn't appear to be connecting the network. I had an older copy of the PE boot.WIM on a stick that I had been using for other purposes. I booted that and correctly got a network connection. Showed a correct internal IP and could ping out etc etc. However I was still getting the same errors in all the logs and in when wpeinit was running. What I did seperately was to update the PE image that DeploymentWorkbench was pushing out to display a different back ground. I wanted to prove that I was working in the correct place. Turns out that I wasn't. I went and looked at the other deployment stuff we had on this machine, Windows Deployment Services was installed and although all the install images are off line the boot image was online, so I uploaded the copy from my stick to that. Booted straight off. And fixed. Working. Yay! For anyone stumbling across this in the future you may find that although your deployment images are located in the DeploymentWorkbench, the Win PE boot image you are launching from is located in the associated Windows Deployment Services images.

Read the article
Web browsing is fast, but downloads are slow

- by Ricket

I work for a company on my university's campus, helping with general IT problems and some web development. But lately there has been a problem that has me and my boss completely stumped. We, plus one contractor, make up the entire IT department, so I'm reaching out to you for help. All around the office, we have wall jacks. These collect in a closet down the hall and all plug into a switch. This switch, along with our individual server jacks, plugs into another switch, and that switch plugs into our firewall hardware. Then the firewall is connected out to our campus network. Our campus internet is, well, very fast. I don't know exactly the terms, tiers, etc., but we have thousands of students and downloads can run as fast as 10 MB/s at night; uploads are sometimes even faster. I think we're practically ISP level. In short, I have a lot of faith that it is not the campus side of things that is causing a problem, combined with other evidence I'll mention in a moment. So our symptoms: web browsing is fast. Web pages, images, etc. load instantly. No problems there. But then when I go to download something, the download starts fast but very quickly (a matter of seconds) drops to nearly 0. Often it will actually drop to 0 and time out. This happens with even very small files, 1 MB or less. It smells to me like a QoS sort of thing. I'm not entirely sure, and I wanted to get your opinions first. My boss is hesitant to touch our firewall, much less let me touch it, and it was set up and is managed by a consultant remotely. These problems don't seem tied to a time of the day. I've tried downloads after 5:00 and still the same thing happens. From my desk, I can turn on my wireless adapter and pick up the campus wireless access point. If I unplug ethernet and connect to it, downloads are fast. This adds to my suspicion that it's limited to our company network. Also, a number of weeks ago the consultant upgraded our firewall firmware. Suddenly everything was very fast. I tested with downloads from Sun and speedtest.net and things were blazing fast, as they should be with our campus internet! It was wonderful, and I figured the slow speeds were an old firmware bug. In a matter of days, things steadily declined until they were back to the old symptoms. Oh, and we have antivirus installed on every computer, and we keep it up to date. Though I suppose the possibility is still there that someone could have spyware which is bogging down our internet, in which case what is the easiest/best way to find this out? (maybe this should go in a separate question) Thank you for your patience in reading all of this. Do you have any ideas as to what I can try? Is this something that you've experienced before? What sort of tools or methods can I use to try and diagnose the problem? P.S. everything here is Windows. Windows Server 2003 and 2008 on our servers, and Windows XP on employees' machines. Update: We are submitting a ticket to the university to just take a look and see if they see anything unusual and/or can suggestion methods for us to try and pinpoint our problem. Hopefully they'll be helpful! I'll update this to let you know what goes on. Update again: We found a hub (yes, a HUB) right between our campus connection and our firewall. It had only those two ethernet cables plugged into it, nothing else. After removing the hub, our speeds have jumped up to several mbps. However in talking with the campus, we got them to run a gigabit line to our firewall in place of the 100mbps line. As of friday, we are at about 65 mbps up and down (according to speedtest.net at 8am)!! Go NC State!!

Read the article

Search Results

Search found 8037 results on 322 pages for 'hardware hacking'.

Page 314/322 | < Previous Page | 310 311 312 313 314 315 316 317 318 319 320 321 | Next Page >

- by Geoff Dalgas

- by user1389890

- by user1389890

- by Hammer Bro.

- by René Kåbis

- by SethG

- by Myles

- by jlego

- by Callum D

- by Philip Durbin

- by Aron Rotteveel

- by Andy Shinn

- by kammer

- by Manca Weeks

- by Rich Lafferty

- by Hammer Bro.

- by Philip Durbin

- by Peter

- by Jorge Suárez de Lis

- by Rodney

- by Joe Hopfgartner

- by schiggn

- by exhuma

- by Patrick

- by Ricket

< Previous Page | 310 311 312 313 314 315 316 317 318 319 320 321 | Next Page >