Search Results

Search found 17013 results on 681 pages for 'hard coding'.

Page 663/681 | < Previous Page | 659 660 661 662 663 664 665 666 667 668 669 670 | Next Page >

Un-failing over a Cisco PIX 515e

- by ABrown

We had a power outage at our data center last week and when our dual PIX 515E running IOS 7.0(8) (configured with a failover cable) came back, they were in a failed over state where the Secondary unit is active and the Primary unit is standby I have tried 'failover reset', 'failover active', and 'failover reload-standby' as well as executing reloads on both units in a variety of orders, and they don't come back Primary/Active Secondary/Standby. The only thing in my arsenal that I haven't tried is driving to the data center and performing a hard reboot, which I hate to do. I have read How Failover Works on the Cisco Secure Firewall and it seems like this should be wicked straight forward. output of show failover on Primary: Failover On Cable status: Normal Failover unit Primary Failover LAN Interface: N/A - Serial-based failover enabled Unit Poll frequency 15 seconds, holdtime 45 seconds Interface Poll frequency 15 seconds Interface Policy 1 Monitored Interfaces 2 of 250 maximum Version: Ours 7.0(8), Mate 7.0(8) Last Failover at: 02:52:05 UTC Mar 10 2010 This host: Primary - Standby Ready Active time: 0 (sec) Interface outside (x.x.x.165): Normal Interface inside (y.y.y.3): Normal Other host: Secondary - Active Active time: 897045 (sec) Interface outside (x.x.x.164): Normal Interface inside (y.y.y.4): Normal Stateful Failover Logical Update Statistics Link : Unconfigured. output of show failover on Secondary: Failover On Cable status: Normal Failover unit Secondary Failover LAN Interface: N/A - Serial-based failover enabled Unit Poll frequency 15 seconds, holdtime 45 seconds Interface Poll frequency 15 seconds Interface Policy 1 Monitored Interfaces 2 of 250 maximum Version: Ours 7.0(8), Mate 7.0(8) Last Failover at: 02:03:04 UTC Feb 28 2010 This host: Secondary - Active Active time: 896925 (sec) Interface outside (x.x.x.164): Normal Interface inside (y.y.y.4): Normal Other host: Primary - Standby Ready Active time: 0 (sec) Interface outside (x.x.x.165): Normal Interface inside (y.y.y.3): Normal Stateful Failover Logical Update Statistics Link : Unconfigured. I'm seeing the following in my syslog: Mar 10 03:05:00 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reset' command. Mar 10 03:05:09 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reload-standby' command. Mar 10 03:05:12 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=20,my=Active,peer=Failed. Mar 10 03:05:12 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Failed. Mar 10 03:06:09 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=0,my=Active,peer=Failed. Mar 10 03:06:09 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is down. Mar 10 03:06:09 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=1,my=Active,peer=Failed. Mar 10 03:06:10 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is up. Mar 10 03:06:10 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=411,op=2,my=Active,peer=Failed. Mar 10 03:06:23 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=80,my=Active,peer=Standby Ready. Mar 10 03:06:23 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Standby Ready. Mar 10 03:06:24 fw2 %PIX-6-720027: (VPN-Primary) HA status callback: My state Standby Ready. Mar 10 03:07:05 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reset' command. Mar 10 03:07:31 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover active' command. Mar 10 03:08:04 fw1 %PIX-5-611103: User logged out: Uname: enable_1 Mar 10 03:08:04 fw1 %PIX-6-315011: SSH session from admin1_int on interface inside for user "pix" terminated normally Mar 10 03:08:39 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=20,my=Active,peer=Failed. Mar 10 03:08:39 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Failed. Mar 10 03:09:10 fw1 %PIX-6-605005: Login permitted from admin1_int/36891 to inside:192.168.4.4/ssh for user "pix" Mar 10 03:09:23 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reset' command. Mar 10 03:09:38 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=0,my=Active,peer=Failed. Mar 10 03:09:39 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is down. Mar 10 03:09:39 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=1,my=Active,peer=Failed. Mar 10 03:09:39 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is up. Mar 10 03:09:39 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=411,op=2,my=Active,peer=Failed. Mar 10 03:09:52 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=80,my=Active,peer=Standby Ready. Mar 10 03:09:52 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Standby Ready. Mar 10 03:09:53 fw2 %PIX-6-720027: (VPN-Primary) HA status callback: My state Standby Ready. I'm not exactly sure how to interpret that syslog data. Primary doesn't seem to even try to become Active. When I reload the individual units separately, my connections are retained, so it doesn't seem like I have a real hardware failure. Is there something I can query (IOS or SNMP) to check for hardware issues? Any thoughts? My IOS-fu is weak. Thanks for any help you might provide, Aaron

Read the article
What does it mean when ARP shows <incomplete> on eth1

- by Geoff Dalgas

We have been using HAProxy along with heartbeat from the Linux-HA project. We are using two linux instances to provide a failover. Each server has with their own public IP and a single IP which is shared between the two using a virtual interface (eth1:1) at IP: 69.59.196.211 The virtual interface (eth1:1) IP 69.59.196.211 is configured as the gateway for the windows servers behind them and we use ip_forwarding to route traffic. We are experiencing an occasional network outage on one of our windows servers behind our linux gateways. HAProxy will detect the server is offline which we can verify by remoting to the failed server and attempting to ping the gateway: Pinging 69.59.196.211 with 32 bytes of data: Reply from 69.59.196.220: Destination host unreachable. Running arp -a on this failed server shows that there is no entry for the gateway address (69.59.196.211): Interface: 69.59.196.220 --- 0xa Internet Address Physical Address Type 69.59.196.161 00-26-88-63-c7-80 dynamic 69.59.196.210 00-15-5d-0a-3e-0e dynamic 69.59.196.212 00-21-5e-4d-45-c9 dynamic 69.59.196.213 00-15-5d-00-b2-0d dynamic 69.59.196.215 00-21-5e-4d-61-1a dynamic 69.59.196.217 00-21-5e-4d-2c-e8 dynamic 69.59.196.219 00-21-5e-4d-38-e5 dynamic 69.59.196.221 00-15-5d-00-b2-0d dynamic 69.59.196.222 00-15-5d-0a-3e-09 dynamic 69.59.196.223 ff-ff-ff-ff-ff-ff static 224.0.0.22 01-00-5e-00-00-16 static 224.0.0.252 01-00-5e-00-00-fc static 225.0.0.1 01-00-5e-00-00-01 static On our linux gateway instances arp -a shows: peak-colo-196-220.peak.org (69.59.196.220) at <incomplete> on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-222.peak.org (69.59.196.222) at 00:15:5d:0a:3e:09 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 Why would arp occasionally set the entry for this failed server as <incomplete>? Should we be defining our arp entries statically? I've always left arp alone since it works 99% of the time, but in this one instance it appears to be failing. Are there any additional troubleshooting steps we can take help resolve this issue? THINGS WE HAVE TRIED I added a static arp entry for testing on one of the linux gateways which still didn't help. root@haproxy2:~# arp -a peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-221.peak.org (69.59.196.221) at 00:15:5d:00:b2:0d [ether] on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 peak-colo-196-220.peak.org (69.59.196.220) at 00:21:5e:4d:30:8d [ether] PERM on eth1 root@haproxy2:~# arp -i eth1 -s 69.59.196.220 00:21:5e:4d:30:8d root@haproxy2:~# ping 69.59.196.220 PING 69.59.196.220 (69.59.196.220) 56(84) bytes of data. --- 69.59.196.220 ping statistics --- 7 packets transmitted, 0 received, 100% packet loss, time 6006ms Rebooting the windows web server solves this issue temporarily with no other changes to the network but our experience shows this issue will come back. Swapping network cards and switches I noticed the link light on the port of the switch for the failed windows server was running at 100Mb instead of 1Gb on the failed interface. I moved the cable to several other open ports and the link indicated 100Mb for each port that I tried. I also swapped the cable with the same result. I tried changing the properties of the network card in windows and the server locked up and required a hard reset after clicking apply. This windows server has two physical network interfaces so I have swapped the cables and network settings on the two interfaces to see if the problem follows the interface. If the public interface goes down again we will know that it is not an issue with the network card. (We also tried another switch we have on hand, no change) Changing network hardware driver versions We've had the same problem with the latest Broadcom driver, as well as the built-in driver that ships in Windows Server 2008 R2. Replacing network cables As a last ditch effort we remembered another change that occurred was the replacement of all of the patch cords between our servers / switch. We had purchased two sets, one green of lengths 1ft - 3ft for the private interfaces and another set of red cables for the public interfaces. We swapped out all of the public interface patch cables with a different brand and ran our servers without issue for a full week ... aaaaaand then the problem recurred. Disable checksum offload, remove TProxy We also tried disabling TCP/IP checksum offload in the driver, no change. We're now pulling out TProxy and moving to a more traditional x-forwarded-for network arrangement without any fancy IP address rewriting. We'll see if that helps.

Read the article
Hyper-V File Server Clustering - at my wit’s end

- by René Kåbis

I am at my wit’s end with File Server clustering under Hyper-V. I am hoping that someone might be able to help me figure out this Gordian Knot of a technology that seems to have dead ends (like forcing cluster VMs to use iSCSI drives where normally-attached VHDX drives could suffice) where logic and reason would normally provide a logical solution. My hardware: I will be running three servers (in the end), but right now everything is taking place on one server. One of the secondary servers will exist purely as a witness/quorum, and another slightly more powerful one will be acting as an emergency backup (with additional storage, just not redundant) to hold the secondary AD VM and the other halves of a set of clustered VMs: the SQL VM and the file system VM. Please note, these each are the depreciated nodes of a cluster, the main nodes will be on the most powerful first machine. My heavy lifter is a machine that also contains all of the truly redundant storage on the network. If this gives anyone the heebie-geebies, too bad. It has a 6TB (usable) RAID-10 array, and will (in the end) hold the primary nodes of both aforementioned clusters, but is right now holding all VMs. This is, right now: DC01, DC02, SQL01, SQL02, FS01 & FS02. Eventually, I will be adding additional VMs to handle Exchange, Sharepoint and Lync, but only to this main server (the secondary server won't be able to handle more than three or four VMs, so why burden it? The AD, SQL & FS VMs are the most critical for the business). If anyone is now saying, “wait, what about a SAN or a NAS for the file servers?”, well too bad. What exists on the main machine is what I have to deal with. I followed these instructions, but I seem to be unable to get things to work. In order to make the file server truly redundant, I cannot trust any one machine to hold the only data store on the network. Therefore, I have created a set of iSCSI drives on the VM-host of the main machine, and attached one to each file server VM. The end result is that I want my FS01 to sit on the heavy lifter, along with its iSCSI “drive”, and FS02 will sit on the secondary machine with its own iSCSI “drive” there as well. That is, neither iSCSI drive will end up sitting on the same machine as the other. As such, the clustered FS will utterly duplicate the contents of the iSCSI drives between each other, so that if one physical machine (or the FS VM) goes toes-up, the other has got a full copy of the data on its own iSCSI drive. My problem occurs when I try to apply the file server role within the failover cluster manager. Actually, it is even before that -- it occurs when adding the disks. Since I have added each disk preferentially to a specific VM (by limiting the initiator by DNS hostname, and by adding two-way CHAP authentication), this forces each VM to be in control of its own iSCSI disk. However, when I try to add the disks to the Disks section of Storage within Failover Cluster Manager, the entire process fails for a random disk of the pair. That is, one will get online, but the other will remain offline because it does not have the correct “owner node”. I mean, really -- WTF? Of course it doesn’t have the right owner node, both drives are showing the same node name!! I cannot seem to have one drive show up with one node name as owner, and the other drive show up with the other node name as owner. And because both drives are not “online”, I cannot create a pool to apply to a cluster role. Talk about getting stuck between a rock and a hard place! I’ve got more to add, but my work is closing for the day and I have to wrap things up. I will try to add more tomorrow morning when I get in. My main objective is to have a file server VM on each machine, the storage on each machine, but a transparent failover in case one physical machine fails. Essentially, a failover FS that doesn’t care which machine fails -- the storage contents are replicated equally on each machine. Am I even heading in the right direction?

Read the article
LDAP query on linux against AD returns groups with no members

- by SethG

I am using LDAP+kerberos to authenticate against Active Directory on Windows 2003 R2. My krb5.conf and ldap.conf appear to be correct (according to pretty much every sample I found on the 'net). I can login to the host with both password and ssh keys. When I run getent passwd, all my ldap user accounts are listed with all the important attributes. When I run getent group, all the ldap groups and their gid's are listed, but no group members. If I run ldapsearch and filter on any group, the members are all listed with the "member" attribute. So the data is there for the taking, it's just not being parsed properly. It would appear that I simply am using an incorrect mapping in ldap.conf, but I can't see it. I've tried several variations and all give the same result. Here is my current ldap.conf: host <ad-host1-ip> <ad-host2-ip> base dc=my,dc=full,dc=dn uri ldap://<ad-host1> ldap://<ad-host2> ldap_version 3 binddn <mybinddn> bindpw <mybindpw> scope sub bind_policy hard nss_reconnect_tries 3 nss_reconnect_sleeptime 1 nss_reconnect_maxsleeptime 8 nss_reconnect_maxconntries 3 nss_map_objectclass posixAccount User nss_map_objectclass posixGroup Group nss_map_attribute uid sAMAccountName nss_map_attribute gidNumber msSFU30GidNumber nss_map_attribute uidNumber msSFU30UidNumber nss_map_attribute cn cn nss_map_attribute gecos displayName nss_map_attribute homeDirectory msSFU30HomeDirectory nss_map_attribute loginShell msSFU30LoginShell nss_map_attribute uniqueMember member pam_filter objectcategory=User pam_login_attribute sAMAccountName pam_member_attribute member pam_password ad Here's the kicker: this config works 100% fine on a different linux box with a different distro. It does not work on the distro I am planning on switching to. I have installed from source the versions of pam_ldap and nss_ldap on the new box to match the old box, which fixed another problem I was having with this setup. Other relevant info is the original AD box was Windows 2003. It's mirror died a horrible hardware death so I'm trying to add two more 2003-R2 servers to the mirror tree and ultimately drop the old 2003 box. The new R2 boxes appear to have joined the DC forest properly. What do I need to do to get groups working? I've exhausted all the resources I could find and need a different angle. Any input is appreciated. Status update, 7/31/09 I have managed to tweak my config file to get full info from the AD and performance is nice and snappy. I replaced the back-rev'd copies of pam_ldap and nss_ldap with the current ones for the distro I'm using, so it's back to a standard out-of-the-box install. Here's my current config: host <ad-host1-ip> <ad-host2-ip> base dc=my,dc=full,dc=dn uri ldap://<ad-host1> ldap://<ad-host2> ldap_version 3 binddn <mybinddn> bindpw <mybindpw> scope sub bind_policy soft nss_reconnect_tries 3 nss_reconnect_sleeptime 1 nss_reconnect_maxsleeptime 8 nss_reconnect_maxconntries 3 nss_connect_policy oneshot referrals no nss_map_objectclass posixAccount User nss_map_objectclass posixGroup Group nss_map_attribute uid sAMAccountName nss_map_attribute gidNumber msSFU30GidNumber nss_map_attribute uidNumber msSFU30UidNumber nss_map_attribute cn cn nss_map_attribute gecos displayName nss_map_attribute homeDirectory msSFU30HomeDirectory nss_map_attribute loginShell msSFU30LoginShell nss_map_attribute uniqueMember member pam_filter objectcategory=CN=Person,CN=Schema,CN=Configuration,DC=w2k,DC=cis,DC=ksu,DC=edu pam_login_attribute sAMAccountName pam_member_attribute member pam_password ad ssl off tls_checkpeer no sasl_secprops maxssf=0 The remaining problem now is when you run the groups command, not all subscribed groups are listed. Some are (one or two), but not all. Group memberships are still honored, such as file and printer access. getent group foo still shows that the user is a member of group foo. So it appears to be a presentation bug, and does not interfere with normal operation. It also appears that some (I have not determined exactly how many) group searches do not resolve correctly, even though the group is listed. eg, when you run "getent group bar", nothing is returned, but if you run "getent group|grep bar" or "getent group|grep <bar_gid>" you can see that it indeed listed and your group name and gid are correct. This still seems like an LDAP search or mapping error, but I can't figure out what it is. I'm a heckuva lot closer than earlier in the week, but I'd really like to get this last detail ironed out.

Read the article
Windows 7 Pro sysprep not working

- by Callum D

Hello, I'm trying to sysprep a Windows 7 Professional machine, prior to grabbing an image for mass deployment on identical hardware, and am having a hard time getting sysprep to work (at all). I've created an XML answer file with WSIM, and have a basic setupcomplete.cmd file, but none of the configurations in the answer file seem to be applied. I've read technet articles and googled, and I still have no idea why this is happening. Is someone able to have a look at the answer file I've attached and let me know where I'm going wrong? thanks, Callum AutoUnattend.XML <?xml version="1.0" encoding="utf-8"?> <unattend xmlns="urn:schemas-microsoft-com:unattend"> <settings pass="specialize"> <component name="Microsoft-Windows-Shell-Setup" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <AutoLogon> <Password> <Value>**********************************</Value> <PlainText>false</PlainText> </Password> <Username>administrator</Username> <LogonCount>1</LogonCount> <Enabled>true</Enabled> </AutoLogon> <WindowsFeatures> <ShowMediaCenter>false</ShowMediaCenter> <ShowWindowsMediaPlayer>false</ShowWindowsMediaPlayer> </WindowsFeatures> <CopyProfile>true</CopyProfile> <DoNotCleanTaskBar>true</DoNotCleanTaskBar> <RegisteredOrganization>SomeCompany (UK) Ltd.</RegisteredOrganization> <RegisteredOwner>SomeCompany User</RegisteredOwner> <ShowWindowsLive>false</ShowWindowsLive> <TimeZone>GMT Standard Time</TimeZone> </component> <component name="Security-Malware-Windows-Defender" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <DisableAntiSpyware>true</DisableAntiSpyware> </component> </settings> <settings pass="oobeSystem"> <component name="Microsoft-Windows-International-Core" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <SystemLocale>en-UK</SystemLocale> <UserLocale>en-UK</UserLocale> <UILanguage>en-US</UILanguage> <InputLocale>0809:00000809</InputLocale> </component> <component name="Microsoft-Windows-Shell-Setup" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <OOBE> <HideEULAPage>true</HideEULAPage> <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE> <NetworkLocation>Work</NetworkLocation> <ProtectYourPC>1</ProtectYourPC> </OOBE> <UserAccounts> <AdministratorPassword> <Value>*************************************************=</Value> <PlainText>false</PlainText> </AdministratorPassword> </UserAccounts> </component> <component name="Microsoft-Windows-Deployment" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <Reseal> <Mode>OOBE</Mode> </Reseal> </component> </settings> <settings pass="generalize"> <component name="Microsoft-Windows-Security-SPP" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <SkipRearm>0</SkipRearm> </component> </settings> <settings pass="windowsPE"> <component name="Microsoft-Windows-Setup" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <UseConfigurationSet>true</UseConfigurationSet> </component> </settings> <cpi:offlineImage cpi:source="wim:c:/wim/install.wim#Windows 7 PROFESSIONAL" xmlns:cpi="urn:schemas-microsoft-com:cpi" /> </unattend>

Read the article
Why might login failures cause SQL 2005 to dump and ditch?

- by Byron Sommardahl

Our SQL 2005 server began timing out and finally stopped responding on Oct 26th. The application logs showed a ton of 17883 events leading up to a reboot. After the reboot everything was fine but we were still scratching our heads. Fast forward 6 days... it happened again. Then again 2 days later. The last night. Today it has happened three times to far. The timeline is fairly predictable when it happens: Trans log backups. Login failure for "user2". Minidump Another minidump for the scheduler Repeated 17883 events. Server fails little by little until it won't accept any requests. Reboot is all that gets us going again (a band-aid) Interesting, though, is that the server box itself doesn't seem to have any problems. CPU usage is normal. Network connectivity is fine. We can remote in and look at logs. Management studio does eventually bog down, though. Today, for the first time, we tried stopping services instead of a reboot. All services stopped on their own except for the SQL Server service. We finally did an "end task" on that one and were able to bring everything back up. It worked fine for about 30 minutes until we started seeing timeouts and 17883's again. This time, probably because we didn't reboot all the way, we saw a bunch of 844 events mixed in with the 17883's. Our entire tech team here is scratching heads... some ideas we're kicking around: MS Cumulative Update hit around the same time as when we first had a problem. Since then, we've rolled it back. Maybe it didn't rollback all the way. The situation looks and feels like an unhandled "stack overflow" (no relation) in that it starts small and compounds over time. Problem with this is that there isn't significant CPU usage. At any rate, we're not ruling SQL 2005 bug out at all. Maybe we added one too many import processes and have reached our limit on this box. (hard to believe). Looking at SQLDUMP0151.log at the time of one of the crashes. There are some "login failures" and then there are two stack dumps. 1st a normal stack dump, 2nd for a scheduler dump. Here's a snippet: (sorry for the lack of line breaks) 2009-11-10 11:59:14.95 spid63 Using 'xpsqlbot.dll' version '2005.90.3042' to execute extended stored procedure 'xp_qv'. This is an informational message only; no user action is required. 2009-11-10 11:59:15.09 spid63 Using 'xplog70.dll' version '2005.90.3042' to execute extended stored procedure 'xp_msver'. This is an informational message only; no user action is required. 2009-11-10 12:02:33.24 Logon Error: 18456, Severity: 14, State: 16. 2009-11-10 12:02:33.24 Logon Login failed for user 'standard_user2'. [CLIENT: 50.36.172.101] 2009-11-10 12:08:21.12 Logon Error: 18456, Severity: 14, State: 16. 2009-11-10 12:08:21.12 Logon Login failed for user 'standard_user2'. [CLIENT: 50.36.172.101] 2009-11-10 12:13:49.38 Logon Error: 18456, Severity: 14, State: 16. 2009-11-10 12:13:49.38 Logon Login failed for user 'standard_user2'. [CLIENT: 50.36.172.101] 2009-11-10 12:15:16.88 Logon Error: 18456, Severity: 14, State: 16. 2009-11-10 12:15:16.88 Logon Login failed for user 'standard_user2'. [CLIENT: 50.36.172.101] 2009-11-10 12:18:24.41 Logon Error: 18456, Severity: 14, State: 16. 2009-11-10 12:18:24.41 Logon Login failed for user 'standard_user2'. [CLIENT: 50.36.172.101] 2009-11-10 12:18:38.88 spid111 Using 'dbghelp.dll' version '4.0.5' 2009-11-10 12:18:39.02 spid111 *Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0149.txt 2009-11-10 12:18:39.02 spid111 SqlDumpExceptionHandler: Process 111 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process. 2009-11-10 12:18:39.02 spid111 * ***************************************************************************** 2009-11-10 12:18:39.02 spid111 * 2009-11-10 12:18:39.02 spid111 * BEGIN STACK DUMP: 2009-11-10 12:18:39.02 spid111 * 11/10/09 12:18:39 spid 111 2009-11-10 12:18:39.02 spid111 * 2009-11-10 12:18:39.02 spid111 * 2009-11-10 12:18:39.02 spid111 * Exception Address = 0159D56F Module(sqlservr+0059D56F) 2009-11-10 12:18:39.02 spid111 * Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION 2009-11-10 12:18:39.02 spid111 * Access Violation occurred writing address 00000000 2009-11-10 12:18:39.02 spid111 * Input Buffer 138 bytes - 2009-11-10 12:18:39.02 spid111 * " N R S C _ P T A 22 00 4e 00 52 00 53 00 43 00 5f 00 50 00 54 00 41 00 2009-11-10 12:18:39.02 spid111 * C _ Q A . d b o . 43 00 5f 00 51 00 41 00 2e 00 64 00 62 00 6f 00 2e 00 2009-11-10 12:18:39.02 spid111 * U s p S e l N e x 55 00 73 00 70 00 53 00 65 00 6c 00 4e 00 65 00 78 00 2009-11-10 12:18:39.02 spid111 * t A c c o u n t 74 00 41 00 63 00 63 00 6f 00 75 00 6e 00 74 00 00 00 2009-11-10 12:18:39.02 spid111 * @ i n t F o r m I 0a 40 00 69 00 6e 00 74 00 46 00 6f 00 72 00 6d 00 49 2009-11-10 12:18:39.02 spid111 * D & 8 @ t x 00 44 00 00 26 04 04 38 00 00 00 09 40 00 74 00 78 00 2009-11-10 12:18:39.02 spid111 * t A l i a s § 74 00 41 00 6c 00 69 00 61 00 73 00 00 a7 0f 00 09 04 2009-11-10 12:18:39.02 spid111 * Ð GQE9732 d0 00 00 07 00 47 51 45 39 37 33 32 2009-11-10 12:18:39.02 spid111 * 2009-11-10 12:18:39.02 spid111 * 2009-11-10 12:18:39.02 spid111 * MODULE BASE END SIZE 2009-11-10 12:18:39.02 spid111 * sqlservr 01000000 02C09FFF 01c0a000 2009-11-10 12:18:39.02 spid111 * ntdll 7C800000 7C8C1FFF 000c2000 2009-11-10 12:18:39.02 spid111 * kernel32 77E40000 77F41FFF 00102000

Read the article
How to disable Mac OS X from using swap when there still is "Inactive" memory?

- by Motin

A common phenomena in my day to day usage (and several other's according to various posts throughout the internet) of OS X, the system seems to become slow whenever there is no more "Free" memory available. Supposedly, this is due to swapping, since heavy disk activity is apparent and that vm_stat reports many pageouts. (Correct me from wrong) However, the amount of "Inactive" ram is typically around 12.5%-25% of all available memory (^1.) when swapping starts/occurs/ends. According to http://support.apple.com/kb/ht1342 : Inactive memory This information in memory is not actively being used, but was recently used. For example, if you've been using Mail and then quit it, the RAM that Mail was using is marked as Inactive memory. This Inactive memory is available for use by another application, just like Free memory. However, if you open Mail before its Inactive memory is used by a different application, Mail will open quicker because its Inactive memory is converted to Active memory, instead of loading Mail from the slower hard disk. And according to http://developer.apple.com/library/mac/#documentation/Performance/Conceptual/ManagingMemory/Articles/AboutMemory.html : The inactive list contains pages that are currently resident in physical memory but have not been accessed recently. These pages contain valid data but may be released from memory at any time. So, basically: When a program has quit, it's memory becomes marked as Inactive and should be claimable at any time. Still, OS X will prefer to start swapping out memory to the Swap file instead of just claiming this memory, whenever the "Free" memory gets to low. Why? What is the advantage of this behavior over, say, instantly releasing Inactive memory and not even touch the swap file? Some sources (^2.) indicate that OS X would page out the "Inactive" memory to swap before releasing it, but that doesn't make sense now does it if the memory may be released from memory at any time? Swapping is expensive, releasing is cheap, right? Can this behavior be changed using some preference or known hack? (Preferably one that doesn't include disabling swap/dynamic_pager altogether and restarting...) I do appreciate the purge command, as well as the concept of Repairing disk permissions to force some Free memory, but those are ways to painfully force more Free memory than to actually fixing the swap/release decision logic... Btw a similar question was asked here: http://forums.macnn.com/90/mac-os-x/434650/why-does-os-x-swap-when/ and here: http://hintsforums.macworld.com/showthread.php?t=87688 but even though the OPs re-asked the core question, none of the replies addresses an answer to it... ^1. UPDATE 17-mar-2012 Since I first posted this question, I have gone from 4gb to 8gb of installed ram, and the problem remains. The amount of "Inactive" ram was 0.5gb-1.0gb before and is now typically around 1.0-2.0GB when swapping starts/occurs/ends, ie it seems that around 12.5%-25% of the ram is preserved as Inactive by osx kernel logic. ^2. For instance http://apple.stackexchange.com/questions/4288/what-does-it-mean-if-i-have-lots-of-inactive-memory-at-the-end-of-a-work-day : Once all your memory is used (free memory is 0), the OS will write out inactive memory to the swapfile to make more room in active memory. UPDATE 17-mar-2012 Here is a round-up of the methods that have been suggested to help so far: The purge command "Used to approximate initial boot conditions with a cold disk buffer cache for performance analysis. It does not affect anonymous memory that has been allocated through malloc, vm_allocate, etc". This is useful to prevent osx to swap-out the disk cache (which is ridiculous that osx actually does so in the first place), but with the downside that the disk cache is released, meaning that if the disk cache was not about to be swapped out, one would simply end up with a cold disk buffer cache, probably affecting performance negatively. The FreeMemory app and/or Repairing disk permissions to force some Free memory Doesn't help releasing any memory, only moving some gigabytes of memory contents from ram to the hd. In the end, this causes lots of swap-ins when I attempt to use the applications that were open while freeing memory, as a lot of its vm is now on swap. Speeding up swap-allocation using dynamicpagerwrapper Seems a good thing to do in order to speed up swap-usage, but does not address the problem of osx swapping in the first place while there is still inactive memory. Disabling swap by disabling dynamicpager and restarting This will force osx not to use swap to the price of the system hanging when all memory is used. Not a viable alternative... Disabling swap using a hacked dynamicpager Similar to disabling dynamicpager above, some excerpts from the comments to the blog post indicate that this is not a viable solution: "The Inactive Memory is high as usual". "when your system is running out of memory, the whole os hangs...", "if you consume the whole amount of memory of the mac, the machine will likely hang" To sum up, I am still unaware of a way of disabling Mac OS X from using swap when there still is "Inactive" memory. If it isn't possible, maybe at least there is an explanation somewhere of why osx prefers to swap out memory that may be released from memory at any time?

Read the article
How do I make Linux recognize a new SATA /dev/sda drive I hot swapped in without rebooting?

- by Philip Durbin

Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized: [root@fs-2 ~]# tail -18 /var/log/messages May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake } May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset May 5 16:54:45 fs-2 kernel: ata1: soft resetting link May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:54:55 fs-2 kernel: ata1: soft resetting link May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:05 fs-2 kernel: ata1: soft resetting link May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps May 5 16:55:40 fs-2 kernel: ata1: soft resetting link May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up May 5 16:55:45 fs-2 kernel: ata1: EH complete I tried a couple things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh but they didn't work: [root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan -bash: echo: write error: Invalid argument [root@fs-2 ~]# [root@fs-2 ~]# /root/rescan-scsi-bus.sh -l [snip] 0 new device(s) found. 0 device(s) removed. [root@fs-2 ~]# [root@fs-2 ~]# ls /dev/sda ls: /dev/sda: No such file or directory I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting? The operating system in question is RHEL5.3: [root@fs-2 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.3 (Tikanga) The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS. Here is the lscpi output: [root@fs-2 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers to look more into the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding what exactly isn't supported hardware-wise in terms of hot swap for that system. Please let me know what other output besides lspci might be useful. The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output: [root@www-1 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)

Read the article
Macports irssi & perl5 installation issues

- by Dmitri DB

Long time reader, first time poster. Big, appreciative thanks for everyone's collective questioning and answering here and at stackoverflow, it's helped me quite a lot over the time I've been learning answers through these sites! Apologies in advance if I didn't search hard enough on posts already up on this site to find out what I could do about this issue, but I thought I'd just reach out for the sake of trying at least once. I've experienced this issue while starting up my macports-installed version of irssi: 13:25 -!- Irssi: Error in script dispatch: 13:25 Can't locate lib.pm in @INC (@INC contains: /opt/local/lib/perl5/site_perl/5.12.4/darwin-multi-2level /opt/local/lib/perl5/site_perl/5.12.4 /opt/local/lib/perl5/vendor_perl/5.12.4/darwin-multi-2level /opt/local/lib/perl5/vendor_perl/5.12.4 /opt/local/lib/perl5/5.12.4/darwin-multi-2level /opt/local/lib/perl5/5.12.4 /opt/local/lib/perl5/site_perl/5.12.3/darwin-multi-2level /opt/local/lib/perl5/site_perl/5.12.3 /opt/local/lib/perl5/site_perl /opt/local/lib/perl5/vendor_perl .) at (eval 18) line 1. 13:25 BEGIN failed--compilation aborted at (eval 18) line 1. 13:25 Huh, strange. I looked into it a bit: [email protected] /opt/local/lib/perl5 ?- find . -name "lib.pm" -ls 14673887 16 -r--r--r-- 1 root admin 6853 25 Jun 23:39 ./5.12.4/darwin-thread-multi- 2level/lib.pm [email protected] /opt/local/lib/perl5 ?- l 5.12.4/darwin-thread-multi-2level total 1864 drwxr-xr-x 55 root admin 1870 28 Jun 19:28 . drwxr-xr-x 158 root admin 5372 28 Jun 19:28 .. -rw-r--r-- 1 root admin 177814 25 Jun 23:39 .packlist drwxr-xr-x 6 root admin 204 28 Jun 19:28 B -r--r--r-- 1 root admin 25714 25 Jun 23:39 B.pm drwxr-xr-x 64 root admin 2176 28 Jun 19:28 CORE drwxr-xr-x 3 root admin 102 28 Jun 19:28 Compress -r--r--r-- 1 root admin 3000 25 Jun 23:39 Config.pm -r--r--r-- 1 root admin 228094 25 Jun 23:39 Config.pod -r--r--r-- 1 root admin 409 25 Jun 23:39 Config_git.pl -r--r--r-- 1 root admin 38759 25 Jun 23:39 Config_heavy.pl -r--r--r-- 1 root admin 21174 25 Jun 23:39 Cwd.pm -r--r--r-- 1 root admin 63535 25 Jun 23:39 DB_File.pm drwxr-xr-x 3 root admin 102 28 Jun 19:28 Data drwxr-xr-x 5 root admin 170 28 Jun 19:28 Devel drwxr-xr-x 4 root admin 136 28 Jun 19:28 Digest -r--r--r-- 1 root admin 25185 25 Jun 23:39 DynaLoader.pm drwxr-xr-x 22 root admin 748 28 Jun 19:28 Encode -r--r--r-- 1 root admin 29731 25 Jun 23:39 Encode.pm -r--r--r-- 1 root admin 6736 25 Jun 23:39 Errno.pm -r--r--r-- 1 root admin 5445 25 Jun 23:39 Fcntl.pm drwxr-xr-x 5 root admin 170 28 Jun 19:28 File drwxr-xr-x 3 root admin 102 28 Jun 19:28 Filter -r--r--r-- 1 root admin 1819 25 Jun 23:39 GDBM_File.pm drwxr-xr-x 4 root admin 136 28 Jun 19:28 Hash drwxr-xr-x 3 root admin 102 28 Jun 19:28 I18N drwxr-xr-x 11 root admin 374 28 Jun 19:28 IO -r--r--r-- 1 root admin 1404 25 Jun 23:39 IO.pm drwxr-xr-x 6 root admin 204 28 Jun 19:28 IPC drwxr-xr-x 4 root admin 136 28 Jun 19:28 List drwxr-xr-x 4 root admin 136 28 Jun 19:28 MIME drwxr-xr-x 3 root admin 102 28 Jun 19:28 Math -r--r--r-- 1 root admin 2519 25 Jun 23:39 NDBM_File.pm -r--r--r-- 1 root admin 4208 25 Jun 23:39 O.pm -r--r--r-- 1 root admin 15563 25 Jun 23:39 Opcode.pm -r--r--r-- 1 root admin 21011 25 Jun 23:39 POSIX.pm -r--r--r-- 1 root admin 58962 25 Jun 23:39 POSIX.pod drwxr-xr-x 5 root admin 170 28 Jun 19:28 PerlIO -r--r--r-- 1 root admin 2515 25 Jun 23:39 SDBM_File.pm drwxr-xr-x 4 root admin 136 28 Jun 19:28 Scalar -r--r--r-- 1 root admin 10837 25 Jun 23:39 Socket.pm -r--r--r-- 1 root admin 41003 25 Jun 23:39 Storable.pm drwxr-xr-x 4 root admin 136 28 Jun 19:28 Sys drwxr-xr-x 3 root admin 102 28 Jun 19:28 Text drwxr-xr-x 5 root admin 170 28 Jun 19:28 Time drwxr-xr-x 3 root admin 102 28 Jun 19:28 Unicode -r--r--r-- 1 root admin 14462 25 Jun 23:39 attributes.pm drwxr-xr-x 38 root admin 1292 28 Jun 19:28 auto -r--r--r-- 1 root admin 19892 25 Jun 23:39 encoding.pm -r--r--r-- 1 root admin 6853 25 Jun 23:39 lib.pm -r--r--r-- 1 root admin 11044 25 Jun 23:39 mro.pm -r--r--r-- 1 root admin 997 25 Jun 23:39 ops.pm -r--r--r-- 1 root admin 13945 25 Jun 23:39 re.pm drwxr-xr-x 3 root admin 102 28 Jun 19:28 threads -r--r--r-- 1 root admin 33283 25 Jun 23:39 threads.pm So, it sort of seems to me that the permissions which perl5 got installed with for these modules has gotten mixed up somehow? I'm not really a perl user beyond enjoying it for massive directory-recursive find/replace operations within text files, so I haven't much of an idea what the permissions here are supposed to look like, and I'm not really sure how to go about determining how macports has gone and installed perl this way when it's otherwise worked without failure for years now. Does anyone have any recommendations for the sanest path towards rectifying this issue? Also, is there any interesting reason as to why the macports default for the perl5 port installs 5.12.4, and not 5.16.0, which has to be explicitly installed via the perl5.16 port? Thanks again!

Read the article
Proper network configuration for a KVM guest to be on the same networks at the host

- by Steve Madsen

I am running a Debian Linux server on Lenny. Within it, I am running another Lenny instance using KVM. Both servers are externally available, with public IPs, as well as a second interface with private IPs for the LAN. Everything works fine, except the VM sees all network traffic as originating from the host server. I suspect this might have something to do with the iptables-based firewall I'm running on the host. What I'd like to figure out is: how to I properly configure the host's networking such that all of these requirements are met? Both host and VMs have 2 network interfaces (public and private). Both host and VMs can be independently firewalled. Ideally, VM traffic does not have to traverse the host firewall. VMs see real remote IP addresses, not the host's. Currently, the host's network interfaces are configured as bridges. eth0 and eth1 do not have IP addresses assigned to them, but br0 and br1 do. /etc/network/interfaces on the host: # The primary network interface auto br1 iface br1 inet static address 24.123.138.34 netmask 255.255.255.248 network 24.123.138.32 broadcast 24.123.138.39 gateway 24.123.138.33 bridge_ports eth1 bridge_stp off auto br1:0 iface br1:0 inet static address 24.123.138.36 netmask 255.255.255.248 network 24.123.138.32 broadcast 24.123.138.39 # Internal network auto br0 iface br0 inet static address 192.168.1.1 netmask 255.255.255.0 network 192.168.1.0 broadcast 192.168.1.255 bridge_ports eth0 bridge_stp off This is the libvirt/qemu configuration file for the VM: <domain type='kvm'> <name>apps</name> <uuid>636b6620-0949-bc88-3197-37153b88772e</uuid> <memory>393216</memory> <currentMemory>393216</currentMemory> <vcpu>1</vcpu> <os> <type arch='i686' machine='pc'>hvm</type> <boot dev='hd'/> </os> <features> <acpi/> <apic/> <pae/> </features> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/bin/kvm</emulator> <disk type='file' device='cdrom'> <target dev='hdc' bus='ide'/> <readonly/> </disk> <disk type='file' device='disk'> <source file='/raid/kvm-images/apps.qcow2'/> <target dev='vda' bus='virtio'/> </disk> <interface type='bridge'> <mac address='54:52:00:27:5e:02'/> <source bridge='br0'/> <model type='virtio'/> </interface> <interface type='bridge'> <mac address='54:52:00:40:cc:7f'/> <source bridge='br1'/> <model type='virtio'/> </interface> <serial type='pty'> <target port='0'/> </serial> <console type='pty'> <target port='0'/> </console> <input type='mouse' bus='ps2'/> <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/> </devices> </domain> Along with the rest of my firewall rules, the firewalling script includes this command to pass packets destined for a KVM guest: # Allow bridged packets to pass (for KVM guests). iptables -A FORWARD -m physdev --physdev-is-bridged -j ACCEPT (Not applicable to this question, but a side-effect of my bridging configuration appears to be that I can't ever shut down cleanly. The kernel eventually tells me "unregister_netdevice: waiting for br1 to become free" and I have to hard reset the system. Maybe a sign I've done something dumb?)

Read the article
Getting HAPROXY to redirect http to https in users browser session

- by Jon

We are currently using a Internet cloud provider to host our SaaS platform. The platform consists of a Firewall - Cloud Provider SLB - - Apache Web Server - HAPROXY SLB - Liferay Platform We have had to use HAPROXY because of an issue with the cloud providers SLB that meant we were unable to use it for load balancing the Liferay platform applications. I have implemented HAPROXY in our secure tier and that seems to do the trick of load balancing the requests quite adequately. However during testing we encountered a functional issue whereby selecting a sub-menu from the web portal resulted in the application hanging, using an http analyser we saw that the request being passed back to the users browser was in http, from discussing this with the software vendor it transpires that the Liferay application has some hard-coded http links, and that other customers have worked around this by using physical NLB's such as F5 and redirecting the http traffic to https. The entry in the HAPROXY logs reads: haproxy[2717]: haproxy[2717]: <Apache Web Agent>:37957 [11/Apr/2013:08:07:00.128] http-uapi uapi/<ServerName> 0/0/0/9/10 200 4912 - - ---- 4/2/1/2/0 0/0 "GET /servicedesk/controller?docommand=renderradform&!key=esd_sfb001_frm_feedback_forms_list&isportalintegratedmode=true&USR=joe.bloggs%40gmail.com&_dc=1365667773097&redirecturl=controller%3Fdocommand%3Drenderbody%26%21key%3DESD_SFB001_FRM_FEEDBACK_FORMS_LIST%26isportalintegratedmode%3Dtrue&sso_token=ALiYv2UqzLsAhSw1ZchRDlCHlq44Bhj9&ONERROR=%2Fweb%2Fjsp%2Fapps%2Fportal-integration-error.jsp&itype=login&slicetoken=NW51O%242aRo%2C_Zz%2476P_9DTtnFmz6%28bhk&AUTOFORWARDURL=controller%3Fdocommand%3Drenderbody%26%21key%3DESD_SFB001_FRM_FEEDBACK_FORMS_LIST%26isportalintegratedmode%3Dtrue&LOGINPAGE=https%3A%2F%2F<FQDN of Web Portal>%2Fweb%2F4732cf01-82c3-4bc5-b6c9-552253e672cf%2Fworkflow-tools&appid=1&!uid=1&!redownloadToken=7.0.3.1.1363611301.0&userlocale=en_US&!datechanged=2012-05-18%2015:05:31.38 HTTP/1.1" :37957 [11/Apr/2013:08:07:00.128] http-uapi uapi/<ServerName> 0/0/0/9/10 200 4912 - - ---- 4/2/1/2/0 0/0 "GET /servicedesk/controller?docommand=renderradform&!key=esd_sfb001_frm_feedback_forms_list&isportalintegratedmode=true&USR=joe.bloggs%40gmail.com&_dc=1365667773097&redirecturl=controller%3Fdocommand%3Drenderbody%26%21key%3DESD_SFB001_FRM_FEEDBACK_FORMS_LIST%26isportalintegratedmode%3Dtrue&sso_token=ALiYv2UqzLsAhSw1ZchRDlCHlq44Bhj9&ONERROR=%2Fweb%2Fjsp%2Fapps%2Fportal-integration-error.jsp&itype=login&slicetoken=NW51O%242aRo%2C_Zz%2476P_9DTtnFmz6%28bhk&AUTOFORWARDURL=controller%3Fdocommand%3Drenderbody%26%21key%3DESD_SFB001_FRM_FEEDBACK_FORMS_LIST%26isportalintegratedmode%3Dtrue&LOGINPAGE=https%3A%2F%2F<FQDN of Web Portal>%2Fweb%2F4732cf01-82c3-4bc5-b6c9-552253e672cf%2Fworkflow-tools&appid=1&!uid=1&!redownloadToken=7.0.3.1.1363611301.0&userlocale=en_US&!datechanged=2012-05-18%2015:05:31.38 HTTP/1.1" The corresponding HTTP browser entry shows: http://<FQDN of ServiceDesk>/servicedesk/controller?docommand=renderradform&!key=esd_org019_frm_contact_list&isportalintegratedmode=true&USR=joe.bloggs%40gmail.com&_dc=1365665987887&redirecturl=controller%3Fdocommand%3Drenderbody%26%21key%3DESD_ORG019_FRM_CONTACT_LIST%26isportalintegratedmode%3Dtrue&sso_token=3NxsXYORMPp32SwL8ftVUCMH2QdWLH82&ONERROR=%2Fweb%2Fjsp%2Fapps%2Fportal-integration-error.jsp&itype=login&slicetoken=NW51O%242aRo%2C_Zz%2476P_9DTtnFmz6%28bhk&AUTOFORWARDURL=controller%3Fdocommand%3Drenderbody%26%21key%3DESD_ORG019_FRM_CONTACT_LIST%26isportalintegratedmode%3Dtrue&LOGINPAGE=https%3A%2F%2F<FQDN of Web Portal>>%2Fweb%2F4732cf01-82c3-4bc5-b6c9-552253e672cf%2Fapplication-setup&appid=1&!uid=1&!redownloadToken=7.0.3.1.1363611301.0&userlocale=en_US&!datechanged=2012-10-26%2019:00:25.08 From reading through the forums and other sites it looks like we should be use to use HAPROXY to redirect the traffic to https, but try as I might I cant get it to work. This is our HAPROXY configuration: global log 127.0.0.1 local2 chroot /var/lib/haproxy pidfile /var/run/haproxy.pid maxconn 4000 user haproxy group haproxy daemon stats socket /var/lib/haproxy/stats defaults mode http log global option httplog option dontlognull option http-server-close option forwardfor except 127.0.0.0/8 option redispatch retries 3 timeout http-request 10s timeout queue 1m timeout connect 10s timeout client 1m timeout server 1m timeout http-keep-alive 10s timeout check 10s maxconn 3000 frontend http-openfire bind *:7070 default_backend openfire backend openfire balance roundrobin server <serverName> <IPv4 Address>:7070 check server <serverName> <IPv4 Address>:7070 check frontend http-uapi bind *:7080 default_backend uapi backend uapi balance roundrobin server <serverName> <IPv4 Address>:7080 check server <serverName> <IPv4 Address>:7080 check frontend http-sec bind *:8080 default_backend sec backend sec balance roundrobin server <serverName> <IPv4 Address>:8080 check server <serverName> <IPv4 Address>:8080 check frontend http-wall bind *:9080 default_backend wall backend wall balance roundrobin server <serverName> <IPv4 Address>:9080 check server <serverName> <IPv4 Address>:9080 check frontend http-xmpp bind *:9090 default_backend xmpp backend xmpp balance roundrobin server <serverName> <IPv4 Address>:9090 check server <serverName> <IPv4 Address>:9090 check frontend http-aim bind *:10080 default_backend aim backend aim balance roundrobin server <serverName> <IPv4 Address>:10080 check server <serverName> <IPv4 Address>:10080 check frontend http-servicedesk bind *:8081 default_backend servicedesk backend servicedesk balance roundrobin server <serverName> <IPv4 Address>:8081 check server <serverName> <IPv4 Address>:8081 check listen stats :1936 mode http stats enable stats hide-version stats realm Haproxy\ Statistics stats uri / stats auth haproxy:<Password> I have tried following the articles listed posted on http://stackoverflow.com/questions/13227544/haproxy-redirecting-http-to-https-ssl and http://parsnips.net/haproxy-http-to-https-redirect/ but that hasn't made any difference. Am I on the right track with this or are we trying to achieve the impossible?, I'm hoping I'm just being an idiot and one of you good people can point me in the right direction.

Read the article
Unicast traffic between hosts on a switch leaving the switch by its uplink. Why?

- by Rich Lafferty

I have a weird thing happening on our network at my office which I can't quite get my head around. In particular I can't tell if it's a problem with a switch, or a problem with configuration. We have a Cisco SG300-52 switch (sw01) in the top of a rack in our server room, connected to another SG300-28 that acts as our core switch (core01). Both run layer 2 only, our firewalls do routing between VLANs. They have a dozen or so VLANs between them. Gi1 on sw01 is a trunk port connected to gi1 on core01. (Disclosure: There are other switches in our environment but I'm pretty sure I've isolated the problem down to these two. Happy to provide more info if necessary.) The behaviour I'm seeing is limited to one VLAN, vlan 12 -- or, at least, it's not happening on the other ones I checked (It's hard to guarantee the absence of packets), and it is: sw01 is forwarding, to core01, traffic which is between two hosts which are both plugged into sw01. (I noticed this because the IDS in our firewall gave a false positive on traffic which should not reach the firewall.) We noticed this mostly between our two dhcp/dns servers, net01 (10.12.0.10) and net02 (10.12.0.11). net01 is physical hardware and net02 is on a VMware ESX server. net01 is connected to gi44 on sw01 and net02's ESX server to gi11. [net01]----gi44-[sw01]-gi1----gi1-[core01] [net02]----gi11/ Let's see some interfaces! Remember, vlan 12 is the problem vlan. Of the others I explicitly verified that vlan 27 was not affected. Here's the two hosts' ports: esx01 contains net02. sw01#sh run int gi11 interface gigabitethernet11 description esx01 lldp med disable switchport trunk allowed vlan add 5-7,11-13,100 switchport trunk native vlan 27 ! sw01#sh run int gi44 interface gigabitethernet44 description net01-1 lldp med disable switchport mode access switchport access vlan 12 ! Here's the trunk on sw01. sw01#sh run int gi1 interface gigabitethernet1 description "trunk to core01" lldp med disable switchport trunk allowed vlan add 4-7,11-13,27,100 ! And the other end of the trunk on core01. interface gigabitethernet1 description sw01 macro description switch switchport trunk allowed vlan add 2-7,11-16,27,100 ! I have a monitor port on core01, thus: core01#sh run int gi12 interface gigabitethernet12 description "monitor port" port monitor GigabitEthernet 1 ! And the monitor port on core01 sees unicast traffic going between net01 and net02, both of which are on sw01! I've verified this with a monitor port on sw01 that sees the net01-net02 unicast traffic leaving via gi1 too. sw01 knows that both of those hosts are on ports that are not its trunk port: :) ratchet$ arp -a | grep net net02.2ndsiteinc.com (10.12.0.11) at 00:0C:29:1A:66:15 [ether] on eth0 net01.2ndsiteinc.com (10.12.0.10) at 00:11:43:D8:9F:94 [ether] on eth0 sw01#sh mac addr addr 00:0C:29:1A:66:15 Aging time is 300 sec Vlan Mac Address Port Type -------- --------------------- ---------- ---------- 12 00:0c:29:1a:66:15 gi11 dynamic sw01#sh mac addr addr 00:11:43:D8:9F:94 Aging time is 300 sec Vlan Mac Address Port Type -------- --------------------- ---------- ---------- 12 00:11:43:d8:9f:94 gi44 dynamic I also brought up an unused port on sw01 on vlan 12, but the unicast traffic was (as best as I could tell) not coming out that port. So it doesn't look like sw01 is pushing it out all its ports, just the right ports and also gi1! I've verified that sw01 is not filling up its address-table: sw01#sh mac addr count This may take some time. Capacity : 8192 Free : 7983 Used : 208 The full configs for both core01 and sw01 are available: core01, sw01. Finally, versions: sw01#sh ver SW version 1.1.2.0 ( date 12-Nov-2011 time 23:34:26 ) Boot version 1.0.0.4 ( date 08-Apr-2010 time 16:37:57 ) HW version V01 core01#sh ver SW version 1.1.2.0 ( date 12-Nov-2011 time 23:34:26 ) Boot version 1.1.0.6 ( date 11-May-2011 time 18:31:00 ) HW version V01 So my understanding is this: sw01 should take unicast traffic for net01 and send it only out net02's port, and vice versa; none of it should go out sw01's uplink. But core01, receiving traffic on gi1 for a host it knows is on gi1, is right in sending it out all of its ports. (That is: sw01 is misbehaving, but core01 is doing what it should given the circumstances.) My question is: Why is sw01 sending that unicast traffic out its uplink, gi1? (And pre-emptively: yes, I know SG300s leave much to be desired, and yes, we should have spanning-tree enabled, but that's where I'm at right now.)

Read the article
Two network interfaces and two IP addresses on the same subnet in Linux

- by Scott Duckworth

I recently ran into a situation where I needed two IP addresses on the same subnet assigned to one Linux host so that we could run two SSL/TLS sites. My first approach was to use IP aliasing, e.g. using eth0:0, eth0:1, etc, but our network admins have some fairly strict settings in place for security that squashed this idea: They use DHCP snooping and normally don't allow static IP addresses. Static addressing is accomplished by using static DHCP entries, so the same MAC address always gets the same IP assignment. This feature can be disabled per switchport if you ask and you have a reason for it (thankfully I have a good relationship with the network guys and this isn't hard to do). With the DHCP snooping disabled on the switchport, they had to put in a rule on the switch that said MAC address X is allowed to have IP address Y. Unfortunately this had the side effect of also saying that MAC address X is ONLY allowed to have IP address Y. IP aliasing required that MAC address X was assigned two IP addresses, so this didn't work. There may have been a way around these issues on the switch configuration, but in an attempt to preserve good relations with the network admins I tried to find another way. Having two network interfaces seemed like the next logical step. Thankfully this Linux system is a virtual machine, so I was able to easily add a second network interface (without rebooting, I might add - pretty cool). A few keystrokes later I had two network interfaces up and running and both pulled IP addresses from DHCP. But then the problem came in: the network admins could see (on the switch) the ARP entry for both interfaces, but only the first network interface that I brought up would respond to pings or any sort of TCP or UDP traffic. After lots of digging and poking, here's what I came up with. It seems to work, but it also seems to be a lot of work for something that seems like it should be simple. Any alternate ideas out there? Step 1: Enable ARP filtering on all interfaces: # sysctl -w net.ipv4.conf.all.arp_filter=1 # echo "net.ipv4.conf.all.arp_filter = 1" >> /etc/sysctl.conf From the file networking/ip-sysctl.txt in the Linux kernel docs: arp_filter - BOOLEAN 1 - Allows you to have multiple network interfaces on the same subnet, and have the ARPs for each interface be answered based on whether or not the kernel would route a packet from the ARP'd IP out that interface (therefore you must use source based routing for this to work). In other words it allows control of which cards (usually 1) will respond to an arp request. 0 - (default) The kernel can respond to arp requests with addresses from other interfaces. This may seem wrong but it usually makes sense, because it increases the chance of successful communication. IP addresses are owned by the complete host on Linux, not by particular interfaces. Only for more complex setups like load- balancing, does this behaviour cause problems. arp_filter for the interface will be enabled if at least one of conf/{all,interface}/arp_filter is set to TRUE, it will be disabled otherwise Step 2: Implement source-based routing I basically just followed directions from http://lartc.org/howto/lartc.rpdb.multiple-links.html, although that page was written with a different goal in mind (dealing with two ISPs). Assume that the subnet is 10.0.0.0/24, the gateway is 10.0.0.1, the IP address for eth0 is 10.0.0.100, and the IP address for eth1 is 10.0.0.101. Define two new routing tables named eth0 and eth1 in /etc/iproute2/rt_tables: ... top of file omitted ... 1 eth0 2 eth1 Define the routes for these two tables: # ip route add default via 10.0.0.1 table eth0 # ip route add default via 10.0.0.1 table eth1 # ip route add 10.0.0.0/24 dev eth0 src 10.0.0.100 table eth0 # ip route add 10.0.0.0/24 dev eth1 src 10.0.0.101 table eth1 Define the rules for when to use the new routing tables: # ip rule add from 10.0.0.100 table eth0 # ip rule add from 10.0.0.101 table eth1 The main routing table was already taken care of by DHCP (and it's not even clear that its strictly necessary in this case), but it basically equates to this: # ip route add default via 10.0.0.1 dev eth0 # ip route add 130.127.48.0/23 dev eth0 src 10.0.0.100 # ip route add 130.127.48.0/23 dev eth1 src 10.0.0.101 And voila! Everything seems to work just fine. Sending pings to both IP addresses works fine. Sending pings from this system to other systems and forcing the ping to use a specific interface works fine (ping -I eth0 10.0.0.1, ping -I eth1 10.0.0.1). And most importantly, all TCP and UDP traffic to/from either IP address works as expected. So again, my question is: is there a better way to do this? This seems like a lot of work for a seemingly simple problem.

Read the article
How do I make Linux recognize a new SATA /dev/sda drive I hot swapped in without rebooting?

- by Philip Durbin

Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized: [root@fs-2 ~]# tail -18 /var/log/messages May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake } May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset May 5 16:54:45 fs-2 kernel: ata1: soft resetting link May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:54:55 fs-2 kernel: ata1: soft resetting link May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:05 fs-2 kernel: ata1: soft resetting link May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps May 5 16:55:40 fs-2 kernel: ata1: soft resetting link May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up May 5 16:55:45 fs-2 kernel: ata1: EH complete I tried a couple things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh but they didn't work: [root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan -bash: echo: write error: Invalid argument [root@fs-2 ~]# [root@fs-2 ~]# /root/rescan-scsi-bus.sh -l [snip] 0 new device(s) found. 0 device(s) removed. [root@fs-2 ~]# [root@fs-2 ~]# ls /dev/sda ls: /dev/sda: No such file or directory I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting? The operating system in question is RHEL5.3: [root@fs-2 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.3 (Tikanga) The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS. Here is the lscpi output: [root@fs-2 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers to look more into the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding what exactly isn't supported hardware-wise in terms of hot swap for that system. Please let me know what other output besides lspci might be useful. The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output: [root@www-1 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)

Read the article
Installing Windows on HP Proliant Servers without SmartStart

- by Fitzroy

I have a PXE server for deploying Windows XP and Windows 7 to workstations. The process is as follows: Boot the workstation from the NIC. Workstation sends a DHCP request. DHCP server responds with an IP address and the location of the PXE server. Workstation downloads WinPE image file from PXE server via TFTP Workstation stores WinPE image file in memory and executes it. Once booted into WinPE, I connect to a network share to gain access to either the Windows XP or Windows 7 installation files. A custom script is launched to guide you through the process of formatting and partitioning the hard drive(s) (using DISKPART and FORMAT). Another custom script asks for details such as the hostname to assign to the workstation. The answers provided are used to build an unattended answer file (SIF [Setup Information File] for WinXP and XML for Win7). The Windows setup EXE is launched, passing the unattended answer file to it as a parameter. The Windows XP and Windows 7 installation sources have been customised to include the drivers for our Dell workstations. They also run a number of scripts upon first booting up to install software packages. This process works very well for our workstations and I would now like to use it for building our servers too. The vast majority of our servers are HP Proliant DL360 G6, DL380 G5 and DL380 G6. They’re running Windows Server 2003 (various editions) or 2008 (various editions). To date, we have always built the HP Proliant servers using the SmartStart CD provided. SmartStart does three useful things for us: Setup RAID with HP Array Configuration Utility (ACU). Installs and configures SNMP Installs various HP Tools for Windows (HP Array Configuration Utility, HP Array Diagnostic Utility, HP Proliant Integrated Management Log Viewer, etc) Using SmartStart I have never had to manually download and install Windows drivers for network, sound, video, etc. I'm not sure if this is because SmartStart copies drivers from the CD during setup, or whether Windows just has the drivers natively in its driver CAB. If I abandon the SmartStart CD in favour of my PXE server I would have to do the following: As I wont have access to ACU, I'll configure the RAID (before booting to the PXE server) by pressing F8 (during the boot process) to access Option ROM Configuration for Arrays (ORCA). Installation of SNMP and the HP Tools will have to be installed once the Windows installation is complete using the Proliant Support Pack. Is this method OK? Is there anything that the SmartStart CD does that I'll be unable to do by other means? Are there any disadvantages to not using the SmartStart CD? Many thanks. UPDATE 05/01/12 I’ve been reading through the SmartStart Scripting Toolkit documentation. The scripting toolkit contains command line tools which work within WinPE and can such things as configure BIOS settings, configure an array and setup ILO. I’m personally not too bothered about configuring BIOS settings as I rarely deviate from the defaults (unless the server is to be a Hyper-V host). I’m not too fussed about being able to configure the array from within WinPE, as I’m happy to just press F8 and use Option ROM Configuration for Arrays (ORCA). Although, if it’s easy enough to do, I will explore this further, as it saves time if everything can be configured from within WinPE. One of the nice features all the tools possess is that you can pass input files to them. EG. Configure one server to your requirements, capture its configuration to a file (using the appropriate tool), you can then use the tool on other servers passing the input file with the captured configuration. Array controller drivers appear to be included with the toolkit along with example of how to incorporate them within a WinPE build. I suppose WinPE won’t be able to see logical volumes (I.E 2x physical disks in a RAID 1 configuration) without the array controller drivers? I mentioned in my post that SmartStart normally installs a bunch of Windows HP tools for you. I’ve had a look today, and if you run the SmartStart CD from within Windows all the tools can be installed. Therefore I can do this after the Windows installation is complete. The SmartStart CD appears to contain a lot Windows drivers. I can customise my Windows 2008 source to incorporate these drivers. However, I understand that incorporating an array controller driver is a little different to most drivers. I believe that you have to provide the driver during the very early stages of the Windows setup. I’m working through the Scripting Toolkit documentation to try and work this out...

Read the article
VFS: file-max limit 1231582 reached

- by Rick Koshi

I'm running a Linux 2.6.36 kernel, and I'm seeing some random errors. Things like ls: error while loading shared libraries: libpthread.so.0: cannot open shared object file: Error 23 Yes, my system can't consistently run an 'ls' command. :( I note several errors in my dmesg output: # dmesg | tail [2808967.543203] EXT4-fs (sda3): re-mounted. Opts: (null) [2837776.220605] xv[14450] general protection ip:7f20c20c6ac6 sp:7fff3641b368 error:0 in libpng14.so.14.4.0[7f20c20a9000+29000] [4931344.685302] EXT4-fs (md16): re-mounted. Opts: (null) [4982666.631444] VFS: file-max limit 1231582 reached [4982666.764240] VFS: file-max limit 1231582 reached [4982767.360574] VFS: file-max limit 1231582 reached [4982901.904628] VFS: file-max limit 1231582 reached [4982964.930556] VFS: file-max limit 1231582 reached [4982966.352170] VFS: file-max limit 1231582 reached [4982966.649195] top[31095]: segfault at 14 ip 00007fd6ace42700 sp 00007fff20746530 error 6 in libproc-3.2.8.so[7fd6ace3b000+e000] Obviously, the file-max errors look suspicious, being clustered together and recent. # cat /proc/sys/fs/file-max 1231582 # cat /proc/sys/fs/file-nr 1231712 0 1231582 That also looks a bit odd to me, but the thing is, there's no way I have 1.2 million files open on this system. I'm the only one using it, and it's not visible to anyone outside the local network. # lsof | wc 16046 148253 1882901 # ps -ef | wc 574 6104 44260 I saw some documentation saying: file-max & file-nr: The kernel allocates file handles dynamically, but as yet it doesn't free them again. The value in file-max denotes the maximum number of file- handles that the Linux kernel will allocate. When you get lots of error messages about running out of file handles, you might want to increase this limit. Historically, the three values in file-nr denoted the number of allocated file handles, the number of allocated but unused file handles, and the maximum number of file handles. Linux 2.6 always reports 0 as the number of free file handles -- this is not an error, it just means that the number of allocated file handles exactly matches the number of used file handles. Attempts to allocate more file descriptors than file-max are reported with printk, look for "VFS: file-max limit reached". My first reading of this is that the kernel basically has a built-in file descriptor leak, but I find that very hard to believe. It would imply that any system in active use needs to be rebooted every so often to free up the file descriptors. As I said, I can't believe this would be true, since it's normal to me to have Linux systems stay up for months (even years) at a time. On the other hand, I also can't believe that my nearly-idle system is holding over a million files open. Does anyone have any ideas, either for fixes or further diagnosis? I could, of course, just reboot the system, but I don't want this to be a recurring problem every few weeks. As a stopgap measure, I've quit Firefox, which was accounting for almost 2000 lines of lsof output (!) even though I only had one window open, and now I can run 'ls' again, but I doubt that will fix the problem for long. (edit: Oops, spoke too soon. By the time I finished typing out this question, the symptom was/is back) Thanks in advance for any help. And another update: My system was basically unusable, so I decided I had no option but to reboot. But before I did, I carefully quit one process at a time, checking /proc/sys/fs/file-nr after each termination. I found that, predictably, the number of open files gradually went down as I closed things down. Unfortunately, it wasn't a large effect. Yes, I was able to clear up 5000-10000 open files, but there were still over 1.2 million left. I shut down just about everything. All interactive shells, except for the one ssh I left open to finish closing down, httpd, even nfs service. Basically everything in the process table that wasn't a kernel process, and there were still an appalling number of files apparently left open. After the reboot, I found that /proc/sys/fs/file-nr showed about 2000 files open, which is much more reasonable. Starting up 2 Xvnc sessions as usual, along with the dozen or so monitoring windows I like to keep open, brought the total up to about 4000 files. I can see nothing wrong with that, of course, but I've obviously failed to identify the root cause. I'm still looking for ideas, since I definitely expect it to happen again. And another update, the next day: I watched the system carefully, and discovered that /proc/sys/fs/file-nr showed a growth of about 900 open files per hour. I shut down the system's only NFS client for the night, and the growth stopped. Mind you, it didn't free up the resources, but it did at least stop consuming more. Is this a known bug with NFS? I'll be bringing the NFS client back online today, and I'll narrow it down further. If anyone is familiar with this behavior, feel free to jump in with "Yeah, NFS4 has this problem, go back to NFS3" or something like that.

Read the article
Mount TMPFS instead of ro /dev

- by schiggn

I am working on a ARM-Based embedded system with a custom Debian Linux based on kernel 2.6.31. In the final system, the Root file system is stored as squashfs on flash. Now, the folder /dev is created by udev, but since there is no hot plugging functionality needed and booting time is critical, I wanted to delete udev and "hard code" the /dev folder (read here, page 5). because i still need to change parameters of the devices (with ioctl /sysfs) this does not work for me in this case. so i thought of mounting a tmpfs on /dev and change the parameters there. is this possible? and how to do best? my approach would be: delete /dev from RFS create tar containing basic devices mount tmpfs /dev untar tar-file into /dev change parameters Could this work? Do you see any problems? I found out, that you can mount on top of already mounted mount point, is it somehow possible just to take data with while mounting the new file system? if so that would be very convenient! Thanks Update: I just tried that out, but I'm stuck at a certain point. I packed all my devices into devices.tar, packed it into /usr of my squashfs and added the following lines to mountkernfs.sh, which is executed right after INIT. #mount /dev on tmpfs echo -n "Mounting /dev on tmpfs..." mount -o size=5M,mode=0755 -t tmpfs tmpfs /dev mknod -m 600 /dev/console c 5 1 mknod -m 600 /dev/null c 1 3 echo "done." echo -n "Populating /dev..." tar -xf /usr/devices.tar -C /dev echo "done." This works fine on the version over NFS, if I place printf's in the code, I can see it executing, if I comment out the extracting part, its complaining about missing devices. Booting OK mmc0: new high speed SDHC card at address 0007 mmcblk0: mmc0:0007 SD04G 3.67 GiB mmcblk0: p1 IP-Config: Unable to set interface netmask (-22). Looking up port of RPC 100003/2 on 192.168.1.234 Looking up port of RPC 100005/1 on 192.168.1.234 VFS: Mounted root (nfs filesystem) on device 0:14. Freeing init memory: 136K INIT: version 2.86 booting Mounting /dev on tmpfs...done. Populating /dev...done. Initializing /var...done. Setting the system clock. System Clock set to: Thu Sep 13 11:26:23 UTC 2012. INIT: Entering runlevel: 2 UBI: attaching mtd8 to ubi0 Commenting out the extraction of the tar mmc0: new high speed SDHC card at address 0007 mmcblk0: mmc0:0007 SD04G 3.67 GiB mmcblk0: p1 IP-Config: Unable to set interface netmask (-22). Looking up port of RPC 100003/2 on 192.168.1.234 Looking up port of RPC 100005/1 on 192.168.1.234 VFS: Mounted root (nfs filesystem) on device 0:14. Freeing init memory: 136K INIT: version 2.86 booting Mounting /dev on tmpfs...done. Populating /dev...done. Initializing /var...done. Setting the system clock. Cannot access the Hardware Clock via any known method. Use the --debug option to see the details of our search for an access method. Unable to set System Clock to: Thu Sep 13 12:24:00 UTC 2012 ... (warning). INIT: Entering runlevel: 2 libubi: error!: cannot open "/dev/ubi_ctrl" So far so good. But if I pack the whole story into a squashfs and boot from there, it is acting strange. It's telling me while booting that it is unable to open an initial console and its throwing errors on mounting the UBIFS devices, but finally provides a login anyway. Over that my echo's are not executed. If I then log in, /dev is mounted as TMPFS as desired and all the devices reside inside. When I redo the "mount" command to mount the UBIFS partitions it is executed whitout problem and useable. From squashfs VFS: Mounted root (squashfs filesystem) readonly on device 31:15. Freeing init memory: 136K Warning: unable to open an initial console. mmc0: new high speed SDHC card at address 0007 mmcblk0: mmc0:0007 SD04G 3.67 GiB mmcblk0: p1 UBIFS error (pid 484): ubifs_get_sb: cannot open "ubi1_0", error -19 Additionally, a part of the rest of the bootscripts is still exexuted, but not all of them. Does anyone has a clue why? Other question, is 5MB enough/too much for /dev?

Read the article
Cluster failover and strange gratuitous arp behavior

- by lazerpld

I am experiencing a strange Windows 2008R2 cluster related issue that is bothering me. I feel that I have come close as to what the issue is, but still don't fully understand what is happening. I have a two node exchange 2007 cluster running on two 2008R2 servers. The exchange cluster application works fine when running on the "primary" cluster node. The problem occurs when failing over the cluster ressource to the secondary node. When failing over the cluster to the "secondary" node, which for instance is on the same subnet as the "primary", the failover initially works ok and the cluster ressource continues to work for a couple of minutes on the new node. Which means that the recieving node does send out a gratuitous arp reply packet that updated the arp tables on the network. But after x amount of time (typically within 5 minutes time) something updates the arp-tables again because all of a sudden the cluster service does not answer to pings. So basically I start a ping to the exchange cluster address when its running on the "primary node". It works just great. I failover the cluster ressource group to the "secondary node" and I only have loss of one ping which is acceptable. The cluster ressource still answers for some time after being failed over and all of a sudden the ping starts timing out. This is telling me that the arp table initially is updated by the secondary node, but then something (which I haven't found out yet) wrongfully updates it again, probably with the primary node's MAC. Why does this happen - has anyone experienced the same problem? The cluster is NOT running NLB and the problem stops immidiately after failing over back to the primary node where there are no problems. Each node is using NIC teaming (intel) with ALB. Each node is on the same subnet and has gateway and so on entered correctly as far as I am concerned. Edit: I was wondering if it could be related to network binding order maybe? Because I have noticed that the only difference I can see from node to node is when showing the local arp table. On the "primary" node the arp table is generated on the cluster address as the source. While on the "secondary" its generated from the nodes own network card. Any input on this? Edit: Ok here is the connection layout. Cluster address: A.B.6.208/25 Exchange application address: A.B.6.212/25 Node A: 3 physical nics. Two teamed using intels teaming with the address A.B.6.210/25 called public The last one used for cluster traffic called private with 10.0.0.138/24 Node B: 3 physical nics. Two teamed using intels teaming with the address A.B.6.211/25 called public The last one used for cluster traffic called private with 10.0.0.139/24 Each node sits in a seperate datacenter connected together. End switches being cisco in DC1 and NEXUS 5000/2000 in DC2. Edit: I have been testing a little more. I have now created an empty application on the same cluster, and given it another ip address on the same subnet as the exchange application. After failing this empty application over, I see the exact same problem occuring. After one or two minutes clients on other subnets cannot ping the virtual ip of the application. But while clients on other subnets cannot, another server from another cluster on the same subnet has no trouble pinging. But if i then make another failover to the original state, then the situation is the opposite. So now clients on same subnet cannot, and on other they can. We have another cluster set up the same way and on the same subnet, with the same intel network cards, the same drivers and same teaming settings. Here we are not seeing this. So its somewhat confusing. Edit: OK done some more research. Removed the NIC teaming of the secondary node, since it didnt work anyway. After some standard problems following that, I finally managed to get it up and running again with the old NIC teaming settings on one single physical network card. Now I am not able to reproduce the problem described above. So it is somehow related to the teaming - maybe some kind of bug? Edit: Did some more failing over without being able to make it fail. So removing the NIC team looks like it was a workaround. Now I tried to reestablish the intel NIC teaming with ALB (as it was before) and i still cannot make it fail. This is annoying due to the fact that now i actually cannot pinpoint the root of the problem. Now it just seems to be some kind of MS/intel hick-up - which is hard to accept because what if the problem reoccurs in 14 days? There is a strange thing that happened though. After recreating the NIC team I was not able to rename the team to "PUBLIC" which the old team was called. So something has not been cleaned up in windows - although the server HAS been restarted! Edit: OK after restablishing the ALB teaming the error came back. So I am now going to do some thorough testing and i will get back with my observations. One thing is for sure. It is related to Intel 82575EB NICS, ALB and Gratuitous Arp.

Read the article
Bad disks in ancient server

- by Joel Coel

I have a 1998-era Netware 3.12 server that runs everything on our campus: general ledger, purchasing, payroll, student information, grades, you name it. The server has an Adaptec RAID controller with two volumes: RAID 1, 2 17GB scsi disks, Seagate ST318417W RAID 5, 3 4GB scsi disks, 2 Seagate ST34573W and 1 ST34572W. We are currently in the early stages of a project to replace this system, but you don't just jump into a new system like that and so I need to keep this server running until at least November 2011. This week we had not one but two hard drives fail. Thankfully they are from different volumes and we're able to keep running for the moment, but given the close nature of these failures I have serious doubts that I'll be able to avoid catastrophic failure from this server through the November target as is without restoring the RAID redundancy — it'll only take one more drive failure anywhere and I'm completely hosed. We are fortunate enough to have exact match "spares" lying around for both drives, but the spares are in unknown condition. I tried swapping just them in, but the RAID controller isn't smart enough to handle this and it renders the system unbootable. As for the RAID controller itself, there is utility I can get into during POST via a Ctrl-A shortcut, but I can't do much useful from there. To actually manage volumes I must first boot in to Netware, at which point I can use CI/O Array Management Software Version 2.0 to actually look at volume information. I suspect that the normal way to manage things is to boot from a special floppy with the controller software on it, but that floppy is long gone. Going through the options in the RAID software, I think the only supported way to replace a disk in an existing RAID volume is to physically add the disk, boot up and configure it as a "spare" for a volume, force the volume to use the spare to replace an existing down disk (and at this point I'm only guessing) so that the down disk becomes the spare, repair the volume, remove the spare from the volume, and then shut down and remove the disk. Then start all over for the other failed disk. All this amounts to a lot of downtime, assuming I can even make it work and that my spares are any good. As for finding reliable spares, I have no clue where to even begin looking to find a new 4GB scsi drive, or even which exact scsi system I'm looking for, as it's gone through a few different iterations over time. Another option is to migrate this to a virtual machine (hyper-v), but all previous attempts we've made in this area have failed to get very far. When this machine was installed I was just graduating from high school, and so it requires lower level knowledge of netware and dos than I ever developed, or if I did have since forgotten (I'm not exactly a dos neophyte, either). Part of my problem is this is a high-use server, and taking it down for a few days to figure things out isn't gonna fly very well. As for the question, I'm looking for anything that might be helpful in this situation: a recommendation on a place to find good spares from this era, personal experience repairing RAID volumes using a similar controller or building a hyper-v vm from an old netware server, a line on a floppy with better software for the RAID controller, recommendation on a good Novell consultant in Nebraska that would be able to put things right, a whole other option I haven't considered yet, etc. Update: For backups, we have good (recently verified via restore) backups of the data only -- nothing for the software that actually runs things. Update 2: Just a progress report that I currently have a working Netware 3.12 install in VMWare Virtual Server 2.0, thanks largely to the guide I found here: http://cerbulescubogdan.blogspot.com/2010/11/novell-netware-312-on-vmware.html The next steps are preparing empty netware volumes to match the additional volumes on my existing server, taking a dump of everything on the C:\ drive and netware volumes on my existing server, and figuring out from that information what modules need added to netware, installing my licenses (we do still have that disk, if it's any good), and moving data over. I have approval to bring the server down for a week after the first of the year (sadly not before), so, aside from creating empty volumes, the rest of the work will have to wait until then. Final Update (Jan 5, 2011): I was able to get spares working in both raid arrays without data loss this week. Both are now listed by the controller as "FAULT TOLLERANT" (yay!). I was also able to build on the progress from my last update and now have a functional "spare" server in VMWare Server 2.0. The spare can run and use our erp software, but I can't put it into production because I can't (yet) print from that box (and I have no idea why). Even so, this VM will do in a pinch if I have no other choice, and between it and the repaired RAID arrays I'm comfortable pushing on until I can junk the machine in November.

Read the article
Bridging LXC containers to host eth0 so they can have a public IP

- by Vianney Stroebel

UPDATE: I found the solution there: http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge#No_traffic_gets_trough_.28except_ARP_and_STP.29 # cd /proc/sys/net/bridge # ls bridge-nf-call-arptables bridge-nf-call-iptables bridge-nf-call-ip6tables bridge-nf-filter-vlan-tagged # for f in bridge-nf-*; do echo 0 $f; done But I'd like to have expert opinions on this: is it safe to disable all bridge-nf-*? What are they here for? END OF UPDATE I need to bridge LXC containers to the physical interface (eth0) of my host, reading numerous tutorials, documents and blog posts on the subject. I need the containers to have their own public IP (which I've previously done KVM/libvirt). After two days of searching and trying, I still can't make it work with LXC containers. The host runs a freshly installed Ubuntu Server Quantal (12.10) with only libvirt (which I'm not using here) and lxc installed. I created the containers with : lxc-create -t ubuntu -n mycontainer So they also run Ubuntu 12.10. Content of /var/lib/lxc/mycontainer/config is: lxc.utsname = mycontainer lxc.mount = /var/lib/lxc/test/fstab lxc.rootfs = /var/lib/lxc/test/rootfs lxc.network.type = veth lxc.network.flags = up lxc.network.link = br0 lxc.network.name = eth0 lxc.network.veth.pair = vethmycontainer lxc.network.ipv4 = 179.43.46.233 lxc.network.hwaddr= 02:00:00:86:5b:11 lxc.devttydir = lxc lxc.tty = 4 lxc.pts = 1024 lxc.arch = amd64 lxc.cap.drop = sys_module mac_admin mac_override lxc.pivotdir = lxc_putold # uncomment the next line to run the container unconfined: #lxc.aa_profile = unconfined lxc.cgroup.devices.deny = a # Allow any mknod (but not using the node) lxc.cgroup.devices.allow = c *:* m lxc.cgroup.devices.allow = b *:* m # /dev/null and zero lxc.cgroup.devices.allow = c 1:3 rwm lxc.cgroup.devices.allow = c 1:5 rwm # consoles lxc.cgroup.devices.allow = c 5:1 rwm lxc.cgroup.devices.allow = c 5:0 rwm #lxc.cgroup.devices.allow = c 4:0 rwm #lxc.cgroup.devices.allow = c 4:1 rwm # /dev/{,u}random lxc.cgroup.devices.allow = c 1:9 rwm lxc.cgroup.devices.allow = c 1:8 rwm lxc.cgroup.devices.allow = c 136:* rwm lxc.cgroup.devices.allow = c 5:2 rwm # rtc lxc.cgroup.devices.allow = c 254:0 rwm #fuse lxc.cgroup.devices.allow = c 10:229 rwm #tun lxc.cgroup.devices.allow = c 10:200 rwm #full lxc.cgroup.devices.allow = c 1:7 rwm #hpet lxc.cgroup.devices.allow = c 10:228 rwm #kvm lxc.cgroup.devices.allow = c 10:232 rwm Then I changed my host /etc/network/interfaces to: auto lo iface lo inet loopback auto br0 iface br0 inet static bridge_ports eth0 bridge_fd 0 address 92.281.86.226 netmask 255.255.255.0 network 92.281.86.0 broadcast 92.281.86.255 gateway 92.281.86.254 dns-nameservers 213.186.33.99 dns-search ovh.net When I try command line configuration ("brctl addif", "ifconfig eth0", etc.) my remote host becomes inaccessible and I have to hard reboot it. I changed the content of /var/lib/lxc/mycontainer/rootfs/etc/network/interfaces to: auto lo iface lo inet loopback auto eth0 iface eth0 inet static address 179.43.46.233 netmask 255.255.255.255 broadcast 178.33.40.233 gateway 92.281.86.254 It takes several minutes for mycontainer to start (lxc-start -n mycontainer). I tried replacing gateway 92.281.86.254 by : post-up route add 92.281.86.254 dev eth0 post-up route add default gw 92.281.86.254 post-down route del 92.281.86.254 dev eth0 post-down route del default gw 92.281.86.254 My container then starts instantly. But whatever configuration I set in /var/lib/lxc/mycontainer/rootfs/etc/network/interfaces, I cannot ping from mycontainer to any IP (including the host's) : ubuntu@mycontainer:~$ ping 92.281.86.226 PING 92.281.86.226 (92.281.86.226) 56(84) bytes of data. ^C --- 92.281.86.226 ping statistics --- 6 packets transmitted, 0 received, 100% packet loss, time 5031ms And my host cannot ping the container: root@host:~# ping 179.43.46.233 PING 179.43.46.233 (179.43.46.233) 56(84) bytes of data. ^C --- 179.43.46.233 ping statistics --- 5 packets transmitted, 0 received, 100% packet loss, time 4000ms My container's ifconfig: ubuntu@mycontainer:~$ ifconfig eth0 Link encap:Ethernet HWaddr 02:00:00:86:5b:11 inet addr:179.43.46.233 Bcast:255.255.255.255 Mask:0.0.0.0 inet6 addr: fe80::ff:fe79:5a31/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:64 errors:0 dropped:6 overruns:0 frame:0 TX packets:54 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:4070 (4.0 KB) TX bytes:4168 (4.1 KB) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:32 errors:0 dropped:0 overruns:0 frame:0 TX packets:32 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:2496 (2.4 KB) TX bytes:2496 (2.4 KB) My host's ifconfig: root@host:~# ifconfig br0 Link encap:Ethernet HWaddr 4c:72:b9:43:65:2b inet addr:92.281.86.226 Bcast:91.121.67.255 Mask:255.255.255.0 inet6 addr: fe80::4e72:b9ff:fe43:652b/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1453 errors:0 dropped:18 overruns:0 frame:0 TX packets:1630 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:145125 (145.1 KB) TX bytes:299943 (299.9 KB) eth0 Link encap:Ethernet HWaddr 4c:72:b9:43:65:2b UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:3178 errors:0 dropped:0 overruns:0 frame:0 TX packets:1637 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:298263 (298.2 KB) TX bytes:309167 (309.1 KB) Interrupt:20 Memory:fe500000-fe520000 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:6 errors:0 dropped:0 overruns:0 frame:0 TX packets:6 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:300 (300.0 B) TX bytes:300 (300.0 B) vethtest Link encap:Ethernet HWaddr fe:0d:7f:3e:70:88 inet6 addr: fe80::fc0d:7fff:fe3e:7088/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:54 errors:0 dropped:0 overruns:0 frame:0 TX packets:67 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:4168 (4.1 KB) TX bytes:4250 (4.2 KB) virbr0 Link encap:Ethernet HWaddr de:49:c5:66:cf:84 inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0 UP BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) I have disabled lxcbr0 (USE_LXC_BRIDGE="false" in /etc/default/lxc). root@host:~# brctl show bridge name bridge id STP enabled interfaces br0 8000.4c72b943652b no eth0 vethtest I have configured the IP 179.43.46.233 to point to 02:00:00:86:5b:11 in my hosting provider (OVH) config panel. (The IPs in this post are not the real ones.) Thanks for reading this long question! :-) Vianney

Read the article
My D-Link's Ethernet bridge downlink just got 10-30x slower?

- by Jay Levitt

TL;DR: I unplugged my network to move my desk, and now downloading via my DIR-655's Ethernet LAN bridge is 10-30x slower than the Ethernet switch it's plugged into. Background My network is SMC cable modem <-> Cisco firewall <-> Netgear switch <-> D-Link WiFi† | | | | SMC8014 ASA-5505 GS608v2 gigE DIR-655 rev A3 gigE †The DIR-655 is used as an access point, not a router (although what D-Link calls an access point, I'd call a bridge). The "WAN" port is unused; the Netgear connects to the built-in 4-port Ethernet LAN switch, inside the built- in router/firewall. Endpoints: MacBook Pro 17" mid-2010 iPhone 4S Fedora 12 Linux server running reasonably fast dual-Athlon X2, VelociRaptors, etc. All cables are <10 feet, mostly CAT-5e, some CAT-6, all premade. All WiFi endpoints are within three feet of the D-Link. Yesterday I unplugged and rearranged stuff, and now connecting via the D-Link - even through the wired switch, right next to the incoming network cable - is 30x slower than connecting directly to the Netgear switch, on both my MacBook and iPhone. How I'm measuring "slower" I'm mostly using http://speedtest.net, which of course only really measures broadband speeds. I've also installed http://www.speedtest.net/mini.php on my local server, but can't test the iPhone with that. Results Speedtest.net, closest server over Comcast business-class: CONFIG | PING (ms) | DOWN (Mbps) | UP (Mbps) Mac <-> Ethernet <-> Netgear | 9 | 31.6 | 6.8 Mac <-> Ethernet <-> D-Link | 8 | 4.1 | 6.0 Mac <-> WiFi <-> D-Link | 9 | 1.4 | 2.9 iPhone <-> WiFi <-> D-Link | 67 | 0.4 | 1.6 Speedtest Mini on Linux PC: CONFIG | DOWN (Mbps) | UP (Mbps) Mac <-> Ethernet <-> NetGear | 97.2 | 76.9 Mac <-> Ethernet <-> D-Link | 8.2 | 24.2 Mac <-> WiFi <-> D-Link | 1.0 | 8.6 Slow typing in SSH: Mac <-> Ethernet <-> Netgear <-> Linux PC: smooth Mac <-> Ethernet <-> D-Link <-> Linux PC: choppy Note that D-Link upload speeds are normal on broadband, slower locally (but I'd believe that's a D-Link limitation), and always faster than the downloads! Since ssh is choppy just with slow typing, I don't believe it's a throttling-type problem either; that's not a lot of bandwidth. What I've tried Swapping all "good" and "bad" cables Re-plugging "bad" cable from D-Link to Netgear and watching it be the "good" cable pulling cables away from power lines Verify that the Mac auto-detects the D-Link as gigE Try to verify the link speed of the D-Link <- Netgear connection, but the firmware doesn't report that Verify that the D-Link sees no TX/RX errors or collisions Use different Ethernet ports on both Netgear and D-Link Reset the D-Link to factory settings Upgrade the D-Link firmware from 1.21 to 1.35NA, 2010/11/12, the latest Reboot everything at least once On the Mac, disable Wi-Fi during the Ethernet tests, and unplug Ethernet during the Wi-Fi tests Using iStumbler, verify that the D-Link isn't picking overloaded Wi-Fi channels (usually just 1-5 neighbors on my and adjacent channels, average for my apt building) Verify that the only client connected to the Wi-Fi was the iPhone Verify that nothing was being chatty on my network according to the WISH log Enable and disable all sorts of D-Link settings, including forcing WAN auto-detect to gigE So. I don't mind buying a new access point—I wouldn't mind having a dual-link network—but as a guy who's been networking since gated v4 was a drastic rewrite, and who often used physical sniffers in the days before Wireshark, I'm baffled. I hate being baffled. What could I possibly have changed that would result in this? How can I measure it? All I can think of is a static zap—thick carpet, socks, HVAC—but I didn't feel one, and does that really happen anymore? Can I test if it's Ethernet vs. TCP layer slowness? I'm not familiar with modern network utilities; it's hard to Google without hitting "Q: Why is my network slow? A: Is your microwave on?" If I don't get an answer here, will someone big and powerful help me migrate it to serverfault without getting screamed back here? In the words of Inigo Montoya, "I must know." Don't get all Dread Pirate Roberts on me.

Read the article
Windows 7: Search indexing is stuck

- by Ricket

When I open Indexing Options, it says: 4,317 items indexed Indexing in progress. Search results might not be complete during this time. It's stuck at 4,317 though; no more items have been indexed. Worst of all, SearchIndexer.exe is taking up 100% CPU (well, 50%, but I have a dual core CPU; it's taking up all processing power it can). It is not causing hard drive activity though. I tried clicking "Troubleshoot search and indexing" at the bottom of the Indexing Options window, but it couldn't find any problem. I've also tried the repair registry key that several websites suggest; I change HKLM\SOFTWARE\Microsoft\Windows Search SetupCompletedSuccessfully to 0 and restarted the computer, and it apparently repaired because it flipped back to 1, but the same problem continues to occur. It's reducing the battery life of my laptop and making it really hot so that my fans are running all the time. I've had to disable the Windows Search service. How can I fix this? Do I need to just flat-out reformat my computer? Update: I've tried rebuilding a couple times. There's nothing unusual about the locations I have to index, and I don't have any downloads in progress or anything like that. I don't see any reason why it stopped, and I noticed it much too late to do a system restore. At this point, I'm hoping someone will offer up some secret answer that will fix the problem, thus the bounty. Another update: I tried starting the service again, just to let it try yet again. It seemed okay at first (Indexing Options showed it operating at reduced speed due to user activity, and the number of files was going up). A while later I checked, and the service had stopped. Event viewer revealed some errors like this: Log Name: Application Source: Application Error Date: 2/1/2010 7:34:23 PM Event ID: 1000 Task Category: (100) Level: Error Keywords: Classic User: N/A Computer: ricky-win7 Description: Faulting application name: SearchIndexer.exe, version: 7.0.7600.16385, time stamp: 0x4a5bcdd0 Faulting module name: NLSData0007.dll, version: 6.1.7600.16385, time stamp: 0x4a5bda88 Exception code: 0xc0000005 Fault offset: 0x002141ba Faulting process id: 0x13a0 Faulting application start time: 0x01caa39f2a70ec02 Faulting application path: C:\Windows\system32\SearchIndexer.exe Faulting module path: C:\Windows\System32\NLSData0007.dll Report Id: b4f7a7ae-0f92-11df-87fc-e5d65d8794c2 Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="Application Error" /> <EventID Qualifiers="0">1000</EventID> <Level>2</Level> <Task>100</Task> <Keywords>0x80000000000000</Keywords> <TimeCreated SystemTime="2010-02-02T00:34:23.000000000Z" /> <EventRecordID>10689</EventRecordID> <Channel>Application</Channel> <Computer>ricky-win7</Computer> <Security /> </System> <EventData> <Data>SearchIndexer.exe</Data> <Data>7.0.7600.16385</Data> <Data>4a5bcdd0</Data> <Data>NLSData0007.dll</Data> <Data>6.1.7600.16385</Data> <Data>4a5bda88</Data> <Data>c0000005</Data> <Data>002141ba</Data> <Data>13a0</Data> <Data>01caa39f2a70ec02</Data> <Data>C:\Windows\system32\SearchIndexer.exe</Data> <Data>C:\Windows\System32\NLSData0007.dll</Data> <Data>b4f7a7ae-0f92-11df-87fc-e5d65d8794c2</Data> </EventData> </Event> If you are having the same error and arrived here from a Google search, please comment or add an answer detailing your progress on this, if any...

Read the article
Need help identiying a nasty rootkit in Windows

- by goofrider

I have a nasty rootkit that not tools seem to be able to idenity. I know for sure it's a rootkit, but I can figure out which rootkit it is. Here's what I gathered so far: It creates multiple copies of itself in %HOME%\Local Settings\Temp with names like Q.EXE, IAJARZ.exe, etc., and install them as hidden services. These EXE have SysInternals identifiers in them so they're definitely rootkits. It hooked very deep in the system, including file read/write, security policies, registry read/write, and possibly WinSock/TCP/IP. When going to Sophos.com to download their software, the rootkit inject something called Microsoft Ajax Tootkit into the page, which injects code into the email submission form in order to redirect it. (EDIT: I might have panicked. Looks like Sophos does use an AJAZ email form, their form is just broken on Chrome so it looked like a mail form injection attack, the link is http://www.sophos.com/en-us/products/free-tools/virus-removal-tool/download.aspx ) Super-Antispyware found a lot of spyware cookies, in the name of .kaspersky.2o7.net, etc. (just chedk 2o7.net, looks like it's a legit ad company) I tried comparing DNS lookup from the infected systems and from system in other physical locations, no DNS redirections it seems. I used dd to copy the MBR and compared it with the MBR provided by ms-sys package, no differences so it's not infecting MBR. No antivirus or rootkit scanner be able to identify it. Most of them can't even find it. I tried scanning, in-situ (normal mode), in safe mode, and boot to linux live CD. Scanners used: Avast, Sophos anti rootkit, Kasersky TDSSKiller, GMER, RootkitRevealer, and many others. Kaspersky reported some unsigned system files that ought to be signed (e.g. tcpip.sys), and reported a number of MD5 mismatches. But otherwise couldn't identify anything based on signature. When running Sysinternal RootkitRevealer and Sophos AntiRootkit, CPU usage goes up to 100% and gets stucked. The Rootkit is blocking them. When trying running/installing HiJackThis, RootkitRevealer and some other scanners, it tells me system security policy prevent running/installing it. The list of malicious acitivities go on and on. here's a sample of logs from all my scans. In particular, aswSnx.SYS, apnenfno.sys and PROCMON20.SYS has a huge number of hooks. It's hard to tell if the rootkit replaced legit program files like aswSnx.SYS (from Avast) and PROCMON20.SYS (from Sysinternal Process Monitor). I can't find whether apnenfno.sys is from a legit program. Help to identify it is appreciated. Trend Micro RootkitBuster ------ [HIDDEN_REGISTRY][Hidden Reg Value]: KeyPath : HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\sptd\Cfg Root : 586bfc0 SubKey : Cfg ValueName : g0 Data : 38 23 E8 D0 BF F2 2D 6F ... ValueType : 3 AccessType: 0 FullLength: 61 DataSize : 32 [HOOKED_SERVICE_API]: Service API : ZwCreateMutant Image Path : C:\WINDOWS\System32\Drivers\aswSnx.SYS OriginalHandler : 0x8061758e CurrentHandler : 0xaa66cce8 ServiceNumber : 0x2b ModuleName : aswSnx.SYS SDTType : 0x0 [HOOKED_SERVICE_API]: Service API : ZwCreateThread Image Path : c:\windows\system32\drivers\apnenfno.sys OriginalHandler : 0x805d1038 CurrentHandler : 0xaa5f118c ServiceNumber : 0x35 ModuleName : apnenfno.sys SDTType : 0x0 [HOOKED_SERVICE_API]: Service API : ZwDeleteKey Image Path : C:\WINDOWS\system32\Drivers\PROCMON20.SYS OriginalHandler : 0x80624472 CurrentHandler : 0xa709b0f8 ServiceNumber : 0x3f ModuleName : PROCMON20.SYS SDTType : 0x0 HiJackThis ------ O23 - Service: JWAHQAGZ - Sysinternals - www.sysinternals.com - C:\DOCUME~1\jeff\LOCALS~1\Temp\JWAHQAGZ.exe O23 - Service: LHIJ - Sysinternals - www.sysinternals.com - C:\DOCUME~1\jeff\LOCALS~1\Temp\LHIJ.exe Kaspersky TDSSKiller ------ 21:05:58.0375 3936 C:\WINDOWS\system32\ati2sgag.exe - copied to quarantine 21:05:59.0217 3936 ATI Smart ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:05:59.0342 3936 C:\WINDOWS\system32\BUFADPT.SYS - copied to quarantine 21:05:59.0856 3936 BUFADPT ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:05:59.0965 3936 C:\Program Files\CrashPlan\CrashPlanService.exe - copied to quarantine 21:06:00.0152 3936 CrashPlanService ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:00.0246 3936 C:\WINDOWS\system32\epmntdrv.sys - copied to quarantine 21:06:00.0433 3936 epmntdrv ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:00.0464 3936 C:\WINDOWS\system32\EuGdiDrv.sys - copied to quarantine 21:06:00.0526 3936 EuGdiDrv ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:00.0604 3936 C:\Program Files\Common Files\Macrovision Shared\FLEXnet Publisher\FNPLicensingService.exe - copied to quarantine 21:06:01.0181 3936 FLEXnet Licensing Service ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:01.0321 3936 C:\Program Files\AddinForUNCFAT\UNCFATDMS.exe - copied to quarantine 21:06:01.0430 3936 OTFSDMS ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:01.0492 3936 C:\WINDOWS\system32\DRIVERS\tcpip.sys - copied to quarantine 21:06:01.0539 3936 Tcpip ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:01.0601 3936 C:\DOCUME~1\jeff\LOCALS~1\Temp\TULPUWOX.exe - copied to quarantine 21:06:01.0664 3936 HKLM\SYSTEM\ControlSet003\services\TULPUWOX - will be deleted on reboot 21:06:01.0664 3936 C:\DOCUME~1\jeff\LOCALS~1\Temp\TULPUWOX.exe - will be deleted on reboot 21:06:01.0664 3936 TULPUWOX ( UnsignedFile.Multi.Generic ) - User select action: Delete 21:06:01.0757 3936 C:\WINDOWS\system32\Drivers\usbaapl.sys - copied to quarantine 21:06:01.0866 3936 USBAAPL ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:01.0913 3936 C:\Program Files\VMware\VMware Player\vmware-authd.exe - copied to quarantine 21:06:02.0443 3936 VMAuthdService ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:02.0443 3936 vmount2 ( UnsignedFile.Multi.Generic ) - skipped by user 21:06:02.0443 3936 vmount2 ( UnsignedFile.Multi.Generic ) - User select action: Skip 21:06:02.0459 3936 vstor2 ( UnsignedFile.Multi.Generic ) - skipped by user 21:06:02.0459 3936 vstor2 ( UnsignedFile.Multi.Generic ) - User select action: Skip

Read the article
Windows 8, NVIDIA graphics recognition fails

- by Roy Grubb

I just installed Windows 8 Pro OEM 64-bit (clean install) and it won't properly recognize my graphics adapter. When I installed Win8, it automatically installed the BasicDisplay.sys driver dated 6/21/2006. 6.2.9200.16384 (win8_rtm.120725-1247). Hardware - Mobo:MSi G41M-P33 Combo CPU:Intel CoreDuo 6600 Graphics:NVIDIA GeForce 9400GT *OS* - Windows 8 Pro 64-bit OEM The graphics adapter worked fine in Windows XP. The PC is a generic box, bought locally and its mobo failed recently, so I replaced it with the G41M. Microsoft wouldn't let me re-activate Windows XP with a different mobo, so I installed Win8, which appears to work except as described next. Win8 only partially recognizes the graphics adapter and won't allow NVIDIA latest driver installer to see that it's an NVIDIA card. As a result, OpenGL doesn't work, and this is needed by the software I most use. Other than that the graphics look OK. When I say 'partially recognizes', I mean that via the Control Panel, I can see that the adapter is described as NVIDIA, but the driver remains stuck at Microsoft Basic Display Adapter no matter what I try, including "Update driver..." in adapter properties. Display Screen Resolution Advanced Settings Adapter shows: Adapter Type: Microsoft Basic Display Adapter Chip Type: NVIDIA DAC Type: NVIDIA Corporation Bios Information: G27 Board - p381n17 Don't know what this means ... no mention of 9400GT Total Available Graphics Memory: 256 MB Dedicated Video Memory: 0 MB In fact the adapter has 512MB on-board video memory. System Video Memory: 0 MB Shared System Memory: 256 MB And Control Panel Device Manager Display adapters just shows Microsoft Basic Display Adapter. No other graphics adapter, and no unknown device or yellow question mark. What I have tried so far: 1. Cleared CMOS and reset. Updated BIOS and all mobo drivers as follows: 1st I used Driver Reviver to see if any driver updates were required. It found some but I didn't use that to get the drivers. Then I switched to MSi's own mobo driver utility Live Update 5. This also showed the board needed to update several so I used it to fetch the new drivers. After that it showed that everything was up to date and I checked with Driver Reviver again, which also reported no drivers now needed updating. Rebooted. Went to the NVIDIA site to get the latest graphics adapter driver. Their auto-detect "Option 2: Automatically find drivers for my NVIDIA products" said "The NVIDIA Smart Scan was unable to evaluate your system hardware. Please use Option 1 to manually find drivers for your NVIDIA products." So I downloaded 310.70-desktop-win8-win7-winvista-64bit-international-whql.exe, which lists 9400 GT under supported products, but when I run it, it says: "NVIDIA Installer cannot continue This graphics driver could not find compatible graphics hardware." Connected the display to the on-board Intel graphics (G41 Intel Express), removed the NVIDIA card and rebooted, changed to internal graphics in CMOS. Again it installs the MS Basic Display Adapter, and can't properly run my s/w that needs OpenGL. It runs on other machines with Intel Express graphics (WinXP and 7) Shut down and pulled out the power cord. Held start button to discharge all capacitors. Removed and re-inserted NVIDIA adapter in PCI-E slot and made sure properly seated. Connected the monitor to the card, screwed plug to socket. Reconnected power cord. Started and checked in BIOS that Primary Graphics Adapter was set to PCI-E. Started Windows. Uninstalled MS Basic Display Adapter in Device Manager. Screen blanks briefly, reappears. No Graphics adapter entry was then visible in Device Manager. Restarted PC. MS Basic Display Adapter Visible again in Device Manager. Clicked in Device Manager View Show hidden devices. No other graphics adapter appears, no unknown devices. Rebooted. Tried Scan for Hardware changes. None detected. Tried right-click on MS Basic Display Adapter Properties Driver Update Driver... Search automatically. It replied that it had determined driver was up to date. I checked that there were no graphic driver-related entries in Programs and Features that I could delete (none). Searched for any other drivers with nvidia in their name and deleted them, just keeping the 306.97 installer exe file. Did a Windows Update. Ran GPU-Z which shows (main items): Microsoft Basic Display Adapter GPU G72 BIOS 5.72.22.76.88 Device ID 10DE - 01D5 DDR2 Bus Width 32 Bit Memory size 64MB Driver Version nvlddmkm 6.2.9200.16384 (ForceWare 0.00) / Win8 64 NVIDIA SLI Unknown in the drop-down at the foot, "Microsoft Basic Display Adapter" is the only option If I swap hard disks in that machine to one with a Ubuntu 10.4 installation (originally installed on the same PC), lspci shows "VGA compatible controller as NVIDIA Corporation Device 01d5 (rev a1) (prog-if 00 [VGA controller])" and "kernel driver in use: nvidia" I'm out of ideas for new things to try and would be really grateful of suggestions. Thanks!

Read the article
CSC folder data access AND roaming profiles issues (Vista with Server 2003, then 2008)

- by Alex Jones

I'm a junior sysadmin for an IT contractor that helps small, local government agencies, like little towns and the like. One of our clients, a public library with ~ 50 staff users, was recently migrated from Server 2003 Standard to Server 2008 R2 Standard in a very short timeframe; our senior employee, the only network engineer, had suddenly put in his two weeks notice, so management pushed him to do this project before quitting. A bit hasty on management's part? Perhaps. Could we do anything about that? Nope. Do I have to fix this all by myself? Pretty much. The network is set up like this: a) 50ish staff workstations, all running Vista Business SP2. All staff use MS Outlook, which uses RPC-over-HTTPS ("Outlook Anywhere") for cached Exchange access to an offsite location. b) One new (virtualized) Server 2008 R2 Standard instance, running atop a Server 2008 R2 host via Hyper-V. The VM is the domain's DC, and also the site's one and only file server. Let's call that VM "NEWBOX". c) One old physical Server 2003 Standard server, running the same roles. Let's call it "OLDBOX". It's still on the network and accessible, but it's been demoted, and its shares have been disabled. No data has been deleted. c) Gigabit Ethernet everywhere. The organization's only has one domain, and it did not change during the migration. d) Most users were set up for a combo of redirected folders + offline files, but some older employees who had been with the organization a long time are still on roaming profiles. To sum up: the servers in question handle user accounts and files, nothing else (eg, no TS, no mail, no IIS, etc.) I have two major problems I'm hoping you can help me with: 1) Even though all domain users have had their redirected folders moved to the new server, and loggin in to their workstations and testing confirms that the Documents/Music/Whatever folders point to the new paths, it appears some users (not laptops or anything either!) had been working offline from OLDBOX for a long time, and nobody realized it. Here's the ugly implication: a bunch of their data now lives only in their CSC folders, because they can't access the share on OLDBOX and sync with it finally. How do I get this data out of those CSC folders, and onto NEWBOX? 2) What's the best way to migrate roaming profile users to non-roaming ones, without losing vital data like documents, any lingering PSTs, etc? Things I've thought about trying: For problem 1: a) Reenable the documents share on OLDBOX, force an Offline Files sync for ALL domain users, then copy OLDBOX's share's data to the equivalent share on NEWBOX. Reinitialize the Offline Files cache for every user. With this: How do I safely force a domain-wide Offline Files sync? Could I lose data by reenabling the share on OLDBOX and forcing the sync? Afterwards, how can I reinitialize the Offline Files cache for every user, without doing it manually, workstation by workstation? b) Determine which users have unsynced changes to OLDBOX (again, how?), search each user's CSC folder domain-wide via workstation admin shares, and grab the unsynched data. Reinitialize the Offline Files cache for every user. With this: How can I detect which users have unsynched changes with a script? How can I search each user's CSC folder, when the ownership and permissions set for CSC folders are so restrictive? Again, afterwards, how can I reinitialize the Offline Files cache for every user, without doing it manually, workstation by workstation? c) Manually visit each workstation, copy the contents of the CSC folder, and manually copy that data onto NEWBOX. Reinitialize the Offline Files cache for every user. With this: Again, how do I 'break into' the CSC folder and get to its data? As an experiment, I took one workstation's HD offsite, imaged it for safety, and then tried the following with one of our shop PCs, after attaching the drive: grant myself full control of the folder (failed), grant myself ownership of the folder (failed), run chkdsk on the whole drive to make sure nothing's messed up (all OK), try to take full control of the entire drive (failed), try to take ownership of the entire drive (failed) MS KB articles and Googling around suggests there's a utility called CSCCMD that's meant for this exact scenario...but it looks like it's available for XP, not Vista, no? Again, afterwards, how can I reinitialize the Offline Files cache for every user, without doing it manually, workstation by workstation? For problem 2: a) Figure out which users are on roaming profiles, and where their profiles 'live' on the server. Create new folders for them in the redirected folders repository, migrate existing data, and disable the roaming. With this: Finding out who's roaming isn't hard. But what's the best way to disable the roaming itself? In AD Users and Computers, or on each user's workstation? Doing it centrally on the server seems more efficient; that said, all of the KB research I've done turns up articles on how to go from local to roaming, not the other way around, so I don't have good documentation on this. In closing: we have good backups of NEWBOX and OLDBOX, but not of the workstations themselves, so anything drastic on the client side would need imaging and testing for safety. Thanks for reading along this far! Hopefully you can help me dig us out of this mess.

Read the article

Search Results

Search found 17013 results on 681 pages for 'hard coding'.

Page 663/681 | < Previous Page | 659 660 661 662 663 664 665 666 667 668 669 670 | Next Page >

- by ABrown

- by Geoff Dalgas

- by René Kåbis

- by SethG

- by Callum D

- by Byron Sommardahl

- by Motin

- by Philip Durbin

- by Dmitri DB

- by Steve Madsen

- by Jon

- by Rich Lafferty

- by Scott Duckworth

- by Philip Durbin

- by Fitzroy

- by Rick Koshi

- by schiggn

- by lazerpld

- by Joel Coel

- by Vianney Stroebel

- by Jay Levitt

- by Ricket

- by goofrider

- by Roy Grubb

- by Alex Jones

< Previous Page | 659 660 661 662 663 664 665 666 667 668 669 670 | Next Page >