Search Results

Search found 10719 results on 429 pages for 'hardware failure'.

Page 421/429 | < Previous Page | 417 418 419 420 421 422 423 424 425 426 427 428 | Next Page >

Error 2013: Lost connection to MySQL server during query when executing CHECK TABLE FOR UPGRADE

- by Dean Richardson

I just upgraded Ubuntu from 11.10 to 12.04. My rails app now returns the (passenger) error "Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111) (Mysql2::Error)". I get a similar error when I try to access mysql at the command line on my Ubuntu server using mysql -u root -p. I have mysql-server 5.5 installed. I've checked and mysql is not running. When I try to restart it, it fails. Here are some key lines from the tail of /var/log/syslog after an attempted restart: dean@dgwjasonfried:/etc/mysql$ tail -f /var/log/syslog Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: Running 'mysqlcheck' with connection arguments: '--port=3306' '--socket=/var/run/mysqld/mysqld.sock' '--host=localhost' '--socket=/var/run/mysqld/mysqld.sock' '--host=localhost' '--socket=/var/run/mysqld/mysqld.sock' Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: Running 'mysqlcheck' with connection arguments: '--port=3306' '--socket=/var/run/mysqld/mysqld.sock' '--host=localhost' '--socket=/var/run/mysqld/mysqld.sock' '--host=localhost' '--socket=/var/run/mysqld/mysqld.sock' Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: /usr/bin/mysqlcheck: Got error: 2013: Lost connection to MySQL server during query when executing 'CHECK TABLE ... FOR UPGRADE' Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: FATAL ERROR: Upgrade failed Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: molex_app_development.assets OK Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5107]: molex_app_development.ecd_types OK Mar 7 08:55:27 dgwjasonfried /etc/mysql/debian-start[5124]: Checking for insecure root accounts. Mar 7 08:55:27 dgwjasonfried kernel: [ 7551.769657] init: mysql main process (5064) terminated with status 1 Mar 7 08:55:27 dgwjasonfried kernel: [ 7551.769697] init: mysql respawning too fast, stopped Here is most of /etc/mysql/my.cnf: Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock Here is entries for some specific programs The following values assume you have at least 32M ram This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] Basic Settings user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp lc-messages-dir = /usr/share/mysql skip-external-locking Instead of skip-networking the default is now to listen only on localhost which is more compatible and is not less secure. bind-address = 127.0.0.1 And here are permissions for var/run/mysqld/mysqld.sock: srwxrwxrwx 1 mysql mysql 0 Mar 7 09:18 mysqld.sock I'd be grateful for any suggestions the community might have. I reviewed the related questions here and attempted some of the fixes offered but to no avail. Thanks! Dean Richardson Update: Thanks to quanta's suggestion, I looked at the /var/log/mysql/error.log file. I found error messages relating to pointers, fatal signals, and more stuff that I really couldn't make much sense of. I also found mysql man page references, however. One suggested that I try starting mysqld with the --innodb_force_recovery=# option, then attempt to dump (or drop) the offending/corrupted database or table. I worked through the escalating option levels one-by-one (innodb_force_recovery=1, innodb_force_recovery=2, etc.) This allowed me to successfully run mysql -u root -p from the command line and execute several commands. I was able to run queries on my production database, but any attempt to query, dump, or even drop my development database raised an error and led to me losing the connection to mysql. So I've made progress, but until I'm somehow able to drop or repair my development db I'm still unable to get my app to load. Any further advice or suggestions? Thanks! Dean Update: Right after running sudo mysqld --innodb_force_recover=1 from the command line, the error.log contains this: Right after retrying sudo mysqld --innodb_force_recover=1, The error.log file shows this: 130308 4:55:39 [Note] Plugin 'FEDERATED' is disabled. 130308 4:55:39 InnoDB: The InnoDB memory heap is disabled 130308 4:55:39 InnoDB: Mutexes and rw_locks use GCC atomic builtins 130308 4:55:39 InnoDB: Compressed tables use zlib 1.2.3.4 130308 4:55:39 InnoDB: Initializing buffer pool, size = 128.0M 130308 4:55:39 InnoDB: Completed initialization of buffer pool 130308 4:55:39 InnoDB: highest supported file format is Barracuda. InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 130308 4:55:39 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... 130308 4:55:40 InnoDB: Waiting for the background threads to start 130308 4:55:41 InnoDB: 1.1.8 started; log sequence number 10259220 130308 4:55:41 InnoDB: !!! innodb_force_recovery is set to 1 !!! 130308 4:55:41 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306 130308 4:55:41 [Note] - '127.0.0.1' resolves to '127.0.0.1'; 130308 4:55:41 [Note] Server socket created on IP: '127.0.0.1'. 130308 4:55:41 [Note] Event Scheduler: Loaded 0 events 130308 4:55:41 [Note] mysqld: ready for connections. Version: '5.5.29-0ubuntu0.12.04.2' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) Then after mysql -u root -p and mysql> drop database molex_app_development; ERROR 2013 (HY000): Lost connection to MySQL server during query mysql> the error.log contains: dean@dgwjasonfried:/var/log/mysql$ tail -f error.log /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6a3ff9ecbd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7f6a1c004bd8): is an invalid pointer Connection ID (thread ID): 1 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 130308 4:55:39 [Note] Plugin 'FEDERATED' is disabled. 130308 4:55:39 InnoDB: The InnoDB memory heap is disabled 130308 4:55:39 InnoDB: Mutexes and rw_locks use GCC atomic builtins 130308 4:55:39 InnoDB: Compressed tables use zlib 1.2.3.4 130308 4:55:39 InnoDB: Initializing buffer pool, size = 128.0M 130308 4:55:39 InnoDB: Completed initialization of buffer pool 130308 4:55:39 InnoDB: highest supported file format is Barracuda. InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 130308 4:55:39 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... 130308 4:55:40 InnoDB: Waiting for the background threads to start 130308 4:55:41 InnoDB: 1.1.8 started; log sequence number 10259220 130308 4:55:41 InnoDB: !!! innodb_force_recovery is set to 1 !!! 130308 4:55:41 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306 130308 4:55:41 [Note] - '127.0.0.1' resolves to '127.0.0.1'; 130308 4:55:41 [Note] Server socket created on IP: '127.0.0.1'. 130308 4:55:41 [Note] Event Scheduler: Loaded 0 events 130308 4:55:41 [Note] mysqld: ready for connections. Version: '5.5.29-0ubuntu0.12.04.2' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) 130308 4:58:23 [ERROR] Incorrect definition of table mysql.proc: expected column 'comment' at position 15 to have type text, found type char(64). 130308 4:58:23 InnoDB: Assertion failure in thread 140168992810752 in file fsp0fsp.c line 3639 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 10:58:23 UTC - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=1 max_threads=151 thread_count=1 connection_count=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 346681 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7f7ba4f6c2f0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7f7ba3065e60 thread_stack 0x30000 mysqld(my_print_stacktrace+0x29)[0x7f7ba3609039] mysqld(handle_fatal_signal+0x483)[0x7f7ba34cf9c3] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f7ba2220cb0] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f7ba188c425] /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f7ba188fb8b] mysqld(+0x65e0fc)[0x7f7ba37160fc] mysqld(+0x602be6)[0x7f7ba36babe6] mysqld(+0x635006)[0x7f7ba36ed006] mysqld(+0x5d7072)[0x7f7ba368f072] mysqld(+0x5d7b9c)[0x7f7ba368fb9c] mysqld(+0x6a3348)[0x7f7ba375b348] mysqld(+0x6a3887)[0x7f7ba375b887] mysqld(+0x5c6a86)[0x7f7ba367ea86] mysqld(+0x5ae3a7)[0x7f7ba36663a7] mysqld(_Z15ha_delete_tableP3THDP10handlertonPKcS4_S4_b+0x16d)[0x7f7ba34d3ffd] mysqld(_Z23mysql_rm_table_no_locksP3THDP10TABLE_LISTbbbb+0x568)[0x7f7ba3417f78] mysqld(_Z11mysql_rm_dbP3THDPcbb+0x8aa)[0x7f7ba339780a] mysqld(_Z21mysql_execute_commandP3THD+0x394c)[0x7f7ba33b886c] mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x10f)[0x7f7ba33bb28f] mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1380)[0x7f7ba33bc6e0] mysqld(_Z24do_handle_one_connectionP3THD+0x1bd)[0x7f7ba346119d] mysqld(handle_one_connection+0x50)[0x7f7ba3461200] /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f7ba2218e9a] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f7ba1949cbd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7f7b7c004b60): is an invalid pointer Connection ID (thread ID): 1 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. --Dean

Read the article
What does it mean when ARP shows <incomplete> on eth1

- by Geoff Dalgas

We have been using HAProxy along with heartbeat from the Linux-HA project. We are using two linux instances to provide a failover. Each server has with their own public IP and a single IP which is shared between the two using a virtual interface (eth1:1) at IP: 69.59.196.211 The virtual interface (eth1:1) IP 69.59.196.211 is configured as the gateway for the windows servers behind them and we use ip_forwarding to route traffic. We are experiencing an occasional network outage on one of our windows servers behind our linux gateways. HAProxy will detect the server is offline which we can verify by remoting to the failed server and attempting to ping the gateway: Pinging 69.59.196.211 with 32 bytes of data: Reply from 69.59.196.220: Destination host unreachable. Running arp -a on this failed server shows that there is no entry for the gateway address (69.59.196.211): Interface: 69.59.196.220 --- 0xa Internet Address Physical Address Type 69.59.196.161 00-26-88-63-c7-80 dynamic 69.59.196.210 00-15-5d-0a-3e-0e dynamic 69.59.196.212 00-21-5e-4d-45-c9 dynamic 69.59.196.213 00-15-5d-00-b2-0d dynamic 69.59.196.215 00-21-5e-4d-61-1a dynamic 69.59.196.217 00-21-5e-4d-2c-e8 dynamic 69.59.196.219 00-21-5e-4d-38-e5 dynamic 69.59.196.221 00-15-5d-00-b2-0d dynamic 69.59.196.222 00-15-5d-0a-3e-09 dynamic 69.59.196.223 ff-ff-ff-ff-ff-ff static 224.0.0.22 01-00-5e-00-00-16 static 224.0.0.252 01-00-5e-00-00-fc static 225.0.0.1 01-00-5e-00-00-01 static On our linux gateway instances arp -a shows: peak-colo-196-220.peak.org (69.59.196.220) at <incomplete> on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-222.peak.org (69.59.196.222) at 00:15:5d:0a:3e:09 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 Why would arp occasionally set the entry for this failed server as <incomplete>? Should we be defining our arp entries statically? I've always left arp alone since it works 99% of the time, but in this one instance it appears to be failing. Are there any additional troubleshooting steps we can take help resolve this issue? THINGS WE HAVE TRIED I added a static arp entry for testing on one of the linux gateways which still didn't help. root@haproxy2:~# arp -a peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-221.peak.org (69.59.196.221) at 00:15:5d:00:b2:0d [ether] on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 peak-colo-196-220.peak.org (69.59.196.220) at 00:21:5e:4d:30:8d [ether] PERM on eth1 root@haproxy2:~# arp -i eth1 -s 69.59.196.220 00:21:5e:4d:30:8d root@haproxy2:~# ping 69.59.196.220 PING 69.59.196.220 (69.59.196.220) 56(84) bytes of data. --- 69.59.196.220 ping statistics --- 7 packets transmitted, 0 received, 100% packet loss, time 6006ms Rebooting the windows web server solves this issue temporarily with no other changes to the network but our experience shows this issue will come back. Swapping network cards and switches I noticed the link light on the port of the switch for the failed windows server was running at 100Mb instead of 1Gb on the failed interface. I moved the cable to several other open ports and the link indicated 100Mb for each port that I tried. I also swapped the cable with the same result. I tried changing the properties of the network card in windows and the server locked up and required a hard reset after clicking apply. This windows server has two physical network interfaces so I have swapped the cables and network settings on the two interfaces to see if the problem follows the interface. If the public interface goes down again we will know that it is not an issue with the network card. (We also tried another switch we have on hand, no change) Changing network hardware driver versions We've had the same problem with the latest Broadcom driver, as well as the built-in driver that ships in Windows Server 2008 R2. Replacing network cables As a last ditch effort we remembered another change that occurred was the replacement of all of the patch cords between our servers / switch. We had purchased two sets, one green of lengths 1ft - 3ft for the private interfaces and another set of red cables for the public interfaces. We swapped out all of the public interface patch cables with a different brand and ran our servers without issue for a full week ... aaaaaand then the problem recurred. Disable checksum offload, remove TProxy We also tried disabling TCP/IP checksum offload in the driver, no change. We're now pulling out TProxy and moving to a more traditional x-forwarded-for network arrangement without any fancy IP address rewriting. We'll see if that helps.

Read the article
Linux software raid fails to include one device for one RAID1 array

- by user1389890

One of my four Linux software raid arrays drops one of its two devices when I reboot my system. The other three arrays work fine. I am running RAID1 on kernel version 2.6.32-5-amd64. Every time I reboot, /dev/md2 comes up with only one device. I can manually add the device by saying $ sudo mdadm /dev/md2 --add /dev/sdc1. This works fine, and mdadm confirms that the device has been re-added as follows: mdadm: re-added /dev/sdc1 After adding the device and and allowing the array time to resynch, this is what the output of $ cat /proc/mdstat looks like: Personalities : [raid1] md3 : active raid1 sda4[0] sdb4[1] 244186840 blocks super 1.2 [2/2] [UU] md2 : active raid1 sdc1[0] sdd1[1] 732574464 blocks [2/2] [UU] md1 : active raid1 sda3[0] sdb3[1] 722804416 blocks [2/2] [UU] md0 : active raid1 sda1[0] sdb1[1] 6835520 blocks [2/2] [UU] unused devices: <none> Then after I reboot, this is what the output of $ cat /proc/mdstat looks like: Personalities : [raid1] md3 : active raid1 sda4[0] sdb4[1] 244186840 blocks super 1.2 [2/2] [UU] md2 : active raid1 sdd1[1] 732574464 blocks [2/1] [_U] md1 : active raid1 sda3[0] sdb3[1] 722804416 blocks [2/2] [UU] md0 : active raid1 sda1[0] sdb1[1] 6835520 blocks [2/2] [UU] unused devices: <none> During reboot, here is the output of $ sudo cat /var/log/syslog | grep mdadm : Jun 22 19:00:08 rook mdadm[1709]: RebuildFinished event detected on md device /dev/md2 Jun 22 19:00:08 rook mdadm[1709]: SpareActive event detected on md device /dev/md2, component device /dev/sdc1 Jun 22 19:00:20 rook kernel: [ 7819.446412] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446415] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446782] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446785] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.515844] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.515847] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.606829] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.606832] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855616] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855620] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855950] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855952] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8027.962169] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8027.962171] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8028.054365] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8028.054368] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.588662] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.588664] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.601990] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.601991] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.602693] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.602695] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.605981] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.605983] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.606138] mdadm: sending ioctl 800c0910 to a partition! Jun 22 19:10:23 rook kernel: [ 9.606139] mdadm: sending ioctl 800c0910 to a partition! Jun 22 19:10:48 rook mdadm[1737]: DegradedArray event detected on md device /dev/md2 Here is the mdadm.conf file: ARRAY /dev/md0 metadata=0.90 UUID=92121d42:37f46b82:926983e9:7d8aad9b ARRAY /dev/md1 metadata=0.90 UUID=9c1bafc3:1762d51d:c1ae3c29:66348110 ARRAY /dev/md2 metadata=0.90 UUID=98cea6ca:25b5f305:49e8ec88:e84bc7f0 ARRAY /dev/md3 metadata=1.2 name=rook:3 UUID=ca3fce37:95d49a09:badd0ddc:b63a4792 I also ran $ sudo smartctl -t long /dev/sdc and no hardware issues were detected. As long as I do not reboot, /dev/md2 seems to work fine. Does anyone have any suggestions? Here is the output of $ sudo mdadm -E /dev/sdc1 after re-adding the device and letting it resync: /dev/sdc1: Magic : a92b4efc Version : 0.90.00 UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook) Creation Time : Sun Jul 13 08:05:55 2008 Raid Level : raid1 Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Array Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Update Time : Mon Jun 24 07:42:49 2013 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : 5fd6cc13 - correct Events : 180998 Number Major Minor RaidDevice State this 0 8 33 0 active sync /dev/sdc1 0 0 8 33 0 active sync /dev/sdc1 1 1 8 49 1 active sync /dev/sdd1 Here is the output of $ sudo mdadm -D /dev/md2 after re-adding the device and letting it resync: /dev/md2: Version : 0.90 Creation Time : Sun Jul 13 08:05:55 2008 Raid Level : raid1 Array Size : 732574464 (698.64 GiB 750.16 GB) Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Persistence : Superblock is persistent Update Time : Mon Jun 24 07:42:49 2013 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook) Events : 0.180998 Number Major Minor RaidDevice State 0 8 33 0 active sync /dev/sdc1 1 8 49 1 active sync /dev/sdd1

Read the article
Upon reboot, Linux software raid fails to include one device of a RAID1 array

- by user1389890

One of my four Linux software raid arrays drops one of its two devices when I reboot my system. The other three arrays work fine. I am running RAID1 on kernel version 2.6.32-5-amd64 (Debian Squeeze). Every time I reboot, /dev/md2 comes up with only one device. I can manually add the device by saying $ sudo mdadm /dev/md2 --add /dev/sdc1. This works fine, and mdadm confirms that the device has been re-added as follows: mdadm: re-added /dev/sdc1 After adding the device and allowing the array time to resynch, this is what the output of $ cat /proc/mdstat looks like: Personalities : [raid1] md3 : active raid1 sda4[0] sdb4[1] 244186840 blocks super 1.2 [2/2] [UU] md2 : active raid1 sdc1[0] sdd1[1] 732574464 blocks [2/2] [UU] md1 : active raid1 sda3[0] sdb3[1] 722804416 blocks [2/2] [UU] md0 : active raid1 sda1[0] sdb1[1] 6835520 blocks [2/2] [UU] unused devices: <none> Then after I reboot, this is what the output of $ cat /proc/mdstat looks like: Personalities : [raid1] md3 : active raid1 sda4[0] sdb4[1] 244186840 blocks super 1.2 [2/2] [UU] md2 : active raid1 sdd1[1] 732574464 blocks [2/1] [_U] md1 : active raid1 sda3[0] sdb3[1] 722804416 blocks [2/2] [UU] md0 : active raid1 sda1[0] sdb1[1] 6835520 blocks [2/2] [UU] unused devices: <none> During reboot, here is the output of $ sudo cat /var/log/syslog | grep mdadm : Jun 22 19:00:08 rook mdadm[1709]: RebuildFinished event detected on md device /dev/md2 Jun 22 19:00:08 rook mdadm[1709]: SpareActive event detected on md device /dev/md2, component device /dev/sdc1 Jun 22 19:00:20 rook kernel: [ 7819.446412] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446415] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446782] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.446785] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.515844] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.515847] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.606829] mdadm: sending ioctl 1261 to a partition! Jun 22 19:00:20 rook kernel: [ 7819.606832] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855616] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855620] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855950] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:48 rook kernel: [ 8027.855952] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8027.962169] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8027.962171] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8028.054365] mdadm: sending ioctl 1261 to a partition! Jun 22 19:03:49 rook kernel: [ 8028.054368] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.588662] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.588664] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.601990] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.601991] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.602693] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.602695] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.605981] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.605983] mdadm: sending ioctl 1261 to a partition! Jun 22 19:10:23 rook kernel: [ 9.606138] mdadm: sending ioctl 800c0910 to a partition! Jun 22 19:10:23 rook kernel: [ 9.606139] mdadm: sending ioctl 800c0910 to a partition! Jun 22 19:10:48 rook mdadm[1737]: DegradedArray event detected on md device /dev/md2 Here is the result of $ cat /etc/mdadm/mdadm.conf: ARRAY /dev/md0 metadata=0.90 UUID=92121d42:37f46b82:926983e9:7d8aad9b ARRAY /dev/md1 metadata=0.90 UUID=9c1bafc3:1762d51d:c1ae3c29:66348110 ARRAY /dev/md2 metadata=0.90 UUID=98cea6ca:25b5f305:49e8ec88:e84bc7f0 ARRAY /dev/md3 metadata=1.2 name=rook:3 UUID=ca3fce37:95d49a09:badd0ddc:b63a4792 Here is the output of $ sudo mdadm -E /dev/sdc1 after re-adding the device and letting it resync: /dev/sdc1: Magic : a92b4efc Version : 0.90.00 UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook) Creation Time : Sun Jul 13 08:05:55 2008 Raid Level : raid1 Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Array Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Update Time : Mon Jun 24 07:42:49 2013 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : 5fd6cc13 - correct Events : 180998 Number Major Minor RaidDevice State this 0 8 33 0 active sync /dev/sdc1 0 0 8 33 0 active sync /dev/sdc1 1 1 8 49 1 active sync /dev/sdd1 Here is the output of $ sudo mdadm -D /dev/md2 after re-adding the device and letting it resync: /dev/md2: Version : 0.90 Creation Time : Sun Jul 13 08:05:55 2008 Raid Level : raid1 Array Size : 732574464 (698.64 GiB 750.16 GB) Used Dev Size : 732574464 (698.64 GiB 750.16 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Persistence : Superblock is persistent Update Time : Mon Jun 24 07:42:49 2013 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook) Events : 0.180998 Number Major Minor RaidDevice State 0 8 33 0 active sync /dev/sdc1 1 8 49 1 active sync /dev/sdd1 I also ran $ sudo smartctl -t long /dev/sdc and no hardware issues were detected. As long as I do not reboot, /dev/md2 seems to work fine. Does anyone have any suggestions?

Read the article
Windows 7 Machine Makes Router Drop -All- Wireless Connections

- by Hammer Bro.

Some background: My home network consists of my Desktop, a two-month old Windows 7 (x64) machine which is online most frequently (N-spec), as well as three other Windows XP laptops (all G) that only connect every now and then (one for work, one for Netflix, and the other for infrequent regular laptop uses). I used to have a Belkin F5D8236-4 wireless router, and everything worked great. A week ago, however, I found out that the Belkin absolutely in no way would establish a VPN connection, something that has become important for work. So I bought a Netgear WNR3500v2/U/L. The wireless was acting a little sketchy at first for just the Windows 7 machine, but I thought it had something to do with 802.11N and I was in a hurry so I just fished up an ethernet cable and disabled the computer's wireless. It has now become apparent, though, that whenever the Windows 7 machine is connected to the router, all wireless connections become unstable. I was using my work laptop for a solid six hours today with no trouble, having multiple SSH connections open over VPN and streaming internet radio in the background. Then, within two minutes of turning on this Windows 7 box, I had lost all connectivity over the wireless. And I was two feet away from the router. The same sort of thing happens on all of the other laptops -- Netflix can be playing stuff all weekend, but if I come up here and do things on this (W7) computer, the streaming will be dead within ten minutes. So here are my basic observations: If the Windows 7 machine is off, then all connections will have a Signal Strength of Very Good or Excellent and a Speed of 48-54 Mbps for an indefinite amount of time. Shortly after the Windows 7 machine is turned on, all wireless connections will experience a consistent decline in Speed down to 1.0 Mbps, eventually losing their connection entirely. These machines will continue to maintain 70% signal strength, as observed by themselves and router. Once dropped, a wireless connection will have difficulty reconnecting. And, if a connection manages to become established, it will quickly drop off again. The Windows 7 machine itself will continue to function just fine if it's using a wired connection, although it will experience these same issues over the wireless. All of the drivers and firmwares are up to date, and this happened both with the stock Netgear firmware as well as the (current) DD-WRT. What I've tried: Making sure each computer is being assigned a distinct IP. (They are.) Disabling UPnP and Stateful Packet Inspection on the router. Disabling Network Sharing, SSDP Discovery, TCP/IP NetBios Helper and Computer Browser services on the Windows 7 machine. Disabling QoS Packet Scheduler, IPv6, and Link Layer Topology Discovery options on my ethernet controller (leaving only Client for Microsoft Networks, File and Printer Sharing, and IPv4 enabled). What I think: It seems awfully similar to the problems discussed in detail at http://social.msdn.microsoft.com/Forums/en/wsk/thread/1064e397-9d9b-4ae2-bc8e-c8798e591915 (which was both the most relevant and concrete information I could dig up on the internet). I still think that something the Windows 7 IP stack (or just Operating System itself) is doing is giving the router fits. However, I could be wrong, because I have two key differences. One is that most instances of this problem are reported as the entire router dying or restarting, and mine still works just fine over the wired connection. The other is that it's a new router, tested with both the factory firmware and the (I assume) well-maintained DD-WRT project. Even if Windows 7 is still secretly sending IPv6 packets or the TCP Window Scaling implementation that I hear Vista caused some trouble with (even though I've tried my best to disable anything fancy), this router should support those functions. I don't want to get a new or a replacement router unless someone can convince me that this is a defective unit. But the problem seems too specific and predictable by my instincts to be a hardware hiccup. And I don't want to deal with the inevitable problems that always seem to take half a day to resolve when getting a new router, since I'm frantically working (including tomorrow) to complete a project by next week's deadline. Plus, I think in the worst case scenario, I could keep this router connected directly to the modem, disable its wireless entirely, and connect the old Belkin to it directly. That should allow me to still use VPN (although I'll have to plug my work laptop directly into that router), and then maintain wireless connections for all of the other computers. But that feels so wrong to me. Anyone have any ideas what the cause and possible solution could be? Clarifications: The Windows 7 machine is directly connected via an ethernet cable to the router for everything above. But while it is online, all other computers' wireless connections become unusable. It is not an issue of signal strength or interference -- no other devices within scanning range are using Channel 1, and the problem will affect computers that are literally feet away from the router with 95% signal strength.

Read the article
Hyper-V File Server Clustering - at my wit’s end

- by René Kåbis

I am at my wit’s end with File Server clustering under Hyper-V. I am hoping that someone might be able to help me figure out this Gordian Knot of a technology that seems to have dead ends (like forcing cluster VMs to use iSCSI drives where normally-attached VHDX drives could suffice) where logic and reason would normally provide a logical solution. My hardware: I will be running three servers (in the end), but right now everything is taking place on one server. One of the secondary servers will exist purely as a witness/quorum, and another slightly more powerful one will be acting as an emergency backup (with additional storage, just not redundant) to hold the secondary AD VM and the other halves of a set of clustered VMs: the SQL VM and the file system VM. Please note, these each are the depreciated nodes of a cluster, the main nodes will be on the most powerful first machine. My heavy lifter is a machine that also contains all of the truly redundant storage on the network. If this gives anyone the heebie-geebies, too bad. It has a 6TB (usable) RAID-10 array, and will (in the end) hold the primary nodes of both aforementioned clusters, but is right now holding all VMs. This is, right now: DC01, DC02, SQL01, SQL02, FS01 & FS02. Eventually, I will be adding additional VMs to handle Exchange, Sharepoint and Lync, but only to this main server (the secondary server won't be able to handle more than three or four VMs, so why burden it? The AD, SQL & FS VMs are the most critical for the business). If anyone is now saying, “wait, what about a SAN or a NAS for the file servers?”, well too bad. What exists on the main machine is what I have to deal with. I followed these instructions, but I seem to be unable to get things to work. In order to make the file server truly redundant, I cannot trust any one machine to hold the only data store on the network. Therefore, I have created a set of iSCSI drives on the VM-host of the main machine, and attached one to each file server VM. The end result is that I want my FS01 to sit on the heavy lifter, along with its iSCSI “drive”, and FS02 will sit on the secondary machine with its own iSCSI “drive” there as well. That is, neither iSCSI drive will end up sitting on the same machine as the other. As such, the clustered FS will utterly duplicate the contents of the iSCSI drives between each other, so that if one physical machine (or the FS VM) goes toes-up, the other has got a full copy of the data on its own iSCSI drive. My problem occurs when I try to apply the file server role within the failover cluster manager. Actually, it is even before that -- it occurs when adding the disks. Since I have added each disk preferentially to a specific VM (by limiting the initiator by DNS hostname, and by adding two-way CHAP authentication), this forces each VM to be in control of its own iSCSI disk. However, when I try to add the disks to the Disks section of Storage within Failover Cluster Manager, the entire process fails for a random disk of the pair. That is, one will get online, but the other will remain offline because it does not have the correct “owner node”. I mean, really -- WTF? Of course it doesn’t have the right owner node, both drives are showing the same node name!! I cannot seem to have one drive show up with one node name as owner, and the other drive show up with the other node name as owner. And because both drives are not “online”, I cannot create a pool to apply to a cluster role. Talk about getting stuck between a rock and a hard place! I’ve got more to add, but my work is closing for the day and I have to wrap things up. I will try to add more tomorrow morning when I get in. My main objective is to have a file server VM on each machine, the storage on each machine, but a transparent failover in case one physical machine fails. Essentially, a failover FS that doesn’t care which machine fails -- the storage contents are replicated equally on each machine. Am I even heading in the right direction?

Read the article
LDAP query on linux against AD returns groups with no members

- by SethG

I am using LDAP+kerberos to authenticate against Active Directory on Windows 2003 R2. My krb5.conf and ldap.conf appear to be correct (according to pretty much every sample I found on the 'net). I can login to the host with both password and ssh keys. When I run getent passwd, all my ldap user accounts are listed with all the important attributes. When I run getent group, all the ldap groups and their gid's are listed, but no group members. If I run ldapsearch and filter on any group, the members are all listed with the "member" attribute. So the data is there for the taking, it's just not being parsed properly. It would appear that I simply am using an incorrect mapping in ldap.conf, but I can't see it. I've tried several variations and all give the same result. Here is my current ldap.conf: host <ad-host1-ip> <ad-host2-ip> base dc=my,dc=full,dc=dn uri ldap://<ad-host1> ldap://<ad-host2> ldap_version 3 binddn <mybinddn> bindpw <mybindpw> scope sub bind_policy hard nss_reconnect_tries 3 nss_reconnect_sleeptime 1 nss_reconnect_maxsleeptime 8 nss_reconnect_maxconntries 3 nss_map_objectclass posixAccount User nss_map_objectclass posixGroup Group nss_map_attribute uid sAMAccountName nss_map_attribute gidNumber msSFU30GidNumber nss_map_attribute uidNumber msSFU30UidNumber nss_map_attribute cn cn nss_map_attribute gecos displayName nss_map_attribute homeDirectory msSFU30HomeDirectory nss_map_attribute loginShell msSFU30LoginShell nss_map_attribute uniqueMember member pam_filter objectcategory=User pam_login_attribute sAMAccountName pam_member_attribute member pam_password ad Here's the kicker: this config works 100% fine on a different linux box with a different distro. It does not work on the distro I am planning on switching to. I have installed from source the versions of pam_ldap and nss_ldap on the new box to match the old box, which fixed another problem I was having with this setup. Other relevant info is the original AD box was Windows 2003. It's mirror died a horrible hardware death so I'm trying to add two more 2003-R2 servers to the mirror tree and ultimately drop the old 2003 box. The new R2 boxes appear to have joined the DC forest properly. What do I need to do to get groups working? I've exhausted all the resources I could find and need a different angle. Any input is appreciated. Status update, 7/31/09 I have managed to tweak my config file to get full info from the AD and performance is nice and snappy. I replaced the back-rev'd copies of pam_ldap and nss_ldap with the current ones for the distro I'm using, so it's back to a standard out-of-the-box install. Here's my current config: host <ad-host1-ip> <ad-host2-ip> base dc=my,dc=full,dc=dn uri ldap://<ad-host1> ldap://<ad-host2> ldap_version 3 binddn <mybinddn> bindpw <mybindpw> scope sub bind_policy soft nss_reconnect_tries 3 nss_reconnect_sleeptime 1 nss_reconnect_maxsleeptime 8 nss_reconnect_maxconntries 3 nss_connect_policy oneshot referrals no nss_map_objectclass posixAccount User nss_map_objectclass posixGroup Group nss_map_attribute uid sAMAccountName nss_map_attribute gidNumber msSFU30GidNumber nss_map_attribute uidNumber msSFU30UidNumber nss_map_attribute cn cn nss_map_attribute gecos displayName nss_map_attribute homeDirectory msSFU30HomeDirectory nss_map_attribute loginShell msSFU30LoginShell nss_map_attribute uniqueMember member pam_filter objectcategory=CN=Person,CN=Schema,CN=Configuration,DC=w2k,DC=cis,DC=ksu,DC=edu pam_login_attribute sAMAccountName pam_member_attribute member pam_password ad ssl off tls_checkpeer no sasl_secprops maxssf=0 The remaining problem now is when you run the groups command, not all subscribed groups are listed. Some are (one or two), but not all. Group memberships are still honored, such as file and printer access. getent group foo still shows that the user is a member of group foo. So it appears to be a presentation bug, and does not interfere with normal operation. It also appears that some (I have not determined exactly how many) group searches do not resolve correctly, even though the group is listed. eg, when you run "getent group bar", nothing is returned, but if you run "getent group|grep bar" or "getent group|grep <bar_gid>" you can see that it indeed listed and your group name and gid are correct. This still seems like an LDAP search or mapping error, but I can't figure out what it is. I'm a heckuva lot closer than earlier in the week, but I'd really like to get this last detail ironed out.

Read the article
Can't Remove Logical Drive/Array from HP P400

- by Myles

This is my first post here. Thank you in advance for any assistance with this matter. I'm trying to remove a logical drive (logical drive 2) and an array (array "B") from my Smart Array P400. The host is a DL580 G5 running 64-bit Red Hat Enterprise Linux Server release 5.7 (Tikanga). I am unable to remove the array using either hpacucli or cpqacuxe. I believe it is because of "OS Status: LOCKED". The file system that lives on this array has been unmounted. I do not want to reboot the host. Is there some way to "release" this logical drive so I can remove the array? Note that I do not need to preserve the data on logical drive 2. I intend to physically remove the drives from the machine and replace them with larger drives. I'm using the cciss kernel module that ships with Red Hat 5.7. Here is some information pertaining to the host and the P400 configuration: [root@gort ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.7 (Tikanga) [root@gort ~]# uname -a Linux gort 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux [root@gort ~]# rpm -qa | egrep '^(hp|cpq)' cpqacuxe-9.30-15.0 hp-health-9.25-1551.7.rhel5 hpsmh-7.1.2-3 hpdiags-9.3.0-466 hponcfg-3.1.0-0 hp-snmp-agents-9.25-2384.8.rhel5 hpacucli-9.30-15.0 [root@gort ~]# hpacucli HP Array Configuration Utility CLI 9.30.15.0 Detecting Controllers...Done. Type "help" for a list of supported commands. Type "exit" to close the console. => ctrl all show config detail Smart Array P400 in Slot 0 (Embedded) Bus Interface: PCI Slot: 0 Cache Serial Number: PA82C0J9SVW34U RAID 6 (ADG) Status: Enabled Controller Status: OK Hardware Revision: D Firmware Version: 7.22 Rebuild Priority: Medium Expand Priority: Medium Surface Scan Delay: 15 secs Surface Scan Mode: Idle Wait for Cache Room: Disabled Surface Analysis Inconsistency Notification: Disabled Post Prompt Timeout: 0 secs Cache Board Present: True Cache Status: OK Cache Ratio: 25% Read / 75% Write Drive Write Cache: Disabled Total Cache Size: 256 MB Total Cache Memory Available: 208 MB No-Battery Write Cache: Disabled Cache Backup Power Source: Batteries Battery/Capacitor Count: 1 Battery/Capacitor Status: OK SATA NCQ Supported: True Logical Drive: 1 Size: 136.7 GB Fault Tolerance: RAID 1 Heads: 255 Sectors Per Track: 32 Cylinders: 35132 Strip Size: 128 KB Full Stripe Size: 128 KB Status: OK Caching: Enabled Unique Identifier: 600508B100184A395356573334550002 Disk Name: /dev/cciss/c0d0 Mount Points: /boot 101 MB, /tmp 7.8 GB, /usr 3.9 GB, /usr/local 2.0 GB, /var 3.9 GB, / 2.0 GB, /local 113.2 GB OS Status: LOCKED Logical Drive Label: A0027AA78DEE Mirror Group 0: physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK) Mirror Group 1: physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK) Drive Type: Data Array: A Interface Type: SAS Unused Space: 0 MB Status: OK Array Type: Data physicaldrive 1I:1:1 Port: 1I Box: 1 Bay: 1 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM57RF40000983878FX Model: HP DG146BB976 Current Temperature (C): 29 Maximum Temperature (C): 35 PHY Count: 2 PHY Transfer Rate: Unknown, Unknown physicaldrive 1I:1:2 Port: 1I Box: 1 Bay: 2 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM55VQC000098388524 Model: HP DG146BB976 Current Temperature (C): 29 Maximum Temperature (C): 36 PHY Count: 2 PHY Transfer Rate: Unknown, Unknown Logical Drive: 2 Size: 546.8 GB Fault Tolerance: RAID 5 Heads: 255 Sectors Per Track: 32 Cylinders: 65535 Strip Size: 64 KB Full Stripe Size: 256 KB Status: OK Caching: Enabled Parity Initialization Status: Initialization Completed Unique Identifier: 600508B100184A395356573334550003 Disk Name: /dev/cciss/c0d1 Mount Points: None OS Status: LOCKED Logical Drive Label: A5C9C6F81504 Drive Type: Data Array: B Interface Type: SAS Unused Space: 0 MB Status: OK Array Type: Data physicaldrive 1I:1:3 Port: 1I Box: 1 Bay: 3 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM2H5PE00009802NK19 Model: HP DG146ABAB4 Current Temperature (C): 30 Maximum Temperature (C): 37 PHY Count: 1 PHY Transfer Rate: Unknown physicaldrive 1I:1:4 Port: 1I Box: 1 Bay: 4 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM28YY400009750MKPJ Model: HP DG146ABAB4 Current Temperature (C): 31 Maximum Temperature (C): 36 PHY Count: 1 PHY Transfer Rate: 3.0Gbps physicaldrive 2I:1:5 Port: 2I Box: 1 Bay: 5 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM2FGYV00009802N3GN Model: HP DG146ABAB4 Current Temperature (C): 30 Maximum Temperature (C): 38 PHY Count: 1 PHY Transfer Rate: Unknown physicaldrive 2I:1:6 Port: 2I Box: 1 Bay: 6 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM8AFAK00009920MMV1 Model: HP DG146BB976 Current Temperature (C): 31 Maximum Temperature (C): 41 PHY Count: 2 PHY Transfer Rate: Unknown, Unknown physicaldrive 2I:1:7 Port: 2I Box: 1 Bay: 7 Status: OK Drive Type: Data Drive Interface Type: SAS Size: 146 GB Rotational Speed: 10000 Firmware Revision: HPDE Serial Number: 3NM2FJQD00009801MSHQ Model: HP DG146ABAB4 Current Temperature (C): 29 Maximum Temperature (C): 39 PHY Count: 1 PHY Transfer Rate: Unknown

Read the article
Can connect to Samba server but cannot access shares?

- by jlego

I have setup a stand-alone box running Fedora 16 to use as a file-sharing and web development server. Needs to be able to share files with a PC running Windows 7 and a Mac running OSX Snow Leopard. I've setup Samba using the Samba configuration GUI tool. Added users to Fedora and connected them as Samba users (which are the same as the Windows and Mac usernames and passwords). The workgroup name is the same as the Windows workgroup. Authentication is set to User. I've allowed Samba and Samba client through the firewall and set the ethernet to a trusted port in the firewall. Both the Windows and Mac machines can connect to the server and view the shares, however when trying to access the shares, Windows throws error 0x80070035 " Windows cannot access \SERVERNAME\ShareName." Windows user is not prompted for a username or password when accessing the server (found under "Network Places"). This also happens when connecting with the IP rather than the server name. The Mac can also connect to the server and see the shares but when choosing a share gives the error "The original item for ShareName cannot be found." When connecting via IP, the Mac user is prompted for username and password, which when authenticated gives a list of shares, however when choosing a share to connect to, the error is displayed and the user cannot access the share. Since both machines are acting similarly when trying to access the shares, I assume it is an issue with how Samba is configured. smb.conf: [global] workgroup = workgroup server string = Server log file = /var/log/samba/log.%m max log size = 50 security = user load printers = yes cups options = raw printcap name = lpstat printing = cups [homes] comment = Home Directories browseable = no writable = yes [printers] comment = All Printers path = /var/spool/samba browseable = yes printable = yes [FileServ] comment = FileShare path = /media/FileServ read only = no browseable = yes valid users = user1, user2 [webdev] comment = Web development path = /var/www/html/webdev read only = no browseable = yes valid users = user1 How do I get samba sharing working? UPDATE: Before this box I had another box with the same version of fedora installed (16) and samba working for these same computers. I started up the old machine and copied the smb.conf file from the old machine to the new one (editing the share definitions for the new shares of course) and I still get the same errors on both client machines. The only difference in environment is the hardware and the router. On the old machine the router received a dynamic public IP and assigned dynamic private IPs to each device on the network while the new machine is connected to a router that has a static public IP (still dynamic internal IPs though.) Could either one of these be affecting Samba? UPDATE 2: As the directory I am trying to share is actually an entire internal disk, I have tried to things: 1.) changing the owner of the mounted disk from root to my user (which is the same username as on the Windows machine) 2.) made a share that only included one of the folders on the disk instead of the entire disk with my user again as the owner. Both tests failed giving me the same errors regarding the network address. UPDATE 3: Not sure exactly what I did, but now whenever I try to connect to the share on the Windows 7 client I am prompted for my username and password. When I enter the correct credentials I get an access denied message. However I did notice that under the login box "domain: WINDOWS-PC-NAME" is listed. I believe this could very well be the problem. Any suggestions? UPDATE 4: So I've completely reinstalled Fedora and Samba now. I've created a share on the first harddrive (one fedora is installed on) and I can access that fine from Windows. However when I try to share any data on the second disk, I am receiving the same error. This I believe is the problem. I think I need to change some things in fstab or fdisk or something. UPDATE 5: So in fstab I mapped the drive to automount in a folder which works correctly. I also added the samba_share_t SElinux label to the mountpoint directory which now allows me to access the shares on the Windows machine, however I cannot see any of the files in the directory on the windows machine. (They are there, I can see them in the fedora file browser locally) UPDATE 6: Figured it out. See answer below

Read the article
Windows 7 Pro sysprep not working

- by Callum D

Hello, I'm trying to sysprep a Windows 7 Professional machine, prior to grabbing an image for mass deployment on identical hardware, and am having a hard time getting sysprep to work (at all). I've created an XML answer file with WSIM, and have a basic setupcomplete.cmd file, but none of the configurations in the answer file seem to be applied. I've read technet articles and googled, and I still have no idea why this is happening. Is someone able to have a look at the answer file I've attached and let me know where I'm going wrong? thanks, Callum AutoUnattend.XML <?xml version="1.0" encoding="utf-8"?> <unattend xmlns="urn:schemas-microsoft-com:unattend"> <settings pass="specialize"> <component name="Microsoft-Windows-Shell-Setup" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <AutoLogon> <Password> <Value>**********************************</Value> <PlainText>false</PlainText> </Password> <Username>administrator</Username> <LogonCount>1</LogonCount> <Enabled>true</Enabled> </AutoLogon> <WindowsFeatures> <ShowMediaCenter>false</ShowMediaCenter> <ShowWindowsMediaPlayer>false</ShowWindowsMediaPlayer> </WindowsFeatures> <CopyProfile>true</CopyProfile> <DoNotCleanTaskBar>true</DoNotCleanTaskBar> <RegisteredOrganization>SomeCompany (UK) Ltd.</RegisteredOrganization> <RegisteredOwner>SomeCompany User</RegisteredOwner> <ShowWindowsLive>false</ShowWindowsLive> <TimeZone>GMT Standard Time</TimeZone> </component> <component name="Security-Malware-Windows-Defender" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <DisableAntiSpyware>true</DisableAntiSpyware> </component> </settings> <settings pass="oobeSystem"> <component name="Microsoft-Windows-International-Core" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <SystemLocale>en-UK</SystemLocale> <UserLocale>en-UK</UserLocale> <UILanguage>en-US</UILanguage> <InputLocale>0809:00000809</InputLocale> </component> <component name="Microsoft-Windows-Shell-Setup" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <OOBE> <HideEULAPage>true</HideEULAPage> <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE> <NetworkLocation>Work</NetworkLocation> <ProtectYourPC>1</ProtectYourPC> </OOBE> <UserAccounts> <AdministratorPassword> <Value>*************************************************=</Value> <PlainText>false</PlainText> </AdministratorPassword> </UserAccounts> </component> <component name="Microsoft-Windows-Deployment" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <Reseal> <Mode>OOBE</Mode> </Reseal> </component> </settings> <settings pass="generalize"> <component name="Microsoft-Windows-Security-SPP" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <SkipRearm>0</SkipRearm> </component> </settings> <settings pass="windowsPE"> <component name="Microsoft-Windows-Setup" processorArchitecture="x86" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <UseConfigurationSet>true</UseConfigurationSet> </component> </settings> <cpi:offlineImage cpi:source="wim:c:/wim/install.wim#Windows 7 PROFESSIONAL" xmlns:cpi="urn:schemas-microsoft-com:cpi" /> </unattend>

Read the article
Inbound SIP calls through Cisco 881 NAT hang up after a few seconds

- by MasterRoot24

I've recently moved to a Cisco 881 router for my WAN link. I was previously using a Cisco Linksys WAG320N as my modem/router/WiFi AP/NAT firewall. The WAG320N is now running in bridged mode, so it's simply acting as a modem with one of it's LAN ports connected to FE4 WAN on my Cisco 881. The Cisco 881 get's a DHCP provided IP from my ISP. My LAN is part of default Vlan 1 (192.168.1.0/24). General internet connectivity is working great, I've managed to setup static NAT rules for my HTTP/HTTPS/SMTP/etc. services which are running on my LAN. I don't know whether it's worth mentioning that I've opted to use NVI NAT (ip nat enable as opposed to the traditional ip nat outside/ip nat inside) setup. My reason for this is that NVI allows NAT loopback from my LAN to the WAN IP and back in to the necessary server on the LAN. I run an Asterisk 1.8 PBX on my LAN, which connects to a SIP provider on the internet. Both inbound and outbound calls through the old setup (WAG320N providing routing/NAT) worked fine. However, since moving to the Cisco 881, inbound calls drop after around 10 seconds, whereas outbound calls work fine. The following message is logged on my Asterisk PBX: [Dec 9 15:27:45] WARNING[27734]: chan_sip.c:3641 retrans_pkt: Retransmission timeout reached on transmission [email protected] for seqno 1 (Critical Response) -- See https://wiki.asterisk.org/wiki/display/AST/SIP+Retransmissions Packet timed out after 6528ms with no response [Dec 9 15:27:45] WARNING[27734]: chan_sip.c:3670 retrans_pkt: Hanging up call [email protected] - no reply to our critical packet (see https://wiki.asterisk.org/wiki/display/AST/SIP+Retransmissions). (I know that this is quite a common issue - I've spend the best part of 2 days solid on this, trawling Google.) I've done as I am told and checked https://wiki.asterisk.org/wiki/display/AST/SIP+Retransmissions. Referring to the section "Other SIP requests" in the page linked above, I believe that the hangup to be caused by the ACK from my SIP provider not being passed back through NAT to Asterisk on my PBX. I tried to ascertain this by dumping the packets on my WAN interface on the 881. I managed to obtain a PCAP dump of packets in/out of my WAN interface. Here's an example of an ACK being reveived by the router from my provider: 689 21.219999 193.x.x.x 188.x.x.x SIP 502 Request: ACK sip:[email protected] | However a SIP trace on the Asterisk server show's that there are no ACK's received in response to the 200 OK from my PBX: http://pastebin.com/wwHpLPPz In the past, I have been strongly advised to disable any sort of SIP ALGs on routers and/or firewalls and the many posts regarding this issue on the internet seem to support this. However, I believe on Cisco IOS, the config command to disable SIP ALG is no ip nat service sip udp port 5060 however, this doesn't appear to help the situation. To confirm that config setting is set: Router1#show running-config | include sip no ip nat service sip udp port 5060 Another interesting twist: for a short period of time, I tried another provider. Luckily, my trial account with them is still available, so I reverted my Asterisk config back to the revision before I integrated with my current provider. I then dialled in to the DDI associated with the trial trunk and the call didn't get hung up and I didn't get the error above! To me, this points at the provider, however I know, like all providers do, will say "There's no issues with our SIP proxies - it's your firewall." I'm tempted to agree with this, as this issue was not apparent with the old WAG320N router when it was doing the NAT'ing. I'm sure you'll want to see my running-config too: ! ! Last configuration change at 15:55:07 UTC Sun Dec 9 2012 by xxx version 15.2 no service pad service tcp-keepalives-in service tcp-keepalives-out service timestamps debug datetime msec localtime show-timezone service timestamps log datetime msec localtime show-timezone no service password-encryption service sequence-numbers ! hostname Router1 ! boot-start-marker boot-end-marker ! ! security authentication failure rate 10 log security passwords min-length 6 logging buffered 4096 logging console critical enable secret 4 xxx ! aaa new-model ! ! aaa authentication login local_auth local ! ! ! ! ! aaa session-id common ! memory-size iomem 10 ! crypto pki trustpoint TP-self-signed-xxx enrollment selfsigned subject-name cn=IOS-Self-Signed-Certificate-xxx revocation-check none rsakeypair TP-self-signed-xxx ! ! crypto pki certificate chain TP-self-signed-xxx certificate self-signed 01 quit no ip source-route no ip gratuitous-arps ip auth-proxy max-login-attempts 5 ip admission max-login-attempts 5 ! ! ! ! ! no ip bootp server ip domain name dmz.merlin.local ip domain list dmz.merlin.local ip domain list merlin.local ip name-server x.x.x.x ip inspect audit-trail ip inspect udp idle-time 1800 ip inspect dns-timeout 7 ip inspect tcp idle-time 14400 ip inspect name autosec_inspect ftp timeout 3600 ip inspect name autosec_inspect http timeout 3600 ip inspect name autosec_inspect rcmd timeout 3600 ip inspect name autosec_inspect realaudio timeout 3600 ip inspect name autosec_inspect smtp timeout 3600 ip inspect name autosec_inspect tftp timeout 30 ip inspect name autosec_inspect udp timeout 15 ip inspect name autosec_inspect tcp timeout 3600 ip cef login block-for 3 attempts 3 within 3 no ipv6 cef ! ! multilink bundle-name authenticated license udi pid CISCO881-SEC-K9 sn ! ! username xxx privilege 15 secret 4 xxx username xxx secret 4 xxx ! ! ! ! ! ip ssh time-out 60 ! ! ! ! ! ! ! ! ! interface FastEthernet0 no ip address ! interface FastEthernet1 no ip address ! interface FastEthernet2 no ip address ! interface FastEthernet3 switchport access vlan 2 no ip address ! interface FastEthernet4 ip address dhcp no ip redirects no ip unreachables no ip proxy-arp ip nat enable duplex auto speed auto ! interface Vlan1 ip address 192.168.1.1 255.255.255.0 no ip redirects no ip unreachables no ip proxy-arp ip nat enable ! interface Vlan2 ip address 192.168.0.2 255.255.255.0 ! ip forward-protocol nd ip http server ip http access-class 1 ip http authentication local ip http secure-server ip http timeout-policy idle 60 life 86400 requests 10000 ! ! no ip nat service sip udp port 5060 ip nat source list 1 interface FastEthernet4 overload ip nat source static tcp x.x.x.x 80 interface FastEthernet4 80 ip nat source static tcp x.x.x.x 443 interface FastEthernet4 443 ip nat source static tcp x.x.x.x 25 interface FastEthernet4 25 ip nat source static tcp x.x.x.x 587 interface FastEthernet4 587 ip nat source static tcp x.x.x.x 143 interface FastEthernet4 143 ip nat source static tcp x.x.x.x 993 interface FastEthernet4 993 ip nat source static tcp x.x.x.x 1723 interface FastEthernet4 1723 ! ! logging trap debugging logging facility local2 access-list 1 permit 192.168.1.0 0.0.0.255 access-list 1 permit 192.168.0.0 0.0.0.255 no cdp run ! ! ! ! control-plane ! ! banner motd Authorized Access only ! line con 0 login authentication local_auth length 0 transport output all line aux 0 exec-timeout 15 0 login authentication local_auth transport output all line vty 0 1 access-class 1 in logging synchronous login authentication local_auth length 0 transport preferred none transport input telnet transport output all line vty 2 4 access-class 1 in login authentication local_auth length 0 transport input ssh transport output all ! ! end ...and, if it's of any use, here's my Asterisk SIP config: [general] context=default ; Default context for calls allowoverlap=no ; Disable overlap dialing support. (Default is yes) udpbindaddr=0.0.0.0 ; IP address to bind UDP listen socket to (0.0.0.0 binds to all) ; Optionally add a port number, 192.168.1.1:5062 (default is port 5060) tcpenable=no ; Enable server for incoming TCP connections (default is no) tcpbindaddr=0.0.0.0 ; IP address for TCP server to bind to (0.0.0.0 binds to all interfaces) ; Optionally add a port number, 192.168.1.1:5062 (default is port 5060) srvlookup=yes ; Enable DNS SRV lookups on outbound calls ; Note: Asterisk only uses the first host ; in SRV records ; Disabling DNS SRV lookups disables the ; ability to place SIP calls based on domain ; names to some other SIP users on the Internet ; Specifying a port in a SIP peer definition or ; when dialing outbound calls will supress SRV ; lookups for that peer or call. directmedia=no ; Don't allow direct RTP media between extensions (doesn't work through NAT) externhost=<MY DYNDNS HOSTNAME> ; Our external hostname to resolve to IP and be used in NAT'ed packets localnet=192.168.1.0/24 ; Define our local network so we know which packets need NAT'ing qualify=yes ; Qualify peers by default dtmfmode=rfc2833 ; Set the default DTMF mode disallow=all ; Disallow all codecs by default allow=ulaw ; Allow G.711 u-law allow=alaw ; Allow G.711 a-law ; ---------------------- ; SIP Trunk Registration ; ---------------------- ; Orbtalk register => <MY SIP PROVIDER USER NAME>:[email protected]/<MY DDI> ; Main Orbtalk number ; ---------- ; Trunks ; ---------- [orbtalk] ; Main Orbtalk trunk type=peer insecure=invite host=sipgw3.orbtalk.co.uk nat=yes username=<MY SIP PROVIDER USER NAME> defaultuser=<MY SIP PROVIDER USER NAME> fromuser=<MY SIP PROVIDER USER NAME> secret=xxx context=inbound I really don't know where to go with this. If anyone can help me find out why these calls are being dropped off, I'd be grateful if you could chime in! Please let me know if any further info is required.

Read the article
How do I make Linux recognize a new SATA /dev/sda drive I hot swapped in without rebooting?

- by Philip Durbin

Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized: [root@fs-2 ~]# tail -18 /var/log/messages May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake } May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset May 5 16:54:45 fs-2 kernel: ata1: soft resetting link May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:54:55 fs-2 kernel: ata1: soft resetting link May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:05 fs-2 kernel: ata1: soft resetting link May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps May 5 16:55:40 fs-2 kernel: ata1: soft resetting link May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up May 5 16:55:45 fs-2 kernel: ata1: EH complete I tried a couple things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh but they didn't work: [root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan -bash: echo: write error: Invalid argument [root@fs-2 ~]# [root@fs-2 ~]# /root/rescan-scsi-bus.sh -l [snip] 0 new device(s) found. 0 device(s) removed. [root@fs-2 ~]# [root@fs-2 ~]# ls /dev/sda ls: /dev/sda: No such file or directory I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting? The operating system in question is RHEL5.3: [root@fs-2 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.3 (Tikanga) The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS. Here is the lscpi output: [root@fs-2 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers to look more into the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding what exactly isn't supported hardware-wise in terms of hot swap for that system. Please let me know what other output besides lspci might be useful. The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output: [root@www-1 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)

Read the article
Trouble recovering MySQL InnoDB database after server crash

- by Andy Shinn

I had a server crash due to a broken iSCSI link (the filesystem went into read-only mode). After repairing the link and rebooting the machine (CentOS 5 / MySQL 5.1), the MySQL server would not start and gave the following error: 100603 19:11:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 100603 19:11:46 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... InnoDB: Warning: database page corruption or a failed InnoDB: file read of page 112541. InnoDB: Trying to recover it from the doublewrite buffer. InnoDB: Dump of the page: 100603 19:11:46 InnoDB: Page dump in ascii and hex (16384 bytes): lots of binary data 100603 19:11:46 InnoDB: Page checksum 953720272, prior-to-4.0.14-form checksum 2641912043 InnoDB: stored checksum 617821918, prior-to-4.0.14-form stored checksum 2080617765 InnoDB: Page lsn 115 2632899642, low 4 bytes of lsn at page end 2641594600 InnoDB: Page number (if stored to page already) 112541, InnoDB: space id (if created with = MySQL-4.1.1 and stored already) 0 InnoDB: Page may be an index page where index id is 0 616 InnoDB: Dump of corresponding page in doublewrite buffer: 100603 19:11:46 InnoDB: Page dump in ascii and hex (16384 bytes): more binary data 100603 19:11:46 InnoDB: Page checksum 908374788, prior-to-4.0.14-form checksum 824841363 InnoDB: stored checksum 912869634, prior-to-4.0.14-form stored checksum 2210927931 InnoDB: Page lsn 115 2635312169, low 4 bytes of lsn at page end 2633173354 InnoDB: Page number (if stored to page already) 112541, InnoDB: space id (if created with = MySQL-4.1.1 and stored already) 0 InnoDB: Page may be an index page where index id is 0 616 InnoDB: Also the page in the doublewrite buffer is corrupt. InnoDB: Cannot continue operation. InnoDB: You can try to recover the database with the my.cnf InnoDB: option: InnoDB: innodb_force_recovery=6 100603 19:11:46 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended Per the error message, I have tried setting set-variable=innodb_force_recovery=6 in the my.cnf to get to the data. This allows the MySQL server to start. But when I try to do a mysqldump of the database or a SELECT * INTO OUTFILE "filename" FROM broken_table; it seems to endlessly just export the same line over and over again. I have also tried http://code.google.com/p/innodb-tools/. But this tool fails with an error that 'blob' type is not supported. If I try to access the data using the PHP application it crashes MySQL: `100603 19:19:19 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8384512 read_buffer_size=131072 max_used_connections=2 max_threads=151 threads_connected=2 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338317 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x15d33f0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x453aff00 thread_stack 0x40000 /usr/libexec/mysqld(my_print_stacktrace+0x24) [0x874364] /usr/libexec/mysqld(handle_segfault+0x346) [0x5c9166] /lib64/libpthread.so.0 [0x3a6e40eb10] /usr/libexec/mysqld(rec_get_offsets_func+0x30) [0x7cc310] /usr/libexec/mysqld [0x7674d8] /usr/libexec/mysqld(btr_search_info_update_slow+0x638) [0x768d48] /usr/libexec/mysqld(btr_cur_search_to_nth_level+0xc7d) [0x75f86d] /usr/libexec/mysqld [0x7dd1c1] /usr/libexec/mysqld(row_search_for_mysql+0x18b0) [0x7e03d0] /usr/libexec/mysqld(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int)+0x7c) [0x7526fc] /usr/libexec/mysqld(handler::read_multi_range_next(st_key_multi_range**)+0x29) [0x6aed09] /usr/libexec/mysqld(QUICK_RANGE_SELECT::get_next()+0x194) [0x69a964] /usr/libexec/mysqld [0x6aafe9] /usr/libexec/mysqld(sub_select(JOIN*, st_join_table*, bool)+0x56) [0x635196] /usr/libexec/mysqld [0x63f9cd] /usr/libexec/mysqld(JOIN::exec()+0x950) [0x6497c0] /usr/libexec/mysqld(mysql_select(THD*, Item*, TABLE_LIST*, unsigned int, List&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsign ed long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x17b) [0x64b34b] /usr/libexec/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x169) [0x64bc79] /usr/libexec/mysqld [0x5d34b6] /usr/libexec/mysqld(mysql_execute_command(THD*)+0x4e5) [0x5d6b45] /usr/libexec/mysqld(mysql_parse(THD*, char const*, unsigned int, char const)+0x211) [0x5dc321] /usr/libexec/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x10b8) [0x5dd3f8] /usr/libexec/mysqld(do_command(THD*)+0xe6) [0x5dd9e6] /usr/libexec/mysqld(handle_one_connection+0x73d) [0x5d036d] /lib64/libpthread.so.0 [0x3a6e40673d] /lib64/libc.so.6(clone+0x6d) [0x3a6d4d3d1d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd-query at 0x15de5e0 = SELECT DISTINCT count(DISTINCT i.itemid) as rowscount,i.hostid FROM items i WHERE ((i.itemid BETWEEN 000000000000000 AND 0999999 99999999)) AND i.type<9 AND (i.hostid IN (10017,10047,10050,10054,10056,10059,10062,10063,10064,10065,10066,10067,10068,10069,10070,10071,10072,10073,100 74,10075,10076,10077,10078,10079,10080,10081,10082,10084,10088,10089,10090,10091,10092,10093,10094,10095,10096,10097,10098,10099,10100,10101,10102,10103,10 104,10105,10106,10107,10108,10109)) GROUP BY i.hostid thd-thread_id=3 thd-killed=NOT_KILLED The manual page at contains information that should help you find out what is causing the crash. 100603 19:19:19 mysqld_safe Number of processes running now: 0 100603 19:19:19 mysqld_safe mysqld restarted` Before recovering form an older backup as a last resort I am looking for anymore suggestions. Thanks!

Read the article
Spotlight Infinite Indexing issue (external data drive)

- by Manca Weeks

This is an external drive, formerly a boot drive which is now in use only to access music files (sibelius, audio, midi, live, logic etc.) without transferring the data into a new boot system, partly because of the issue I am about to describe, but mostly because the majority of the data is mainly there for archival purposes. The user is a composer and prominent musician and needs to be able to rehash the data at will. I have tried several things - here is a list: - make complete filesystem clone with antonio diaz's ddrescue - run Disk Warrior on copy, repair whatever errors occurred - wipe out all ACLs on entire drive - set all permissions to the same value - wide open 777 - remove any system data (applications, system files, including hidden files to the best of my knowledge) by selecting only non-system/app data and using Carbon Copy Cloner to put only the data of interest onto a newly formatted drive - transfer data to newly formatted drive folder by folder, resetting the spotlight index in between adding each to observe for issues (interesting here is that no issues occurred except for in Documents folder - when I transferred only the Documents folder to a newly formatted drive on its own - no trouble. It appears almost as thought it may not be the content but the quantity or specific combination of data that results in problems) - use DataRescue to transfer the data to yet another newly formatted drive to expose any missed hidden files Between each of the above steps I stopped Spotlight (search for anything beginning with md in Activity Monitor - All Processes and quitting it), deleted the .Spotlight-V100 directory from the affected drive. Restart Splotlight indexing by adding drive to Spotlight privacy list and removing it. In each case the same issue occurs - Spotlight begins indexing normally (or so it seems), then the index estimated time increases, usually to 4 hours remaining. This is where it gets stuck and continues to predict 4 hours remaining but never finishes. Sometimes I can't eject the drive and have to quit the md.. processes from Activity Monitor to be able to eject the drive without Force Eject. Once I disconnect the drive after the 4 hours remaining situation - if I reattach it, Spotlight forever estimates remaining time and never gets going again. So there it is. It is apparently not a filesystem issue, not a permissions issue and not tied to any particular piece of hardware or protocol (used USB and FW drives). I have tried this on several machines (3 to be precise) and in 10.5.8 and 10.6.5. Simply disabling Spotlight on this volume is not an option because the owner has no clue where things are as the data on the volume dates back to music projects and compositions from 2003 and before. He needs to be able to query for results. Anyone got any ideas? ---update 2-6-11 Since I have not received any responses except the one below which appears to misunderstand my point, I am updating this post hoping to get more responses. I have used the terminal command sudo opensnoop -p PID where PID is the mdworker process ID to try and determine what Spotlight is doing and hopefully find the files it's having trouble with. Here's what happens: After indexing for a few hours, mdworker is gone. It no longer shows up in Activity Monitor under "All Processes" and the Terminal window with the opensnoop result stops moving. I then proceeded to execute the same command on mds to see what it was doing and here's what I get, repeatedly: 501 57 mds 21 / 501 57 mds 21 /Volumes/Sno Leppard 501 57 mds 21 /Volumes/Tiger 501 57 mds 21 /Volumes/Leppard 501 57 mds 21 /Volumes/Disk Warrior 501 57 mds 21 /Volumes/ONM Data These represent all the volumes currently mounted in the system. All except ONM Data, which is the one I am trying to index, are excluded from SPotlight indexing at the moment. The sequence above repeats over and over, with slight variation, sometimes skipping one of the volumes. Questions - what happened to mdworker? What is mds doing? I will let this run until tomorrow morning and throughout the day and monitor for any changes. Any input would be very much appreciated. Even if you're not sure what the ultimate answer is, please alert me to anything you think I may be missing. Hopefully at some point we will figure this out... Thanks, M __final edit__ I finally resolved the issue and here is how I did it. I used the terminal command "sudo opensnoop -p PID" where the PID is the process id of the processes I was monitoring. I was looking at all instances of mds and mdworker running in the system. After the third time through indexing the same data set (see info above), I contacted Apple and got to their highest level of support - they were flabbergasted as well. They advised me to install yet another default 10.6.6 system and try again. The same pattern repeated - mds and mdworker(s) would start indexing and eventually the spotlight icon would say 6 hours remaining and all mdworkers were gone, mds at 90% or so of CPU. But I did finally figure out that the first time mdworker stopped like that, the last file it touched was always in the same folder. I excluded that folder from spotlight search and the rest of the data set indexed within about 2 hours with no strange behavior or failures. I copied that folder to another machine and Spotlight barfed immediately. Exclude that folder and all is well again. I have no clue what is causing this behavior, still, but I did find a functional solution to the problem. Anyone with a similar problem - run opensnoop on all instances of mds and mdworker and wait patiently for wdworker to exit. Look at the last file it touched and exclude the enclosing folder from being indexed. I was able to repeat the issue and solution on 2 different installs and 2 different copies of the data set. Hope this helps. If we find an actual cause of the folder being such a problem (it is called MICHAEL BRECKER RECORD SOLOS and contains almost 1 GB of audio related files - performer, live, SD2 - things like that), I will edit again to let you all know. Thanks for ay attempts to help, M

Read the article
Unicast traffic between hosts on a switch leaving the switch by its uplink. Why?

- by Rich Lafferty

I have a weird thing happening on our network at my office which I can't quite get my head around. In particular I can't tell if it's a problem with a switch, or a problem with configuration. We have a Cisco SG300-52 switch (sw01) in the top of a rack in our server room, connected to another SG300-28 that acts as our core switch (core01). Both run layer 2 only, our firewalls do routing between VLANs. They have a dozen or so VLANs between them. Gi1 on sw01 is a trunk port connected to gi1 on core01. (Disclosure: There are other switches in our environment but I'm pretty sure I've isolated the problem down to these two. Happy to provide more info if necessary.) The behaviour I'm seeing is limited to one VLAN, vlan 12 -- or, at least, it's not happening on the other ones I checked (It's hard to guarantee the absence of packets), and it is: sw01 is forwarding, to core01, traffic which is between two hosts which are both plugged into sw01. (I noticed this because the IDS in our firewall gave a false positive on traffic which should not reach the firewall.) We noticed this mostly between our two dhcp/dns servers, net01 (10.12.0.10) and net02 (10.12.0.11). net01 is physical hardware and net02 is on a VMware ESX server. net01 is connected to gi44 on sw01 and net02's ESX server to gi11. [net01]----gi44-[sw01]-gi1----gi1-[core01] [net02]----gi11/ Let's see some interfaces! Remember, vlan 12 is the problem vlan. Of the others I explicitly verified that vlan 27 was not affected. Here's the two hosts' ports: esx01 contains net02. sw01#sh run int gi11 interface gigabitethernet11 description esx01 lldp med disable switchport trunk allowed vlan add 5-7,11-13,100 switchport trunk native vlan 27 ! sw01#sh run int gi44 interface gigabitethernet44 description net01-1 lldp med disable switchport mode access switchport access vlan 12 ! Here's the trunk on sw01. sw01#sh run int gi1 interface gigabitethernet1 description "trunk to core01" lldp med disable switchport trunk allowed vlan add 4-7,11-13,27,100 ! And the other end of the trunk on core01. interface gigabitethernet1 description sw01 macro description switch switchport trunk allowed vlan add 2-7,11-16,27,100 ! I have a monitor port on core01, thus: core01#sh run int gi12 interface gigabitethernet12 description "monitor port" port monitor GigabitEthernet 1 ! And the monitor port on core01 sees unicast traffic going between net01 and net02, both of which are on sw01! I've verified this with a monitor port on sw01 that sees the net01-net02 unicast traffic leaving via gi1 too. sw01 knows that both of those hosts are on ports that are not its trunk port: :) ratchet$ arp -a | grep net net02.2ndsiteinc.com (10.12.0.11) at 00:0C:29:1A:66:15 [ether] on eth0 net01.2ndsiteinc.com (10.12.0.10) at 00:11:43:D8:9F:94 [ether] on eth0 sw01#sh mac addr addr 00:0C:29:1A:66:15 Aging time is 300 sec Vlan Mac Address Port Type -------- --------------------- ---------- ---------- 12 00:0c:29:1a:66:15 gi11 dynamic sw01#sh mac addr addr 00:11:43:D8:9F:94 Aging time is 300 sec Vlan Mac Address Port Type -------- --------------------- ---------- ---------- 12 00:11:43:d8:9f:94 gi44 dynamic I also brought up an unused port on sw01 on vlan 12, but the unicast traffic was (as best as I could tell) not coming out that port. So it doesn't look like sw01 is pushing it out all its ports, just the right ports and also gi1! I've verified that sw01 is not filling up its address-table: sw01#sh mac addr count This may take some time. Capacity : 8192 Free : 7983 Used : 208 The full configs for both core01 and sw01 are available: core01, sw01. Finally, versions: sw01#sh ver SW version 1.1.2.0 ( date 12-Nov-2011 time 23:34:26 ) Boot version 1.0.0.4 ( date 08-Apr-2010 time 16:37:57 ) HW version V01 core01#sh ver SW version 1.1.2.0 ( date 12-Nov-2011 time 23:34:26 ) Boot version 1.1.0.6 ( date 11-May-2011 time 18:31:00 ) HW version V01 So my understanding is this: sw01 should take unicast traffic for net01 and send it only out net02's port, and vice versa; none of it should go out sw01's uplink. But core01, receiving traffic on gi1 for a host it knows is on gi1, is right in sending it out all of its ports. (That is: sw01 is misbehaving, but core01 is doing what it should given the circumstances.) My question is: Why is sw01 sending that unicast traffic out its uplink, gi1? (And pre-emptively: yes, I know SG300s leave much to be desired, and yes, we should have spanning-tree enabled, but that's where I'm at right now.)

Read the article
Windows 7 Machine Makes Router Drop -All- Wireless Connections [closed]

- by Hammer Bro.

Note: I accidentally originally posted this question over at SuperUser, and I still think the issue is caused by some low-level networking practice of Windows 7, but I think the expertise here would be more apt to figuring it out. Apologies for the cross-post. Some background: My home network consists of my Desktop, a two-month old Windows 7 (x64) machine which is online most frequently (N-spec), as well as three other Windows XP laptops (all G) that only connect every now and then (one for work, one for Netflix, and the other for infrequent regular laptop uses). I used to have a Belkin F5D8236-4 wireless router, and everything worked great. A week ago, however, I found out that the Belkin absolutely in no way would establish a VPN connection, something that has become important for work. So I bought a Netgear WNR3500v2/U/L. The wireless was acting a little sketchy at first for just the Windows 7 machine, but I thought it had something to do with 802.11N and I was in a hurry so I just fished up an ethernet cable and disabled the computer's wireless. It has now become apparent, though, that whenever the Windows 7 machine is connected to the router, all wireless connections become unstable. I was using my work laptop for a solid six hours today with no trouble, having multiple SSH connections open over VPN and streaming internet radio in the background. Then, within two minutes of turning on this Windows 7 box, I had lost all connectivity over the wireless. And I was two feet away from the router. The same sort of thing happens on all of the other laptops -- Netflix can be playing stuff all weekend, but if I come up here and do things on this (W7) computer, the streaming will be dead within ten minutes. So here are my basic observations: If the Windows 7 machine is off, then all connections will have a Signal Strength of Very Good or Excellent and a Speed of 48-54 Mbps for an indefinite amount of time. Shortly after the Windows 7 machine is turned on, all wireless connections will experience a consistent decline in Speed down to 1.0 Mbps, eventually losing their connection entirely. These machines will continue to maintain 70% signal strength, as observed by themselves and router. Once dropped, a wireless connection will have difficulty reconnecting. And, if a connection manages to become established, it will quickly drop off again. The Windows 7 machine itself will continue to function just fine if it's using a wired connection, although it will experience these same issues over the wireless. All of the drivers and firmwares are up to date, and this happened both with the stock Netgear firmware as well as the (current) DD-WRT. What I've tried: Making sure each computer is being assigned a distinct IP. (They are.) Disabling UPnP and Stateful Packet Inspection on the router. Disabling Network Sharing, SSDP Discovery, TCP/IP NetBios Helper and Computer Browser services on the Windows 7 machine. Disabling QoS Packet Scheduler, IPv6, and Link Layer Topology Discovery options on my ethernet controller (leaving only Client for Microsoft Networks, File and Printer Sharing, and IPv4 enabled). What I think: It seems awfully similar to the problems discussed in detail at http://social.msdn.microsoft.com/Forums/en/wsk/thread/1064e397-9d9b-4ae2-bc8e-c8798e591915 (which was both the most relevant and concrete information I could dig up on the internet). I still think that something the Windows 7 IP stack (or just Operating System itself) is doing is giving the router fits. However, I could be wrong, because I have two key differences. One is that most instances of this problem are reported as the entire router dying or restarting, and mine still works just fine over the wired connection. The other is that it's a new router, tested with both the factory firmware and the (I assume) well-maintained DD-WRT project. Even if Windows 7 is still secretly sending IPv6 packets or the TCP Window Scaling implementation that I hear Vista caused some trouble with (even though I've tried my best to disable anything fancy), this router should support those functions. I don't want to get a new or a replacement router unless someone can convince me that this is a defective unit. But the problem seems too specific and predictable by my instincts to be a hardware hiccup. And I don't want to deal with the inevitable problems that always seem to take half a day to resolve when getting a new router, since I'm frantically working (including tomorrow) to complete a project by next week's deadline. Plus, I think in the worst case scenario, I could keep this router connected directly to the modem, disable its wireless entirely, and connect the old Belkin to it directly. That should allow me to still use VPN (although I'll have to plug my work laptop directly into that router), and then maintain wireless connections for all of the other computers. But that feels so wrong to me. Anyone have any ideas what the cause and possible solution could be? Clarifications: The Windows 7 machine is directly connected via an ethernet cable to the router for everything above. But while it is online, all other computers' wireless connections become unusable. It is not an issue of signal strength or interference -- no other devices within scanning range are using Channel 1, and the problem will affect computers that are literally feet away from the router with 95% signal strength.

Read the article
How do I make Linux recognize a new SATA /dev/sda drive I hot swapped in without rebooting?

- by Philip Durbin

Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized: [root@fs-2 ~]# tail -18 /var/log/messages May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake } May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset May 5 16:54:45 fs-2 kernel: ata1: soft resetting link May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:54:55 fs-2 kernel: ata1: soft resetting link May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:05 fs-2 kernel: ata1: soft resetting link May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps May 5 16:55:40 fs-2 kernel: ata1: soft resetting link May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up May 5 16:55:45 fs-2 kernel: ata1: EH complete I tried a couple things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh but they didn't work: [root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan -bash: echo: write error: Invalid argument [root@fs-2 ~]# [root@fs-2 ~]# /root/rescan-scsi-bus.sh -l [snip] 0 new device(s) found. 0 device(s) removed. [root@fs-2 ~]# [root@fs-2 ~]# ls /dev/sda ls: /dev/sda: No such file or directory I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting? The operating system in question is RHEL5.3: [root@fs-2 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.3 (Tikanga) The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS. Here is the lscpi output: [root@fs-2 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers to look more into the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding what exactly isn't supported hardware-wise in terms of hot swap for that system. Please let me know what other output besides lspci might be useful. The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output: [root@www-1 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)

Read the article
Can't configure PAM + LDAP on Debian Lenny - Getting error=49 on server logs

- by Jorge Suárez de Lis

I've been migrating some servers and desktops using Ubuntu 10.04 from getting the users from an old OpenLDAP implementation to a newer Centos Active Directory. I haven't had any problems so far, until I reached a Debian Lenny server. I've set up the server as the others, setting /etc/ldap.conf and /etc/ldap/ldap.conf. However, when I issue "getent passwd", I get nothing from the LDAP server. Reading the pam_ldap manpage, I realized that /etc/ldap.conf was not an accepted file by pam_ldap -it worked with Ubuntu though-, so I renamed it to /etc/pam_ldap.conf. Same result. However, once I've changed the name of this file, when I login using SSH I get this on the LDAP server logs: [20/Jul/2012:11:19:40 +0200] conn=16501 fd=155 slot=155 connection from x.x.x.50 to 10.1.176.237 [20/Jul/2012:11:19:40 +0200] conn=16501 op=0 BIND dn="uid=ubuntu,ou=Applications,ou=CITIUS,dc=inv,dc=usc,dc=es" method=128 version=3 [20/Jul/2012:11:19:40 +0200] conn=16501 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=ubuntu,ou=applications,ou=citius,dc=inv,dc=usc,dc=es" [20/Jul/2012:11:19:40 +0200] conn=16501 op=1 SRCH base="ou=People,ou=CITIUS,dc=inv,dc=usc,dc=es" scope=2 filter="(uid=jorge.suarez)" attrs=ALL [20/Jul/2012:11:19:40 +0200] conn=16501 op=1 RESULT err=0 tag=101 nentries=1 etime=0 notes=U [20/Jul/2012:11:19:40 +0200] conn=16501 op=2 BIND dn="uid=jorge.suarez,ou=People,ou=CITIUS,dc=inv,dc=usc,dc=es" method=128 version=3 [20/Jul/2012:11:19:40 +0200] conn=16501 op=2 RESULT err=49 tag=97 nentries=0 etime=0 The password isn't working. I don't know that could be wrong, anything else seems to be OK. That user/password is working from another clients: [20/Jul/2012:11:29:39 +0200] conn=16528 fd=188 slot=188 connection from x.x.x.224 to 10.1.176.237 [20/Jul/2012:11:29:39 +0200] conn=16528 op=0 BIND dn="uid=ubuntu,ou=Applications,ou=CITIUS,dc=inv,dc=usc,dc=es" method=128 version=3 [20/Jul/2012:11:29:39 +0200] conn=16528 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=ubuntu,ou=applications,ou=citius,dc=inv,dc=usc,dc=es" [20/Jul/2012:11:29:39 +0200] conn=16528 op=1 SRCH base="ou=People,ou=CITIUS,dc=inv,dc=usc,dc=es" scope=2 filter="(uid=jorge.suarez)" attrs=ALL [20/Jul/2012:11:29:39 +0200] conn=16528 op=1 RESULT err=0 tag=101 nentries=1 etime=0 notes=U [20/Jul/2012:11:29:39 +0200] conn=16528 op=2 BIND dn="uid=jorge.suarez,ou=People,ou=CITIUS,dc=inv,dc=usc,dc=es" method=128 version=3 [20/Jul/2012:11:29:39 +0200] conn=16528 op=2 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=jorge.suarez,ou=people,ou=citius,dc=inv,dc=usc,dc=es" I'm using SSHA for storing passwords on the LDAP server. Maybe this is not supported by Debian Lenny? On pam_ldap.conf, I've set up this, as in all the other servers: # Do not hash the password at all; presume # the directory server will do it, if # necessary. This is the default. pam_password md5 Also tried clear, but it didn't work. Anyways, it's weird that issuing getent passwd still gets me no users. However, if I use pamtest from the package libpam-dotfile to test login, it works. # pamtest ssh jorge.suarez Trying to authenticate <jorge.suarez> for service <ssh>. Password: Authentication successful. # pamtest foo jorge.suarez Trying to authenticate <jorge.suarez> for service <foo>. Password: Authentication successful. But "su" won't work also: # su jorge.suarez Id. descoñecido: jorge.suarez Just the output from getent passwd : # getent passwd root:x:0:0:root:/root:/bin/bash daemon:x:1:1:daemon:/usr/sbin:/bin/sh bin:x:2:2:bin:/bin:/bin/sh sys:x:3:3:sys:/dev:/bin/sh sync:x:4:65534:sync:/bin:/bin/sync games:x:5:60:games:/usr/games:/bin/sh man:x:6:12:man:/var/cache/man:/bin/sh lp:x:7:7:lp:/var/spool/lpd:/bin/sh mail:x:8:8:mail:/var/mail:/bin/sh news:x:9:9:news:/var/spool/news:/bin/sh uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh proxy:x:13:13:proxy:/bin:/bin/sh www-data:x:33:33:www-data:/var/www:/bin/sh backup:x:34:34:backup:/var/backups:/bin/sh list:x:38:38:Mailing List Manager:/var/list:/bin/sh irc:x:39:39:ircd:/var/run/ircd:/bin/sh gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh nobody:x:65534:65534:nobody:/nonexistent:/bin/sh libuuid:x:100:101::/var/lib/libuuid:/bin/sh Debian-exim:x:101:103::/var/spool/exim4:/bin/false statd:x:102:65534::/var/lib/nfs:/bin/false sshd:x:104:65534::/var/run/sshd:/usr/sbin/nologin luser:x:1000:1000:Usuario local de Burdeos,,,:/home/luser:/bin/bash messagebus:x:105:107::/var/run/dbus:/bin/false sge-admin:x:1001:1001:Administrador do SGE,,,:/home/cluster/sge-admin:/bin/bash ntp:x:107:110::/home/ntp:/bin/false haldaemon:x:108:111:Hardware abstraction layer,,,:/var/run/hald:/bin/false vde2-net:x:109:114::/var/run/vde2:/bin/false uml-net:x:110:115::/home/uml-net:/bin/false polkituser:x:111:116:PolicyKit,,,:/var/run/PolicyKit:/bin/false Debian-pxe:x:113:65534:Dummy user for Debian pxe package,,,:/home/Debian-pxe:/bin/false Nscd was stopped from the beginning.

Read the article
Ubuntu, No wireless networks found after correctly installed madwifi

- by Peter

Hi, I just installed madwifi on my MSI laptop with an Atheros AR5001 wifi card & Lucid. As far as I can see and according to System - Administration - Hardware drivers the install was successful and the card + driver is up and running. However, I don't see any wireless network (my windows PC can see about 5 wireless networks). I tried it with the network manager applet as well as with wicd. If I try to connect to "Hidden Wireless Network" via nm-applet, it will start to connect for a while but is unable too (although I supply it with the correct WEP settings & key) So, I'm unable to use my wireless network. What am i doing wrong? Some information about my system: iwconfig lo no wireless extensions. eth0 no wireless extensions. wifi0 no wireless extensions. ath0 IEEE 802.11g ESSID:"" Mode:Managed Frequency:2.437 GHz Access Point: Not-Associated Bit Rate:0 kb/s Tx-Power:17 dBm Sensitivity=1/1 Retry:off RTS thr:off Fragment thr:off Power Management:off Link Quality=0/70 Signal level=-96 dBm Noise level=-96 dBm Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0 Tx excessive retries:0 Invalid misc:0 Missed beacon:0 pan0 no wireless extensions. ifconfig ath0 Link encap:Ethernet HWaddr 00:15:af:cf:e2:ca inet6 addr: fe80::215:afff:fecf:e2ca/64 Scope:Link UP BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) eth0 Link encap:Ethernet HWaddr 00:21:85:4d:82:78 inet addr:192.168.2.101 Bcast:192.168.2.255 Mask:255.255.255.0 inet6 addr: fe80::221:85ff:fe4d:8278/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:3800 errors:0 dropped:0 overruns:0 frame:0 TX packets:2944 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3940261 (3.9 MB) TX bytes:525218 (525.2 KB) Interrupt:27 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:12 errors:0 dropped:0 overruns:0 frame:0 TX packets:12 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:720 (720.0 B) TX bytes:720 (720.0 B) wifi0 Link encap:UNSPEC HWaddr 00-15-AF-CF-E2-CA-00-00-00-00-00-00-00-00-00-00 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:3497 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:280 RX bytes:0 (0.0 B) TX bytes:179947 (179.9 KB) Interrupt:16 lshw -C network *-network description: Wireless interface product: AR5001 Wireless Network Adapter vendor: Atheros Communications Inc. physical id: 0 bus info: pci@0000:02:00.0 logical name: wifi0 version: 01 serial: 00:15:af:cf:e2:ca width: 64 bits clock: 33MHz capabilities: pm msi pciexpress msix bus_master cap_list logical ethernet physical wireless configuration: broadcast=yes driver=ath_pci latency=0 multicast=yes wireless=IEEE 802.11g resources: irq:16 memory:fd7f0000-fd7fffff *-network description: Ethernet interface product: RTL8111/8168B PCI Express Gigabit Ethernet controller vendor: Realtek Semiconductor Co., Ltd. physical id: 0 bus info: pci@0000:05:00.0 logical name: eth0 version: 01 serial: 00:21:85:4d:82:78 size: 100MB/s capacity: 1GB/s width: 64 bits clock: 33MHz capabilities: pm vpd msi pciexpress bus_master cap_list rom ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=2.3LK-NAPI duplex=full ip=192.168.2.101 latency=0 link=yes multicast=yes port=MII speed=100MB/s resources: irq:27 ioport:c800(size=256) memory:fe2ff000-fe2fffff memory:fe2c0000-fe2dffff(prefetchable) lspci 00:00.0 Host bridge: ATI Technologies Inc RS690 Host Bridge 00:01.0 PCI bridge: ATI Technologies Inc RS690 PCI to PCI Bridge (Internal gfx) 00:04.0 PCI bridge: ATI Technologies Inc Device 7914 00:06.0 PCI bridge: ATI Technologies Inc RS690 PCI to PCI Bridge (PCI Express Port 2) 00:07.0 PCI bridge: ATI Technologies Inc RS690 PCI to PCI Bridge (PCI Express Port 3) 00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA 00:13.0 USB Controller: ATI Technologies Inc SB600 USB (OHCI0) 00:13.1 USB Controller: ATI Technologies Inc SB600 USB (OHCI1) 00:13.2 USB Controller: ATI Technologies Inc SB600 USB (OHCI2) 00:13.3 USB Controller: ATI Technologies Inc SB600 USB (OHCI3) 00:13.4 USB Controller: ATI Technologies Inc SB600 USB (OHCI4) 00:13.5 USB Controller: ATI Technologies Inc SB600 USB Controller (EHCI) 00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 14) 00:14.1 IDE interface: ATI Technologies Inc SB600 IDE 00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA) 00:14.3 ISA bridge: ATI Technologies Inc SB600 PCI to LPC Bridge 00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 01:05.0 VGA compatible controller: ATI Technologies Inc RS690M [Radeon X1200 Series] 01:05.2 Audio device: ATI Technologies Inc Radeon X1200 Series Audio Controller 02:00.0 Ethernet controller: Atheros Communications Inc. AR5001 Wireless Network Adapter (rev 01) 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01) 06:04.0 CardBus bridge: O2 Micro, Inc. OZ711MP1/MS1 MemoryCardBus Controller (rev 21) 06:04.2 SD Host controller: O2 Micro, Inc. Integrated MMC/SD Controller (rev 01) 06:04.3 Bridge: O2 Micro, Inc. Integrated MS/xD Controller (rev 01) 06:04.4 FireWire (IEEE 1394): O2 Micro, Inc. Firewire (IEEE 1394) (rev 02) less /proc/modules | grep ath ath_rate_sample 11476 1 - Live 0xf812b000 ath_pci 193197 0 - Live 0xf85c3000 wlan 222892 5 wlan_wep,wlan_scan_sta,ath_rate_sample,ath_pci, Live 0xf8537000 ath_hal 398604 3 ath_rate_sample,ath_pci, Live 0xf8480000 I've been at this for hours now, also tried ndiswrapper and ath5k drivers with no luck, and really could use some help. Cheers.

Read the article
Mount TMPFS instead of ro /dev

- by schiggn

I am working on a ARM-Based embedded system with a custom Debian Linux based on kernel 2.6.31. In the final system, the Root file system is stored as squashfs on flash. Now, the folder /dev is created by udev, but since there is no hot plugging functionality needed and booting time is critical, I wanted to delete udev and "hard code" the /dev folder (read here, page 5). because i still need to change parameters of the devices (with ioctl /sysfs) this does not work for me in this case. so i thought of mounting a tmpfs on /dev and change the parameters there. is this possible? and how to do best? my approach would be: delete /dev from RFS create tar containing basic devices mount tmpfs /dev untar tar-file into /dev change parameters Could this work? Do you see any problems? I found out, that you can mount on top of already mounted mount point, is it somehow possible just to take data with while mounting the new file system? if so that would be very convenient! Thanks Update: I just tried that out, but I'm stuck at a certain point. I packed all my devices into devices.tar, packed it into /usr of my squashfs and added the following lines to mountkernfs.sh, which is executed right after INIT. #mount /dev on tmpfs echo -n "Mounting /dev on tmpfs..." mount -o size=5M,mode=0755 -t tmpfs tmpfs /dev mknod -m 600 /dev/console c 5 1 mknod -m 600 /dev/null c 1 3 echo "done." echo -n "Populating /dev..." tar -xf /usr/devices.tar -C /dev echo "done." This works fine on the version over NFS, if I place printf's in the code, I can see it executing, if I comment out the extracting part, its complaining about missing devices. Booting OK mmc0: new high speed SDHC card at address 0007 mmcblk0: mmc0:0007 SD04G 3.67 GiB mmcblk0: p1 IP-Config: Unable to set interface netmask (-22). Looking up port of RPC 100003/2 on 192.168.1.234 Looking up port of RPC 100005/1 on 192.168.1.234 VFS: Mounted root (nfs filesystem) on device 0:14. Freeing init memory: 136K INIT: version 2.86 booting Mounting /dev on tmpfs...done. Populating /dev...done. Initializing /var...done. Setting the system clock. System Clock set to: Thu Sep 13 11:26:23 UTC 2012. INIT: Entering runlevel: 2 UBI: attaching mtd8 to ubi0 Commenting out the extraction of the tar mmc0: new high speed SDHC card at address 0007 mmcblk0: mmc0:0007 SD04G 3.67 GiB mmcblk0: p1 IP-Config: Unable to set interface netmask (-22). Looking up port of RPC 100003/2 on 192.168.1.234 Looking up port of RPC 100005/1 on 192.168.1.234 VFS: Mounted root (nfs filesystem) on device 0:14. Freeing init memory: 136K INIT: version 2.86 booting Mounting /dev on tmpfs...done. Populating /dev...done. Initializing /var...done. Setting the system clock. Cannot access the Hardware Clock via any known method. Use the --debug option to see the details of our search for an access method. Unable to set System Clock to: Thu Sep 13 12:24:00 UTC 2012 ... (warning). INIT: Entering runlevel: 2 libubi: error!: cannot open "/dev/ubi_ctrl" So far so good. But if I pack the whole story into a squashfs and boot from there, it is acting strange. It's telling me while booting that it is unable to open an initial console and its throwing errors on mounting the UBIFS devices, but finally provides a login anyway. Over that my echo's are not executed. If I then log in, /dev is mounted as TMPFS as desired and all the devices reside inside. When I redo the "mount" command to mount the UBIFS partitions it is executed whitout problem and useable. From squashfs VFS: Mounted root (squashfs filesystem) readonly on device 31:15. Freeing init memory: 136K Warning: unable to open an initial console. mmc0: new high speed SDHC card at address 0007 mmcblk0: mmc0:0007 SD04G 3.67 GiB mmcblk0: p1 UBIFS error (pid 484): ubifs_get_sb: cannot open "ubi1_0", error -19 Additionally, a part of the rest of the bootscripts is still exexuted, but not all of them. Does anyone has a clue why? Other question, is 5MB enough/too much for /dev?

Read the article
Weighted round robins via TTL - possible?

- by Joe Hopfgartner

I currently use DNS round robin for load balancing, which works great. The records look like this (I have a ttl of 120 seconds) ;; ANSWER SECTION: orion.2x.to. 116 IN A 80.237.201.41 orion.2x.to. 116 IN A 87.230.54.12 orion.2x.to. 116 IN A 87.230.100.10 orion.2x.to. 116 IN A 87.230.51.65 I learned that not every ISP / device treats such a response the same way. For example some DNS servers rotate the addresses randomly or always cycle them through. Some just propagate the first entry, others try to determine which is best (regionally near) by looking at the ip address. However if the userbase is big enough (spreads over multiple ISPs etc) it balances pretty well. The discrepancies from highest to lowest loaded server hardly every exceeds 15%. However now I have the problem that I am introducing more servers into the systems, that not all have the same capacities. I currently only have 1gbps servers, but I want to work with 100mbit and also 10gbps servers too. So what I want is I want to introduce a server with 10 GBps with a weight of 100, a 1 gbps server with a weight of 10 and a 100 mbit server with a weight of 1. I used to add servers twice to bring more traffic to them (which worked nice. the bandwidth doubled almost.) But adding a 10gbit server 100 times to DNS is a bit rediculous. So I thought about using the TTL. If I give server A 240 seconds ttl and server B only 120 seconds (which is about about the minimum to use for round robin, as a lot of dns servers set to 120 if a lower ttl is specified.. so i have heard) I think something like this should occour in an ideal scenario: first 120 seconds 50% of requests get server A -> keep it for 240 seconds. 50% of requests get server B -> keep it for 120 seconds second 120 seconds 50% of requests still have server A cached -> keep it for another 120 seconds. 25% of requests get server A -> keep it for 240 seconds 25% of requests get server B -> keep it for 120 seconds third 120 seconds 25% will get server A (from the 50% of Server A that now expired) -> cache 240 sec 25% will get server B (from the 50% of Server A that now expired) -> cache 120 sec 25% will have server A cached for another 120 seconds 12.5% will get server B (from the 25% of server B that now expired) -> cache 120sec 12.5% will get server A (from the 25% of server B that now expired) -> cache 240 sec fourth 120 seconds 25% will have server A cached -> cache for another 120 secs 12.5% will get server A (from the 25% of b that now expired) -> cache 240 secs 12.5% will get server B (from the 25% of b that now expired) -> cache 120 secs 12.5% will get server A (from the 25% of a that now expired) -> cache 240 secs 12.5% will get server B (from the 25% of a that now expired) -> cache 120 secs 6.25% will get server A (from the 12.5% of b that now expired) -> cache 240 secs 6.25% will get server B (from the 12.5% of b that now expired) -> cache 120 secs 12.5% will have server A cached -> cache another 120 secs ... i think i lost something at this point but i think you get the idea.... As you can see this gets pretty complicated to predict and it will for sure not work out like this in practice. But it should definitely have an effect on the distribution! I know that weighted round robin exists and is just controlled by the root server. It just cycles through dns records when responding and returns dns records with a set propability that corresponds to the weighting. My DNS server does not support this, and my requirements are not that precise. If it doesnt weight perfectly its okay, but it should go into the right direction. I think using the TTL field could be a more elegant and easier solution - and it deosnt require a dns server that controls this dynamically, which saves resources - which is in my opinion the whole point of dns load balancing vs hardware load balancers. My question now is... are there any best prectices / methos / rules of thumb to weight round robin distribution using the TTL attribute of DNS records? Edit: The system is a forward proxy server system. The amount of Bandwidth (not requests) exceeds what one single server with ethernet can handle. So I need a balancing solution that distributes the bandwidth to several servers. Are there any alternative methods than using DNS? Of course I can use a load balancer with fibre channel etc, but the costs are rediciulous and it also increases only the width of the bottleneck and does not eliminate it. The only thing i can think of are anycast (is it anycast or multicast?) ip addresses, but I don't have the means to set up such a system.

Read the article
Problem upgrading kernel on debian 3.1

- by exhuma

Hi, I have a quite old box in a remote server farm. So I have no direct access. Only remote SSH (and via SSH to a serial console). I haven't updated this box in ages. Now, whenever I want to install a new package, a dependency to glibc appears. Unfortunately, the install of glibc depends on a 2.6 kernel and I am running a venerable 2.4 kernel (one more reason to upgrade). The problem is, that the install of a new kernel has an indirect (over locales) dependency to glibc. So, to install glibc, I need a new kernel. For a new kernel, I need to upgrade glibc. Essentially I am blocked. What's the best way to proceed considering I have no "hardware" access? Here's a quick transcript of the upgrade process: [green:~]% sudo aptitude install linux-image-686 Reading Package Lists... Done Building Dependency Tree Reading extended state information Initializing package states... Done Reading task descriptions... Done The following packages are unused and will be REMOVED: gcc-4.3-base The following NEW packages will be automatically installed: dash libc6-i686 libparse-recdescent-perl linux-image-2.6-686 linux-image-2.6.18-6-686 module-init-tools yaird The following packages have been kept back: adduser apache2 apache2-mpm-prefork apache2-utils apache2.2-common apt apt-utils aptitude autoconf autotools-dev awstats base-files base-passwd [...snip...] util-linux vacation vim vim-common wamerican wbritish wget whiptail whois wwwconfig-common zlib1g The following NEW packages will be installed: dash libc6-i686 libparse-recdescent-perl linux-image-2.6-686 linux-image-2.6.18-6-686 linux-image-686 module-init-tools yaird The following packages will be upgraded: hotplug libc6 2 packages upgraded, 8 newly installed, 1 to remove and 277 not upgraded. Need to get 0B/22.7MB of archives. After unpacking 52.1MB will be used. Do you want to continue? [Y/n/?] Writing extended state information... Done Preconfiguring packages ... (Reading database ... 34065 files and directories currently installed.) Preparing to replace libc6 2.3.6.ds1-13 (using .../libc6_2.7-18lenny2_i386.deb) ... Checking for services that may need to be restarted... Checking init scripts... WARNING: init script for postgresql not found. [ --- libc6 config screen appears here --- ] WARNING: POSIX threads library NPTL requires kernel version 2.6.8 or later. If you use a kernel 2.4, please upgrade it before installing glibc. The installation of a 2.6 kernel _could_ ask you to install a new libc first, this is NOT a bug, and should *NOT* be reported. In that case, please add etch sources to your /etc/apt/sources.list and run: apt-get install -t etch linux-image-2.6 Then reboot into this new kernel, and proceed with your upgrade dpkg: error processing /var/cache/apt/archives/libc6_2.7-18lenny2_i386.deb (--unpack): subprocess pre-installation script returned error exit status 1 Errors were encountered while processing: /var/cache/apt/archives/libc6_2.7-18lenny2_i386.deb E: Sub-process /usr/bin/dpkg returned an error code (1) Ack! Something bad happened while installing packages. Trying to recover: dpkg: dependency problems prevent configuration of locales: locales depends on glibc-2.7-1; however: Package glibc-2.7-1 is not installed. dpkg: error processing locales (--configure): dependency problems - leaving unconfigured Errors were encountered while processing: locales Reading Package Lists... Done Building Dependency Tree Reading extended state information Initializing package states... Done Reading task descriptions... Done Now, if I follow the instrunctions as promted I get the following. Note that I am using aptitude instead of apt-get to benefit from the better dependency tracking. I did try with apt-get first. But that let me to the same problem. [green:~]% sudo aptitude install -t etch linux-image-2.6.26-2-686 Reading Package Lists... Done Building Dependency Tree Reading extended state information Initializing package states... Done Reading task descriptions... Done E: Unable to correct problems, you have held broken packages. E: Unable to correct dependencies, some packages cannot be installed E: Unable to resolve some dependencies! Some packages had unmet dependencies. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following packages have unmet dependencies: linux-image-2.6.26-2-686: Depends: initramfs-tools (>= 0.55) but it is not installable or yaird (>= 0.0.13) but it is not installable or linux-initramfs-tool which is a virtual package. Any ideas?

Read the article
Command does not execute in crontab while command itself works just fine

- by fuzzybee

I have this script from Colin Johnson on Github - https://github.com/colinbjohnson/aws-missing-tools/tree/master/ec2-automate-backup It seems great. I have modified it to send email to myself every time an EBS snapshot is created or deleted. The following works like a charm ec2-automate-backup.sh -v "vol-myvolumeid" -k 3 However, it does not execute at all as part of my crontab (I didn't receive any emails) #some command that got commented out */5 * * * * ec2-automate-backup.sh -v "vol-fb2fbcdf" -k 3; * * * * * date /root/logs/crontab.log; */5 * * * * date /root/logs/crontab2.log Please note that the 2nd and 3rd execute just fines as I can see the date and time in log files. What could I have missed here? The full ec2-automate-backup.sh is as follows: #!/bin/bash - # Author: Colin Johnson / [email protected] # Date: 2012-09-24 # Version 0.1 # License Type: GNU GENERAL PUBLIC LICENSE, Version 3 # #confirms that executables required for succesful script execution are available prerequisite_check() { for prerequisite in basename ec2-create-snapshot ec2-create-tags ec2-describe-snapshots ec2-delete-snapshot date do #use of "hash" chosen as it is a shell builtin and will add programs to hash table, possibly speeding execution. Use of type also considered - open to suggestions. hash $prerequisite &> /dev/null if [[ $? == 1 ]] #has exits with exit status of 70, executable was not found then echo "In order to use `basename $0`, the executable \"$prerequisite\" must be installed." 1>&2 | mailx -s "Error happened 0" [email protected] ; exit 70 fi done } #get_EBS_List gets a list of available EBS instances depending upon the selection_method of EBS selection that is provided by user input get_EBS_List() { case $selection_method in volumeid) if [[ -z $volumeid ]] then echo "The selection method \"volumeid\" (which is $app_name's default selection_method of operation or requested by using the -s volumeid parameter) requires a volumeid (-v volumeid) for operation. Correct usage is as follows: \"-v vol-6d6a0527\",\"-s volumeid -v vol-6d6a0527\" or \"-v \"vol-6d6a0527 vol-636a0112\"\" if multiple volumes are to be selected." 1>&2 | mailx -s "Error happened 1" [email protected] ; exit 64 fi ebs_selection_string="$volumeid" ;; tag) if [[ -z $tag ]] then echo "The selected selection_method \"tag\" (-s tag) requires a valid tag (-t key=value) for operation. Correct usage is as follows: \"-s tag -t backup=true\" or \"-s tag -t Name=my_tag.\"" 1>&2 | mailx -s "Error happened 2" [email protected] ; exit 64 fi ebs_selection_string="--filter tag:$tag" ;; *) echo "If you specify a selection_method (-s selection_method) for selecting EBS volumes you must select either \"volumeid\" (-s volumeid) or \"tag\" (-s tag)." 1>&2 | mailx -s "Error happened 3" [email protected] ; exit 64 ;; esac #creates a list of all ebs volumes that match the selection string from above ebs_backup_list_complete=`ec2-describe-volumes --show-empty-fields --region $region $ebs_selection_string 2>&1` #takes the output of the previous command ebs_backup_list_result=`echo $?` if [[ $ebs_backup_list_result -gt 0 ]] then echo -e "An error occured when running ec2-describe-volumes. The error returned is below:\n$ebs_backup_list_complete" 1>&2 | mailx -s "Error happened 4" [email protected] ; exit 70 fi ebs_backup_list=`echo "$ebs_backup_list_complete" | grep ^VOLUME | cut -f 2` #code to right will output list of EBS volumes to be backed up: echo -e "Now outputting ebs_backup_list:\n$ebs_backup_list" } create_EBS_Snapshot_Tags() { #snapshot tags holds all tags that need to be applied to a given snapshot - by aggregating tags we ensure that ec2-create-tags is called only onece snapshot_tags="" #if $name_tag_create is true then append ec2ab_${ebs_selected}_$date_current to the variable $snapshot_tags if $name_tag_create then ec2_snapshot_resource_id=`echo "$ec2_create_snapshot_result" | cut -f 2` snapshot_tags="$snapshot_tags --tag Name=ec2ab_${ebs_selected}_$date_current" fi #if $purge_after_days is true, then append $purge_after_date to the variable $snapshot_tags if [[ -n $purge_after_days ]] then snapshot_tags="$snapshot_tags --tag PurgeAfter=$purge_after_date --tag PurgeAllow=true" fi #if $snapshot_tags is not zero length then set the tag on the snapshot using ec2-create-tags if [[ -n $snapshot_tags ]] then echo "Tagging Snapshot $ec2_snapshot_resource_id with the following Tags:" ec2-create-tags $ec2_snapshot_resource_id --region $region $snapshot_tags #echo "Snapshot tags successfully created" | mailx -s "Snapshot tags successfully created" [email protected] fi } date_command_get() { #finds full path to date binary date_binary_full_path=`which date` #command below is used to determine if date binary is gnu, macosx or other date_binary_file_result=`file -b $date_binary_full_path` case $date_binary_file_result in "Mach-O 64-bit executable x86_64") date_binary="macosx" ;; "ELF 64-bit LSB executable, x86-64, version 1 (SYSV)"*) date_binary="gnu" ;; *) date_binary="unknown" ;; esac #based on the installed date binary the case statement below will determine the method to use to determine "purge_after_days" in the future case $date_binary in gnu) date_command="date -d +${purge_after_days}days -u +%Y-%m-%d" ;; macosx) date_command="date -v+${purge_after_days}d -u +%Y-%m-%d" ;; unknown) date_command="date -d +${purge_after_days}days -u +%Y-%m-%d" ;; *) date_command="date -d +${purge_after_days}days -u +%Y-%m-%d" ;; esac } purge_EBS_Snapshots() { #snapshot_tag_list is a string that contains all snapshots with either the key PurgeAllow or PurgeAfter set snapshot_tag_list=`ec2-describe-tags --show-empty-fields --region $region --filter resource-type=snapshot --filter key=PurgeAllow,PurgeAfter` #snapshot_purge_allowed is a list of all snapshot_ids with PurgeAllow=true snapshot_purge_allowed=`echo "$snapshot_tag_list" | grep .*PurgeAllow'\t'true | cut -f 3` for snapshot_id_evaluated in $snapshot_purge_allowed do #gets the "PurgeAfter" date which is in UTC with YYYY-MM-DD format (or %Y-%m-%d) purge_after_date=`echo "$snapshot_tag_list" | grep .*$snapshot_id_evaluated'\t'PurgeAfter.* | cut -f 5` #if purge_after_date is not set then we have a problem. Need to alter user. if [[ -z $purge_after_date ]] #Alerts user to the fact that a Snapshot was found with PurgeAllow=true but with no PurgeAfter date. then echo "A Snapshot with the Snapshot ID $snapshot_id_evaluated has the tag \"PurgeAllow=true\" but does not have a \"PurgeAfter=YYYY-MM-DD\" date. $app_name is unable to determine if $snapshot_id_evaluated should be purged." 1>&2 | mailx -s "Error happened 5" [email protected] else #convert both the date_current and purge_after_date into epoch time to allow for comparison date_current_epoch=`date -j -f "%Y-%m-%d" "$date_current" "+%s"` purge_after_date_epoch=`date -j -f "%Y-%m-%d" "$purge_after_date" "+%s"` #perform compparison - if $purge_after_date_epoch is a lower number than $date_current_epoch than the PurgeAfter date is earlier than the current date - and the snapshot can be safely removed if [[ $purge_after_date_epoch < $date_current_epoch ]] then echo "The snapshot \"$snapshot_id_evaluated\" with the Purge After date of $purge_after_date will be deleted." ec2-delete-snapshot --region $region $snapshot_id_evaluated echo "Old snapshots successfully deleted for $volumeid" | mailx -s "Old snapshots successfully deleted for $volumeid" [email protected] fi fi done } #calls prerequisitecheck function to ensure that all executables required for script execution are available prerequisite_check app_name=`basename $0` #sets defaults selection_method="volumeid" region="ap-southeast-1" #date_binary allows a user to set the "date" binary that is installed on their system and, therefore, the options that will be given to the date binary to perform date calculations date_binary="" #sets the "Name" tag set for a snapshot to false - using "Name" requires that ec2-create-tags be called in addition to ec2-create-snapshot name_tag_create=false #sets the Purge Snapshot feature to false - this feature will eventually allow the removal of snapshots that have a "PurgeAfter" tag that is earlier than current date purge_snapshots=false #handles options processing while getopts :s:r:v:t:k:pn opt do case $opt in s) selection_method="$OPTARG";; r) region="$OPTARG";; v) volumeid="$OPTARG";; t) tag="$OPTARG";; k) purge_after_days="$OPTARG";; n) name_tag_create=true;; p) purge_snapshots=true;; *) echo "Error with Options Input. Cause of failure is most likely that an unsupported parameter was passed or a parameter was passed without a corresponding option." 1>&2 ; exit 64;; esac done #sets date variable date_current=`date -u +%Y-%m-%d` #sets the PurgeAfter tag to the number of days that a snapshot should be retained if [[ -n $purge_after_days ]] then #if the date_binary is not set, call the date_command_get function if [[ -z $date_binary ]] then date_command_get fi purge_after_date=`$date_command` echo "Snapshots taken by $app_name will be eligible for purging after the following date: $purge_after_date." fi #get_EBS_List gets a list of EBS instances for which a snapshot is desired. The list of EBS instances depends upon the selection_method that is provided by user input get_EBS_List #the loop below is called once for each volume in $ebs_backup_list - the currently selected EBS volume is passed in as "ebs_selected" for ebs_selected in $ebs_backup_list do ec2_snapshot_description="ec2ab_${ebs_selected}_$date_current" ec2_create_snapshot_result=`ec2-create-snapshot --region $region -d $ec2_snapshot_description $ebs_selected 2>&1` if [[ $? != 0 ]] then echo -e "An error occured when running ec2-create-snapshot. The error returned is below:\n$ec2_create_snapshot_result" 1>&2 ; exit 70 else ec2_snapshot_resource_id=`echo "$ec2_create_snapshot_result" | cut -f 2` echo "Snapshots successfully created for volume $volumeid" | mailx -s "Snapshots successfully created for $volumeid" [email protected] fi create_EBS_Snapshot_Tags done #if purge_snapshots is true, then run purge_EBS_Snapshots function if $purge_snapshots then echo "Snapshot Purging is Starting Now." purge_EBS_Snapshots fi cron log Oct 23 10:24:01 ip-10-130-153-227 CROND[28214]: (root) CMD (root (ec2-automate-backup.sh -v "vol-fb2fbcdf" -k 3;)) Oct 23 10:24:01 ip-10-130-153-227 CROND[28215]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:25:01 ip-10-130-153-227 CROND[28228]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:25:01 ip-10-130-153-227 CROND[28229]: (root) CMD (date >> /root/logs/crontab2.log) Oct 23 10:26:01 ip-10-130-153-227 CROND[28239]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:27:01 ip-10-130-153-227 CROND[28247]: (root) CMD (root (ec2-automate-backup.sh -v "vol-fb2fbcdf" -k 3;)) Oct 23 10:27:01 ip-10-130-153-227 CROND[28248]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:28:01 ip-10-130-153-227 CROND[28263]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:29:01 ip-10-130-153-227 CROND[28275]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:30:01 ip-10-130-153-227 CROND[28292]: (root) CMD (root (ec2-automate-backup.sh -v "vol-fb2fbcdf" -k 3;)) Oct 23 10:30:01 ip-10-130-153-227 CROND[28293]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:30:01 ip-10-130-153-227 CROND[28294]: (root) CMD (date >> /root/logs/crontab2.log) Oct 23 10:31:01 ip-10-130-153-227 CROND[28312]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:32:01 ip-10-130-153-227 CROND[28319]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:33:01 ip-10-130-153-227 CROND[28325]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:33:01 ip-10-130-153-227 CROND[28324]: (root) CMD (root (ec2-automate-backup.sh -v "vol-fb2fbcdf" -k 3;)) Oct 23 10:34:01 ip-10-130-153-227 CROND[28345]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:35:01 ip-10-130-153-227 CROND[28362]: (root) CMD (date >> /root/logs/crontab.log;) Oct 23 10:35:01 ip-10-130-153-227 CROND[28363]: (root) CMD (date >> /root/logs/crontab2.log) Mails to root From [email protected] Tue Oct 23 06:00:01 2012 Return-Path: <[email protected]> Date: Tue, 23 Oct 2012 06:00:01 GMT From: [email protected] (Cron Daemon) To: [email protected] Subject: Cron <root@ip-10-130-153-227> root ec2-automate-backup.sh -v "vol-fb2fbcdf" -k 3 Content-Type: text/plain; charset=UTF-8 Auto-Submitted: auto-generated X-Cron-Env: <SHELL=/bin/sh> X-Cron-Env: <HOME=/root> X-Cron-Env: <PATH=/usr/bin:/bin> X-Cron-Env: <LOGNAME=root> X-Cron-Env: <USER=root> Status: R /bin/sh: root: command not found

Read the article
Deployment Workbench no longer available after PXE boot

- by Patrick

Our build process revolves around windows Deployment Workbench. Unfortunately this was setup by someone who is no longer with the company, and no-one has ever dared/needed to make any changes. The other day it stopped working. It turns out that one of our build guys started thinking about changing some stuff in it, clicked something and now it no longer works (He is saying now that he right clicked on the 'LAB' entry in 'Deployment points' and hit 'Update', which took some time to run through apparently). The job has fallen on me to resolve and frankly I'm not sure what I'm doing. I was wondering if someone with more experience than me can provide some pointers as to troubleshooting cos I'm feeling quite a lot in the dark here. On the server I have Deployment Workbench up and running (MMC snapin) version 3.0. There is a WDS service that appears to be running ok, as does the tFTPd service. Nothing specific to this in event logs. From the client side; PXE boot works and gets you to the Win PE launch, and it has the correct company logo as the background (proving to me that its loading win PE from the network). WPEINIT runs, and asks for domain credentials, here the team simply put User/Pass/Domain in the boxes and click ok. Normally the build would kick off. Instead they get an error message saying that the \NATBLU01\Distribution$ share isn't available. Checking \NATBLU01\Distribution$ shows that its there and accessible over the network. Security/permissions seem ok, even 'ANONYMOUS LOGON' has read access to that share so I don't see that being a problem. Digging the trace files from C:\MININT\SMSOSD\OSDLOGS\ after an attempt to run the build I can see an error saying much the same - <![LOG[Validating connection to \\NATBLU01\Distribution$]LOG]!><time="16:42:14.000+000" date="03-15-2012" component="LiteTouch" context="" type="1" thread="" file="LiteTouch"> <![LOG[FindFile: The file OSDConnectToUNC.exe could not be found in any standard locations.]LOG]!><time="16:42:14.000+000" date="03-15-2012" component="LiteTouch" context="" type="1" thread="" file="LiteTouch"> <![LOG[The network location cannot be reached. For information about network troubleshooting, see Windows Help.]LOG]!><time="16:42:24.000+000" date="03-15-2012" component="LiteTouch" context="" type="3" thread="" file="LiteTouch"> <![LOG[ERROR - Unable to map a network drive to \\NATBLU01\Distribution$.]LOG]!><time="16:42:24.000+000" date="03-15-2012" component="LiteTouch" context="" type="3" thread="" file="LiteTouch"> BDD.LOG shows much the same. Full copies of the .LOG files can both files be found here : BDD.LOG LITETOUCH.LOG I can get to a command prompt from the Win PE that boots from PXE, however there isn't any network stuff there. IPCONFIG returns nothing so none of the tests I would usually run resolve anything. I'm at a loss frankly. I did wonder if I could perhaps start a new build process but if the change to the DeploymentWorkbench has knocked it offline I don't think I'm going to be able to create a new deployment. Failing that; we do have a deployment point labeled type 'Media' which appears to be a DVD ISO image of one of the builds, but its dated 2008, is it possible to export the network build to .ISO and build from DVD? We are looking at new hardware to run this from anyway (for the impending Windows 7 roll out) so a temporary work round isn't going to be too much of a problem. All assistance is appreciated! EDIT : OK. Got it working again. Solution was close to Newmanth's idea. The problem was that our PE image didn't appear to be connecting the network. I had an older copy of the PE boot.WIM on a stick that I had been using for other purposes. I booted that and correctly got a network connection. Showed a correct internal IP and could ping out etc etc. However I was still getting the same errors in all the logs and in when wpeinit was running. What I did seperately was to update the PE image that DeploymentWorkbench was pushing out to display a different back ground. I wanted to prove that I was working in the correct place. Turns out that I wasn't. I went and looked at the other deployment stuff we had on this machine, Windows Deployment Services was installed and although all the install images are off line the boot image was online, so I uploaded the copy from my stick to that. Booted straight off. And fixed. Working. Yay! For anyone stumbling across this in the future you may find that although your deployment images are located in the DeploymentWorkbench, the Win PE boot image you are launching from is located in the associated Windows Deployment Services images.

Read the article
Web browsing is fast, but downloads are slow

- by Ricket

I work for a company on my university's campus, helping with general IT problems and some web development. But lately there has been a problem that has me and my boss completely stumped. We, plus one contractor, make up the entire IT department, so I'm reaching out to you for help. All around the office, we have wall jacks. These collect in a closet down the hall and all plug into a switch. This switch, along with our individual server jacks, plugs into another switch, and that switch plugs into our firewall hardware. Then the firewall is connected out to our campus network. Our campus internet is, well, very fast. I don't know exactly the terms, tiers, etc., but we have thousands of students and downloads can run as fast as 10 MB/s at night; uploads are sometimes even faster. I think we're practically ISP level. In short, I have a lot of faith that it is not the campus side of things that is causing a problem, combined with other evidence I'll mention in a moment. So our symptoms: web browsing is fast. Web pages, images, etc. load instantly. No problems there. But then when I go to download something, the download starts fast but very quickly (a matter of seconds) drops to nearly 0. Often it will actually drop to 0 and time out. This happens with even very small files, 1 MB or less. It smells to me like a QoS sort of thing. I'm not entirely sure, and I wanted to get your opinions first. My boss is hesitant to touch our firewall, much less let me touch it, and it was set up and is managed by a consultant remotely. These problems don't seem tied to a time of the day. I've tried downloads after 5:00 and still the same thing happens. From my desk, I can turn on my wireless adapter and pick up the campus wireless access point. If I unplug ethernet and connect to it, downloads are fast. This adds to my suspicion that it's limited to our company network. Also, a number of weeks ago the consultant upgraded our firewall firmware. Suddenly everything was very fast. I tested with downloads from Sun and speedtest.net and things were blazing fast, as they should be with our campus internet! It was wonderful, and I figured the slow speeds were an old firmware bug. In a matter of days, things steadily declined until they were back to the old symptoms. Oh, and we have antivirus installed on every computer, and we keep it up to date. Though I suppose the possibility is still there that someone could have spyware which is bogging down our internet, in which case what is the easiest/best way to find this out? (maybe this should go in a separate question) Thank you for your patience in reading all of this. Do you have any ideas as to what I can try? Is this something that you've experienced before? What sort of tools or methods can I use to try and diagnose the problem? P.S. everything here is Windows. Windows Server 2003 and 2008 on our servers, and Windows XP on employees' machines. Update: We are submitting a ticket to the university to just take a look and see if they see anything unusual and/or can suggestion methods for us to try and pinpoint our problem. Hopefully they'll be helpful! I'll update this to let you know what goes on. Update again: We found a hub (yes, a HUB) right between our campus connection and our firewall. It had only those two ethernet cables plugged into it, nothing else. After removing the hub, our speeds have jumped up to several mbps. However in talking with the campus, we got them to run a gigabit line to our firewall in place of the 100mbps line. As of friday, we are at about 65 mbps up and down (according to speedtest.net at 8am)!! Go NC State!!

Read the article

Search Results

Search found 10719 results on 429 pages for 'hardware failure'.

Page 421/429 | < Previous Page | 417 418 419 420 421 422 423 424 425 426 427 428 | Next Page >

- by Dean Richardson

- by Geoff Dalgas

- by user1389890

- by user1389890

- by Hammer Bro.

- by René Kåbis

- by SethG

- by Myles

- by jlego

- by Callum D

- by MasterRoot24

- by Philip Durbin

- by Andy Shinn

- by Manca Weeks

- by Rich Lafferty

- by Hammer Bro.

- by Philip Durbin

- by Jorge Suárez de Lis

- by Peter

- by schiggn

- by Joe Hopfgartner

- by exhuma

- by fuzzybee

- by Patrick

- by Ricket

< Previous Page | 417 418 419 420 421 422 423 424 425 426 427 428 | Next Page >