
What is Causing this IIS 7 Web Service Sporadic Connectivity Error?

- by dpalau

On sporadic occasions we receive the following error when attempting to call an .asmx web service from a .Net client application:

"The underlying connection was closed: A connection that was expected to be kept alive was closed by the server. Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host."

By sporadic I mean that it might not occur at all, occur once every few days, or occur a half-dozen times a day for some users. It never occurs on a user's first web service call, and the subsequent (usually the same) call always works immediately after the failure. The failures happen across a variety of methods in the service and usually occur 15-20 seconds (according to the log) after the time of the request.

The IIS site log for the particular call shows one or the other of the following Windows error codes:

121: The semaphore timeout period has elapsed.
1236: The network connection was aborted by the local system.

Some additional environment details:

The service runs on an internal network web farm consisting of two servers running IIS7 on Windows Server 2008. These problems did not occur when running in an older IIS6 web farm of three servers running Windows Server 2003 (and we use a single IIS6/2003 instance for our development and staging environments with no issues).

EDIT: Also, all of these server instances are VMware virtual machines; not sure if that is a surprise anymore or not.

The web service is a .Net 2.0/3.5 compiled .asmx web service that has its own application pool (.Net 2.0, integrated pipeline). It only has Windows Authentication enabled.

We have another web service on the farm that uses the same physical path as the primary service, the only difference being that Basic Authentication is enabled. This is used for a portion of our ERP system. We have tried using the same and a different application pool, with no effect on the error. This site isn't hit as often as the primary site and has never had an error.

As mentioned, the error only happens when the service is called from the .Net client, not from other applications. The client application always creates a new web service object for each request and sets the service credentials to System.Net.CredentialCache.DefaultCredentials.

The application is either deployed locally to a client or run in a Citrix server session. Users running in Citrix don't seem to experience the issue, only locally deployed clients. The Citrix servers and the web farm are in the same physical location and in the same IP range (10.67.xx.xx). Locally deployed clients experiencing the error are located elsewhere (10.105.xx.xx, 10.31.xx.xx).

I've checked the OS logs to see if I can spot any problems, but nothing really sticks out.

EDIT: Actually, I ran into the error myself a little while ago. I decided to check the logs again and saw a Security log entry of "Audit Failure" at the 'same' time (IIS log entry at 1:39:59, event log entry at 1:39:50). I'm not sure whether this is a coincidence; I'll have to check the logs for previous errors. I'm probably grasping at straws, but here are the details:

Log Name: Security
Source: Microsoft-Windows-Security-Auditing
Date: 7/8/2009 1:39:50 PM
Event ID: 5159
Task Category: Filtering Platform Connection
Level: Information
Keywords: Audit Failure
User: N/A
Computer: is071019.<**.net
Description: The Windows Filtering Platform has blocked a bind to a local port.
Application Information:
Process ID: 1260
Application Name: \device\harddiskvolume1\windows\system32\svchost.exe
Network Information:
Source Address: 0.0.0.0
Source Port: 54802
Protocol: 17
Filter Information:
Filter Run-Time ID: 0
Layer Name: Resource Assignment
Layer Run-Time ID: 36

I've also tried to use Failed Request Tracing in IIS7, but the service call never actually gets to where FRT can capture it (even though the failure is logged in the web service log).

The network infrastructure group said they checked DNS and that the NIC settings are correct, so there is no 'flapping'. Everything pans out. I'm not sure whether they checked any domain controllers, though, to see if that could be an issue.

Any ideas? Or any other debugging strategies to get to the bottom of this? I'm just the developer in charge of the software and don't really know what to investigate on the networking side of things, although it does sound like a networking issue to me based on what is happening.

Thanks in advance for any help.
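One way to follow up on the "check the logs of previous errors" idea is to pull every failed call out of the IIS site log so the timestamps can be compared against the Security event log. The following is only a sketch: it assumes the site writes W3C extended logs that include the sc-win32-status field, and the sample rows (paths, addresses, timings) are made up for illustration.

```python
import io

# Win32 status codes quoted in the question.
SUSPECT_CODES = {"121", "1236"}


def failed_requests(log_lines, codes=SUSPECT_CODES):
    """Yield one dict per W3C log row whose sc-win32-status is in `codes`."""
    fields = []
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns of the rows that follow it.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if row.get("sc-win32-status") in codes:
            yield row


# Hypothetical sample data, not taken from the real servers.
SAMPLE = """#Fields: date time cs-method cs-uri-stem c-ip sc-status sc-win32-status time-taken
2009-07-08 13:39:40 POST /Service.asmx 10.67.0.5 200 0 120
2009-07-08 13:39:59 POST /Service.asmx 10.105.0.7 200 121 18250
2009-07-08 13:41:02 POST /Service.asmx 10.31.0.9 200 1236 15800
"""

if __name__ == "__main__":
    for row in failed_requests(io.StringIO(SAMPLE)):
        print(row["date"], row["time"], row["c-ip"],
              row["sc-win32-status"], row["time-taken"])
```

Running the same function over the real log files of both farm members would give a list of failure times and client addresses to line up against event 5159 entries.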


Can't connect to STunnel when it's running as a service

- by John Francis

I've got STunnel configured to proxy non-SSL POP3 requests to GMail on port 111. This works fine when STunnel is running as a desktop app, but when I run the STunnel service, I can't connect to port 111 on the machine (using Outlook Express, for example). The STunnel log file shows that the port binding succeeds, but it never sees a connection. Is something preventing the connection to that port when STunnel is running as a service?

Here's stunnel.conf:

cert = stunnel.pem
; Some performance tunings
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
; Some debugging stuff useful for troubleshooting
debug = 7
output = stunnel.log
; Use it for client mode
client = yes
; Service-level configuration
[gmail]
accept = 127.0.0.1:111
connect = pop.gmail.com:995

stunnel.log from service:

2010.10.07 12:14:22 LOG5[80444:72984]: Reading configuration from file stunnel.conf
2010.10.07 12:14:22 LOG7[80444:72984]: Snagged 64 random bytes from C:/.rnd
2010.10.07 12:14:23 LOG7[80444:72984]: Wrote 1024 new random bytes to C:/.rnd
2010.10.07 12:14:23 LOG7[80444:72984]: PRNG seeded successfully
2010.10.07 12:14:23 LOG7[80444:72984]: Certificate: stunnel.pem
2010.10.07 12:14:23 LOG7[80444:72984]: Certificate loaded
2010.10.07 12:14:23 LOG7[80444:72984]: Key file: stunnel.pem
2010.10.07 12:14:23 LOG7[80444:72984]: Private key loaded
2010.10.07 12:14:23 LOG7[80444:72984]: SSL context initialized for service gmail
2010.10.07 12:14:23 LOG5[80444:72984]: Configuration successful
2010.10.07 12:14:23 LOG5[80444:72984]: No limit detected for the number of clients
2010.10.07 12:14:23 LOG7[80444:72984]: FD=156 in non-blocking mode
2010.10.07 12:14:23 LOG7[80444:72984]: Option SO_REUSEADDR set on accept socket
2010.10.07 12:14:23 LOG7[80444:72984]: Service gmail bound to 0.0.0.0:111
2010.10.07 12:14:23 LOG7[80444:72984]: Service gmail opened FD=156
2010.10.07 12:14:23 LOG5[80444:72984]: stunnel 4.34 on x86-pc-mingw32-gnu with OpenSSL 1.0.0a 1 Jun 2010
2010.10.07 12:14:23 LOG5[80444:72984]: Threading:WIN32 SSL:ENGINE Sockets:SELECT,IPv6

stunnel.log from desktop (working) process:

2010.10.07 12:10:31 LOG5[80824:81200]: Reading configuration from file stunnel.conf
2010.10.07 12:10:31 LOG7[80824:81200]: Snagged 64 random bytes from C:/.rnd
2010.10.07 12:10:32 LOG7[80824:81200]: Wrote 1024 new random bytes to C:/.rnd
2010.10.07 12:10:32 LOG7[80824:81200]: PRNG seeded successfully
2010.10.07 12:10:32 LOG7[80824:81200]: Certificate: stunnel.pem
2010.10.07 12:10:32 LOG7[80824:81200]: Certificate loaded
2010.10.07 12:10:32 LOG7[80824:81200]: Key file: stunnel.pem
2010.10.07 12:10:32 LOG7[80824:81200]: Private key loaded
2010.10.07 12:10:32 LOG7[80824:81200]: SSL context initialized for service gmail
2010.10.07 12:10:32 LOG5[80824:81200]: Configuration successful
2010.10.07 12:10:32 LOG5[80824:81200]: No limit detected for the number of clients
2010.10.07 12:10:32 LOG7[80824:81200]: FD=156 in non-blocking mode
2010.10.07 12:10:32 LOG7[80824:81200]: Option SO_REUSEADDR set on accept socket
2010.10.07 12:10:32 LOG7[80824:81200]: Service gmail bound to 0.0.0.0:111
2010.10.07 12:10:32 LOG7[80824:81200]: Service gmail opened FD=156
2010.10.07 12:10:33 LOG5[80824:81200]: stunnel 4.34 on x86-pc-mingw32-gnu with OpenSSL 1.0.0a 1 Jun 2010
2010.10.07 12:10:33 LOG5[80824:81200]: Threading:WIN32 SSL:ENGINE Sockets:SELECT,IPv6
2010.10.07 12:10:33 LOG7[80824:81844]: Service gmail accepted FD=188 from 127.0.0.1:24813
2010.10.07 12:10:33 LOG7[80824:81844]: Creating a new thread
2010.10.07 12:10:33 LOG7[80824:81844]: New thread created
2010.10.07 12:10:33 LOG7[80824:25144]: Service gmail started
2010.10.07 12:10:33 LOG7[80824:25144]: FD=188 in non-blocking mode
2010.10.07 12:10:33 LOG7[80824:25144]: Option TCP_NODELAY set on local socket
2010.10.07 12:10:33 LOG5[80824:25144]: Service gmail accepted connection from 127.0.0.1:24813
2010.10.07 12:10:33 LOG7[80824:25144]: FD=212 in non-blocking mode
2010.10.07 12:10:33 LOG6[80824:25144]: connect_blocking: connecting 209.85.227.109:995
2010.10.07 12:10:33 LOG7[80824:25144]: connect_blocking: s_poll_wait 209.85.227.109:995: waiting 10 seconds
2010.10.07 12:10:33 LOG5[80824:25144]: connect_blocking: connected 209.85.227.109:995
2010.10.07 12:10:33 LOG5[80824:25144]: Service gmail connected remote server from 192.168.1.9:24814
2010.10.07 12:10:33 LOG7[80824:25144]: Remote FD=212 initialized
2010.10.07 12:10:33 LOG7[80824:25144]: Option TCP_NODELAY set on remote socket
2010.10.07 12:10:33 LOG7[80824:25144]: SSL state (connect): before/connect initialization
2010.10.07 12:10:33 LOG7[80824:25144]: SSL state (connect): SSLv3 write client hello A
2010.10.07 12:10:33 LOG7[80824:25144]: SSL state (connect): SSLv3 read server hello A
2010.10.07 12:10:33 LOG7[80824:25144]: SSL state (connect): SSLv3 read server certificate A
2010.10.07 12:10:33 LOG7[80824:25144]: SSL state (connect): SSLv3 read server done A
2010.10.07 12:10:33 LOG7[80824:25144]: SSL state (connect): SSLv3 write client key exchange A
2010.10.07 12:10:33 LOG7[80824:25144]: SSL state (connect): SSLv3 write change cipher spec A
2010.10.07 12:10:33 LOG7[80824:25144]: SSL state (connect): SSLv3 write finished A
2010.10.07 12:10:33 LOG7[80824:25144]: SSL state (connect): SSLv3 flush data
2010.10.07 12:10:33 LOG7[80824:25144]: SSL state (connect): SSLv3 read finished A
2010.10.07 12:10:33 LOG7[80824:25144]: 1 items in the session cache
2010.10.07 12:10:33 LOG7[80824:25144]: 1 client connects (SSL_connect())
2010.10.07 12:10:33 LOG7[80824:25144]: 1 client connects that finished
2010.10.07 12:10:33 LOG7[80824:25144]: 0 client renegotiations requested
2010.10.07 12:10:33 LOG7[80824:25144]: 0 server connects (SSL_accept())
2010.10.07 12:10:33 LOG7[80824:25144]: 0 server connects that finished
2010.10.07 12:10:33 LOG7[80824:25144]: 0 server renegotiations requested
2010.10.07 12:10:33 LOG7[80824:25144]: 0 session cache hits
2010.10.07 12:10:33 LOG7[80824:25144]: 0 external session cache hits
2010.10.07 12:10:33 LOG7[80824:25144]: 0 session cache misses
2010.10.07 12:10:33 LOG7[80824:25144]: 0 session cache timeouts
2010.10.07 12:10:33 LOG6[80824:25144]: SSL connected: new session negotiated
2010.10.07 12:10:33 LOG6[80824:25144]: Negotiated ciphers: RC4-MD5 SSLv3 Kx=RSA Au=RSA Enc=RC4(128) Mac=MD5
2010.10.07 12:10:34 LOG7[80824:25144]: SSL socket closed on SSL_read
2010.10.07 12:10:34 LOG7[80824:25144]: Sending socket write shutdown
2010.10.07 12:10:34 LOG5[80824:25144]: Connection closed: 53 bytes sent to SSL, 118 bytes sent to socket
2010.10.07 12:10:34 LOG7[80824:25144]: Service gmail finished (0 left)
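To separate "the service is not listening" from "the mail client cannot reach it", a plain TCP connect test can be run while the service is active. This is a minimal sketch rather than a diagnosis; the host list and the timeout are assumptions, and port 111 is taken from the configuration above.

```python
import socket


def port_accepts(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Try both names a mail client might be configured with.
    for host in ("127.0.0.1", "localhost"):
        print(host, 111, port_accepts(host, 111))
```

If the check succeeds but nothing shows up in stunnel.log, whatever accepted the connection is probably not the stunnel service; if it fails, something between the client and the listening socket is blocking or refusing the connection.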


KVM Slow performance on XP Guest

- by Gregg Leventhal

The system is very slow to do anything, even browse a local folder, and the CPU frequently sits at 100%. The guest is 32-bit XP. The host is Scientific Linux 6.2 with libvirt 0.10. The XP guest shows an ACPI Multiprocessor HAL and has virtIO drivers installed for NIC and SCSI.

CPUInfo on host:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
stepping : 7
cpu MHz : 3200.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 6784.93
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

Guest domain XML (excerpt):

<memory unit='KiB'>4194304</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
<vcpu placement='static' cpuset='0'>1</vcpu>
<os>
  <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
  <boot dev='hd'/>
</os>
<features>
  <acpi/>
  <apic/>
  <pae/>
</features>
<cpu mode='custom' match='exact'>
  <model fallback='allow'>SandyBridge</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='osxsave'/>
  <feature policy='require' name='smx'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='tsc-deadline'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='monitor'/>
  <feature policy='force' name='sse'/>
  <feature policy='force' name='sse2'/>
  <feature policy='force' name='sse4.1'/>
  <feature policy='force' name='sse4.2'/>
  <feature policy='force' name='ssse3'/>
  <feature policy='force' name='x2apic'/>
</cpu>
<clock offset='localtime'>
  <timer name='rtc' tickpolicy='catchup'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
  <emulator>/usr/libexec/qemu-kvm</emulator>
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' cache='none'/>
    <source file='/var/lib/libvirt/images/Server-10-9-13.qcow2'/>
    <target dev='vda' bus='virtio'/>
    <alias name='virtio-disk0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
  </disk>


Recover RAID 5 data after created new array instead of re-using

- by Brigadieren

Folks, please help. I am a newbie with a major headache at hand (a perfect storm situation).

I have three 1 TB HDDs on my Ubuntu 11.04 box configured as software RAID 5. The data had been copied weekly to a separate external hard drive until that drive failed completely and was thrown away. A few days back we had a power outage, and after rebooting my box wouldn't mount the RAID. In my infinite wisdom I entered an mdadm --create -f... command instead of mdadm --assemble and didn't notice the travesty I had committed until afterwards. It started the array degraded and proceeded with building and syncing it, which took ~10 hours. When I came back I saw that the array was up and running, but the data was not accessible: the individual drives are partitioned (partition type fd), but the md0 device is not.

Realizing in horror what I have done, I am trying to find a solution. I just pray that --create didn't overwrite the entire content of the hard drives.

Could someone PLEASE help me out with this? The data on the drives is very important and unique: ~10 years of photos, docs, etc.

Is it possible that specifying the participating hard drives in the wrong order can make mdadm overwrite them?

When I run mdadm --examine --scan I get something like:

ARRAY /dev/md/0 metadata=1.2 UUID=f1b4084a:720b5712:6d03b9e9:43afe51b name=<hostname>:0

Interestingly enough, the name used to be 'raid' and not the hostname with :0 appended.

Here are the 'sanitized' config entries:

DEVICE /dev/sdf1 /dev/sde1 /dev/sdd1
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR root
ARRAY /dev/md0 metadata=1.2 name=tanserv:0 UUID=f1b4084a:720b5712:6d03b9e9:43afe51b

Here is the output from mdstat:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[0] sdf1[3] sde1[1]
      1953517568 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

fdisk shows the following:

fdisk -l

Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000bf62e

Device     Boot  Start     End     Blocks  Id  System
/dev/sda1  *         1    9443   75846656  83  Linux
/dev/sda2         9443    9730    2301953   5  Extended
/dev/sda5         9443    9730    2301952  82  Linux swap / Solaris

Disk /dev/sdb: 750.2 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000de8dd

Device     Boot  Start     End     Blocks  Id  System
/dev/sdb1            1   91201  732572001  8e  Linux LVM

Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00056a17

Device     Boot  Start     End     Blocks  Id  System
/dev/sdc1            1   60801  488384001  8e  Linux LVM

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000ca948

Device     Boot  Start     End     Blocks  Id  System
/dev/sdd1            1  121601  976760001  fd  Linux raid autodetect

Disk /dev/dm-0: 1250.3 GB, 1250254913536 bytes
255 heads, 63 sectors/track, 152001 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x93a66687

Device     Boot  Start     End     Blocks  Id  System
/dev/sde1            1  121601  976760001  fd  Linux raid autodetect

Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe6edc059

Device     Boot  Start     End     Blocks  Id  System
/dev/sdf1            1  121601  976760001  fd  Linux raid autodetect

Disk /dev/md0: 2000.4 GB, 2000401989632 bytes
2 heads, 4 sectors/track, 488379392 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table

Per suggestions I cleaned up the superblocks and re-created the array with the --assume-clean option, but with no luck at all.

Is there any tool that will help me revive at least some of the data? Can someone tell me what mdadm --create does, and how, when it syncs, so that I can write a tool to undo whatever was done?

After re-creating the RAID I ran fsck.ext4 /dev/md0, and here is the output:

root@tanserv:/etc/mdadm# fsck.ext4 /dev/md0
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Per Shane's suggestion I tried:

root@tanserv:/home/mushegh# mkfs.ext4 -n /dev/md0
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=256 blocks
122101760 inodes, 488379392 blocks
24418969 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
14905 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
    102400000, 214990848

and ran fsck.ext4 with every backup block, but all of them returned the following:

root@tanserv:/home/mushegh# fsck.ext4 -b 214990848 /dev/md0
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Invalid argument while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Any suggestions?

Regards!
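The "Bad magic number" message refers to the two-byte ext2/3/4 signature 0xEF53, stored 56 bytes into each superblock. As a read-only cross-check of the fsck attempts, the backup locations listed by mkfs.ext4 -n could be probed directly for that signature. The sketch below rests on assumptions: a 4096-byte block size (as reported by mkfs.ext4 -n), backups that start at the beginning of the listed blocks, and an array assembled with the original geometry, which is exactly what is in doubt here. The function names are illustrative.

```python
import io
import struct

EXT_MAGIC = 0xEF53       # s_magic value shared by ext2/ext3/ext4
MAGIC_OFFSET = 56        # byte offset of s_magic inside a superblock
PRIMARY_OFFSET = 1024    # the primary superblock starts 1024 bytes into the device


def has_ext_magic(f, sb_offset):
    """Return True if a little-endian 0xEF53 sits where s_magic would be."""
    f.seek(sb_offset + MAGIC_OFFSET)
    raw = f.read(2)
    return len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT_MAGIC


def candidate_offsets(backup_blocks, block_size=4096):
    """Byte offsets of the primary superblock and of the listed backups."""
    return [PRIMARY_OFFSET] + [b * block_size for b in backup_blocks]


def scan(f, backup_blocks, block_size=4096):
    """Read-only scan; returns the offsets where the magic number was found."""
    return [off for off in candidate_offsets(backup_blocks, block_size)
            if has_ext_magic(f, off)]


if __name__ == "__main__":
    # In-memory stand-in for a device: only the "backup" at block 2 is intact.
    image = io.BytesIO(bytes(3 * 4096))
    image.seek(2 * 4096 + MAGIC_OFFSET)
    image.write(struct.pack("<H", EXT_MAGIC))
    print(scan(image, [1, 2]))
```

Against the real array this would be called as root with something like `with open("/dev/md0", "rb") as f: scan(f, [32768, 98304, 163840])`. An empty result would be consistent with the member order or layout of the re-created array differing from the original, but it proves nothing on its own.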


mdadm raid5 recover double disk failure - with a twist (drive order)

- by Peter Bos

Let me acknowledge first off that I have made mistakes, and that I have a backup for most but not all of the data on this RAID. I still have hope of recovering the rest of the data. I don't have the kind of money to take the drives to a recovery expert company.

Mistake #0, not having a 100% backup. I know.

I have a mdadm RAID5 system of 4x3TB. Drives /dev/sd[b-e], all with one partition /dev/sd[b-e]1. I'm aware that RAID5 on very large drives is risky, yet I did it anyway.

Recent events

The RAID became degraded after a two-drive failure. One drive [/dev/sdc] is really gone; the other [/dev/sde] came back up after a power cycle, but was not automatically re-added to the RAID. So I was left with a 4-device RAID with only 2 active drives [/dev/sdb and /dev/sdd].

Mistake #1, not using dd copies of the drives for restoring the RAID. I did not have the drives or the time.

Mistake #2, not making a backup of the superblock and mdadm -E of the remaining drives.

Recovery attempt

I reassembled the RAID in degraded mode with

mdadm --assemble --force /dev/md0, using /dev/sd[bde]1.

I could then access my data. I replaced /dev/sdc with a spare, empty, identical drive.

I removed the old /dev/sdc1 from the RAID:

mdadm --fail /dev/md0 /dev/sdc1

Mistake #3, not doing this before replacing the drive.

I then partitioned the new /dev/sdc and added it to the RAID:

mdadm --add /dev/md0 /dev/sdc1

It then began to restore the RAID. ETA 300 mins. I followed the process via /proc/mdstat to 2% and then went to do other stuff.

Checking the result

Several hours (but less than 300 mins) later, I checked the process. It had stopped due to a read error on /dev/sde1.

Here is where the trouble really starts

I then removed /dev/sde1 from the RAID and re-added it. I can't remember why I did this; it was late.

mdadm --manage /dev/md0 --remove /dev/sde1
mdadm --manage /dev/md0 --add /dev/sde1

However, /dev/sde1 was now marked as spare. So I decided to recreate the whole array with --assume-clean, using what I thought was the right order, and with /dev/sdc1 missing:

mdadm --create /dev/md0 --assume-clean -l5 -n4 /dev/sdb1 missing /dev/sdd1 /dev/sde1

That worked, but the filesystem was not recognized when I tried to mount it. (It should have been EXT4.)

Device order

I then checked a recent backup I had of /proc/mdstat, and I found the drive order:

md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1]
      8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

I then remembered that this RAID had suffered a drive loss about a year ago and recovered from it by replacing the faulty drive with a spare one. That may have scrambled the device order a bit, so there was no drive [3] but only [0], [1], [2], and [4].

I tried to find the drive order with the Permute_array script: https://raid.wiki.kernel.org/index.php/Permute_array.pl but that did not find the right order.

Questions

I now have two main questions:

I screwed up all the superblocks on the drives, but I only issued mdadm --create --assume-clean commands (so I should not have overwritten the data itself on /dev/sd[bde]1). Am I right that in theory the RAID can be restored [assuming for a moment that /dev/sde1 is OK] if I just find the right device order?

Is it important that /dev/sde1 be given the device number [4] in the RAID? When I create it with

mdadm --create /dev/md0 --assume-clean -l5 -n4 \
  /dev/sdb1 missing /dev/sdd1 /dev/sde1

it is assigned the number [3]. I wonder if that is relevant to the calculation of the parity blocks. If it turns out to be important, how can I recreate the array with /dev/sdb1[0] missing[1] /dev/sdd1[2] /dev/sde1[4]? If I could get that to work, I could start it in degraded mode, add the new drive /dev/sdc1, and let it resync again.

It's OK if you would like to point out to me that this may not have been the best course of action, but you'll find that I have realized this. It would be great if anyone has any suggestions.
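The order search that a permutation script performs can be sketched in a few lines. The example below only prints candidate commands for every ordering of the three surviving members plus the missing slot; it never runs mdadm. It is an illustration, not the Permute_array script itself, and it ignores chunk size, metadata version and data offset, which would presumably also have to match the original array.

```python
from itertools import permutations


def candidate_commands(members, md="/dev/md0", level=5):
    """Yield one 'mdadm --create --assume-clean' command line per member order."""
    n = len(members)
    for order in permutations(members):
        yield "mdadm --create {} --assume-clean -l{} -n{} {}".format(
            md, level, n, " ".join(order))


# Member list taken from the question; 'missing' stands for the dead /dev/sdc1.
MEMBERS = ["/dev/sdb1", "missing", "/dev/sdd1", "/dev/sde1"]

if __name__ == "__main__":
    for cmd in candidate_commands(MEMBERS):
        # Printed for review only; each candidate would then be checked
        # read-only (for example with a no-change filesystem check).
        print(cmd)
```

With four slots there are 24 orderings, so trying them by hand is feasible as long as every test is read-only and, ideally, done on copies or overlays rather than the original drives.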


squid3 auth thru samba using ntlm to AD doesn't work

- by derty

Some users here are spending too much time exploring the WWW, so the big boss wants to get this under control. We use squid3 only for some security reasons and for its caching benefits, and now I'm trying to set up a new proxy on a different server (Debian 6). Permissions are defined in AD, and squid3 should authenticate through samba/winbind using the NTLM protocol, but I get "Access Denied" all the time. It only works with LDAP, but that's not the way I need it.

Here are some logs and confs.

squid access.log

1326878095.784 1 192.168.15.27 TCP_DENIED/407 4049 GET http://at.msn.com/? - NONE/- text/html
1326878095.791 1 192.168.15.27 TCP_DENIED/407 4294 GET http://at.msn.com/? - NONE/- text/html
1326878095.803 9 192.168.15.27 TCP_DENIED/403 4028 GET http://at.msn.com/? kavan NONE/- text/html
1326878095.848 0 192.168.15.27 TCP_DENIED/403 3881 GET http://www.squid-cache.org/Artwork/SN.png kavan NONE/- text/html
1326878100.279 0 192.168.15.27 TCP_DENIED/403 3735 GET http://www.google.at/ kavan NONE/- text/html
1326878100.296 0 192.168.15.27 TCP_DENIED/403 3870 GET http://www.squid-cache.org/Artwork/SN.png kavan NONE/- text/html
1326878155.700 0 192.168.15.27 TCP_DENIED/407 4072 GET http://ie9cvlist.ie.microsoft.com/IE9CompatViewList.xml - NONE/- text/html
1326878155.705 2 192.168.15.27 TCP_DENIED/407 4317 GET http://ie9cvlist.ie.microsoft.com/IE9CompatViewList.xml - NONE/- text/html
1326878155.709 3 192.168.15.27 TCP_DENIED/403 4026 GET http://ie9cvlist.ie.microsoft.com/IE9CompatViewList.xml kavan NONE/- text/html

squid cache.log

2012/01/18 10:12:49| Creating Swap Directories
2012/01/18 10:12:49| Starting Squid Cache version 3.1.6 for x86_64-pc-linux-gnu...
2012/01/18 10:12:49| Process ID 17236
2012/01/18 10:12:49| With 65535 file descriptors available
2012/01/18 10:12:49| Initializing IP Cache...
2012/01/18 10:12:49| DNS Socket created at [::], FD 7
2012/01/18 10:12:49| DNS Socket created at 0.0.0.0, FD 8
2012/01/18 10:12:49| Adding nameserver 192.168.15.2 from /etc/resolv.conf
2012/01/18 10:12:49| Adding nameserver 192.168.15.19 from /etc/resolv.conf
2012/01/18 10:12:49| Adding nameserver 192.168.15.1 from /etc/resolv.conf
2012/01/18 10:12:49| Adding domain schoenbrunn.local from /etc/resolv.conf
2012/01/18 10:12:49| helperOpenServers: Starting 5/5 'squid_ldap_auth' processes
2012/01/18 10:12:49| helperOpenServers: Starting 10/10 'ntlm_auth' processes
2012/01/18 10:12:49| helperOpenServers: Starting 10/10 'squid_kerb_auth' processes
2012/01/18 10:12:49| squid_kerb_auth: INFO: Starting version 1.0.5
2012/01/18 10:12:49| squid_kerb_auth: INFO: Starting version 1.0.5
2012/01/18 10:12:49| squid_kerb_auth: INFO: Starting version 1.0.5
2012/01/18 10:12:49| squid_kerb_auth: INFO: Starting version 1.0.5
2012/01/18 10:12:49| squid_kerb_auth: INFO: Starting version 1.0.5
2012/01/18 10:12:49| squid_kerb_auth: INFO: Starting version 1.0.5
2012/01/18 10:12:49| squid_kerb_auth: INFO: Starting version 1.0.5
2012/01/18 10:12:49| squid_kerb_auth: INFO: Starting version 1.0.5
2012/01/18 10:12:49| helperOpenServers: Starting 5/5 'squid_ldap_group' processes
2012/01/18 10:12:49| squid_kerb_auth: INFO: Starting version 1.0.5
2012/01/18 10:12:49| squid_kerb_auth: INFO: Starting version 1.0.5
2012/01/18 10:12:49| Unlinkd pipe opened on FD 73
2012/01/18 10:12:49| Local cache digest enabled; rebuild/rewrite every 3600/3600 sec
2012/01/18 10:12:49| Store logging disabled
2012/01/18 10:12:49| Swap maxSize 0 + 262144 KB, estimated 20164 objects
2012/01/18 10:12:49| Target number of buckets: 1008
2012/01/18 10:12:49| Using 8192 Store buckets
2012/01/18 10:12:49| Max Mem size: 262144 KB
2012/01/18 10:12:49| Max Swap size: 0 KB
2012/01/18 10:12:49| Using Least Load store dir selection
2012/01/18 10:12:49| Set Current Directory to /var/spool/squid3
2012/01/18 10:12:49| Loaded Icons.
2012/01/18 10:12:49| Accepting HTTP connections at [::]:3128, FD 74.
2012/01/18 10:12:49| HTCP Disabled.
2012/01/18 10:12:49| Squid modules loaded: 0
2012/01/18 10:12:49| Adaptation support is off.
2012/01/18 10:12:49| Ready to serve requests.
2012/01/18 10:12:50| storeLateRelease: released 0 objects

smb.conf

# Domain Authntication Settings
workgroup = <WORKGROUP>
security = ads
password server = <DOMAINNAME>.LOCAL
realm = <DOMAINNAME>.LOCAL
ldap ssl = no
# logging
log level = 5
max log size = 50
# logs split per machine
log file = /var/log/samba/%m.log
# max 50KB per log file, then rotate
; max log size = 50
# User settings
username map = /etc/samba/smbusers
idmap uid = 10000-20000000
idmap gid = 10000-20000000
idmap backend = ad
; template primary group = <ad group>
template shell = /sbin/nologin
# Winbind Settings
winbind separator = +
winbind enum users = Yes
winbind enum groups = Yes
winbind netsted groups = Yes
winbind nested groups = Yes
winbind cache time = 10
winbind use default domain = Yes
#Other Globals
unix charset = LOCALE
server string = <SERVERNAME>
load printers = no
printing = cups
cups options = raw
; printcap name = /etc/printcap
#obtain list of printers automatically on SystemV
; printcap name = lpstat
; printing = cups

squid.conf

auth_param ntlm program /usr/bin/ntlm_auth --require-membership-of=<DOMAINNAME>\\INTERNETZ --helper-protocol=squid-2.5-ntlmssp
auth_param ntlm children 10
auth_param basic program /usr/lib/squid3/squid_ldap_auth -R -b "dc=<dcname>,dc=local" -D "cn=administrator,cn=Users,dc=<domainname>,dc=local" -w "******" -f sAMAccountName=%s -h 192.168.15.19:3268
auth_param basic realm "Proxy Authentifizierung. Bitte geben Sie Ihren Benutzername und Ihr Passwort ein!"
# (the realm text is German for: please enter your user name and your password)
external_acl_type InetGroup %LOGIN /usr/lib/squid3/squid_ldap_group -R -b "dc=<domainname>,dc=local" -D "cn=administrator,cn=Users,dc=<domainname>,dc=local" -w "******" -f "(&(objectclass=person)(sAMAccountName=%v) (memberof=cn=%a,cn=internetz,dc=<domainname>,dc=local))" -h 192.168.15.19:3268
auth_param negotiate program /usr/lib/squid3/squid_kerb_auth -d
auth_param negotiate children 10
auth_param negotiate keep_alive on
acl localnet proxy_auth REQUIRED
acl InetAccess external InetGroup Internetz
http_access allow InetAccess
http_access deny all
acl auth proxy_auth REQUIRED
http_access allow auth

One very suspicious thing is that after adding the proxy server to the domain I see 2 new entries, one with the original computer name leopoldine and one with leopoldine CNF:f8efa4c4-ff0e-4217-939d-f1523b43464d ?!?

I have tried a lot, really, but I'm stuck on this problem. I have even reinstalled all dependent programs and reconfigured them from the defaults. The group exists and has me in it. Firefox is running on the old proxy and I use IE for testing the new one, but I get "Access Denied" all the time.

To be honest, I'm quite a beginner, so please don't be too hard on me. I'm interested in improving, and I'll get whatever information is needed to fix this, but I started working 2 months ago, have had only 1 1/2 years of training, and not a single second of it was in Linux ;)
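It can help to tally the access.log entries by result code and user name. HTTP 407 means proxy authentication is still required, while 403 means the request was refused; in the excerpt above, the 403 lines already carry the user name kavan. A possible reading, offered here only as a hypothesis, is that authentication itself completes and the denial comes from an access rule. The sketch below assumes squid's default native log format with ten whitespace-separated fields, and uses three lines from the excerpt as sample input.

```python
from collections import Counter


def summarize(lines):
    """Count (result code, user) pairs in squid native-format log lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 10:
            continue  # skip anything that is not a complete native-format entry
        result, user = parts[3], parts[7]
        counts[(result, user)] += 1
    return counts


# Three lines from the access.log excerpt in the question.
SAMPLE = [
    "1326878095.791 1 192.168.15.27 TCP_DENIED/407 4294 GET http://at.msn.com/? - NONE/- text/html",
    "1326878095.803 9 192.168.15.27 TCP_DENIED/403 4028 GET http://at.msn.com/? kavan NONE/- text/html",
    "1326878100.279 0 192.168.15.27 TCP_DENIED/403 3735 GET http://www.google.at/ kavan NONE/- text/html",
]

if __name__ == "__main__":
    for (result, user), n in sorted(summarize(SAMPLE).items()):
        print(result, user, n)
```

If every 403 belongs to an already named user, the order and content of the http_access rules and the group lookup helper would be the next places to look, rather than the NTLM handshake.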


Need help configuring my Tomcat server without any WAR files

- by gablin

I just reinstalled my entire server, and now I can't get my JSP-based website to work on Tomcat anymore. I am using the same server.xml file, which worked perfectly before the reinstallation, but it no longer works. Here's the content of the server.xml file which worked before:

  <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
  <Listener className="org.apache.catalina.core.JasperListener" />
  <Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" />
  <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />
  <GlobalNamingResources>
    <Resource name="UserDatabase" auth="Container" type="org.apache.catalina.UserDatabase" description="User database that can be updated and saved" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" pathname="conf/tomcat-users.xml" />
  </GlobalNamingResources>
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
    <Engine name="Catalina" defaultHost="localhost">
      <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"/>
      <Host name="www.rebootradio.nu">
        <Alias>rebootradio.nu</Alias>
        <Context path="" docBase="D:/services/http/rebootradio.nu" debug="1" reloadable="true"/>
      </Host>
    </Engine>
  </Service>
</Server>

The JSP site doesn't use any WAR files or anything like that; there's just a default.jsp in the specified folder D:/services/http/rebootradio.nu which loads the site. As I said, this configuration worked before, but now with the latest version of XAMPP and Tomcat it doesn't work anymore. All I get is a 404 message saying "The requested resource () is not available."


Strange Recurrent Excessive I/O Wait

- by Chris

I know quite well that I/O wait has been discussed multiple times on this site, but all the other topics seem to cover constant I/O latency, while the I/O problem we need to solve on our server occurs at irregular (short) intervals, but is ever-present, with massive await spikes of up to 20k ms and service times of 2 seconds. The disk affected is /dev/sdb (Seagate Barracuda, for details see below).

A typical iostat -x output would at times look like this, which is an extreme sample but by no means rare:

iostat (Oct 6, 2013)

   tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await    svctm   %util
  0.00      0.00      0.00      0.00      0.00      0.00     0.00    0.00
  0.00      0.00      0.00      0.00      0.00      0.00     0.00    0.00
 16.00      0.00    156.00      9.75     21.89    288.12    36.00   57.60
  5.50      0.00     44.00      8.00     48.79   2194.18   181.82  100.00
  2.00      0.00     16.00      8.00     46.49   3397.00   500.00  100.00
  4.50      0.00     40.00      8.89     43.73   5581.78   222.22  100.00
 14.50      0.00    148.00     10.21     13.76   5909.24    68.97  100.00
  1.50      0.00     12.00      8.00      8.57   7150.67   666.67  100.00
  0.50      0.00      4.00      8.00      6.31  10168.00  2000.00  100.00
  2.00      0.00     16.00      8.00      5.27  11001.00   500.00  100.00
  0.50      0.00      4.00      8.00      2.96  17080.00  2000.00  100.00
 34.00      0.00   1324.00      9.88      1.32    137.84     4.45   59.60
  0.00      0.00      0.00      0.00      0.00      0.00     0.00    0.00
 22.00     44.00    204.00     11.27      0.01      0.27     0.27    0.60

Let me provide you with some more information regarding the hardware. It's a Dell 1950 III box with Debian as OS where uname -a reports the following:

Linux xx 2.6.32-5-amd64 #1 SMP Fri Feb 15 15:39:52 UTC 2013 x86_64 GNU/Linux

The machine is a dedicated server that hosts an online game without any databases or I/O heavy applications running. The core application consumes about 0.8 of the 8 GBytes RAM, and the average CPU load is relatively low. The game itself, however, is rather sensitive to I/O latency and thus our players experience massive in-game lag, which we would like to address as soon as possible.
iostat:

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           1.77   0.01     1.05     1.59    0.00  95.58

Device:    tps  Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
sdb      13.16       25.42      135.12  504701011  2682640656
sda       1.52        0.74       20.63   14644533   409684488

Uptime is:

19:26:26 up 229 days, 17:26, 4 users, load average: 0.36, 0.37, 0.32

Harddisk controller:

01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

Harddisks:

Array 1, RAID-1, 2x Seagate Cheetah 15K.5 73 GB SAS
Array 2, RAID-1, 2x Seagate ST3500620SS Barracuda ES.2 500GB 16MB 7200RPM SAS

Partition information from df:

Filesystem  1K-blocks      Used  Available  Use%  Mounted on
/dev/sdb1   480191156  30715200  425083668    7%  /home
/dev/sda2     7692908    437436    6864692    6%  /
/dev/sda5    15377820   1398916   13197748   10%  /usr
/dev/sda6    39159724  19158340   18012140   52%  /var

Some more data samples generated with iostat -dx sdb 1 (Oct 11, 2013):

Device:  rrqm/s  wrqm/s   r/s     w/s  rsec/s   wsec/s  avgrq-sz  avgqu-sz    await    svctm   %util
sdb        0.00   15.00  0.00   70.00    0.00   656.00      9.37      4.50     1.83     4.80   33.60
sdb        0.00    0.00  0.00    2.00    0.00    16.00      8.00     12.00   836.00   500.00  100.00
sdb        0.00    0.00  0.00    3.00    0.00    32.00     10.67      9.96  1990.67   333.33  100.00
sdb        0.00    0.00  0.00    4.00    0.00    40.00     10.00      6.96  3075.00   250.00  100.00
sdb        0.00    0.00  0.00    0.00    0.00     0.00      0.00      4.00     0.00     0.00  100.00
sdb        0.00    0.00  0.00    2.00    0.00    16.00      8.00      2.62  4648.00   500.00  100.00
sdb        0.00    0.00  0.00    0.00    0.00     0.00      0.00      2.00     0.00     0.00  100.00
sdb        0.00    0.00  0.00    1.00    0.00    16.00     16.00      1.69  7024.00  1000.00  100.00
sdb        0.00   74.00  0.00  124.00    0.00  1584.00     12.77      1.09    67.94     6.94   86.00

Characteristic charts generated with rrdtool can be found here:

iostat plot 1, 24 min interval: http://imageshack.us/photo/my-images/600/yqm3.png/
iostat plot 2, 120 min interval: http://imageshack.us/photo/my-images/407/griw.png/

As we have a rather large cache of 5.5 GBytes, we thought it might be a good idea to test if the I/O wait spikes would perhaps be caused by cache miss events.
Therefore, we did a sync and then ran the following to flush the cache and buffers:

echo 3 > /proc/sys/vm/drop_caches

Directly afterwards the I/O wait and service times virtually went through the roof, and everything on the machine felt like slow motion. During the next few hours the latency recovered and everything was as before: small to medium lags at short, unpredictable intervals.

Now my question is: does anybody have any idea what might cause this annoying behaviour? Is it the first indication of the disk array or the RAID controller dying, or something that can be easily mended by rebooting? (At the moment we're very reluctant to do this, however, because we're afraid that the disks might not come back up again.)

Any help is greatly appreciated. Thanks in advance, Chris.

Edited to add: we do see one or two processes go to 'D' state in top, one of which seems to be kjournald rather frequently. If I'm not mistaken, however, this does not indicate the processes causing the latency, but rather those affected by it - correct me if I'm wrong. Does the information about uninterruptibly sleeping processes help us in any way to address the problem?
@Andy Shinn requested smartctl data, here it is: smartctl -a -d megaraid,2 /dev/sdb yields: smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net Device: SEAGATE ST3500620SS Version: MS05 Serial number: Device type: disk Transport protocol: SAS Local Time is: Mon Oct 14 20:37:13 2013 CEST Device supports SMART and is Enabled Temperature Warning Disabled or Not Supported SMART Health Status: OK Current Drive Temperature: 20 C Drive Trip Temperature: 68 C Elements in grown defect list: 0 Vendor (Seagate) cache information Blocks sent to initiator = 1236631092 Blocks received from initiator = 1097862364 Blocks read from cache and sent to initiator = 1383620256 Number of read and write commands whose size <= segment size = 531295338 Number of read and write commands whose size > segment size = 51986460 Vendor (Seagate/Hitachi) factory information number of hours powered up = 36556.93 number of minutes until next internal SMART test = 32 Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 509271032 47 0 509271079 509271079 20981.423 0 write: 0 0 0 0 0 5022.039 0 verify: 1870931090 196 0 1870931286 1870931286 100558.708 0 Non-medium error count: 0 SMART Self-test log Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] Description number (hours) # 1 Background short Completed 16 36538 - [- - -] # 2 Background short Completed 16 36514 - [- - -] # 3 Background short Completed 16 36490 - [- - -] # 4 Background short Completed 16 36466 - [- - -] # 5 Background short Completed 16 36442 - [- - -] # 6 Background long Completed 16 36420 - [- - -] # 7 Background short Completed 16 36394 - [- - -] # 8 Background short Completed 16 36370 - [- - -] # 9 Background long Completed 16 36364 - [- - -] #10 Background short Completed 16 36361 - [- - 
-] #11 Background long Completed 16 2 - [- - -] #12 Background short Completed 16 0 - [- - -] Long (extended) Self Test duration: 6798 seconds [113.3 minutes] smartctl -a -d megaraid,3 /dev/sdb yields: smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net Device: SEAGATE ST3500620SS Version: MS05 Serial number: Device type: disk Transport protocol: SAS Local Time is: Mon Oct 14 20:37:26 2013 CEST Device supports SMART and is Enabled Temperature Warning Disabled or Not Supported SMART Health Status: OK Current Drive Temperature: 19 C Drive Trip Temperature: 68 C Elements in grown defect list: 0 Vendor (Seagate) cache information Blocks sent to initiator = 288745640 Blocks received from initiator = 1097848399 Blocks read from cache and sent to initiator = 1304149705 Number of read and write commands whose size <= segment size = 527414694 Number of read and write commands whose size > segment size = 51986460 Vendor (Seagate/Hitachi) factory information number of hours powered up = 36596.83 number of minutes until next internal SMART test = 28 Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 610862490 44 0 610862534 610862534 20470.133 0 write: 0 0 0 0 0 5022.480 0 verify: 2861227413 203 0 2861227616 2861227616 100872.443 0 Non-medium error count: 1 SMART Self-test log Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] Description number (hours) # 1 Background short Completed 16 36580 - [- - -] # 2 Background short Completed 16 36556 - [- - -] # 3 Background short Completed 16 36532 - [- - -] # 4 Background short Completed 16 36508 - [- - -] # 5 Background short Completed 16 36484 - [- - -] # 6 Background long Completed 16 36462 - [- - -] # 7 Background short Completed 16 36436 - [- - -] # 8 Background short Completed 16 
36412 - [- - -] # 9 Background long Completed 16 36404 - [- - -] #10 Background short Completed 16 36401 - [- - -] #11 Background long Completed 16 2 - [- - -] #12 Background short Completed 16 0 - [- - -] Long (extended) Self Test duration: 6798 seconds [113.3 minutes]
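To make the pattern in the iostat -dx samples easier to see, here is a small Python sketch that parses those nine rows and picks out the slow and the stalled intervals. The 500 ms threshold, the function names, and the definition of "stalled" (nothing completed while requests were still queued) are assumptions made for illustration; the numbers are copied from the Oct 11 sample in the question.

```python
# Sketch: flag slow and stalled intervals in `iostat -dx sdb 1` output.
# Threshold and helper names are illustrative assumptions.

COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
           "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]

SAMPLE = """\
sdb 0.00 15.00 0.00 70.00 0.00 656.00 9.37 4.50 1.83 4.80 33.60
sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 12.00 836.00 500.00 100.00
sdb 0.00 0.00 0.00 3.00 0.00 32.00 10.67 9.96 1990.67 333.33 100.00
sdb 0.00 0.00 0.00 4.00 0.00 40.00 10.00 6.96 3075.00 250.00 100.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 0.00 0.00 100.00
sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 2.62 4648.00 500.00 100.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 100.00
sdb 0.00 0.00 0.00 1.00 0.00 16.00 16.00 1.69 7024.00 1000.00 100.00
sdb 0.00 74.00 0.00 124.00 0.00 1584.00 12.77 1.09 67.94 6.94 86.00
"""


def parse_rows(text):
    """Turn iostat data lines into dicts; header or malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1:
            continue
        try:
            values = [float(f) for f in fields[1:]]
        except ValueError:
            continue  # e.g. the "Device: rrqm/s ..." header line
        row = {"device": fields[0]}
        row.update(zip(COLUMNS, values))
        rows.append(row)
    return rows


def slow_rows(rows, await_ms=500.0):
    """Intervals whose average wait time reached the threshold."""
    return [r for r in rows if r["await"] >= await_ms]


def stalled_rows(rows):
    """Intervals where no request completed although the queue was not empty."""
    return [r for r in rows if r["r/s"] + r["w/s"] == 0 and r["avgqu-sz"] > 0]


rows = parse_rows(SAMPLE)
slow = slow_rows(rows)
print(len(slow), "slow intervals,", len(stalled_rows(rows)), "stalled intervals")
print("peak write rate while slow:", max(r["wsec/s"] for r in slow), "sectors/s")
```

On this sample the slow intervals all carry very little write traffic, which suggests the device is not being saturated by volume. That is an observation about these nine rows only, not a diagnosis.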


Unable to access intel fake RAID 1 array in Fedora 14 after reboot

- by Sim

Hello everyone. First, I am relatively new to Linux (but not to *nix). I have 4 disks assembled in the following Intel AHCI BIOS fake RAID arrays:

2x320GB RAID1 - used for operating systems - md126
2x1TB RAID1 - used for data - md125

I used the 320GB array to install my operating system; I did not even select the second array during the installation of Fedora 14. After successful partitioning and installation of Fedora, I tried to make the second array available. I was able to make it visible in Linux with mdadm --assemble --scan; after that I created one maximum-size partition with one maximum-size ext4 filesystem in it, mounted it, and used it.

After a restart there were a few I/O errors during boot regarding md125, the filesystem on it could not be mounted, and I was dropped into the repair shell. I commented out the filesystem in fstab and the system booted. To my surprise, the array was marked as "auto read only":

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdc[1] sdd[0]
      976759808 blocks super external:/md127/0 [2/2] [UU]

md127 : inactive sdc[1](S) sdd[0](S)
      4514 blocks super external:imsm

md126 : active raid1 sda[1] sdb[0]
      312566784 blocks super external:/md1/0 [2/2] [UU]

md1 : inactive sdb[1](S) sda[0](S)
      4514 blocks super external:imsm

unused devices: <none>
[root@localhost ~]#

and the partition in it was not available as a device special file in /dev:

[root@localhost ~]# ls -l /dev/md125*
brw-rw---- 1 root disk 9, 125 Jan 6 15:50 /dev/md125
[root@localhost ~]#

But the partition is there according to fdisk:

[root@localhost ~]# fdisk -l /dev/md125

Disk /dev/md125: 1000.2 GB, 1000202043392 bytes
19 heads, 10 sectors/track, 10281682 cylinders, total 1953519616 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x1b238ea9

       Device Boot  Start         End     Blocks  Id  System
/dev/md125p1         2048  1953519615  976758784  83  Linux
[root@localhost ~]#

I tried to "activate" the array in different ways (I'm not experienced with mdadm and the man page is gigantic, so I was only browsing it looking for my answer), but without success: the array stayed in "auto read only" and the device special file for the partition did not appear in /dev. It was only after I recreated the partition via fdisk that it reappeared in /dev... until the next reboot.

So, my question is: how do I make the array automatically available after reboot?

Here is some additional information. First, I am able to see the UUID of the array in blkid:

[root@localhost ~]# blkid
/dev/sdc: UUID="b9a1149f-ae11-4fc8-a600-0d77354dc42a" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdd: UUID="b9a1149f-ae11-4fc8-a600-0d77354dc42a" SEC_TYPE="ext2" TYPE="ext3"
/dev/md126p1: UUID="60C8D9A7C8D97C2A" TYPE="ntfs"
/dev/md126p2: UUID="3d1b38a3-b469-4b7c-b016-8abfb26a5d7d" TYPE="ext4"
/dev/md126p3: UUID="1Msqqr-AAF8-k0wi-VYnq-uWJU-y0OD-uIFBHL" TYPE="LVM2_member"
/dev/mapper/vg00-rootlv: LABEL="_Fedora-14-x86_6" UUID="34cc1cf5-6845-4489-8303-7a90c7663f0a" TYPE="ext4"
/dev/mapper/vg00-swaplv: UUID="4644d857-e13b-456c-ac03-6f26299c1046" TYPE="swap"
/dev/mapper/vg00-homelv: UUID="82bd58b2-edab-4b4b-aec4-b79595ecd0e3" TYPE="ext4"
/dev/mapper/vg00-varlv: UUID="1b001444-5fdd-41b6-a59a-9712ec6def33" TYPE="ext4"
/dev/mapper/vg00-tmplv: UUID="bf7d2459-2b35-4a1c-9b81-d4c4f24a9842" TYPE="ext4"
/dev/md125: UUID="b9a1149f-ae11-4fc8-a600-0d77354dc42a" SEC_TYPE="ext2" TYPE="ext3"
/dev/sda: TYPE="isw_raid_member"
/dev/md125p1: UUID="420adfdd-6c4e-4552-93f0-2608938a4059" TYPE="ext4"
[root@localhost ~]#

Here is what /etc/mdadm.conf looks like:

[root@localhost ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md1 UUID=89f60dee:e46a251f:7475814b:d4cc19a9
ARRAY /dev/md126 UUID=a8775c90:cee66376:5310fc13:63bcba5b
ARRAY /dev/md125 UUID=b9a1149f:ae114fc8:a6000d77:354dc42a
[root@localhost ~]#

Here is what /proc/mdstat looks like after I recreate the partition in the array so that it becomes available:

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active raid1 sdc[1] sdd[0]
      976759808 blocks super external:/md127/0 [2/2] [UU]

md127 : inactive sdc[1](S) sdd[0](S)
      4514 blocks super external:imsm

md126 : active raid1 sda[1] sdb[0]
      312566784 blocks super external:/md1/0 [2/2] [UU]

md1 : inactive sdb[1](S) sda[0](S)
      4514 blocks super external:imsm

unused devices: <none>
[root@localhost ~]#

Detailed output regarding the array in question:

[root@localhost ~]# mdadm --detail /dev/md125
/dev/md125:
      Container : /dev/md127, member 0
     Raid Level : raid1
     Array Size : 976759808 (931.51 GiB 1000.20 GB)
  Used Dev Size : 976759940 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 2

    Update Time : Fri Jan 7 00:38:00 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 30ebc3c2:b6a64751:4758d05c:fa8ff782

    Number   Major   Minor   RaidDevice State
       1       8       32        0      active sync   /dev/sdc
       0       8       48        1      active sync   /dev/sdd
[root@localhost ~]#

and /etc/fstab, with /data (the filesystem that is on this array) commented out:

#
# /etc/fstab
# Created by anaconda on Thu Jan 6 03:32:40 2011
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/vg00-rootlv / ext4 defaults 1 1
UUID=3d1b38a3-b469-4b7c-b016-8abfb26a5d7d /boot ext4 defaults 1 2
#UUID=420adfdd-6c4e-4552-93f0-2608938a4059 /data ext4 defaults 0 1
/dev/mapper/vg00-homelv /home ext4 defaults 1 2
/dev/mapper/vg00-tmplv /tmp ext4 defaults 1 2
/dev/mapper/vg00-varlv /var ext4 defaults 1 2
/dev/mapper/vg00-swaplv swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
[root@localhost ~]#

Thanks in advance to everyone who even read this whole issue :-)
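One detail in the output above can be checked mechanically: the UUID on the ARRAY /dev/md125 line in mdadm.conf is written in a different notation from the ones printed by blkid and mdadm --detail, which makes them hard to compare by eye. The Python sketch below normalises the three values copied from the question and reports which one the mdadm.conf entry matches. The helper names are hypothetical, and whether the mismatch it reports has anything to do with the array not coming up after reboot is an assumption that would need testing.

```python
# Sketch: compare the md125 UUID in mdadm.conf against the other reported UUIDs.
# All three values are copied from the question; helper names are illustrative.

CONF_UUID = "b9a1149f:ae114fc8:a6000d77:354dc42a"    # ARRAY /dev/md125 in mdadm.conf
DETAIL_UUID = "30ebc3c2:b6a64751:4758d05c:fa8ff782"  # mdadm --detail /dev/md125
BLKID_UUID = "b9a1149f-ae11-4fc8-a600-0d77354dc42a"  # blkid /dev/md125 (TYPE="ext3")


def normalize(uuid):
    """Strip separators and case so differently formatted UUIDs compare equal."""
    return uuid.replace("-", "").replace(":", "").lower()


def classify(conf_uuid, detail_uuid, blkid_uuid):
    """Say which reported UUID, if any, the mdadm.conf entry corresponds to."""
    conf = normalize(conf_uuid)
    if conf == normalize(detail_uuid):
        return "matches mdadm --detail"
    if conf == normalize(blkid_uuid):
        return "matches blkid signature"
    return "matches neither"


print(classify(CONF_UUID, DETAIL_UUID, BLKID_UUID))
```

For the values in the question, the mdadm.conf entry corresponds to the UUID that blkid reports for /dev/md125, not to the array UUID shown by mdadm --detail.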


nginx+php-fpm help optimize configs

- by Dmitro

I have 3 servers. First server (CPU - model name: 06/17, 2.66GHz, 4 cores, 8GB RAM) have nginx as load balancer with next config upstream lb_mydomain { server mydomain.ru:81 weight=2; server 66.0.0.18 weight=6; } server { listen 80; server_name ~(?!mydomain.ru)(.*); client_max_body_size 20m; location / { proxy_pass http://lb_mydomain; proxy_redirect off; proxy_set_header Connection close; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_pass_header Set-Cookie; proxy_pass_header P3P; proxy_pass_header Content-Type; proxy_pass_header Content-Disposition; proxy_pass_header Content-Length; } } And configs from nginx.conf: user www-data; worker_processes 5; # worker_priority -1; error_log /var/log/nginx/error.log; pid /var/run/nginx.pid; events { worker_connections 5024; # multi_accept on; } http { include /etc/nginx/mime.types; access_log /var/log/nginx/access.log; sendfile on; default_type application/octet-stream; #tcp_nopush on; keepalive_timeout 65; tcp_nodelay on; gzip on; gzip_disable "MSIE [1-6]\.(?!.*SV1)"; # PHP-FPM (backend) upstream php-fpm { server 127.0.0.1:9000; } include /etc/nginx/conf.d/*.conf; include /etc/nginx/sites-enabled/*; } And config php-fpm: listen = 127.0.0.1:9000 ;listen.backlog = -1 ;listen.allowed_clients = 127.0.0.1 ;listen.owner = www-data ;listen.group = www-data ;listen.mode = 0666 user = www-data group = www-data pm = dynamic pm.max_children = 80 ;pm.start_servers = 20 pm.min_spare_servers = 5 pm.max_spare_servers = 35 ;pm.max_requests = 500 pm.status_path = /status ping.path = /ping ;ping.response = pong request_terminate_timeout = 30s request_slowlog_timeout = 10s slowlog = /var/log/php-fpm.log.slow ;rlimit_files = 1024 ;rlimit_core = 0 ;chroot = chdir = /var/www ;catch_workers_output = yes ;env[HOSTNAME] = $HOSTNAME ;env[PATH] = /usr/local/bin:/usr/bin:/bin ;env[TMP] = /tmp ;env[TMPDIR] = /tmp ;env[TEMP] = /tmp ;php_admin_value[sendmail_path] = 
/usr/sbin/sendmail -t -i -f [email protected] ;php_flag[display_errors] = off ;php_admin_value[error_log] = /var/log/fpm-php.www.log ;php_admin_flag[log_errors] = on ;php_admin_value[memory_limit] = 32M In top I see 20 php-fpm processes which use from 1% - 15% CPU. So it's have high load averadge: top - 15:36:22 up 34 days, 20:54, 1 user, load average: 5.98, 7.75, 8.78 Tasks: 218 total, 1 running, 217 sleeping, 0 stopped, 0 zombie Cpu(s): 34.1%us, 3.2%sy, 0.0%ni, 37.0%id, 24.8%wa, 0.0%hi, 0.9%si, 0.0%st Mem: 8183228k total, 7538584k used, 644644k free, 351136k buffers Swap: 9936892k total, 14636k used, 9922256k free, 990540k cached Second server(CPU - model name: Intel(R) Xeon(R) CPU E5504 @ 2.00GHz, 8 cores, 8GB RAM). Nginx configs from nginx.conf: user www-data; worker_processes 5; # worker_priority -1; error_log /var/log/nginx/error.log; pid /var/run/nginx.pid; events { worker_connections 5024; # multi_accept on; } http { include /etc/nginx/mime.types; access_log /var/log/nginx/access.log; sendfile on; default_type application/octet-stream; #tcp_nopush on; keepalive_timeout 65; tcp_nodelay on; gzip on; gzip_disable "MSIE [1-6]\.(?!.*SV1)"; # PHP-FPM (backend) upstream php-fpm { server 127.0.0.1:9000; } include /etc/nginx/conf.d/*.conf; include /etc/nginx/sites-enabled/*; } And config of php-fpm: listen = 127.0.0.1:9000 ;listen.backlog = -1 ;listen.allowed_clients = 127.0.0.1 ;listen.owner = www-data ;listen.group = www-data ;listen.mode = 0666 user = www-data group = www-data pm = dynamic pm.max_children = 50 ;pm.start_servers = 20 pm.min_spare_servers = 5 pm.max_spare_servers = 35 ;pm.max_requests = 500 ;pm.status_path = /status ;ping.path = /ping ;ping.response = pong ;request_terminate_timeout = 0 ;request_slowlog_timeout = 0 ;slowlog = /var/log/php-fpm.log.slow ;rlimit_files = 1024 ;rlimit_core = 0 ;chroot = chdir = /var/www ;catch_workers_output = yes ;env[HOSTNAME] = $HOSTNAME ;env[PATH] = /usr/local/bin:/usr/bin:/bin ;env[TMP] = /tmp ;env[TMPDIR] = /tmp 
;env[TEMP] = /tmp ;php_admin_value[sendmail_path] = /usr/sbin/sendmail -t -i -f [email protected] ;php_flag[display_errors] = off ;php_admin_value[error_log] = /var/log/fpm-php.www.log ;php_admin_flag[log_errors] = on ;php_admin_value[memory_limit] = 32M In top I see 50 php-fpm processes which use from 10% - 25% CPU. So it's have high load averadge: top - 15:53:05 up 33 days, 1:15, 1 user, load average: 41.35, 40.28, 39.61 Tasks: 239 total, 40 running, 199 sleeping, 0 stopped, 0 zombie Cpu(s): 96.5%us, 3.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.4%si, 0.0%st Mem: 8185560k total, 7804224k used, 381336k free, 161648k buffers Swap: 19802108k total, 16k used, 19802092k free, 5068112k cached Third server is server with database postgresql. Also i try ab -n 50 -c 5 http://www.mydomain.ru/ And I get next info: Complete requests: 50 Failed requests: 48 (Connect: 0, Receive: 0, Length: 48, Exceptions: 0) Write errors: 0 Total transferred: 9271367 bytes HTML transferred: 9247767 bytes Requests per second: 1.02 [#/sec] (mean) Time per request: 4882.427 [ms] (mean) Time per request: 976.486 [ms] (mean, across all concurrent requests) Transfer rate: 185.44 [Kbytes/sec] received Please advise how can I make lower level of load average?
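As a sanity check on the figures quoted above, the ab summary and the two load averages can be related with simple arithmetic. The Python sketch below recomputes ab's derived values from the reported mean time per request and expresses each load average per core. The function names are hypothetical, and reading "load per core well above 1" as CPU oversubscription is a rule of thumb rather than something the question establishes.

```python
# Sketch: recompute ab's derived figures and normalise load average by core count.
# Input numbers are copied from the question; helper names are illustrative.


def ab_summary(requests, concurrency, time_per_request_ms, total_bytes):
    """Derive the other ab figures from the mean per-request latency."""
    across_all_ms = time_per_request_ms / concurrency  # "across all concurrent requests"
    total_seconds = requests * across_all_ms / 1000.0  # wall-clock duration of the run
    return {
        "across_all_ms": across_all_ms,
        "requests_per_second": 1000.0 / across_all_ms,
        "total_seconds": total_seconds,
        "kbytes_per_second": total_bytes / 1024.0 / total_seconds,
    }


def load_per_core(load_average, cores):
    """Average number of runnable or waiting tasks per core."""
    return load_average / cores


summary = ab_summary(requests=50, concurrency=5,
                     time_per_request_ms=4882.427, total_bytes=9271367)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")

print(f"first server:  {load_per_core(5.98, 4):.2f} per core")
print(f"second server: {load_per_core(41.35, 8):.2f} per core")
```

The recomputed request rate and transfer rate agree with ab's own output, so the numbers are internally consistent. On the second server, roughly five tasks compete for each core while top shows 0.0% idle and 0.0% wait, whereas the first server shows 24.8% wait, so the two machines may be limited by different things.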


solved: passenger(mod_rails) fails to start puppet master under nginx

- by Anadi Misra

On the server [root@bangvmpllDA02 logs]# ruby -v ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux] [root@bangvmpllDA02 logs]# puppet --version 3.0.1 and [root@bangvmpllDA02 logs]# service nginx configtest nginx: the configuration file /apps/nginx/nginx.conf syntax is ok nginx: configuration file /apps/nginx/nginx.conf test is successful [root@bangvmpllDA02 logs]# service nginx status nginx (pid 25923 25921 25920 25917 25908) is running... [root@bangvmpllDA02 logs]# however none of my agents are able to connect to the master, they all fail with errors like so [amisr1@blramisr195602 ~]$ puppet agent --test --verbose --server bangvmpllda02.XXX.com Info: Creating a new SSL certificate request for blramisr195602.XXX.com Info: Certificate Request fingerprint (SHA256): 26:EB:08:1F:82:32:E4:03:7A:64:8E:30:A3:99:93:26:E6:66:B9:B0:49:B6:08:F9:67:CA:1B:0C:00:B9:1D:41 Error: Could not request certificate: Error 405 on SERVER: <html> <head><title>405 Not Allowed</title></head> <body bgcolor="white"> <center><h1>405 Not Allowed</h1></center> <hr><center>nginx</center> </body> </html> Exiting; failed to retrieve certificate and waitforcert is disabled when I check logs on puppet master [root@bangvmpllDA02 logs]# tail puppet_access.log [05/Dec/2012:17:45:18 +0530] "GET /production/certificate/ca? HTTP/1.1" 404 162 "-" "Ruby" [05/Dec/2012:18:32:23 +0530] "PUT /production/certificate_request/sl63anadi.XXX.com HTTP/1.1" 405 166 "-" "-" [05/Dec/2012:18:33:33 +0530] "GET /production/certificate/sl63anadi.XXX.com? HTTP/1.1" 404 162 "-" "-" [05/Dec/2012:18:33:33 +0530] "GET /production/certificate_request/sl63anadi.XXX.com? 
HTTP/1.1" 404 162 "-" "-" [05/Dec/2012:18:33:33 +0530] "PUT /production/certificate_request/sl63anadi.XXX.com HTTP/1.1" 405 166 "-" "-" and the error logs show that nginx is not really able to process the request well 2012/12/05 18:33:33 [error] 25920#0: *23 open() "/etc/puppet/rack/public/production/certificate/sl63anadi.XXX.com" failed (2: No such file or directory), client: 10.209.47.26, server: , request: "GET /production/certificate/sl63anadi.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:33:33 [error] 25920#0: *24 open() "/etc/puppet/rack/public/production/certificate_request/sl63anadi.XXX.com" failed (2: No such file or directory), client: 10.209.47.26, server: , request: "GET /production/certificate_request/sl63anadi.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:47:56 [error] 25923#0: *27 open() "/etc/puppet/rack/public/production/certificate/ca" failed (2: No such file or directory), client: 10.209.47.31, server: , request: "GET /production/certificate/ca? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:47:56 [error] 25923#0: *28 open() "/etc/puppet/rack/public/production/certificate_request/blramisr195602.XXX.com" failed (2: No such file or directory), client: 10.209.47.31, server: , request: "GET /production/certificate_request/blramisr195602.XXX.com? 
HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" Passenger does not show any application groups either [root@bangvmpllDA02 nginx]# passenger-status ----------- General information ----------- max = 15 count = 0 active = 0 inactive = 0 Waiting on global queue: 0 ----------- Application groups ----------- [root@bangvmpllDA02 nginx]# here's my nginx configuration [root@bangvmpllDA02 logs]# cat ../nginx.conf user puppet; worker_processes 4; #error_log logs/error.log; #error_log logs/error.log notice; error_log logs/error.log info; #pid logs/nginx.pid; events { use epoll; worker_connections 1024; } http { include mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; access_log logs/access.log main; sendfile on; #tcp_nopush on; server_tokens off; #keepalive_timeout 0; keepalive_timeout 120; gzip on; gzip_http_version 1.1; gzip_disable "msie6"; gzip_vary on; gzip_min_length 1100; gzip_buffers 64 8k; gzip_comp_level 3; gzip_proxied any; gzip_types text/plain text/css application/x-javascript text/xml application/xml; server { listen 80; server_name bangvmpllda02.XXXX.com; charset utf-8; #access_log logs/http.access.log main; location / { root html; index index.html index.htm index.php; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # error_page 500 502 503 504 /50x.html; location = /50x.html { root html; } # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # location ~ \.php$ { root html; fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; fastcgi_param SCRIPT_NAME $fastcgi_script_name; include fastcgi_params; } # deny access to .htaccess files, if Apache's 
document root # concurs with nginx's one # location ~ /\.ht { access_log off; log_not_found off; deny all; } location ~* \.(jpg|jpeg|gif|png|css|js|ico|xml)$ { access_log off; log_not_found off; expires 2d; } } # Passenger needed for puppet passenger_root /usr/lib/ruby/gems/1.8/gems/passenger-3.0.18; passenger_ruby /usr/bin/ruby; passenger_max_pool_size 15; server { ssl on; listen 8140 default ssl; server_name bangvmpllda02.XXXX.com; passenger_enabled on; passenger_set_cgi_param HTTP_X_CLIENT_DN $ssl_client_s_dn; passenger_set_cgi_param HTTP_X_CLIENT_VERIFY $ssl_client_verify; passenger_min_instances 5; access_log logs/puppet_access.log; error_log logs/puppet_error.log; root /etc/puppet/rack/public; ssl_certificate /var/lib/puppet/ssl/certs/bangvmpllda02.XXX.com.pem; ssl_certificate_key /var/lib/puppet/ssl/private_keys/bangvmpllda02.XXX.com.pem; ssl_crl /var/lib/puppet/ssl/ca/ca_crl.pem; ssl_client_certificate /var/lib/puppet/ssl/certs/ca.pem; ssl_ciphers SSLv2:-LOW:-EXPORT:RC4+RSA; ssl_prefer_server_ciphers on; ssl_verify_client optional; ssl_verify_depth 1; ssl_session_cache shared:SSL:128m; ssl_session_timeout 5m; } } and the puppet.conf [main] # The Puppet log directory. # The default value is '$vardir/log'. logdir = /var/log/puppet # Where Puppet PID files are kept. # The default value is '$vardir/run'. rundir = /var/run/puppet dns_alt_names = devops.XXXX.com,devops confdir = /etc/puppet vardir = /var/lib/puppet storeconfigs = true storeconfigs_backend = puppetdb thin_storeconfigs = false async_storeconfigs = false ssl_client_header = SSL_CLIENT_S_D ssl_client_verify_header = SSL_CLIENT_VERIFY # Where SSL certificates are kept. # The default value is '$confdir/ssl'. ssldir = $vardir/ssl any ideas where am I going wrong? I checkthe directory permissions; /usr/share/puppet, /etc/puppet and /var/lib/puppet (and files inside them) are owned by puppet user. 
Solved: The simple solution to my complicated problem was that I had placed config.ru in the wrong place. It was in /etc/puppet/rack/public; I moved it to /etc/puppet/rack. Well!!! :-/
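The fix follows from how Passenger locates a Rack application: the nginx root directive points at the application's public directory, and Passenger looks for config.ru one level above it. A minimal Python sketch of that path rule, using the root value from the posted nginx.conf (the function name is hypothetical and this is not Passenger's own code):

```python
# Sketch: where config.ru is expected for a given nginx `root`.
from pathlib import PurePosixPath


def expected_config_ru(nginx_root):
    """The parent of the `root` (public) directory is treated as the app root."""
    return PurePosixPath(nginx_root).parent / "config.ru"


print(expected_config_ru("/etc/puppet/rack/public"))
```

With config.ru inside public/ instead, no application is detected, so nginx serves the directory as static files. That is consistent with the open() "No such file or directory" errors in the error log and the empty application group list from passenger-status.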


fd partitions gone from 2 disks, md happy with it and resyncs. How to recover?

- by d0nd

Hey gurus, need some help badly with this one. I run a server with a 6Tb md raid5 volume built over 7*1Tb disks. I've had to shut down the server lately and when it went back up, 2 out of the 7 disks used for the raid volume had lost its conf : dmesg : [ 10.184167] sda: sda1 sda2 sda3 // System disk [ 10.202072] sdb: sdb1 [ 10.210073] sdc: sdc1 [ 10.222073] sdd: sdd1 [ 10.229330] sde: sde1 [ 10.239449] sdf: sdf1 [ 11.099896] sdg: unknown partition table [ 11.255641] sdh: unknown partition table All 7 disks have same geometry and were configured alike : dmesg : Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x1e7481a5 Device Boot Start End Blocks Id System /dev/sdb1 1 121601 976760001 fd Linux raid autodetect All 7 disks (sdb1, sdc1, sdd1, sde1, sdf1, sdg1, sdh1) were used in a md raid5 xfs volume. When booting, md, which was (obviously) out of sync kicked in and automatically started rebuilding over the 7 disks, including the two "faulty" ones; xfs tried to do some shenanigans as well: dmesg : [ 19.566941] md: md0 stopped. 
[ 19.817038] md: bind<sdc1> [ 19.817339] md: bind<sdd1> [ 19.817465] md: bind<sde1> [ 19.817739] md: bind<sdf1> [ 19.817917] md: bind<sdh> [ 19.818079] md: bind<sdg> [ 19.818198] md: bind<sdb1> [ 19.818248] md: md0: raid array is not clean -- starting background reconstruction [ 19.825259] raid5: device sdb1 operational as raid disk 0 [ 19.825261] raid5: device sdg operational as raid disk 6 [ 19.825262] raid5: device sdh operational as raid disk 5 [ 19.825264] raid5: device sdf1 operational as raid disk 4 [ 19.825265] raid5: device sde1 operational as raid disk 3 [ 19.825267] raid5: device sdd1 operational as raid disk 2 [ 19.825268] raid5: device sdc1 operational as raid disk 1 [ 19.825665] raid5: allocated 7334kB for md0 [ 19.825667] raid5: raid level 5 set md0 active with 7 out of 7 devices, algorithm 2 [ 19.825669] RAID5 conf printout: [ 19.825670] --- rd:7 wd:7 [ 19.825671] disk 0, o:1, dev:sdb1 [ 19.825672] disk 1, o:1, dev:sdc1 [ 19.825673] disk 2, o:1, dev:sdd1 [ 19.825675] disk 3, o:1, dev:sde1 [ 19.825676] disk 4, o:1, dev:sdf1 [ 19.825677] disk 5, o:1, dev:sdh [ 19.825679] disk 6, o:1, dev:sdg [ 19.899787] PM: Starting manual resume from disk [ 28.663228] Filesystem "md0": Disabling barriers, not supported by the underlying device [ 28.663228] XFS mounting filesystem md0 [ 28.884433] md: resync of RAID array md0 [ 28.884433] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [ 28.884433] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync. [ 28.884433] md: using 128k window, over a total of 976759936 blocks. [ 29.025980] Starting XFS recovery on filesystem: md0 (logdev: internal) [ 32.680486] XFS: xlog_recover_process_data: bad clientid [ 32.680495] XFS: log mount/recovery failed: error 5 [ 32.682773] XFS: log mount failed I ran fdisk and flagged sdg1 and sdh1 as fd. I tried to reassemble the array but it didnt work: no matter what was in mdadm.conf, it still uses sdg and sdh instead of sdg1 and sdh1. 
I checked in /dev and I see no sdg1 or sdh1, which explains why it won't use them. I just don't know why those partitions are gone from /dev or how to re-add them... blkid : /dev/sda1: LABEL="boot" UUID="519790ae-32fe-4c15-a7f6-f1bea8139409" TYPE="ext2" /dev/sda2: TYPE="swap" /dev/sda3: LABEL="root" UUID="91390d23-ed31-4af0-917e-e599457f6155" TYPE="ext3" /dev/sdb1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sdc1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sdd1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sde1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sdf1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sdg: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" /dev/sdh: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" fdisk -l : Disk /dev/sda: 40.0 GB, 40020664320 bytes 255 heads, 63 sectors/track, 4865 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x8c878c87 Device Boot Start End Blocks Id System /dev/sda1 * 1 12 96358+ 83 Linux /dev/sda2 13 134 979965 82 Linux swap / Solaris /dev/sda3 135 4865 38001757+ 83 Linux Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x1e7481a5 Device Boot Start End Blocks Id System /dev/sdb1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xc9bdc1e9 Device Boot Start End Blocks Id System /dev/sdc1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xcc356c30 Device Boot Start End Blocks Id System /dev/sdd1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sde: 1000.2 GB, 1000204886016
bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xe87f7a3d Device Boot Start End Blocks Id System /dev/sde1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xb17a2d22 Device Boot Start End Blocks Id System /dev/sdf1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x8f3bce61 Device Boot Start End Blocks Id System /dev/sdg1 1 121601 976760001 fd Linux raid autodetect Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes 255 heads, 63 sectors/track, 121601 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xa98062ce Device Boot Start End Blocks Id System /dev/sdh1 1 121601 976760001 fd Linux raid autodetect
I really don't know what happened or how to recover from this mess. Needless to say, the 5 TB or so of data sitting on those disks is very valuable to me... Any ideas, anyone? Has anybody experienced a similar situation, or does anyone know how to recover from it? Can someone help me? I'm really desperate... :x
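One detail in the boot log above is easy to miss: md bound sdg and sdh as whole disks while the other five members are partitions. A minimal Python sketch that pulls this out of a saved dmesg excerpt follows. The function name is made up for illustration, and the "name ends in a digit means partition" test is an assumption that only holds for sdX-style device names.

```python
import re

def whole_disk_members(dmesg_text):
    """Return md members bound as whole disks (e.g. 'sdg') rather than
    partitions (e.g. 'sdg1').

    Assumption: sdX-style names, where a trailing digit means a partition.
    """
    members = re.findall(r"md: bind<(\w+)>", dmesg_text)
    return sorted(m for m in members if not m[-1].isdigit())

# Excerpt taken from the boot log quoted in the question.
sample = (
    "[ 19.817038] md: bind<sdc1> [ 19.817339] md: bind<sdd1> "
    "[ 19.817917] md: bind<sdh> [ 19.818079] md: bind<sdg> "
    "[ 19.818198] md: bind<sdb1>"
)
print(whole_disk_members(sample))
```

This only reports what the log already says; it does not touch the array or the disks.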

Connect ps/2->usb keyboard to linux?

- by Daniel

I have a lovely ancient ergonomic keyboard (no name SK - 6000) connected via a DIN-ps/2 adapter to a ps/2-usb adapter to my docking station. After Grub it stops working. It takes either suspending and waking up or replugging it while Linux is running to get it to work. No extra kernel modules get loaded for this. When it works and I restart without power off, it will work immediately. Even when it does not work, it is visible (lsusb device number varies but output is identical whether working or not): $ lsusb -v -s 001:006 Bus 001 Device 006: ID 0a81:0205 Chesen Electronics Corp. PS/2 Keyboard+Mouse Adapter Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 1.10 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 8 idVendor 0x0a81 Chesen Electronics Corp. idProduct 0x0205 PS/2 Keyboard+Mouse Adapter bcdDevice 0.10 iManufacturer 1 CHESEN iProduct 2 PS2 to USB Converter iSerial 0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 59 bNumInterfaces 2 bConfigurationValue 1 iConfiguration 2 PS2 to USB Converter bmAttributes 0xa0 (Bus Powered) Remote Wakeup MaxPower 100mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 3 Human Interface Device bInterfaceSubClass 1 Boot Interface Subclass bInterfaceProtocol 1 Keyboard iInterface 0 HID Device Descriptor: bLength 9 bDescriptorType 33 bcdHID 1.10 bCountryCode 0 Not supported bNumDescriptors 1 bDescriptorType 34 Report wDescriptorLength 64 Report Descriptors: ** UNAVAILABLE ** Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0008 1x 8 bytes bInterval 10 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 3 Human Interface Device bInterfaceSubClass 1 Boot Interface Subclass 
bInterfaceProtocol 2 Mouse iInterface 0 HID Device Descriptor: bLength 9 bDescriptorType 33 bcdHID 1.10 bCountryCode 0 Not supported bNumDescriptors 1 bDescriptorType 34 Report wDescriptorLength 148 Report Descriptors: ** UNAVAILABLE ** Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0008 1x 8 bytes bInterval 10 Device Status: 0x0000 (Bus Powered) $ ll -R /sys/bus/hid/drivers/ /sys/bus/hid/drivers/: total 0 drwxr-xr-x 2 root root 0 Jul 8 2012 generic-usb/ /sys/bus/hid/drivers/generic-usb: total 0 lrwxrwxrwx 1 root root 0 Jul 7 23:33 0003:046D:C03D.0003 -> ../../../../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.2/1-1.2.2:1.0/0003:046D:C03D.0003/ lrwxrwxrwx 1 root root 0 Jul 7 23:33 0003:0A81:0205.0001 -> ../../../../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/ lrwxrwxrwx 1 root root 0 Jul 7 23:33 0003:0A81:0205.0002 -> ../../../../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.1/0003:0A81:0205.0002/ --w------- 1 root root 4096 Jul 7 23:32 bind lrwxrwxrwx 1 root root 0 Jul 7 23:33 module -> ../../../../module/usbhid/ --w------- 1 root root 4096 Jul 7 23:32 new_id --w------- 1 root root 4096 Jul 8 2012 uevent --w------- 1 root root 4096 Jul 7 23:32 unbind When replugging, dmesg shows this (which except for the 1st line and different input numbers already came at boot time): [ 1583.295385] usb 1-1.2.1: new low-speed USB device number 6 using ehci_hcd [ 1583.446514] input: CHESEN PS2 to USB Converter as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/input/input17 [ 1583.446817] generic-usb 0003:0A81:0205.0001: input,hidraw0: USB HID v1.10 Keyboard [CHESEN PS2 to USB Converter] on usb-0000:00:1a.0-1.2.1/input0 [ 1583.454764] input: CHESEN PS2 to USB Converter as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.1/input/input18 [ 1583.455534] 
generic-usb 0003:0A81:0205.0002: input,hidraw1: USB HID v1.10 Mouse [CHESEN PS2 to USB Converter] on usb-0000:00:1a.0-1.2.1/input1 [ 1583.455578] usbcore: registered new interface driver usbhid [ 1583.455584] usbhid: USB HID core driver So I tried $ sudo udevadm test /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0 run_command: calling: test adm_test: version 175 This program is for debugging only, it does not run any program, specified by a RUN key. It may show incorrect results, because some values may be different, or not available at a simulation run. parse_file: reading '/lib/udev/rules.d/40-crda.rules' as rules file parse_file: reading '/lib/udev/rules.d/40-fuse.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/40-usb-media-players.rules' as rules file parse_file: reading '/lib/udev/rules.d/40-usb_modeswitch.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/42-qemu-usb.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/69-cd-sensors.rules' as rules file add_rule: IMPORT found builtin 'usb_id', replacing /lib/udev/rules.d/69-cd-sensors.rules:76 ... parse_file: reading '/lib/udev/rules.d/77-mm-usb-device-blacklist.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/85-usbmuxd.rules' as rules file ... 
parse_file: reading '/lib/udev/rules.d/95-upower-hid.rules' as rules file parse_file: reading '/lib/udev/rules.d/95-upower-wup.rules' as rules file parse_file: reading '/lib/udev/rules.d/97-bluetooth-hid2hci.rules' as rules file udev_rules_new: rules use 271500 bytes tokens (22625 * 12 bytes), 44331 bytes buffer udev_rules_new: temporary index used 76320 bytes (3816 * 20 bytes) udev_device_new_from_syspath: device 0x7f78a5e4d2d0 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0' udev_device_new_from_syspath: device 0x7f78a5e5f820 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0' udev_device_read_db: device 0x7f78a5e5f820 filled with db file data udev_device_new_from_syspath: device 0x7f78a5e60270 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001' udev_device_new_from_syspath: device 0x7f78a5e609c0 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0' udev_device_new_from_syspath: device 0x7f78a5e61160 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1' udev_device_new_from_syspath: device 0x7f78a5e61960 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2' udev_device_new_from_syspath: device 0x7f78a5e62150 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1' udev_device_new_from_syspath: device 0x7f78a5e62940 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1' udev_device_new_from_syspath: device 0x7f78a5e630f0 has devpath '/devices/pci0000:00/0000:00:1a.0' udev_device_new_from_syspath: device 0x7f78a5e638a0 has devpath '/devices/pci0000:00' udev_event_execute_rules: no node name set, will use kernel supplied name 'hidraw0' udev_node_add: creating device node '/dev/hidraw0', devnum=251:0, mode=0600, uid=0, gid=0 udev_node_mknod: preserve file '/dev/hidraw0', because it has correct dev_t udev_node_mknod: preserve permissions 
/dev/hidraw0, 020600, uid=0, gid=0 node_symlink: preserve already existing symlink '/dev/char/251:0' to '../hidraw0' udev_device_update_db: created empty file '/run/udev/data/c251:0' for '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0' ACTION=add DEVNAME=/dev/hidraw0 DEVPATH=/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0 MAJOR=251 MINOR=0 SUBSYSTEM=hidraw UDEV_LOG=6 USEC_INITIALIZED=969079051 The later lines sound like it's already there. And none of these awakes the keyboard: $ sudo udevadm trigger --verbose --sysname-match=usb* /sys/devices/pci0000:00/0000:00:1a.0/usb1 /sys/devices/pci0000:00/0000:00:1a.0/usbmon/usbmon1 /sys/devices/pci0000:00/0000:00:1d.0/usb2 /sys/devices/pci0000:00/0000:00:1d.0/usbmon/usbmon2 /sys/devices/virtual/usbmon/usbmon0 $ sudo udevadm trigger --verbose --sysname-match=hidraw0 /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0 $ sudo udevadm trigger I also tried this to no avail: # echo -n 0003:0A81:0205.0001 > /sys/bus/hid/drivers/generic-usb/bind ksh: echo: write to 1 failed [No such device] # echo -n 0003:0A81:0205.0001 > /sys/bus/hid/drivers/generic-usb/unbind # echo -n 0003:0A81:0205.0001 > /sys/bus/hid/drivers/generic-usb/bind # echo usb1 >/sys/bus/usb/drivers/usb/unbind # echo usb1 >/sys/bus/usb/drivers/usb/bind What else should I try to get the same result as replugging or suspending, by just issuing a command?
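One more thing that could be tried, sketched here in Python and untested on this adapter: the kernel exposes an `authorized` attribute for each USB device under /sys/bus/usb/devices, and writing 0 and then 1 to it deauthorizes and re-authorizes the device, which is closer to a replug than rebinding the HID driver. The port name 1-1.2.1 is taken from the dmesg output above; the function names are hypothetical, and whether this actually wakes the keyboard is an assumption.

```python
def replug_steps(port="1-1.2.1", sysfs="/sys/bus/usb/devices"):
    """Return the (path, value) writes that deauthorize and then
    re-authorize one USB device. Nothing is written here."""
    attr = f"{sysfs}/{port}/authorized"
    return [(attr, "0"), (attr, "1")]

def apply_steps(steps):
    """Perform the writes. Needs root; call this deliberately."""
    for path, value in steps:
        with open(path, "w") as handle:
            handle.write(value)

# Dry run: print the equivalent shell commands instead of writing.
for path, value in replug_steps():
    print(f"echo {value} > {path}")
```

Keeping the dry run separate from `apply_steps` makes it possible to check the path against lsusb and dmesg before anything is written to sysfs.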

FFMPEG Segfault Solutions

- by Brentley_11

I'm trying to convert a bunch of movies into h.264 mp4's using FFMPEG. These movies are sourced from various portable camcorders such as the Flip Mino HD and the Kodak ZI8. One issue I'm having with video from the ZI8 is it seems to be causing FFMPEG to segfault. Here is my command: ffmpeg -i 'XmasSailor720p60fps.MOV' -threads 2 -acodec libfaac -ab 96kb -vcodec libx264 -vpre hq -b 500kb -s 484x272 XmasSailor.mp4 Here is the output: FFmpeg version SVN-r20668, Copyright (c) 2000-2009 Fabrice Bellard, et al. built on Dec 2 2009 18:37:34 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4) configuration: --enable-libfaac --enable-libfaad --enable-libmp3lame --enable-libx264 --enable-gpl --enable-nonfree --enable-postproc --enable-pthreads --enable-shared libavutil 50. 5. 1 / 50. 5. 1 libavcodec 52.42. 0 / 52.42. 0 libavformat 52.39. 2 / 52.39. 2 libavdevice 52. 2. 0 / 52. 2. 0 libswscale 0. 7. 2 / 0. 7. 2 libpostproc 51. 2. 0 / 51. 2. 0 Seems stream 0 codec frame rate differs from container frame rate: 59.94 (60000/1001) -> 29.97 (30000/1001) Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'XmasSailor720p60fps.MOV': Duration: 00:00:05.37, start: 0.000000, bitrate: 12021 kb/s Stream #0.0(eng): Video: h264, yuv420p, 1280x720 [PAR 1:1 DAR 16:9], 11994 kb/s, 29.97 tbr, 90k tbn, 59.94 tbc Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 128 kb/s Metadata major_brand : qt minor_version : 0 compatible_brands: qt comment : KODAK Zi8 Pocket Video Camera comment-eng : KODAK Zi8 Pocket Video Camera [libx264 @ 0x99e1020]using SAR=1/1 [libx264 @ 0x99e1020]using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.1 Cache64 [libx264 @ 0x99e1020]profile High, level 2.1 Output #0, mp4, to 'XmasSailor.mp4': Stream #0.0(eng): Video: libx264, yuv420p, 484x272 [PAR 1:1 DAR 121:68], q=10-51, 500 kb/s, 30k tbn, 29.97 tbc Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 96 kb/s Metadata comment : Encoded with the Statusfirm Video Transcoder Stream mapping: Stream #0.0 -> #0.0 Stream #0.1 -> #0.1 Press 
[q] to stop encoding [h264 @ 0x99de950]B picture before any references, skipping [h264 @ 0x99de950]decode_slice_header error [h264 @ 0x99de950]no frame! Error while decoding stream #0.0 [h264 @ 0x99de950]B picture before any references, skipping [h264 @ 0x99de950]decode_slice_header error [h264 @ 0x99de950]no frame! Error while decoding stream #0.0 frame= 20 fps= 0 q=13797729.0 size= 0kB time=0.66 bitrate= 0.6kbits/s frame= 39 fps= 37 q=13797729.0 size= 0kB time=1.30 bitrate= 0.3kbits/s frame= 48 fps= 30 q=33.0 size= 11kB time=0.10 bitrate= 903.0kbits/s frame= 58 fps= 27 q=31.0 size= 22kB time=0.43 bitrate= 421.0kbits/s frame= 67 fps= 25 q=29.0 size= 41kB time=0.73 bitrate= 462.6kbits/s frame= 75 fps= 23 q=29.0 size= 59kB time=1.00 bitrate= 486.7kbits/s frame= 83 fps= 22 q=29.0 size= 81kB time=1.27 bitrate= 521.9kbits/s frame= 90 fps= 21 q=29.0 size= 97kB time=1.50 bitrate= 530.1kbits/s frame= 98 fps= 20 q=29.0 size= 114kB time=1.77 bitrate= 526.9kbits/s frame= 106 fps= 20 q=29.0 size= 134kB time=2.04 bitrate= 537.7kbits/s frame= 114 fps= 19 q=29.0 size= 150kB time=2.30 bitrate= 533.7kbits/s frame= 122 fps= 19 q=29.0 size= 172kB time=2.57 bitrate= 547.8kbits/s frame= 130 fps= 19 q=29.0 size= 193kB time=2.84 bitrate= 557.5kbits/s frame= 136 fps= 18 q=29.0 size= 211kB time=3.04 bitrate= 570.0kbits/s frame= 144 fps= 18 q=29.0 size= 242kB time=3.30 bitrate= 599.5kbits/s frame= 152 fps= 17 q=30.0 size= 261kB time=3.57 bitrate= 598.6kbits/s frame= 157 fps= 15 q=-1.0 Lsize= 368kB time=5.21 bitrate= 579.3kbits/s video:302kB audio:61kB global headers:0kB muxing overhead 1.416371% [libx264 @ 0x99e1020]frame I:1 Avg QP:27.22 size: 8720 [libx264 @ 0x99e1020]frame P:48 Avg QP:25.15 size: 3759 [libx264 @ 0x99e1020]frame B:108 Avg QP:30.10 size: 1105 [libx264 @ 0x99e1020]consecutive B-frames: 0.6% 11.5% 28.8% 59.0% [libx264 @ 0x99e1020]mb I I16..4: 28.5% 47.6% 23.9% [libx264 @ 0x99e1020]mb P I16..4: 0.8% 1.3% 0.5% P16..4: 50.6% 17.7% 13.1% 0.0% 0.0% skip:15.9% [libx264 @ 
0x99e1020]mb B I16..4: 0.2% 0.3% 0.1% B16..8: 44.0% 1.2% 2.6% direct: 5.1% skip:46.5% L0:45.5% L1:51.0% BI: 3.5% [libx264 @ 0x99e1020]final ratefactor: 23.51 [libx264 @ 0x99e1020]8x8 transform intra:49.9% inter:67.9% [libx264 @ 0x99e1020]direct mvs spatial:98.1% temporal:1.9% [libx264 @ 0x99e1020]coded y,uvDC,uvAC intra: 54.7% 76.1% 41.4% inter: 17.1% 24.4% 7.8% [libx264 @ 0x99e1020]i16 v,h,dc,p: 18% 52% 5% 25% [libx264 @ 0x99e1020]i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 12% 22% 9% 7% 10% 10% 9% 8% 13% [libx264 @ 0x99e1020]i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 13% 18% 8% 8% 10% 13% 10% 9% 12% [libx264 @ 0x99e1020]Weighted P-Frames: Y:10.4% [libx264 @ 0x99e1020]ref P L0: 60.2% 15.3% 11.0% 7.6% 5.2% 0.7% [libx264 @ 0x99e1020]ref B L0: 72.6% 15.6% 11.8% [libx264 @ 0x99e1020]kb/s:471.17 Segmentation fault I'm wondering if anyone else has ran into similar issues. I wasn't able to find anything helpful via Google. Another question I have is if anyone knows of a company that offers paid support for FFMPEG. Thank you for your time.
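When batch-converting many files, it may help to tell a crash apart from an ordinary error exit. A minimal Python sketch follows, assuming ffmpeg is on the PATH; the helper names are made up for illustration. On POSIX systems, `subprocess` reports a process killed by a signal as a negative return code, so a segfault shows up as -11 (SIGSEGV).

```python
import signal
import subprocess

def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # Negative values mean "terminated by signal -returncode" (POSIX).
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"

def run_ffmpeg(args):
    """Run ffmpeg with a list of arguments and return
    (exit description, tail of stderr). Hypothetical wrapper."""
    result = subprocess.run(["ffmpeg", *args], capture_output=True, text=True)
    return describe_exit(result.returncode), result.stderr[-2000:]

print(describe_exit(-int(signal.SIGSEGV)))
```

In the log above the encoder summary is printed before the segfault, so the output file may still be worth checking even when the exit is reported as a crash.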

Why are my hard drives failing?

- by WishCow

I have a small Ubuntu server running at home, with 2 HDDs. There are two software raids (raid1) on the disks, managed by mdadm, which I believe is irrelevant, but mentioning it anyway. Both of the HDDs are Western Digital, and have been used for around 2 years, when one of them started making clicking noises, and died. I figured that maybe it's natural after 2 years, so I bought a new one, and resynced the raid arrays. After about a month, the other drive also died. I didn't get suspicious, since both drives have been bought at the same time, it's not that surprising to see both of them near each other, so I bought another one. So far, 2 old drives failed, and 2 brand new in the system. After one month, one of the new drives died. This is when it started getting suspicious. Since the PC was put together from some really old parts (think AthlonXP), I figured that maybe the motherboard's SATA controller is the culprit. Of course you can't switch parts easily in an old PC like this, so I bought a whole system, new MB, new CPU, new RAM. Took the just failed drive back, since it was under warranty, and got it replaced. So it is up to 2 failed drives from the old ones, and 1 failed drive from the new ones. No problems, for 1 month. After that errors were creeping up again in /var/log/messages, and mdadm was reporting raid array failures. I started tearing my hair out. Everything is new in the system, it's up to the third brand new HDD, it's simply not possible that all of the new drives that I bought were faulty. Let's see what is still common... the cables. Okay, long shot, let's replace the SATA cables. Take HDD back, smile to the guy at the counter and say that I'm really unlucky. He replaces the HDD. I come home, one month passes and one of HDDs fails, again. I'm not joking. Two of the brand new HDDs have failed. Maybe it's a bug in the OS. Let's see what the manufacturer's testing tool says. 
Download testing tool, burn it to a CD, reboot, leave HDD testing overnight. Test says that the drive is faulty, and I should back up everything, if I still can. I don't know what's happening, but it does not look like a software problem; something is definitely thrashing the HDDs. I should mention now that the whole system is in a shoebox. Since there is a load of "build your own IKEA case" stuff out there, I thought there shouldn't be any problems throwing the thing in a box and stuffing it away somewhere. The box is well ventilated, but I thought that just maybe the drives were overheating. There is no other possible answer to this. So I took the HDD back, got it replaced (for the 3rd time), and bought HDD coolers. And just now, I have heard the sound of doom: click click whizzzzzzzzz. SSH into the box:
You have new mail!
mail r 1 DegradedArrayEvent on /dev/md0 ...
dmesg output:
[47128.000051] ata3: lost interrupt (Status 0x50)
[47128.000097] end_request: I/O error, dev sda, sector 58588863
[47128.000134] md: super_written gets error=-5, uptodate=0
[48043.976054] ata3: lost interrupt (Status 0x50)
[48043.976086] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[48043.976132] ata3.00: cmd c8/00:18:bf:40:52/00:00:00:00:00/e1 tag 0 dma 12288 in
[48043.976135] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[48043.976208] ata3.00: status: { DRDY }
[48043.976241] ata3: soft resetting link
[48044.148446] ata3.00: configured for UDMA/133
[48044.148457] ata3.00: device reported invalid CHS sector 0
[48044.148477] ata3: EH complete
Recap:
No possibility of overheating.
6 drives have failed, 4 of those have been brand new. I'm not sure now whether the original two were faulty or suffered the same thing as the new ones.
There is nothing common in the system, apart from the OS, which is Ubuntu Karmic now (started with Jaunty). New MB, new CPU, new RAM, new SATA cables.
No, the little holes on the HDD are not covered.
I'm crying. Really.
I don't have the face to return to the store now; it's not possible for 4 drives to fail in under 4 months. A few ideas that I have been thinking about:
Is it possible that I fuck something up when I partition and resync the drives? Can it be so bad that it physically wrecks the drive (since the vendor-supplied tool says that the drive is damaged)? I do the partitioning with fdisk, and use the same block size for the raid1 partitions (I check the exact block sizes with fdisk -lu).
Is it possible that the Linux kernel, or mdadm, or something else is not compatible with this exact brand of HDDs, and thrashes them?
Is it possible that it may be the shoebox? Try placing it somewhere else? It's under a shelf now, so humidity is not a problem either.
Is it possible that a normal PC case will solve my problem (I'm going to shoot myself then)? I will get a picture tomorrow.
Am I just simply cursed?
Any help or speculation is greatly appreciated.
Edit: The power strip is guarded against overvoltage.
Edit2: I have moved house during these 4 months, so the possibility of the cause being "dirty" electricity in both places is very low.
Edit3: I have checked the voltages in the BIOS (couldn't borrow a multimeter), and they all seem correct; the biggest discrepancy is in the 12V rail, which is supplying 11.3 V. Should I be worried about that?
Edit4: I put my desktop PC's PSU into the server. The BIOS reported much more accurate voltage readings, and it has also successfully rebuilt the raid1 array, which took some 3-4 hours, so I feel a little positive now. Will get a new PSU tomorrow to test with that. Also, attaching the picture of the box: (disregard the 3rd drive)
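On the 11.3 V reading from Edit3, the arithmetic can be checked quickly. The Python sketch below assumes the commonly quoted ATX tolerance of ±5% on the +12 V rail; that figure is an assumption brought in here, not something stated in the post, and BIOS sensor readings can themselves be off.

```python
def rail_within_tolerance(nominal, measured, tolerance=0.05):
    """Return (ok, low, high) for a supply rail.

    tolerance=0.05 is the assumed +/-5% ATX figure for the +12 V rail.
    """
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    return low <= measured <= high, low, high

ok, low, high = rail_within_tolerance(12.0, 11.3)
print(f"12 V rail: allowed {low:.2f} to {high:.2f} V, measured 11.30 V, ok={ok}")
```

Under that assumption the allowed window is about 11.40 V to 12.60 V, so 11.3 V falls just below it, which would fit the improvement seen after swapping the PSU in Edit4.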

Bind: dns not 'spreaded'

- by realtebo

I run elfoip.net with BIND:
$ whois elfoip.net | grep 'Name Server'
Name Server: NS.ELFOIP.NET
I need elfoip.net to be able to serve third-level domains, like mickymouse.elfoip.net, etc. Yes, I'm trying to create another useless dyndns clone. I've added some third-level names as A RRs. E.g., executing this from the server itself:
$ dig @localhost mattinauno.elfoip.net
;; ANSWER SECTION:
mattinauno.elfoip.net. 60 IN A 192.81.221.113
I was expecting that, within one or two days, I could type mattinauno.elfoip.net into the browser on my PC and get the page at 192.81.221.113. But this is not happening. Are there any prerequisites to satisfy so that my ISP's DNS can forward resolution of *.elfoip.net to MY DNS (or query it and then cache the result)?
The TTL of the zone is set to 5m.
I have no allow-query directive; is it necessary for other DNS servers to cache from mine?
I've checked the zone with the BIND utility named-checkzone, and no errors were detected.
How can I diagnose why other DNS servers don't take the RRs from mine into account? From my home PC:
dig @ns.elfoip.net mattinauno.elfoip.net
;; ANSWER SECTION:
mattinauno.elfoip.net. 60 IN A 192.81.221.113
;; AUTHORITY SECTION:
elfoip.net. 300 IN NS ns.elfoip.net.
but dig @8.8.8.8 mattinauno.elfoip.net gives no answers.
Whole zone file (note: I've used nsupdate, so this file has been rewritten and reformatted by that utility):
root@mirko:/var/named# cat elfoip.net.db
$ORIGIN .
$TTL 300 ; 5 minutes
elfoip.net IN SOA ns.elfoip.net. hostmaster.elfoip.net. (
        2013062314 ; serial
        3600 ; refresh (1 hour)
        600 ; retry (10 minutes)
        86400 ; expire (1 day)
        60 ; minimum (1 minute)
        )
        NS ns.elfoip.net.
        A 109.168.99.6
$ORIGIN elfoip.net.
$TTL 60 ; 1 minute
google A 173.194.35.56
maiscai A 192.81.221.113
mattinadue A 192.81.221.113
mattinauno A 192.81.221.113
$TTL 300 ; 5 minutes
ns A 109.168.99.6
$TTL 60 ; 1 minute
prova A 208.67.222.222
prova2 A 13.23.34.45
        A 13.23.34.46
www CNAME elfoip.net.
EDIT: added named.conf.local zone "elfoip.net" { type master; // file "/etc/bind/elfoip.net.db"; file "/var/named/elfoip.net.db"; allow-update { key elfoip.net ; }; }; EDIT: I've no setup list-on directive *EDIT Added a TCPDUMP after [email protected] wwww.elfoip.net from a machine which uses my company internal dns, who allow recursive query. root@mirko:~# tcpdump -i eth0 'port 53' tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes 11:57:23.293611 IP host9-210-static.22-87-b.business.telecomitalia.it.45958 > mirko.elfoip.net.domain: 61337+ A? www.elfoip.net. (32) 11:57:23.294114 IP mirko.elfoip.net.domain > host9-210-static.22-87-b.business.telecomitalia.it.45958: 61337* 2/1/1 CNAME elfoip.net., A 109.168.99.6 (95) 11:57:23.294554 IP mirko.elfoip.net.59571 > google-public-dns-a.google.com.domain: 45851+ PTR? 9.210.22.87.in-addr.arpa. (42) 11:57:23.330444 IP google-public-dns-a.google.com.domain > mirko.elfoip.net.59571: 45851 1/0/0 PTR host9-210-static.22-87-b.business.telecomitalia.it. (106) 11:57:23.331181 IP mirko.elfoip.net.44171 > google-public-dns-a.google.com.domain: 33339+ PTR? 8.8.8.8.in-addr.arpa. (38) 11:57:23.439405 IP google-public-dns-a.google.com.domain > mirko.elfoip.net.44171: 33339 1/0/0 PTR google-public-dns-a.google.com. (82) 11:57:31.350654 IP host9-210-static.22-87-b.business.telecomitalia.it.30108 > mirko.elfoip.net.domain: 38269 [1au] A? ns.elfoip.net. (42) 11:57:31.351117 IP mirko.elfoip.net.domain > host9-210-static.22-87-b.business.telecomitalia.it.30108: 38269* 1/1/1 A 109.168.99.6 (72) If i dig @8.8.8.8 www.elfoip.net, NOTHING happens in dump log !
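To compare what the authoritative server returns with what a public resolver returns, the answer lines from saved dig output can be parsed and compared field by field. This is a minimal Python sketch; the function names are hypothetical, and it only handles the simple one-line record format shown in the dig output above.

```python
def parse_answer(line):
    """Split one dig answer line into its five fields:
    name, TTL, class, type, and record data."""
    name, ttl, rclass, rtype, rdata = line.split(None, 4)
    return {"name": name, "ttl": int(ttl), "class": rclass,
            "type": rtype, "data": rdata.strip()}

def same_record(a, b):
    """Compare two parsed records while ignoring the TTL, which
    legitimately differs once a resolver has cached the answer."""
    return ((a["name"].lower(), a["type"], a["data"])
            == (b["name"].lower(), b["type"], b["data"]))

authoritative = parse_answer("mattinauno.elfoip.net. 60 IN A 192.81.221.113")
print(authoritative)
```

An empty answer section from the public resolver gives nothing to parse at all, which is a different failure from a stale or mismatched record and points away from caching.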

Resolve Wrong IP from Domain Name only on certain networks

- by Godric Seer

I host a personal website on an old desktop that is LAMP based. There are several strange things about this problem so I will break it down into steps. Since I have a dynamic IP, I use no-ip to make sure I have a working domain name at all times. I use the automatic update client, but logged in and checked and my no-ip domain has the proper IP tied to it. Here is a link to the homepage through the no-ip domain for reference. Also, I do a ping and a traceroute on the no-ip domain and get: [eckertzs@localhost ~]$ ping -c 1 endradil.noip.me PING endradil.noip.me (65.24.215.99) 56(84) bytes of data. 64 bytes from endradil.noip.me (65.24.215.99): icmp_seq=1 ttl=64 time=2.23 ms --- endradil.noip.me ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 104ms rtt min/avg/max/mdev = 2.233/2.233/2.233/0.000 ms [eckertzs@localhost ~]$ traceroute endradil.noip.me traceroute to endradil.noip.me (65.24.215.99), 30 hops max, 60 byte packets 1 . (192.168.2.1) 1.755 ms 5.409 ms 5.380 ms 2 endradil.noip.me (65.24.215.99) 6.297 ms 9.543 ms 10.324 ms Using this domain, I can connect to my webserver without issue or interruption(the https is required to avoid a redirect serverside, but it works). I also have a domain I have bought on GoDaddy where I have a CNAME record forwarding the www subdomain to my no-ip domain. CNAME Record Host: www Points to: endradil.noip.me TTL: 1 hour For the past several weeks, I never had an issue using the GoDaddy domain to connect (ssh or https). As of the past few days, however, the GoDaddy domain has only worked intermittently, for a few minutes at a time and then will go down for hours at a time. I get server not found errors most of the time. Also, if I happen to be using the GoDaddy domain for an ssh connection, the connection will freeze. I have run online tests of the DNS and have seen that the website is visible by external servers and resolved to the correct IP. 
I also contacted GoDaddy support but they had no issues connecting to the website, and therefore did not see any issues. My personal computers (Windows desktop, linux laptop, android phone) all fail to connect when on my personal wifi. If I disconnect my phone from the wifi and use my AT&T wireless data, it can connect with both domains without issue. When I attempt to use Google webmaster tools to crawl the site using the GoDaddy domain, Google can not find the site. From my linux laptop, I have found some interesting results when I ping or traceroute the domain. The results from these: [eckertzs@localhost ~]$ ping -c 1 www.endradil.com PING www.endradil.com.Belkin (198.105.244.228) 56(84) bytes of data. --- www.endradil.com.Belkin ping statistics --- 1 packets transmitted, 0 received, 100% packet loss, time 10000ms [eckertzs@localhost ~]$ traceroute www.endradil.com traceroute to www.endradil.com (198.105.244.228), 30 hops max, 60 byte packets 1 . (192.168.2.1) 1.918 ms 2.806 ms 2.772 ms 2 cpe-65-24-208-1.insight.res.rr.com (65.24.208.1) 29.247 ms 29.654 ms 30.094 ms 3 cpe-69-23-24-117.new.res.rr.com (69.23.24.117) 15.597 ms 23.218 ms 23.581 ms 4 agg24.clmcohib01r.midwest.rr.com (65.29.1.52) 30.581 ms 30.556 ms 31.192 ms 5 be27.clevohek01r.midwest.rr.com (65.29.1.38) 30.580 ms 31.062 ms 31.038 ms 6 bu-ether25.atlngamq47w-bcr01.tbone.rr.com (107.14.19.38) 37.863 ms 68.844 ms 43.773 ms 7 107.14.17.178 (107.14.17.178) 51.866 ms 51.019 ms 50.989 ms 8 ae0.pr1.dca10.tbone.rr.com (107.14.17.200) 48.467 ms ae-4-0.a0.lax91.tbone.rr.com (66.109.1.113) 49.912 ms * 9 v413.core1.ash1.he.net (209.51.175.33) 60.270 ms 50.842 ms 50.819 ms 10 100ge5-1.core1.nyc4.he.net (184.105.223.166) 55.597 ms 56.045 ms 56.020 ms 11 xerocole-inc.10gigabitethernet12-4.core1.nyc4.he.net (216.66.41.242) 56.001 ms 55.969 ms 55.992 ms 12 * * * both show the incorrect IP. Also, the traceroute timesout on hops 12 through 255 (output truncated above). 
The traceroute using site24x7 works and shows reasonable results when run from their California server. From another Linux box on a different network but in the same city as me (10 miles away), traceroute still times out, but the domain resolves to the correct IP. From this I believe that the DNS result is incorrectly cached in either my router/modem or perhaps even at my ISP. My question is: first, how do I find out exactly what is wrong, and second, how do I resolve it?
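The ping output above already contains two clues: the name was expanded to www.endradil.com.Belkin, and the address is not the one the no-ip name resolves to. A small Python sketch that extracts both from a saved ping header line follows; the function names are made up for illustration, and the expected IP is the one shown for endradil.noip.me earlier in the post.

```python
import re

def ping_target(header):
    """Extract (name, ip) from a 'PING name (ip) ...' header line."""
    match = re.match(r"PING (\S+) \(([\d.]+)\)", header)
    return (match.group(1), match.group(2)) if match else None

def diagnose(header, queried_name, expected_ip):
    """List the discrepancies between what was asked for and what ping used."""
    parsed = ping_target(header)
    if parsed is None:
        return ["could not parse ping header"]
    name, ip = parsed
    findings = []
    if name != queried_name and name.startswith(queried_name + "."):
        findings.append(f"search suffix appended: {name[len(queried_name) + 1:]}")
    if ip != expected_ip:
        findings.append(f"resolved to {ip}, expected {expected_ip}")
    return findings

header = "PING www.endradil.com.Belkin (198.105.244.228) 56(84) bytes of data."
for finding in diagnose(header, "www.endradil.com", "65.24.215.99"):
    print(finding)
```

An appended suffix suggests the plain lookup failed first and the resolver retried with the router's search domain, which would be consistent with the problem sitting in the local resolver path rather than at the registrar.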

How to diagnose failing 6Gbps SATA connection?

- by whitequark

I have a Samsung RC530 notebook and OCZ Vertex-3 6Gbps SATA SSD working in AHCI mode. # dmesg | grep DMI SAMSUNG ELECTRONICS CO., LTD. RC530/RC730/RC530/RC730, BIOS 03WD.M008.20110927.PSA 09/27/2011 # lspci -nn 00:1f.2 SATA controller [0106]: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller [8086:1c03] (rev 04) # sdparm -a /dev/sda /dev/sda: ATA OCZ-VERTEX3 2.15 At the boot, the following messages are present in dmesg (I am running Debian wheezy @ Linux 3.2.8): # dmesg | grep -iE '(ata|ahci)' [ 5.179783] ahci 0000:00:1f.2: version 3.0 [ 5.179802] ahci 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19 [ 5.179864] ahci 0000:00:1f.2: irq 42 for MSI/MSI-X [ 5.195424] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x5 impl SATA mode [ 5.195429] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ems apst [ 5.195436] ahci 0000:00:1f.2: setting latency timer to 64 [ 5.204035] scsi0 : ahci [ 5.204301] scsi1 : ahci [ 5.204447] scsi2 : ahci [ 5.204592] scsi3 : ahci [ 5.204682] scsi4 : ahci [ 5.204799] scsi5 : ahci [ 5.204917] ata1: SATA max UDMA/133 abar m2048@0xf7c06000 port 0xf7c06100 irq 42 [ 5.204920] ata2: DUMMY [ 5.204923] ata3: SATA max UDMA/133 abar m2048@0xf7c06000 port 0xf7c06200 irq 42 [ 5.204924] ata4: DUMMY [ 5.204926] ata5: DUMMY [ 5.204927] ata6: DUMMY [ 5.523039] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 5.525911] ata3.00: ATAPI: TSSTcorp CDDVDW SN-208BB, SC00, max UDMA/100 [ 5.531006] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 5.533703] ata3.00: configured for UDMA/100 [ 5.542790] ata1.00: ATA-8: OCZ-VERTEX3, 2.15, max UDMA/133 [ 5.542800] ata1.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA [ 5.552751] ata1.00: configured for UDMA/133 [ 5.553050] scsi 0:0:0:0: Direct-Access ATA OCZ-VERTEX3 2.15 PQ: 0 ANSI: 5 [ 5.559621] scsi 2:0:0:0: CD-ROM TSSTcorp CDDVDW SN-208BB SC00 PQ: 0 ANSI: 5 [ 5.564059] sd 0:0:0:0: [sda] 117231408 512-byte logical blocks: 
(60.0 GB/55.8 GiB) [ 5.564127] sd 0:0:0:0: [sda] Write Protect is off [ 5.564131] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 [ 5.564158] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 5.564582] sda: sda1 [ 5.564810] sd 0:0:0:0: [sda] Attached SCSI disk [ 5.572006] sr0: scsi3-mmc drive: 16x/24x writer dvd-ram cd/rw xa/form2 cdda tray [ 5.572010] cdrom: Uniform CD-ROM driver Revision: 3.20 [ 5.572189] sr 2:0:0:0: Attached scsi CD-ROM sr0 [ 6.717181] ata1.00: exception Emask 0x50 SAct 0x1 SErr 0x280900 action 0x6 frozen [ 6.717238] ata1.00: irq_stat 0x08000000, interface fatal error [ 6.717291] ata1: SError: { UnrecovData HostInt 10B8B BadCRC } [ 6.717342] ata1.00: failed command: READ FPDMA QUEUED [ 6.717395] ata1.00: cmd 60/50:00:20:39:58/00:00:00:00:00/40 tag 0 ncq 40960 in [ 6.717396] res 40/00:00:20:39:58/00:00:00:00:00/40 Emask 0x50 (ATA bus error) [ 6.717503] ata1.00: status: { DRDY } [ 6.717553] ata1: hard resetting link [ 7.033417] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 7.055234] ata1.00: configured for UDMA/133 [ 7.055262] ata1: EH complete [ 7.147280] ata1.00: exception Emask 0x10 SAct 0xf8 SErr 0x280100 action 0x6 frozen [ 7.147340] ata1.00: irq_stat 0x08000000, interface fatal error [ 7.147393] ata1: SError: { UnrecovData 10B8B BadCRC } [ 7.147460] ata1.00: failed command: READ FPDMA QUEUED [ 7.147529] ata1.00: cmd 60/08:18:88:17:41/00:00:02:00:00/40 tag 3 ncq 4096 in [ 7.147531] res 40/00:38:50:99:64/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.147691] ata1.00: status: { DRDY } [ 7.147754] ata1.00: failed command: READ FPDMA QUEUED [ 7.147821] ata1.00: cmd 60/00:20:f8:42:4c/01:00:02:00:00/40 tag 4 ncq 131072 in [ 7.147822] res 40/00:38:50:99:64/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.147977] ata1.00: status: { DRDY } [ 7.148036] ata1.00: failed command: READ FPDMA QUEUED [ 7.148100] ata1.00: cmd 60/50:28:f8:43:4c/00:00:02:00:00/40 tag 5 ncq 40960 in [ 7.148101] res 
40/00:38:50:99:64/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.148255] ata1.00: status: { DRDY } [ 7.148315] ata1.00: failed command: READ FPDMA QUEUED [ 7.148379] ata1.00: cmd 60/00:30:50:98:64/01:00:02:00:00/40 tag 6 ncq 131072 in [ 7.148380] res 40/00:38:50:99:64/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.148534] ata1.00: status: { DRDY } [ 7.148593] ata1.00: failed command: READ FPDMA QUEUED [ 7.148657] ata1.00: cmd 60/00:38:50:99:64/01:00:02:00:00/40 tag 7 ncq 131072 in [ 7.148658] res 40/00:38:50:99:64/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.148813] ata1.00: status: { DRDY } [ 7.148875] ata1: hard resetting link [ 7.464842] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 7.486794] ata1.00: configured for UDMA/133 [ 7.486822] ata1: EH complete [ 7.546395] ata1.00: exception Emask 0x10 SAct 0x2f SErr 0x280100 action 0x6 frozen [ 7.546470] ata1.00: irq_stat 0x08000000, interface fatal error [ 7.546531] ata1: SError: { UnrecovData 10B8B BadCRC } [ 7.546588] ata1.00: failed command: READ FPDMA QUEUED [ 7.546648] ata1.00: cmd 60/00:00:e0:4b:61/01:00:02:00:00/40 tag 0 ncq 131072 in [ 7.546649] res 40/00:28:e0:4c:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.546794] ata1.00: status: { DRDY } [ 7.546847] ata1.00: failed command: READ FPDMA QUEUED [ 7.546906] ata1.00: cmd 60/00:08:90:2f:48/01:00:02:00:00/40 tag 1 ncq 131072 in [ 7.546907] res 40/00:28:e0:4c:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.547053] ata1.00: status: { DRDY } [ 7.547106] ata1.00: failed command: READ FPDMA QUEUED [ 7.547165] ata1.00: cmd 60/00:10:90:30:48/01:00:02:00:00/40 tag 2 ncq 131072 in [ 7.547166] res 40/00:28:e0:4c:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.547310] ata1.00: status: { DRDY } [ 7.547363] ata1.00: failed command: READ FPDMA QUEUED [ 7.547422] ata1.00: cmd 60/00:18:50:c7:64/01:00:02:00:00/40 tag 3 ncq 131072 in [ 7.547423] res 40/00:28:e0:4c:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.547568] ata1.00: status: { DRDY 
} [ 7.547621] ata1.00: failed command: READ FPDMA QUEUED [ 7.547681] ata1.00: cmd 60/00:28:e0:4c:61/01:00:02:00:00/40 tag 5 ncq 131072 in [ 7.547682] res 40/00:28:e0:4c:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.547825] ata1.00: status: { DRDY } [ 7.547882] ata1: hard resetting link [ 7.864408] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 7.886351] ata1.00: configured for UDMA/133 [ 7.886375] ata1: EH complete [ 7.890012] ata1: limiting SATA link speed to 3.0 Gbps [ 7.890016] ata1.00: exception Emask 0x10 SAct 0x7 SErr 0x280100 action 0x6 frozen [ 7.890093] ata1.00: irq_stat 0x08000000, interface fatal error [ 7.890152] ata1: SError: { UnrecovData 10B8B BadCRC } [ 7.890210] ata1.00: failed command: READ FPDMA QUEUED [ 7.890272] ata1.00: cmd 60/00:00:90:33:48/01:00:02:00:00/40 tag 0 ncq 131072 in [ 7.890273] res 40/00:10:e0:4f:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.890418] ata1.00: status: { DRDY } [ 7.890472] ata1.00: failed command: READ FPDMA QUEUED [ 7.890530] ata1.00: cmd 60/00:08:90:34:48/01:00:02:00:00/40 tag 1 ncq 131072 in [ 7.890531] res 40/00:10:e0:4f:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.890672] ata1.00: status: { DRDY } [ 7.890724] ata1.00: failed command: READ FPDMA QUEUED [ 7.890781] ata1.00: cmd 60/78:10:e0:4f:61/00:00:02:00:00/40 tag 2 ncq 61440 in [ 7.890782] res 40/00:10:e0:4f:61/00:00:02:00:00/40 Emask 0x10 (ATA bus error) [ 7.890925] ata1.00: status: { DRDY } [ 7.890981] ata1: hard resetting link [ 8.208021] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 8.230100] ata1.00: configured for UDMA/133 [ 8.230124] ata1: EH complete Looks like the SATA interface tries to use 6Gbps link, then fails miserably and Linux fallbacks to 3Gbps. This is somewhat fine for me, as the system boots successfully each time and works under high load (cd linux-3.2.8; make -j16). I've also ran memtest86+ and it did not find any errors. 
What concerns me more is that GRUB sometimes takes a long time to load the images and/or fails to load itself completely. The error is consistent but probabilistic: that is, each time I boot there is a certain chance of failure. Actually, I have a slight suspicion about the cause of the failure. Look at the cabling: What kind of engineer does it this way? Nah. Even 1Gbps Ethernet hardly tolerates sharply bent cables, and here you have 6Gbps SATA. How could I determine and fix the cause of the errors and/or switch the link to 3Gbps mode permanently?
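To see how often the link is renegotiated and at what speed it finally settles, the boot log can be summarized mechanically. The following is a minimal sketch, assuming the `dmesg` text has one message per line in the format shown above; the function name `summarize` and the sample string are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Patterns for the two libata messages of interest in the dmesg output.
LINK_UP = re.compile(r"(ata\d+): SATA link up ([\d.]+) Gbps")
EXCEPTION = re.compile(r"(ata\d+)(?:\.\d+)?: exception Emask")


def summarize(dmesg_text):
    """Return (speeds, errors): negotiated speeds per port, in order,
    and the number of exception events per port."""
    speeds = {}
    errors = Counter()
    for line in dmesg_text.splitlines():
        m = LINK_UP.search(line)
        if m:
            speeds.setdefault(m.group(1), []).append(float(m.group(2)))
        m = EXCEPTION.search(line)
        if m:
            errors[m.group(1)] += 1
    return speeds, errors


# A few lines copied from the log above, used as sample input.
sample = """\
[ 5.531006] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 6.717181] ata1.00: exception Emask 0x50 SAct 0x1 SErr 0x280900 action 0x6 frozen
[ 7.033417] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 7.890016] ata1.00: exception Emask 0x10 SAct 0x7 SErr 0x280100 action 0x6 frozen
[ 8.208021] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
"""

speeds, errors = summarize(sample)
print(speeds)   # negotiated speeds per port, oldest first
print(errors)   # exception count per port
```

Running this over the full boot log on several boots would show whether the fallback to 3.0 Gbps happens every time and after how many errors.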


Windows Server 2008 R2 network adapter stops working, requires hard reboot

- by Geoff Dalgas

TL;DR version: Turns out this was a Windows Server 2008 R2 kernel networking bug. After siccing Microsoft support on it, we (eventually) got an unpublished kernel hotfix from Microsoft to address it. If you, too, are experiencing mysterious low-level network driver failures requiring a reboot/bluescreen cycle, you might want that hotfix (or maybe Service Pack 1 whenever it is released, too).

We have been using HAProxy along with heartbeat from the Linux-HA project. We are using two Linux instances to provide failover. Each server has its own public IP, plus a single IP that is shared between the two using a virtual interface (eth1:1) at 69.59.196.211. The virtual interface IP 69.59.196.211 is configured as the gateway for the Windows servers behind them, and we use ip_forwarding to route traffic.

We are experiencing an occasional network outage on one of our Windows servers behind our Linux gateways. HAProxy will detect that the server is offline, which we can verify by remoting to the failed server and attempting to ping the gateway:

Pinging 69.59.196.211 with 32 bytes of data:
Reply from 69.59.196.220: Destination host unreachable.
Running arp -a on this failed server shows that there is no entry for the gateway address (69.59.196.211):

Interface: 69.59.196.220 --- 0xa
  Internet Address      Physical Address      Type
  69.59.196.161         00-26-88-63-c7-80     dynamic
  69.59.196.210         00-15-5d-0a-3e-0e     dynamic
  69.59.196.212         00-21-5e-4d-45-c9     dynamic
  69.59.196.213         00-15-5d-00-b2-0d     dynamic
  69.59.196.215         00-21-5e-4d-61-1a     dynamic
  69.59.196.217         00-21-5e-4d-2c-e8     dynamic
  69.59.196.219         00-21-5e-4d-38-e5     dynamic
  69.59.196.221         00-15-5d-00-b2-0d     dynamic
  69.59.196.222         00-15-5d-0a-3e-09     dynamic
  69.59.196.223         ff-ff-ff-ff-ff-ff     static
  224.0.0.22            01-00-5e-00-00-16     static
  224.0.0.252           01-00-5e-00-00-fc     static
  225.0.0.1             01-00-5e-00-00-01     static

On our Linux gateway instances arp -a shows:

peak-colo-196-220.peak.org (69.59.196.220) at <incomplete> on eth1
stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1
peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1
peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1
peak-colo-196-222.peak.org (69.59.196.222) at 00:15:5d:0a:3e:09 [ether] on eth1
peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1
peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1

Why would arp occasionally set the entry for this failed server as <incomplete>? Should we be defining our arp entries statically? I've always left arp alone since it works 99% of the time, but in this one instance it appears to be failing. Are there any additional troubleshooting steps we can take to help resolve this issue?

THINGS WE HAVE TRIED

I added a static arp entry for testing on one of the Linux gateways, which still didn't help.
root@haproxy2:~# arp -a
peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1
peak-colo-196-221.peak.org (69.59.196.221) at 00:15:5d:00:b2:0d [ether] on eth1
stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1
peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1
peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1
peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1
peak-colo-196-220.peak.org (69.59.196.220) at 00:21:5e:4d:30:8d [ether] PERM on eth1
root@haproxy2:~# arp -i eth1 -s 69.59.196.220 00:21:5e:4d:30:8d
root@haproxy2:~# ping 69.59.196.220
PING 69.59.196.220 (69.59.196.220) 56(84) bytes of data.
--- 69.59.196.220 ping statistics ---
7 packets transmitted, 0 received, 100% packet loss, time 6006ms

Rebooting the Windows web server solves this issue temporarily with no other changes to the network, but our experience shows this issue will come back.

Swapping network cards and switches
I noticed the link light on the port of the switch for the failed Windows server was running at 100Mb instead of 1Gb on the failed interface. I moved the cable to several other open ports and the link indicated 100Mb for each port that I tried. I also swapped the cable with the same result. I tried changing the properties of the network card in Windows and the server locked up and required a hard reset after clicking apply. This Windows server has two physical network interfaces, so I have swapped the cables and network settings on the two interfaces to see if the problem follows the interface. If the public interface goes down again we will know that it is not an issue with the network card. (We also tried another switch we have on hand, no change.)

Changing network hardware driver versions
We've had the same problem with the latest Broadcom driver, as well as the built-in driver that ships in Windows Server 2008 R2.
Replacing network cables
As a last-ditch effort, we remembered that another change was the replacement of all of the patch cords between our servers and the switch. We had purchased two sets: green cables of lengths 1ft - 3ft for the private interfaces, and red cables for the public interfaces. We swapped out all of the public interface patch cables with a different brand and ran our servers without issue for a full week ... aaaaaand then the problem recurred.

Disable checksum offload, remove TProxy
We also tried disabling TCP/IP checksum offload in the driver, no change. We're now pulling out TProxy and moving to a more traditional x-forwarded-for network arrangement without any fancy IP address rewriting. We'll see if that helps.

Switch Virtualization providers
On the off chance this was related to Hyper-V in some way (we do host Linux VMs on it), we switched to VMWare Server. No change.

Switch host model
We've reached the end of our troubleshooting rope and are now formally involving Microsoft support. They recommended changing the host model: http://en.wikipedia.org/wiki/Host_model http://technet.microsoft.com/en-us/magazine/2007.09.cableguy.aspx We did that, and.. we'll see.
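The gateway-side check described in this post (looking for `<incomplete>` entries in `arp -a`) is easy to automate while waiting for the failure to recur. Below is a minimal sketch, assuming the Linux `arp -a` line format shown in the post; the function name `incomplete_entries` is hypothetical, and the sample input is copied from the output above.

```python
import re

# Matches Linux `arp -a` lines such as:
#   host.example (10.0.0.1) at 00:11:22:33:44:55 [ether] on eth1
#   host.example (10.0.0.2) at <incomplete> on eth1
ARP_LINE = re.compile(
    r"^(?P<host>\S+) \((?P<ip>[\d.]+)\) at "
    r"(?P<mac><incomplete>|[0-9a-fA-F:]{17})"
)


def incomplete_entries(arp_output):
    """Return the IP addresses whose ARP resolution is incomplete."""
    result = []
    for line in arp_output.splitlines():
        m = ARP_LINE.match(line.strip())
        if m and m.group("mac") == "<incomplete>":
            result.append(m.group("ip"))
    return result


sample = """\
peak-colo-196-220.peak.org (69.59.196.220) at <incomplete> on eth1
stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1
peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1
"""

print(incomplete_entries(sample))
```

Run periodically against live `arp -a` output, a check like this would timestamp the moment a server stops answering ARP, which may help correlate the outage with other events.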


MySQL port 3306 blocked in csf yet can still telnet to port 3306 from external host

- by Neek

We have a CentOS 6 VPS that was recently migrated to a new machine within the same web hosting company. It's running WHM/cPanel and has csf/lfd installed. csf is set up with a mostly vanilla config. I'm no iptables expert, but csf has not let me down before. If a port isn't in the TCP_IN list, it should be blocked on the firewall by iptables.

My problem is that I can telnet to port 3306 from an external host, yet I think iptables ought to be blocking 3306 because of csf's rules. We are now failing a security check because of this open port. (This output is obfuscated to protect the innocent: www.ourhost.com is the host with the firewall problem.)

[root@nickfenwick log]# telnet www.ourhost.com 3306
Trying 158.255.45.107...
Connected to www.ourhost.com.
Escape character is '^]'.
HHost 'nickfenwick.com' is not allowed to connect to this MySQL serverConnection closed by foreign host.

So the connection is established, and MySQL refuses the connection due to its configuration. I need the network connection to be refused at the firewall level, before it reaches MySQL.

Using WHM's csf web UI I can see that 'Firewall Configuration' includes a fairly sensible TCP_IN line:

TCP_IN: 20,21,22,25,53,80,110,143,222,443,465,587,993,995,2077,2078,2082,2083,2086,2087,2095,2096,8080

(Let's ignore for now that I could trim that a little; my concern is that 3306 is not in that list.)

When csf is restarted it logs the usual slew of output as it sets up iptables rules, for example what looks like it blocking all traffic and then allowing specific ports like SSH on 22:

[cut]
DROP all opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0
[cut]
ACCEPT tcp opt -- in !lo out * 0.0.0.0/0 -> 0.0.0.0/0 state NEW tcp dpt:22
[cut]

I can see that iptables is running: service iptables status returns a long list of firewall rules. Here is my Chain INPUT section from service iptables status; hopefully that's enough to show how the firewall is configured.
Table: filter Chain INPUT (policy DROP) num target prot opt source destination 1 acctboth all -- 0.0.0.0/0 0.0.0.0/0 2 ACCEPT tcp -- 217.112.88.10 0.0.0.0/0 tcp dpt:53 3 ACCEPT udp -- 217.112.88.10 0.0.0.0/0 udp dpt:53 4 ACCEPT tcp -- 217.112.88.10 0.0.0.0/0 tcp spt:53 5 ACCEPT udp -- 217.112.88.10 0.0.0.0/0 udp spt:53 6 ACCEPT tcp -- 8.8.4.4 0.0.0.0/0 tcp dpt:53 7 ACCEPT udp -- 8.8.4.4 0.0.0.0/0 udp dpt:53 8 ACCEPT tcp -- 8.8.4.4 0.0.0.0/0 tcp spt:53 9 ACCEPT udp -- 8.8.4.4 0.0.0.0/0 udp spt:53 10 ACCEPT tcp -- 8.8.8.8 0.0.0.0/0 tcp dpt:53 11 ACCEPT udp -- 8.8.8.8 0.0.0.0/0 udp dpt:53 12 ACCEPT tcp -- 8.8.8.8 0.0.0.0/0 tcp spt:53 13 ACCEPT udp -- 8.8.8.8 0.0.0.0/0 udp spt:53 14 LOCALINPUT all -- 0.0.0.0/0 0.0.0.0/0 15 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 16 INVALID tcp -- 0.0.0.0/0 0.0.0.0/0 17 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED 18 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:20 19 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:21 20 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22 21 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:25 22 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:53 23 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:80 24 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:110 25 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:143 26 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:222 27 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:443 28 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:465 29 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:587 30 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:993 31 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:995 32 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:2077 33 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:2078 34 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:2082 35 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:2083 36 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:2086 37 ACCEPT tcp 
-- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:2087 38 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:2095 39 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:2096 40 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:8080 41 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 state NEW udp dpt:20 42 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 state NEW udp dpt:21 43 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 state NEW udp dpt:53 44 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 state NEW udp dpt:222 45 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 state NEW udp dpt:8080 46 ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0 icmp type 8 47 ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0 icmp type 0 48 ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0 icmp type 11 49 ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0 icmp type 3 50 LOGDROPIN all -- 0.0.0.0/0 0.0.0.0/0 What's the next thing to check?
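One way to read a long INPUT chain like this is to extract which TCP ports are explicitly accepted from anywhere, and which ACCEPT rules carry no visible restriction at all. The sketch below assumes one rule per line in the `service iptables status` column layout shown above; the function name `analyze` is hypothetical. Two caveats are assumptions worth stating: this listing does not include interface columns, so an apparently unrestricted rule (such as rule 15) may in fact be limited to the loopback interface, which `iptables -L INPUT -n -v` would show; and user-defined chains such as acctboth and LOCALINPUT are not followed here.

```python
import re

# num target prot opt source destination [extra match text]
RULE = re.compile(
    r"^(?P<num>\d+)\s+(?P<target>\S+)\s+(?P<prot>\S+)\s+--\s+"
    r"(?P<src>\S+)\s+(?P<dst>\S+)\s*(?P<extra>.*)$"
)


def analyze(rules_text):
    """Return (open_tcp_ports, catch_all_rule_numbers) for ACCEPT rules
    that apply to any source and destination address."""
    open_tcp_ports = set()
    catch_all = []
    for line in rules_text.splitlines():
        m = RULE.match(line.strip())
        if not m or m.group("target") != "ACCEPT":
            continue
        any_addr = (m.group("src") == "0.0.0.0/0"
                    and m.group("dst") == "0.0.0.0/0")
        extra = m.group("extra").strip()
        port = re.search(r"tcp dpt:(\d+)", extra)
        if any_addr and port:
            open_tcp_ports.add(int(port.group(1)))
        elif any_addr and m.group("prot") == "all" and extra == "":
            # No port, state, or other match is visible in this listing.
            catch_all.append(int(m.group("num")))
    return open_tcp_ports, catch_all


# A subset of the rules above, one per line.
sample = """\
14 LOCALINPUT all -- 0.0.0.0/0 0.0.0.0/0
15 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
17 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
20 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22
23 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:80
50 LOGDROPIN all -- 0.0.0.0/0 0.0.0.0/0
"""

ports, catch_all = analyze(sample)
print(3306 in ports)  # is MySQL's port explicitly accepted?
print(catch_all)      # rules to inspect with the verbose listing
```

If 3306 is absent from the explicit list, the rules flagged as unrestricted and the user-defined chains are the remaining places in this table where the connection could be accepted.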


Linux Kernel crash mutex_lock_slowpath "blocked for more than 120 seconds". What to do?

- by Roddick

I have an out-of-the-box Debian Lenny install with the stock kernel 2.6.26-2-amd64. It is a brand-new server that is used to about 5% of its potential, CPU- and disk-wise, so it is probably not crashing because of overload. Every few days it freezes with hundreds of these messages in the console log:

: [284847.828428] INFO: task apache2:12473 blocked for more than 120 seconds.
: [284847.868468] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
: [284847.912759] apache2 D ffff8101bc6b7ab0 0 12473 14358
: [284847.912763] ffff810160d5bc50 0000000000000082 ffff8101c0002e40 0000000000000000
: [284847.912766] ffff8101a7c42950 ffff810327d92810 ffff8101a7c42bd8 0000000400000044
: [284847.912770] ffff8101c0002e40 00000000000612d0 0000000000000000 00000040000612d0
: [284847.912773] Call Trace:
: [284847.912786] [<ffffffff80429b0d>] __mutex_lock_slowpath+0x64/0x9b
: [284847.912790] [<ffffffff80429972>] mutex_lock+0xa/0xb
: [284847.912794] [<ffffffff802a20b9>] do_lookup+0x82/0x1c1
: [284847.912800] [<ffffffff802a4271>] __link_path_walk+0x87a/0xd19
: [284847.912805] [<ffffffff80295844>] kmem_getpages+0x96/0x15f
: [284847.912808] [<ffffffff80295fb7>] ____cache_alloc_node+0x6d/0x106
: [284847.912814] [<ffffffff802a4756>] path_walk+0x46/0x8b
: [284847.912819] [<ffffffff802a4a82>] do_path_lookup+0x158/0x1cf
: [284847.912822] [<ffffffff802a3879>] getname+0x140/0x1a7
: [284847.912827] [<ffffffff802a53f1>] __user_walk_fd+0x37/0x4c
: [284847.912831] [<ffffffff8029e381>] vfs_lstat_fd+0x18/0x47
: [284847.912840] [<ffffffff8029e3c9>] sys_newlstat+0x19/0x31
: [284847.912848] [<ffffffff8020beda>] system_call_after_swapgs+0x8a/0x8f

Almost all traces have __mutex_lock_slowpath at the top level. Only some have a different trace:

: [284847.737386] INFO: task apache2:12472 blocked for more than 120 seconds.
: [284847.777551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
: [284847.824881] apache2 D ffff8101bc6b7ab0 0 12472 14358
: [284847.824886] ffff8101b9cc1c50 0000000000000086 ffffffffa0131e0a 0000000000000002
: [284847.824889] ffff8102e7454300 ffff810324c6cad0 ffff8102e7454588 0000000000000000
: [284847.824893] 0000000000000001 0000000000000296 0000000000000003 ffff8101b9cc1c58
: [284847.824896] Call Trace:
: [284847.828403] [<ffffffffa0131e0a>] :ext3:__ext3_journal_dirty_metadata+0x1e/0x46
: [284847.828412] [<ffffffff80429b0d>] __mutex_lock_slowpath+0x64/0x9b
: [284847.828418] [<ffffffff80429972>] mutex_lock+0xa/0xb
: [284847.828421] [<ffffffff802a20b9>] do_lookup+0x82/0x1c1
: [284847.828427] [<ffffffff802a4271>] __link_path_walk+0x87a/0xd19
: [284847.828428] [<ffffffff80271296>] find_lock_page+0x1f/0x8a
: [284847.828428] [<ffffffff80273182>] filemap_fault+0x1c2/0x33c
: [284847.828428] [<ffffffff802a4756>] path_walk+0x46/0x8b
: [284847.828428] [<ffffffff802a4a82>] do_path_lookup+0x158/0x1cf
: [284847.828428] [<ffffffff802a3879>] getname+0x140/0x1a7
: [284847.828428] [<ffffffff802a53f1>] __user_walk_fd+0x37/0x4c
: [284847.828428] [<ffffffff8029e381>] vfs_lstat_fd+0x18/0x47
: [284847.828428] [<ffffffff8029e3c9>] sys_newlstat+0x19/0x31
: [284847.828428] [<ffffffff8020beda>] system_call_after_swapgs+0x8a/0x8f

kernel: [1912668.466347] INFO: task apache2:17984 blocked for more than 120 seconds.
[1912668.507035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
: [1912668.555165] apache2 D ffff8101c5637ba0 0 17984 17282
: [1912668.596752] ffff810166a7dd30 0000000000000086 0000000000000000 ffff810166a7dcd8
: [1912668.643341] ffff8101c563c880 ffff81024505f000 0000000000000002 ffff810166a7dd68
: [1912668.699566] 0000000000000086 00000000000cb1a0 0000000000000000 ffff81017f344d60
: [1912668.744773] Call Trace:
: [1912668.761754] [<ffffffff8022a3ed>] pick_next_task_fair+0x6e/0x7a
: [1912668.829311] [<ffffffff802be0e2>] bio_alloc_bioset+0x89/0xd9
: [1912668.861930] [<ffffffff8024ac3a>] getnstimeofday+0x39/0x98
: [1912668.897005] [<ffffffff802710f6>] sync_page+0x0/0x41
: [1912668.927868] [<ffffffff80429487>] io_schedule+0x5c/0x9e
: [1912668.960286] [<ffffffff80271132>] sync_page+0x3c/0x41
: [1912668.991756] [<ffffffff804295fa>] __wait_on_bit_lock+0x36/0x66
: [1912669.031757] [<ffffffff802710e3>] __lock_page+0x5e/0x64
: [1912669.064191] [<ffffffff802461d3>] wake_bit_function+0x0/0x23
: [1912669.100100] [<ffffffff80281bc5>] handle_mm_fault+0x5e4/0x8de
: [1912669.134531] [<ffffffff802461a5>] autoremove_wake_function+0x0/0x2e
: [1912669.174623] [<ffffffff802aa108>] fcntl_setlk+0x1cf/0x291
: [1912669.210623] [<ffffffff802461a5>] autoremove_wake_function+0x0/0x2e
: [1912669.246923] [<ffffffff802a677f>] sys_fcntl+0x280/0x2f7

After googling for "mutex_lock_slowpath" I can only find kernel mailing list discussions saying that this issue was introduced in some commit, without reference to a version. The discussions are as recent as Jan 25, 2011. The kernel I am using is from Debian Lenny, from about a year ago. What should I do? Is this bug even fixed in the kernel? If it's such an obvious bug, why does it happen so rarely? Should I download the latest kernel from kernel.org and upgrade? Should I use Debian backports to install a new "approved" kernel? Am I missing something? What to do?
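With hundreds of such messages, it can help to group the stack frames by blocked task and count how many traces pass through a given function, rather than eyeballing them. The following is a minimal sketch, assuming the console log has one message per line in the format shown above; the function name `traces` and the sample input are illustrative.

```python
import re

TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# A frame looks like "[<ffffffff80429b0d>] __mutex_lock_slowpath+0x64/0x9b",
# optionally with a module prefix such as ":ext3:" before the symbol.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(?::\w+:)?(\w+)\+0x")


def traces(log_text):
    """Return a list of (task_name, pid, [symbols]) per hung-task report."""
    result = []
    current = None
    for line in log_text.splitlines():
        t = TASK.search(line)
        if t:
            current = (t.group(1), int(t.group(2)), [])
            result.append(current)
            continue
        f = FRAME.search(line)
        if f and current is not None:
            current[2].append(f.group(1))
    return result


# Shortened excerpts of the first two reports above.
sample = """\
: [284847.828428] INFO: task apache2:12473 blocked for more than 120 seconds.
: [284847.912773] Call Trace:
: [284847.912786] [<ffffffff80429b0d>] __mutex_lock_slowpath+0x64/0x9b
: [284847.912790] [<ffffffff80429972>] mutex_lock+0xa/0xb
: [284847.737386] INFO: task apache2:12472 blocked for more than 120 seconds.
: [284847.824896] Call Trace:
: [284847.828403] [<ffffffffa0131e0a>] :ext3:__ext3_journal_dirty_metadata+0x1e/0x46
: [284847.828412] [<ffffffff80429b0d>] __mutex_lock_slowpath+0x64/0x9b
"""

parsed = traces(sample)
in_mutex = sum(1 for _, _, syms in parsed if "__mutex_lock_slowpath" in syms)
print(len(parsed), in_mutex)
```

Counting by symbol anywhere in the trace, rather than only the first frame, keeps reports like the second one (which starts in ext3 code) in the same bucket as the plain mutex waits.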


Bad disks in ancient server

- by Joel Coel

I have a 1998-era Netware 3.12 server that runs everything on our campus: general ledger, purchasing, payroll, student information, grades, you name it. The server has an Adaptec RAID controller with two volumes:

RAID 1: two 17GB SCSI disks, Seagate ST318417W
RAID 5: three 4GB SCSI disks, two Seagate ST34573W and one ST34572W

We are currently in the early stages of a project to replace this system, but you don't just jump into a new system like that, and so I need to keep this server running until at least November 2011. This week we had not one but two hard drives fail. Thankfully they are from different volumes and we're able to keep running for the moment, but given how close together these failures were, I have serious doubts that I'll be able to avoid catastrophic failure of this server through the November target without restoring the RAID redundancy. It'll only take one more drive failure anywhere and I'm completely hosed.

We are fortunate enough to have exact-match "spares" lying around for both drives, but the spares are in unknown condition. I tried simply swapping them in, but the RAID controller isn't smart enough to handle this and it renders the system unbootable. As for the RAID controller itself, there is a utility I can get into during POST via a Ctrl-A shortcut, but I can't do much that is useful from there. To actually manage volumes I must first boot into Netware, at which point I can use CI/O Array Management Software Version 2.0 to actually look at volume information. I suspect that the normal way to manage things is to boot from a special floppy with the controller software on it, but that floppy is long gone.
Going through the options in the RAID software, I think the only supported way to replace a disk in an existing RAID volume is to physically add the disk, boot up and configure it as a "spare" for a volume, force the volume to use the spare to replace an existing down disk (and at this point I'm only guessing) so that the down disk becomes the spare, repair the volume, remove the spare from the volume, and then shut down and remove the disk. Then start all over for the other failed disk. All this amounts to a lot of downtime, assuming I can even make it work and that my spares are any good. As for finding reliable spares, I have no clue where to even begin looking to find a new 4GB scsi drive, or even which exact scsi system I'm looking for, as it's gone through a few different iterations over time. Another option is to migrate this to a virtual machine (hyper-v), but all previous attempts we've made in this area have failed to get very far. When this machine was installed I was just graduating from high school, and so it requires lower level knowledge of netware and dos than I ever developed, or if I did have since forgotten (I'm not exactly a dos neophyte, either). Part of my problem is this is a high-use server, and taking it down for a few days to figure things out isn't gonna fly very well. As for the question, I'm looking for anything that might be helpful in this situation: a recommendation on a place to find good spares from this era, personal experience repairing RAID volumes using a similar controller or building a hyper-v vm from an old netware server, a line on a floppy with better software for the RAID controller, recommendation on a good Novell consultant in Nebraska that would be able to put things right, a whole other option I haven't considered yet, etc. Update: For backups, we have good (recently verified via restore) backups of the data only -- nothing for the software that actually runs things. 
Update 2: Just a progress report that I currently have a working Netware 3.12 install in VMWare Virtual Server 2.0, thanks largely to the guide I found here: http://cerbulescubogdan.blogspot.com/2010/11/novell-netware-312-on-vmware.html The next steps are preparing empty Netware volumes to match the additional volumes on my existing server, taking a dump of everything on the C:\ drive and the Netware volumes on my existing server, figuring out from that information what modules need to be added to Netware, installing my licenses (we do still have that disk, if it's any good), and moving data over. I have approval to bring the server down for a week after the first of the year (sadly not before), so, aside from creating empty volumes, the rest of the work will have to wait until then.

Final Update (Jan 5, 2011): I was able to get spares working in both RAID arrays without data loss this week. Both are now listed by the controller as "FAULT TOLLERANT" (yay!). I was also able to build on the progress from my last update and now have a functional "spare" server in VMWare Server 2.0. The spare can run and use our ERP software, but I can't put it into production because I can't (yet) print from that box (and I have no idea why). Even so, this VM will do in a pinch if I have no other choice, and between it and the repaired RAID arrays I'm comfortable pushing on until I can junk the machine in November.


Need help configuring my Tomcat server

- by gablin

I just reinstalled my entire server, and now I can't seem to get my JSP-based website to work on Tomcat anymore. I use the same server.xml file, which worked perfectly before the reinstallation, but no longer does. Here's the content of the server.xml file which worked before:

<Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
<Listener className="org.apache.catalina.core.JasperListener" />
<Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" />
<Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />
<GlobalNamingResources>
  <Resource name="UserDatabase" auth="Container" type="org.apache.catalina.UserDatabase" description="User database that can be updated and saved" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" pathname="conf/tomcat-users.xml" />
</GlobalNamingResources>
<Service name="Catalina">
  <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
  <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
  <Engine name="Catalina" defaultHost="localhost">
    <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"/>
    <Host name="www.rebootradio.nu">
      <Alias>rebootradio.nu</Alias>
      <Context path="" docBase="D:/services/http/rebootradio.nu" debug="1" reloadable="true"/>
    </Host>
  </Engine>
</Service>
</Server>

The JSP site doesn't use any WAR files or anything like that; there's just a default.jsp in the specified folder D:/services/http/rebootradio.nu which loads the site. As I said, this configuration worked before, but now with the latest version of XAMPP and Tomcat it doesn't work anymore. All I get is a 404 message saying The requested resource () is not available.


Web browsing is fast, but downloads are slow

- by Ricket

I work for a company on my university's campus, helping with general IT problems and some web development. But lately there has been a problem that has me and my boss completely stumped. We, plus one contractor, make up the entire IT department, so I'm reaching out to you for help. All around the office, we have wall jacks. These collect in a closet down the hall and all plug into a switch. This switch, along with our individual server jacks, plugs into another switch, and that switch plugs into our firewall hardware. Then the firewall is connected out to our campus network. Our campus internet is, well, very fast. I don't know exactly the terms, tiers, etc., but we have thousands of students and downloads can run as fast as 10 MB/s at night; uploads are sometimes even faster. I think we're practically ISP level. In short, I have a lot of faith that it is not the campus side of things that is causing a problem, combined with other evidence I'll mention in a moment. So our symptoms: web browsing is fast. Web pages, images, etc. load instantly. No problems there. But then when I go to download something, the download starts fast but very quickly (a matter of seconds) drops to nearly 0. Often it will actually drop to 0 and time out. This happens with even very small files, 1 MB or less. It smells to me like a QoS sort of thing. I'm not entirely sure, and I wanted to get your opinions first. My boss is hesitant to touch our firewall, much less let me touch it, and it was set up and is managed by a consultant remotely. These problems don't seem tied to a time of the day. I've tried downloads after 5:00 and still the same thing happens. From my desk, I can turn on my wireless adapter and pick up the campus wireless access point. If I unplug ethernet and connect to it, downloads are fast. This adds to my suspicion that it's limited to our company network. Also, a number of weeks ago the consultant upgraded our firewall firmware. Suddenly everything was very fast. 
I tested with downloads from Sun and speedtest.net and things were blazing fast, as they should be with our campus internet! It was wonderful, and I figured the slow speeds were an old firmware bug. In a matter of days, things steadily declined until they were back to the old symptoms. Oh, and we have antivirus installed on every computer, and we keep it up to date. Though I suppose the possibility is still there that someone could have spyware which is bogging down our internet, in which case what is the easiest/best way to find this out? (Maybe this should go in a separate question.) Thank you for your patience in reading all of this. Do you have any ideas as to what I can try? Is this something that you've experienced before? What sort of tools or methods can I use to try and diagnose the problem? P.S. Everything here is Windows: Windows Server 2003 and 2008 on our servers, and Windows XP on employees' machines.

Update: We are submitting a ticket to the university to just take a look and see if they see anything unusual and/or can suggest methods for us to try and pinpoint our problem. Hopefully they'll be helpful! I'll update this to let you know what goes on.

Update again: We found a hub (yes, a HUB) right between our campus connection and our firewall. It had only those two Ethernet cables plugged into it, nothing else. After removing the hub, our speeds have jumped up to several Mbps. However, in talking with the campus, we got them to run a gigabit line to our firewall in place of the 100 Mbps line. As of Friday, we are at about 65 Mbps up and down (according to speedtest.net at 8am)!! Go NC State!!
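The post mixes megabytes per second (download dialogs) with megabits per second (line rates and speedtest.net), so a quick conversion makes the numbers comparable. This is a small sketch assuming 8 bits per byte and decimal prefixes, ignoring protocol overhead; the function names are illustrative.

```python
BITS_PER_BYTE = 8


def mbytes_to_mbits(mbytes_per_s):
    """Convert MB/s to Mbps."""
    return mbytes_per_s * BITS_PER_BYTE


def mbits_to_mbytes(mbits_per_s):
    """Convert Mbps to MB/s."""
    return mbits_per_s / BITS_PER_BYTE


print(mbytes_to_mbits(10))    # the 10 MB/s night-time campus downloads, in Mbps
print(mbits_to_mbytes(100))   # ceiling of the old 100 Mbps line, in MB/s
print(mbits_to_mbytes(65))    # the 65 Mbps speedtest result, in MB/s
```

Under these assumptions, the 10 MB/s campus figure corresponds to 80 Mbps, so the 65 Mbps measured after the fix is in the same range as what the campus network delivers elsewhere.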

FFMPEG Segfault Solutions

- by Brentley_11

I'm trying to convert a bunch of movies into h.264 mp4's using FFMPEG. These movies are sourced from various portable camcorders such as the Flip Mino HD and the Kodak ZI8. One issue I'm having with video from the ZI8 is it seems to be causing FFMPEG to segfault.

Here is my command:

ffmpeg -i 'XmasSailor720p60fps.MOV' -threads 2 -acodec libfaac -ab 96kb -vcodec libx264 -vpre hq -b 500kb -s 484x272 XmasSailor.mp4

Here is the output:

FFmpeg version SVN-r20668, Copyright (c) 2000-2009 Fabrice Bellard, et al.
built on Dec 2 2009 18:37:34 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
configuration: --enable-libfaac --enable-libfaad --enable-libmp3lame --enable-libx264 --enable-gpl --enable-nonfree --enable-postproc --enable-pthreads --enable-shared
libavutil 50. 5. 1 / 50. 5. 1
libavcodec 52.42. 0 / 52.42. 0
libavformat 52.39. 2 / 52.39. 2
libavdevice 52. 2. 0 / 52. 2. 0
libswscale 0. 7. 2 / 0. 7. 2
libpostproc 51. 2. 0 / 51. 2. 0
Seems stream 0 codec frame rate differs from container frame rate: 59.94 (60000/1001) -> 29.97 (30000/1001)
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'XmasSailor720p60fps.MOV':
Duration: 00:00:05.37, start: 0.000000, bitrate: 12021 kb/s
Stream #0.0(eng): Video: h264, yuv420p, 1280x720 [PAR 1:1 DAR 16:9], 11994 kb/s, 29.97 tbr, 90k tbn, 59.94 tbc
Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 128 kb/s
Metadata
major_brand : qt
minor_version : 0
compatible_brands: qt
comment : KODAK Zi8 Pocket Video Camera
comment-eng : KODAK Zi8 Pocket Video Camera
[libx264 @ 0x99e1020]using SAR=1/1
[libx264 @ 0x99e1020]using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.1 Cache64
[libx264 @ 0x99e1020]profile High, level 2.1
Output #0, mp4, to 'XmasSailor.mp4':
Stream #0.0(eng): Video: libx264, yuv420p, 484x272 [PAR 1:1 DAR 121:68], q=10-51, 500 kb/s, 30k tbn, 29.97 tbc
Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 96 kb/s
Metadata
comment : Encoded with the Statusfirm Video Transcoder
Stream mapping:
Stream #0.0 -> #0.0
Stream #0.1 -> #0.1
Press [q] to stop encoding
[h264 @ 0x99de950]B picture before any references, skipping
[h264 @ 0x99de950]decode_slice_header error
[h264 @ 0x99de950]no frame!
Error while decoding stream #0.0
[h264 @ 0x99de950]B picture before any references, skipping
[h264 @ 0x99de950]decode_slice_header error
[h264 @ 0x99de950]no frame!
Error while decoding stream #0.0
frame= 20 fps= 0 q=13797729.0 size= 0kB time=0.66 bitrate= 0.6kbits/s
frame= 39 fps= 37 q=13797729.0 size= 0kB time=1.30 bitrate= 0.3kbits/s
frame= 48 fps= 30 q=33.0 size= 11kB time=0.10 bitrate= 903.0kbits/s
frame= 58 fps= 27 q=31.0 size= 22kB time=0.43 bitrate= 421.0kbits/s
frame= 67 fps= 25 q=29.0 size= 41kB time=0.73 bitrate= 462.6kbits/s
frame= 75 fps= 23 q=29.0 size= 59kB time=1.00 bitrate= 486.7kbits/s
frame= 83 fps= 22 q=29.0 size= 81kB time=1.27 bitrate= 521.9kbits/s
frame= 90 fps= 21 q=29.0 size= 97kB time=1.50 bitrate= 530.1kbits/s
frame= 98 fps= 20 q=29.0 size= 114kB time=1.77 bitrate= 526.9kbits/s
frame= 106 fps= 20 q=29.0 size= 134kB time=2.04 bitrate= 537.7kbits/s
frame= 114 fps= 19 q=29.0 size= 150kB time=2.30 bitrate= 533.7kbits/s
frame= 122 fps= 19 q=29.0 size= 172kB time=2.57 bitrate= 547.8kbits/s
frame= 130 fps= 19 q=29.0 size= 193kB time=2.84 bitrate= 557.5kbits/s
frame= 136 fps= 18 q=29.0 size= 211kB time=3.04 bitrate= 570.0kbits/s
frame= 144 fps= 18 q=29.0 size= 242kB time=3.30 bitrate= 599.5kbits/s
frame= 152 fps= 17 q=30.0 size= 261kB time=3.57 bitrate= 598.6kbits/s
frame= 157 fps= 15 q=-1.0 Lsize= 368kB time=5.21 bitrate= 579.3kbits/s
video:302kB audio:61kB global headers:0kB muxing overhead 1.416371%
[libx264 @ 0x99e1020]frame I:1 Avg QP:27.22 size: 8720
[libx264 @ 0x99e1020]frame P:48 Avg QP:25.15 size: 3759
[libx264 @ 0x99e1020]frame B:108 Avg QP:30.10 size: 1105
[libx264 @ 0x99e1020]consecutive B-frames: 0.6% 11.5% 28.8% 59.0%
[libx264 @ 0x99e1020]mb I I16..4: 28.5% 47.6% 23.9%
[libx264 @ 0x99e1020]mb P I16..4: 0.8% 1.3% 0.5% P16..4: 50.6% 17.7% 13.1% 0.0% 0.0% skip:15.9%
[libx264 @ 0x99e1020]mb B I16..4: 0.2% 0.3% 0.1% B16..8: 44.0% 1.2% 2.6% direct: 5.1% skip:46.5% L0:45.5% L1:51.0% BI: 3.5%
[libx264 @ 0x99e1020]final ratefactor: 23.51
[libx264 @ 0x99e1020]8x8 transform intra:49.9% inter:67.9%
[libx264 @ 0x99e1020]direct mvs spatial:98.1% temporal:1.9%
[libx264 @ 0x99e1020]coded y,uvDC,uvAC intra: 54.7% 76.1% 41.4% inter: 17.1% 24.4% 7.8%
[libx264 @ 0x99e1020]i16 v,h,dc,p: 18% 52% 5% 25%
[libx264 @ 0x99e1020]i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 12% 22% 9% 7% 10% 10% 9% 8% 13%
[libx264 @ 0x99e1020]i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 13% 18% 8% 8% 10% 13% 10% 9% 12%
[libx264 @ 0x99e1020]Weighted P-Frames: Y:10.4%
[libx264 @ 0x99e1020]ref P L0: 60.2% 15.3% 11.0% 7.6% 5.2% 0.7%
[libx264 @ 0x99e1020]ref B L0: 72.6% 15.6% 11.8%
[libx264 @ 0x99e1020]kb/s:471.17
Segmentation fault

I'm wondering if anyone else has run into similar issues. I wasn't able to find anything helpful via Google. Another question I have is whether anyone knows of a company that offers paid support for FFMPEG. Thank you for your time.
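When converting many files in a batch, it can help to record which inputs crash ffmpeg instead of watching the console. The sketch below is a hedged illustration, not a fix for the segfault: it wraps the command from the post with Python's `subprocess` module and relies on the POSIX convention that a child killed by a signal is reported with a negative return code. The helper names `describe_exit` and `convert` are hypothetical, and the flags are copied from the post as-is; newer ffmpeg builds may not accept all of them.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Explain a subprocess return code (negative means 'killed by signal' on POSIX)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            # Signal number unknown to this platform's signal table.
            return f"killed by signal {-returncode}"
    return f"failed with exit code {returncode}"


def convert(src, dst, ffmpeg="ffmpeg"):
    """Run the conversion command from the post on one file and report how it exited."""
    cmd = [
        ffmpeg, "-i", src, "-threads", "2",
        "-acodec", "libfaac", "-ab", "96kb",
        "-vcodec", "libx264", "-vpre", "hq",
        "-b", "500kb", "-s", "484x272", dst,
    ]
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)
```

A call such as `convert("XmasSailor720p60fps.MOV", "XmasSailor.mp4")` would return `"killed by SIGSEGV"` for a run like the one logged above. In that log the final size line and the x264 statistics are printed before the crash, so the output file may still be worth checking even when the exit status reports a segfault. When ffmpeg is started through a shell instead, the same crash is typically reported as exit status 139 (128 + 11).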

Search Results

Search found 14780 results on 592 pages for 'low level'.
