seagate - Page 14 - Developer IT

Kubuntu 12.04 - Touchpad and keyboard stopped working at random

- by StepTNT

As in the title, I've got this problem with my Kubuntu 12.04. At first I've thought that the whole system was hung, but it happened again 5 minutes ago and, while the keyboard and the touchpad stopped working, the music was still playing, so I guess that's just an "input" problem, because the system was still working! Any solution? Is there some data that you need to know about my setup? EDIT: Added my lshw outout description: Notebook product: N53SV () vendor: ASUSTeK Computer Inc. version: 1.0 serial: B2N0AS17695408A width: 64 bits capabilities: smbios-2.6 dmi-2.6 vsyscall32 configuration: boot=normal chassis=notebook family=N uuid=8083F2DA-A43E-E081-3F3F-BCAEC55F8AA1 *-core description: Motherboard product: N53SV vendor: ASUSTeK Computer Inc. physical id: 0 version: 1.0 serial: BSN12345678901234567 slot: MIDDLE *-firmware description: BIOS vendor: American Megatrends Inc. physical id: 0 version: N53SV.214 date: 08/10/2011 size: 64KiB capacity: 2496KiB capabilities: pci upgrade shadowing cdboot bootselect edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb smartbattery biosbootspecification *-cpu description: CPU product: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz vendor: Intel Corp. physical id: 4 bus info: cpu@0 version: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz serial: To Be Filled By O.E.M. slot: CPU 1 size: 800MHz capacity: 4GHz width: 64 bits clock: 100MHz capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts tpr_shadow vnmi flexpriority ept vpid cpufreq configuration: cores=4 enabledcores=1 threads=2 *-cache description: L1 cache physical id: 5 slot: L1-Cache size: 32KiB capacity: 32KiB capabilities: internal write-back instruction *-memory description: System Memory physical id: 40 slot: System board or motherboard size: 10GiB *-bank:0 description: SODIMM DDR3 Synchronous 1333 MHz (0,8 ns) product: 99U5428-040.A00LF vendor: Kingston physical id: 0 serial: 103C28C3 slot: ChannelA-DIMM0 size: 4GiB width: 64 bits clock: 1333MHz (0.8ns) *-bank:1 description: SODIMM DDR3 Synchronous 1333 MHz (0,8 ns) product: HMT325S6BFR8C-H9 vendor: Hynix/Hyundai physical id: 1 serial: 58383D1F slot: ChannelA-DIMM1 size: 2GiB width: 64 bits clock: 1333MHz (0.8ns) *-bank:2 description: SODIMM DDR3 Synchronous 1333 MHz (0,8 ns) product: HMT325S6BFR8C-H9 vendor: Hynix/Hyundai physical id: 2 serial: 58183D19 slot: ChannelB-DIMM0 size: 2GiB width: 64 bits clock: 1333MHz (0.8ns) *-bank:3 description: SODIMM DDR3 Synchronous 1333 MHz (0,8 ns) product: HMT325S6BFR8C-H9 vendor: Hynix/Hyundai physical id: 3 serial: 58183C8F slot: ChannelB-DIMM1 size: 2GiB width: 64 bits clock: 1333MHz (0.8ns) *-pci description: Host bridge product: 2nd Generation Core Processor Family DRAM Controller vendor: Intel Corporation physical id: 100 bus info: pci@0000:00:00.0 version: 09 width: 32 bits clock: 33MHz configuration: driver=agpgart-intel resources: irq:0 *-pci:0 description: PCI bridge product: Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port vendor: Intel Corporation physical id: 1 bus info: pci@0000:00:01.0 version: 09 width: 32 bits clock: 33MHz capabilities: pci pm msi pciexpress normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:40 ioport:d000(size=4096) memory:db000000-dc0fffff ioport:c0000000(size=301989888) *-generic UNCLAIMED description: Unassigned class product: Illegal Vendor ID vendor: Illegal Vendor ID physical id: 0 bus info: pci@0000:01:00.0 version: ff width: 32 bits clock: 66MHz capabilities: bus_master vga_palette cap_list configuration: latency=255 maxlatency=255 mingnt=255 resources: memory:db000000-dbffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:d000(size=128) memory:dc000000-dc07ffff *-display description: VGA compatible controller product: 2nd Generation Core Processor Family Integrated Graphics Controller vendor: Intel Corporation physical id: 2 bus info: pci@0000:00:02.0 version: 09 width: 64 bits clock: 33MHz capabilities: msi pm vga_controller bus_master cap_list rom configuration: driver=i915 latency=0 resources: irq:47 memory:dc400000-dc7fffff memory:b0000000-bfffffff ioport:e000(size=64) *-communication description: Communication controller product: 6 Series/C200 Series Chipset Family MEI Controller #1 vendor: Intel Corporation physical id: 16 bus info: pci@0000:00:16.0 version: 04 width: 64 bits clock: 33MHz capabilities: pm msi bus_master cap_list configuration: driver=mei latency=0 resources: irq:48 memory:df00b000-df00b00f *-usb:0 description: USB controller product: 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 vendor: Intel Corporation physical id: 1a bus info: pci@0000:00:1a.0 version: 05 width: 32 bits clock: 33MHz capabilities: pm debug ehci bus_master cap_list configuration: driver=ehci_hcd latency=0 resources: irq:16 memory:df008000-df0083ff *-multimedia description: Audio device product: 6 Series/C200 Series Chipset Family High Definition Audio Controller vendor: Intel Corporation physical id: 1b bus info: pci@0000:00:1b.0 version: 05 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list configuration: driver=snd_hda_intel latency=0 resources: irq:49 memory:df000000-df003fff *-pci:1 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 1 vendor: Intel Corporation physical id: 1c bus info: pci@0000:00:1c.0 version: b5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:41 ioport:c000(size=4096) memory:de600000-deffffff ioport:d4200000(size=10485760) *-pci:2 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 2 vendor: Intel Corporation physical id: 1c.1 bus info: pci@0000:00:1c.1 version: b5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:42 ioport:b000(size=4096) memory:ddc00000-de5fffff ioport:d3700000(size=10485760) *-network description: Wireless interface product: AR9285 Wireless Network Adapter (PCI-Express) vendor: Atheros Communications Inc. physical id: 0 bus info: pci@0000:03:00.0 logical name: wlan0 version: 01 serial: 48:5d:60:f2:2c:fd width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list ethernet physical wireless configuration: broadcast=yes driver=ath9k driverversion=3.2.0-24-generic firmware=N/A ip=192.168.1.6 latency=0 link=yes multicast=yes wireless=IEEE 802.11bgn resources: irq:17 memory:ddc00000-ddc0ffff *-pci:3 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 4 vendor: Intel Corporation physical id: 1c.3 bus info: pci@0000:00:1c.3 version: b5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:43 ioport:a000(size=4096) memory:dd200000-ddbfffff ioport:d2c00000(size=10485760) *-usb description: USB controller product: FL1000G USB 3.0 Host Controller vendor: Fresco Logic physical id: 0 bus info: pci@0000:04:00.0 version: 04 width: 32 bits clock: 33MHz capabilities: pm msi pciexpress xhci bus_master cap_list configuration: driver=xhci_hcd latency=0 resources: irq:19 memory:dd200000-dd20ffff *-pci:4 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 6 vendor: Intel Corporation physical id: 1c.5 bus info: pci@0000:00:1c.5 version: b5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:44 ioport:9000(size=4096) memory:dc800000-dd1fffff ioport:d2100000(size=10485760) *-network description: Ethernet interface product: RTL8111/8168B PCI Express Gigabit Ethernet controller vendor: Realtek Semiconductor Co., Ltd. physical id: 0 bus info: pci@0000:05:00.0 logical name: eth0 version: 06 serial: bc:ae:c5:5f:8a:a1 size: 10Mbit/s capacity: 1Gbit/s width: 64 bits clock: 33MHz capabilities: pm msi pciexpress msix vpd bus_master cap_list ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=2.3LK-NAPI duplex=half firmware=rtl_nic/rtl8168e-2.fw latency=0 link=no multicast=yes port=MII speed=10Mbit/s resources: irq:46 ioport:9000(size=256) memory:d2104000-d2104fff memory:d2100000-d2103fff *-usb:1 description: USB controller product: 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 vendor: Intel Corporation physical id: 1d bus info: pci@0000:00:1d.0 version: 05 width: 32 bits clock: 33MHz capabilities: pm debug ehci bus_master cap_list configuration: driver=ehci_hcd latency=0 resources: irq:23 memory:df007000-df0073ff *-isa description: ISA bridge product: HM65 Express Chipset Family LPC Controller vendor: Intel Corporation physical id: 1f bus info: pci@0000:00:1f.0 version: 05 width: 32 bits clock: 33MHz capabilities: isa bus_master cap_list configuration: latency=0 *-storage description: SATA controller product: 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller vendor: Intel Corporation physical id: 1f.2 bus info: pci@0000:00:1f.2 logical name: scsi0 logical name: scsi2 version: 05 width: 32 bits clock: 66MHz capabilities: storage msi pm ahci_1.0 bus_master cap_list emulated configuration: driver=ahci latency=0 resources: irq:45 ioport:e0b0(size=8) ioport:e0a0(size=4) ioport:e090(size=8) ioport:e080(size=4) ioport:e060(size=32) memory:df006000-df0067ff *-disk description: ATA Disk product: ST9750420AS vendor: Seagate physical id: 0 bus info: scsi@0:0.0.0 logical name: /dev/sda version: 0002 serial: 5WS0A7QR size: 698GiB (750GB) capabilities: partitioned partitioned:dos configuration: ansiversion=5 signature=e0c5913d *-volume:0 description: Windows FAT volume vendor: MSDOS5.0 physical id: 1 bus info: scsi@0:0.0.0,1 logical name: /dev/sda1 version: FAT32 serial: 4ce5-3acb size: 3004MiB capacity: 3004MiB capabilities: primary fat initialized configuration: FATs=2 filesystem=fat *-volume:1 description: EXT4 volume vendor: Linux physical id: 2 bus info: scsi@0:0.0.0,2 logical name: /dev/sda2 logical name: / version: 1.0 serial: c198cc2a-d86a-4460-a4d5-3fc0b21e439c size: 28GiB capacity: 28GiB capabilities: primary journaled extended_attributes large_files huge_files dir_nlink recover extents ext4 ext2 initialized configuration: created=2012-03-15 16:53:54 filesystem=ext4 lastmountpoint=/ modified=2012-05-02 18:52:04 mount.fstype=ext4 mount.options=rw,relatime,errors=remount-ro,user_xattr,acl,barrier=1,data=ordered mounted=2012-05-09 19:06:01 state=mounted *-volume:2 description: Windows NTFS volume physical id: 3 bus info: scsi@0:0.0.0,3 logical name: /dev/sda3 version: 3.1 serial: 4c1cdebc-ec09-2947-a3b5-c1f9f1cddc1c size: 152GiB capacity: 152GiB capabilities: primary bootable ntfs initialized configuration: clustersize=4096 created=2011-02-22 16:02:47 filesystem=ntfs label=OS state=clean *-volume:3 description: Extended partition physical id: 4 bus info: scsi@0:0.0.0,4 logical name: /dev/sda4 size: 514GiB capacity: 514GiB capabilities: primary extended partitioned partitioned:extended *-logicalvolume:0 description: Linux swap / Solaris partition physical id: 5 logical name: /dev/sda5 capacity: 10GiB capabilities: nofs *-logicalvolume:1 description: HPFS/NTFS partition physical id: 6 logical name: /dev/sda6 capacity: 504GiB *-cdrom description: DVD-RAM writer product: BD-MLT UJ240AS vendor: MATSHITA physical id: 1 bus info: scsi@2:0.0.0 logical name: /dev/cdrom logical name: /dev/cdrw logical name: /dev/dvd logical name: /dev/dvdrw logical name: /dev/sr0 version: 1.00 capabilities: removable audio cd-r cd-rw dvd dvd-r dvd-ram configuration: ansiversion=5 status=nodisc *-serial UNCLAIMED description: SMBus product: 6 Series/C200 Series Chipset Family SMBus Controller vendor: Intel Corporation physical id: 1f.3 bus info: pci@0000:00:1f.3 version: 05 width: 64 bits clock: 33MHz configuration: latency=0 resources: memory:df005000-df0050ff ioport:e040(size=32)

Read the article

laptop crashed: why?

- by sds

my linux (ubuntu 12.04) laptop crashed, and I am trying to figure out why. # last sds pts/4 :0 Tue Sep 4 10:01 still logged in sds pts/3 :0 Tue Sep 4 10:00 still logged in reboot system boot 3.2.0-29-generic Tue Sep 4 09:43 - 11:23 (01:40) sds pts/8 :0 Mon Sep 3 14:23 - crash (19:19) this seems to indicate a crash at 09:42 (= 14:23+19:19). as per another question, I looked at /var/log: auth.log: Sep 4 09:17:02 t520sds CRON[32744]: pam_unix(cron:session): session closed for user root Sep 4 09:43:17 t520sds lightdm: pam_unix(lightdm:session): session opened for user lightdm by (uid=0) no messages file syslog: Sep 4 09:24:19 t520sds kernel: [219104.819975] CPU0: Package power limit normal Sep 4 09:43:16 t520sds kernel: imklog 5.8.6, log source = /proc/kmsg started. kern.log: Sep 4 09:24:19 t520sds kernel: [219104.819969] CPU1: Package power limit normal Sep 4 09:24:19 t520sds kernel: [219104.819971] CPU2: Package power limit normal Sep 4 09:24:19 t520sds kernel: [219104.819974] CPU3: Package power limit normal Sep 4 09:24:19 t520sds kernel: [219104.819975] CPU0: Package power limit normal Sep 4 09:43:16 t520sds kernel: imklog 5.8.6, log source = /proc/kmsg started. Sep 4 09:43:16 t520sds kernel: [ 0.000000] Initializing cgroup subsys cpuset Sep 4 09:43:16 t520sds kernel: [ 0.000000] Initializing cgroup subsys cpu I had a computation running until 9:24, but the system crashed 18 minutes later! kern.log has many pages of these: Sep 4 09:43:16 t520sds kernel: [ 0.000000] total RAM covered: 8086M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 64K num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 128K num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 256K num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 512K num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 1M num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 2M num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 4M num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 8M num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 16M num_reg: 10 lose cover RAM: 38M Sep 4 09:43:16 t520sds kernel: [ 0.000000] *BAD*gran_size: 64K chunk_size: 32M num_reg: 10 lose cover RAM: -16M Sep 4 09:43:16 t520sds kernel: [ 0.000000] *BAD*gran_size: 64K chunk_size: 64M num_reg: 10 lose cover RAM: -16M Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 128M num_reg: 10 lose cover RAM: 0G Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 256M num_reg: 10 lose cover RAM: 0G Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 512M num_reg: 10 lose cover RAM: 0G Sep 4 09:43:16 t520sds kernel: [ 0.000000] gran_size: 64K chunk_size: 1G num_reg: 10 lose cover RAM: 0G Sep 4 09:43:16 t520sds kernel: [ 0.000000] *BAD*gran_size: 64K chunk_size: 2G num_reg: 10 lose cover RAM: -1G does this mean that my RAM is bad?! it also says Sep 4 09:43:16 t520sds kernel: [ 2.944123] EXT4-fs (sda1): INFO: recovery required on readonly filesystem Sep 4 09:43:16 t520sds kernel: [ 2.944126] EXT4-fs (sda1): write access will be enabled during recovery Sep 4 09:43:16 t520sds kernel: [ 3.088001] firewire_core: created device fw0: GUID f0def1ff8fbd7dff, S400 Sep 4 09:43:16 t520sds kernel: [ 8.929243] EXT4-fs (sda1): orphan cleanup on readonly fs Sep 4 09:43:16 t520sds kernel: [ 8.929249] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 658984 ... Sep 4 09:43:16 t520sds kernel: [ 9.343266] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 525343 Sep 4 09:43:16 t520sds kernel: [ 9.343270] EXT4-fs (sda1): 56 orphan inodes deleted Sep 4 09:43:16 t520sds kernel: [ 9.343271] EXT4-fs (sda1): recovery complete Sep 4 09:43:16 t520sds kernel: [ 9.645799] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null) does this mean my HD is bad? As per FaultyHardware, I tried smartctl -l selftest, which uncovered no errors: smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-30-generic] (local build) Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Model Family: Seagate Momentus 7200.4 Device Model: ST9500420AS Serial Number: 5VJE81YK LU WWN Device Id: 5 000c50 0440defe3 Firmware Version: 0003LVM1 User Capacity: 500,107,862,016 bytes [500 GB] Sector Size: 512 bytes logical/physical Device is: In smartctl database [for details use: -P show] ATA Version is: 8 ATA Standard is: ATA-8-ACS revision 4 Local Time is: Mon Sep 10 16:40:04 2012 EDT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED See vendor-specific Attribute list for marginal Attributes. General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 0) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 109) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x103b) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 117 099 034 Pre-fail Always - 162843537 3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 571 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0 7 Seek_Error_Rate 0x000f 069 060 030 Pre-fail Always - 17210154023 9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 174362787320258 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 571 184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 188 Command_Timeout 0x0032 100 100 000 Old_age Always - 1 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0 190 Airflow_Temperature_Cel 0x0022 061 043 045 Old_age Always In_the_past 39 (0 11 44 26) 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 84 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 20 193 Load_Cycle_Count 0x0032 099 099 000 Old_age Always - 2434 194 Temperature_Celsius 0x0022 039 057 000 Old_age Always - 39 (0 15 0 0) 195 Hardware_ECC_Recovered 0x001a 041 041 000 Old_age Always - 162843537 196 Reallocated_Event_Count 0x000f 095 095 030 Pre-fail Always - 4540 (61955, 0) 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Completed without error 00% 4545 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. Googling for the messages proved inconclusive, I can't even figure out whether the messages are routine or catastrophic. So, what do I do now?

Read the article

How do I make Linux recognize a new SATA /dev/sda drive I hot swapped in without rebooting?

- by Philip Durbin

Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized: [root@fs-2 ~]# tail -18 /var/log/messages May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake } May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset May 5 16:54:45 fs-2 kernel: ata1: soft resetting link May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:54:55 fs-2 kernel: ata1: soft resetting link May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:05 fs-2 kernel: ata1: soft resetting link May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps May 5 16:55:40 fs-2 kernel: ata1: soft resetting link May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up May 5 16:55:45 fs-2 kernel: ata1: EH complete I tried a couple things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh but they didn't work: [root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan -bash: echo: write error: Invalid argument [root@fs-2 ~]# [root@fs-2 ~]# /root/rescan-scsi-bus.sh -l [snip] 0 new device(s) found. 0 device(s) removed. [root@fs-2 ~]# [root@fs-2 ~]# ls /dev/sda ls: /dev/sda: No such file or directory I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting? The operating system in question is RHEL5.3: [root@fs-2 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.3 (Tikanga) The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS. Here is the lscpi output: [root@fs-2 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers to look more into the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding what exactly isn't supported hardware-wise in terms of hot swap for that system. Please let me know what other output besides lspci might be useful. The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output: [root@www-1 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)

Read the article

Confirm disk is broken when it passes all diagnostics

- by Halfgaar

I have a system with a potentially broken disk, but the disk passes all manner of diagnostics. I have been unable to confirm that the disk is broken. What are my options? I could just replace the disk, but because this situation is very similar to another more severe situation I have (long story), I'd like to actually make a proper diagnosis as opposed to randomly binning hardware. The issue and history is this: I had a Debian Linux PC (500 MHz P3) acting as router, nagios and munin. It crashed every couple of weeks. No logs or dmesg could be obtained (because it's an old Compaq that only boots when you configure it as keyboardless, making connecting a keyboard later, once it's booted, impossible). At the time, I just replaced the computer with another Compaq (P4 2.4 GHz) because I thought the hardware was faulty. However, it still crashed every couple of weeks. the difference is that on this computer, I can still SSH into it. It gives all kinds of errors on hda. I'd like to confirm that the disk is broken, but nothing I do confirms this: SMART error logs shows no errors. Normally when a disk starts acting up, SMART my pass, but it still records a read-error in the error log. SMART self-test (smartctl -t long /dev/sda) completes without errors. re-allocated sector count (a tell-tale parameter) has been 31 all its life, even when the disk was still in use in my desktop PC years ago, and it still is. The figure never changed. dd if=/dev/sda of=/dev/null bs=4096 passes with flying colors. What else can I do to assess the health of the drive? Again, this is not about making this router fully functional again, this is a disk forensic question, because it just so happens that I have another server that potentially has the same problem, and knowing the answer to this will possibly help me greatly. For the record, below are logs and such. This is the smartctl -a output: smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family Device Model: ST3120026A Serial Number: 5JT1CLQM Firmware Version: 3.06 User Capacity: 120,034,123,776 bytes Device is: In smartctl database [for details use: -P show] ATA Version is: 6 ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2 Local Time is: Mon Jul 1 21:18:33 2013 CEST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 24) The self-test routine was aborted by the host. Total time to complete Offline data collection: ( 430) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. No General Purpose Logging support. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 85) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 050 046 006 Pre-fail Always - 47766662 3 Spin_Up_Time 0x0003 097 096 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 10 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 31 7 Seek_Error_Rate 0x000f 084 060 030 Pre-fail Always - 820305 9 Power_On_Hours 0x0032 048 048 000 Old_age Always - 46373 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 605 194 Temperature_Celsius 0x0022 036 065 000 Old_age Always - 36 195 Hardware_ECC_Recovered 0x001a 050 046 000 Old_age Always - 47766662 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x003e 200 196 000 Old_age Always - 6 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0 202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Aborted by host 80% 46361 - # 2 Extended offline Completed without error 00% 46358 - # 3 Short offline Completed without error 00% 12046 - # 4 Extended offline Completed without error 00% 10472 - # 5 Short offline Completed without error 00% 10471 - # 6 Short offline Completed without error 00% 10471 - # 7 Short offline Completed without error 00% 6770 - # 8 Extended offline Aborted by host 90% 5958 - # 9 Extended offline Aborted by host 90% 5951 - #10 Short offline Completed without error 00% 5024 - #11 Extended offline Aborted by host 80% 5024 - #12 Short offline Completed without error 00% 3697 - #13 Short offline Completed without error 00% 237 - #14 Short offline Completed without error 00% 145 - #15 Short offline Completed without error 00% 69 - #16 Extended offline Completed without error 00% 68 - #17 Short offline Completed without error 00% 66 - #18 Short offline Completed without error 00% 49 - #19 Short offline Completed without error 00% 29 - #20 Short offline Completed without error 00% 29 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. And this is the dmesg error when it has crashed (which repeats for a bunch of different sectors): [1755091.211136] sd 0:0:0:0: [sda] Unhandled error code [1755091.211144] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK [1755091.211151] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 08 fe ad 38 00 00 08 00 [1755091.211166] end_request: I/O error, dev sda, sector 150908216

Read the article

Skipping scheduled self-tests and predicting drive EOL

- by Steve Madsen

For a few weeks now, smartd has been reporting that it is skipping some of its scheduled self-tests on the weekends: Apr 24 18:29:32 calvin smartd[4758]: Device: /dev/sda, skip scheduled Offline Immediate Test; 40% remaining of current Self-Test. Apr 24 18:29:33 calvin smartd[4758]: Device: /dev/sdb, skip scheduled Offline Immediate Test; 50% remaining of current Self-Test. The drives in this RAID-1 array are set to run an offline test four times a day, a short self-test at 2am every day, and a long self-test on Saturdays at 2am. For some reason, it looks like the long self-test is taking longer, causing the other scheduled tests to be skipped. First question: is this a sign of likely drive failure? Then today, smartd reported that a self-test failed. Here is the output of smartctl -a /dev/sdb: smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.8 family Device Model: ST3250823AS Serial Number: 3ND1GNBC Firmware Version: 3.03 User Capacity: 250,059,350,016 bytes Device is: In smartctl database [for details use: -P show] ATA Version is: 7 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Sun Apr 25 13:15:34 2010 EDT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 430) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 84) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 047 039 006 Pre-fail Always - 168450357 3 Spin_Up_Time 0x0003 098 098 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 33 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 9 7 Seek_Error_Rate 0x000f 087 060 030 Pre-fail Always - 654745480 9 Power_On_Hours 0x0032 055 055 000 Old_age Always - 40141 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 51 194 Temperature_Celsius 0x0022 037 062 000 Old_age Always - 37 (0 17 0 0) 195 Hardware_ECC_Recovered 0x001a 047 039 000 Old_age Always - 168450357 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 40131 - # 2 Extended offline Completed: read failure 30% 40129 379795511 # 3 Short offline Completed without error 00% 40084 - # 4 Short offline Completed without error 00% 40060 - # 5 Short offline Completed without error 00% 40036 - # 6 Short offline Completed without error 00% 40013 - # 7 Short offline Completed without error 00% 39990 - # 8 Extended offline Completed without error 00% 39977 - # 9 Short offline Completed without error 00% 39919 - #10 Short offline Completed without error 00% 39895 - #11 Short offline Completed without error 00% 39872 - #12 Short offline Completed without error 00% 39848 - #13 Short offline Completed without error 00% 39824 - #14 Short offline Completed without error 00% 39801 - #15 Extended offline Completed without error 00% 39789 - #16 Short offline Completed without error 00% 39754 - #17 Short offline Completed without error 00% 39732 - #18 Short offline Completed without error 00% 39707 - #19 Short offline Completed without error 00% 39683 - #20 Short offline Completed without error 00% 39660 - #21 Short offline Completed without error 00% 39636 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. Given that this drive is about 4.5 years old, I am probably tempting fate by keeping it in service. SMART doesn't seem to get much respect as a reliable way to predict drive failure. What else can I use to get an early indication of drive failure?

Read the article

How do I make Linux recognize a new SATA /dev/sda drive I hot swapped in without rebooting?

- by Philip Durbin

Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized: [root@fs-2 ~]# tail -18 /var/log/messages May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake } May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset May 5 16:54:45 fs-2 kernel: ata1: soft resetting link May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:54:55 fs-2 kernel: ata1: soft resetting link May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:05 fs-2 kernel: ata1: soft resetting link May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0) May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps May 5 16:55:40 fs-2 kernel: ata1: soft resetting link May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16) May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up May 5 16:55:45 fs-2 kernel: ata1: EH complete I tried a couple things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh but they didn't work: [root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan -bash: echo: write error: Invalid argument [root@fs-2 ~]# [root@fs-2 ~]# /root/rescan-scsi-bus.sh -l [snip] 0 new device(s) found. 0 device(s) removed. [root@fs-2 ~]# [root@fs-2 ~]# ls /dev/sda ls: /dev/sda: No such file or directory I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting? The operating system in question is RHEL5.3: [root@fs-2 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.3 (Tikanga) The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS. Here is the lscpi output: [root@fs-2 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers to look more into the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding what exactly isn't supported hardware-wise in terms of hot swap for that system. Please let me know what other output besides lspci might be useful. The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output: [root@www-1 ~]# lspci 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2) 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3) 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1) 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2) 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1) 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3) 00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2) 00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3) 00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control 00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control 03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02) 04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06) 09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)

Read the article

e2fsck extremly slow, although enough memory exists

- by kaefert

I've got this external USB-Disk: kaefert@blechmobil:~$ lsusb -s 2:3 Bus 002 Device 003: ID 0bc2:3320 Seagate RSS LLC As can be seen in this dmesg output, there are some problems that prevents that disk from beeing mounted: kaefert@blechmobil:~$ dmesg | grep sdb [ 114.474342] sd 5:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) [ 114.475089] sd 5:0:0:0: [sdb] Write Protect is off [ 114.475092] sd 5:0:0:0: [sdb] Mode Sense: 43 00 00 00 [ 114.475959] sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 114.477093] sd 5:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) [ 114.501649] sdb: sdb1 [ 114.502717] sd 5:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) [ 114.504354] sd 5:0:0:0: [sdb] Attached SCSI disk [ 116.804408] EXT4-fs (sdb1): ext4_check_descriptors: Checksum for group 3976 failed (47397!=61519) [ 116.804413] EXT4-fs (sdb1): group descriptors corrupted! So I went and fired up my favorite partition manager - gparted, and told it to verify and repair the partition sdb1. This made gparted call e2fsck (version 1.42.4 (12-Jun-2012)) e2fsck -f -y -v /dev/sdb1 Although gparted called e2fsck with the "-v" option, sadly it doesn't show me the output of my e2fsck process (bugreport https://bugzilla.gnome.org/show_bug.cgi?id=467925 ) I started this whole thing on Sunday (2012-11-04_2200) evening, so about 48 hours ago, this is what htop says about it now (2012-11-06-1900): PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command 3704 root 39 19 1560M 1166M 768 R 98.0 19.5 42h56:43 e2fsck -f -y -v /dev/sdb1 Now I found a few posts on the internet that discuss e2fsck running slow, for example: http://gparted-forum.surf4.info/viewtopic.php?id=13613 where they write that its a good idea to see if the disk is just that slow because maybe its damaged, and I think these outputs tell me that this is not the case in my case: kaefert@blechmobil:~$ sudo hdparm -tT /dev/sdb /dev/sdb: Timing cached reads: 3562 MB in 2.00 seconds = 1783.29 MB/sec Timing buffered disk reads: 82 MB in 3.01 seconds = 27.26 MB/sec kaefert@blechmobil:~$ sudo hdparm /dev/sdb /dev/sdb: multcount = 0 (off) readonly = 0 (off) readahead = 256 (on) geometry = 364801/255/63, sectors = 5860533160, start = 0 However, although I can read quickly from that disk, this disk speed doesn't seem to be used by e2fsck, considering tools like gkrellm or iotop or this: kaefert@blechmobil:~$ iostat -x Linux 3.2.0-2-amd64 (blechmobil) 2012-11-06 _x86_64_ (2 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 14,24 47,81 14,63 0,95 0,00 22,37 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0,59 8,29 2,42 5,14 43,17 160,17 53,75 0,30 39,80 8,72 54,42 3,95 2,99 sdb 137,54 5,48 9,23 0,20 587,07 22,73 129,35 0,07 7,70 7,51 16,18 2,17 2,04 Now I researched a little bit on how to find out what e2fsck is doing with all that processor time, and I found the tool strace, which gives me this: kaefert@blechmobil:~$ sudo strace -p3704 lseek(4, 41026998272, SEEK_SET) = 41026998272 write(4, "\212\354K[_\361\3nl\212\245\352\255jR\303\354\312Yv\334p\253r\217\265\3567\325\257\3766"..., 4096) = 4096 lseek(4, 48404766720, SEEK_SET) = 48404766720 read(4, "\7t\260\366\346\337\304\210\33\267j\35\377'\31f\372\252\ffU\317.y\211\360\36\240c\30`\34"..., 4096) = 4096 lseek(4, 41027002368, SEEK_SET) = 41027002368 write(4, "\232]7Ws\321\352\t\1@[+5\263\334\276{\343zZx\352\21\316`1\271[\202\350R`"..., 4096) = 4096 lseek(4, 48404770816, SEEK_SET) = 48404770816 read(4, "\17\362r\230\327\25\346//\210H\v\311\3237\323K\304\306\361a\223\311\324\272?\213\tq \370\24"..., 4096) = 4096 lseek(4, 41027006464, SEEK_SET) = 41027006464 write(4, "\367yy>x\216?=\324Z\305\351\376&\25\244\210\271\22\306}\276\237\370(\214\205G\262\360\257#"..., 4096) = 4096 lseek(4, 48404774912, SEEK_SET) = 48404774912 read(4, "\365\25\0\21|T\0\21}3t_\272\373\222k\r\177\303\1\201\261\221$\261B\232\3142\21U\316"..., 4096) = 4096 ^CProcess 3704 detached around 16 of these lines every second, so 4 read and 4 write operations every second, which I don't consider to be a lot.. And finally, my question: Will this process ever finish? If those numbers from fseek (48404774912) represent bytes, that would be something like 45 gigabytes, with this beeing a 3 terrabyte disk, which would give me 134 days to go, if the speed stays constant, and he scans the disk like this completly and only once. Do you have some advice for me? I have most of the data on that disk elsewhere, but I've put a lot of hours into sorting and merging it to this disk, so I would prefer to getting this disk up and running again, without formatting it anew. I don't think that the hardware is damaged since the disk is only a few months and since I can't see any I/O errors in the dmesg output. UPDATE: I just looked at the strace output again (2012-11-06_2300), now it looks like this: lseek(4, 1419860611072, SEEK_SET) = 1419860611072 read(4, "3#\f\2447\335\0\22A\355\374\276j\204'\207|\217V|\23\245[\7VP\251\242\276\207\317:"..., 4096) = 4096 lseek(4, 43018145792, SEEK_SET) = 43018145792 write(4, "]\206\231\342Y\204-2I\362\242\344\6R\205\361\324\177\265\317C\334V\324\260\334\275t=\10F."..., 4096) = 4096 lseek(4, 1419860615168, SEEK_SET) = 1419860615168 read(4, "\262\305\314Y\367\37x\326\245\226\226\320N\333$s\34\204\311\222\7\315\236\336\300TK\337\264\236\211n"..., 4096) = 4096 lseek(4, 43018149888, SEEK_SET) = 43018149888 write(4, "\271\224m\311\224\25!I\376\16;\377\0\223H\25Yd\201Y\342\r\203\271\24eG<\202{\373V"..., 4096) = 4096 lseek(4, 1419860619264, SEEK_SET) = 1419860619264 read(4, ";d\360\177\n\346\253\210\222|\250\352T\335M\33\260\320\261\7g\222P\344H?t\240\20\2548\310"..., 4096) = 4096 lseek(4, 43018153984, SEEK_SET) = 43018153984 write(4, "\360\252j\317\310\251G\227\335{\214`\341\267\31Y\202\360\v\374\307oq\3063\217Z\223\313\36D\211"..., 4096) = 4096 So this number of the lseeks before the reads, like 1419860619264 are already a lot bigger, standing for 1.29 terabytes if the numbers are bytes, so it doesn't seem to be a linear progress on a big scale, maybe there are only some areas that need work, that have big gaps in between them. (times are in CET)

Read the article

Drivers for NVIDIA 520M not working in Ubuntu 12.04

- by Don

I am aware that this is nominally a duplicate question, however I've read the other questions and haven't been able to resolve my problem after many hours and attempts, so please don't delete it. Additionally, it seems like many answers to the other questions are specifically dependent on certain situations. My situation being different from the others I found represented, here's my question. Until last night, I had Ubuntu 12.04 installed with Wubi, and it ran ok, though slowly and with occasional hangs. So I partitioned the drive and installed 12.04 in its own partition. Now when I start it, I am stuck using 2D. I believe this is an NVIDIA bug. My NVIDIA card is a GT 520M and my machine has Optimus. Additional Drivers only displays my wireless driver. Going to System Settings Details Graphics shows Driver:Unknown, Experience:Standard. I downloaded the driver from the NVIDIA website, and ran the installer with no errors, except that the "distribution-provided pre-install script failed". After rebooting, my screen was stuck at 640X480, which was fixed by editing /etc/X11/xorg.conf However, I still was stuck in 2D, and nothing else had changed either. A thread suggested something called Bumblebee. I tried that, and when I ran optirun firefoxI got a frozen blank screen. Following another suggestion, I checked the BIOS to try and disable Optimus. I found and ran myriad other commands to try and fix the problem and nothing changed. Now I have just done a clean re-install of Ubuntu. From there, I: Installed all the updates Downloaded the NVIDIA driver Installed it Got screen stuck at 640X480, fixed in xorg.conf. To recap the problem: I can't get the NVIDIA drivers working I am stuck using 2D I'm an idiot I think if the first one is solved, the solution to the second will naturally follow. If you need me to provide any other information, I'd be happy to. From what I've seen in other threads, I think this information may help: lsmod: dh@donsMachine:~$ lsmod Module Size Used by nvidia 12353161 0 snd_hda_codec_hdmi 32474 1 snd_hda_codec_realtek 223867 1 joydev 17693 0 parport_pc 32866 0 ppdev 17113 0 rfcomm 47604 0 bnep 18281 2 bluetooth 180104 10 rfcomm,bnep snd_hda_intel 33773 3 snd_hda_codec 127706 3 snd_hda_codec_hdmi,snd_hda_codec_realtek,snd_hda_intel snd_hwdep 13668 1 snd_hda_codec snd_pcm 97188 3 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec uvcvideo 72627 0 videodev 98259 1 uvcvideo v4l2_compat_ioctl32 17128 1 videodev snd_seq_midi 13324 0 snd_rawmidi 30748 1 snd_seq_midi snd_seq_midi_event 14899 1 snd_seq_midi snd_seq 61896 2 snd_seq_midi,snd_seq_midi_event lib80211_crypt_tkip 17390 0 wl 2568210 0 lib80211 14381 2 lib80211_crypt_tkip,wl snd_timer 29990 2 snd_pcm,snd_seq snd_seq_device 14540 3 snd_seq_midi,snd_rawmidi,snd_seq snd 78855 16 snd_hda_codec_hdmi,snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_rawmidi,snd_seq,snd_timer,snd_seq_device psmouse 87692 0 serio_raw 13211 0 i915 468745 2 soundcore 15091 1 snd snd_page_alloc 18529 2 snd_hda_intel,snd_pcm drm_kms_helper 46978 1 i915 drm 242038 3 i915,drm_kms_helper mei 41616 0 i2c_algo_bit 13423 1 i915 mxm_wmi 12979 0 acer_wmi 28418 0 sparse_keymap 13890 1 acer_wmi video 19596 1 i915 wmi 19256 2 mxm_wmi,acer_wmi mac_hid 13253 0 lp 17799 0 parport 46562 3 parport_pc,ppdev,lp tg3 152032 0 sdhci_pci 18826 0 sdhci 33205 1 sdhci_pci lspci -nn | grep VGA dh@donsMachine:~$ lspci -nn | grep VGA 00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0116] (rev 09) 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:0df7] (rev a1) lshw dh@donsMachine:~$ sudo lshw [sudo] password for dh: donsmachine description: Notebook product: EasyNote TS44HR () vendor: Packard Bell version: V1.12 serial: LXBWZ02017134209D71601 width: 64 bits capabilities: smbios-2.7 dmi-2.7 vsyscall32 configuration: boot=normal chassis=notebook uuid=16FE576B-CA15-11E0-B096-B870F4E51243 *-core description: Motherboard product: SJV50_HR vendor: Packard Bell physical id: 0 version: Base Board Version serial: Base Board Serial Number slot: Base Board Chassis Location *-firmware description: BIOS vendor: Packard Bell physical id: 0 version: V1.12 date: 07/11/2011 size: 1MiB capacity: 2496KiB capabilities: pci upgrade shadowing cdboot bootselect edd int13floppynec int13floppytoshiba int13floppy360 int13floppy1200 int13floppy720 int13floppy2880 int9keyboard int10video acpi usb biosbootspecification *-memory description: System Memory physical id: 1b slot: System board or motherboard size: 4GiB *-bank:0 description: SODIMM DDR3 Synchronous 1333 MHz (0.8 ns) product: NT2GC64B88B0NS-CG vendor: Nanya Technology physical id: 0 serial: 598E126E slot: ChannelA-DIMM0 size: 2GiB width: 64 bits clock: 1333MHz (0.8ns) *-bank:1 description: DIMM [empty] physical id: 1 slot: ChannelA-DIMM1 *-bank:2 description: SODIMM DDR3 Synchronous 1333 MHz (0.8 ns) product: NT2GC64B88B0NS-CG vendor: Nanya Technology physical id: 2 serial: 159E126C slot: ChannelB-DIMM0 size: 2GiB width: 64 bits clock: 1333MHz (0.8ns) *-bank:3 description: DIMM [empty] physical id: 3 slot: ChannelB-DIMM1 *-cpu description: CPU product: Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz vendor: Intel Corp. physical id: 2e bus info: cpu@0 version: Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz slot: CPU1 size: 2GHz capacity: 4GHz width: 64 bits clock: 1333MHz capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm arat epb xsaveopt pln pts tpr_shadow vnmi flexpriority ept vpid cpufreq configuration: cores=2 enabledcores=2 threads=4 *-cache:0 description: L1 cache physical id: 30 slot: L1 Cache size: 32KiB capacity: 32KiB capabilities: synchronous internal write-through instruction *-cache:1 description: L2 cache physical id: 31 slot: L2 Cache size: 256KiB capacity: 256KiB capabilities: synchronous internal write-through unified *-cache:2 description: L3 cache physical id: 32 slot: L3 Cache size: 3MiB capacity: 3MiB capabilities: synchronous internal write-through unified *-cache description: L1 cache physical id: 2f slot: L1 Cache size: 32KiB capacity: 32KiB capabilities: synchronous internal write-through data *-pci description: Host bridge product: 2nd Generation Core Processor Family DRAM Controller vendor: Intel Corporation physical id: 100 bus info: pci@0000:00:00.0 version: 09 width: 32 bits clock: 33MHz configuration: driver=agpgart-intel resources: irq:0 *-pci:0 description: PCI bridge product: Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port vendor: Intel Corporation physical id: 1 bus info: pci@0000:00:01.0 version: 09 width: 32 bits clock: 33MHz capabilities: pci pm msi pciexpress normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:40 ioport:2000(size=4096) memory:d0000000-d10fffff ioport:a0000000(size=301989888) *-display description: VGA compatible controller product: NVIDIA Corporation vendor: NVIDIA Corporation physical id: 0 bus info: pci@0000:01:00.0 version: a1 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress vga_controller bus_master cap_list rom configuration: driver=nvidia latency=0 resources: irq:16 memory:d0000000-d0ffffff memory:a0000000-afffffff memory:b0000000-b1ffffff ioport:2000(size=128) memory:d1000000-d107ffff *-display description: VGA compatible controller product: 2nd Generation Core Processor Family Integrated Graphics Controller vendor: Intel Corporation physical id: 2 bus info: pci@0000:00:02.0 version: 09 width: 64 bits clock: 33MHz capabilities: msi pm vga_controller bus_master cap_list rom configuration: driver=i915 latency=0 resources: irq:43 memory:d1400000-d17fffff memory:c0000000-cfffffff ioport:3000(size=64) *-communication description: Communication controller product: 6 Series/C200 Series Chipset Family MEI Controller #1 vendor: Intel Corporation physical id: 16 bus info: pci@0000:00:16.0 version: 04 width: 64 bits clock: 33MHz capabilities: pm msi bus_master cap_list configuration: driver=mei latency=0 resources: irq:42 memory:d1a04000-d1a0400f *-usb:0 description: USB controller product: 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 vendor: Intel Corporation physical id: 1a bus info: pci@0000:00:1a.0 version: 04 width: 32 bits clock: 33MHz capabilities: pm debug ehci bus_master cap_list configuration: driver=ehci_hcd latency=0 resources: irq:16 memory:d1a0a000-d1a0a3ff *-multimedia description: Audio device product: 6 Series/C200 Series Chipset Family High Definition Audio Controller vendor: Intel Corporation physical id: 1b bus info: pci@0000:00:1b.0 version: 04 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list configuration: driver=snd_hda_intel latency=0 resources: irq:44 memory:d1a00000-d1a03fff *-pci:1 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 1 vendor: Intel Corporation physical id: 1c bus info: pci@0000:00:1c.0 version: b4 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:17 memory:9fb00000-9fbfffff ioport:d1800000(size=1048576) *-network description: Ethernet interface product: NetLink BCM57785 Gigabit Ethernet PCIe vendor: Broadcom Corporation physical id: 0 bus info: pci@0000:02:00.0 logical name: eth0 version: 10 serial: b8:70:f4:e5:12:43 capacity: 1Gbit/s width: 64 bits clock: 33MHz capabilities: pm msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation configuration: autonegotiation=on broadcast=yes driver=tg3 driverversion=3.121 firmware=sb latency=0 link=no multicast=yes port=twisted pair resources: irq:16 memory:d1830000-d183ffff memory:d1840000-d184ffff memory:d1850000-d18507ff *-generic:0 description: SD Host controller product: NetXtreme BCM57765 Memory Card Reader vendor: Broadcom Corporation physical id: 0.1 bus info: pci@0000:02:00.1 version: 10 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list configuration: driver=sdhci-pci latency=0 resources: irq:17 memory:d1800000-d180ffff *-generic:1 UNCLAIMED description: System peripheral product: Broadcom Corporation vendor: Broadcom Corporation physical id: 0.2 bus info: pci@0000:02:00.2 version: 10 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list configuration: latency=0 resources: memory:d1810000-d181ffff *-generic:2 UNCLAIMED description: System peripheral product: Broadcom Corporation vendor: Broadcom Corporation physical id: 0.3 bus info: pci@0000:02:00.3 version: 10 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list configuration: latency=0 resources: memory:d1820000-d182ffff *-pci:2 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 2 vendor: Intel Corporation physical id: 1c.1 bus info: pci@0000:00:1c.1 version: b4 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:16 memory:d1900000-d19fffff *-network description: Wireless interface product: BCM43225 802.11b/g/n vendor: Broadcom Corporation physical id: 0 bus info: pci@0000:03:00.0 logical name: eth1 version: 01 serial: 68:a3:c4:44:81:96 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list ethernet physical wireless configuration: broadcast=yes driver=wl0 driverversion=5.100.82.38 ip=192.168.0.12 latency=0 multicast=yes wireless=IEEE 802.11bgn resources: irq:17 memory:d1900000-d1903fff *-usb:1 description: USB controller product: 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 vendor: Intel Corporation physical id: 1d bus info: pci@0000:00:1d.0 version: 04 width: 32 bits clock: 33MHz capabilities: pm debug ehci bus_master cap_list configuration: driver=ehci_hcd latency=0 resources: irq:23 memory:d1a09000-d1a093ff *-isa description: ISA bridge product: HM65 Express Chipset Family LPC Controller vendor: Intel Corporation physical id: 1f bus info: pci@0000:00:1f.0 version: 04 width: 32 bits clock: 33MHz capabilities: isa bus_master cap_list configuration: latency=0 *-storage description: SATA controller product: 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller vendor: Intel Corporation physical id: 1f.2 bus info: pci@0000:00:1f.2 logical name: scsi0 logical name: scsi1 version: 04 width: 32 bits clock: 66MHz capabilities: storage msi pm ahci_1.0 bus_master cap_list emulated configuration: driver=ahci latency=0 resources: irq:41 ioport:3098(size=8) ioport:30bc(size=4) ioport:3090(size=8) ioport:30b8(size=4) ioport:3060(size=32) memory:d1a08000-d1a087ff *-disk description: ATA Disk product: ST9500325AS vendor: Seagate physical id: 0 bus info: scsi@0:0.0.0 logical name: /dev/sda version: 0001 serial: S2W1AMSX size: 465GiB (500GB) capabilities: partitioned partitioned:dos configuration: ansiversion=5 signature=a45f21e9 *-volume:0 description: Windows NTFS volume physical id: 1 bus info: scsi@0:0.0.0,1 logical name: /dev/sda1 version: 3.1 serial: 46aa-2a25 size: 19GiB capacity: 20GiB capabilities: primary ntfs initialized configuration: clustersize=4096 created=2011-08-25 21:32:00 filesystem=ntfs label=PQSERVICE state=clean *-volume:1 description: Windows NTFS volume physical id: 2 bus info: scsi@0:0.0.0,2 logical name: /dev/sda2 version: 3.1 serial: 10aa-ad1a size: 98MiB capacity: 100MiB capabilities: primary bootable ntfs initialized configuration: clustersize=4096 created=2011-08-25 21:32:03 filesystem=ntfs label=SYSTEM RESERVED state=clean *-volume:2 description: Windows NTFS volume physical id: 3 bus info: scsi@0:0.0.0,3 logical name: /dev/sda3 version: 3.1 serial: 668c5afc-182e-ff4b-b084-3cc09f54972d size: 395GiB capacity: 395GiB capabilities: primary ntfs initialized configuration: clustersize=4096 created=2011-08-25 21:32:03 filesystem=ntfs label=Don's Machine state=clean *-volume:3 description: Extended partition physical id: 4 bus info: scsi@0:0.0.0,4 logical name: /dev/sda4 size: 49GiB capacity: 49GiB capabilities: primary extended partitioned partitioned:extended *-logicalvolume:0 description: Linux swap / Solaris partition physical id: 5 logical name: /dev/sda5 capacity: 3945MiB capabilities: nofs *-logicalvolume:1 description: Linux filesystem partition physical id: 6 logical name: /dev/sda6 logical name: / capacity: 46GiB configuration: mount.fstype=ext4 mount.options=rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered state=mounted *-cdrom description: DVD-RAM writer product: DVD-RW DVRTD11RS vendor: PIONEER physical id: 1 bus info: scsi@1:0.0.0 logical name: /dev/cdrom logical name: /dev/cdrw logical name: /dev/dvd logical name: /dev/dvdrw logical name: /dev/sr0 version: 1.01 capabilities: removable audio cd-r cd-rw dvd dvd-r dvd-ram configuration: ansiversion=5 status=nodisc *-serial UNCLAIMED description: SMBus product: 6 Series/C200 Series Chipset Family SMBus Controller vendor: Intel Corporation physical id: 1f.3 bus info: pci@0000:00:1f.3 version: 04 width: 64 bits clock: 33MHz configuration: latency=0 resources: memory:d1a06000-d1a060ff ioport:3040(size=32) *-power UNCLAIMED description: OEM_Define1 product: OEM_Define5 vendor: OEM_Define2 physical id: 1 version: OEM_Define6 serial: OEM_Define3 capacity: 75mWh *-battery description: Lithium Ion Battery product: CRB Battery 0 vendor: -Virtual Battery 0- physical id: 2 version: 10/12/2007 serial: Battery 0 slot: Fake

Read the article

2-Bay External HDD Enclosure in JBOD mode fails to detect both drives (Linux & Windows)

- by mgc8888

I recently purchased a couple of USB 3.0 External HDD Enclosures to use for storage and backup; the idea was to have one act as backup to the other, with 4 x 3TB drives in total. However, the second drive in each is not accessible in either Linux nor Windows, and I could not determine the reason. 1. Situation The two enclosures are slightly different (couldn't find them in stock at the same time) yet from many little details appear to be the same Chinese base design with a tweaked outer shell. The models are: Sharkoon 2-Bay RAID Box Fantec MR-35DU3 The drives are Seagate 3TB Barracuda ST33000651AS, firmware CC44, all identical. From reading manuals and online sources, I determined that JBOD would be the optimal setup for my needs -- addressing the two drives separately in each enclosure would be important, making it easy to swap drives and mix&match them if needed; all the other modes implied the controller doing a combination of the drives. The software used was Debian GNU/Linux - testing/wheezy - kernel 2.6.39-2 and Windows 7 Ultimate. 2. Description of the problem Now, here comes the problem: every time I connect either of the enclosures to a PC using the supplied cable (tried a different one as well), only the HDD in the top bay is readable, the one below is detected yet errors out in various ways. According to the manuals, it should not happen: in JBOD, the system should be able to "see" two separate drives upon connection. This happens with both enclosures and any combination of HDDs (i.e. if I swap them, the same thing happens), so the HDDs are good and I think so are the enclosures (two different companies making similar products that failed in an identical fashion would be very unlikely). The top HDD can be used fine every time, I actually tried a speed test from Linux and got about 150MiB/s reads, so all is working as it should; the one below refuses to work every time. So the failure is consistent. To make sure this was not some obscure Linux bug, I tried the same under Windows 7, and the system also only created one drive letter for a drive of 3TB size (so it was only seeing one instead of both). Placing an older, known good, 2TB drive in the top bay made that the one recognised, so we have the same issue under Windows as well. Log entries under Linux (tested here with a 3TB and a 2TB drive so I could differentiate them; either one works in the top enclosure, in the test setup the 3TB one is on top). You can see them being detected, the top one is ok, but for the bottom one only errors: Jul 19 23:28:15 media kernel: [260150.582436] usb 6-1: New USB device found, idVendor=1ca1, idProduct=18ae Jul 19 23:28:15 media kernel: [260150.582440] usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 Jul 19 23:28:15 media kernel: [260150.582442] usb 6-1: Product: Usb Sata Bridge Jul 19 23:28:15 media kernel: [260150.582444] usb 6-1: Manufacturer: SYMWAVE Jul 19 23:28:15 media kernel: [260150.582446] usb 6-1: SerialNumber: 39584B304C4E3441 Jul 19 23:28:15 media kernel: [260150.870412] scsi11 : usb-storage 6-1:1.0 Jul 19 23:28:16 media kernel: [260151.882087] scsi 11:0:0:0: Direct-Access SYMWAVE ST33000651AS CC44 PQ: 0 ANSI: 4 Jul 19 23:28:16 media kernel: [260151.882242] scsi 11:0:0:1: Direct-Access SYMWAVE ST32000641AS CC12 PQ: 0 ANSI: 4 Jul 19 23:28:16 media kernel: [260151.882677] sd 11:0:0:0: Attached scsi generic sg2 type 0 Jul 19 23:28:16 media kernel: [260151.882774] sd 11:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16). Jul 19 23:28:16 media kernel: [260151.882857] sd 11:0:0:1: Attached scsi generic sg3 type 0 Jul 19 23:28:16 media kernel: [260151.882893] sd 11:0:0:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB) Jul 19 23:28:16 media kernel: [260151.883085] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.883582] sd 11:0:0:0: [sdb] Write Protect is off Jul 19 23:28:16 media kernel: [260151.883961] sd 11:0:0:1: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) Jul 19 23:28:16 media kernel: [260151.884145] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.884570] sd 11:0:0:1: [sdc] Write Protect is off Jul 19 23:28:16 media kernel: [260151.884855] sd 11:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16). Jul 19 23:28:16 media kernel: [260151.885286] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.885807] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.909595] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.910159] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.910163] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.910167] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.910169] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.910172] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 Jul 19 23:28:16 media kernel: [260151.910182] quiet_error: 2 callbacks suppressed Jul 19 23:28:16 media kernel: [260151.910570] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.911153] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.911156] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.911159] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.911161] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.911164] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 Jul 19 23:28:16 media kernel: [260151.911385] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.911902] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.911905] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.911908] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.911910] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.911913] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 Jul 19 23:28:16 media kernel: [260151.912128] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.912650] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.912653] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.912656] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.912657] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.912660] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 Jul 19 23:28:16 media kernel: [260151.912876] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.913439] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.913442] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.913445] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.913446] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.913449] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 Jul 19 23:28:16 media kernel: [260151.945227] xhci_hcd 0000:03:00.0: WARN: Stalled endpoint Jul 19 23:28:16 media kernel: [260151.945863] sd 11:0:0:1: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 19 23:28:16 media kernel: [260151.945866] sd 11:0:0:1: [sdc] Sense Key : Illegal Request [current] Jul 19 23:28:16 media kernel: [260151.945870] Info fld=0x0 Jul 19 23:28:16 media kernel: [260151.945871] sd 11:0:0:1: [sdc] Add. Sense: Invalid field in cdb Jul 19 23:28:16 media kernel: [260151.945875] sd 11:0:0:1: [sdc] CDB: Read(10): 28 20 00 00 00 00 00 00 08 00 (...) and so on for like 10 seconds until it gives up (...) 3. Question So, my question would be: what is causing this? Am I missing something, should I configure things differently, is this a known limitation? Searching online for more information did not yield any useful results... Thank you in advance for any help!

Read the article

webserver horrible slow, sometimes incredible fast

- by dhanke

i am running a small community ( 6000+ Members ) on a non-virtual 64-bit ubuntu 11.04 system. I am not a Linux-pro, not even advanced, i just tried to setup a webserver, which does nothing special actually. Delivering some dynamic PHP and RoR websites is its task. So it might be that my configuration files do look horrible bad. Also, i might use the wrong vocabulary, so in doubt, please ask. Having a current all-time record of 520 registered users (board-accounts, no system-users) online at same time, average server-load is about 2.0 - 5.0. Meantime (~250 users) average server load value is at about 0.4 - 0.8, sometimes, on some expensive searches a bit higher. everything fine. From time to time however, the load increases up to 120 (120.0, not 12.0 ;) ). In this time, its hard to even connect via SSH, but when i reach the server, and use top/htop/iotop to see whats happening, i cannot identify any process causing high CPU load. iotop tells me about a current reading/writing speed of about approx. 70kb/s, which is quite equal to power-off i think. Memory-Usage is max. at ~ 12GB of 16GB, so swap remains empty. now the odd (at least for me:) waiting some minutes ( since i always get a bit into a panic when this happens, it feels like 5 minutes, but i suppose its more like 20-30 minutes) and the server is back to normal. everything continues as normal. another odd fact: when i run hdparm -tT /dev/sda, i get answer like: /dev/sda: Timing cached reads: 7180 MB in 2.00 seconds = 3591.13 MB/sec Timing buffered disk reads: 348 MB in 3.02 seconds = 115.41 MB/sec when i run the same command while the server is "frozen", the answer is like /dev/sda: <- takes about 5 minutes until this line appears Timing cached reads: 7180 MB in 2.00 seconds = 3591.13 MB/sec <- 5 more minutes Timing buffered disk reads: 348 MB in 3.02 seconds = 115.41 MB/sec <- another 5 minutes so the values are the same, but the quoted time is completely wrong. using time command as prefix also tells me that ~ 15 minutes were used. I searched in dmesg, /var/log/[messages|syslog] - nothing found. /var/log/errors however tells me that: Jul 4 20:28:30 localhost kernel: [19080.671415] INFO: task php5-fpm:27728 blocked for more than 120 seconds. Jul 4 20:28:30 localhost kernel: [19080.671419] "echo 0 /proc/sys/kernel/hung_task_timeout_secs" disables this message. multiple times. now that message does tell me that php5-fpm task was blocked or did block ? - but not if that is the cause or just one of the results of that "freeze". Anyone? to cut the long story short, i dont know where even to start analyzing. So if you can give me any advice by looking at following specs and configs, or ask me to provide more information, i`d be glad. Specs: 6 Core AMD Phenom(tm) II X6 1055T Processor * 16 Gigabyte Ram 2x 1.5 TB Seagate ST1500DL003-9VT16L via SATA 3 via SoftwareRaid (i suppose) Services: (due to service --status-all, those with [ + ]) nginx Webserver 1.0.14 mySQL 5.1.63 Server Ruby on Rails 2.3.11 ( passenger-nginx-module ) php5-fpm 5.3.6-13ubuntu3.7 SSH ido2db Further services: default crontab + nightly backup. syslog-ng Website consists of 2 subdomains, forum. and www. where forum is a phpBB3.x PHP-Board, and www a Ruby on Rails 2.3.11 application (portal). Mini-Note: sometimes i notice that the forum is pretty slow, in contrast to the always-fast (except for this "freeze") portal. Both share the same Database, but the portal is using it read-only. The Webserver is nginx, using phusion passenger module to communicate with the ruby-application. Also, for the forum it communicates with php5-fpm via socket: relevant nginx configuration parts ( with comments/questions starting by ; ) ; in case of freeze due to too high Filesystem activity, maybe adding a limit? #worker_rlimit_nofile 50000; user www-data; ; 6 cores, so i read 6 fits. maybe already wrong? worker_processes 6; pid /var/run/nginx.pid; events { worker_connections 1024; } http { passenger_root /var/lib/gems/1.8/gems/passenger-3.0.11; passenger_ruby /usr/bin/ruby1.8; ; the forum once featured a chat, which was working w/o websockets. ; so it was a hell of pull requests (deactivated now, freeze still happening) keepalive_timeout 65; keepalive_requests 50; gzip on; server { listen 80; server_name www.domain.tld; root /var/www/domain/rails/public; passenger_enabled on; } server { listen 80; server_name forum.domain.tld; location / { root /var/www/domain/forum; index index.php; } ; satic stuff to be handled by nginx location ~* ^/style/.+.(jpg|jpeg|gif|css|png|js|ico|xml)$ { access_log off; expires 30d; root /var/www/domain/forum/; } ; now the php magic, note the "backend"-fcgi_pass location ~ .php$ { fastcgi_split_path_info ^(.+\.php)(.*)$; fastcgi_pass backend; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME /var/www/domain/forum$fastcgi_script_name; include fastcgi_params; fastcgi_param QUERY_STRING $query_string; fastcgi_param REQUEST_METHOD $request_method; fastcgi_param CONTENT_TYPE $content_type; fastcgi_param CONTENT_LENGTH $content_length; fastcgi_intercept_errors on; fastcgi_ignore_client_abort off; fastcgi_connect_timeout 60; fastcgi_send_timeout 180; fastcgi_read_timeout 180; fastcgi_buffer_size 128k; fastcgi_buffers 256 16k; fastcgi_busy_buffers_size 256k; fastcgi_temp_file_write_size 256k; fastcgi_max_temp_file_size 0; } location ~ /\.ht { deny all; } } ;the php5-fpm socket. i read that /dev/shm/ whould be the fastes place for this. bad idea in general? upstream backend { server unix:/dev/shm/phpfpm; } ... } php5-fpm settings (i changed this values due to php5-fpm error log messages higher and higher.. (freeze-problem was there before as well)* listen = /dev/shm/phpfpm user = www-data group = www-data pm = dynamic ; holy, 4000! well, shinking this value to earth-level gave me ; 100s of 502 bad gateway commands. this values were quite stable. ; since there are only max 520 users online i dont get it, why i would need ; as many children as configured here. due to keep-alive maybe? ; asking questions is easier for me since restarting server will make ; my community-members angry ;) pm.max_children = 4000 pm.start_servers = 100 pm.min_spare_servers = 50 pm.max_spare_servers = 150 pm.max_requests = 10 pm.status_path = /status ping.path = /ping ping.response = pong slowlog = log/$pool.log.slow ;should i use rlimit? ;rlimit_files = 1024 chdir = / mysql/my.cnf [client] port = 3306 socket = /var/run/mysqld/mysqld.sock [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking bind-address = 127.0.0.1 key_buffer = 16M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 myisam-recover = BACKUP ; high number, but less gives some phpBB errors. max_connections = 450 table_cache = 512 ; i read twice the cpu cores, bad? thread_concurrency = 12 join_buffer_size = 2084K concurrent_insert = 3 query_cache_limit = 64M query_cache_size = 512M query_cache_type = 1 log_error = /var/log/mysql/error.log log_slow_queries = /var/log/mysql/mysql-slow.log long_query_time = 2 expire_logs_days = 10 max_binlog_size = 100M low_priority_updates=1 [mysqldump] quick quote-names max_allowed_packet = 16M [isamchk] key_buffer = 16M !includedir /etc/mysql/conf.d/ I used smartctl already, hdds seem to be fine. /proc/mdstatus quotes: Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md3 : active raid1 sda3[1] 1459264192 blocks [2/1] [_U] md1 : active raid1 sda1[0] 3911680 blocks [2/1] [U_] unused devices: ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 127727 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 127727 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited I quote some questions in my configuration files, these are not (intentional) directly problem-related, but would be nice for me to know wether they are indeed questionable or done right. One additional Fact: my MYSQL-database is at 12GB size. i dont know if that does matter, but mytop sometimes shows me 4-5 seconds long insert queries, some are 20-30 seconds long. Its just a feeling that i am unable to prove (because i dont know how), but when i disable the database, the freeze seems not to happen. Example: i created a dummy rails application to see the development log. the app made some sql-queries, reads and inserts. the log quite often was like: DbTest Load (0.3ms) SELECT * FROM `db_test` WHERE (`db_test`.`id` = 31722) LIMIT 1 SQL (0.1ms) BEGIN DbTest Update (0.3ms) UPDATE `db_test` SET `updated_at` = '2012-07-04 23:32:34' WHERE `id` = 31722 - now the log stands still for 5-60 seconds. SQL (49.1ms) COMMIT - SQL-Update time in the log does not include freeze time Rendering test/index Completed in 96ms (View: 16, DB: 59) | 200 OK [http://localhost:9000/test] Bad part is: this mini-freeze here only happens from time to time as well. note: meanwhile i cannot even upload files via scp. I currently feel like running form bad to worse and back by googling for my server-problem due to immense lack of knowledge regarding server configurations. It still makes me wonder, why those problems even appear, since 250 users a time is not such a high amount, right? So my questions: whats wrong and how to fix? ;) or: what information can i provide to make the situation more clear? can you point at some critical bad configuration-line which i should consider to catch up in the documentation? are there any tools i can run to see some possible bottlenecks? any further advice? (next to: "pay someone who knows what he does" - its a private project, server costs enough already. :)) Thanks for your time and help. Best Regards, Daniel P.S.: i renamed the configfiles to domain.tld since i dont want to have any % more load to the server until its fixed. might be a exaggeratedly thought.. P.P.S: if i asked a complete duplicate question, sorry. my search results seemed to be quite specific in their own way.

Read the article

PC powers off at random times

- by Timo Huovinen

Short Version After experiencing some problems with Mobo batteries my PC started to power off at random times, the power off is instant and sudden and does not restart afterwards, need help figuring out the cause. Facts: Powers off when PC is playing games Powers off when PC is idle Powers off when PC is in safe mode Powers off when PC is in BIOS Powers off when PC is booted through a Windows installation USB Replaced the motherboard battery several times Replaced the 650W PSU with a 750W PSU Replaced the RAM Swapped the RAM between slots Re-applied thermal paste to the CPU Checked if the motherboard touches the case Nothing is overclocked PC Specs PC specs: OS: Windows 7 Ultimate SP1 RAM: klingston 1333MHz 4GB stick CPU: AMD Phenom II x4 955 Mobo: Gigabyte 88GMA-UD2H rev 2.2 Motherboard battery: CR2032 3v HDD: 500GB Seagate ST3500418AS ATA Device Graphics: ATI/AMD Radeon HD 6870 Very Long version Around 10 months ago I built a brand new gaming PC. Around 6 months ago it's time setting in windows started resetting to the year 2010. I swapped the Motherboard battery for a new one of the exact same size and shape and voltage, and the problems disappeared...for around 2 weeks. Then the same problem happened again, time gets reset, I swapped the battery again, and the problem was gone for good and everything was great for about 3 months.. then another problem started happening, the PC started to power off suddenly and without warning at completely random times, sometimes the PC works for and hour, sometimes 5 minutes. So I read on the forums that it might be either the PSU or the motherboard Battery or RAM or HDD or the Graphics card or the CPU or the motherboard or the drivers or a Virus or Grounding issues, or something short circuiting, basically it can be anything... I spent some days researching, and decided to remove the possibility of a virus. I reset the CMOS, cleared all BIOS settings and reinstalled windows 7 after a full format of the HDD, but the random power off kept happening. I then disabled the restart on error option in windows and looked at the event log for error events, but they did not help me figure out the problem. Network list service depends on network location awareness the dependency group failed to start Source Kernel Power Event 41 Task Category 63 Source Disk Event ID 11 Task Category None The driver detected a controller error on device disk I took apart the PC, every little piece, re-applied some expensive thermal paste to the CPU, and double checked that none of the pieces are touching the PC case. The problem was gone, the PC no longer powered off randomly I re-attached the graphics card and all was good for 4 months... then the power off problem appeared again, but was happening at high intervals, the PC would shutdown once in 2 days on average, at random points in time, sometimes when it's idle all day long, sometimes when it's running CRYSIS 2. I checked the CPU temperature, because I know that AMD CPU's have a built in protection mechanism that switches off the PC if the CPU gets too hot, and the Temp was 50C system temp, and 45C CPU after running the PC all day long (I did not do tests to see if there are any temperature spikes, don't know how to do them) Originally the PSU that powered the PC was 650Watts and had one 4 pin cable to power the CPU, I replaced it with a new 750Watts PSU which has two 4 pin cables for the CPU, but the problem remained. I removed the graphics card and let the motherboard use the built in one, but the PC kept suddenly powering off at random times. I took apart the PC completely again, and re-applied thermal paste to the CPU, added lots of insulation, and checked for any type of short-circuit possibility again and again, but the problem remained. The problem was like that for some months. I replaced the Battery a couple of times over the time, changed lots of options in windows, and tried everything I could, but it kept powering off, so I stopped using the PC as much as I used to, just living with the random power offs from time to time, until a couple of days ago, when the power off happens almost immediately after powering on the PC. I replaced the RAM with a brand new one, but that did not help. Took apart the PC again, checked for anything anywhere that might cause it, found some small scratches on the very edge of the motherboard to the left of the PCI express x16 slot. This might cause the problem, I thought, but the scratch looks very superficial, not deep at all, and if the scratch did harm the motherboard, wouldn't it cause it to not start at all? And why did it start to power off a while ago, and then suddenly stop powering off? The scratches could not have vanished??? did chkdsk \d but it powered off when it was at 75% I removed the hard disks, the graphics card, while I fiddled with the BIOS settings, and suddenly the PC shut down while I was looking at the BIOS version. This makes me realize, it is not caused by: HDD, Windows, Drivers or the Graphics card I cleared the CMOS again, updated the BIOS from F5 to F6f beta, but that did not help, it might even seem that the PC powers off even sooner. The shutdown even happened to me while I booted through a windows 7 installation USB and was in the repair console. I removed one of the cables powering the CPU, now only one 4pin cable powers it, and it worked for 30mins after doing that, which makes me think that it's the CPU overheating, and because it gets less power, it overheats slower? The things that I am still considering: CPU overheating (does not seem to overheat, maybe false readings?) Motherboard short circuiting (faulty motherboard?) I desperately need some advice in what is faulty, is it a faulty Motherboard or an overheating CPU? or maybe something else? I have been breaking my head over this problem over a span of 6 months. I'm not sure if this is a good place to ask this question, if it is not, then tell me where I can get some experienced help. More info I have also discovered a mysterious piece that seems to have fallen out of the motherboard i119.photobucket.com/albums/o126/yurikolovsky/strangepiece.jpg What is it? Looks like each time that it powers off the datetime gets reset I also found another forum post tomshardware.co.uk/forum/… except I don't have Integrated PeripheralsUSB Keyboard Function option in BIOS :S Comments summary (asked by Random moderator) Q. tell me, if the computer restarts, is it immediately? Does it take a second and then restarts? Do you see (BSOD) or hear (PSU, short circuit) any suspicious when it happens? After reading trough it, it remains the mainboard that is faulty. – JohannesM A. Immediate power off, all the fans stop instantly, all the light turn off instantly, no sound or anything, and it remains off until I turn it back on. Thanks for the feedback, faulty motherboard is what I fear. Q. Try stress-testing the system with Prime95 and see if errors or shutdowns occur when the CPU is under full load. – speakr A. Prime95 heat stress test peaked CPU heat at 60C after 5mins, it powered off after 30mins of testing in the middle of the test with no errors, Prime95 Heat test or the stress-testing with low RAM usage (small or in-place FFTs) do not report errors while testing for 10-60 mins. The power off does not seem like it is affected by Prime95 at all Makes me wonder if it's a CPU or Motherboard issue at all. Q. I had similar random/intermittent problems with my old board. It gave one of a few different symptoms: keyboard and/or mouse would die and/or the RAM wouldn't work and/or it would shut down. It was in bad shape. One problems was that my old PSU had literally burned the connector on it (browned around the pins), another was that a broken lead inside the layers of the PCB would work sometimes if it happened to be hot or if I bent the board—by jamming a hunk of wood behind it. I managed to keep the board alive for several years, but eventually nothing I did would make it work correctly anymore. – Synetech A. I will try that as the last resort, ok? ;) Q. Have you tried a different power cord, surge protector, outlet (on a different circuit). It's worth a shot just to ensure it's not subpar wiring or a week circuit (dips in power may cause shutdown if the PSU can't pull enough juice from the wall). – Kyle A. yes, I attached the PC to an entirely different outlet on a different circuit and the problem persists. After connecting it to a different outlet after starting the PC it gave me 3 long beeps and 1 short one, then the PC immediately proceeded to boot up normally. Q. Re-check your mainboard manual and all PSU connections to your mainboard to be sure that nothing is missing (e.g. 12V ATX 4-pin/6-pin connector). If you can provoke shutdowns with Prime95, then consider buying new hardware -- a stable system should run Prime95 for 24h without any errors. Prime95 mentions errors in the log when they occur and gives a summary after the stress test was stopped manually (e.g. "0 errors, 0 warnings", if all is fine) – speakr A. Re-checked, there are no more PSU connectors that I can physically connect, except the one ATX 4-pin (there are 2 that power the CPU) that I disconnected on purpose, I have reconnected it but the problem persists. Q. With one PC I had a short curcuit. The power button on the front plate had its cables soldered, but not isolated, and the contacts were very close to the metal case. A heavier touch was enough to cause a shutdown. The PC's vibration could be enough – ott-- A. yes, it seems to switch off with even the lightest touch, I switched on the PC, then pulled out the front panel power cable that connects to the motherboard so the power button does not work anymore, after 5 mins of working like that, with the power button completely disconnected, just sitting idle, the PC powered off again, I don't think it's the power button. Q. I wonder if you dare to operate components without the case, that is remove motherboard, power, disk ( just put the motherboard on a wooden desk). Don't bend the adapters when running like that. – ott-- A. yes, I do dare to do that, but only tomorrow, too tired/late right now.

Developer IT