fred 22 - Page 171 - Developer IT

--log-slave-updates is OFF but some updates are still logged to the slave binary log?

- by quanta

MySQL version 5.5.14 According to the document, by the default, slave does not log to its binary log any updates that are received from a master server. Here are my config. on the slave: # egrep 'bin|slave' /etc/my.cnf relay-log=mysqld-relay-bin log-bin = /var/log/mysql/mysql-bin binlog-format=MIXED sync_binlog = 1 log-bin-trust-function-creators = 1 mysql> show global variables like 'log_slave%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | log_slave_updates | OFF | +-------------------+-------+ 1 row in set (0.01 sec) mysql> select @@log_slave_updates; +---------------------+ | @@log_slave_updates | +---------------------+ | 0 | +---------------------+ 1 row in set (0.00 sec) but slave still logs the some changes to its binary logs, let's see the file size: -rw-rw---- 1 mysql mysql 37M Apr 1 01:00 /var/log/mysql/mysql-bin.001256 -rw-rw---- 1 mysql mysql 25M Apr 2 01:00 /var/log/mysql/mysql-bin.001257 -rw-rw---- 1 mysql mysql 46M Apr 3 01:00 /var/log/mysql/mysql-bin.001258 -rw-rw---- 1 mysql mysql 115M Apr 4 01:00 /var/log/mysql/mysql-bin.001259 -rw-rw---- 1 mysql mysql 105M Apr 4 18:54 /var/log/mysql/mysql-bin.001260 and the sample query when reading these binary files with mysqlbinlog utility: #120404 19:08:57 server id 3 end_log_pos 110324763 Query thread_id=382435 exec_time=0 error_code=0 SET TIMESTAMP=1333541337/*!*/; INSERT INTO norep_SplitValues VALUES ( NAME_CONST('cur_string',_utf8'118212' COLLATE 'utf8_general_ci')) /*!*/; # at 110324763 Did I miss something? Reply to @RolandoMySQLDBA: If replication brought this over, then the same query has to be in the relay logs. Please go find the relay log that has the INSERT query with the same TIMESTAMP (1333541337). There is no such query with the same TIMESTAMP in the relay logs. If you cannot find it in the relay logs, then look and see if Infobright is posting the INSERT query. In that instance, the INSERT should be recorded in the binary logs of the Slave. Looking more deeply into the binary logs, I see that almost of the queries are CREATE/INSERT/UPDATE/DROP "temporary" tables, something like this: # at 123873315 #120405 0:42:04 server id 3 end_log_pos 123873618 Query thread_id=395373 exec_time=0 error_code=0 SET TIMESTAMP=1333561324/*!*/; SET @@session.pseudo_thread_id=395373/*!*/; CREATE TEMPORARY TABLE `norep_tmpcampaign` ( `campaignid` INTEGER(11) NOT NULL DEFAULT '0' , `status` INTEGER(11) NOT NULL DEFAULT '0' , `updated` DATETIME, KEY `campaignid` (`campaignid`) )ENGINE=MEMORY /*!*/; # at 123873618 #120405 0:42:04 server id 3 end_log_pos 123873755 Query thread_id=395373 exec_time=0 error_code=0 SET TIMESTAMP=1333561324/*!*/; DROP TABLE IF EXISTS `norep_tmpcampaign1` /* generated by server */ "temporary" here means that they are dropped after calculation is done. I also tells the slave not to replicate any statement matches the norep_ wildcard pattern: replicate-wild-ignore-table=%.norep_% But there is an exception table in the binary logs: # at 123828094 #120405 0:37:21 server id 3 end_log_pos 123828495 Query thread_id=395209 exec_time=0 error_code=0 SET TIMESTAMP=1333561041/*!*/; INSERT INTO sessions (SessionId, ApplicationName, Created, Expires, LockDate, LockId, Timeout, Locked, SessionItems, Fla gs) Values('pgv2exo4y4vo4ccz44vwznu0', '/', '2012-04-05 00:37:21', '2012-04-05 00:57:21', '2012-04-05 00:37:21', 0, 20, 0, 'AwAAAP////8IdXNlcm5hbWUGdXNlcmlkCHBlcm1pdGlkAgAAAAQAAAAGAAAAAQABAAEA', 0) /*!*/; Description: mysql> desc reportingdb.sessions; +-----------------+------------------+------+-----+---------------------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------------+------------------+------+-----+---------------------+-------+ | SessionId | varchar(64) | NO | PRI | | | | ApplicationName | varchar(255) | NO | | | | | Created | timestamp | NO | | 0000-00-00 00:00:00 | | | Expires | timestamp | NO | | 0000-00-00 00:00:00 | | | LockDate | timestamp | NO | | 0000-00-00 00:00:00 | | | LockId | int(11) unsigned | NO | | NULL | | | Timeout | int(11) unsigned | NO | | NULL | | | Locked | bit(1) | NO | | NULL | | | SessionItems | varchar(255) | YES | | NULL | | | Flags | int(11) | NO | | NULL | | +-----------------+------------------+------+-----+---------------------+-------+ I'm sure all these queries are posting by MySQL, not Infobright: $ mysql-ib -u root -p Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 48971 Server version: 5.1.40 build number (revision)=IB_4.0.5_r15240_15370(ice) (static) Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> select * from information_schema.tables where table_name='sessions'; Empty set (0.02 sec) I've been trying some INSERT/UPDATE queries with testing tables on the master, it is copied to the relay logs, not binary logs on slave: # at 311664029 #120405 0:15:23 server id 1 end_log_pos 311664006 Query thread_id=10458250 exec_time=0 error_code=0 use testuser/*!*/; SET TIMESTAMP=1333559723/*!*/; update users set email2='[email protected]' where id=22 /*!*/; Pay attention to the server id, in the relay logs, server id is master's (1) and in the binary log, server id is slave's (3 in this case). Reply to @RolandoMySQLDBA: Thu Apr 5 10:06:00 ICT 2012 Run CREATE DATABASE quantatest; on the Master now, please. Tell me if CREATE DATABASE quantatest; showed up in the Slave's Binary Logs. As I said above: I've been trying some INSERT/UPDATE queries with testing tables on the master, it is copied to the relay logs, not binary logs and you can guess, IO thread copied it to the relay logs, not binary logs. #120405 10:07:25 server id 1 end_log_pos 347573819 Query thread_id=10480775 exec_time=0 error_code=0 SET TIMESTAMP=1333595245/*!*/; /*!\C latin1 *//*!*/; SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/; create database quantatest /*!*/; The question must probably change to: why some update queries still logged to the slave binary logs althrough --log-slave-updates is disabled? Where they come from? Here are few last: /*!*/; # at 27492197 #120405 10:12:45 server id 3 end_log_pos 27492370 Query thread_id=410353 exec_time=0 error_code=0 SET TIMESTAMP=1333595565/*!*/; CREATE TEMPORARY TABLE norep_SplitValues ( value VARCHAR(1000) NOT NULL ) ENGINE=MEMORY /*!*/; # at 27492370 #120405 10:12:45 server id 3 end_log_pos 27492445 Query thread_id=410353 exec_time=0 error_code=0 SET TIMESTAMP=1333595565/*!*/; BEGIN /*!*/; # at 27492445 #120405 10:12:45 server id 3 end_log_pos 27492619 Query thread_id=410353 exec_time=0 error_code=0 SET TIMESTAMP=1333595565/*!*/; INSERT INTO norep_SplitValues VALUES ( NAME_CONST('cur_string',_utf8'119577' COLLATE 'utf8_general_ci')) /*!*/; # at 27492619 #120405 10:12:45 server id 3 end_log_pos 27492695 Query thread_id=410353 exec_time=0 error_code=0 SET TIMESTAMP=1333595565/*!*/; COMMIT /*!*/; # at 27492918 #120405 10:12:46 server id 3 end_log_pos 27493115 Query thread_id=410353 exec_time=0 error_code=0 SET TIMESTAMP=1333595566/*!*/; SELECT `reportingdb`.`selfserving_get_locationad`(_utf8'3' COLLATE 'utf8_general_ci',_utf8'' COLLATE 'utf8_general_ci') /*!*/; # at 27493199 #120405 10:12:46 server id 3 end_log_pos 27493353 Query thread_id=410353 exec_time=0 error_code=0 SET TIMESTAMP=1333595566/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/; DROP TEMPORARY TABLE IF EXISTS `norep_SplitValues` /* generated by server */ /*!*/;

Read the article

setup L2TP on Ubuntu 10.10

- by luca

I'm following this guide to setup a VPS on my Ubuntu VPS: http://riobard.com/blog/2010-04-30-l2tp-over-ipsec-ubuntu/ My config files are setup as in that guide, openswan version is 2.6.26 I think.. It doesn't work, I can show you my auth.log (on the VPS): Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: received Vendor ID payload [RFC 3947] method set to=109 Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike] method set to=110 Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: ignoring unknown Vendor ID payload [8f8d83826d246b6fc7a8a6a428c11de8] Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: ignoring unknown Vendor ID payload [439b59f8ba676c4c7737ae22eab8f582] Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: ignoring unknown Vendor ID payload [4d1e0e136deafa34c4f3ea9f02ec7285] Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: ignoring unknown Vendor ID payload [80d0bb3def54565ee84645d4c85ce3ee] Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: ignoring unknown Vendor ID payload [9909b64eed937c6573de52ace952fa6b] Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-03] meth=108, but already using method 110 Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-02] meth=107, but already using method 110 Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: received Vendor ID payload [draft-ietf-ipsec-nat-t-ike-02_n] meth=106, but already using method 110 Feb 18 06:11:07 maverick pluto[6909]: packet from 93.36.127.12:500: received Vendor ID payload [Dead Peer Detection] Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[7] 93.36.127.12 #7: responding to Main Mode from unknown peer 93.36.127.12 Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[7] 93.36.127.12 #7: transition from state STATE_MAIN_R0 to state STATE_MAIN_R1 Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[7] 93.36.127.12 #7: STATE_MAIN_R1: sent MR1, expecting MI2 Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[7] 93.36.127.12 #7: NAT-Traversal: Result using draft-ietf-ipsec-nat-t-ike (MacOS X): peer is NATed Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[7] 93.36.127.12 #7: transition from state STATE_MAIN_R1 to state STATE_MAIN_R2 Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[7] 93.36.127.12 #7: STATE_MAIN_R2: sent MR2, expecting MI3 Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[7] 93.36.127.12 #7: Main mode peer ID is ID_IPV4_ADDR: '10.0.1.8' Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[7] 93.36.127.12 #7: switched from "L2TP-PSK-NAT" to "L2TP-PSK-NAT" Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: deleting connection "L2TP-PSK-NAT" instance with peer 93.36.127.12 {isakmp=#0/ipsec=#0} Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: transition from state STATE_MAIN_R2 to state STATE_MAIN_R3 Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: new NAT mapping for #7, was 93.36.127.12:500, now 93.36.127.12:36810 Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: STATE_MAIN_R3: sent MR3, ISAKMP SA established {auth=OAKLEY_PRESHARED_KEY cipher=oakley_3des_cbc_192 prf=oakley_sha group=modp1024} Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: ignoring informational payload, type IPSEC_INITIAL_CONTACT msgid=00000000 Feb 18 06:11:07 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: received and ignored informational message Feb 18 06:11:08 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: the peer proposed: 69.147.233.173/32:17/1701 -> 10.0.1.8/32:17/0 Feb 18 06:11:08 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #8: responding to Quick Mode proposal {msgid:183463cf} Feb 18 06:11:08 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #8: us: 69.147.233.173<69.147.233.173>[+S=C]:17/1701 Feb 18 06:11:08 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #8: them: 93.36.127.12[10.0.1.8,+S=C]:17/64111===10.0.1.8/32 Feb 18 06:11:08 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #8: transition from state STATE_QUICK_R0 to state STATE_QUICK_R1 Feb 18 06:11:08 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #8: STATE_QUICK_R1: sent QR1, inbound IPsec SA installed, expecting QI2 Feb 18 06:11:08 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #8: transition from state STATE_QUICK_R1 to state STATE_QUICK_R2 Feb 18 06:11:08 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #8: STATE_QUICK_R2: IPsec SA established transport mode {ESP=>0x0b1cf725 <0x0b719671 xfrm=AES_128-HMAC_SHA1 NATOA=none NATD=93.36.127.12:36810 DPD=none} Feb 18 06:11:28 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: received Delete SA(0x0b1cf725) payload: deleting IPSEC State #8 Feb 18 06:11:28 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: netlink recvfrom() of response to our XFRM_MSG_DELPOLICY message for policy eroute_connection delete was too long: 100 > 36 Feb 18 06:11:28 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: netlink recvfrom() of response to our XFRM_MSG_DELPOLICY message for policy [email protected] was too long: 168 > 36 Feb 18 06:11:28 maverick pluto[6909]: | raw_eroute result=0 Feb 18 06:11:28 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: received and ignored informational message Feb 18 06:11:28 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12 #7: received Delete SA payload: deleting ISAKMP State #7 Feb 18 06:11:28 maverick pluto[6909]: "L2TP-PSK-NAT"[8] 93.36.127.12: deleting connection "L2TP-PSK-NAT" instance with peer 93.36.127.12 {isakmp=#0/ipsec=#0} Feb 18 06:11:28 maverick pluto[6909]: packet from 93.36.127.12:36810: received and ignored informational message and my system log on OSX (from where I'm connecting): Feb 18 13:11:09 luca-ciorias-MacBook-Pro pppd[68656]: pppd 2.4.2 (Apple version 412.3) started by luca, uid 501 Feb 18 13:11:09 luca-ciorias-MacBook-Pro pppd[68656]: L2TP connecting to server '69.147.233.173' (69.147.233.173)... Feb 18 13:11:09 luca-ciorias-MacBook-Pro pppd[68656]: IPSec connection started Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: Connecting. Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: transmit success. (Initiator, Main-Mode message 1). Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: receive success. (Initiator, Main-Mode message 2). Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: transmit success. (Initiator, Main-Mode message 3). Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: receive success. (Initiator, Main-Mode message 4). Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: transmit success. (Initiator, Main-Mode message 5). Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: IKEv1 Phase1 AUTH: success. (Initiator, Main-Mode Message 6). Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: receive success. (Initiator, Main-Mode message 6). Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: IKEv1 Phase1 Initiator: success. (Initiator, Main-Mode). Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: transmit success. (Information message). Feb 18 13:11:09 luca-ciorias-MacBook-Pro racoon[68453]: IKEv1 Information-Notice: transmit success. (ISAKMP-SA). Feb 18 13:11:10 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: transmit success. (Initiator, Quick-Mode message 1). Feb 18 13:11:10 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: receive success. (Initiator, Quick-Mode message 2). Feb 18 13:11:10 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: transmit success. (Initiator, Quick-Mode message 3). Feb 18 13:11:10 luca-ciorias-MacBook-Pro racoon[68453]: IKEv1 Phase2 Initiator: success. (Initiator, Quick-Mode). Feb 18 13:11:10 luca-ciorias-MacBook-Pro racoon[68453]: Connected. Feb 18 13:11:10 luca-ciorias-MacBook-Pro pppd[68656]: IPSec connection established Feb 18 13:11:30 luca-ciorias-MacBook-Pro pppd[68656]: L2TP cannot connect to the server Feb 18 13:11:30 luca-ciorias-MacBook-Pro configd[20]: SCNCController: Disconnecting. (Connection tried to negotiate for, 22 seconds). Feb 18 13:11:30 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: transmit success. (Information message). Feb 18 13:11:30 luca-ciorias-MacBook-Pro racoon[68453]: IKEv1 Information-Notice: transmit success. (Delete IPSEC-SA). Feb 18 13:11:30 luca-ciorias-MacBook-Pro racoon[68453]: IKE Packet: transmit success. (Information message). Feb 18 13:11:30 luca-ciorias-MacBook-Pro racoon[68453]: IKEv1 Information-Notice: transmit success. (Delete ISAKMP-SA). Feb 18 13:11:31 luca-ciorias-MacBook-Pro racoon[68453]: Disconnecting. (Connection was up for, 20.157953 seconds).

Read the article

how to rollback/undo yum update on fedora after messing the fedora versions

- by misteryes

I want to install texlive on my fedora 16 laptop with the following procedure: # yum remove tex-* texlive-* # cat > /etc/yum.repos.d/texlive.repo <<EOF [texlive] name=texlive baseurl=http://jnovy.fedorapeople.org/texlive/2012/packages.f17/ enabled=1 gpgcheck=0 EOF # yum update; # yum install texlive after yum update, I notice that my laptop is fedora 16, while I used 2012/packages.fc17/ so I modify /etc/yum.repos.d/texlive.repo to use 2011/packages.fc16 and do yum update again however, there are many errors [root@kitty esolve]# yum update Loaded plugins: auto-update-debuginfo, langpacks, presto, refresh-packagekit http://repos.fedorapeople.org/repos/leigh123linux/cinnamon/fedora-16/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found : http://repos.fedorapeople.org/repos/leigh123linux/cinnamon/fedora-16/x86_64/repodata/repomd.xml Trying other mirror. Setting up Update Process Resolving Dependencies --> Running transaction check ---> Package dvipng.x86_64 0:1.14-1.fc15 will be obsoleted ---> Package kpathsea.x86_64 0:2007-66.fc16 will be obsoleted --> Processing Dependency: libkpathsea.so.4()(64bit) for package: evince-dvi-3.2.1-2.fc16.x86_64 ---> Package mkvtoolnix.x86_64 0:5.8.0-1 will be updated ---> Package mkvtoolnix.x86_64 0:6.3.0-1 will be an update ---> Package nautilus-dropbox.x86_64 0:1.4.0-1.fc10 will be updated ---> Package nautilus-dropbox.x86_64 0:1.6.0-1.fc10 will be an update ---> Package texlive-dvipng-bin.x86_64 2:svn26509.0-19.20130317_r29408.fc17 will be obsoleting --> Processing Dependency: texlive-kpathsea-lib = 2:2012-19.20130317_r29408.fc17 for package: 2:texlive-dvipng-bin-svn26509.0-19.20130317_r29408.fc17.x86_64 --> Processing Dependency: texlive-base for package: 2:texlive-dvipng-bin-svn26509.0-19.20130317_r29408.fc17.x86_64 --> Processing Dependency: tex-dvipng for package: 2:texlive-dvipng-bin-svn26509.0-19.20130317_r29408.fc17.x86_64 --> Processing Dependency: libpng15.so.15()(64bit) for package: 2:texlive-dvipng-bin-svn26509.0-19.20130317_r29408.fc17.x86_64 --> Processing Dependency: libkpathsea.so.6()(64bit) for package: 2:texlive-dvipng-bin-svn26509.0-19.20130317_r29408.fc17.x86_64 ---> Package texlive-kpathsea.noarch 2:svn28792.0-19.fc17 will be obsoleting --> Processing Dependency: texlive-kpathsea-bin for package: 2:texlive-kpathsea-svn28792.0-19.fc17.noarch --> Running transaction check ---> Package kpathsea.x86_64 0:2007-66.fc16 will be obsoleted --> Processing Dependency: libkpathsea.so.4()(64bit) for package: evince-dvi-3.2.1-2.fc16.x86_64 ---> Package texlive-base.noarch 2:2012-19.20130317_r29408.fc17 will be installed ---> Package texlive-dvipng.noarch 2:svn26689.1.14-19.fc17 will be installed ---> Package texlive-dvipng-bin.x86_64 2:svn26509.0-19.20130317_r29408.fc17 will be obsoleting --> Processing Dependency: libpng15.so.15()(64bit) for package: 2:texlive-dvipng-bin-svn26509.0-19.20130317_r29408.fc17.x86_64 ---> Package texlive-kpathsea-bin.x86_64 2:svn27347.0-19.20130317_r29408.fc17 will be installed ---> Package texlive-kpathsea-lib.x86_64 2:2012-19.20130317_r29408.fc17 will be installed --> Finished Dependency Resolution Error: Package: evince-dvi-3.2.1-2.fc16.x86_64 (@fedora) Requires: libkpathsea.so.4()(64bit) Removing: kpathsea-2007-66.fc16.x86_64 (@so-updates) libkpathsea.so.4()(64bit) Obsoleted By: 2:texlive-kpathsea-svn28792.0-19.fc17.noarch (texlive) Not found Error: Package: 2:texlive-dvipng-bin-svn26509.0-19.20130317_r29408.fc17.x86_64 (texlive) Requires: libpng15.so.15()(64bit) You could try using --skip-broken to work around the problem You could try running: rpm -Va --nofiles --nodigest and when I do yum install texlive, it simply tries to install the f17 version, which failed. what Can I do to install f16 version? how can I undo yum update with 2012/packages.f17/ I tried yum history, and for today's history, I only have Loaded plugins: auto-update-debuginfo, langpacks, presto, refresh-packagekit ID | Login user | Date and time | Action(s) | Altered ------------------------------------------------------------------------------- 124 | esolve ... <esolve> | 2013-09-12 18:35 | Erase | 24 123 | root <root> | 2013-08-23 11:08 | Update | 1 122 | root <root> | 2013-08-21 14:13 | Update | 1 < 121 | esolve ... <esolve> | 2013-05-31 15:36 | Install | 1 > 120 | root <root> | 2013-05-29 15:13 | Install | 1 < 119 | root <root> | 2013-04-18 13:13 | Update | 1 >< which seems not related to yum update the history results: 1003 yum update 1004 vim 1005 vim /etc/yum.repos.d/texlive.repo 1006 yum update 1007 yum install texlive 1008 vim /etc/yum.repos.d/texlive.repo 1009 clear 1010 yum history 1011 yum history list 1012 vim 1013 vim /etc/yum.repos.d/texlive.repo 1014 yum history list 1015 history also I tried yum history undo 124 but it failed! [root@kitty esolve]# yum history undo 124 Loaded plugins: auto-update-debuginfo, langpacks, presto, refresh-packagekit http://repos.fedorapeople.org/repos/leigh123linux/cinnamon/fedora-16/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found : http://repos.fedorapeople.org/repos/leigh123linux/cinnamon/fedora-16/x86_64/repodata/repomd.xml Trying other mirror. Undoing transaction 124, from Thu Sep 12 18:35:31 2013 Erase R-2.14.1-1.fc16.x86_64 ? Erase R-core-2.14.1-1.fc16.x86_64 ? Erase R-devel-2.14.1-1.fc16.x86_64 ? Erase a2ps-4.14-12.fc15.x86_64 ? Erase docbook-utils-pdf-0.6.14-29.fc16.noarch ? Erase html2ps-1.0-0.7.b7.fc15.noarch ? Erase jadetex-3.13-10.fc15.noarch ? Erase kile-2.1.1-1.fc16.x86_64 ? Erase linuxdoc-tools-0.9.66-9.fc15.x86_64 ? Erase tetex-dvipost-1.1-12.fc15.x86_64 ? Erase tex-cm-lgc-0.5-18.fc15.noarch ? Erase tex-preview-11.86-6.fc16.noarch ? Erase texinfo-tex-4.13a-15.fc15.x86_64 ? Erase texlive-2007-66.fc16.x86_64 ? Erase texlive-dvips-2007-66.fc16.x86_64 ? Erase texlive-latex-2007-66.fc16.x86_64 ? Erase texlive-texmf-2007-40.fc16.noarch ? Erase texlive-texmf-dvips-2007-40.fc16.noarch ? Erase texlive-texmf-fonts-2007-40.fc16.noarch ? Erase texlive-texmf-latex-2007-40.fc16.noarch ? Erase texlive-utils-2007-66.fc16.x86_64 ? Erase texmaker-1:3.2.2-1.fc16.x86_64 ? Erase texmf-RR-Inria-4.11-inria.0.noarch ? Erase xdvik-22.84.14-9.fc15.x86_64 ? Error: No package(s) available to install

Read the article

Failing Sata HDD

- by DaveCol

I think my HDD is fried... Could someone confirm or help me restore it? I was using Hardware RAID 1 Configuration [2 x 160GB SATA HDD] on a CentOS 4 Installation. All of a sudden I started seeing bad sectors on the second HDD which stopped being mirrored. I have removed the RAID array and have tested with SMART which showed the following error: 187 Unknown_Attribute 0x003a 001 001 051 Old_age Always FAILING_NOW 4645 I have no clue what this means, or if I can recover from it. Could someone give me some ideas on how to fix this, or what HDD to get to replace this? Complete SMART report: Smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: GB0160CAABV Serial Number: 6RX58NAA Firmware Version: HPG1 User Capacity: 160,041,885,696 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 7 ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a Local Time is: Tue Oct 19 13:42:42 2010 COT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED See vendor-specific Attribute list for marginal Attributes. General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 433) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 54) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 100 253 006 Pre-fail Always - 0 3 Spin_Up_Time 0x0002 097 097 000 Old_age Always - 0 4 Start_Stop_Count 0x0033 100 100 020 Pre-fail Always - 152 5 Reallocated_Sector_Ct 0x0033 095 095 036 Pre-fail Always - 214 7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 73109713 9 Power_On_Hours 0x0032 083 083 000 Old_age Always - 15133 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0033 100 100 020 Pre-fail Always - 154 184 Unknown_Attribute 0x0032 038 038 000 Old_age Always - 62 187 Unknown_Attribute 0x003a 001 001 051 Old_age Always FAILING_NOW 4645 189 Unknown_Attribute 0x0022 100 100 000 Old_age Always - 0 190 Unknown_Attribute 0x001a 061 055 000 Old_age Always - 656408615 194 Temperature_Celsius 0x0000 039 045 000 Old_age Offline - 39 (Lifetime Min/Max 0/22) 195 Hardware_ECC_Recovered 0x0032 070 059 000 Old_age Always - 12605265 197 Current_Pending_Sector 0x0000 100 100 000 Old_age Offline - 1 198 Offline_Uncorrectable 0x0000 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0000 200 200 000 Old_age Offline - 62 SMART Error Log Version: 1 ATA Error Count: 4645 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 4645 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 02 7b 86 b1 ea 00 00:38:52.796 READ DMA ec 03 45 00 00 00 a0 00 00:38:52.796 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:52.794 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:49.991 IDENTIFY DEVICE c8 00 04 79 86 b1 ea 00 00:38:49.935 READ DMA Error 4644 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 04 79 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:49.991 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:49.935 READ DMA Error 4643 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA Error 4642 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA Error 4641 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 15131 - # 2 Short offline Completed without error 00% 15131 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.

Read the article

Why load increase ,ssd iops increase but cpu iowait decrease?

- by mq44944

There is a strange thing on my server which has a mysql running on it. The QPS is more than 4000 but TPS is less than 20. The server load is more than 80 and cpu usr is more than 86% but iowait is less than 8%. The disk iops is more than 16000 and util of disk is more than 99%. When the QPS decreases, the load decreases, the cpu iowait increases. I can't catch this! root@mypc # dmidecode | grep "Product Name" Product Name: PowerEdge R510 Product Name: 084YMW root@mypc # megacli -PDList -aALL |grep "Inquiry Data" Inquiry Data: SEAGATE ST3600057SS ES656SL316PT Inquiry Data: SEAGATE ST3600057SS ES656SL30THV Inquiry Data: ATA INTEL SSDSA2CW300362CVPR201602A6300EGN Inquiry Data: ATA INTEL SSDSA2CW300362CVPR2044037K300EGN Inquiry Data: ATA INTEL SSDSA2CW300362CVPR204402PX300EGN Inquiry Data: ATA INTEL SSDSA2CW300362CVPR204403WN300EGN Inquiry Data: ATA INTEL SSDSA2CW300362CVPR202000HU300EGN Inquiry Data: ATA INTEL SSDSA2CW300362CVPR202001E7300EGN Inquiry Data: ATA INTEL SSDSA2CW300362CVPR204402WE300EGN Inquiry Data: ATA INTEL SSDSA2CW300362CVPR204404E5300EGN Inquiry Data: ATA INTEL SSDSA2CW300362CVPR204401QF300EGN Inquiry Data: ATA INTEL SSDSA2CW300362CVPR20450001300EGN the mysql data files lie on the ssd disks which are organizaed using RAID 10. root@mypc # megacli -LDInfo -L1 -a0 Adapter 0 -- Virtual Drive Information: Virtual Disk: 1 (Target Id: 1) Name: RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0 Size:1427840MB State: Optimal Stripe Size: 64kB Number Of Drives:2 Span Depth:5 Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU Access Policy: Read/Write Disk Cache Policy: Disk's Default Exit Code: 0x00 -------- -----load-avg---- ---cpu-usage--- ---swap--- -------------------------io-usage----------------------- -QPS- -TPS- -Hit%- time | 1m 5m 15m |usr sys idl iow| si so| r/s w/s rkB/s wkB/s queue await svctm %util| ins upd del sel iud| lor hit| 09:05:29|79.80 64.49 42.00| 82 7 6 5| 0 0|16421.1 10.6262705.9 85.2 8.3 0.5 0.1 99.5| 0 0 0 3968 0| 495482 96.58| 09:05:30|79.80 64.49 42.00| 79 7 8 6| 0 0|15907.4 230.6254409.7 6357.5 8.4 0.5 0.1 98.5| 0 0 0 4195 0| 496434 96.68| 09:05:31|81.34 65.07 42.31| 81 7 7 5| 0 0|16198.7 8.6259029.2 99.8 8.1 0.5 0.1 99.3| 0 0 0 4220 0| 508983 96.70| 09:05:32|81.34 65.07 42.31| 82 7 5 5| 0 0|16746.6 8.7267853.3 92.4 8.5 0.5 0.1 99.4| 0 0 0 4084 0| 503834 96.54| 09:05:33|81.34 65.07 42.31| 81 7 6 5| 0 0|16498.7 9.6263856.8 92.3 8.0 0.5 0.1 99.3| 0 0 0 4030 0| 507051 96.60| 09:05:34|81.34 65.07 42.31| 80 8 7 6| 0 0|16328.4 11.5261101.6 95.8 8.1 0.5 0.1 98.3| 0 0 0 4119 0| 504409 96.63| 09:05:35|81.31 65.33 42.52| 82 7 6 5| 0 0|16374.0 8.7261921.9 92.5 8.1 0.5 0.1 99.7| 0 0 0 4127 0| 507279 96.66| 09:05:36|81.31 65.33 42.52| 81 8 6 5| 0 0|16496.2 8.6263832.0 84.5 8.5 0.5 0.1 99.2| 0 0 0 4100 0| 505054 96.59| 09:05:37|81.31 65.33 42.52| 82 8 6 4| 0 0|16239.4 9.6259768.8 84.3 8.0 0.5 0.1 99.1| 0 0 0 4273 0| 510621 96.72| 09:05:38|81.31 65.33 42.52| 81 7 6 5| 0 0|16349.6 8.7261439.2 81.4 8.2 0.5 0.1 98.9| 0 0 0 4171 0| 510145 96.67| 09:05:39|81.31 65.33 42.52| 82 7 6 5| 0 0|16116.8 8.7257667.6 96.5 8.0 0.5 0.1 99.1| 0 0 0 4348 0| 513093 96.74| 09:05:40|79.60 65.24 42.61| 79 7 7 7| 0 0|16154.2 242.9258390.4 6388.4 8.5 0.5 0.1 99.0| 0 0 0 4033 0| 507244 96.70| 09:05:41|79.60 65.24 42.61| 79 7 8 6| 0 0|16583.1 21.2265129.6 173.5 8.2 0.5 0.1 99.1| 0 0 0 3995 0| 501474 96.57| 09:05:42|79.60 65.24 42.61| 81 8 6 5| 0 0|16281.0 9.7260372.2 69.5 8.3 0.5 0.1 98.7| 0 0 0 4221 0| 509322 96.70| 09:05:43|79.60 65.24 42.61| 80 7 7 6| 0 0|16355.3 8.7261515.5 104.3 8.2 0.5 0.1 99.6| 0 0 0 4087 0| 502052 96.62| -------- -----load-avg---- ---cpu-usage--- ---swap--- -------------------------io-usage----------------------- -QPS- -TPS- -Hit%- time | 1m 5m 15m |usr sys idl iow| si so| r/s w/s rkB/s wkB/s queue await svctm %util| ins upd del sel iud| lor hit| 09:05:44|79.60 65.24 42.61| 83 7 5 4| 0 0|16469.4 11.6263387.0 138.8 8.2 0.5 0.1 98.7| 0 0 0 4292 0| 509979 96.65| 09:05:45|79.07 65.37 42.77| 80 7 6 6| 0 0|16659.5 9.7266478.7 85.0 8.4 0.5 0.1 98.5| 0 0 0 3899 0| 496234 96.54| 09:05:46|79.07 65.37 42.77| 78 7 7 8| 0 0|16752.9 8.7267921.8 97.1 8.4 0.5 0.1 98.9| 0 0 0 4126 0| 508300 96.57| 09:05:47|79.07 65.37 42.77| 82 7 6 5| 0 0|16657.2 9.6266439.3 84.3 8.3 0.5 0.1 98.9| 0 0 0 4086 0| 502171 96.57| 09:05:48|79.07 65.37 42.77| 79 8 6 6| 0 0|16814.5 8.7268924.1 77.6 8.5 0.5 0.1 99.0| 0 0 0 4059 0| 499645 96.52| 09:05:49|79.07 65.37 42.77| 81 7 6 5| 0 0|16553.0 6.8264708.6 42.5 8.3 0.5 0.1 99.4| 0 0 0 4249 0| 501623 96.60| 09:05:50|79.63 65.71 43.01| 79 7 7 7| 0 0|16295.1 246.9260475.0 6442.4 8.7 0.5 0.1 99.1| 0 0 0 4231 0| 511032 96.70| 09:05:51|79.63 65.71 43.01| 80 7 6 6| 0 0|16568.9 8.7264919.7 104.7 8.3 0.5 0.1 99.7| 0 0 0 4272 0| 517177 96.68| 09:05:53|79.63 65.71 43.01| 79 7 7 6| 0 0|16539.0 8.6264502.9 87.6 8.4 0.5 0.1 98.9| 0 0 0 3992 0| 496728 96.52| 09:05:54|79.63 65.71 43.01| 79 7 7 7| 0 0|16527.5 11.6264363.6 92.6 8.5 0.5 0.1 98.8| 0 0 0 4045 0| 502944 96.59| 09:05:55|79.63 65.71 43.01| 80 7 7 6| 0 0|16374.7 12.5261687.2 134.9 8.6 0.5 0.1 99.2| 0 0 0 4143 0| 507006 96.66| 09:05:56|76.05 65.20 42.96| 77 8 8 8| 0 0|16464.9 9.6263314.3 111.9 8.5 0.5 0.1 98.9| 0 0 0 4250 0| 505417 96.64| 09:05:57|76.05 65.20 42.96| 79 7 6 7| 0 0|16460.1 8.8263283.2 93.4 8.3 0.5 0.1 98.8| 0 0 0 4294 0| 508168 96.66| 09:05:58|76.05 65.20 42.96| 80 7 7 7| 0 0|16176.5 9.6258762.1 127.3 8.3 0.5 0.1 98.9| 0 0 0 4160 0| 509349 96.72| 09:05:59|76.05 65.20 42.96| 75 7 9 10| 0 0|16522.0 10.7264274.6 93.1 8.6 0.5 0.1 97.5| 0 0 0 4034 0| 492623 96.51| -------- -----load-avg---- ---cpu-usage--- ---swap--- -------------------------io-usage----------------------- -QPS- -TPS- -Hit%- time | 1m 5m 15m |usr sys idl iow| si so| r/s w/s rkB/s wkB/s queue await svctm %util| ins upd del sel iud| lor hit| 09:06:00|76.05 65.20 42.96| 79 7 7 7| 0 0|16369.6 21.2261867.3 262.5 8.4 0.5 0.1 98.9| 0 0 0 4305 0| 494509 96.59| 09:06:01|75.33 65.23 43.09| 73 6 9 12| 0 0|15864.0 209.3253685.4 6238.0 10.0 0.6 0.1 98.7| 0 0 0 3913 0| 483480 96.62| 09:06:02|75.33 65.23 43.09| 73 7 8 12| 0 0|15854.7 12.7253613.2 93.6 11.0 0.7 0.1 99.0| 0 0 0 4271 0| 483771 96.64| 09:06:03|75.33 65.23 43.09| 75 7 9 9| 0 0|16074.8 8.7257104.3 81.7 8.1 0.5 0.1 98.5| 0 0 0 4060 0| 480701 96.55| 09:06:04|75.33 65.23 43.09| 76 7 8 9| 0 0|16221.7 9.7259500.1 139.4 8.1 0.5 0.1 97.6| 0 0 0 3953 0| 486774 96.56| 09:06:05|74.98 65.33 43.24| 78 7 8 8| 0 0|16330.7 8.7261166.5 85.3 8.2 0.5 0.1 98.5| 0 0 0 3957 0| 481775 96.53| 09:06:06|74.98 65.33 43.24| 75 7 9 9| 0 0|16093.7 11.7257436.1 93.7 8.2 0.5 0.1 99.2| 0 0 0 3938 0| 489251 96.60| 09:06:07|74.98 65.33 43.24| 75 7 5 13| 0 0|15758.9 19.2251989.4 188.2 14.7 0.9 0.1 99.7| 0 0 0 4140 0| 494738 96.70| 09:06:08|74.98 65.33 43.24| 69 7 10 15| 0 0|16166.3 8.7258474.9 81.2 8.9 0.5 0.1 98.7| 0 0 0 3993 0| 487162 96.58| 09:06:09|74.98 65.33 43.24| 74 7 9 10| 0 0|16071.0 8.7257010.9 93.3 8.2 0.5 0.1 99.2| 0 0 0 4098 0| 491557 96.61| 09:06:10|70.98 64.66 43.14| 71 7 9 12| 0 0|15549.6 216.1248701.1 6188.7 8.3 0.5 0.1 97.8| 0 0 0 3879 0| 480832 96.66| 09:06:11|70.98 64.66 43.14| 71 7 10 13| 0 0|16233.7 22.4259568.1 257.1 8.2 0.5 0.1 99.2| 0 0 0 4088 0| 493200 96.62| 09:06:12|70.98 64.66 43.14| 78 7 8 7| 0 0|15932.4 10.6254779.5 108.1 8.1 0.5 0.1 98.6| 0 0 0 4168 0| 489838 96.63| 09:06:13|70.98 64.66 43.14| 71 8 9 12| 0 0|16255.9 11.5259902.3 103.9 8.3 0.5 0.1 98.0| 0 0 0 3874 0| 481246 96.52| 09:06:14|70.98 64.66 43.14| 60 6 16 18| 0 0|15621.0 9.7249826.1 81.9 8.0 0.5 0.1 99.3| 0 0 0 3956 0| 480278 96.65|

Read the article

Ubuntu server has slow performance

- by Rich

I have a custom built Ubuntu 11.04 server with a 6 disk software RAID 10 primary drive. On it I'm primarily running a PostgreSQL and a few other utilities that stream data from the web. I often find after a few hours of uptime the server starts to lag with all kinds of processes. For example, it may take 10-15 seconds after log-in to get a shell prompt. It might take 5-10 seconds for top to come up. An ls might take a second or two. When I look at top there is almost no CPU usage. There's a fair amount of memory used by the PostgreSQL server but not enough to bleed into swap. I have no idea where to go from here, other than to suspect the RAID10 (I've only ever had software RAID 1's before). Edit: Output from top: top - 11:56:03 up 1:46, 3 users, load average: 0.89, 0.73, 0.72 Tasks: 119 total, 1 running, 118 sleeping, 0 stopped, 0 zombie Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 93.5%id, 6.2%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 16325596k total, 3478248k used, 12847348k free, 20880k buffers Swap: 19534176k total, 0k used, 19534176k free, 3041992k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1747 woodsp 20 0 109m 10m 4888 S 1 0.1 0:42.70 python 357 root 20 0 0 0 0 S 0 0.0 0:00.40 jbd2/sda3-8 1 root 20 0 24324 2284 1344 S 0 0.0 0:00.84 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root 20 0 0 0 0 S 0 0.0 0:00.24 ksoftirqd/0 6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0 7 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/0 8 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1 10 root 20 0 0 0 0 S 0 0.0 0:00.02 ksoftirqd/1 12 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/1 13 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/2 14 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/2:0 15 root 20 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 16 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/2 17 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/3 18 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/3:0 19 root 20 0 0 0 0 S 0 0.0 0:00.02 ksoftirqd/3 20 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/3 21 root 0 -20 0 0 0 S 0 0.0 0:00.00 cpuset 22 root 0 -20 0 0 0 S 0 0.0 0:00.00 khelper 23 root 20 0 0 0 0 S 0 0.0 0:00.00 kdevtmpfs 24 root 0 -20 0 0 0 S 0 0.0 0:00.00 netns 26 root 20 0 0 0 0 S 0 0.0 0:00.00 sync_supers df -h rpsharp@ncp-skookum:~$ df -h Filesystem Size Used Avail Use% Mounted on /dev/sda3 1.8T 549G 1.2T 32% / udev 7.8G 4.0K 7.8G 1% /dev tmpfs 3.2G 492K 3.2G 1% /run none 5.0M 0 5.0M 0% /run/lock none 7.8G 0 7.8G 0% /run/shm /dev/sda2 952M 128K 952M 1% /boot/efi /dev/md0 5.5T 562G 4.7T 11% /usr/local free -m psharp@ncp-skookum:~$ free -m total used free shared buffers cached Mem: 15942 3409 12533 0 20 2983 -/+ buffers/cache: 405 15537 Swap: 19076 0 19076 tail -50 /var/log/syslog Jul 3 06:31:32 ncp-skookum rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="1070" x-info="http://www.rsyslog.com"] rsyslogd was HUPed Jul 3 06:39:01 ncp-skookum CRON[14211]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 06:40:01 ncp-skookum CRON[14223]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 07:00:01 ncp-skookum CRON[14328]: (woodsp) CMD (/home/woodsp/bin/mail_tweetupdate # email an update) Jul 3 07:00:01 ncp-skookum CRON[14327]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 07:00:28 ncp-skookum sendmail[14356]: q63E0SoZ014356: from=woodsp, size=2328, class=0, nrcpts=2, msgid=<[email protected]>, relay=woodsp@localhost Jul 3 07:00:29 ncp-skookum sm-mta[14357]: q63E0Si6014357: from=<[email protected]>, size=2569, class=0, nrcpts=2, msgid=<[email protected]>, proto=ESMTP, daemon=MTA-v4, relay=localhost [127.0.0.1] Jul 3 07:00:29 ncp-skookum sendmail[14356]: q63E0SoZ014356: to=Spencer Wood <[email protected]>,Martin Lacayo <[email protected]>, ctladdr=woodsp (1004/1005), delay=00:00:01, xdelay=00:00:01, mailer=relay, pri=62328, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (q63E0Si6014357 Message accepted for delivery) Jul 3 07:00:29 ncp-skookum sm-mta[14359]: STARTTLS=client, relay=mx3.stanford.edu., version=TLSv1/SSLv3, verify=FAIL, cipher=DHE-RSA-AES256-SHA, bits=256/256 Jul 3 07:00:29 ncp-skookum sm-mta[14359]: q63E0Si6014357: to=<[email protected]>,<[email protected]>, ctladdr=<[email protected]> (1004/1005), delay=00:00:01, xdelay=00:00:00, mailer=esmtp, pri=152569, relay=mx3.stanford.edu. [171.67.219.73], dsn=2.0.0, stat=Sent (Ok: queued as 8F3505802AC) Jul 3 07:09:08 ncp-skookum CRON[14396]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 07:17:01 ncp-skookum CRON[14438]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Jul 3 07:20:01 ncp-skookum CRON[14453]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 07:39:01 ncp-skookum CRON[14551]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 07:40:01 ncp-skookum CRON[14562]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 08:00:01 ncp-skookum CRON[14668]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 08:09:01 ncp-skookum CRON[14724]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 08:17:01 ncp-skookum CRON[14766]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Jul 3 08:20:01 ncp-skookum CRON[14781]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 08:39:01 ncp-skookum CRON[14881]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 08:40:01 ncp-skookum CRON[14892]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Output of hdparm -t /dev/sd{a,b,c,d,e,f} This looks suspicious? /dev/sda: Timing buffered disk reads: 2 MB in 4.84 seconds = 423.39 kB/sec /dev/sdb: Timing buffered disk reads: 420 MB in 3.01 seconds = 139.74 MB/sec /dev/sdc: Timing buffered disk reads: 390 MB in 3.00 seconds = 129.87 MB/sec /dev/sdd: Timing buffered disk reads: 416 MB in 3.00 seconds = 138.51 MB/sec /dev/sde: Timing buffered disk reads: 422 MB in 3.00 seconds = 140.50 MB/sec /dev/sdf: Timing buffered disk reads: 416 MB in 3.01 seconds = 138.26 MB/sec

Read the article

apt-get update mdadm scary warnings

- by user568829

Just ran an apt-get update on one of my dedicated servers to be left with a relatively scary warning: Processing triggers for initramfs-tools ... update-initramfs: Generating /boot/initrd.img-2.6.26-2-686-bigmem W: mdadm: the array /dev/md/1 with UUID c622dd79:496607cf:c230666b:5103eba0 W: mdadm: is currently active, but it is not listed in mdadm.conf. if W: mdadm: it is needed for boot, then YOUR SYSTEM IS NOW UNBOOTABLE! W: mdadm: please inspect the output of /usr/share/mdadm/mkconf, compare W: mdadm: it to /etc/mdadm/mdadm.conf, and make the necessary changes. W: mdadm: the array /dev/md/2 with UUID 24120323:8c54087c:c230666b:5103eba0 W: mdadm: is currently active, but it is not listed in mdadm.conf. if W: mdadm: it is needed for boot, then YOUR SYSTEM IS NOW UNBOOTABLE! W: mdadm: please inspect the output of /usr/share/mdadm/mkconf, compare W: mdadm: it to /etc/mdadm/mdadm.conf, and make the necessary changes. W: mdadm: the array /dev/md/6 with UUID eef74de5:9267b2a1:c230666b:5103eba0 W: mdadm: is currently active, but it is not listed in mdadm.conf. if W: mdadm: it is needed for boot, then YOUR SYSTEM IS NOW UNBOOTABLE! W: mdadm: please inspect the output of /usr/share/mdadm/mkconf, compare W: mdadm: it to /etc/mdadm/mdadm.conf, and make the necessary changes. W: mdadm: the array /dev/md/5 with UUID 5d45b20c:04d8138f:c230666b:5103eba0 W: mdadm: is currently active, but it is not listed in mdadm.conf. if W: mdadm: it is needed for boot, then YOUR SYSTEM IS NOW UNBOOTABLE! W: mdadm: please inspect the output of /usr/share/mdadm/mkconf, compare W: mdadm: it to /etc/mdadm/mdadm.conf, and make the necessary changes. As instructed I inspected the output of /usr/share/mdadm/mkconf and compared with /etc/mdadm/mdadm.conf and they are quite different. Here is the /etc/mdadm/mdadm.conf contents: # mdadm.conf # # Please refer to mdadm.conf(5) for information about this file. # # by default, scan all partitions (/proc/partitions) for MD superblocks. # alternatively, specify devices to scan, using wildcards if desired. DEVICE partitions # auto-create devices with Debian standard permissions CREATE owner=root group=disk mode=0660 auto=yes # automatically tag new arrays as belonging to the local system HOMEHOST <system> # instruct the monitoring daemon where to send mail alerts MAILADDR root # definitions of existing MD arrays ARRAY /dev/md0 level=raid1 num-devices=2 UUID=b93b0b87:5f7c2c46:0043fca9:4026c400 ARRAY /dev/md1 level=raid1 num-devices=2 UUID=c0fa8842:e214fb1a:fad8a3a2:28f2aabc ARRAY /dev/md2 level=raid1 num-devices=2 UUID=cdc2a9a9:63bbda21:f55e806c:a5371897 ARRAY /dev/md3 level=raid1 num-devices=2 UUID=eca75495:9c9ce18c:d2bac587:f1e79d80 # This file was auto-generated on Wed, 04 Nov 2009 11:32:16 +0100 # by mkconf $Id$ And here is the out put from /usr/share/mdadm/mkconf # mdadm.conf # # Please refer to mdadm.conf(5) for information about this file. # # by default, scan all partitions (/proc/partitions) for MD superblocks. # alternatively, specify devices to scan, using wildcards if desired. DEVICE partitions # auto-create devices with Debian standard permissions CREATE owner=root group=disk mode=0660 auto=yes # automatically tag new arrays as belonging to the local system HOMEHOST <system> # instruct the monitoring daemon where to send mail alerts MAILADDR root # definitions of existing MD arrays ARRAY /dev/md1 UUID=c622dd79:496607cf:c230666b:5103eba0 ARRAY /dev/md2 UUID=24120323:8c54087c:c230666b:5103eba0 ARRAY /dev/md5 UUID=5d45b20c:04d8138f:c230666b:5103eba0 ARRAY /dev/md6 UUID=eef74de5:9267b2a1:c230666b:5103eba0 # This configuration was auto-generated on Sat, 25 Feb 2012 13:10:00 +1030 # by mkconf 3.1.4-1+8efb9d1+squeeze1 As I understand it I need to replace the four lines that start with 'ARRAY' in the /etc/mdadm/mdadm.conf file with the different four 'ARRAY' lines from the /usr/share/mdadm/mkconf output. When I did this and then ran update-initramfs -u there were no more warnings. Is what I have done above correct? I am now terrified of rebooting the server for fear it will not reboot and being a remote dedicated server this would certainly mean downtime and possibly would be expensive to get running again. FOLLOW UP (response to question): the output from mount: /dev/md1 on / type ext3 (rw,usrquota,grpquota) tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755) proc on /proc type proc (rw,noexec,nosuid,nodev) sysfs on /sys type sysfs (rw,noexec,nosuid,nodev) udev on /dev type tmpfs (rw,mode=0755) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev) devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620) /dev/md2 on /boot type ext2 (rw) /dev/md5 on /tmp type ext3 (rw) /dev/md6 on /home type ext3 (rw,usrquota,grpquota) mdadm --detail /dev/md0 mdadm: md device /dev/md0 does not appear to be active. mdadm --detail /dev/md1 /dev/md1: Version : 0.90 Creation Time : Sun Aug 14 09:43:08 2011 Raid Level : raid1 Array Size : 31463232 (30.01 GiB 32.22 GB) Used Dev Size : 31463232 (30.01 GiB 32.22 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 1 Persistence : Superblock is persistent Update Time : Sat Feb 25 14:03:47 2012 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : c622dd79:496607cf:c230666b:5103eba0 Events : 0.24 Number Major Minor RaidDevice State 0 8 1 0 active sync /dev/sda1 1 8 17 1 active sync /dev/sdb1 mdadm --detail /dev/md2 /dev/md2: Version : 0.90 Creation Time : Sun Aug 14 09:43:09 2011 Raid Level : raid1 Array Size : 104320 (101.89 MiB 106.82 MB) Used Dev Size : 104320 (101.89 MiB 106.82 MB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 2 Persistence : Superblock is persistent Update Time : Sat Feb 25 13:20:20 2012 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : 24120323:8c54087c:c230666b:5103eba0 Events : 0.30 Number Major Minor RaidDevice State 0 8 2 0 active sync /dev/sda2 1 8 18 1 active sync /dev/sdb2 mdadm --detail /dev/md3 mdadm: md device /dev/md3 does not appear to be active. mdadm --detail /dev/md5 /dev/md5: Version : 0.90 Creation Time : Sun Aug 14 09:43:09 2011 Raid Level : raid1 Array Size : 2104448 (2.01 GiB 2.15 GB) Used Dev Size : 2104448 (2.01 GiB 2.15 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 5 Persistence : Superblock is persistent Update Time : Sat Feb 25 14:09:03 2012 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : 5d45b20c:04d8138f:c230666b:5103eba0 Events : 0.30 Number Major Minor RaidDevice State 0 8 5 0 active sync /dev/sda5 1 8 21 1 active sync /dev/sdb5 mdadm --detail /dev/md6 /dev/md6: Version : 0.90 Creation Time : Sun Aug 14 09:43:09 2011 Raid Level : raid1 Array Size : 453659456 (432.64 GiB 464.55 GB) Used Dev Size : 453659456 (432.64 GiB 464.55 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 6 Persistence : Superblock is persistent Update Time : Sat Feb 25 14:10:00 2012 State : active Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : eef74de5:9267b2a1:c230666b:5103eba0 Events : 0.31 Number Major Minor RaidDevice State 0 8 6 0 active sync /dev/sda6 1 8 22 1 active sync /dev/sdb6 FOLLOW UP 2 (response to question): Output from /etc/fstab /dev/md1 / ext3 defaults,usrquota,grpquota 1 1 devpts /dev/pts devpts mode=0620,gid=5 0 0 proc /proc proc defaults 0 0 #usbdevfs /proc/bus/usb usbdevfs noauto 0 0 /dev/cdrom /media/cdrom auto ro,noauto,user,exec 0 0 /dev/dvd /media/dvd auto ro,noauto,user,exec 0 0 # # # /dev/md2 /boot ext2 defaults 1 2 /dev/sda3 swap swap pri=42 0 0 /dev/sdb3 swap swap pri=42 0 0 /dev/md5 /tmp ext3 defaults 0 0 /dev/md6 /home ext3 defaults,usrquota,grpquota 1 2

Read the article

What is causing the unusual high load average?

- by James

I noticed on Tuesday night of last week, the load average went up sharply and it seemed abnormal since the traffic is small. Usually, the numbers usually average around .40 or lower and my server stuff (mysql, php and apache) are optimized. I noticed that the IOWait is unusually high even though the processes is barely using any CPU. top - 01:44:39 up 1 day, 21:13, 1 user, load average: 1.41, 1.09, 0.86 Tasks: 60 total, 1 running, 59 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 0.0%sy, 0.0%ni, 91.5%id, 8.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1048576k total, 331944k used, 716632k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 15 0 2468 1376 1140 S 0 0.1 0:00.92 init 1656 root 15 0 13652 5212 664 S 0 0.5 0:00.00 apache2 9323 root 18 0 13652 5212 664 S 0 0.5 0:00.00 apache2 10079 root 18 0 3972 1248 972 S 0 0.1 0:00.00 su 10080 root 15 0 4612 1956 1448 S 0 0.2 0:00.01 bash 11298 root 15 0 13652 5212 664 S 0 0.5 0:00.00 apache2 11778 chikorit 15 0 2344 1092 884 S 0 0.1 0:00.05 top 15384 root 18 0 17544 13m 1568 S 0 1.3 0:02.28 miniserv.pl 15585 root 15 0 8280 2736 2168 S 0 0.3 0:00.02 sshd 15608 chikorit 15 0 8280 1436 860 S 0 0.1 0:00.02 sshd Here is the VMStat procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 0 0 768644 0 0 0 0 14 23 0 10 1 0 99 0 IOStat - Nothing unusal Total DISK READ: 67.13 K/s | Total DISK WRITE: 0.00 B/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND 19496 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19501 be/4 mysql 3.95 K/s 0.00 B/s 0.00 % 0.00 % mysqld 19568 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19569 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19570 be/4 chikorit 11.85 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19571 be/4 chikorit 7.90 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 19573 be/4 chikorit 7.90 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init 11778 be/4 chikorit 0.00 B/s 0.00 B/s 0.00 % 0.00 % top 19470 be/4 mysql 0.00 B/s 0.00 B/s 0.00 % 0.00 % mysqld Load Average Chart - http://i.stack.imgur.com/kYsD0.png I want to be sure if this is not a MySQL problem before making sure. Also, this is a Ubuntu 10.04 LTS Server on OpenVZ. Edit: This will probably give a good picture on the IO Wait top - 22:12:22 up 17:41, 1 user, load average: 1.10, 1.09, 0.93 Tasks: 33 total, 1 running, 32 sleeping, 0 stopped, 0 zombie Cpu(s): 0.6%us, 0.2%sy, 0.0%ni, 89.0%id, 10.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1048576k total, 260708k used, 787868k free, 0k buffers Swap: 0k total, 0k used, 0k free, 0k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 15 0 2468 1376 1140 S 0 0.1 0:00.88 init 5849 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 8063 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 9732 root 16 0 8280 2728 2168 S 0 0.3 0:00.02 sshd 9746 chikorit 18 0 8412 1444 864 S 0 0.1 0:01.10 sshd 9747 chikorit 18 0 4576 1960 1488 S 0 0.2 0:00.24 bash 13706 chikorit 15 0 2344 1088 884 R 0 0.1 0:00.03 top 15745 chikorit 15 0 12968 5108 1280 S 0 0.5 0:00.00 apache2 15751 chikorit 15 0 72184 25m 18m S 0 2.5 0:00.37 php5-fpm 15790 chikorit 18 0 12472 4640 1192 S 0 0.4 0:00.00 apache2 15797 chikorit 15 0 72888 23m 16m S 0 2.3 0:00.06 php5-fpm 16038 root 15 0 67772 2848 592 D 0 0.3 0:00.00 php5-fpm 16309 syslog 18 0 24084 1316 992 S 0 0.1 0:00.07 rsyslogd 16316 root 15 0 5472 908 500 S 0 0.1 0:00.00 sshd 16326 root 15 0 2304 908 712 S 0 0.1 0:00.02 cron 17464 root 15 0 10252 7560 856 D 0 0.7 0:01.88 psad 17466 root 18 0 1684 276 208 S 0 0.0 0:00.31 psadwatchd 17559 root 18 0 11444 2020 732 S 0 0.2 0:00.47 sendmail-mta 17688 root 15 0 10252 5388 1136 S 0 0.5 0:03.81 python 17752 teamspea 19 0 44648 7308 4676 S 0 0.7 1:09.70 ts3server_linux 18098 root 15 0 12336 6380 3032 S 0 0.6 0:00.47 apache2 18099 chikorit 18 0 10368 2536 464 S 0 0.2 0:00.00 apache2 18120 ntp 15 0 4336 1316 984 S 0 0.1 0:00.87 ntpd 18379 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 18387 mysql 15 0 62796 36m 5864 S 0 3.6 1:43.26 mysqld 19584 root 15 0 12336 4028 668 S 0 0.4 0:00.02 apache2 22498 root 16 0 12336 4028 668 S 0 0.4 0:00.00 apache2 24260 root 15 0 67772 3612 1356 S 0 0.3 0:00.22 php5-fpm 27712 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 27730 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 30343 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 30366 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 This is the free ram as of today. total used free shared buffers cached Mem: 1024 302 721 0 0 0 -/+ buffers/cache: 302 721 Swap: 0 0 0 Update: Looking into the logs, particularly the PHP5-FPM, which is causing the CPU spike. I found that its segment faulting for some apparent reason. [03-Jun-2012 06:11:20] NOTICE: [pool www] child 14132 started [03-Jun-2012 06:11:25] WARNING: [pool www] child 13664 exited on signal 11 (SIGSEGV) after 53.686322 seconds from start [03-Jun-2012 06:11:25] NOTICE: [pool www] child 14328 started [03-Jun-2012 06:11:25] WARNING: [pool www] child 14132 exited on signal 11 (SIGSEGV) after 4.708681 seconds from start [03-Jun-2012 06:11:25] NOTICE: [pool www] child 14329 started [03-Jun-2012 06:11:58] WARNING: [pool www] child 14328 exited on signal 11 (SIGSEGV) after 32.981228 seconds from start [03-Jun-2012 06:11:58] NOTICE: [pool www] child 15745 started [03-Jun-2012 06:12:25] WARNING: [pool www] child 15745 exited on signal 11 (SIGSEGV) after 27.442864 seconds from start [03-Jun-2012 06:12:25] NOTICE: [pool www] child 17446 started [03-Jun-2012 06:12:25] WARNING: [pool www] child 14329 exited on signal 11 (SIGSEGV) after 60.411278 seconds from start [03-Jun-2012 06:12:25] NOTICE: [pool www] child 17447 started [03-Jun-2012 06:13:02] WARNING: [pool www] child 17446 exited on signal 11 (SIGSEGV) after 36.746793 seconds from start [03-Jun-2012 06:13:02] NOTICE: [pool www] child 18133 started [03-Jun-2012 06:13:48] WARNING: [pool www] child 17447 exited on signal 11 (SIGSEGV) after 82.710107 seconds from start I'm thinking that this might be causing the problem. If that is the cause, probably switching it off that to fastcgi/fcgid might resolve it... but still, I want to see if something else might be causing it to do this.

Read the article

ColdFusion Server crash after thousands of HTTP requests

- by Jason Bristol

We are running ColdFusion 8 on a windows server 2003 VPS with an API that exposes student records to a partner API through a connector. Our API returns around 50k student records serialized in XML format pretty seamlessly. My question originates when something very frightening happened today when we tested our connector to our partners API. Our entire website and web host went down. We assumed that our host was just having some issues and after 4 hours with no resolution and no response from their customer service we finally got a response from them claiming that they had an "unauthorized user" in their network. After our server was back up we were unable to connect to our website as if the web service or coldfusion itself had froze. This is really where my concern comes from as I fear we may have overloaded the web service. As I mentioned before we tried sending over 50k HTTP POST requests over to our partner's API, however everything stopped after around 1.6k Is this bad practice or is there some sort of rate limiting I can relax somewhere in server configuration? We managed to find a workaround, but it bypasses our connector which is critical to our design. This would have been a one time deal as the purpose of so many requests was to populate our partner's website with current data, after that hourly syncs will keep requests down to around 100 per hour. UPDATE Our partner API is owned and operated by Pardot. We are converting students to prospects by passing student data to their API which unfortunately only seems to accept one student at a time. For that reason we have to do all 50k requests individually. Our server has 4GB of RAM, an Intel Core 2 Duo @ 2.8GHz running Windows Server 2003 SP2. I monitored the server during a 100 student sync, a 400 student sync, and a 1.4k student sync with the following results: 100 students - 2.25GB of Memory, 30-40% CPU utilization, 0.2-0.3% network bandwidth 400 students - 2.30GB of Memory, 30-50% CPU utilization, 0.2-1.0% network bandwidth 1.4k students - 2.30GB of Memory, 30-70% CPU utilization, 0.2-1.0% network bandwidth I know this is a far cry from 50k students, but I don't want to risk taking down our CMS system again assuming that was the cause. To give you a look at our code: <cfif (#getStudents.statusCode# eq "200 OK")> <cftry> <cfloop index="StudentXML" array="#XmlSearch(responseSTUD,'/students/student')#"> <cfset StudentXML = XmlParse(StudentXML)> <cfhttp url="#PARDOT_CMS_UPSERT#" method="post" timeout="10000" > <cfhttpparam type="url" name="user_key" value="#PARDOT_CMS_USERKEY#"> <cfhttpparam type="url" name="api_key" value="#api_key#"> <cfhttpparam type="url" name="email" value="#StudentXML.student.email.XmlText#"> <cfhttpparam type="url" name="first_name" value="#StudentXML.student.first.XmlText#"> <cfhttpparam type="url" name="last_name" value="#StudentXML.student.last.XmlText#"> <cfhttpparam type="url" name="in_cms" value="#StudentXML.student.studentid.XmlText#"> <cfhttpparam type="url" name="company" value="#StudentXML.student.agencyname.XmlText#"> <cfhttpparam type="url" name="country" value="#StudentXML.student.countryname.XmlText#"> <cfhttpparam type="url" name="address_one" value="#StudentXML.student.address.XmlText#"> <cfhttpparam type="url" name="address_two" value="#StudentXML.student.address2.XmlText#"> <cfhttpparam type="url" name="city" value="#StudentXML.student.city.XmlText#"> <cfhttpparam type="url" name="state" value="#StudentXML.student.state_province.XmlText#"> <cfhttpparam type="url" name="zip" value="#StudentXML.student.postalcode.XmlText#"> <cfhttpparam type="url" name="phone" value="#StudentXML.student.phone.XmlText#"> <cfhttpparam type="url" name="fax" value="#StudentXML.student.fax.XmlText#"> <cfhttpparam type="url" name="output" value="simple"> </cfhttp> </cfloop> <cfcatch type="any"> <cfdump var="#cfcatch.Message#"> </cfcatch> </cftry> </cfif> UPDATE 2 I checked the CF logs and found a couple of these: "Error","jrpp-8","06/06/13","16:10:18","CMS-API","Java heap space The specific sequence of files included or processed is: D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm, line: 675 " java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2882) at java.io.CharArrayWriter.write(CharArrayWriter.java:105) at coldfusion.runtime.CharBuffer.replace(CharBuffer.java:37) at coldfusion.runtime.CharBuffer.replace(CharBuffer.java:50) at coldfusion.runtime.NeoBodyContent.write(NeoBodyContent.java:254) at cfapi2ecfm292155732._factor30(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm:675) at cfapi2ecfm292155732._factor31(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm:662) at cfapi2ecfm292155732._factor36(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm:659) at cfapi2ecfm292155732._factor42(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm:657) at cfapi2ecfm292155732._factor37(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm) at cfapi2ecfm292155732._factor44(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm:456) at cfapi2ecfm292155732._factor38(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm) at cfapi2ecfm292155732._factor46(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm:455) at cfapi2ecfm292155732._factor39(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm) at cfapi2ecfm292155732._factor47(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm:453) at cfapi2ecfm292155732.runPage(D:\Clients\www.xxx.com\www\dev.cms\api\v1\api.cfm:1) at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:192) at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:366) at coldfusion.filter.CfincludeFilter.invoke(CfincludeFilter.java:65) at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:279) at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48) at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40) at coldfusion.filter.PathFilter.invoke(PathFilter.java:86) at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70) at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28) at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38) at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46) at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38) at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22) at coldfusion.CfmServlet.service(CfmServlet.java:175) at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89) at jrun.servlet.FilterChain.doFilter(FilterChain.java:86) Looks like I might have crashed the JVM in CF, is there a better way to do this? We are thinking of just exporting all records initially as a CSV file and importing it into Pardot seeing as we will never have to do a request this large again.

Read the article

NFS issue brings down entire vSphere ESX estate

- by growse

I experienced an odd issue this morning where an NFS issue appeared to have taken down the majority of my VMs hosted on a small vSphere 5.0 estate. The infrastructure itself is 4x IBM HS21 blades running around 20 VMs. The storage is provided by a single HP X1600 array with attached D2700 chassis running Solaris 11. There's a couple of storage pools on this which are exposed over NFS for the storage of the VM files, and some iSCSI LUNs for things like MSCS shared disks. Normally, this is pretty stable, but I appreciate the lack of resiliancy in having a single X1600 doing all the storage. This morning, in the logs of each ESX host, at around 0521 GMT I saw a lot of entries like this: 2011-11-30T05:21:54.161Z cpu2:2050)NFSLock: 608: Stop accessing fd 0x41000a4cf9a8 3 2011-11-30T05:21:54.161Z cpu2:2050)NFSLock: 608: Stop accessing fd 0x41000a4dc9e8 3 2011-11-30T05:21:54.161Z cpu2:2050)NFSLock: 608: Stop accessing fd 0x41000a4d3fa8 3 2011-11-30T05:21:54.161Z cpu2:2050)NFSLock: 608: Stop accessing fd 0x41000a4de0a8 3 [....] 2011-11-30T06:16:07.042Z cpu0:2058)WARNING: NFS: 283: Lost connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T06:17:01.459Z cpu2:4011)NFS: 292: Restored connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T06:25:17.887Z cpu3:2051)NFSLock: 608: Stop accessing fd 0x41000a4c2b28 3 2011-11-30T06:27:16.063Z cpu3:4011)NFSLock: 568: Start accessing fd 0x41000a4d8928 again 2011-11-30T06:35:30.827Z cpu1:2058)WARNING: NFS: 283: Lost connection to the server 10.13.111.197 mount point /tank/ISO, mounted as 5acdbb3e-410e56e3-0000-000000000000 ("ISO (1)") 2011-11-30T06:36:37.953Z cpu6:2054)NFS: 292: Restored connection to the server 10.13.111.197 mount point /tank/ISO, mounted as 5acdbb3e-410e56e3-0000-000000000000 ("ISO (1)") 2011-11-30T06:40:08.242Z cpu6:2054)NFSLock: 608: Stop accessing fd 0x41000a4c3e68 3 2011-11-30T06:40:34.647Z cpu3:2051)NFSLock: 568: Start accessing fd 0x41000a4d8928 again 2011-11-30T06:44:42.663Z cpu1:2058)WARNING: NFS: 283: Lost connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T06:44:53.973Z cpu0:4011)NFS: 292: Restored connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T06:51:28.296Z cpu5:2058)NFSLock: 608: Stop accessing fd 0x41000ae3c528 3 2011-11-30T06:51:44.024Z cpu4:2052)NFSLock: 568: Start accessing fd 0x41000ae3b8e8 again 2011-11-30T06:56:30.758Z cpu4:2058)WARNING: NFS: 283: Lost connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T06:56:53.389Z cpu7:2055)NFS: 292: Restored connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T07:01:50.350Z cpu6:2054)ScsiDeviceIO: 2316: Cmd(0x41240072bc80) 0x12, CmdSN 0x9803 to dev "naa.600508e000000000505c16815a36c50d" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. 2011-11-30T07:03:48.449Z cpu3:2051)NFSLock: 608: Stop accessing fd 0x41000ae46b68 3 2011-11-30T07:03:57.318Z cpu4:4009)NFSLock: 568: Start accessing fd 0x41000ae48228 again (I've put a complete dump from one of the hosts on pastebin: http://pastebin.com/Vn60wgTt) When I got in the office at 9am, I saw various failures and alarms and troubleshooted the issue. It turned out that pretty much all of the VMs were inaccessible, and that the ESX hosts either were describing each VM as 'powered off', 'powered on', or 'unavailable'. The VMs described as 'powered on' where not in any way reachable or responding to pings, so this may be lies. There's absolutely no indication on the X1600 that anything was awry, and nothing on the switches to indicate any loss of connectivity. I only managed to resolve the issue by rebooting the ESX hosts in turn. I have a number of questions: What the hell happened? If this was a temporary NFS failure, why did it put the ESX hosts into a state from which a reboot was the only recovery? In the future, when the NFS server goes a little off-piste, what would be the best approach to add some resilience? I've been looking at budgeting for next year and potentially have budget to purchase another X1600/D2700/disks, would an identical mirrored disk setup help to mitigate these sorts of failures automatically? Edit (Added requested details) To expand with some details as requested: The X1600 has 12x 1TB disks lumped together in mirrored pairs as tank, and the D2700 (connected with a mini SAS cable) has 12x 300GB 10k SAS disks lumped together in mirrored pairs as sastank zpool status pool: rpool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c7t0d0s0 ONLINE 0 0 0 errors: No known data errors pool: sastank state: ONLINE scan: scrub repaired 0 in 74h21m with 0 errors on Wed Nov 30 02:51:58 2011 config: NAME STATE READ WRITE CKSUM sastank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c7t14d0 ONLINE 0 0 0 c7t15d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c7t16d0 ONLINE 0 0 0 c7t17d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c7t18d0 ONLINE 0 0 0 c7t19d0 ONLINE 0 0 0 mirror-3 ONLINE 0 0 0 c7t20d0 ONLINE 0 0 0 c7t21d0 ONLINE 0 0 0 mirror-4 ONLINE 0 0 0 c7t22d0 ONLINE 0 0 0 c7t23d0 ONLINE 0 0 0 mirror-5 ONLINE 0 0 0 c7t24d0 ONLINE 0 0 0 c7t25d0 ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scan: scrub repaired 0 in 17h28m with 0 errors on Mon Nov 28 17:58:19 2011 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c7t1d0 ONLINE 0 0 0 c7t2d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c7t3d0 ONLINE 0 0 0 c7t4d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c7t5d0 ONLINE 0 0 0 c7t6d0 ONLINE 0 0 0 mirror-3 ONLINE 0 0 0 c7t8d0 ONLINE 0 0 0 c7t9d0 ONLINE 0 0 0 mirror-4 ONLINE 0 0 0 c7t10d0 ONLINE 0 0 0 c7t11d0 ONLINE 0 0 0 mirror-5 ONLINE 0 0 0 c7t12d0 ONLINE 0 0 0 c7t13d0 ONLINE 0 0 0 errors: No known data errors The filesystem exposed over NFS for the primary datastore is sastank/VMStorage zfs list NAME USED AVAIL REFER MOUNTPOINT rpool 45.1G 13.4G 92.5K /rpool rpool/ROOT 2.28G 13.4G 31K legacy rpool/ROOT/solaris 2.28G 13.4G 2.19G / rpool/dump 15.0G 13.4G 15.0G - rpool/export 11.9G 13.4G 32K /export rpool/export/home 11.9G 13.4G 32K /export/home rpool/export/home/andrew 11.9G 13.4G 11.9G /export/home/andrew rpool/swap 15.9G 29.2G 123M - sastank 1.08T 536G 33K /sastank sastank/VMStorage 1.01T 536G 1.01T /sastank/VMStorage sastank/comstar 71.7G 536G 31K /sastank/comstar sastank/comstar/sql_tempdb 6.31G 536G 6.31G - sastank/comstar/sql_tx_data 65.4G 536G 65.4G - tank 4.79T 578G 42K /tank tank/FTP 269G 578G 269G /tank/FTP tank/ISO 28.8G 578G 25.9G /tank/ISO tank/backupstage 2.64T 578G 2.49T /tank/backupstage tank/cifs 301G 578G 297G /tank/cifs tank/comstar 1.54T 578G 31K /tank/comstar tank/comstar/msdtc 1.07G 579G 32.8M - tank/comstar/quorum 577M 578G 47.9M - tank/comstar/sqldata 1.54T 886G 304G - tank/comstar/vsphere_lun 2.09G 580G 22.2M - tank/mcs-asset-repository 7.01M 578G 6.99M /tank/mcs-asset-repository tank/mscs-quorum 55K 578G 36K /tank/mscs-quorum tank/sccm 16.1G 578G 12.8G /tank/sccm As for the networking, all connections between the X1600, the Blades and the switch are either LACP or Etherchannel bonded 2x 1Gbit links. Switch is a single Cisco 3750. Storage traffic sits on its own VLAN segregated from VM machine traffic.

Read the article

Printing fails after first print with Centos 6 and HP LaserJet P3015dn printer

- by Gavin Simpson

Centos 6 recognises and configures a HP LaserJet P3015dn printer connected via USB. This machine is being configured as a small group file/print server. I can print a test page, which is processed/printed correctly. The next time printing is attempted (say printing a second test page), the page is not printed and the printer is set to disabled. The status of the printer is stated as: Stopped - /usr/lib/cups/backend/hp failed in the printer configuration dialogue. /var/log/cups/error_log contains this information (first two lines were there prior to the failed print job) E [24/Jun/2004:09:12:57 +0100] Returning HTTP Forbidden for Resume-Printer (ipp://localhost/printers/HP-LaserJet-P3010-Series) from localhost E [24/Jun/2004:09:20:59 +0100] Returning HTTP Forbidden for CUPS-Delete-Printer (ipp://localhost/printers/HP-LaserJet-P3010-Series) from localhost D [24/Jun/2004:09:37:28 +0100] [Job 28] The following messages were recorded from 09:36:43 AM to 09:37:28 AM D [24/Jun/2004:09:37:28 +0100] [Job 28] Adding start banner page "none". D [24/Jun/2004:09:37:28 +0100] [Job 28] Adding end banner page "none". D [24/Jun/2004:09:37:28 +0100] [Job 28] File of type application/vnd.cups-banner queued by "gavin". D [24/Jun/2004:09:37:28 +0100] [Job 28] hold_until=0 D [24/Jun/2004:09:37:28 +0100] [Job 28] Queued on "HP-LaserJet-P3010-Series" by "gavin". D [24/Jun/2004:09:37:28 +0100] [Job 28] job-sheets=none,none D [24/Jun/2004:09:37:28 +0100] [Job 28] argv[0]="HP-LaserJet-P3010-Series" D [24/Jun/2004:09:37:28 +0100] [Job 28] argv[1]="28" D [24/Jun/2004:09:37:28 +0100] [Job 28] argv[2]="gavin" D [24/Jun/2004:09:37:28 +0100] [Job 28] argv[3]="Test Page" D [24/Jun/2004:09:37:28 +0100] [Job 28] argv[4]="1" D [24/Jun/2004:09:37:28 +0100] [Job 28] argv[5]="job-uuid=urn:uuid:b3370a97-4ab6-3451-40a2-6239b13fa3e1 job-originating-host-name=localhost" D [24/Jun/2004:09:37:28 +0100] [Job 28] argv[6]="/var/spool/cups/d00028-001" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[0]="CUPS_CACHEDIR=/var/cache/cups" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[1]="CUPS_DATADIR=/usr/share/cups" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[2]="CUPS_DOCROOT=/usr/share/cups/www" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[3]="CUPS_FONTPATH=/usr/share/cups/fonts" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[4]="CUPS_REQUESTROOT=/var/spool/cups" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[5]="CUPS_SERVERBIN=/usr/lib/cups" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[6]="CUPS_SERVERROOT=/etc/cups" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[7]="CUPS_STATEDIR=/var/run/cups" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[8]="HOME=/var/spool/cups/tmp" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[9]="PATH=/usr/lib/cups/filter:/usr/bin:/usr/sbin:/bin:/usr/bin" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[10]="[email protected]" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[11]="SOFTWARE=CUPS/1.4.2" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[12]="TMPDIR=/var/spool/cups/tmp" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[13]="USER=root" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[14]="CUPS_SERVER=/var/run/cups/cups.sock" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[15]="CUPS_ENCRYPTION=IfRequested" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[16]="IPP_PORT=631" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[17]="CHARSET=utf-8" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[18]="LANG=en_US.UTF-8" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[19]="PPD=/etc/cups/ppd/HP-LaserJet-P3010-Series.ppd" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[20]="RIP_MAX_CACHE=8m" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[21]="CONTENT_TYPE=application/vnd.cups-banner" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[22]="DEVICE_URI=hp:/usb/HP_LaserJet_P3010_Series?serial=VNBV993GM4" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[23]="PRINTER_INFO=Hewlett-Packard HP LaserJet P3010 Series" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[24]="PRINTER_LOCATION=electra.geog.ucl.ac.uk" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[25]="PRINTER=HP-LaserJet-P3010-Series" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[26]="CUPS_FILETYPE=document" D [24/Jun/2004:09:37:28 +0100] [Job 28] envp[27]="FINAL_CONTENT_TYPE=application/vnd.cups-postscript" D [24/Jun/2004:09:37:28 +0100] [Job 28] Started filter /usr/lib/cups/filter/bannertops (PID 2858) D [24/Jun/2004:09:37:28 +0100] [Job 28] Started filter /usr/lib/cups/filter/pstops (PID 2859) D [24/Jun/2004:09:37:28 +0100] [Job 28] Started backend /usr/lib/cups/backend/hp (PID 2860) D [24/Jun/2004:09:37:28 +0100] [Job 28] load_banner(filename="/var/spool/cups/d00028-001") D [24/Jun/2004:09:37:28 +0100] [Job 28] Page = 612x792; 12,12 to 600,780 D [24/Jun/2004:09:37:28 +0100] [Job 28] Page = 612x792; 12,12 to 600,780 D [24/Jun/2004:09:37:28 +0100] [Job 28] slow_collate=0, slow_duplex=0, slow_order=0 D [24/Jun/2004:09:37:28 +0100] [Job 28] Before copy_comments - %!PS-Adobe-3.0 D [24/Jun/2004:09:37:28 +0100] [Job 28] %!PS-Adobe-3.0 D [24/Jun/2004:09:37:28 +0100] [Job 28] %%BoundingBox: 12 12 600 780 D [24/Jun/2004:09:37:28 +0100] [Job 28] %cupsRotation: 0 D [24/Jun/2004:09:37:28 +0100] [Job 28] %%Creator: bannertops/CUPS v1.4.2 D [24/Jun/2004:09:37:28 +0100] [Job 28] %%CreationDate: Thu 24 Jun 2004 09:36:43 AM BST D [24/Jun/2004:09:37:28 +0100] [Job 28] %%LanguageLevel: 2 D [24/Jun/2004:09:37:28 +0100] [Job 28] %%DocumentData: Clean7Bit D [24/Jun/2004:09:37:28 +0100] [Job 28] %%Title: (Test Page) D [24/Jun/2004:09:37:28 +0100] [Job 28] %%For: (gavin) D [24/Jun/2004:09:37:28 +0100] [Job 28] %%Pages: 1 D [24/Jun/2004:09:37:28 +0100] [Job 28] %%DocumentSuppliedResources: font Monospace D [24/Jun/2004:09:37:28 +0100] [Job 28] %%+ font Monospace-Bold D [24/Jun/2004:09:37:28 +0100] [Job 28] %%+ font Monospace-BoldOblique D [24/Jun/2004:09:37:28 +0100] [Job 28] %%+ font Monospace-Oblique D [24/Jun/2004:09:37:28 +0100] [Job 28] %%EndComments D [24/Jun/2004:09:37:28 +0100] [Job 28] Before copy_prolog - %%BeginProlog D [24/Jun/2004:09:37:28 +0100] [Job 28] STATE: +connecting-to-device D [24/Jun/2004:09:37:28 +0100] [Job 28] prnt/backend/hp.c 762: ERROR: cannot open channel PRINT D [24/Jun/2004:09:37:28 +0100] [Job 28] Backend returned status 1 (failed) D [24/Jun/2004:09:37:28 +0100] [Job 28] Printer stopped due to backend errors; please consult the error_log file for details. D [24/Jun/2004:09:37:28 +0100] [Job 28] End of messages D [24/Jun/2004:09:37:28 +0100] [Job 28] printer-state=5(stopped) D [24/Jun/2004:09:37:28 +0100] [Job 28] printer-state-message="/usr/lib/cups/backend/hp failed" D [24/Jun/2004:09:37:28 +0100] [Job 28] printer-state-reasons=paused /var/log/messages contains the following reports associated with the recognition of the printer and the failed print job: Jun 24 09:35:07 electra kernel: usb 1-8: new high speed USB device using ehci_hcd and address 2 Jun 24 09:35:07 electra kernel: usb 1-8: New USB device found, idVendor=03f0, idProduct=8d17 Jun 24 09:35:07 electra kernel: usb 1-8: New USB device strings: Mfr=1, Product=2, SerialNumber=3 Jun 24 09:35:07 electra kernel: usb 1-8: Product: HP LaserJet P3010 Series Jun 24 09:35:07 electra kernel: usb 1-8: Manufacturer: Hewlett-Packard Jun 24 09:35:07 electra kernel: usb 1-8: SerialNumber: VNBV993GM4 Jun 24 09:35:07 electra kernel: usb 1-8: configuration #1 chosen from 1 choice Jun 24 09:35:07 electra kernel: usblp0: USB Bidirectional printer dev 2 if 0 alt 1 proto 2 vid 0x03F0 pid 0x8D17 Jun 24 09:35:07 electra kernel: usbcore: registered new interface driver usblp Jun 24 09:35:07 electra udev-configure-printer: invalid or missing IEEE 1284 Device ID Jun 24 09:35:08 electra hp[1942]: io/hpmud/pp.c 627: unable to read device-id ret=-1 Jun 24 09:35:09 electra python: io/hpmud/pp.c 627: unable to read device-id ret=-1 Jun 24 09:35:51 electra kernel: usblp0: removed Jun 24 09:37:28 electra hp[2860]: io/hpmud/dot4.c 254: unable to read Dot4ReverseReply data: Resource temporarily unavailable exp=2 act=0 Jun 24 09:37:28 electra hp[2860]: io/hpmud/dot4.c 330: invalid DOT4InitReply: cmd=0, result=20#012, revision=0 Jun 24 09:37:28 electra hp[2860]: prnt/backend/hp.c 762: ERROR: cannot open channel PRINT I am now at a loss as to how to proceed to get this printer working on my Centos machine. How can I configure the machine to print more than a single print job without needing to be unplugged/plugged in repeatedly?

Read the article

How can I get FreeNAS to respond to libvirt shutdown requests

- by ptomli

I have a KVM VM of FreeNAS 0.7.1 Shere (revision 5127) running on Ubuntu Server 10.04 and I'm unable to convince the VM to shutdown from the host virsh shutdown freenas I would expect this to send some ACPI? trigger to the VM and FreeNAS then do what it's told. I'm not a FreeBSD fundi so I don't really know what packages or processes to poke to get this running. I have tried to convince powerd to run, but the VM cpus don't have the required freq entry Sysctl HW $ sysctl hw hw.machine: amd64 hw.model: QEMU Virtual CPU version 0.12.3 hw.ncpu: 1 hw.byteorder: 1234 hw.physmem: 523116544 hw.usermem: 463806464 hw.pagesize: 4096 hw.floatingpoint: 1 hw.machine_arch: amd64 hw.realmem: 536850432 hw.aac.iosize_max: 65536 hw.amr.force_sg32: 0 hw.an.an_cache_iponly: 1 hw.an.an_cache_mcastonly: 0 hw.an.an_cache_mode: dbm hw.an.an_dump: off hw.ata.to: 15 hw.ata.wc: 1 hw.ata.atapi_dma: 1 hw.ata.ata_dma_check_80pin: 1 hw.ata.ata_dma: 1 hw.ath.txbuf: 200 hw.ath.rxbuf: 40 hw.ath.regdomain: 0 hw.ath.countrycode: 0 hw.ath.xchanmode: 1 hw.ath.outdoor: 1 hw.ath.calibrate: 30 hw.ath.hal.swba_backoff: 0 hw.ath.hal.sw_brt: 10 hw.ath.hal.dma_brt: 2 hw.bce.msi_enable: 1 hw.bce.tso_enable: 1 hw.bge.allow_asf: 0 hw.cardbus.cis_debug: 0 hw.cardbus.debug: 0 hw.cs.recv_delay: 570 hw.cs.ignore_checksum_failure: 0 hw.cs.debug: 0 hw.cxgb.snd_queue_len: 50 hw.cxgb.use_16k_clusters: 1 hw.cxgb.force_fw_update: 0 hw.cxgb.singleq: 0 hw.cxgb.ofld_disable: 0 hw.cxgb.msi_allowed: 2 hw.cxgb.txq_mr_size: 1024 hw.cxgb.sleep_ticks: 1 hw.cxgb.tx_coalesce: 0 hw.firewire.hold_count: 3 hw.firewire.try_bmr: 1 hw.firewire.fwmem.speed: 2 hw.firewire.fwmem.eui64_lo: 0 hw.firewire.fwmem.eui64_hi: 0 hw.firewire.phydma_enable: 1 hw.firewire.nocyclemaster: 0 hw.firewire.fwe.rx_queue_len: 128 hw.firewire.fwe.tx_speed: 2 hw.firewire.fwe.stream_ch: 1 hw.firewire.fwip.rx_queue_len: 128 hw.firewire.sbp.tags: 0 hw.firewire.sbp.use_doorbell: 0 hw.firewire.sbp.scan_delay: 500 hw.firewire.sbp.login_delay: 1000 hw.firewire.sbp.exclusive_login: 1 hw.firewire.sbp.max_speed: -1 hw.firewire.sbp.auto_login: 1 hw.mfi.max_cmds: 128 hw.mfi.event_class: 0 hw.mfi.event_locale: 65535 hw.pccard.cis_debug: 0 hw.pccard.debug: 0 hw.cbb.debug: 0 hw.cbb.start_32_io: 4096 hw.cbb.start_16_io: 256 hw.cbb.start_memory: 2281701376 hw.pcic.pd6722_vsense: 1 hw.pcic.intr_mask: 57016 hw.pci.honor_msi_blacklist: 1 hw.pci.enable_msix: 1 hw.pci.enable_msi: 1 hw.pci.do_power_resume: 1 hw.pci.do_power_nodriver: 0 hw.pci.enable_io_modes: 1 hw.pci.host_mem_start: 2147483648 hw.syscons.kbd_debug: 1 hw.syscons.kbd_reboot: 1 hw.syscons.bell: 1 hw.syscons.saver.keybonly: 1 hw.syscons.sc_no_suspend_vtswitch: 0 hw.usb.uplcom.interval: 100 hw.usb.uvscom.interval: 100 hw.usb.uvscom.opktsize: 8 hw.wi.debug: 0 hw.wi.txerate: 0 hw.xe.debug: 0 hw.intr_storm_threshold: 1000 hw.availpages: 127714 hw.bus.devctl_disable: 0 hw.ste.rxsyncs: 0 hw.busdma.total_bpages: 32 hw.busdma.zone0.total_bpages: 32 hw.busdma.zone0.free_bpages: 32 hw.busdma.zone0.reserved_bpages: 0 hw.busdma.zone0.active_bpages: 0 hw.busdma.zone0.total_bounced: 0 hw.busdma.zone0.total_deferred: 0 hw.busdma.zone0.lowaddr: 0xffffffff hw.busdma.zone0.alignment: 2 hw.busdma.zone0.boundary: 65536 hw.clockrate: 2808 hw.instruction_sse: 1 hw.apic.enable_extint: 0 hw.kbd.keymap_restrict_change: 0 hw.acpi.supported_sleep_state: S3 S4 S5 hw.acpi.power_button_state: S5 hw.acpi.sleep_button_state: S3 hw.acpi.lid_switch_state: NONE hw.acpi.standby_state: S1 hw.acpi.suspend_state: S3 hw.acpi.sleep_delay: 1 hw.acpi.s4bios: 0 hw.acpi.verbose: 0 hw.acpi.disable_on_reboot: 0 hw.acpi.handle_reboot: 0 hw.acpi.cpu.cx_lowest: C1 Processes $ ps ax PID TT STAT TIME COMMAND 0 ?? DLs 0:00.00 [swapper] 1 ?? ILs 0:00.00 /sbin/init -- 2 ?? DL 0:00.08 [g_event] 3 ?? DL 0:00.29 [g_up] 4 ?? DL 0:00.33 [g_down] 5 ?? DL 0:00.00 [crypto] 6 ?? DL 0:00.00 [crypto returns] 7 ?? DL 0:00.00 [xpt_thrd] 8 ?? DL 0:00.00 [kqueue taskq] 9 ?? DL 0:00.00 [acpi_task_0] 10 ?? RL 34:12.42 [idle: cpu0] 11 ?? WL 0:01.13 [swi4: clock sio] 12 ?? WL 0:00.00 [swi3: vm] 13 ?? WL 0:00.00 [swi1: net] 14 ?? DL 0:00.04 [yarrow] 15 ?? WL 0:00.00 [swi6: task queue] 16 ?? WL 0:00.00 [swi2: cambio] 17 ?? DL 0:00.00 [acpi_task_1] 18 ?? DL 0:00.00 [acpi_task_2] 19 ?? WL 0:00.00 [swi5: +] 20 ?? DL 0:00.01 [thread taskq] 21 ?? WL 0:00.00 [swi6: Giant taskq] 22 ?? WL 0:00.00 [irq9: acpi0] 23 ?? WL 0:00.09 [irq14: ata0] 24 ?? WL 0:00.11 [irq15: ata1] 25 ?? WL 0:00.57 [irq11: ed0 uhci0] 26 ?? DL 0:00.00 [usb0] 27 ?? DL 0:00.00 [usbtask-hc] 28 ?? DL 0:00.00 [usbtask-dr] 29 ?? WL 0:00.01 [irq1: atkbd0] 30 ?? WL 0:00.00 [swi0: sio] 31 ?? DL 0:00.00 [sctp_iterator] 32 ?? DL 0:00.00 [pagedaemon] 33 ?? DL 0:00.00 [vmdaemon] 34 ?? DL 0:00.00 [idlepoll] 35 ?? DL 0:00.00 [pagezero] 36 ?? DL 0:00.01 [bufdaemon] 37 ?? DL 0:00.00 [vnlru] 38 ?? DL 0:00.14 [syncer] 39 ?? DL 0:00.01 [softdepflush] 1221 ?? Is 0:00.00 /sbin/devd 1289 ?? Is 0:00.01 /usr/sbin/syslogd -ss -f /var/etc/syslog.conf 1608 ?? Is 0:00.00 /usr/sbin/cron -s 1692 ?? Ss 0:00.03 /usr/local/sbin/mDNSResponderPosix -b -f /var/etc/mdn 1730 ?? S 0:00.43 /usr/local/sbin/lighttpd -f /var/etc/lighttpd.conf -m 1882 ?? DL 0:00.00 [system_taskq] 1883 ?? DL 0:00.00 [arc_reclaim_thread] 4139 ?? S 0:00.03 /usr/local/bin/php /usr/local/www/exec.php 4144 ?? S 0:00.00 sh -c ps ax 4145 ?? R 0:00.00 ps ax 1816 v0 Is 0:00.01 login [pam] (login) 1818 v0 I+ 0:00.03 -tcsh (csh) 1817 v1 Is+ 0:00.00 /usr/libexec/getty Pc ttyv1 1402 con- I 0:00.00 /usr/local/sbin/afpd -F /var/etc/afpd.conf 1404 con- S 0:00.00 /usr/local/sbin/cnid_metad 1682 con- I 0:02.78 /usr/local/sbin/mt-daapd -m -c /var/etc/mt-daapd.conf 1789 con- S 0:00.18 /usr/local/bin/fuppesd --config-dir /var/etc --config Libvert snippet <domain type='kvm'> <name>freenas</name> <uuid>********-****-****-****-************</uuid> <memory>524288</memory> <currentMemory>524288</currentMemory> <vcpu>1</vcpu> <os> <type arch='x86_64' machine='pc-0.12'>hvm</type> <boot dev='hd'/> </os> <features> <acpi/> <apic/> <pae/> </features> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/bin/kvm</emulator> Is this possible? Ideally I'd like to be able to stop the host without having to manually deal with shutting down the VM.

Read the article

More interruptions than cpu context switches

- by Christopher Valles

I have a machine running Debian GNU/Linux 5.0.8 (lenny) 8 cores and 12Gb of RAM. We have one core permanently around 40% ~ 60% wait time and trying to spot what is happening I realized that we have more interruptions than cpu context switches. I found that the normal ratio between context switch and interruptions is around 10x more context switching than interruptions but on my server the values are completely different. backend1:~# vmstat -s 12330788 K total memory 12221676 K used memory 3668624 K active memory 6121724 K inactive memory 109112 K free memory 3929400 K buffer memory 4095536 K swap cache 4194296 K total swap 7988 K used swap 4186308 K free swap 44547459 non-nice user cpu ticks 702408 nice user cpu ticks 13346333 system cpu ticks 1607583668 idle cpu ticks 374043393 IO-wait cpu ticks 4144149 IRQ cpu ticks 3994255 softirq cpu ticks 0 stolen cpu ticks 4445557114 pages paged in 2910596714 pages paged out 128642 pages swapped in 267400 pages swapped out 3519307319 interrupts 2464686911 CPU context switches 1306744317 boot time 11555115 forks Any ideas if that is an issue? And in that case, how can I spot the cause and fix it? Update Following the instructions of the comments and focusing on the core stuck in wait I checked the processes attached to that core and below you can find the list: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND 24 root RT -5 0 0 0 S 0 0.0 0:03.42 7 migration/7 25 root 15 -5 0 0 0 S 0 0.0 0:04.78 7 ksoftirqd/7 26 root RT -5 0 0 0 S 0 0.0 0:00.00 7 watchdog/7 34 root 15 -5 0 0 0 S 0 0.0 1:18.90 7 events/7 83 root 15 -5 0 0 0 S 0 0.0 1:10.68 7 kblockd/7 291 root 15 -5 0 0 0 S 0 0.0 0:00.00 7 aio/7 569 root 15 -5 0 0 0 S 0 0.0 0:00.00 7 ata/7 1545 root 15 -5 0 0 0 S 0 0.0 0:00.00 7 ksnapd 1644 root 15 -5 0 0 0 S 0 0.0 0:36.73 7 kjournald 1725 root 16 -4 16940 1152 488 S 0 0.0 0:00.00 7 udevd 2342 root 20 0 8828 1140 956 S 0 0.0 0:00.00 7 sh 2375 root 20 0 8848 1220 1016 S 0 0.0 0:00.00 7 locate 2421 root 30 10 8896 1268 1016 S 0 0.0 0:00.00 7 updatedb.findut 2430 root 30 10 58272 49m 616 S 0 0.4 0:17.44 7 sort 2431 root 30 10 3792 448 360 S 0 0.0 0:00.00 7 frcode 2682 root 15 -5 0 0 0 S 0 0.0 3:25.98 7 kjournald 2683 root 15 -5 0 0 0 S 0 0.0 0:00.64 7 kjournald 2687 root 15 -5 0 0 0 S 0 0.0 1:31.30 7 kjournald 3261 root 15 -5 0 0 0 S 0 0.0 2:30.56 7 kondemand/7 3364 root 20 0 3796 596 476 S 0 0.0 0:00.00 7 acpid 3575 root 20 0 8828 1140 956 S 0 0.0 0:00.00 7 sh 3597 root 20 0 8848 1216 1016 S 0 0.0 0:00.00 7 locate 3603 root 30 10 8896 1268 1016 S 0 0.0 0:00.00 7 updatedb.findut 3612 root 30 10 58272 49m 616 S 0 0.4 0:27.04 7 sort 3655 root 20 0 11056 2852 516 S 0 0.0 5:36.46 7 redis-server 3706 root 20 0 19832 1056 816 S 0 0.0 0:01.64 7 cron 3746 root 20 0 3796 580 484 S 0 0.0 0:00.00 7 getty 3748 root 20 0 3796 580 484 S 0 0.0 0:00.00 7 getty 7674 root 20 0 28376 1000 736 S 0 0.0 0:00.00 7 cron 7675 root 20 0 8828 1140 956 S 0 0.0 0:00.00 7 sh 7708 root 30 10 58272 49m 616 S 0 0.4 0:03.36 7 sort 22049 root 20 0 8828 1136 956 S 0 0.0 0:00.00 7 sh 22095 root 20 0 8848 1220 1016 S 0 0.0 0:00.00 7 locate 22099 root 30 10 8896 1264 1016 S 0 0.0 0:00.00 7 updatedb.findut 22108 root 30 10 58272 49m 616 S 0 0.4 0:44.55 7 sort 22109 root 30 10 3792 452 360 S 0 0.0 0:00.00 7 frcode 26927 root 20 0 8828 1140 956 S 0 0.0 0:00.00 7 sh 26947 root 20 0 8848 1216 1016 S 0 0.0 0:00.00 7 locate 26951 root 30 10 8896 1268 1016 S 0 0.0 0:00.00 7 updatedb.findut 26960 root 30 10 58272 49m 616 S 0 0.4 0:10.24 7 sort 26961 root 30 10 3792 452 360 S 0 0.0 0:00.00 7 frcode 27952 root 20 0 65948 3028 2400 S 0 0.0 0:00.00 7 sshd 30731 root 20 0 0 0 0 S 0 0.0 0:01.34 7 pdflush 31204 root 20 0 0 0 0 S 0 0.0 0:00.24 7 pdflush 21857 deploy 20 0 1227m 2240 868 S 0 0.0 2:44.22 7 nginx 21858 deploy 20 0 1228m 2784 868 S 0 0.0 2:42.45 7 nginx 21862 deploy 20 0 1228m 2732 868 S 0 0.0 2:43.90 7 nginx 21869 deploy 20 0 1228m 2840 868 S 0 0.0 2:44.14 7 nginx 27994 deploy 20 0 19372 2216 1380 S 0 0.0 0:00.00 7 bash 28493 deploy 20 0 331m 32m 16m S 4 0.3 0:00.40 7 apache2 21856 deploy 20 0 1228m 2844 868 S 0 0.0 2:43.64 7 nginx 3622 nobody 30 10 21156 10m 916 D 0 0.1 4:42.31 7 find 7716 nobody 30 10 12268 1280 888 D 0 0.0 0:43.50 7 find 22116 nobody 30 10 12612 1696 916 D 0 0.0 6:32.26 7 find 26968 nobody 30 10 12268 1284 888 D 0 0.0 1:56.92 7 find Update As suggested I take a look at /proc/interrupts and below the info there: CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 0: 35 0 0 1469085485 0 0 0 0 IO-APIC-edge timer 1: 0 0 0 8 0 0 0 0 IO-APIC-edge i8042 8: 0 0 0 1 0 0 0 0 IO-APIC-edge rtc0 9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi 12: 0 0 0 105 0 0 0 0 IO-APIC-edge i8042 16: 0 0 0 0 0 0 0 580212114 IO-APIC-fasteoi 3w-9xxx, uhci_hcd:usb1 18: 0 0 142 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb6, ehci_hcd:usb7 19: 9 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3, uhci_hcd:usb5 21: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2 23: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4, ehci_hcd:usb8 1273: 0 0 1600400502 0 0 0 0 0 PCI-MSI-edge eth0 1274: 0 0 0 0 0 0 0 0 PCI-MSI-edge ahci NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts LOC: 214252181 69439018 317298553 21943690 72562482 56448835 137923978 407514738 Local timer interrupts RES: 27516446 16935944 26430972 44957009 24935543 19881887 57746906 24298747 Rescheduling interrupts CAL: 10655 10705 10685 10567 10689 10669 10667 396 function call interrupts TLB: 529548 462587 801138 596193 922202 747313 2027966 946594 TLB shootdowns TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts SPU: 0 0 0 0 0 0 0 0 Spurious interrupts ERR: 0 All the values seems more or less the same for all the cores but this one IO-APIC-fasteoi 3w-9xxx, uhci_hcd:usb1 only affects to the core 7 (the same with the wait time of 40% ~ 60%) could be something attached to the usb port causing the issue? Thanks in advanced

Read the article

Multiple routers, subnets, gateways etc

- by allentown

My current setup is: Cable modem dishes out 13 static IP's (/28), a GB switch is plugged into the cable modem, and has access to those 13 static IP's, I have about 6 "servers" in use right now. The cable modem is also a firewall, DHCP server, and 3 port 10/100 switch. I am using it as a firewall, but not currently as a DHCP server. I have plugged into the cable modem, two network cables, one which goes to the WAN port of a Linksys Dual Band Wireless 10/100/1000 router/switch. Into the linksys are a few workstations, a few printers, and some laptops connecting to wifi. I set the Linksys to use take static IP, and enabled DHCP for the workstations, printers, etc in 192.168.1.1/24. The network for the Linksys is mostly self contained, backups go to a SAN, on that network, it all happens through that switch, over GB. But I also get internet access from it as well via the cable modem using one static IP. This all works, however, I can not "see" the static IP machines when I am on the Linksys. I can get to them via ssh and other protocols, and if I want to from "outside", I open holes, like 80, 25, 587, 143, 22, etc. The second wire, from the cable modem/fireall/switch just uplinks to the managed GB switch. What are the pros and cons of this? I do not like giving up the static IP to the Linksys. I basically have a mixed network of public servers, and internal workstations. I want the public servers on public IP's because I do not want to mess with port forwarding and mappings. Is it correct also, that if someone breaches the Linksys wifi, they still would have a hard time getting to the static IP range, just by nature of the network topology? Today, just for a test, I toggled on the DHCP in the firewall/cable modem at 10.1.10.1/24 range, the Linksys is n the 192.168.1.100/24 range. At that point, all the static IP machines still had in and out access, but Linksys was unreachable. The cable modem only has 10/100 ports, so I will not plug anything but the network drop into it, which is 50Mb/10Mb. Which makes me think this could be less than ideal, as transfers from the workstation network to the server network will be bottlenecked at 100Mb when I have 1000Mb available. I may not need to solve that, if isolation is better though. I do not move a lot of data, if any, from Linsys network to server network, so for it to pretend to be remote is ok. Should I approach this any different? I could enable DHCP on the cable modem/firewall, it should still send out the statics to the GB switch, but will also be a DHCP in 10.1.10.1/24 range? I can then plug the Linksys into the GB switch, which is now picking up statics and the 10.1.10.1/24 ranges, tell the Linksys to use 10.1.10.5 or so. Now, do I disable DHCP on the Linksys, and the cable modem/firewall will pass through the statics and 10.0.10.1/24 ranges as well? Or, could I open a second DHCP pool on the Linksys? I guess doing so gives me network isolation again, but it is just the reverse of what I have now. But I get out of the bottleneck, not that the Linksys could ever really touch real GB speeds anyway, but the managed switch certainly can. This is all because 13 statics are not that many. Right now, 6 "servers", the Linksys, a managed switch, a few SSL certs, and I am running out. I do not want to waste a static IP on the managed GB switch, or the Linksys, unless it provides me some type of benefit. Final question, under my current setup, if I am on a workstation, sitting at 192.168.1.109, the Linksys, with GB, and I send a file over ssh to the static IP machine, is that literally leaving the internet, and coming back in, or does it stay local? To me it seems like: Workstation (192.168.1.109) -> Linksys DHCP -> Linksys Static IP -> Cable Modem -> Server ( and it hits the 10/100 ports on the cable modem, slowing me down. But does it round trip the network, leave and come back in, limiting me to the 50/10 internet speeds? *These are all made up numbers, I do not use default router IP's as I will one day add a VPN, and do not want collisions. I need some recommendations, do I want one big network, or two isolated ones. Printers these days need an IP, everything does, I can not get autoconf/bonjour to be reliable on most printers. but I am also not sure I want the "server" side of my operation to be polluted by the workstation side of my operation. Unless there is some magic subetting I have not learned yet, here is what I am thinking: Cable modem 10/100, has 13 static IP, publicly accessible -> Enable DHCP on the cable modem -> Cable modem plugs into managed switch -> Managed switch gets 10.1.10.1 ssh, telnet, https admin management address -> Managed switch sends static IP's to to servers -> Plug Linksys into managed switch, giving it 10.1.10.2 static internally in Linksys admin -> Linksys gets assigned 10.1.10.x as its DHCP sending range -> Local printers, workstations, iPhones etc, connect to this -> ( Do I enable DHCP or disable it on the Linksys, just define a non over lapping range, or create an entirely new DHCP at 10.1.50.0/24, I think I am back isolated again with that method too? ) Thank you for any suggestions. This is the first time I have had to deal with less than a /24, and most are larger than that, but it is just a drop to a cabinet. Otherwise, it's a router, a few repeaters, and soho stuff that is simple, with one IP. I know a few may suggest going all DHCP on the servers, and I may one day, just not now, there has been too much moving of gear for me to be interested in that, and I would want something in the Catalyst series to deal with that.

Read the article

ProFTPd server on Ubuntu getting access denied message when successfully authenticated?

- by exxoid

I have a Ubuntu box with a ProFTPD 1.3.4a Server, when I try to log in via my FTP Client I cannot do anything as it does not allow me to list directories; I have tried logging in as root and as a regular user and tried accessing different paths within the FTP Server. The error I get in my FTP Client is: Status: Retrieving directory listing... Command: CDUP Response: 250 CDUP command successful Command: PWD Response: 257 "/var" is the current directory Command: PASV Response: 227 Entering Passive Mode (172,16,4,22,237,205). Command: MLSD Response: 550 Access is denied. Error: Failed to retrieve directory listing Any idea? Here is the config of my proftpd: # # /etc/proftpd/proftpd.conf -- This is a basic ProFTPD configuration file. # To really apply changes, reload proftpd after modifications, if # it runs in daemon mode. It is not required in inetd/xinetd mode. # # Includes DSO modules Include /etc/proftpd/modules.conf # Set off to disable IPv6 support which is annoying on IPv4 only boxes. UseIPv6 off # If set on you can experience a longer connection delay in many cases. IdentLookups off ServerName "Drupal Intranet" ServerType standalone ServerIdent on "FTP Server ready" DeferWelcome on # Set the user and group that the server runs as User nobody Group nogroup MultilineRFC2228 on DefaultServer on ShowSymlinks on TimeoutNoTransfer 600 TimeoutStalled 600 TimeoutIdle 1200 DisplayLogin welcome.msg DisplayChdir .message true ListOptions "-l" DenyFilter \*.*/ # Use this to jail all users in their homes # DefaultRoot ~ # Users require a valid shell listed in /etc/shells to login. # Use this directive to release that constrain. # RequireValidShell off # Port 21 is the standard FTP port. Port 21 # In some cases you have to specify passive ports range to by-pass # firewall limitations. Ephemeral ports can be used for that, but # feel free to use a more narrow range. # PassivePorts 49152 65534 # If your host was NATted, this option is useful in order to # allow passive tranfers to work. You have to use your public # address and opening the passive ports used on your firewall as well. # MasqueradeAddress 1.2.3.4 # This is useful for masquerading address with dynamic IPs: # refresh any configured MasqueradeAddress directives every 8 hours <IfModule mod_dynmasq.c> # DynMasqRefresh 28800 </IfModule> # To prevent DoS attacks, set the maximum number of child processes # to 30. If you need to allow more than 30 concurrent connections # at once, simply increase this value. Note that this ONLY works # in standalone mode, in inetd mode you should use an inetd server # that allows you to limit maximum number of processes per service # (such as xinetd) MaxInstances 30 # Set the user and group that the server normally runs at. # Umask 022 is a good standard umask to prevent new files and dirs # (second parm) from being group and world writable. Umask 022 022 # Normally, we want files to be overwriteable. AllowOverwrite on # Uncomment this if you are using NIS or LDAP via NSS to retrieve passwords: # PersistentPasswd off # This is required to use both PAM-based authentication and local passwords AuthPAMConfig proftpd AuthOrder mod_auth_pam.c* mod_auth_unix.c # Be warned: use of this directive impacts CPU average load! # Uncomment this if you like to see progress and transfer rate with ftpwho # in downloads. That is not needed for uploads rates. # UseSendFile off TransferLog /var/log/proftpd/xferlog SystemLog /var/log/proftpd/proftpd.log # Logging onto /var/log/lastlog is enabled but set to off by default #UseLastlog on # In order to keep log file dates consistent after chroot, use timezone info # from /etc/localtime. If this is not set, and proftpd is configured to # chroot (e.g. DefaultRoot or <Anonymous>), it will use the non-daylight # savings timezone regardless of whether DST is in effect. #SetEnv TZ :/etc/localtime <IfModule mod_quotatab.c> QuotaEngine off </IfModule> <IfModule mod_ratio.c> Ratios off </IfModule> # Delay engine reduces impact of the so-called Timing Attack described in # http://www.securityfocus.com/bid/11430/discuss # It is on by default. <IfModule mod_delay.c> DelayEngine on </IfModule> <IfModule mod_ctrls.c> ControlsEngine off ControlsMaxClients 2 ControlsLog /var/log/proftpd/controls.log ControlsInterval 5 ControlsSocket /var/run/proftpd/proftpd.sock </IfModule> <IfModule mod_ctrls_admin.c> AdminControlsEngine off </IfModule> # # Alternative authentication frameworks # #Include /etc/proftpd/ldap.conf #Include /etc/proftpd/sql.conf # # This is used for FTPS connections # #Include /etc/proftpd/tls.conf # # Useful to keep VirtualHost/VirtualRoot directives separated # #Include /etc/proftpd/virtuals.con # A basic anonymous configuration, no upload directories. # <Anonymous ~ftp> # User ftp # Group nogroup # # We want clients to be able to login with "anonymous" as well as "ftp" # UserAlias anonymous ftp # # Cosmetic changes, all files belongs to ftp user # DirFakeUser on ftp # DirFakeGroup on ftp # # RequireValidShell off # # # Limit the maximum number of anonymous logins # MaxClients 10 # # # We want 'welcome.msg' displayed at login, and '.message' displayed # # in each newly chdired directory. # DisplayLogin welcome.msg # DisplayChdir .message # # # Limit WRITE everywhere in the anonymous chroot # <Directory *> # <Limit WRITE> # DenyAll # </Limit> # </Directory> # # # Uncomment this if you're brave. # # <Directory incoming> # # # Umask 022 is a good standard umask to prevent new files and dirs # # # (second parm) from being group and world writable. # # Umask 022 022 # # <Limit READ WRITE> # # DenyAll # # </Limit> # # <Limit STOR> # # AllowAll # # </Limit> # # </Directory> # # </Anonymous> # Include other custom configuration files Include /etc/proftpd/conf.d/ UseReverseDNS off <Global> RootLogin on UseFtpUsers on ServerIdent on DefaultChdir /var/www DeleteAbortedStores on LoginPasswordPrompt on AccessGrantMsg "You have been authenticated successfully." </Global> Any idea what could be wrong? Thanks for your help!

Read the article

Windows Server 2008 R2 network adapter stops working, requires hard reboot

- by Geoff Dalgas

TL;DR version: Turns out this was a Windows Server 2008 R2 kernel networking bug. After siccing Microsoft support on it, we (eventually) got an unpublished kernel hotfix from Microsoft to address it. If you, too, are experiencing mysterious low-level network driver failures requiring a reboot/bluescreen cycle, you might want that hotfix (or maybe Service Pack 1 whenever it is released, too.) We have been using HAProxy along with heartbeat from the Linux-HA project. We are using two linux instances to provide a failover. Each server has with their own public IP and a single IP which is shared between the two using a virtual interface (eth1:1) at IP: 69.59.196.211 The virtual interface (eth1:1) IP 69.59.196.211 is configured as the gateway for the windows servers behind them and we use ip_forwarding to route traffic. We are experiencing an occasional network outage on one of our windows servers behind our linux gateways. HAProxy will detect the server is offline which we can verify by remoting to the failed server and attempting to ping the gateway: Pinging 69.59.196.211 with 32 bytes of data: Reply from 69.59.196.220: Destination host unreachable. Running arp -a on this failed server shows that there is no entry for the gateway address (69.59.196.211): Interface: 69.59.196.220 --- 0xa Internet Address Physical Address Type 69.59.196.161 00-26-88-63-c7-80 dynamic 69.59.196.210 00-15-5d-0a-3e-0e dynamic 69.59.196.212 00-21-5e-4d-45-c9 dynamic 69.59.196.213 00-15-5d-00-b2-0d dynamic 69.59.196.215 00-21-5e-4d-61-1a dynamic 69.59.196.217 00-21-5e-4d-2c-e8 dynamic 69.59.196.219 00-21-5e-4d-38-e5 dynamic 69.59.196.221 00-15-5d-00-b2-0d dynamic 69.59.196.222 00-15-5d-0a-3e-09 dynamic 69.59.196.223 ff-ff-ff-ff-ff-ff static 224.0.0.22 01-00-5e-00-00-16 static 224.0.0.252 01-00-5e-00-00-fc static 225.0.0.1 01-00-5e-00-00-01 static On our linux gateway instances arp -a shows: peak-colo-196-220.peak.org (69.59.196.220) at <incomplete> on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-222.peak.org (69.59.196.222) at 00:15:5d:0a:3e:09 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 Why would arp occasionally set the entry for this failed server as <incomplete>? Should we be defining our arp entries statically? I've always left arp alone since it works 99% of the time, but in this one instance it appears to be failing. Are there any additional troubleshooting steps we can take help resolve this issue? THINGS WE HAVE TRIED I added a static arp entry for testing on one of the linux gateways which still didn't help. root@haproxy2:~# arp -a peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-221.peak.org (69.59.196.221) at 00:15:5d:00:b2:0d [ether] on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 peak-colo-196-220.peak.org (69.59.196.220) at 00:21:5e:4d:30:8d [ether] PERM on eth1 root@haproxy2:~# arp -i eth1 -s 69.59.196.220 00:21:5e:4d:30:8d root@haproxy2:~# ping 69.59.196.220 PING 69.59.196.220 (69.59.196.220) 56(84) bytes of data. --- 69.59.196.220 ping statistics --- 7 packets transmitted, 0 received, 100% packet loss, time 6006ms Rebooting the windows web server solves this issue temporarily with no other changes to the network but our experience shows this issue will come back. Swapping network cards and switches I noticed the link light on the port of the switch for the failed windows server was running at 100Mb instead of 1Gb on the failed interface. I moved the cable to several other open ports and the link indicated 100Mb for each port that I tried. I also swapped the cable with the same result. I tried changing the properties of the network card in windows and the server locked up and required a hard reset after clicking apply. This windows server has two physical network interfaces so I have swapped the cables and network settings on the two interfaces to see if the problem follows the interface. If the public interface goes down again we will know that it is not an issue with the network card. (We also tried another switch we have on hand, no change) Changing network hardware driver versions We've had the same problem with the latest Broadcom driver, as well as the built-in driver that ships in Windows Server 2008 R2. Replacing network cables As a last ditch effort we remembered another change that occurred was the replacement of all of the patch cords between our servers / switch. We had purchased two sets, one green of lengths 1ft - 3ft for the private interfaces and another set of red cables for the public interfaces. We swapped out all of the public interface patch cables with a different brand and ran our servers without issue for a full week ... aaaaaand then the problem recurred. Disable checksum offload, remove TProxy We also tried disabling TCP/IP checksum offload in the driver, no change. We're now pulling out TProxy and moving to a more traditional x-forwarded-for network arrangement without any fancy IP address rewriting. We'll see if that helps. Switch Virtualization providers On the off chance this was related to Hyper-V in some way (we do host Linux VMs on it), we switched to VMWare Server. No change. Switch host model We've reached the end of our troubleshooting rope and are now formally involving Microsoft support. They recommended changing the host model: http://en.wikipedia.org/wiki/Host_model http://technet.microsoft.com/en-us/magazine/2007.09.cableguy.aspx We did that, and.. we'll see.

Read the article

cannot access a site from Mac OSX Lion but can from other machines on network?

- by house9

SOLVED: The issue is with the hamachi client, hamachi is hi-jacking all of the 5.0.0.0/8 address block http://en.wikipedia.org/wiki/Hamachi_(software)#Criticism http://b.logme.in/2012/11/07/changes-to-hamachi-on-november-19th/ The fix on Mac LogMeIn Hamachi Preferences Settings Advanced Peer Connections IP protocol mode IPv6 only (default is both) If you can only connect to some of your network over IPv4 this 'fix' will NOT work for you ----- A few weeks ago I started using a service - https://semaphoreapp.com I think they made DNS changes a week ago and ever since I cannot access the site from my Mac OSX Lion (10.7.4) machine (my main development machine) but I can access the site from other machines on my network ipad windows machine MacMini (10.6.8) After some google searching I tried both of these dscacheutil -flushcache sudo killall -HUP mDNSResponder but no go, I've contacted semaphoreapp as well, but nothing so far - also of interest, one of my colleagues has the exact same problem, cannot access via Mac OSX Lion but can via windows machine, we work remotely and are not on the same ISP some additional info Lion (10.7.4) cannot access site host semaphoreapp.com semaphoreapp.com has address 5.9.53.16 ping semaphoreapp.com PING semaphoreapp.com (5.9.53.16): 56 data bytes Request timeout for icmp_seq 0 Request timeout for icmp_seq 1 Request timeout for icmp_seq 2 Request timeout for icmp_seq 3 ping: sendto: No route to host Request timeout for icmp_seq 4 ping: sendto: Host is down Request timeout for icmp_seq 5 ping: sendto: Host is down Request timeout for icmp_seq 6 ping: sendto: Host is down Request timeout for icmp_seq 7 .... traceroute semaphoreapp.com traceroute to semaphoreapp.com (5.9.53.16), 64 hops max, 52 byte packets 1 * * * 2 * * * traceroute: sendto: No route to host 3 traceroute: wrote semaphoreapp.com 52 chars, ret=-1 *traceroute: sendto: Host is down traceroute: wrote semaphoreapp.com 52 chars, ret=-1 .... and MacMini (10.6.8) can access it host semaphoreapp.com semaphoreapp.com has address 5.9.53.16 ping semaphoreapp.com PING semaphoreapp.com (5.9.53.16): 56 data bytes 64 bytes from 5.9.53.16: icmp_seq=0 ttl=44 time=191.458 ms 64 bytes from 5.9.53.16: icmp_seq=1 ttl=44 time=202.923 ms 64 bytes from 5.9.53.16: icmp_seq=2 ttl=44 time=180.746 ms 64 bytes from 5.9.53.16: icmp_seq=3 ttl=44 time=200.616 ms 64 bytes from 5.9.53.16: icmp_seq=4 ttl=44 time=178.818 ms .... traceroute semaphoreapp.com traceroute to semaphoreapp.com (5.9.53.16), 64 hops max, 52 byte packets 1 192.168.0.1 (192.168.0.1) 1.677 ms 1.446 ms 1.445 ms 2 * LOCAL ISP 11.957 ms * 3 etc... 10.704 ms 14.183 ms 9.341 ms 4 etc... 32.641 ms 12.147 ms 10.850 ms 5 etc.... 44.205 ms 54.563 ms 36.243 ms 6 vlan139.car1.seattle1.level3.net (4.53.145.165) 50.136 ms 45.873 ms 30.396 ms 7 ae-32-52.ebr2.seattle1.level3.net (4.69.147.182) 31.926 ms 40.507 ms 49.993 ms 8 ae-2-2.ebr2.denver1.level3.net (4.69.132.54) 78.129 ms 59.674 ms 49.905 ms 9 ae-3-3.ebr1.chicago2.level3.net (4.69.132.62) 99.019 ms 82.008 ms 76.074 ms 10 ae-1-100.ebr2.chicago2.level3.net (4.69.132.114) 96.185 ms 75.658 ms 75.662 ms 11 ae-6-6.ebr2.washington12.level3.net (4.69.148.145) 104.322 ms 105.563 ms 118.480 ms 12 ae-5-5.ebr2.washington1.level3.net (4.69.143.221) 93.646 ms 99.423 ms 96.067 ms 13 ae-41-41.ebr2.paris1.level3.net (4.69.137.49) 177.744 ms ae-44-44.ebr2.paris1.level3.net (4.69.137.61) 199.363 ms 198.405 ms 14 ae-47-47.ebr1.frankfurt1.level3.net (4.69.143.141) 176.876 ms ae-45-45.ebr1.frankfurt1.level3.net (4.69.143.133) 170.994 ms ae-46-46.ebr1.frankfurt1.level3.net (4.69.143.137) 177.308 ms 15 ae-61-61.csw1.frankfurt1.level3.net (4.69.140.2) 176.769 ms ae-91-91.csw4.frankfurt1.level3.net (4.69.140.14) 178.676 ms 173.644 ms 16 ae-2-70.edge7.frankfurt1.level3.net (4.69.154.75) 180.407 ms ae-3-80.edge7.frankfurt1.level3.net (4.69.154.139) 174.861 ms 176.578 ms 17 as33891-net.edge7.frankfurt1.level3.net (195.16.162.94) 175.448 ms 185.658 ms 177.081 ms 18 hos-bb1.juniper4.rz16.hetzner.de (213.239.240.202) 188.700 ms 190.332 ms 188.196 ms 19 hos-tr4.ex3k14.rz16.hetzner.de (213.239.233.98) 199.632 ms hos-tr3.ex3k14.rz16.hetzner.de (213.239.233.66) 185.938 ms hos-tr2.ex3k14.rz16.hetzner.de (213.239.230.34) 182.378 ms 20 * * * 21 * * * 22 * * * any ideas? EDIT: adding tcpdump MacMini (which can connect) while running - ping semaphoreapp.com sudo tcpdump -v -i en0 dst semaphoreapp.com Password: tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 65535 bytes 17:33:03.337165 IP (tos 0x0, ttl 64, id 20153, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->3129)!) 192.168.0.6 > static.16.53.9.5.clients.your-server.de: ICMP echo request, id 61918, seq 0, length 64 17:33:04.337279 IP (tos 0x0, ttl 64, id 26049, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->1a21)!) 192.168.0.6 > static.16.53.9.5.clients.your-server.de: ICMP echo request, id 61918, seq 1, length 64 17:33:05.337425 IP (tos 0x0, ttl 64, id 47854, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->c4f3)!) 192.168.0.6 > static.16.53.9.5.clients.your-server.de: ICMP echo request, id 61918, seq 2, length 64 17:33:06.337548 IP (tos 0x0, ttl 64, id 24772, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->1f1e)!) 192.168.0.6 > static.16.53.9.5.clients.your-server.de: ICMP echo request, id 61918, seq 3, length 64 17:33:07.337670 IP (tos 0x0, ttl 64, id 8171, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->5ff7)!) 192.168.0.6 > static.16.53.9.5.clients.your-server.de: ICMP echo request, id 61918, seq 4, length 64 17:33:08.337816 IP (tos 0x0, ttl 64, id 35810, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->f3ff)!) 192.168.0.6 > static.16.53.9.5.clients.your-server.de: ICMP echo request, id 61918, seq 5, length 64 17:33:09.337948 IP (tos 0x0, ttl 64, id 31120, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->652)!) 192.168.0.6 > static.16.53.9.5.clients.your-server.de: ICMP echo request, id 61918, seq 6, length 64 ^C 7 packets captured 1047 packets received by filter 0 packets dropped by kernel OSX Lion (cannot connect) while running - ping semaphoreapp.com # wireless ~ $ sudo tcpdump -v -i en1 dst semaphoreapp.com Password: tcpdump: listening on en1, link-type EN10MB (Ethernet), capture size 65535 bytes ^C 0 packets captured 262 packets received by filter 0 packets dropped by kernel and # wired ~ $ sudo tcpdump -v -i en0 dst semaphoreapp.com tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 65535 bytes ^C 0 packets captured 219 packets received by filter 0 packets dropped by kernel above output after Request timeout for icmp_seq 25 or 30 times from ping. I don't know much about tcpdump, but to me it doesn't seem like the ping requests are leaving my machine?

Read the article

How to set up linux watchdog daemon with Intel 6300esb

- by ACiD GRiM

I've been searching for this on Google for sometime now and I have yet to find proper documentation on how to connect the kernel driver for my 6300esb watchdog timer to /dev/watchdog and ensure that watchdog daemon is keeping it alive. I am using RHEL compatible Scientific Linux 6.3 in a KVM virtual machine by the way Below is everything I've tried so far: dmesg|grep 6300 i6300ESB timer: Intel 6300ESB WatchDog Timer Driver v0.04 i6300ESB timer: initialized (0xffffc900008b8000). heartbeat=30 sec (nowayout=0) | ll /dev/watchdog crw-rw----. 1 root root 10, 130 Sep 22 22:25 /dev/watchdog | /etc/watchdog.conf #ping = 172.31.14.1 #ping = 172.26.1.255 #interface = eth0 file = /var/log/messages #change = 1407 # Uncomment to enable test. Setting one of these values to '0' disables it. # These values will hopefully never reboot your machine during normal use # (if your machine is really hung, the loadavg will go much higher than 25) max-load-1 = 24 max-load-5 = 18 max-load-15 = 12 # Note that this is the number of pages! # To get the real size, check how large the pagesize is on your machine. #min-memory = 1 #repair-binary = /usr/sbin/repair #test-binary = #test-timeout = watchdog-device = /dev/watchdog # Defaults compiled into the binary #temperature-device = #max-temperature = 120 # Defaults compiled into the binary #admin = root interval = 10 #logtick = 1 # This greatly decreases the chance that watchdog won't be scheduled before # your machine is really loaded realtime = yes priority = 1 # Check if syslogd is still running by enabling the following line #pidfile = /var/run/syslogd.pid Now maybe I'm not testing it correctly, but I would expecting that stopping the watchdog service would cause the /dev/watchdog to time out after 30 seconds and I should see the host reboot, however this does not happen. Also, here is my config for the KVM vm  <domain type='kvm'> <name>sl6template</name> <uuid>960d0ac2-2e6a-5efa-87a3-6bb779e15b6a</uuid> <memory unit='KiB'>262144</memory> <currentMemory unit='KiB'>262144</currentMemory> <vcpu placement='static'>1</vcpu> <os> <type arch='x86_64' machine='rhel6.3.0'>hvm</type> <boot dev='hd'/> </os> <features> <acpi/> <apic/> <pae/> </features> <cpu mode='custom' match='exact'> <model fallback='allow'>Westmere</model> <vendor>Intel</vendor> <feature policy='require' name='tm2'/> <feature policy='require' name='est'/> <feature policy='require' name='vmx'/> <feature policy='require' name='ds'/> <feature policy='require' name='smx'/> <feature policy='require' name='ss'/> <feature policy='require' name='vme'/> <feature policy='require' name='dtes64'/> <feature policy='require' name='rdtscp'/> <feature policy='require' name='ht'/> <feature policy='require' name='dca'/> <feature policy='require' name='pbe'/> <feature policy='require' name='tm'/> <feature policy='require' name='pdcm'/> <feature policy='require' name='pdpe1gb'/> <feature policy='require' name='ds_cpl'/> <feature policy='require' name='pclmuldq'/> <feature policy='require' name='xtpr'/> <feature policy='require' name='acpi'/> <feature policy='require' name='monitor'/> <feature policy='require' name='aes'/> </cpu> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/libexec/qemu-kvm</emulator> <disk type='file' device='disk'> <driver name='qemu' type='raw'/> <source file='/mnt/data/vms/sl6template.img'/> <target dev='vda' bus='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </disk> <controller type='usb' index='0'> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> </controller> <interface type='bridge'> <mac address='52:54:00:44:57:f6'/> <source bridge='br0.2'/> <model type='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </interface> <interface type='bridge'> <mac address='52:54:00:88:0f:42'/> <source bridge='br1'/> <model type='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </interface> <serial type='pty'> <target port='0'/> </serial> <console type='pty'> <target type='serial' port='0'/> </console> <watchdog model='i6300esb' action='reset'> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </watchdog> <memballoon model='virtio'> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </memballoon> </devices> </domain> Any help is appreciated as the most I've found are patches to kvm and general softdog documentation or IPMI watchdog answers.

Read the article

rsync over ssh is not working anymore, while ssh itself is working fine (Write failed: broken pipe)

- by brazorf

This issue started happening after i changed router. This is the scenario: Windows7 Host Ubuntu 10.04 Guest (VirtualBox) Ubuntu 10.04 remote server What i used to do is run a very basic rsync command: rsync -avz --delete /local/path/ username@host:/path/to/remote/directory This worked perfect until i did change adsl provider, and i changed router aswell: now, this happens: rsync on Ubuntu Guest is not working anymore (to any random server), if using this new router rsync on Ubuntu Guest is WORKING, if i switch back to old router i tried a new virtual box ubuntu install, and the command is WORKING with both the routers So, the not-working-combo is oldUbuntu + newRouter. To get things worst, i can state that (on the not-working ubuntu) i ping the remote host plain ssh connection to the remote host is working fine (i can auth, connect, and do stuff on the remote host) scp is NOT working (this is just a further thing i tried) This is the console output of the execution, with ssh verbose set to vvvv: root@client:~# rsync -ae 'ssh -vvvv' /root/test-rsync/ {username}@{hostname}:/home/{username}/test/ OpenSSH_5.3p1 Debian-3ubuntu7, OpenSSL 0.9.8k 25 Mar 2009 debug1: Reading configuration data /root/.ssh/config debug1: Applying options for {hostname} debug1: Reading configuration data /etc/ssh/ssh_config debug1: Applying options for * debug2: ssh_connect: needpriv 0 debug1: Connecting to {hostname} [{ip.add.re.ss}] port 22. debug1: Connection established. debug1: permanently_set_uid: 0/0 debug3: Not a RSA1 key file /root/.ssh/{private_key}. debug2: key_type_from_name: unknown key type '-----BEGIN' debug3: key_read: missing keytype debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug2: key_type_from_name: unknown key type '-----END' debug3: key_read: missing keytype debug1: identity file /root/.ssh/{private_key} type 1 debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-2048 debug1: Checking blacklist file /etc/ssh/blacklist.RSA-2048 debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3p1 Debian-3ubuntu7 debug1: match: OpenSSH_5.3p1 Debian-3ubuntu7 pat OpenSSH* debug1: Enabling compatibility mode for protocol 2.0 debug1: Local version string SSH-2.0-OpenSSH_5.3p1 Debian-3ubuntu7 debug2: fd 3 setting O_NONBLOCK debug1: SSH2_MSG_KEXINIT sent debug3: Wrote 792 bytes for a total of 831 debug1: SSH2_MSG_KEXINIT received debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 debug2: kex_parse_kexinit: ssh-rsa,ssh-dss debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,[email protected] debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,[email protected] debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: [email protected],zlib,none debug2: kex_parse_kexinit: [email protected],zlib,none debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: first_kex_follows 0 debug2: kex_parse_kexinit: reserved 0 debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 debug2: kex_parse_kexinit: ssh-rsa,ssh-dss debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,[email protected] debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,[email protected] debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: none,[email protected] debug2: kex_parse_kexinit: none,[email protected] debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: first_kex_follows 0 debug2: kex_parse_kexinit: reserved 0 debug2: mac_setup: found hmac-md5 debug1: kex: server->client aes128-ctr hmac-md5 [email protected] debug2: mac_setup: found hmac-md5 debug1: kex: client->server aes128-ctr hmac-md5 [email protected] debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP debug3: Wrote 24 bytes for a total of 855 debug2: dh_gen_key: priv key bits set: 125/256 debug2: bits set: 525/1024 debug1: SSH2_MSG_KEX_DH_GEX_INIT sent debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY debug3: Wrote 144 bytes for a total of 999 debug3: check_host_in_hostfile: filename /root/.ssh/known_hosts debug3: check_host_in_hostfile: match line 4 debug3: check_host_in_hostfile: filename /root/.ssh/known_hosts debug3: check_host_in_hostfile: match line 5 debug1: Host '{hostname}' is known and matches the RSA host key. debug1: Found key in /root/.ssh/known_hosts:4 debug2: bits set: 512/1024 debug1: ssh_rsa_verify: signature correct debug2: kex_derive_keys debug2: set_newkeys: mode 1 debug1: SSH2_MSG_NEWKEYS sent debug1: expecting SSH2_MSG_NEWKEYS debug3: Wrote 16 bytes for a total of 1015 debug2: set_newkeys: mode 0 debug1: SSH2_MSG_NEWKEYS received debug1: SSH2_MSG_SERVICE_REQUEST sent debug3: Wrote 48 bytes for a total of 1063 debug2: service_accept: ssh-userauth debug1: SSH2_MSG_SERVICE_ACCEPT received debug2: key: /root/.ssh/{private_key} (0x7f3ad0e7f9b0) debug3: Wrote 80 bytes for a total of 1143 debug1: Authentications that can continue: publickey,password debug3: start over, passed a different list publickey,password debug3: preferred gssapi-keyex,gssapi-with-mic,gssapi,publickey,keyboard-interactive,password debug3: authmethod_lookup publickey debug3: remaining preferred: keyboard-interactive,password debug3: authmethod_is_enabled publickey debug1: Next authentication method: publickey debug1: Offering public key: /root/.ssh/{private_key} debug3: send_pubkey_test debug2: we sent a publickey packet, wait for reply debug3: Wrote 368 bytes for a total of 1511 debug1: Server accepts key: pkalg ssh-rsa blen 277 debug2: input_userauth_pk_ok: fp 1b:65:36:92:59:b3:12:3e:8c:c6:03:28:d4:81:09:dc debug3: sign_and_send_pubkey debug1: read PEM private key done: type RSA debug3: Wrote 656 bytes for a total of 2167 debug1: Enabling compression at level 6. debug1: Authentication succeeded (publickey). debug2: fd 4 setting O_NONBLOCK debug3: fd 5 is O_NONBLOCK debug1: channel 0: new [client-session] debug3: ssh_session2_open: channel_new: 0 debug2: channel 0: send open debug1: Requesting [email protected] debug1: Entering interactive session. debug3: Wrote 112 bytes for a total of 2279 debug2: callback start debug2: client_session2_setup: id 0 debug1: Sending environment. debug3: Ignored env TERM debug3: Ignored env SHELL debug3: Ignored env SSH_CLIENT debug3: Ignored env SSH_TTY debug1: Sending env LC_ALL = en_US.UTF-8 debug2: channel 0: request env confirm 0 debug3: Ignored env USER debug3: Ignored env LS_COLORS debug3: Ignored env MAIL debug3: Ignored env PATH debug3: Ignored env PWD debug1: Sending env LANG = en_US.UTF-8 debug2: channel 0: request env confirm 0 debug3: Ignored env SHLVL debug3: Ignored env HOME debug3: Ignored env LANGUAGE debug3: Ignored env LOGNAME debug3: Ignored env SSH_CONNECTION debug3: Ignored env LESSOPEN debug3: Ignored env LESSCLOSE debug3: Ignored env _ debug1: Sending command: rsync --server -logDtpre.iLsf . /home/{username}/test/ debug2: channel 0: request exec confirm 1 debug2: fd 3 setting TCP_NODELAY debug2: callback done debug2: channel 0: open confirm rwindow 0 rmax 32768 debug3: Wrote 208 bytes for a total of 2487 At this point everything freeze for lots of minutes, ending in Write failed: Broken pipe rsync: connection unexpectedly closed (0 bytes received so far) [sender] rsync error: unexplained error (code 255) at io.c(601) [sender=3.0.7] Any suggestion? Thank You F. Edit 2012/09/13: i am changing title and issue definition, since i made some TINY step ahead and i think i can give more detailed clues.

Read the article

Unable to connect to Linux (Virtual OS-vmware) through Putty on Windows

- by RBA

Hi, I want to access my Linux box (Virtual OS) through Putty on Windows using Run command: putty -ssh -P 22 192.168.171.130,,, but it is returning an error message, not able to connect. But few days back I was able to connect it today. But not now. Why?? Windows IP Configuration Host Name . . . . . . . . . . . . : rba7791fd466 Primary Dns Suffix . . . . . . . : Node Type . . . . . . . . . . . . : Unknown IP Routing Enabled. . . . . . . . : No WINS Proxy Enabled. . . . . . . . : No Ethernet adapter VMware Network Adapter VMnet1: Connection-specific DNS Suffix . : Description . . . . . . . . . . . : VMware Virtual Ethernet Adapter for VMnet1 Physical Address. . . . . . . . . : 00-50-56-C0-00-01 Dhcp Enabled. . . . . . . . . . . : No IP Address. . . . . . . . . . . . : 192.168.234.1 Subnet Mask . . . . . . . . . . . : 255.255.255.0 Default Gateway . . . . . . . . . : Ethernet adapter Wireless Network Connection: Connection-specific DNS Suffix . : Description . . . . . . . . . . . : Dell Wireless 1395 WLAN Mini-Card Physical Address. . . . . . . . . : 00-24-2B-60-A0-88 Dhcp Enabled. . . . . . . . . . . : Yes Autoconfiguration Enabled . . . . : Yes IP Address. . . . . . . . . . . . : 10.0.0.2 Subnet Mask . . . . . . . . . . . : 255.255.255.0 Default Gateway . . . . . . . . . : 10.0.0.1 DHCP Server . . . . . . . . . . . : 10.0.0.1 DNS Servers . . . . . . . . . . . : 10.0.0.1 Lease Obtained. . . . . . . . . . : Friday, August 28, 2009 4:11:09 AM Lease Expires . . . . . . . . . . : Saturday, August 29, 2009 4:11:09 AM Ubuntu Configuration eth0 inet addr:192.168.171.130

Read the article

Linux server is only using 60% of memory, then swapping

- by Kamil Kisiel

I've got a Linux server that's running our bacula backup system. The machine is grinding like mad because it's going heavy in to swap. The problem is, it's only using 60% of its physical memory! Here's the output from free -m: free -m total used free shared buffers cached Mem: 3949 2356 1593 0 0 1 -/+ buffers/cache: 2354 1595 Swap: 7629 1804 5824 and some sample output from vmstat 1: procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 0 2 1843536 1634512 0 4188 54 13 2524 666 2 1 1 1 89 9 0 1 11 1845916 1640724 0 388 2700 4816 221880 4879 14409 170721 4 3 63 30 0 0 9 1846096 1643952 0 0 4956 756 174832 804 12357 159306 3 4 63 30 0 0 11 1846104 1643532 0 0 4916 540 174320 580 10609 139960 3 4 64 29 0 0 4 1846084 1640272 0 2336 4080 524 140408 548 9331 118287 3 4 63 30 0 0 8 1846104 1642096 0 1488 2940 432 102516 457 7023 82230 2 4 65 29 0 0 5 1846104 1642268 0 1276 3704 452 126520 452 9494 119612 3 5 65 27 0 3 12 1846104 1641528 0 328 6092 608 187776 636 8269 113059 4 3 64 29 0 2 2 1846084 1640960 0 724 5948 0 111480 0 7751 116370 4 4 63 29 0 0 4 1846100 1641484 0 404 4144 1476 125760 1500 10668 105358 2 3 71 25 0 0 13 1846104 1641932 0 0 5872 828 153808 840 10518 128447 3 4 70 22 0 0 8 1846096 1639172 0 3164 3556 556 74884 580 5082 65362 2 2 73 23 0 1 4 1846080 1638676 0 396 4512 28 50928 44 2672 38277 2 2 80 16 0 0 3 1846080 1628808 0 7132 2636 0 28004 8 1358 14090 0 1 78 20 0 0 2 1844728 1618552 0 11140 7680 0 12740 8 763 2245 0 0 82 18 0 0 2 1837764 1532056 0 101504 2952 0 95644 24 802 3817 0 1 87 12 0 0 11 1842092 1633324 0 4416 1748 10900 143144 11024 6279 134442 3 3 70 24 0 2 6 1846104 1642756 0 0 4768 468 78752 468 4672 60141 2 2 76 20 0 1 12 1846104 1640792 0 236 4752 440 140712 464 7614 99593 3 5 58 34 0 0 3 1846084 1630368 0 6316 5104 0 20336 0 1703 22424 1 1 72 26 0 2 17 1846104 1638332 0 3168 4080 1720 211960 1744 11977 155886 3 4 65 28 0 1 10 1846104 1640800 0 132 4488 556 126016 584 8016 106368 3 4 63 29 0 0 14 1846104 1639740 0 2248 3436 428 114188 452 7030 92418 3 3 59 35 0 1 6 1846096 1639504 0 1932 5500 436 141412 460 8261 112210 4 4 63 29 0 0 10 1846104 1640164 0 3052 4028 448 147684 472 7366 109554 4 4 61 30 0 0 10 1846100 1641040 0 2332 4952 632 147452 664 8767 118384 3 4 63 30 0 4 8 1846084 1641092 0 664 4948 276 152264 292 6448 98813 5 5 62 28 0 Furthermore, the output of top sorted by CPU time seems to support the theory that swap is what's bogging down the system: top - 09:05:32 up 37 days, 23:24, 1 user, load average: 9.75, 8.24, 7.12 Tasks: 173 total, 1 running, 172 sleeping, 0 stopped, 0 zombie Cpu(s): 1.6%us, 1.4%sy, 0.0%ni, 76.1%id, 20.6%wa, 0.1%hi, 0.2%si, 0.0%st Mem: 4044632k total, 2405628k used, 1639004k free, 0k buffers Swap: 7812492k total, 1851852k used, 5960640k free, 436k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ TIME COMMAND 4174 root 17 0 63156 176 56 S 8 0.0 2138:52 35,38 bacula-fd 4185 root 17 0 63352 284 104 S 6 0.0 1709:25 28,29 bacula-sd 240 root 15 0 0 0 0 D 3 0.0 831:55.19 831:55 kswapd0 2852 root 10 -5 0 0 0 S 1 0.0 126:35.59 126:35 xfsbufd 2849 root 10 -5 0 0 0 S 0 0.0 119:50.94 119:50 xfsbufd 1364 root 10 -5 0 0 0 S 0 0.0 117:05.39 117:05 xfsbufd 21 root 10 -5 0 0 0 S 1 0.0 48:03.44 48:03 events/3 6940 postgres 16 0 43596 8 8 S 0 0.0 46:50.35 46:50 postmaster 1342 root 10 -5 0 0 0 S 0 0.0 23:14.34 23:14 xfsdatad/4 5415 root 17 0 1770m 108 48 S 0 0.0 15:03.74 15:03 bacula-dir 23 root 10 -5 0 0 0 S 0 0.0 13:09.71 13:09 events/5 5604 root 17 0 1216m 500 200 S 0 0.0 12:38.20 12:38 java 5552 root 16 0 1194m 580 248 S 0 0.0 11:58.00 11:58 java Here's the same sorted by virtual memory image size: top - 09:08:32 up 37 days, 23:27, 1 user, load average: 8.43, 8.26, 7.32 Tasks: 173 total, 1 running, 172 sleeping, 0 stopped, 0 zombie Cpu(s): 3.6%us, 3.4%sy, 0.0%ni, 62.2%id, 30.2%wa, 0.2%hi, 0.3%si, 0.0%st Mem: 4044632k total, 2404212k used, 1640420k free, 0k buffers Swap: 7812492k total, 1852548k used, 5959944k free, 100k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ TIME COMMAND 5415 root 17 0 1770m 56 44 S 0 0.0 15:03.78 15:03 bacula-dir 5604 root 17 0 1216m 492 200 S 0 0.0 12:38.30 12:38 java 5552 root 16 0 1194m 476 200 S 0 0.0 11:58.20 11:58 java 4598 root 16 0 117m 44 44 S 0 0.0 0:13.37 0:13 eventmond 9614 gdm 16 0 93188 0 0 S 0 0.0 0:00.30 0:00 gdmgreeter 5527 root 17 0 78716 0 0 S 0 0.0 0:00.30 0:00 gdm 4185 root 17 0 63352 284 104 S 20 0.0 1709:52 28,29 bacula-sd 4174 root 17 0 63156 208 88 S 24 0.0 2139:25 35,39 bacula-fd 10849 postgres 18 0 54740 216 108 D 0 0.0 0:31.40 0:31 postmaster 6661 postgres 17 0 49432 0 0 S 0 0.0 0:03.50 0:03 postmaster 5507 root 15 0 47980 0 0 S 0 0.0 0:00.00 0:00 gdm 6940 postgres 16 0 43596 16 16 S 0 0.0 46:51.39 46:51 postmaster 5304 postgres 16 0 40580 132 88 S 0 0.0 6:21.79 6:21 postmaster 5301 postgres 17 0 40448 24 24 S 0 0.0 0:32.17 0:32 postmaster 11280 root 16 0 40288 28 28 S 0 0.0 0:00.11 0:00 sshd 5534 root 17 0 37580 0 0 S 0 0.0 0:56.18 0:56 X 30870 root 30 15 31668 28 28 S 0 0.0 1:13.38 1:13 snmpd 5305 postgres 17 0 30628 16 16 S 0 0.0 0:11.60 0:11 postmaster 27403 postfix 17 0 30248 0 0 S 0 0.0 0:02.76 0:02 qmgr 10815 postfix 15 0 30208 16 16 S 0 0.0 0:00.02 0:00 pickup 5306 postgres 16 0 29760 20 20 S 0 0.0 0:52.89 0:52 postmaster 5302 postgres 17 0 29628 64 32 S 0 0.0 1:00.64 1:00 postmaster I've tried tuning the swappiness kernel parameter to both high and low values, but nothing appears to change the behavior here. I'm at a loss to figure out what's going on. How can I find out what's causing this? Update: The system is a fully 64-bit system, so there should be no question of memory limitations due to 32-bit issues. Update2: As I mentioned in the original question, I've already tried tuning swappiness to all sorts of values, including 0. The result is always the same, with approximately 1.6 GB of memory remaining unused. Update3: Added top output to the above info.

Read the article

Taking web sites offline for demonstration

While working in software development in general, and in web development for a couple of customers it is quite common that it is necessary to provide a test bed where the client is able to get an image, or better said, a feeling for the visions and ideas you are talking about. Usually here at IOS Indian Ocean Software Ltd. we set up a demo web site on one of our staging servers, and provide credentials to the customer to access and review our progress and work ad hoc. This gives us the highest flexibility on both sides, as the test bed is simply online and available 24/7. We can update the structure, the UI and data at any time, and the client is able to view it as it suits best for her/him. Limited or lack of online connectivity But what is going to happen when your client is not capable to be online - no matter for what reasons; here are some more obvious ones: No internet connection (permanently or temporarily) Expensive connection, ie. mobile data package, stay at a hotel, etc. Presentation devices at an exhibition, ie. using tablets or iPads Being abroad for a certain time, and only occasionally online No network coverage, especially on mobile Bad infrastructure, like ie. in Third World countries Providing a catalogue on CD or USB pen drive Anyway, it doesn't matter really. We should be able to provide a solution for the circumstances of our customers. Presentation during an exhibition Recently, we had the following request from a customer: Is it possible to let us have a desktop version of ResortWork.co.uk that we can use for demo purposes at the forthcoming Ski Shows? It would allow us to let stand visitors browse the sites on an iPad to view jobs and training directory course listings. Yes, sure we can do that. Eventually, you might think why don't they simply use 3G enabled iPads for that purpose? As stated above, there might be several reasons for that - low coverage, expensive data packages, etc. Anyway, it is not a question on how to circumvent the request but to deliver a solution to that. Possible solutions... or not? We already did offline websites earlier, and even established complete mirrors of one or two web sites on our systems. There are actually several possibilities to handle this kind of request, and it mainly depends on the system or device where the offline site should be available on. Here, it is clearly expressed that we have to address this on an Apple iPad, well actually, I think that they'd like to use multiple devices during their exhibitions. Following is an overview of possible solutions depending on the technology or device in use, and how it can be done: Replication of source files and database The above mentioned web site is running on ASP.NET, IIS and SQL Server. In case that a laptop or slate runs a Windows OS, the easiest way would be to take a snapshot of the source files and database, and transfer them as local installation to those Windows machines. This approach would be fully operational on the local machine. Saving pages for offline usage This is actually a quite tedious job but still practicable for small web sites Tool based approach to 'harvest' the web site There quite some tools in the wild that could handle this job, namely wget, httrack, web copier, etc. Screenshots bundled as PDF document Not really... ;-) Creating screencast or video Simply navigate through your website and record your desktop session. Actually, we are using this kind of approach to track down difficult problems in order to see and understand exactly what the user was doing to cause an error. Of course, this list isn't complete and I'd love to get more of your ideas in the comments section below the article. Preparations for offline browsing The original website is dynamically and data-driven by ASP.NET, and looks like this: As we have to put the result onto iPads we are going to choose the tool-based approach to 'download' the whole web site for offline usage. Again, depending on the complexity of your web site you might have to check which of the applications produces the best results for you. My usual choice is to use wget but in this case, we run into problems related to the rewriting of hyperlinks. As a consequence of that we opted for using HTTrack. HTTrack comes in different flavours, like console application but also as either GUI (WinHTTrack on Windows) or Web client (WebHTTrack on Linux/Unix/BSD). Here's a brief description taken from the original website about HTTrack: HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. And there is an extensive documentation for all options and switches online. General recommendation is to go through the HTTrack Users Guide By Fred Cohen. It covers all the initial steps you need to get up and running. Be aware that it will take quite some time to get all the necessary resources down to your machine. Actually, for our customer we run the tool directly on their web server to avoid unnecessary traffic and bandwidth. After a couple of runs and some additional fine-tuning - explicit inclusion or exclusion of various external linked web sites - we finally had a more or less complete offline version available. A very handsome feature of HTTrack is the error/warning log after completing the download. It contains some detailed information about errors that appeared on the pages and the links within the pages that have been processed. Error: "Bad Request" (400) at link www.resortwork.co.uk/job-details_Ski_hire:tech_or_mgr_or_driver_37854.aspx (from www.resortwork.co.uk/Jobs_A_to_Z.aspx)Error: "Not Found" (404) at link www.247recruit.net/images/applynow.png (from www.247recruit.net/css/global.css)Error: "Not Found" (404) at link www.247recruit.net/activate.html (from www.247recruit.net/247recruit_tefl_jobs_network.html) In our situation, we took the records of HTTP 400/404 errors and passed them to the web development department. Improvements are to be expected soon. ;-) Quality assurance on the full-featured desktop Unfortunately, the generated output of HTTrack was still incomplete but luckily there were only images missing. Being directly on the web server we simply copied the missing images from the original source folder into our offline version. After that, we created an archive and transferred the file securely to our local workspace for further review and checks. From that point on, it wasn't necessary to get any more files from the original web server, and we could focus ourselves completely on the process of browsing and navigating through the offline version to isolate visual differences and functional problems. As said, the original web site runs on ASP.NET Web Forms and uses Postback calls for interaction like search, pagination and partly for navigation. This is the main field of improving the offline experience. Of course, same as for standard web development it is advised to test with various browsers, and strangely we discovered that the offline version looked pretty good on Firefox, Chrome and Safari, but not in Internet Explorer. A quick look at the HTML source shed some light on this, and there are conditional CSS inclusions based on the user agent. HTTrack is not acting as Internet Explorer and so we didn't have the necessary overrides for this browser. Not problematic after all in our case, but you might have to pay attention to this and get the IE-specific files explicitly. And while having a view at the source code, we also found out that HTTrack actually modifies the generated HTML output. In several occasions we discovered that <div> elements were converted into <table> constructs for no obvious reason; even nested structures. Search 'e'nd destroy - sed (or Notepad++) to the rescue During our intensive root cause analysis for a couple of HTML/CSS problems that needed some extra attention it is very helpful to be familiar with any editor that allows search and replace over multiple files like, ie. sed - stream editor for filtering and transforming text on Linux or my personal favourite Notepad++ on Windows. This allowed us to quickly fix a lot of anchors with onclick attributes and Javascript code that was addressed to ASP.NET files instead of their generated HTML counterparts, like so: grep -lr -e '.aspx' * | xargs sed -e 's/.aspx/.html?/g' The additional question mark after the HTML extension helps to separate the query string from the actual target and solved all our missing hyperlinks very fast. The same can be done in Notepad++ on Windows, too. Just use the 'Replace in files' feature and you are settled. Especially, in combination with Regular Expressions (regex). Landscape of browsers Okay, after several runs of HTML/CSS code analysis, searching and replacing some strings in a pool of more than 4.000 files, we finally had a very good match of an offline browsing experience in Firefox and Chrome on Linux. Next, we transferred that modified set of files to a Windows 8 machine for review on Firefox, Chrome and Internet Explorer 7 to 10, and a Mac mini running Mac OS X 10.7 to check the output on Safari and again on Chrome. Besides IE, for reasons already mentioned above, the results were identical. And last but not least it was about to check web site on tablets. Please continue to read on the following articles: Taking web sites offline for demonstration on Galaxy Tablet Taking web sites offline for demonstration on iPad

Read the article

Paying great programmers more than average programmers

- by Kelly French

It's fairly well recognized that some programmers are up to 10 times more productive than others. Joel mentions this topic on his blog. There is a whole blog devoted to the idea of the "10x productive programmer". In years since the original study, the general finding that "There are order-of-magnitude differences among programmers" has been confirmed by many other studies of professional programmers (Curtis 1981, Mills 1983, DeMarco and Lister 1985, Curtis et al. 1986, Card 1987, Boehm and Papaccio 1988, Valett and McGarry 1989, Boehm et al 2000). Fred Brooks mentions the wide range in the quality of designers in his "No Silver Bullet" article, The differences are not minor--they are rather like the differences between Salieri and Mozart. Study after study shows that the very best designers produce structures that are faster, smaller, simpler, cleaner, and produced with less effort. The differences between the great and the average approach an order of magnitude. The study that Brooks cites is: H. Sackman, W.J. Erikson, and E.E. Grant, "Exploratory Experimental Studies Comparing Online and Offline Programming Performance," Communications of the ACM, Vol. 11, No. 1 (January 1968), pp. 3-11. The way programmers are paid by employers these days makes it almost impossible to pay the great programmers a large multiple of what the entry-level salary is. When the starting salary for a just-graduated entry-level programmer, we'll call him Asok (From Dilbert), is $40K, even if the top programmer, we'll call him Linus, makes $120K that is only a multiple of 3. I'd be willing to be that Linus does much more than 3 times what Asok does, so why wouldn't we expect him to get paid more as well? Here is a quote from Stroustrup: "The companies are complaining because they are hurting. They can't produce quality products as cheaply, as reliably, and as quickly as they would like. They correctly see a shortage of good developers as a part of the problem. What they generally don't see is that inserting a good developer into a culture designed to constrain semi-skilled programmers from doing harm is pointless because the rules/culture will constrain the new developer from doing anything significantly new and better." This leads to two questions. I'm excluding self-employed programmers and contractors. If you disagree that's fine but please include your rationale. It might be that the self-employed or contract programmers are where you find the top-10 earners, but please provide a explanation/story/rationale along with any anecdotes. [EDIT] I thought up some other areas in which talent/ability affects pay. Financial traders (commodities, stock, derivatives, etc.) designers (fashion, interior decorators, architects, etc.) professionals (doctor, lawyer, accountant, etc.) sales Questions: Why aren't the top 1% of programmers paid like A-list movie stars? What would the industry be like if we did pay the "Smart and gets things done" programmers 6, 8, or 10 times what an intern makes? [Footnote: I posted this question after submitting it to the Stackoverflow podcast. It was included in episode 77 and I've written more about it as a Codewright's Tale post 'Of Rockstars and Bricklayers'] Epilogue: It's probably unfair to exclude contractors and the self-employed. One aspect of the highest earners in other fields is that they are free-agents. The competition for their skills is what drives up their earning power. This means they can not be interchangeable or otherwise treated as a plug-and-play resource. I liked the example in one answer of a major league baseball team trying to field two first-basemen. Also, something that Joel mentioned in the Stackoverflow podcast (#77). There are natural dynamics to shrink any extreme performance/pay ranges between the highs and lows. One is the peer pressure of organizations to pay within a given range, another is the likelyhood that the high performer will realize their undercompensation and seek greener pastures.

Read the article

Windows Azure Learning Plan - Security

- by BuckWoody

This is one in a series of posts on a Windows Azure Learning Plan. You can find the main post here. This one deals with Security for Windows Azure. General Security Information Overview and general information about Windows Azure Security - what it is, how it works, and where you can learn more. General Security Whitepaper – answers most questions http://blogs.msdn.com/b/usisvde/archive/2010/08/10/security-white-paper-on-windows-azure-answers-many-faq.aspx Windows Azure Security Notes from the Patterns and Practices site http://blogs.msdn.com/b/jmeier/archive/2010/08/03/now-available-azure-security-notes-pdf.aspx Overview of Azure Security http://www.windowsecurity.com/articles/Microsoft-Azure-Security-Cloud.html Azure Security Resources http://reddevnews.com/articles/2010/08/19/microsoft-releases-windows-azure-security-resources.aspx Cloud Computing Security Considerations http://www.microsoft.com/downloads/en/details.aspx?FamilyID=68fedf9c-1c27-4642-aa5b-0a34472303ea&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+MicrosoftDownloadCenter+%28Microsoft+Download+Center Security in Cloud Computing – a Microsoft Perspective http://www.microsoft.com/downloads/en/details.aspx?FamilyID=7c8507e8-50ca-4693-aa5a-34b7c24f4579&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+MicrosoftDownloadCenter+%28Microsoft+Download+Center Physical Security for Microsoft’s Online Computing Information on the Infrastructure and Locations for Azure Physical Security. The Global Foundation Services Group at Microsoft handles physical security http://www.globalfoundationservices.com/security/index.html Microsoft’s Security Response Center http://www.microsoft.com/security/msrc/ Software Security for Microsoft’s Online Computing Steps we take as a company to develop secure software Windows Azure is developed using the Trustworthy Computing Initiative http://www.microsoft.com/about/twc/en/us/default.aspx and http://msdn.microsoft.com/en-us/library/ms995349.aspx Identity and Access in the Cloud http://blogs.msdn.com/b/technology_titbits_by_rajesh_makhija/archive/2010/10/29/identity-and-access-in-the-cloud.aspx Security Steps you should take While Microsoft takes great pains to secure the infrastructure, platform and code for Windows Azure, you have a responsibility to write secure code. These pointers can help you do that. Securing your cloud architecture, step-by-step http://technet.microsoft.com/en-us/magazine/gg296364.aspx Security Guidelines for Windows Azure http://redmondmag.com/articles/2010/06/15/microsoft-issues-security-guidelines-for-windows-azure.aspx Best Practices for Windows Azure Security http://blogs.msdn.com/b/vbertocci/archive/2010/06/14/security-best-practices-for-developing-windows-azure-applications.aspx Active Directory and Windows Azure http://blogs.msdn.com/b/plankytronixx/archive/2010/10/22/projecting-your-active-directory-identity-to-the-azure-cloud.aspx Understanding Encryption (great overview and tutorial) http://blogs.msdn.com/b/plankytronixx/archive/2010/10/23/crypto-primer-understanding-encryption-public-private-key-signatures-and-certificates.aspx Securing your Connection Strings (SQL Azure) http://blogs.msdn.com/b/sqlazure/archive/2010/09/07/10058942.aspx Getting started with Windows Identity Foundation (WIF) quickly http://blogs.msdn.com/b/alikl/archive/2010/10/26/windows-identity-foundation-wif-fast-track.aspx

Read the article

Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Search Results

Search found 4451 results on 179 pages for 'fred 22'.

Page 171/179 | < Previous Page | 167 168 169 170 171 172 173 174 175 176 177 178 | Next Page >

- by quanta

- by luca

- by misteryes

- by DaveCol

- by mq44944

- by Rich

- by user568829

- by James

- by Jason Bristol

- by growse

- by Gavin Simpson

- by ptomli

- by Christopher Valles

- by allentown

- by exxoid

- by Geoff Dalgas

- by house9

- by ACiD GRiM

- by brazorf

- by RBA

- by Kamil Kisiel

- by Kelly French

- by BuckWoody

- by Paul White

< Previous Page | 167 168 169 170 171 172 173 174 175 176 177 178 | Next Page >