high resolution - Page 45

Extremely high mysqld CPU usage with no active queries

- by RadarNyan

I have a VPS running Ubuntu 12.04 LTS with LEMP stack, followed the guide from Linode Library (since I'm using a Linode) to setup, and everything worked fine until now. I don't know what's wrong, but my CPU usage just goes up since a week ago. Today things getting really bad - I got 74% CPU usage so I went check and found that mysqld taking too much CPU usage (somewhere around 30% ~ 80%) So I did some Google Search, tried disable InnoDB, restart mysql, reset ntp / system clock (Isn't this bug supposed to happen more than a year ago?!) and reboot my VPS, nothing helped. Even with mysql processlist empty, I still get mysqld CPU usage very high. I don't know what I missed and have totally no idea, any advice would be appreciated. Thanks in advance. Update: I got these from running "strace mysqld" write(2, "InnoDB: Unable to lock ./ibdata1"..., 44) = 44 write(2, "InnoDB: Check that you do not al"..., 115) = 115 select(0, NULL, NULL, NULL, {1, 0}^[[A^[[A) = 0 (Timeout) fcntl64(3, F_SETLK64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}, 0xbfa496f8) = -1 EAGAIN (Resource temporarily unavailable) hum... I did tried to disable InnoDB and it didn't fix this problem. Any idea? Update2: # ps -e | grep mysqld 13099 ? 00:00:20 mysqld then use "strace -p 13099", the following lines appears repeatedly: fcntl64(12, F_GETFL) = 0x2 (flags O_RDWR) fcntl64(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0 accept(12, {sa_family=AF_FILE, NULL}, [2]) = 14 fcntl64(12, F_SETFL, O_RDWR) = 0 getsockname(14, {sa_family=AF_FILE, path="/var/run/mysqld/mysqld.sock"}, [30]) = 0 fcntl64(14, F_SETFL, O_RDONLY) = 0 fcntl64(14, F_GETFL) = 0x2 (flags O_RDWR) setsockopt(14, SOL_SOCKET, SO_RCVTIMEO, "\36\0\0\0\0\0\0\0", 8) = 0 setsockopt(14, SOL_SOCKET, SO_SNDTIMEO, "<\0\0\0\0\0\0\0", 8) = 0 fcntl64(14, F_SETFL, O_RDWR|O_NONBLOCK) = 0 setsockopt(14, SOL_IP, IP_TOS, [8], 4) = -1 EOPNOTSUPP (Operation not supported) futex(0xb786a584, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xb786a580, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 futex(0xb7869998, FUTEX_WAKE_PRIVATE, 1) = 1 poll([{fd=10, events=POLLIN}, {fd=12, events=POLLIN}], 2, -1) = 1 ([{fd=12, revents=POLLIN}]) er... now I totally don't get it x_x help

Read the article

High CPU usage in my digitalocean droplet

- by Ibrahim Azhar Armar

I am experiencing high CPU usage here is the stats i got from server, the consumption after every restart in 15 minutes go upto 100%, what could go wrong? I have a wordpress copy installed on the server which does not have much traffic, here is the stats that i got from using top command in server. top - 11:46:02 up 12 min, 3 users, load average: 40.89, 16.03, 6.11 Tasks: 132 total, 41 running, 91 sleeping, 0 stopped, 0 zombie Cpu(s): 24.3%us, 61.5%sy, 0.0%ni, 0.0%id, 4.0%wa, 0.0%hi, 0.0%si, 10.2%st Mem: 2050896k total, 1988656k used, 62240k free, 284k buffers Swap: 0k total, 0k used, 0k free, 4712k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 31 root 20 0 0 0 0 R 39 0.0 1:35.53 kswapd0 899 root 20 0 15988 172 0 S 14 0.0 0:05.00 irqbalance 418 syslog 20 0 243m 600 0 S 13 0.0 0:06.85 rsyslogd 944 mysql 20 0 1320m 53m 0 S 12 2.7 0:21.15 mysqld 2357 root 20 0 17344 532 164 R 11 0.0 0:14.27 top 960 root 20 0 246m 3816 0 S 3 0.2 0:08.18 php5-fpm 2431 www-data 20 0 344m 64m 908 R 2 3.2 0:04.23 apache2 2435 www-data 20 0 304m 63m 836 R 2 3.2 0:03.43 apache2 2413 www-data 20 0 349m 63m 920 R 2 3.2 0:07.51 apache2 2465 www-data 20 0 349m 64m 944 R 2 3.2 0:05.04 apache2 2518 www-data 20 0 307m 41m 1204 R 2 2.1 0:01.37 apache2 2406 www-data 20 0 346m 56m 1144 R 2 2.8 0:03.76 apache2 2456 www-data 20 0 345m 55m 1184 R 2 2.8 0:02.67 apache2 2373 www-data 20 0 351m 63m 784 R 2 3.2 0:11.09 apache2 2439 www-data 20 0 306m 35m 916 R 2 1.8 0:02.51 apache2 2450 www-data 20 0 345m 55m 1088 R 2 2.8 0:02.96 apache2 2486 www-data 20 0 299m 10m 876 R 2 0.5 0:01.19 apache2 2523 www-data 20 0 300m 27m 796 R 2 1.4 0:00.99 apache2

Read the article

Archive Manager, SQL 2005 and MaxTokenSize high CPU

- by Tim Alexander

So, I posted this question a few days ago: Impact of increasing the MaxTokenSize for Kerberos Tickets Since then the thought was to test our settings on two member servers, one with IIS and one without. I setup two GPOs to configure the MaxTokenSize reg setting to 48000 and MaxFieldLength/MaxRequestBytes to 64200 (based on MS KB2020943, these are set at 4/3 * T + 200). The member server seemed to work ok (a devalued tape backup server). The IIS server however has had some strange repercussions. The IIS Sserver host Quest Software Archive Manager (AM) 4.5 that communicates with SQL Server 2005 Enterprise on Server 2003 R2. After the changes all looked good until the SQL Server hit 100% CPU. I have removed the GPOS, removed the reg values and even replaced them with defaults (12000 for token size and can't remember the other one but was in a blog post about the issue in my other post). No change. Bouncing the IIS Server stops the high CPU and a colleague has looked at the SQL server and it is definitely the AM connection taking up the time/work on the SQL server. I haven't changed the reg values on the SQL server or the DCs but am reluctant to do so without understanding why this has happened. I am guessing its to do with the overriding auth and group issue we have but I am not seeing Kerberos errors in either event log. Has anyone seen something similar or does anyone have some tips? Was definitely blindsided by the Kerberos issue and am swimming against the tide to keep things functioning.

Read the article

wireless router - configuring for low-latency, high traffic environment

- by Mark C

Hey all, I have a few questions about configuring a router to achieve low-latency, high speed throughput on a local area network that is not connected to the internet. I've read up on some stuff, but thought I would solicit some opinions here on what I've found and what I want to know.... Turn off SSID broadcast - it produces extraneous packets that all clients receive and reply (?) to. Not a huge deal, but it may help a bit. Mixed-mode off - I should attempt to have all devices using the same standard (e.g. 802.11n) and turn mixed-mode off. Any thoughts on security? Does having WEP or any of the WPA variants actually increase latency? Nothing super secure is going over this LAN so if turning security off made things better, that'd be cool. Any other thoughts or things to focus on to create the low latency environment I'm trying to go for would be great. Links to webpages and papers are also cool. I'm open to go through a bunch of stuff. Thanks in advance!

Read the article

Unexpected(?) high 'wasted' memory in memcached

- by Nanne

Looking at our memcached stats I think I have found an issue I was not aware of before. It seems that we have a strangely high amount of wasted space. I checked with phpmemcacheadmin for a change, and found this image staring at me: Now I was under the impression that the worst-case scenario would be that there is 50% waste, although I am the first to admit not knowing all the details. I have read - amongst others- this page which is indeed somewhat old, but so is our version of memcached. I think I do understand how the system works (e.g.) I believe, but I have a hard time understanding how we could get to 76% wasted space. The eviction rate that phpmemcacheadmin shows is 2 ev/s, so there is some problem here. The primary question is: what can I do to fix this. I could throw more memory at it (there is some extra available I think), maybe I should fiddle with the slab config (is that even possible with this version?), maybe there are other options? Upgrading the memcached version is not a quickly available option. The secondairy question, out of curiosity, is of course if the rate of 75% (and rising) wasted space is expected, and if so, why. System: This is currently not something I can do anything about, I know the memcached version isn't the newest, but these are the cards I've been dealt. Memcached 1.4.5 Apache 2.2.17 PHP 5.3.5

Read the article

High data on recv-q buffer and thread lock on java.io.BufferedInputStream in linux

- by Sagar Patel

We have a java application running on linux (ubuntu server). We have been facing high recv-q problem since quite some time. Application gets hang and does not read data from socket every few hours. In thread dump, we have found below stack trace. "Receiver-146" daemon prio=10 tid=0x00007fb3fc010000 nid=0x7642 runnable [0x00007fb5906c5000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream. socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:150) at java.net.SocketInputStream.read(SocketInputStream.java:121) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) - locked <0x00000007688f1ff0> (a java.io.BufferedInputStream) at org.smpp.TCPIPConnection.receive(TCPIPConnection.java:413) at org.smpp.ReceiverBase.receivePDUFromConnection(ReceiverBase.java:197) at org.smpp.Receiver.receiveAsync(Receiver.java:351) at org.smpp.ReceiverBase.process(ReceiverBase.java:96) at org.smpp.util.ProcessingThread.run(ProcessingThread.java:199) at java.lang.Thread.run(Thread.java:722) We are not able to trace the exact reason behind this? Kindly help. We are using 16 core machine and load on the system is around 30-40 at the time of issue. We use command ss dst <ip> to find out recv-q. Recently we have been facing issues with recv-q size getting hung, were in receive buffer gets stuck at some point of time. But recvQ size is not decreasing and as a result we are losing a lot of hits from the other side, our application is not accepting any data.

Read the article

Netgear CG3000D new Modem/Router - Random High Ping

- by justin.chmura

Cox just recently came out and looked at my internet and decided that the modem I had was causing high latency issue. The speed was fine but the ping would spike to around 100 and over when gaming or putting a load more than browsing on the line. After they replaced it, it seems like I get better latency, but when it spikes, I get upwards of over 300 ping with like 500 jitter. I figured I would hit the serverfault universe before sending another email to Cox. I opted not to do the Cox setup as it was an extra $20 which I thought would have just setup the wireless (which I can handle). Is it a setting or something that I missed that needs to be setup? The firmware for the CG3000D is awful and not fun to use. I did change some hidden settings on the RgServices.asp page (I'll attach a screenshot). I've also heard that the Router/Modem combos are awful and that I should go back and just ask for a modem stand-alone. Any input is helpful. All screenshots: http://imgur.com/a/JX6qu#0

Read the article

Mysql server high trafic makes websites really slow or unable to load

- by Holapress

Lately we have been having a lot of problems with our mysql server, from websites being really slow or even unable to load them at all. The server is a dedicated server that only runs our mysql database. i have been running some test using a profiler (JetProfiler) and tool to stress test (loadUI). If I use loadUI to connect with 50 simultaneous connections to one of our websites that runs a resently big query it will already make the website be unable to load. One of the things that makes me worried is that when I look at Jetprofile it always shows a Treads_connected of 1.00 and it seems that when it hits around 2.00 that I'm unable to connect. The 3 big peaks are when I run a test with loadUI, first one was 15 simultaneous connections wich made it still able for me to load the website but just really slow, the second one was 40 simultaneous connections which already made it impossible to load and the third one was with 100 connection which also didn't make it load anymore. Another thing that worries me is that in JetProfiler it says all the queries that get used are full table scans, could this maybe be the problem? The website I run as a test runs 3 queries, one for a menu that outputs around 1000 rows, one for the adds that has around 560 rows and a big one to get posts that has around 7000 rows (see screenshot bellow) I also have monitored the cpu of the server and there seems to be no problem there, even when I make a lot of connections with loadui the cpu stays low. I can't seem to figure out what is the main cause of the websites being unable to load when there is a high amount of traffic, if anyone has other suggestions for testing or something that might cause the problem please let me know.

Read the article

Developing high-performance and scalable zend framework website

- by Daniel

We are going to develop an ads website like http://www.gumtree.com/ (it will not be like this one but just to give you an ideea) and we are having some issues regarding performance and scalability. We are planning on using Zend Framework for this project but this is all that I'm sure off at this point. I don't think a classic approch like Zend Framework (PHP) + MySQL + Memcache + jQuery (and I would throw Doctrine 2 in there to) will fix result in a high-performance application. I was thinking on making this a RESTful application (with Zend Framework) + NGINX (or maybe MongoDB) + Memcache (or eAccelerator -- I understand this will create problems with scalability on multiple servers) + jQuery, a CDN for static content, a server for images and a scalable server for the requests and the rest. My questions are: - What do you think about my approch? - What solutions would you recommand in terms of servers approch (MySQL, NGINX, MongoDB or pgsql) for a scalable application expected to have a lot of traffic using PHP?...I would be interested in your approch. Note: I'm a Zend Framework developer and don't have to much experience with the servers part (to determin what would be best solution for my scalable application)

Read the article

Linux 2.6.24-gentoo-r3-comtrance on x86_64 high Useage for unknown reasons

- by Dorjan

Hello everyone, I'm a complete rookie when it comes to all things Linux related so please treat me as such and assume I know nothing. That being said my Top says this: top - 12:08:03 up 11 days, 15:36, 0 users, load average: 5.47, 5.53, 5.46 Tasks: 296 total, 2 running, 294 sleeping, 0 stopped, 0 zombie Cpu(s): 6.3%us, 1.4%sy, 0.0%ni, 71.3%id, 20.6%wa, 0.0%hi, 0.3%si, 0.0%st Mem: 8176880k total, 8118236k used, 58644k free, 89312k buffers Swap: 1004052k total, 0k used, 1004052k free, 7235652k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1229 root 15 -5 0 0 0 D 1 0.0 199:28.63 kjournald 2946 root 20 0 1716 676 552 D 1 0.0 145:02.94 syslogd 14553 root 20 0 2644 1268 876 R 1 0.0 0:00.34 top 14609 postfix 20 0 7896 1884 1460 D 1 0.0 0:00.02 bounce 14630 postfix 20 0 7896 1876 1452 R 0 0.0 0:00.00 bounce And my hard drives says: > df -k Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda3 4925556 4474836 200508 96% / /dev/sda5 489992 36090 428602 8% /tmp /dev/sda6 377951852 236171160 122581816 66% /var none 4088440 0 4088440 0% /dev/shm It has been like it for a few days now... I know not what is causing the high server load (Normally around 1.3) can anyone give any tips on how to track down the culprit? Many thanks,

Read the article

HIGH CPU USAGE + low memory usage

- by hadi

as you can see in below , there are high cpu usage by httpd request. please help me to decrease them. thanks. 28577 apache 15 0 99676 53m 3488 S 21 0.2 1:13.67 httpd 28568 apache 15 0 99676 53m 3496 S 19 0.2 1:14.92 httpd 28608 apache 15 0 99676 53m 3428 R 19 0.2 0:28.28 httpd 28615 apache 15 0 99676 53m 3436 R 19 0.2 0:25.33 httpd 28616 apache 15 0 99676 53m 3440 S 19 0.2 0:25.83 httpd 28619 apache 15 0 99676 53m 3436 R 19 0.2 0:26.12 httpd 28635 apache 15 0 97.9m 54m 3416 S 19 0.2 0:24.86 httpd 28558 apache 15 0 97.9m 54m 3432 R 17 0.2 1:40.75 httpd 28560 apache 15 0 97.9m 54m 3496 R 17 0.2 1:40.02 httpd 28621 apache 15 0 97.9m 54m 3420 S 17 0.2 0:25.61 httpd 28641 apache 16 0 97.9m 54m 3428 R 17 0.2 0:21.52 httpd 28642 apache 15 0 99756 53m 3424 R 15 0.2 0:21.46 httpd 28643 apache 15 0 99676 53m 3424 S 15 0.2 0:21.59 httpd 28594 apache 15 0 99756 53m 3428 R 13 0.2 0:44.41 httpd 28618 apache 15 0 99676 53m 3420 S 13 0.2 0:26.15 httpd 28654 apache 15 0 99676 53m 3472 S 13 0.2 0:04.27 httpd 28575 apache 15 0 99756 53m 3436 R 11 0.2 1:14.02 httpd 28576 apache 15 0 99676 53m 3496 S 11 0.2 1:16.79 httpd 28634 apache 15 0 99676 53m 3436 S 11 0.2 0:25.36 httpd 28653 apache 15 0 99676 53m 3424 S 11 0.2 0:04.35 httpd 28574 apache 15 0 99676 53m 3440 S 10 0.2 1:13.05 httpd 28592 apache 15 0 99676 53m 3492 R 10 0.2 0:45.78 httpd 28595 apache 15 0 99676 53m 3432 R 10 0.2 0:47.02 httpd 28617 apache 16 0 99676 53m 3436 S 10 0.2 0:25.32 httpd 28620 apache 15 0 99676 53m 3432 S 10 0.2 0:25.35 httpd 28597 apache 15 0 99676 53m 3428 S 8 0.2 0:43.56 httpd 11345 mysql 15 0 2927m 198m 4472 R 4 0.6 1624:43 mysqld 1 root 15 0 2036 648 552 S 0 0.0 0:16.97 init 2 root RT 0 0 0 0 S 0 0.0 0:48.50 migration/0 3 root 34 19 0 0 0 S 0 0.0 0:26.72 ksoftirqd/0 4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 5 root RT 0 0 0 0 S 0 0.0 0:04.98 migration/1 6 root 34 19 0 0 0 R 0 0.0 0:27.51 ksoftirqd/1 7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1 8 root RT 0 0 0 0 S 0 0.0 0:15.42 migration/2 9 root 34 19 0 0 0 S 0 0.0 0:26.50 ksoftirqd/2 10 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/2

Read the article

MySQL Extremely High Disk Activity (Read Operations)

- by Jake Schoermer

I have 1GB Linode VPS with a standard LAMP stack. Apache is tuned fine but for some reason MySQL's disk usage is high. This is causing really slow site load times. RAM and CPU usage are fine. Can anyone give me any pointers on tuning mysql's disk performance? I'm using InnoDB. iotop output is below. Total DISK READ: 38.50 M/s | Total DISK WRITE: 27.20 K/s TID PRIO USER DISK READ> DISK WRITE SWAPIN IO COMMAND 9808 be/4 mysql 22.40 M/s 0.00 B/s 0.00 % 63.75 % mysqld 10045 be/4 mysql 2.06 M/s 0.00 B/s 0.00 % 26.65 % mysqld 9987 be/4 mysql 1694.38 K/s 0.00 B/s 0.00 % 18.33 % mysqld 10015 be/4 mysql 1554.47 K/s 0.00 B/s 0.00 % 12.71 % mysqld 10019 be/4 mysql 1461.21 K/s 0.00 B/s 0.00 % 5.58 % mysqld 9839 be/4 mysql 1383.48 K/s 0.00 B/s 0.00 % 25.69 % mysqld 10031 be/4 mysql 1243.58 K/s 0.00 B/s 0.00 % 5.68 % mysqld 10023 be/4 mysql 1057.04 K/s 0.00 B/s 0.00 % 2.02 % mysqld 10020 be/4 mysql 1025.95 K/s 0.00 B/s 0.00 % 7.05 % mysqld 10001 be/4 mysql 808.33 K/s 683.97 K/s 0.00 % 1.16 % mysqld 10025 be/4 mysql 746.15 K/s 0.00 B/s 0.00 % 3.28 % mysqld 10043 be/4 mysql 715.06 K/s 0.00 B/s 0.00 % 0.48 % mysqld 10044 be/4 mysql 672.31 K/s 0.00 B/s 0.00 % 5.25 % mysqld 10034 be/4 mysql 668.42 K/s 1989.73 K/s 0.00 % 5.31 % mysqld 9985 be/4 mysql 450.80 K/s 124.36 K/s 0.00 % 8.83 % mysqld 9989 be/4 mysql 357.53 K/s 0.00 B/s 0.00 % 5.21 % mysqld 10033 be/4 mysql 186.54 K/s 0.00 B/s 0.00 % 1.59 % mysqld 10021 be/4 mysql 155.45 K/s 435.25 K/s 0.00 % 1.23 % mysqld 10007 be/4 mysql 124.36 K/s 0.00 B/s 0.00 % 0.53 % mysqld 9763 be/4 www-data 38.86 K/s 0.00 B/s 0.00 % 4.56 % apache2 -k start 10027 be/4 mysql 31.09 K/s 0.00 B/s 0.00 % 4.24 % mysqld 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd] 3 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0] 4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0] 5 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/u:0] 6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0] 7 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/1]

Read the article

DNS problems on CentOS fresh install

- by Rick Koshi

I'm having some DNS issues on a new box I'm installing with CentOS 6.2. I am able to look up names using nslookup, dig, or host. I am able to ping machines by name or by IP address. However, when I try other tools, such as ssh, wget, or yum, they are unable to resolve names. For example: # wget http://www.google.com --2012-03-08 14:48:06-- http://www.google.com/ Resolving www.google.com... failed: Name or service not known. wget: unable to resolve host address `www.google.com' # ssh www.google.com ssh: Could not resolve hostname www.google.com: Name or service not known # ping -c 1 www.google.com PING www.l.google.com (74.125.113.106) 56(84) bytes of data. 64 bytes from vw-in-f106.1e100.net (74.125.113.106): icmp_seq=1 ttl=46 time=43.6 ms --- www.l.google.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 59ms rtt min/avg/max/mdev = 43.665/43.665/43.665/0.000 ms # host www.google.com www.google.com is an alias for www.l.google.com. www.l.google.com has address 74.125.113.99 www.l.google.com has address 74.125.113.103 www.l.google.com has address 74.125.113.104 www.l.google.com has address 74.125.113.105 www.l.google.com has address 74.125.113.106 www.l.google.com has address 74.125.113.147 My /etc/nsswitch.conf file is the default, including this (standard) line: hosts: files dns /etc/resolv.conf is as set up by DHCP: ; generated by /sbin/dhclient-script nameserver 192.168.1.254 192.168.1.254 is a working DNS server (my DSL modem, working for years with other machines) Anyone know why ping would work, but ssh/wget would fail? Per NcA's suggestion, I tried changing /etc/resolv.conf to point to 8.8.8.8. Oddly enough, this does make it work. Obviously, my DSL modem is responding to DNS requests in some way that some parts of Linux's resolution system don't like. Looking at the tcpdump, I am unable to see what the difference is. Certainly, both servers are sending the same addresses. Here's the output from tcpdump -nn -X with the server set to the DNS server on the DSL modem. It's clearly replying with the correct addresses, but ssh/wget don't seem happy with it for some reason: 15:53:52.133580 IP 192.168.1.254.53 > 192.168.1.2.54836: 33157 7/0/0 CNAME www.l.google.com., A 74.125.115.105, A 74.125.115.106, A 74.125.115.147, A 74.125.115.99, A 74.125.115.103, A 74.125.115.104 (148) 0x0000: 4500 00b0 e33a 0000 ff11 53b1 c0a8 01fe E....:....S..... 0x0010: c0a8 0102 0035 d634 009c 7528 8185 8180 .....5.4..u(.... 0x0020: 0001 0007 0000 0000 0377 7777 0667 6f6f .........www.goo 0x0030: 676c 6503 636f 6d00 0001 0001 c00c 0005 gle.com......... 0x0040: 0001 0007 acd0 0008 0377 7777 016c c010 .........www.l.. 0x0050: c02c 0001 0001 0000 0001 0004 4a7d 7369 .,..........J}si 0x0060: c02c 0001 0001 0000 0001 0004 4a7d 736a .,..........J}sj 0x0070: c02c 0001 0001 0000 0001 0004 4a7d 7393 .,..........J}s. 0x0080: c02c 0001 0001 0000 0001 0004 4a7d 7363 .,..........J}sc 0x0090: c02c 0001 0001 0000 0001 0004 4a7d 7367 .,..........J}sg 0x00a0: c02c 0001 0001 0000 0001 0004 4a7d 7368 .,..........J}sh 15:53:52.135669 IP 192.168.1.254.53 > 192.168.1.2.54836: 65062- 0/0/0 (32) 0x0000: 4500 003c e33b 0000 ff11 5424 c0a8 01fe E..<.;....T$.... 0x0010: c0a8 0102 0035 d634 0028 98f9 fe26 8000 .....5.4.(...&.. 0x0020: 0001 0000 0000 0000 0377 7777 0667 6f6f .........www.goo 0x0030: 676c 6503 636f 6d00 001c 0001 gle.com..... I'm not enough of an expert to know if this is malformed in some way, but ping seems to do the right thing with it. For comparison, here's the same thing when querying 8.8.8.8: 15:57:27.990270 IP 8.8.8.8.53 > 192.168.1.2.49028: 59114 7/0/0 CNAME www.l.google.com., A 74.125.113.105, A 74.125.113.103, A 74.125.113.106, A 74.125.113.147, A 74.125.113.104, A 74.125.113.99 (148) 0x0000: 4500 00b0 5530 0000 2f11 6453 0808 0808 E...U0../.dS.... 0x0010: c0a8 0102 0035 bf84 009c 39f8 e6ea 8180 .....5....9..... 0x0020: 0001 0007 0000 0000 0377 7777 0667 6f6f .........www.goo 0x0030: 676c 6503 636f 6d00 0001 0001 c00c 0005 gle.com......... 0x0040: 0001 0001 516a 0008 0377 7777 016c c010 ....Qj...www.l.. 0x0050: c02c 0001 0001 0000 0116 0004 4a7d 7169 .,..........J}qi 0x0060: c02c 0001 0001 0000 0116 0004 4a7d 7167 .,..........J}qg 0x0070: c02c 0001 0001 0000 0116 0004 4a7d 716a .,..........J}qj 0x0080: c02c 0001 0001 0000 0116 0004 4a7d 7193 .,..........J}q. 0x0090: c02c 0001 0001 0000 0116 0004 4a7d 7168 .,..........J}qh 0x00a0: c02c 0001 0001 0000 0116 0004 4a7d 7163 .,..........J}qc 15:57:28.018909 IP 8.8.8.8.53 > 192.168.1.2.49028: 31984 1/1/0 CNAME www.l.google.com. (102) 0x0000: 4500 0082 7b1b 0000 2f11 3e96 0808 0808 E...{.../.>..... 0x0010: c0a8 0102 0035 bf84 006e c67e 7cf0 8180 .....5...n.~|... 0x0020: 0001 0001 0001 0000 0377 7777 0667 6f6f .........www.goo 0x0030: 676c 6503 636f 6d00 001c 0001 c00c 0005 gle.com......... 0x0040: 0001 0001 517f 0008 0377 7777 016c c010 ....Q....www.l.. 0x0050: c030 0006 0001 0000 0258 0026 036e 7334 .0.......X.&.ns4 0x0060: c010 0964 6e73 2d61 646d 696e c010 0016 ...dns-admin.... 0x0070: 91f3 0000 0384 0000 0384 0000 0708 0000 ................ 0x0080: 003c .< I still don't know why the server's reply is adequate for ping but not for ssh/wget. If anyone has ideas, I'd be happy to hear them. For now, though, I can either refer to an outside DNS server or set up my own server on the new box. It's a workaround that seems like it should be unnecessary, but will allow me to proceed.

Read the article

Getting started with webserver clustering.

- by Ernie

I work for a small ISP, and we host about 250 domains and all the stuff that goes along with that: DNS, mail, spam filtering, and backups. Currently, we have separate DNS servers (two of them) and mail servers (outgoing mail is actually on the secondary DNS server, but was previously on its own server). In the past, this was done as an insurance measure. The last thing we need is for some doofus (usually yours truly) to hose a server, taking out DNS and mail right along with it, or for spammers to jam our incoming SMTP server, preventing outgoing mail from being sent too. In the past, this was a problem, and our servers were set up the way they are now to combat it. However, clustering solutions like Sun's Cobalt RAQ (in days of olde) and Virtualmin appear to cater to an all-in-one approach, then deal with failures through redundant servers. I have avoided this thus far, but we've been using Virtualmin on our web server for a while now, and I'd like to expand into using it for a high availability cluster. Our networking partner has recently built a datacenter that has eliminated all of our other bugaboos like network, cooling, and power issues, so now the only thing left to go wrong is me hosing a server, which happened earlier this month. One of the bigger reasons we've avoided going this route is because our hardware requirements aren't particularly high. One server easily handles all the sites we host (most of them are flat sites). Also, load-balancing routers tend to be expensive and complicated. All that I'm really expecting to do is building a two-node cluster for redundancy so that when I hose a server (however rare that might be), we're not out for 8-12 hours while I rebuild it. What I need to know is how to get started, and if I'm really in a position to bother with this kind of thing at all.

Read the article

DNS failover in a two datacenter scenario

- by wanson

I'm trying to implement a low-cost solution for website high availability. I'm looking for the downsides of the following scenario: I have two servers with the same configuration, content, mysql replication (dual-master). They are in different datacenters - let's call them serverA and serverB. Users use serverA - serverB is more like a backup. Now, I want to use DNS failover, to switch users from serverA to serverB when serverA goes down. My idea is that I setup DNS servers (bind/powerdns) on serverA and serverB - let's call them ns1.website.com and ns2.website.com (assuming I own website.com). Then I configure my domain to use them as its nameservers. Both DNS servers will return serverA IP as my website's IP. If serverA goes down I can (either manually or automatically from serverB) change configuration of serverB's DNS, to return IP of serverB as website's IP. Of course the TTL will be low, as it's supposed to be in DNS failovers. I know that it may take some time to switch to serverB (DNS ttl, time to detect serverA failure, serverB DNS reconfiguration etc), and that some small part of users won't use serverB anyway. And I'm OK with that. But what are other downsides of such an approach? An alternative scenario is that ns1.website.com will return serverA IP as website's IP, and ns2.website.com will return serverB IP as website's IP. But AFAIK clients not always use primary nameserver and sometimes would use secondary one. So some small part of users would use serverB instead of serverA which is not quite what I'd like. Can you confirm that DNS clients behave like that and can you tell what percentage of clients would possibly use serverB instead of serverA (statistically)? This one also has the downside that when serverA goes back up, it will be automatically used as website's primary server, which is also a bad situation (cold cache, mysql replication could fail in the meantime etc). So I'm adding it only as a theoretical alternative. I was thinking about using some professional DNS failover companies but they charge for the number of DNS requests and the fees are very high (why?)

Read the article

"iostat" command different in two equal machines

- by Oz.

We have several machines on Amazon (ec2) of the type c1.xlarge with 8 cpus, running the Amazon AMI. Details on the machine: 7 GB of memory 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each) 1690 GB of instance storage 64-bit platform I/O Performance: High API name: c1.xlarge One out of the several machines is showing a high load average, since we have run the last yum upgrade a couple of weeks a go. We did not yet update the other machines, and everything looks normal on them. The strange thing is that the top command not showing any hint for the cause of the load. CPUs are - 4.8%us, 1.1%sy, 0.0%ni, 94.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st. Mem is about 1.5GB free. Any idea what could it be, or where else can we check? iostat command on the proper machine: avg-cpu: %user %nice %system %iowait %steal %idle 8.97 0.03 4.46 0.19 0.14 86.23 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn xvdap1 1.60 0.69 55.38 587620 47254184 xvdfp2 2.64 1.10 61.04 934786 52091056 xvdfp4 0.86 0.19 41.72 163866 35601920 xvdfp1 4.37 36.59 73.89 31220810 63051504 xvdfp3 8.03 7.08 94.63 6045402 80749184 iostat command on problematic machine: avg-cpu: %user %nice %system %iowait %steal %idle 9.29 0.04 5.55 0.26 0.11 84.74 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn xvdap1 2.13 3.34 68.85 246244 5077888 xvdfp1 7.60 74.31 104.88 5480362 7734840 xvdfp3 13.22 73.67 125.00 5433386 9218600 xvdfp4 1.11 0.76 65.08 55762 4799248 xvdfp2 4.16 3.31 99.17 243818 7313264 Many thanks for the help.

Read the article

Difference between all servers in one cluster and more than one cluster with servers?

- by silla

Not sure I understand what´s the difference or how it works when servers a running in one cluster or if there are more than one clusters with servers in it - regard High availability & Load Balancing. For me they are somehow the same, there is not really a big difference. Let´s make a simple example: 2 Servers in 1 Cluster 2 Clusters with each 1 Server - 1. If one Server failure, the other one is able to continue the work. The same for Load Balancing, these two Servers are able to balance the work together. - 2. The same thing! If one Server failure... The only thing that could be a problem with point 1. is if the Cluster fails (then both of the Server are dead). But is this even possible? I was reading stuff about clustering and high availability but I think I do not get this really. Probably I did not really understand how a cluster is working. Are these 2 points with 1 Cluster and 2 Clusters somehow the same or are there really some big differences? What should I know about it? Thank you

Read the article

Load Balance WCF and Share a Remote MSMQ for High Throughput

- by BarDev

After a ton of reading in books and on the web, I have noticed hints of information that WCF and MSMQ can be used in achieving high throughput. The information I have seen mentions using multiple WCF services in a farm that reads from a single MSMQ queue. The problem is that I have found paragraphs here and there that mentions that high throughput can be done, but I cannot seem to find a document of how to implement it. The following is an excerpt from a MSDN article. The following paragraph is from Best Practices for Queued Communication http://msdn.microsoft.com/en-us/library/ms731093.aspx To achieve higher throughput and availability, use a farm of WCF services that read from the queue. This requires that all of these services expose the same contract on the same endpoint. The farm approach works best for applications that have high production rates of messages because it enables a number of services to all read from the same queue. This is what I'm trying to solve. I have an intranet application where a client sends a request to a WCF service. But I want the ability to load balance the WCF services on multiple servers in a farm. I also want these WCF services in the farm to do transactional reads from a remote MSMQ when an item is available in the Queue. If this is possible, an issue I have is that I do not understand the activation process of WCF to retrieve messages from a remote queue. If this is possible, does anyone know of any articles or Webcasts that would explain it in detail? BarDev

Read the article

Why is execution-time method resolution faster than compile-time resolution?

- by Felix

At school, we about virtual functions in C++, and how they are resolved (or found, or matched, I don't know what the terminology is -- we're not studying in English) at execution time instead of compile time. The teacher also told us that compile-time resolution is much faster than execution-time (and it would make sense for it to be so). However, a quick experiment would suggest otherwise. I've built this small program: #include <iostream> #include <limits.h> using namespace std; class A { public: void f() { // do nothing } }; class B: public A { public: void f() { // do nothing } }; int main() { unsigned int i; A *a = new B; for (i=0; i < UINT_MAX; i++) a->f(); return 0; } Where I made A::f() once normal, once virtual. Here are my results: [felix@the-machine C]$ time ./normal real 0m25.834s user 0m25.742s sys 0m0.000s [felix@the-machine C]$ time ./virtual real 0m24.630s user 0m24.472s sys 0m0.003s [felix@the-machine C]$ time ./normal real 0m25.860s user 0m25.735s sys 0m0.007s [felix@the-machine C]$ time ./virtual real 0m24.514s user 0m24.475s sys 0m0.000s [felix@the-machine C]$ time ./normal real 0m26.022s user 0m25.795s sys 0m0.013s [felix@the-machine C]$ time ./virtual real 0m24.503s user 0m24.468s sys 0m0.000s There seems to be a steady ~1 second difference in favor of the virtual version. Why is this? Relevant or not: dual-core pentium @ 2.80Ghz, no extra applications running between two tests. Archlinux with gcc 4.5.0. Compiling normally, like: $ g++ test.cpp -o normal Also, -Wall doesn't spit out any warnings, either.

Read the article

Micro Second resolution timestamps on windows.

- by Nikhil

How to get micro second resolution timestamps on windows? I am loking for something better than QueryPerformanceCounter, QueryPerformanceFrequency (these can only give you an elapsed time since boot, and are not necessarily accurate if they are called on different threads - ie QueryPerformanceCounter may return different results on different CPUs. There are also some processors that adjust their frequency for power saving, which apparently isn't always reflected in their QueryPerformanceFrequency result.) There is this, http://msdn.microsoft.com/en-us/magazine/cc163996.aspx but it does not seem to be solid. This looks great but its not available for download any more. http://www.ibm.com/developerworks/library/i-seconds/ This is another resource. http://www.lochan.org/2005/keith-cl/useful/win32time.html But requires a number of steps, running a helper program plus some init stuff also, I am not sure if it works on multiple CPUs Also looked at the Wikipedia link on the subject which is interesting but not that useful. http://en.wikipedia.org/wiki/Time_Stamp_Counter If the answer is just do this with BSD or Linux, its a lot easier thats fine, but I would like to confirm this and get some explanation as to why this is so hard in windows and so easy in linux and bsd. Its the same damm hardware...

Read the article

Increasing time resolution of BOOST::progress timer

- by feelfree

BOOST::progress_timer is a very useful class to measure the running time of a function. However, the default implementation of progress_timer is not accurate enough and a possible way of increasing time resolution is to reconstruct a new class as the following codes show: #include <boost/progress.hpp> #include <boost/static_assert.hpp> template<int N=2> class new_progress_timer:public boost::timer { public: new_progress_timer(std::ostream &os=std::cout):m_os(os) { BOOST_STATIC_ASSERT(N>=0 &&N<=10); } ~new_progress_timer(void) { try { std::istream::fmtflags old_flags = m_os.setf(std::istream::fixed,std::istream::floatfield); std::streamsize old_prec = m_os.precision(N); m_os<<elapsed()<<"s\n" <<std::endl; m_os.flags(old_flags); m_os.precison(old_prec); } catch(...) { } } private: std::ostream &m_os; }; However, when I compile the codes with VC10, the following error appear: 'precison' : is not a member of 'std::basic_ostream<_Elem,_Traits>' Any ideas? Thanks.

Read the article

Android: Resolution issue while saving image into gallery

- by Luca D'Amico

I've managed to programmatically take a photo from the camera, then display it on an imageview, and then after pressing a button, saving it on the Gallery. It works, but the problem is that the saved photo are low resolution.. WHY?! I took the photo with this code: Intent cameraIntent = new Intent(android.provider.MediaStore.ACTION_IMAGE_CAPTURE); startActivityForResult(cameraIntent, CAMERA_PIC_REQUEST); Then I save the photo on a var using : protected void onActivityResult(int requestCode, int resultCode, Intent data) { if (requestCode == CAMERA_PIC_REQUEST) { thumbnail = (Bitmap) data.getExtras().get("data"); } } and then after displaying it on an imageview, I save it on the gallery with this function: public void SavePicToGallery(Bitmap picToSave, File savePath){ String JPEG_FILE_PREFIX= "PIC"; String JPEG_FILE_SUFFIX= ".JPG"; String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date()); String imageFileName = JPEG_FILE_PREFIX + timeStamp + "_"; File filePath = null; try { filePath = File.createTempFile(imageFileName, JPEG_FILE_SUFFIX, savePath); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } FileOutputStream out = null; try { out = new FileOutputStream(filePath); } catch (FileNotFoundException e) { // TODO Auto-generated catch block e.printStackTrace(); } picToSave.compress(CompressFormat.JPEG, 100, out); try { out.flush(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } try { out.close(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } //Add the pic to Android Gallery String mCurrentPhotoPath = filePath.getAbsolutePath(); MediaScannerConnection.scanFile(this, new String[] { mCurrentPhotoPath }, null, new MediaScannerConnection.OnScanCompletedListener() { public void onScanCompleted(String path, Uri uri) { } }); } I really can't figure out why it lose so much quality while saved.. Any help please ? Thanks..

Read the article

High apache load but little traffic logged

- by nrambeck

I recently installed Varnish to sit in front of Apache on a dedicated server running a single site. It appears to be working well, but the load on Apache is still very high. What doesn't make sense is that the Apache access log shows almost no traffic getting past Varnish. When I tail the apache log I see about 1-3 hits per second come through. Here is what the load on Apache looks like : USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND apache 13834 8.1 1.0 107716 34164 ? S 08:24 0:02 /usr/sbin/httpd apache 13835 8.1 1.0 107716 33856 ? S 08:24 0:02 /usr/sbin/httpd apache 11483 7.9 0.9 105916 30788 ? S 08:23 0:06 /usr/sbin/httpd apache 12255 7.5 1.0 107476 33312 ? S 08:24 0:04 /usr/sbin/httpd apache 9340 7.2 1.1 107732 34916 ? R 08:23 0:09 /usr/sbin/httpd apache 12029 6.8 0.9 106908 30416 ? S 08:24 0:04 /usr/sbin/httpd apache 11577 6.7 1.0 107192 34180 ? S 08:24 0:05 /usr/sbin/httpd apache 11486 6.6 1.0 106176 33112 ? S 08:23 0:05 /usr/sbin/httpd apache 11796 6.4 1.0 106936 31916 ? S 08:24 0:04 /usr/sbin/httpd apache 13815 6.3 1.0 107988 34464 ? S 08:24 0:02 /usr/sbin/httpd apache 18089 6.3 1.3 107444 43212 ? S 08:11 0:52 /usr/sbin/httpd apache 11797 5.9 1.0 107716 34580 ? S 08:24 0:04 /usr/sbin/httpd apache 7655 5.9 0.0 0 0 ? Z 08:22 0:09 [httpd] <defunct> mysql 8033 5.9 6.2 318240 199512 ? Sl May14 352:34 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/va apache 11488 5.8 0.9 106924 31632 ? S 08:23 0:04 /usr/sbin/httpd apache 9375 5.7 1.1 106956 35552 ? S 08:23 0:07 /usr/sbin/httpd apache 3551 5.6 1.1 106956 36140 ? S 08:20 0:14 /usr/sbin/httpd apache 7657 5.6 1.0 106968 32472 ? S 08:22 0:09 /usr/sbin/httpd apache 11433 5.6 1.0 107716 34396 ? S 08:23 0:04 /usr/sbin/httpd apache 5505 5.5 1.1 106944 34924 ? S 08:21 0:12 /usr/sbin/httpd apache 7172 5.3 1.1 106972 35368 ? S 08:22 0:09 /usr/sbin/httpd apache 10088 5.2 0.9 106160 31240 ? S 08:23 0:04 /usr/sbin/httpd apache 7656 5.1 1.0 106436 34388 ? S 08:22 0:08 /usr/sbin/httpd apache 3468 5.0 1.1 107716 35968 ? S 08:20 0:13 /usr/sbin/httpd apache 14242 4.8 1.0 107728 33032 ? S 08:25 0:00 /usr/sbin/httpd apache 3578 4.8 1.1 107988 35964 ? S 08:20 0:12 /usr/sbin/httpd apache 28192 4.8 1.2 106944 38060 ? S 08:17 0:23 /usr/sbin/httpd apache 3277 4.6 1.1 106956 35688 ? S 08:20 0:13 /usr/sbin/httpd apache 15434 3.7 0.7 106908 24684 ? S 08:25 0:00 /usr/sbin/httpd There is a default apache log and then one other VirtualHost log setup. I'm concerned that Apache is handling some kind of traffic that is not being logged. Is that possible? And is there anything I can do to capture that traffic?

Read the article

Determining cause of high NFS/IO utilization without iotop

- by Matt

I have a server that is doing an NFSv4 export for user's home directories. There are roughly 25 users (mostly developers/analysts) and about 40 servers mounting the home directory export. Performance is miserable, with users often seeing multi-second lags for simple commands (like ls, or writing a small text file). Sometimes the home directory mount completely hangs for minutes, with users getting "permission denied" errors. The hardware is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 2.5” 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. RAID controller is a Dell PERC H700 w/512MB cache (Linux sees this as a LSI MegaSAS 9260). OS is CentOS 5.6, home directory partition is ext3, with options “rw,data=journal,usrquota”. I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for the home directories. What I find curious, and suspicious, is that the sda device often has very high utilization, even though it only contains the OS. I would expect this virtual drive to be idle almost all the time. The system is not swapping, according to "free" and "vmstat". Why would there be major load on this device? Here is a 30-second snapshot from iostat: Time: 09:37:28 AM Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 44.09 0.03 107.76 0.13 607.40 11.27 0.89 8.27 7.27 78.35 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda2 0.00 44.09 0.03 107.76 0.13 607.40 11.27 0.89 8.27 7.27 78.35 sdb 0.00 2616.53 0.67 157.88 2.80 11098.83 140.04 8.57 54.08 4.21 66.68 sdb1 0.00 2616.53 0.67 157.88 2.80 11098.83 140.04 8.57 54.08 4.21 66.68 dm-0 0.00 0.00 0.03 151.82 0.13 607.26 8.00 1.25 8.23 5.16 78.35 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.00 0.67 2774.84 2.80 11099.37 8.00 474.30 170.89 0.24 66.84 dm-3 0.00 0.00 0.67 2774.84 2.80 11099.37 8.00 474.30 170.89 0.24 66.84 Looks like iotop is the ideal tool to use to sniff out these kinds of issues. But I'm on CentOS 5.6, which doesn't have a new enough kernel to support that program. I looked at Determining which process is causing heavy disk I/O?, and besides iotop, one of the suggestions said to do a "echo 1 /proc/sys/vm/block_dump". I did that (after directing kernel messages to tempfs). In about 13 minutes I had about 700k reads or writes, roughly half from kjournald and the other half from nfsd: # egrep " kernel: .*(READ|WRITE)" messages | wc -l 768439 # egrep " kernel: kjournald.*(READ|WRITE)" messages | wc -l 403615 # egrep " kernel: nfsd.*(READ|WRITE)" messages | wc -l 314028 For what it's worth, for the last hour, utilization has constantly been over 90% for the home directory drive. My 30-second iostat keeps showing output like this: Time: 09:36:30 PM Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 6.46 0.20 11.33 0.80 71.71 12.58 0.24 20.53 14.37 16.56 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda2 0.00 6.46 0.20 11.33 0.80 71.71 12.58 0.24 20.53 14.37 16.56 sdb 137.29 7.00 549.92 3.80 22817.19 43.19 82.57 3.02 5.45 1.74 96.32 sdb1 137.29 7.00 549.92 3.80 22817.19 43.19 82.57 3.02 5.45 1.74 96.32 dm-0 0.00 0.00 0.20 17.76 0.80 71.04 8.00 0.38 21.21 9.22 16.57 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.00 687.47 10.80 22817.19 43.19 65.48 4.62 6.61 1.43 99.81 dm-3 0.00 0.00 687.47 10.80 22817.19 43.19 65.48 4.62 6.61 1.43 99.82

Read the article

Diagnosing packet loss / high latency in Ubuntu

- by Sam Gammon

We have a Linux box (Ubuntu 12.04) running Nginx (1.5.2), which acts as a reverse proxy/load balancer to some Tornado and Apache hosts. The upstream servers are physically and logically close (same DC, sometimes same-rack) and show sub-millisecond latency between them: PING appserver (10.xx.xx.112) 56(84) bytes of data. 64 bytes from appserver (10.xx.xx.112): icmp_req=1 ttl=64 time=0.180 ms 64 bytes from appserver (10.xx.xx.112): icmp_req=2 ttl=64 time=0.165 ms 64 bytes from appserver (10.xx.xx.112): icmp_req=3 ttl=64 time=0.153 ms We receive a sustained load of about 500 requests per second, and are currently seeing regular packet loss / latency spikes from the Internet, even from basic pings: sam@AM-KEEN ~> ping -c 1000 loadbalancer PING 50.xx.xx.16 (50.xx.xx.16): 56 data bytes 64 bytes from loadbalancer: icmp_seq=0 ttl=56 time=11.624 ms 64 bytes from loadbalancer: icmp_seq=1 ttl=56 time=10.494 ms ... many packets later ... Request timeout for icmp_seq 2 64 bytes from loadbalancer: icmp_seq=2 ttl=56 time=1536.516 ms 64 bytes from loadbalancer: icmp_seq=3 ttl=56 time=536.907 ms 64 bytes from loadbalancer: icmp_seq=4 ttl=56 time=9.389 ms ... many packets later ... Request timeout for icmp_seq 919 64 bytes from loadbalancer: icmp_seq=918 ttl=56 time=2932.571 ms 64 bytes from loadbalancer: icmp_seq=919 ttl=56 time=1932.174 ms 64 bytes from loadbalancer: icmp_seq=920 ttl=56 time=932.018 ms 64 bytes from loadbalancer: icmp_seq=921 ttl=56 time=6.157 ms --- 50.xx.xx.16 ping statistics --- 1000 packets transmitted, 997 packets received, 0.3% packet loss round-trip min/avg/max/stddev = 5.119/52.712/2932.571/224.629 ms The pattern is always the same: things operate fine for a while (<20ms), then a ping drops completely, then three or four high-latency pings (1000ms), then it settles down again. Traffic comes in through a bonded public interface (we will call it bond0) configured as such: bond0 Link encap:Ethernet HWaddr 00:xx:xx:xx:xx:5d inet addr:50.xx.xx.16 Bcast:50.xx.xx.31 Mask:255.255.255.224 inet6 addr: <ipv6 address> Scope:Global inet6 addr: <ipv6 address> Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:527181270 errors:1 dropped:4 overruns:0 frame:1 TX packets:413335045 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:240016223540 (240.0 GB) TX bytes:104301759647 (104.3 GB) Requests are then submitted via HTTP to upstream servers on the private network (we can call it bond1), which is configured like so: bond1 Link encap:Ethernet HWaddr 00:xx:xx:xx:xx:5c inet addr:10.xx.xx.70 Bcast:10.xx.xx.127 Mask:255.255.255.192 inet6 addr: <ipv6 address> Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:430293342 errors:1 dropped:2 overruns:0 frame:1 TX packets:466983986 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:77714410892 (77.7 GB) TX bytes:227349392334 (227.3 GB) Output of uname -a: Linux <hostname> 3.5.0-42-generic #65~precise1-Ubuntu SMP Wed Oct 2 20:57:18 UTC 2013 x86_64 GNU/Linux We have customized sysctl.conf in an attempt to fix the problem, with no success. Output of /etc/sysctl.conf (with irrelevant configs omitted): # net: core net.core.netdev_max_backlog = 10000 # net: ipv4 stack net.ipv4.tcp_ecn = 2 net.ipv4.tcp_sack = 1 net.ipv4.tcp_fack = 1 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_tw_recycle = 0 net.ipv4.tcp_timestamps = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_no_metrics_save = 1 net.ipv4.tcp_max_syn_backlog = 10000 net.ipv4.tcp_congestion_control = cubic net.ipv4.ip_local_port_range = 8000 65535 net.ipv4.tcp_syncookies = 1 net.ipv4.tcp_synack_retries = 2 net.ipv4.tcp_thin_dupack = 1 net.ipv4.tcp_thin_linear_timeouts = 1 net.netfilter.nf_conntrack_max = 99999999 net.netfilter.nf_conntrack_tcp_timeout_established = 300 Output of dmesg -d, with non-ICMP UFW messages suppressed: [508315.349295 < 19.852453>] [UFW BLOCK] IN=bond1 OUT= MAC=<mac addresses> SRC=118.xx.xx.143 DST=50.xx.xx.16 LEN=68 TOS=0x00 PREC=0x00 TTL=51 ID=43221 PROTO=ICMP TYPE=3 CODE=1 [SRC=50.xx.xx.16 DST=118.xx.xx.143 LEN=40 TOS=0x00 PREC=0x00 TTL=249 ID=10220 DF PROTO=TCP SPT=80 DPT=53817 WINDOW=8190 RES=0x00 ACK FIN URGP=0 ] [517787.732242 < 0.443127>] Peer 190.xx.xx.131:59705/80 unexpectedly shrunk window 1155488866:1155489425 (repaired) How can I go about diagnosing the cause of this problem, on a Debian-family Linux box?

Search Results

Search found 10429 results on 418 pages for 'high resolution'.

Page 45/418 | < Previous Page | 41 42 43 44 45 46 47 48 49 50 51 52 | Next Page >

- by RadarNyan

- by Ibrahim Azhar Armar

- by Tim Alexander

- by Mark C

- by Nanne

- by Sagar Patel

- by justin.chmura

- by Holapress

- by Daniel

- by Dorjan

- by hadi

- by Jake Schoermer

- by Rick Koshi

- by Ernie

- by wanson

- by Oz.

- by silla

- by BarDev

- by Felix

- by Nikhil

- by feelfree

- by Luca D'Amico

- by nrambeck

- by Matt

- by Sam Gammon

< Previous Page | 41 42 43 44 45 46 47 48 49 50 51 52 | Next Page >