Why does Process Explorer cause highly targeted failure of some applications / basic UI functions in a high-power EC2 Windows instance?
Posted
by
Dan Nissenbaum
on Server Fault
See other posts from Server Fault
or by Dan Nissenbaum
Published on 2013-11-09T00:28:33Z
Indexed on
2013/11/09
4:00 UTC
Read the original article
Hit count: 456
Update:
I have determined that Process Explorer itself - the program I am using to debug a performance issue - seems to be the cause of the issue.
See note, with updated question, at end.
I am running a high-power (cc2.8xlarge) Amazon AWS EC2 Windows instance off of a boot EBS volume, provisioned at 2500 PIOPS, which was created from a snapshot of a previous boot volume.
My purpose with the instance is to use it as a development workstation with many developer tools installed, such as Visual Studio, a local XAMPP stack, etc. I have upwards of 40 programs installed on the machine.
The usability of the instance as a development machine often works quite well. The RDP lag is adequately small. I have used it for hours on end without problems for some of my most intense development tasks.
As a result, I have just purchased a reserved instance, and I opted to rebuild my development machine starting from scratch with a Windows Server 2012 AMI.
After having installed all of my desired/required applications for development over this past week, again the machine seems to often work well and I have worked for up to an hour at a time without problems doing heavy development work.
However, I continue to run into catastrophic OS usability issues that may prevent me from being able to rely on this machine as a development machine. I would like to track down the source of the problem, if there is an easily identifiable source. (Update: I have tracked down the source to be Process Explorer, the very program I was using to debug the problem. See update at end.)
The issues are as follows. (These are some primary examples)
Some applications, after a period of adequate responsiveness, suddenly begin to respond very, very slowly to basic user interface actions such as clicking on menus and pressing Ctrl-Tab to switch between open documents. Two examples are UltraEdit and PhpEd. It typically takes ~2 seconds for a menu to appear, and ~4 seconds to switch between open documents. Additionally, insertion point motion in the editor is lagged by upwards of ~2 seconds.
Process Explorer, which I am using to help debug the problem, seems to run acceptably for a couple of minutes, but on multiple occasions Process Explorer itself hangs completely. It hangs at the same time as the problems noted above. When it hangs, it is 100% unresponsive. Clicking on its taskbar icon neither causes it to come to the top or go behind, and its viewable area is filled with nothing but a region partially containing pure white and partially containing incomplete windows widgets that are unreadable, and that never change. Waiting 10 minutes does not clear the problem. Attempting to force-quit Process Explorer by right-clicking on its taskbar icon and choosing "Close Window" takes about 5 full minutes to exit (Process Explorer itself can't be used to exit Process Explorer, and it is registered as a Task Manager substitute).
Other programs work just fine during this time. For example, Chrome tabs flip very quickly back and forth, menus pop open instantly, web pages load quickly, and typing in forms/web applications inside the browser works promptly. Another example of an application that works crisply is Filemaker - its menus open instantly, and switching views in this application occurs promptly. Other applications also work without issue. Also, switching between applications occurs promptly as well.
It is only a handful of applications that exhibit the problem, with some primary examples given above.
At first I thought that EBS IOPS might be a problem. Therefore, I ran Performance Monitor, and watched the "Disk Transfers/sec" monitor in real time. At no point did this measure come anywhere close to hitting the 2500 PIOPS provisioned for the EBS volume.
The RAM was also well under the limit (~10 GB used out of 60 GB).
I did notice that one CPU core (out of 32 logical cores) was fully thrashing at 100% (i.e., ~3.1%) during the problematic periods. This seems to indicate that a single CPU core is handling the menus / flipping between open documents (for some applications only) / managing the Process Explorer user interface, and that this single core was hosed for some reason during the problematic periods.
Also note that I have a desktop workstation (Windows 7) that I also use as a development machine, via a remote connection, with a nearly identical set of programs installed, and this desktop workstation does not exhibit any of the problems I've discussed above. I have been using it heavily for well over a year now.
Any suggestions regarding either the source of the problem, or steps I might take to investigate the source of the problem, would be appreciated. Thanks.
Note: After extensive testing & investigation, I have noticed that when I quit Process Explorer, the problem vanishes and the system performance returns to normal, and then reappears quickly when I run Process Explorer again (note: again, the performance problems only appear for a subset of applications - other applications work perfectly fine during the same period).
My question is therefore (thankfully) more specific: Why does Process Explorer cause highly targeted failure of some applications (including itself) and basic UI functions, in a high-power EC2 Windows instance?
© Server Fault or respective owner