ASP.Net Web Farm Monitoring
Posted
by cisellis
on Stack Overflow
See other posts from Stack Overflow
or by cisellis
Published on 2010-05-03T20:39:15Z
Indexed on
2010/05/03
23:18 UTC
Read the original article
Hit count: 575
I am looking for suggestions on doing some simple monitoring of an ASP.Net web farm as close to real-time as possible. The objectives of this question are to:
- Identify the best way to monitor several Windows Server production boxes during short (minutes long) period of ridiculous load
- Receive near-real-time feedback on a few key metrics about each box. These are simple metrics available via WMI such as CPU, Memory and Disk Paging. I am defining my time constraints as soon as possible with 120 seconds delayed being the absolute upper limit.
- Monitor whether any given box is up (with "up" being defined as responding web requests in a reasonable amount of time)
Here are more details, things I've tried, etc.
- I am not interested in logging. We have logging solutions in place.
- I have looked at solutions such as ELMAH which don't provide much in the way of hardware monitoring and are not visible across an entire web farm.
- ASP.Net Health Monitoring is too broad, focuses too much on logging and is not acceptable for deep analysis.
- We are on Amazon Web Services and we have looked into CloudWatch. It looks great but messages in the forum indicate that the metrics are often a few minutes behind, with one thread citing 2 minutes as the absolute soonest you could expect to receive the feedback. This would be good to have for later analysis but does not help us real-time
- Stuff like JetBrains profiler is good for testing but again, not helpful during real-time monitoring.
- The closest out-of-box solution I've seen is Nagios which is free and appears to measure key indicators on any kind of box, including Windows. However, it appears to require a Linux box to run itself on and a good deal of manual configuration. I'd prefer to not spend my time mining config files and then be up a creek when it fails in production since Linux is not my main (or even secondary) environment.
Are there any out-of-box solutions that I am missing? Obviously a windows-based solution that is easy to setup is ideal. I don't require many bells and whistles.
In the absence of an out-of-box solution, it seems easy for me to write something simple to handle what I need. I've been thinking a simple client-server setup where the server requests a few WMI metrics from each client over http and sticks them in a database. We could then monitor the metrics via a query or a dashboard or something. If the client doesn't respond, it's effectively down.
Any problems with this, best practices, or other ideas?
Thanks for any help/feedback.
© Stack Overflow or respective owner