Search Results

Search found 334 results on 14 pages for 'nagios'.

Page 5/14 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Nagios Availability Report plotting in Graphite

    - by Roman Zenka
    I would like to plot my Nagios availability reports over time. Does anyone know a plugin that would do so? I found 'graphios' at https://github.com/shawn-sterling/graphios that would plot extra data provided by plugins. What I need instead is a plugin that would plot information such as: 'the service was in ERROR state 0.5% of the time last week, while the week before it was 1%'. This would be useful for knowing whether the overall stability is getting better.

    Read the article

  • nagios grouping notifications

    - by rizen
    Is it possible to configure nagios to group notifications into a single e-mail? Sometimes when something goes down my inbox gets spammed with all the notifications. It would just be nice if these could somehow be lumped together. Does anyone know if this is possible?

    Read the article

  • Syntax error at '{'; expected '}' when using nagios in puppet

    - by jiangchengwu
    It's a big problem to me, because I'm not familiar with puppet. ERROR on the puppetmaster: debug: importing '/etc/puppet/manifests/nodes/group-1.pp' err: Could not parse for environment production: Syntax error at '{'; expected '}' at /etc/puppet/manifests/nodes/group-1.pp:6 ERROR on the puppet client: err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not parse for environment production: Syntax error at '{'; expected '}' at /etc/puppet/manifests/nodes/group-1.pp:6 in group-1.pp: node 'group1' { include ntp class { 'nagios::host': #this is line 6 nodename => $clientcert, appname => 'test', } } nagios::host in module module/nagios/host.pp code are here: class nagios::host($nodename, $hostgroup) { file { '/usr/lib/nagios/plugins': mode = "755", require = Package["nagios-plugins"], } ... @@nagios_service { "${nodename}_check_ssh": ensure => present, use => 'generic-service', host_name => "${nodename}", notification_interval => 60, flap_detection_enabled => 0, service_description => "SSH", check_command => "check_ssh", target => "/etc/nagios3/services.d/${nodename}.cfg", } } and the file module/nagios/init.pp is blank How could I fix it ?

    Read the article

  • Nagios Terminal Services check?

    - by jldugger
    Most of our servers are licensed for 2 concurrent remote desktop sessions. This is fine, so long as everyone does their administrative task and logs off, but some people accidentally close sessions (disconnect but remain logged in) instead. I know that you can force someone off with the right Admin tools, but it's a bit ugly and may hurt productivity or maybe even the server(?). I was thinking that a nightly Nagios check of remote sessions available nagging people would help enforce build discipline on the subject. Can anyone recommend a service check that can monitor terminal service availability?

    Read the article

  • Cause of flapping UNKNOWN Nagios status?

    - by jldugger
    We run some Nagios service checks via OpsView, and one of our hosts is getting a strange response for SSH: "UNKNOWN: Service results are stale" It happens regularly, but seems to go away as the system retries a 2nd and 3rd time. It started after a patch and reboot of the server in question last week. The system itself responds to SSH from boxes I've tested with (which doesn't include the monitoring system I am not given access to). /var/log/secure is full of lines ala: sshd[15628]: Did not receive identification string from xxx.xxx.226.20 Time stamps are reliably every five minutes, which is pretty obviously the monitoring script disconnecting once it gets a login prompt. Anyone know what might be causing this, or how to fix it? It's really frustrating to see this pop on and off the status page.

    Read the article

  • nagios wrongly reports packet loss

    - by Alien Life Form
    Lately, on my nagios 3.2.3 install (CentOS5, monitoring ~ 300 hosts, 1150 services) has sdtarted to occasionally report high packet loss on 50-60 hosts at a time. Problem is it's bogus. Manual runs of ping (or its own check_ping binary) finds no fault with any of the affected hosts. The only possible cures I found so far are: run all the checks manually (they will succeed but it may act up again on next check) acknowledge and wait for the problem to go away (may take several ours) I suspect (but have no particular reason other than single rescheduled checks succeeding) that the problem may lay with all the checks being mass scheduled together - in which case introducing some jitter in the scheduling (how?) might help. Or it may be something completely different. Ideas, anyone?

    Read the article

  • Reducing the time between checks on a Nagios server

    - by Farooq Hussain
    Can anyone let me know how I would reduce time between Last Check Time and Next Scheduled Check on a particular service. I have a very critical task to monitor and the time between checks is currently 5 minutes, which is too long for this service. Can I reduce that time? I need this to be 1 minute or even 30 seconds. I want Nagios to check this service every 30 seconds. I currently have defined the service as follows: define service{ use local-service host_name OpenSIP,test-RTSIP service_description SIP Registration check_command check_nrpe!check_sipreg check_freshness 0 freshness_threshold 900 active_checks_enabled 1 passive_checks_enabled 1 }

    Read the article

  • How to configure nagios realtime SMS alert

    - by Jerry
    I have configured nagios SMS alert and it takes around one minute to send notification. I want to get SMS notification withing one/two second(s) after system/service failure. I could not find any way to send sms alert in a second. Can anybody help me??? Update Wednesday, 29 August 9:26:43 a.m GMT define host{ use generic-host ; Name of host template to use host_name localhost alias localhost address x.x.x.187 check_command check-host-alive normal_check_interval 1 max_check_attempts 1 retry_interval 1 notification_interval 120 notification_period 24x7 notification_options d,r contact_groups admins }

    Read the article

  • critical swap nagios

    - by Toby Joiner
    I installed nagios a very long time ago, and have started trying to use it now. I am getting this error: Current Status: CRITICAL (for 231d 16h 52m 49s) Status Information: SWAP CRITICAL - 100% free (0 MB out of 0 MB) Performance Data: swap=0MB;0;0;0;0 Current Attempt: 4/4 (HARD state) Last Check Time: 01-09-2011 13:26:34 Check Type: ACTIVE Check Latency / Duration: 0.125 / 0.004 seconds Next Scheduled Check: 01-09-2011 13:31:34 Last State Change: 05-22-2010 21:36:47 Last Notification: 01-09-2011 13:01:42 (notification 5521) Is This Service Flapping? NO (0.00% state change) In Scheduled Downtime? NO Last Update: 01-09-2011 13:29:32 ( 0d 0h 0m 4s ago) Is this normal? Should I be concerned? If more info is needed please let me know.

    Read the article

  • How can I monitor VNC via Nagios?

    - by atroon
    I have a number of remote sites which have VNC running on a few computers for support purposes. They are (obviously) only available on our internal network. I am using Nagios to keep track of all the systems in the network and I want to have it check to make sure the VNC server is running on the appropriate hosts. There is a 'check_vnc' plugin available here but it relies on VNC Snapshot which I don't want to use. Certainly I could use it, but it adds more complexity and dependency, which I want to avoid. It seems simpler to just use check_tcp to make sure I get the proper response to a connection request for VNC, e.g. port 5900, send a connect string, get back framebuffer info. My real question, I suppose, is this: What is the 'proper' generic connect string for VNC (I use both UltraVNC and RealVNC) and what is the expected response? If it's really easier to use the VNC Snapshot and check_vnc, let me know. I just can't imagine that a string of text isn't easier, faster, and less bandwidth intensive to monitor.

    Read the article

  • Monitor total service uptime using Icinga

    - by dhh
    I am using Icinga (Nagios fork) for monitoring ~10 webservers, each one providing different services. I would now like to provide an aggregated view on the server states on our companies intranet, providing information like: server | state | last downtime | Ø uptime (month) | Ø uptime (year) Srv1 | OK | 2013-10-09 | 99,5% | 99,8 % Srv2 | ERROR | 2013-10-31 | 73,1% | 85,4 % Is there a possibility to get those values from icinga?

    Read the article

  • Having C# application communicate with Nagios

    - by user200295
    We are using Nagios to monitor our network with great results. There is now a new requirement we are struggling with: We want to notify Nagios of an non fatal but critical application errors. The application does not stop running but there is some sort of issue that needs looking into. Once the issue has been looked into, we need some way to "unflag" the issue in Nagios. We tried using the syslog, but the biggest problem was once an error was logged, the service was put into an error state with no way to recover. Also, while applications would report a critical error to the syslog, most of the time they don't report an "All clear" error.

    Read the article

  • pnp4nagios does not generate perfdata

    - by gonvaled
    I am running nagios2, pnp4nagios-0.6.16 and php 5.2.4-2ubuntu5.19. In my setup, pnp4nagios is correctly generating perfdata, which can be seen via the web interface in graphical form for lots of services. The perfdata directory contains entries of the kind: /usr/local/pnp4nagios/var/perfdata/zeus/Disk_Space_Home.rrd /usr/local/pnp4nagios/var/perfdata/zeus/Disk_Space_Home.xml I have activated performance data for a new nagios service: define serviceextinfo { host_name zeus service_description 450average action_url /pnp4nagios/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$ } This service is generating monitoring data in the format: status_info|perf_data as required for performance gathering. But somehow the performance data related to this service is not being collected by pnp4nagios (no related entries in /usr/local/pnp4nagios/var/perfdata) Are there any pnp4nagios scripts or settings which I could use to debug this?

    Read the article

  • Configure Nagios To Alert Depending On Host Group That Service Alert Originates From

    - by StrangeWill
    So my setup: Services are shared between all hosts (CPU/RAM/Disk/Services). Hosts are split into two main groups: "Production" and "Development". We have two contact groups: "Production" and "Development". Lets say my development SQL server runs low on RAM, I want it to only alert those in "Development" contact group (this service is of course assigned to a host in the "Development" host group, using the shared RAM monitoring service). I'm pretty much stumped on this... I can't configure it at the service level (they're shared there), and I can't seem to get escalations to do it for me either... Do I need to use service groups along with escalations and bite the bullet on building that list? Or am I missing something stupidly simple? I'm using Centreon for configuration if that helps.

    Read the article

  • nagios: trouble using check_smtps command

    - by ethrbunny
    I'm trying to use this command to check on port 587 for my postfix server. Using nmap -P0 mail.server.com I see this: Starting Nmap 5.51 ( http://nmap.org ) at 2013-11-04 05:01 PST Nmap scan report for mail.server.com (xx.xx.xx.xx) Host is up (0.0016s latency). rDNS record for xx.xx.xx.xx: another.server.com Not shown: 990 closed ports PORT STATE SERVICE 22/tcp open ssh 25/tcp open smtp 110/tcp open pop3 111/tcp open rpcbind 143/tcp open imap 465/tcp open smtps 587/tcp open submission 993/tcp open imaps 995/tcp open pop3s 5666/tcp open nrpe So I know the relevant ports for smtps (465 or 587) are open. When I use openssl s_client -connect mail.server.com:587 -starttls smtp I get a connection with all the various SSL info. (Same for port 465). But when I try libexec/check_ssmtp -H mail.server.com -p587 I get: CRITICAL - Cannot make SSL connection. 140200102082408:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:s23_clnt.c:699: What am I doing wrong?

    Read the article

  • shinken/nagios discriminative between warning alert and critical alert

    - by SWdream
    i using shinken for my monitoring system. Now, i have a problem when i configure shinken notification. My purpose is to discriminative between notification for warning state and critical state of check service: with warning state: + time to send alert from 8h = 18 h everyday, via email and sms + notification_interval is 60 minutes (Re-notify about service problems every hour) with critical state: + time to send alert : all time (24 x 7), via email and sms + notification_interval is 30 minutes Please show me how to solve my problem! I have tried the following: i configured: + contact templates: define contact{ name warning-contact ; The name of this contact template register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE! host_notifications_enabled 1 define contact{ service_notifications_enabled 1 email shinken@localhost can_submit_commands 1 notificationways email_warning, sms_warning } define contact{ name critical-contact ; The name of this contact template register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE! host_notifications_enabled 1 service_notifications_enabled 1 email shinken@localhost can_submit_commands 1 notificationways email_critical, sms_critical } + time poriod templates: define timeperiod{ timeperiod_name warning alias Normal Work Hours monday 08:00-18:00 tuesday 08:00-18:00 wednesday 08:00-18:00 thursday 08:00-18:00 friday 08:00-18:00 saturday 08:00-18:00 sunday 08:00-18:00 #exclude 24x7 } define timeperiod{ timeperiod_name 24x7 alias 24_Hours_A_Day,_7_Days_A_Week sunday 00:00-24:00 monday 00:00-24:00 tuesday 00:00-24:00 wednesday 00:00-24:00 thursday 00:00-24:00 friday 00:00-24:00 saturday 00:00-24:00 #exclude workhours } + notification way templates: define notificationway{ notificationway_name email_warning service_notification_period warning host_notification_period warning service_notification_options w host_notification_options d,u,r,f,s notification_interval 60 ; Resend notifications every 30 minutes service_notification_commands notify-service-by-email ; send service notifications via email host_notification_commands notify-host-by-email ; send host notifications via email } define notificationway{ notificationway_name email_critical service_notification_period 24x7 host_notification_period 24x7 service_notification_options c,r host_notification_options d,u,r,f,s notification_interval 30 ; Resend notifications every 30 minutes service_notification_commands notify-service-by-email ; send service notifications via email host_notification_commands notify-host-by-email ; send host notifications via email } define notificationway{ notificationway_name sms_warning service_notification_period warning host_notification_period warning service_notification_options w host_notification_options d,u,r,f,s notification_interval 60 ; Resend notifications every 30 minutes service_notification_commands notify-service-by-sms ; send service notifications via sms host_notification_commands notify-host-by-sms ; send host notifications via sms } define notificationway{ notificationway_name sms_critical service_notification_period 24x7 host_notification_period 24x7 service_notification_options c,r host_notification_options d,u,r,f,s notification_interval 30 ; Resend notifications every 30 minutes service_notification_commands notify-service-by-sms ; send service notifications via sms host_notification_commands notify-host-by-sms ; send host notifications via sms } + my contacts define contact{ use warning-contact contact_name thanhwarn email xxxx pager xxxx ; contact phone number } define contact{ use critical-contact contact_name thanhcritical email xxxxx pager 01689xxxx ; contact phone number } + and define service: define service{ use generic-service service_description check_ram host_name graphite contacts thanhcritical, thanhwarn check_command check_nrpe!check_ram } but my shinken system don't send alert. i don't understand this. please show me where I went wrong! thanks all!

    Read the article

  • .bat file - Nagios v3.2 service check and start if stopped

    - by LbakerIT
    I'm just barely getting into programming so I do apologize for my ignorance. I'm trying to create a .bat file that will check if a service is running on XP Pro. If service is running it will exit 0. If the service is stopped start service wait 10 seconds (via ping i'm guessing) check if service is running if service is running exit 0 if service is stopped start service wait 10 seconds Do this check a total of 3 times. if service does not come up within that time: exit 2 Exit 0 = ok exit 1 = warning exit 3 = critical (and this will alert) I need to do this for 3 different services but i'm expecting that it would be better to create one per service. That way you get notified on the specific service that is not coming back up. The goal is that if the service stops it will start it. If after 30 seconds it is unable to start the service then it will send an alert. The reason I'm trying to do it with a .bat is this is consistent with all other scripts and I did not want to complicate it further by adding different kinds of code. Yay for consistency! Again I do apologize for my ignorance I've been thrown into this project last minute. Thank you for the help and reading my question!

    Read the article

  • Nagios - Custom Report Generation

    - by Naveen
    Hi, I want to generate a custom report in "Nagios-3.2.0". I have defined the work-hours in "timeperiods.cfg" as follows: 'workhours' timeperiod definition define timeperiod { timeperiod_name 0800-2000 alias full time monday 08:00-20:00 tuesday 08:00-20:00 wednesday 08:00-20:00 thursday 08:00-20:00 friday 08:00-20:00 saturday 08:00-20:00 } Now, if I replace "saturday" with "2010-03-27" or "march 27" as shown below: 'workhours' timeperiod definition define timeperiod { timeperiod_name 0800-2000 alias full time monday 08:00-20:00 tuesday 08:00-20:00 wednesday 08:00-20:00 thursday 08:00-20:00 friday 08:00-20:00 2010-03-27 08:00-20:00 } Nagios is not generating report for the given date (2010-03-27). How can I modify "timeperiods.cfg" so that I can generate reports for the given dates ?

    Read the article

  • How do you keep up with Nagios/Capistrano configs when using EC2?

    - by imaginative
    I use Amazon EC2 for my mobile app. Depending on load of the application at a given time, I might spawn new instances and then take them down when load is lower to save costs. How does one keep up with Nagios configurations for such a dynamic environment? When one deals with managed hardware, configuration files are predictable. In this case Nagios, Capistrano and a bunch of other configuration files would need to be added. Capistrano needs to know where to deploy a new build to for an app server. Nagios needs to know to remove an existing instance or add a new instance for monitoring. Nagios also needs to know if a node was intentionally taken down or if the host is down due to error. How is this done with the wonderful world of VPS/dynamic instances?

    Read the article

  • Dependencies issue while installing Graphviz 2.28

    - by M. Saâd
    I want to install this packages for Nagvis : graphviz-2.28.0-1.el6.i686.rpm graphviz-doc-2.28.0-1.el6.i686.rpm graphviz-gd-2.28.0-1.el6.i686.rpm graphviz-graphs-2.28.0-1.el6.i686.rpm graphviz-perl-2.28.0-1.el6.i686.rpm But while installing, i have this error : # rpm -ivh graphviz-2.28.0-1.el6.i686.rpm erreur: Dépendances requises: libgdkglext-x11-1.0.so.0 est nécessaire pour graphviz-2.28.0-1.el6.i686 libglut.so.3 est nécessaire pour graphviz-2.28.0-1.el6.i686 libgtkglext-x11-1.0.so.0 est nécessaire pour graphviz-2.28.0-1.el6.i686 libgts-0.7.so.5 est nécessaire pour graphviz-2.28.0-1.el6.i686

    Read the article

  • View Centreon graph data

    - by Rich
    Is there a way to view the data that Centreon uses to build graphs, from within the Centreon web interface? We have some gaps in some of our graphs, and I would like to see if it is a problem with the data being returned from the NRPE plugins. I have seen the MonitoringEvent Logs section, but I can't get that to show the returned string and status for each call to a particular plugin, which is what I'd like. Is there a hidden function to do this? Thanks in advance

    Read the article

  • How to parse nagios status.dat file?

    - by daniels
    I'd like to parse status.dat file for nagios3 and output as xml with a python script. The xml part is the easy one but how do I go about parsing the file? Use multi line regex? It's possible the file will be large as many hosts and services are monitored, will loading the whole file in memory be wise? I only need to extract services that have critical state and host they belong to. Any help and pointing in the right direction will be highly appreciated. LE Here's how the file looks: ######################################## # NAGIOS STATUS FILE # # THIS FILE IS AUTOMATICALLY GENERATED # BY NAGIOS. DO NOT MODIFY THIS FILE! ######################################## info { created=1233491098 version=2.11 } program { modified_host_attributes=0 modified_service_attributes=0 nagios_pid=15015 daemon_mode=1 program_start=1233490393 last_command_check=0 last_log_rotation=0 enable_notifications=1 active_service_checks_enabled=1 passive_service_checks_enabled=1 active_host_checks_enabled=1 passive_host_checks_enabled=1 enable_event_handlers=1 obsess_over_services=0 obsess_over_hosts=0 check_service_freshness=1 check_host_freshness=0 enable_flap_detection=0 enable_failure_prediction=1 process_performance_data=0 global_host_event_handler= global_service_event_handler= total_external_command_buffer_slots=4096 used_external_command_buffer_slots=0 high_external_command_buffer_slots=0 total_check_result_buffer_slots=4096 used_check_result_buffer_slots=0 high_check_result_buffer_slots=2 } host { host_name=localhost modified_attributes=0 check_command=check-host-alive event_handler= has_been_checked=1 should_be_scheduled=0 check_execution_time=0.019 check_latency=0.000 check_type=0 current_state=0 last_hard_state=0 plugin_output=PING OK - Packet loss = 0%, RTA = 3.57 ms performance_data= last_check=1233490883 next_check=0 current_attempt=1 max_attempts=10 state_type=1 last_state_change=1233489475 last_hard_state_change=1233489475 last_time_up=1233490883 last_time_down=0 last_time_unreachable=0 last_notification=0 next_notification=0 no_more_notifications=0 current_notification_number=0 notifications_enabled=1 problem_has_been_acknowledged=0 acknowledgement_type=0 active_checks_enabled=1 passive_checks_enabled=1 event_handler_enabled=1 flap_detection_enabled=1 failure_prediction_enabled=1 process_performance_data=1 obsess_over_host=1 last_update=1233491098 is_flapping=0 percent_state_change=0.00 scheduled_downtime_depth=0 } service { host_name=gateway service_description=PING modified_attributes=0 check_command=check_ping!100.0,20%!500.0,60% event_handler= has_been_checked=1 should_be_scheduled=1 check_execution_time=4.017 check_latency=0.210 check_type=0 current_state=0 last_hard_state=0 current_attempt=1 max_attempts=4 state_type=1 last_state_change=1233489432 last_hard_state_change=1233489432 last_time_ok=1233491078 last_time_warning=0 last_time_unknown=0 last_time_critical=0 plugin_output=PING OK - Packet loss = 0%, RTA = 2.98 ms performance_data= last_check=1233491078 next_check=1233491378 current_notification_number=0 last_notification=0 next_notification=0 no_more_notifications=0 notifications_enabled=1 active_checks_enabled=1 passive_checks_enabled=1 event_handler_enabled=1 problem_has_been_acknowledged=0 acknowledgement_type=0 flap_detection_enabled=1 failure_prediction_enabled=1 process_performance_data=1 obsess_over_service=1 last_update=1233491098 is_flapping=0 percent_state_change=0.00 scheduled_downtime_depth=0 } It can have any number of hosts and a host can have any number of services.

    Read the article

  • Is it possible to ack nagios alerts from the terminal on a remote workstation?

    - by cat pants
    I have nagios alerts set up to come through jabber with an http link to ack. Is is possible there is a script I can run from a terminal on a remote workstation that takes the hostname as a parameter and acks the alert? ./ack hostname The benefit, while seemingly mundane, is threefold. First, take http load off nagios. Secondly, nagios http pages can take up to 10-20 seconds to load, so I want to save time there. Thirdly, avoiding slower use of mouse + web interface + firefox/other annoyingly slow browser. Ideally, I would like a script bound to a keyboard shortcut that simply acks the most recent alert. Finally, I want to take the inputs from a joystick, buttons and whatnot, and connect one to a big red button bound to the script so I can just ack the most recent nagios alert by hitting the button lol. (It would be rad too if the button had a screen on the enclosure that showed the text of the alert getting acked lol) Make fun of me all you want, but this is actually something that would be useful to me. If I can save five seconds per alert, and I get 200 alerts per day I need to ack, that's saving me 15 minutes a day. And isn't the whole point of the sysadmin to automate what can be automated? Thanks!

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >