Search Results

Search found 334 results on 14 pages for 'nagios'.

Page 9/14 | < Previous Page | 5 6 7 8 9 10 11 12 13 14 | Next Page >

Best tool for monitoring backups, etc. and trending statstics from that data

- by Randy Syring

I have done some research on nagios, opennms, and zenoss but am not confident that I have found what I am looking for. The main driving force for me right now is being able to monitor backups. This includes mysql, mssql, and eventually some file system backups. We have a tool that wraps the backup process for these different systems and collects statistics. So, items like: number of databases backed up size of db backup file size of db backup file compressed time to make backup time to zip file I want to be able to A) have notifications if the jobs are not run according to schedule B) be able to set thresholds on the statistics which would trigger notifications C) I want to be able to trend and graph the statistics I am planning on sending this information to the monitoring application through an HTTP POST. Or, the monitoring application could pull it from a log file as well. However, we will have other processes with other "arbitrary" (from the monitoring system's perspective) statics that will want to monitor and trend, so flexibility is very important. The tool or tools should also be able to do general monitoring and trending of network interfaces, server load, etc. Once we get the backup monitoring in place, we will want to include those items as well. Thanks.

Read the article
Using udev to create a character device based on a driver being loaded

- by SteveCB

I'm in the process of setting up RAID monitoring for a number of Dell servers that use the PERC 6i integrated card. We're using Nagios at present and the check_megasasctl plugin seems to fit the bill. However, the plugin relies upon the existence of: /dev/megaraid_sas_ioctl_node This device node doesn't exist by default, you have to create it by hand using something like: mknod /dev/megaraid_sas_ioctl_node c 253 0 Now, to make the existence of this device node persistent across reboots, I thought I could write a udev rule, but as usual, I'm missing something. I thought I could create a file such as /etc/udev/rules.d/10-local/rules that contained: DRIVER=="megasas" NAME="megaraid_sas_ioctl_node" MODE="0600" But this doesn't work - no device node after a reboot. Dmesg output indicates the megasas driver is loaded and functional: megasas: 00.00.04.01-RH1 Thu July 10 09:41:51 PST 2008 megasas: 0x1000:0x0060:0x1028:0x1f0c: bus 1:slot 0:func 0 megasas: FW now in Ready state Further, I don't see any means to instruct udev on which type of device node to create: character or block. I suspect I'm failing to understand exactly how udev is meant to work. I realise I could just cheat and run MegaCLI in /etc/rc.local, redirecting output to /dev/null; it creates the megaraid_sas_ioctl_node device node as part of its execution. I just thought using udev rules would be a) cleaner and b) a useful learning exercise. Perhaps I should just dump the above mknod command in /etc/rc.local... So how do I get udev to create the /dev/megaraid_sas_ioctl_node device node based on the presence of the megasas driver? Cheers Steve

Read the article
Mobile app for sysadmins with monitoring and fixing tools(SSH, ping, traceroute) [closed]

- by Roman

I present a start-up company which is working on a new mobile tool for system administrators. Our team has released first several versions of Server Auditor which is now just a SSH terminal with special UI approach for touch devices and got quite good feedbacks, e.g. iOS and Android. Now we are thinking about adding extra features to make Server Auditor a tool number one for all system administrators and would like to know your opinion. Main question would you use a tool like Server Auditor with extra features described below: Fast problem fixing - preloaded recipes/snippets, e.g. clean logs, restart a process, reboot etc. Secure user data synchronisation(IP/DNS name, connection options, keys, snippets) across all your devices iPhone and Android. Built-in tools like ping, traceroute, whois System status integration - you can observe information about the system in a friendly way, e.g CPU load, hard drive and RAM usage etc. Monitoring tool integration. Your servers are watched by our Nagios-like system in the cloud and you get notified by push-notifications/SMS. Similar products are Server Density, CopperEgg. If we start to implement features from 1 to 5 when you will be ready to start use it or even potentially pay for it? Can you see any issues that would prevent you from using this kind of system? Thank you a lot for your time, we kindly appreciate it. Looking forward to hear your opinion

Read the article
cscript - Invalid procedure call or argument when running a vbs file

- by quanta

I've been trying to use check_time.vbs to check the Windows time. Here's the script: http://pastebin.com/NfUrCAqU The help message could be display: C:\Program Files\NSClient++\scripts>cscript //NoLogo check_time.vbs /? check_time.vbs V1.01 Usage: cscript /NoLogo check_time.vbs serverlist warn crit [biggest] Options: serverlist (required): one or more server names, coma-separated warn (required): warning offset in seconds, can be partial crit (required): critical offset in seconds, can be partial biggest (optional): if multiple servers, else use default least offset Example: cscript /NoLogo check_time.vbs myserver1,myserver2 0.4 5 biggest But I get the following error when running: C:\Program Files\NSClient++\scripts>cscript //NoLogo check_time.vbs 0.asia.pool.ntp.org 20 50 C:\Program Files\NSClient++\scripts\check_time.vbs(53, 1) Microsoft VBScript run time error: Invalid procedure call or argument The screenshot: Manually execute w32tm still works fine: What might be the cause of this?

Read the article
How to calculate CPU % based on raw CPU ticks in SNMP

- by bjeanes

According to http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html#scalar_notcurrent ssCpuUser, ssCpuSystem, ssCpuIdle, etc are deprecated in favor of the raw variants (ssCpuRawUser, etc). The former values (which don't cover things like nice, wait, kernel, interrupt, etc) returned a percentage value: The percentage of CPU time spent processing user-level code, calculated over the last minute. This object has been deprecated in favour of 'ssCpuRawUser(50)', which can be used to calculate the same metric, but over any desired time period. The raw values return the "raw" number of ticks the CPU spent: The number of 'ticks' (typically 1/100s) spent processing user-level code. On a multi-processor system, the 'ssCpuRaw*' counters are cumulative over all CPUs, so their sum will typically be N*100 (for N processors). My question is: how do you turn the number of ticks into percentage? That is, how do you know how many ticks per second (it's typically — which implies not always — 1/100s, which either means 1 every 100 seconds or that a tick represents 1/100th of a second). I imagine you also need to know how many CPUs there are or you need to fetch all the CPU values to add them all together. I can't seem to find a MIB that gives you an integer value for # of CPUs which makes the former route awkward. The latter route seems unreliable because some of the numbers overlap (sometimes). For example, ssCpuRawWait has the following warning: This object will not be implemented on hosts where the underlying operating system does not measure this particular CPU metric. This time may also be included within the 'ssCpuRawSystem(52)' counter. Some help would be appreciated. Everywhere seems to just say that % is deprecated because it can be derived, but I haven't found anywhere that shows the official standard way to perform this derivation. The second component is that these "ticks" seem to be cumulative instead of over some time period. How do I sample values over some time period? The ultimate information I want is: % of user, system, idle, nice (and ideally steal, though there doesn't seem to be a standard MIB for this) "currently" (over the last 1-60s would probably be sufficient, with a preference for smaller time spans).

Read the article
check_mk IPMI PCM sensor reading randomly fails

- by Julian Kessel

I use check_mk_agent for monitoring a server with IPMI and the freeipmi-tools installed. As far as I can see, the monitoring randomly detects no value returned by the IPMI Sensor "Temperature_PCH_Temp". That's a problem since it results in a CRITICAL state triggering a notification. The interruption lasts only over one check, the following is always OK. The temperature is in no edge area and neither the readings before the fail nor after show a Temp that is tending to overrun a treshold. Has someone an idea on what could be the reason for this behaviour and how prevent it?

Read the article
NRPE and the $USER1$ variable

- by timbrigham

I have NRPE daemons running on all of my remote Linux boxes. I have a couple configurations in place and I'm trying to standardize the paths in my nrpe.cfg. The changes are deployed via Puppet. I would like to use the following syntax: command[mycommand]=$USER1$/check_tcp .. etc. The $USER1$ variable is not available in my NRPE setup. I could write Puppet templates for all the variants but I would much prefer to manage this through a native method. Is there anything available to do so? If not does anyone have a sample Puppet config that will address this?

Read the article
Opsview Notifications, how to report event duration

- by dotwaffle

At present, Opsview reports recoveries in the following format: RECOVERY: Internal Alarm is OK on host Ellie: SNMP OK - 0 Service: Internal Alarm Host: Ellie Alias: Ellie Address: 1.2.3.4 State: OK Comment: () Date/Time: Mon Oct 5 14:57:53 BST 2009 Additional Info: SNMP OK - 0 What I would ideally like to do is add a "duration" field, so that you can tell without scrolling back on a Blackberry how long the event has been at fault for. Is there an easy solution to this?

Read the article
Graphing services using pnp4nagios

- by Matias

Hi! I've managed to install pnp4nagios 0.6.3 and I'm a bit confused about how pnp4nagios generates graphics. Almost out of the box, it started graphs for ping and some http servers (not all of them). But, how can I make it graph things like disk utilization (When that value comes from SNMP)?? For example, ls /usr/local/pnp4nagios/var/perfdata/isis/ Cola_de_Mail.rrd Cola_de_Mail.xml HTTP.rrd HTTP.xml PING.rrd PING.xml Those are checks running on the host isis, but there are many other checks for that server that are not taken into account by pnp4nagios. How can I make pnp4nagios "see" the other checks?? Thanks!

Read the article
Monitoring multiple sites on a single server using OpsView

- by Kev

We have several web servers. On each of these servers there can be ~250 web sites. I need to add a HTTP check for each site on each server. Each site has a reserved host header that we know can always be resolved in the format of: w10000.hostchecks.mycompany.com w10020.hostchecks.mycompany.com w11992.hostchecks.mycompany.com ..and so on.. What I want is for there to be a master ping check on the web server's main IP address and then separate HTTP checks for each of the sites on the server. If the master ping test fails then I want the HTTP tests to cease until the master ping check goes OK. I had a stab at this and tried do the following: Create a parent host that does a ping check on the server's main ip address (e.g. server is named WEB0001). For each of the sites that reside on WEB0001: Create a separate Host with a Primary Hostname of wXXXXX.hostchecks.mycompany.com Make WEB0001 the parent host Add a monitor (HTTP check to a special url that is mapped into each site using a virtual directory: H- $HOSTADDRESS$ -u /__hostcheck/IsAlive.aspx -w 5 -c 10 -p 80 However I find that if I down the parent server (WEB0001) the http checks seem to continue. Am I going about this completely the wrong way?

Read the article
Icinga notifications are being marked as spam when sent to my mailbox

- by user784637

I'm using gmail and my domain is foo.com About half the notifications from my icinga server, [email protected] go to my spam folder for [email protected] Received-SPF: fail (google.com: domain of [email protected] does not designate <ip6> as permitted sender) client-ip=<ip6>; Authentication-Results: mx.google.com; spf=hardfail (google.com: domain of [email protected] does not designate <ip6> as permitted sender) [email protected] Is my current SPF record set up to allow my icinga server with the ip <ip4> and <ip6> to send email from the domain foo.com? ;; ANSWER SECTION: foo.com. 300 IN TXT "v=spf1 ip4:<ip4> ip6:<ip6> -all"

Read the article
Can ZenOSS integrate Ganglia smoothly?

- by chen

I like Ganglia for its Gmetric function, and I like its multi-layer capability. But Ganglia does not have healthiness check, alerting and etc. for the server monitor functionality. So it would be great to bring this two species together. Sure, we can install Ganglia, and then install ZenOSS. But is there a plugin or something that can smoothly integrate them together? At least integration at the presentation level. Thanks

Read the article
Graphing services using pnp4nagios

- by Matias

I've managed to install pnp4nagios 0.6.3 and I'm a bit confused about how pnp4nagios generates graphics. Almost out of the box, it started graphs for ping and some http servers (not all of them). But, how can I make it graph things like disk utilization (When that value comes from SNMP)?? For example, ls /usr/local/pnp4nagios/var/perfdata/isis/ Cola_de_Mail.rrd Cola_de_Mail.xml HTTP.rrd HTTP.xml PING.rrd PING.xml Those are checks running on the host isis, but there are many other checks for that server that are not taken into account by pnp4nagios. How can I make pnp4nagios "see" the other checks?? Thanks!

Read the article
Opsview Notifications, how to report event duration

- by dotwaffle

At present, Opsview reports recoveries in the following format: RECOVERY: Internal Alarm is OK on host Ellie: SNMP OK - 0 Service: Internal Alarm Host: Ellie Alias: Ellie Address: 1.2.3.4 State: OK Comment: () Date/Time: Mon Oct 5 14:57:53 BST 2009 Additional Info: SNMP OK - 0 What I would ideally like to do is add a "duration" field, so that you can tell without scrolling back on a Blackberry how long the event has been at fault for. Is there an easy solution to this?

Read the article
added shell script to sudoers still getting permission denied

- by Bill S

I don't understand this? Other uses of sudo work fine. [oracle@o plugins]$ su Password: [root@ plugins]# su nrpe bash-3.2$ /home/oracle/obiee/instances/instance1/bifoundation/OracleBIApplication/coreapplication/setup/bi-init.sh bash: /home/oracle/obiee/instances/instance1/bifoundation/OracleBIApplication/coreapplication/setup/bi-init.sh: Permission denied bash-3.2$ sudo -l Matching Defaults entries for nrpe on this host: env_reset, env_keep="COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR LS_COLORS MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY" Runas and Command-specific defaults for nrpe: User nrpe may run the following commands on this host: (ALL) NOPASSWD: /home/oracle/obiee/instances/instance1/bifoundation/OracleBIApplication/coreapplication/setup/bi-init.sh bash-3.2$

Read the article
Why isn't nrpe 'check_procs' finding my Passenger process?

- by ethrbunny

I'm trying to use check_procs from NRPE to find out whether Passenger is running on my server. It loads from httpd but appears separately. 32135 ? Sl 0:09 Passenger RackApp: /usr/share/puppet/rack/puppetmasterd 32589 ? Sl 0:01 Passenger AppPreloader: /usr/share/puppet/rack/puppetmasterd 32629 ? Sl 0:05 Passenger RackApp: /usr/share/puppet/rack/puppetmasterd 32751 ? Sl 0:05 Passenger RackApp: /usr/share/puppet/rack/puppetmasterd When I try to test it like so: check_procs -w 2: -c 3: -C Passenger It tells me there are 0 processes found. I see them - how do I get NRPE to count them?

Read the article
Httpd and LDAP Authentication not working for sub-pages

- by DavisTasar

I just recently installed a Nagios implementation, and I'm trying to get LDAP authentication working for httpd on Red Hat. (nagios.conf for Apache config below, sanitized of course) ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin" <Directory "/usr/local/nagios/sbin"> #SSLRequireSSL Options ExecCGI AllowOverride none AuthType Basic AuthName "LDAP Authentication" AuthLDAPURL "ldap://my.domain.controller:389/OU=Users,DC=my,DC=domain,DC=controller?sAMAccountName?sub?(objectClass=user)" NONE AuthzLDAPAuthoritative off AuthLDAPBindDN "CN=NagiosAdmin,DC=my,DC=domain,DC=controller" AuthLDAPBindPassword "myPassword" require valid-user </Directory> Alias /nagios "/usr/local/nagios/share" <Directory /usr/local/nagios/share> #SSLRequireSSL Options None AllowOverride none AuthBasicProvider ldap AuthType Basic AuthName "LDAP Authentication" AuthzLDAPAuthoritative off AuthLDAPURL "ldap://my.domain.controller:389/OU=Users,DC=my,DC=domain,DC=controller?sAMAccountName?sub?(objectClass=user)" NONE AuthLDAPBindDN "CN=NagiosAdmin,DC=my,DC=domain,DC=controller" AuthLDAPBindPassword "myPassword" require valid-user </Directory> Now, the initial authentication works, so when you first hit the page you can log in just fine. However, when you go anywhere else, it prompts you for authentication, fails (asking for a re-prompt), and gives this error message: [Mon Oct 21 14:46:23 2013] [error] [client 172.28.9.30] access to /nagios/cgi-bin/statusmap.cgi failed, reason: verification of user id '<myuseraccount>' not configured, referer: http://<nagiosserver>/nagios/side.php I'm almost certain its a simple flag or option, but I just can't find it, and I don't have a lot of experience working with Apache. Any assistance or help would be greatly appreciated.

Read the article
Where are Nagios 3 Config Files in Ubuntu 12.04?

- by Aaron James

I just installed Nagios3 via Synaptic. The package and it's dependencies all installed fine and I log in using a web browser, however I'd like to add hosts now and according to the official Nagios Documentation the config file should be in the /usr/local/nagios/* directory. When I go to /usr/local it's not there. I can't seem to find these config files anywhere. I'm not sure what I did wrong. I'm running Xubuntu 12.04 64-bit Any help at all would be greatly appreciated, Thank you!

Read the article
What is the way to submit a patch to fix all the damage that LP: #600941 causes?

- by nutznboltz

What is the best way to submit a patch to fix all the damage that LP: #600941 causes? I ask because LP: #600941 was put into every version of Ubuntu still supported at this time. Should I pick a particular version and run ubuntu-bug on it? Should that version be the LTS or Oneiric or Precise (how can I get Precise if I need it?) The story is that after it was pushed out all of our systems started experiencing Nagios nrpe restart failures. Commands like /etc/init.d/nagios-nrpe-server restart would cause nrpe to stop but not restart. I tracked this down to the way that the /etc/init.d/nagios-nrpe-server script is calling start-stop-daemon. The issue is that the "stop" stanza in the /etc/init.d/nagios-nrpe-server script first calls start-stop-daemon which sends SIGTERM to nrpe and then waits only for one second. If nrpe has not exited by that time the pid file will still exist and the /etc/init.d/nagios-nrpe-server script will remove it. Worse if /etc/init.d/nagios-nrpe-server restart is used not only will the pid file be removed, the attempt to restart nrpe will fail provided that the nrpe daemon is still tardy in shutting down. The attempt to start under those circumstances will fail because nrpe will still be bound to a socket and the second attempt at binding will cause the nrpe startup to abort. They should have wondered why there was a comment about "sometimes the pid file does not get removed". They should have tested on systems that have a heavy load and therefore slow nrpe response times. The fix is to add --retry 10 or such to the invocation of start-stop-daemon ... --stop ... Thanks

Read the article
How can I convert Perl one-liners into complete scripts?

- by Stefan Lasiewski

I find a lot of Perl one-liners online. Sometimes I want to convert these one-liners into a script, because otherwise I'll forget the syntax of the one-liner. For example, I'm using the following command (from nagios.com): tail -f /var/log/nagios/nagios.log | perl -pe 's/(\d+)/localtime($1)/e' I'd to replace it with something like this: tail -f /var/log/nagios/nagios.log | ~/bin/nagiostime.pl However, I can't figure out the best way to quickly throw this stuff into a script. Does anyone have a quick way to throw these one-liners into a Bash or Perl script?

Read the article
Convert perl one-liner into a script

- by Stefan Lasiewski

I find alot of perl one-liners online. Sometimes I want to convert these one-liners into a script, because otherwise I'll forget the syntax of the one-liner. For example, I'm using the following command (from nagios.com): tail -f /var/log/nagios/nagios.log | perl -pe 's/(\d+)/localtime/e' I'd to replace it with something like this: tail -f /var/log/nagios/nagios.log | ~/bin/nagiostime.pl However, I can't figure out the best way to quickly throw this stuff into a script. Does anyone have a quick way to throw these one-liners into a Bash or Perl script?

Read the article
How do you guys handle custom yum repository?

- by luckytaxi

I have a bunch of tools (nagios, munin, puppet, etc...) that gets installed on all my servers. I'm in the process of building a local yum repository. I know most folks just dump all the rpms into a single folder (broken down into the correct path) and then run createrepo inside the directory. However, what would happen if you had to update the rpms? I ask because I was going to throw each software into its own folder. Example one, put all packages inside one folder (custom_software) /admin/software/custom_software/5.4/i386 /admin/software/custom_software/5.4/x86_64 /admin/software/custom_software/4.6/i386 /admin/software/custom_software/4.6/x86_64 What I'm thinking of ... /admin/software/custom_software/nagios/5.4/i386 /admin/software/custom_software/nagios/5.4/x86_64 /admin/software/custom_software/nagios/4.6/i386 /admin/software/custom_software/nagios/4.6/x86_64 /admin/software/custom_software/puppet/5.4/i386 /admin/software/custom_software/puppet/5.4/x86_64 /admin/software/custom_software/puppet/4.6/i386 /admin/software/custom_software/puppet/4.6/x86_64 Ths way, if I had to update to the latest version of puppet, I can save manage the files accordingly. I wouldn't know which rpms belong to which software if I threw them into one big folder. Makes sense?

Read the article
How to reliably recieve message from AWS that my instance was rebooted / terminated / stopped?

- by Andrew Smith

I have Nagios, and I want it to stop monitoring instances when they are stopped from the console. The requirements are: The message passed from AWS is 100R% reliable, e.g. when Nagios is down, and the message cannot be delivered, it will be re-delivered promptly when Nagios is up The message will pass quickly There is no need to scan status of all instances via EC2 API all the time, but only once a while Many thanks!

Read the article
reverse proxying with NGINX to two back-end servers

- by aag

I am trying to learn how to configure the Nginx proxy. All requests from external (www.external.com) should go to internal server 10.10.10.16:2080, except for www.external.com/nagios requests, which should go to internal 10.10.10.18. My location block looks as follows: location ~* / { proxy_buffers 16 4k; proxy_buffer_size 2k; proxy_buffering off; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header Accept-Encoding ""; proxy_pass http://10.10.10.16:2080; } # # nagios server location ~* /nagios/ { proxy_buffers 16 4k; proxy_buffer_size 2k; proxy_buffering off; # proxy_set_header Host $host; # proxy_set_header X-Real-IP $remote_addr; # proxy_set_header Accept-Encoding ""; proxy_pass http://10.10.10.18; } The first location seems to work fine. However, any request to www.external.com/nagios sends the browser into the eternal pastures. Of course, 10.10.10.18/nagios was tested and works fine. What am I missing?

Read the article
Problem libc-so-6-version-glibc-2-14-not-found

- by alessio

i have installed nagios core 4 on ubuntu server 12.04 lts.. everything working fine,, but... i have a problem with remote command to a remote linux (ubuntu server 12.04) pc! when i try to check a service, for example: check_swap, check_disk etc.. i got everytime an error: Remote command execution failed: /home/nagios/plugins/check_disk: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found (required by /home/nagios/plugins/check_disk) the remote pc is not my pc and i don't want to make a disaster! :) So... how can i fix this problem? any help be appreciate!!! ;) thanks in advance! :)

Read the article

< Previous Page | 5 6 7 8 9 10 11 12 13 14 | Next Page >