nagios - Page 12 - Developer IT

Ho do you view all your monitoring software

- by BLAKE

In my office I have all our monitoring tools setup and working great, but I dont have an easy way to view them. I have a large TV in my office plugged into an old PC that has my nagios status page always showing. If I need to change anything on that computer I use Synergy to access the computer from my desktop. We are thinking about adding another TV and I want some suggestions on setting it all up. We are a 99.99% Windows shop. What do you use to run all the TV's in your helpdesk? Synergy works for me, but what if one of the other admins want to change the screen? Is there any easy way that any of us (currently 4 people) can change the screen from our desktops? (Remote desktop doesn't work because it locks the console which is the output to the TVs.) Any advice would help, Thanks.

Read the article

Where is all the memory being consumed?

- by Mark L

Hello, I have a Dell R300 Ubuntu 9.10 box with 4GB of memory. All I'm running on there is haproxy, nagios and postfix yet there is ~2.7GB of memory being consumed. I've run ps and I can't get the sums to add up. Could anyone shed any light on where all the memory is being used? Cheers, Mark $ sudo free -m total used free shared buffers cached Mem: 3957 2746 1211 0 169 2320 -/+ buffers/cache: 256 3701 Swap: 6212 0 6212 Sorry for pasting all of ps' output but I'm keen to get to the bottom of this. $ sudo ps aux [sudo] password for mark: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 1 0.0 0.0 19320 1656 ? Ss May20 0:05 /sbin/init root 2 0.0 0.0 0 0 ? S< May20 0:00 [kthreadd] root 3 0.0 0.0 0 0 ? S< May20 0:00 [migration/0] root 4 0.0 0.0 0 0 ? S< May20 0:16 [ksoftirqd/0] root 5 0.0 0.0 0 0 ? S< May20 0:00 [watchdog/0] root 6 0.0 0.0 0 0 ? S< May20 0:03 [migration/1] root 7 0.0 0.0 0 0 ? S< May20 3:10 [ksoftirqd/1] root 8 0.0 0.0 0 0 ? S< May20 0:00 [watchdog/1] root 9 0.0 0.0 0 0 ? S< May20 0:00 [migration/2] root 10 0.0 0.0 0 0 ? S< May20 0:19 [ksoftirqd/2] root 11 0.0 0.0 0 0 ? S< May20 0:00 [watchdog/2] root 12 0.0 0.0 0 0 ? S< May20 0:01 [migration/3] root 13 0.0 0.0 0 0 ? S< May20 0:41 [ksoftirqd/3] root 14 0.0 0.0 0 0 ? S< May20 0:00 [watchdog/3] root 15 0.0 0.0 0 0 ? S< May20 0:03 [events/0] root 16 0.0 0.0 0 0 ? S< May20 0:10 [events/1] root 17 0.0 0.0 0 0 ? S< May20 0:08 [events/2] root 18 0.0 0.0 0 0 ? S< May20 0:08 [events/3] root 19 0.0 0.0 0 0 ? S< May20 0:00 [cpuset] root 20 0.0 0.0 0 0 ? S< May20 0:00 [khelper] root 21 0.0 0.0 0 0 ? S< May20 0:00 [netns] root 22 0.0 0.0 0 0 ? S< May20 0:00 [async/mgr] root 23 0.0 0.0 0 0 ? S< May20 0:00 [kintegrityd/0] root 24 0.0 0.0 0 0 ? S< May20 0:00 [kintegrityd/1] root 25 0.0 0.0 0 0 ? S< May20 0:00 [kintegrityd/2] root 26 0.0 0.0 0 0 ? S< May20 0:00 [kintegrityd/3] root 27 0.0 0.0 0 0 ? S< May20 0:00 [kblockd/0] root 28 0.0 0.0 0 0 ? S< May20 0:01 [kblockd/1] root 29 0.0 0.0 0 0 ? S< May20 0:04 [kblockd/2] root 30 0.0 0.0 0 0 ? S< May20 0:02 [kblockd/3] root 31 0.0 0.0 0 0 ? S< May20 0:00 [kacpid] root 32 0.0 0.0 0 0 ? S< May20 0:00 [kacpi_notify] root 33 0.0 0.0 0 0 ? S< May20 0:00 [kacpi_hotplug] root 34 0.0 0.0 0 0 ? S< May20 0:00 [ata/0] root 35 0.0 0.0 0 0 ? S< May20 0:00 [ata/1] root 36 0.0 0.0 0 0 ? S< May20 0:00 [ata/2] root 37 0.0 0.0 0 0 ? S< May20 0:00 [ata/3] root 38 0.0 0.0 0 0 ? S< May20 0:00 [ata_aux] root 39 0.0 0.0 0 0 ? S< May20 0:00 [ksuspend_usbd] root 40 0.0 0.0 0 0 ? S< May20 0:00 [khubd] root 41 0.0 0.0 0 0 ? S< May20 0:00 [kseriod] root 42 0.0 0.0 0 0 ? S< May20 0:00 [kmmcd] root 43 0.0 0.0 0 0 ? S< May20 0:00 [bluetooth] root 44 0.0 0.0 0 0 ? S May20 0:00 [khungtaskd] root 45 0.0 0.0 0 0 ? S May20 0:00 [pdflush] root 46 0.0 0.0 0 0 ? S May20 0:09 [pdflush] root 47 0.0 0.0 0 0 ? S< May20 0:00 [kswapd0] root 48 0.0 0.0 0 0 ? S< May20 0:00 [aio/0] root 49 0.0 0.0 0 0 ? S< May20 0:00 [aio/1] root 50 0.0 0.0 0 0 ? S< May20 0:00 [aio/2] root 51 0.0 0.0 0 0 ? S< May20 0:00 [aio/3] root 52 0.0 0.0 0 0 ? S< May20 0:00 [ecryptfs-kthrea] root 53 0.0 0.0 0 0 ? S< May20 0:00 [crypto/0] root 54 0.0 0.0 0 0 ? S< May20 0:00 [crypto/1] root 55 0.0 0.0 0 0 ? S< May20 0:00 [crypto/2] root 56 0.0 0.0 0 0 ? S< May20 0:00 [crypto/3] root 70 0.0 0.0 0 0 ? S< May20 0:00 [scsi_eh_0] root 71 0.0 0.0 0 0 ? S< May20 0:00 [scsi_eh_1] root 74 0.0 0.0 0 0 ? S< May20 0:00 [scsi_eh_2] root 75 0.0 0.0 0 0 ? S< May20 0:00 [scsi_eh_3] root 82 0.0 0.0 0 0 ? S< May20 0:00 [kstriped] root 83 0.0 0.0 0 0 ? S< May20 0:00 [kmpathd/0] root 84 0.0 0.0 0 0 ? S< May20 0:00 [kmpathd/1] root 85 0.0 0.0 0 0 ? S< May20 0:00 [kmpathd/2] root 86 0.0 0.0 0 0 ? S< May20 0:00 [kmpathd/3] root 87 0.0 0.0 0 0 ? S< May20 0:00 [kmpath_handlerd] root 88 0.0 0.0 0 0 ? S< May20 0:00 [ksnapd] root 89 0.0 0.0 0 0 ? S< May20 0:00 [kondemand/0] root 90 0.0 0.0 0 0 ? S< May20 0:00 [kondemand/1] root 91 0.0 0.0 0 0 ? S< May20 0:00 [kondemand/2] root 92 0.0 0.0 0 0 ? S< May20 0:00 [kondemand/3] root 93 0.0 0.0 0 0 ? S< May20 0:00 [kconservative/0] root 94 0.0 0.0 0 0 ? S< May20 0:00 [kconservative/1] root 95 0.0 0.0 0 0 ? S< May20 0:00 [kconservative/2] root 96 0.0 0.0 0 0 ? S< May20 0:00 [kconservative/3] root 97 0.0 0.0 0 0 ? S< May20 0:00 [krfcommd] root 315 0.0 0.0 0 0 ? S< May20 0:09 [mpt_poll_0] root 317 0.0 0.0 0 0 ? S< May20 0:00 [mpt/0] root 547 0.0 0.0 0 0 ? S< May20 0:00 [scsi_eh_4] root 587 0.0 0.0 0 0 ? S< May20 0:11 [kjournald2] root 636 0.0 0.0 12748 860 ? S May20 0:00 upstart-udev-bridge --daemon root 657 0.0 0.0 17064 924 ? S<s May20 0:00 udevd --daemon root 666 0.0 0.0 8192 612 ? Ss May20 0:00 dd bs=1 if=/proc/kmsg of=/var/run/rsyslog/kmsg root 774 0.0 0.0 17060 888 ? S< May20 0:00 udevd --daemon root 775 0.0 0.0 17060 888 ? S< May20 0:00 udevd --daemon syslog 825 0.0 0.0 191696 1988 ? Sl May20 0:31 rsyslogd -c4 root 839 0.0 0.0 0 0 ? S< May20 0:00 [edac-poller] root 870 0.0 0.0 0 0 ? S< May20 0:00 [kpsmoused] root 1006 0.0 0.0 5988 604 tty4 Ss+ May20 0:00 /sbin/getty -8 38400 tty4 root 1008 0.0 0.0 5988 604 tty5 Ss+ May20 0:00 /sbin/getty -8 38400 tty5 root 1015 0.0 0.0 5988 604 tty2 Ss+ May20 0:00 /sbin/getty -8 38400 tty2 root 1016 0.0 0.0 5988 608 tty3 Ss+ May20 0:00 /sbin/getty -8 38400 tty3 root 1018 0.0 0.0 5988 604 tty6 Ss+ May20 0:00 /sbin/getty -8 38400 tty6 daemon 1025 0.0 0.0 16512 472 ? Ss May20 0:00 atd root 1026 0.0 0.0 18708 1000 ? Ss May20 0:03 cron root 1052 0.0 0.0 49072 1252 ? Ss May20 0:25 /usr/sbin/sshd root 1084 0.0 0.0 5988 604 tty1 Ss+ May20 0:00 /sbin/getty -8 38400 tty1 root 6320 0.0 0.0 19440 956 ? Ss May21 0:00 /usr/sbin/xinetd -pidfile /var/run/xinetd.pid -stayalive -inetd_compat -inetd_ipv6 nagios 8197 0.0 0.0 27452 1696 ? SNs May21 2:57 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg root 10882 0.1 0.0 70280 3104 ? Ss 10:30 0:00 sshd: mark [priv] mark 10934 0.0 0.0 70432 1776 ? S 10:30 0:00 sshd: mark@pts/0 mark 10935 1.4 0.1 21572 4336 pts/0 Ss 10:30 0:00 -bash root 10953 1.0 0.0 15164 1136 pts/0 R+ 10:30 0:00 ps aux haproxy 12738 0.0 0.0 17208 992 ? Ss Jun08 0:49 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg root 23953 0.0 0.0 37012 2192 ? Ss Jun04 0:03 /usr/lib/postfix/master postfix 23955 0.0 0.0 39232 2356 ? S Jun04 0:00 qmgr -l -t fifo -u postfix 32603 0.0 0.0 39072 2132 ? S 09:05 0:00 pickup -l -t fifo -u -c Here's meminfo: $ cat /proc/meminfo MemTotal: 4052852 kB MemFree: 1240488 kB Buffers: 173172 kB Cached: 2376420 kB SwapCached: 0 kB Active: 1479288 kB Inactive: 1081876 kB Active(anon): 11792 kB Inactive(anon): 0 kB Active(file): 1467496 kB Inactive(file): 1081876 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 6361700 kB SwapFree: 6361700 kB Dirty: 44 kB Writeback: 0 kB AnonPages: 11568 kB Mapped: 5844 kB Slab: 155032 kB SReclaimable: 145804 kB SUnreclaim: 9228 kB PageTables: 1592 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 8388124 kB Committed_AS: 51732 kB VmallocTotal: 34359738367 kB VmallocUsed: 282604 kB VmallocChunk: 34359453499 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 6784 kB DirectMap2M: 4182016 kB Here's slabinfo: $ cat /proc/slabinfo slabinfo - version: 2.1 # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> ip6_dst_cache 50 50 320 25 2 : tunables 0 0 0 : slabdata 2 2 0 UDPLITEv6 0 0 960 17 4 : tunables 0 0 0 : slabdata 0 0 0 UDPv6 68 68 960 17 4 : tunables 0 0 0 : slabdata 4 4 0 tw_sock_TCPv6 0 0 320 25 2 : tunables 0 0 0 : slabdata 0 0 0 TCPv6 72 72 1792 18 8 : tunables 0 0 0 : slabdata 4 4 0 dm_raid1_read_record 0 0 1064 30 8 : tunables 0 0 0 : slabdata 0 0 0 kcopyd_job 0 0 368 22 2 : tunables 0 0 0 : slabdata 0 0 0 dm_uevent 0 0 2608 12 8 : tunables 0 0 0 : slabdata 0 0 0 dm_rq_target_io 0 0 376 21 2 : tunables 0 0 0 : slabdata 0 0 0 uhci_urb_priv 0 0 56 73 1 : tunables 0 0 0 : slabdata 0 0 0 cfq_queue 0 0 168 24 1 : tunables 0 0 0 : slabdata 0 0 0 mqueue_inode_cache 18 18 896 18 4 : tunables 0 0 0 : slabdata 1 1 0 fuse_request 0 0 632 25 4 : tunables 0 0 0 : slabdata 0 0 0 fuse_inode 0 0 768 21 4 : tunables 0 0 0 : slabdata 0 0 0 ecryptfs_inode_cache 0 0 1024 16 4 : tunables 0 0 0 : slabdata 0 0 0 hugetlbfs_inode_cache 26 26 608 26 4 : tunables 0 0 0 : slabdata 1 1 0 journal_handle 680 680 24 170 1 : tunables 0 0 0 : slabdata 4 4 0 journal_head 144 144 112 36 1 : tunables 0 0 0 : slabdata 4 4 0 revoke_table 256 256 16 256 1 : tunables 0 0 0 : slabdata 1 1 0 revoke_record 512 512 32 128 1 : tunables 0 0 0 : slabdata 4 4 0 ext4_inode_cache 53306 53424 888 18 4 : tunables 0 0 0 : slabdata 2968 2968 0 ext4_free_block_extents 292 292 56 73 1 : tunables 0 0 0 : slabdata 4 4 0 ext4_alloc_context 112 112 144 28 1 : tunables 0 0 0 : slabdata 4 4 0 ext4_prealloc_space 156 156 104 39 1 : tunables 0 0 0 : slabdata 4 4 0 ext4_system_zone 0 0 40 102 1 : tunables 0 0 0 : slabdata 0 0 0 ext2_inode_cache 0 0 776 21 4 : tunables 0 0 0 : slabdata 0 0 0 ext3_inode_cache 0 0 784 20 4 : tunables 0 0 0 : slabdata 0 0 0 ext3_xattr 0 0 88 46 1 : tunables 0 0 0 : slabdata 0 0 0 dquot 0 0 256 16 1 : tunables 0 0 0 : slabdata 0 0 0 shmem_inode_cache 606 620 800 20 4 : tunables 0 0 0 : slabdata 31 31 0 pid_namespace 0 0 2112 15 8 : tunables 0 0 0 : slabdata 0 0 0 UDP-Lite 0 0 832 19 4 : tunables 0 0 0 : slabdata 0 0 0 RAW 183 210 768 21 4 : tunables 0 0 0 : slabdata 10 10 0 UDP 76 76 832 19 4 : tunables 0 0 0 : slabdata 4 4 0 tw_sock_TCP 80 80 256 16 1 : tunables 0 0 0 : slabdata 5 5 0 TCP 81 114 1664 19 8 : tunables 0 0 0 : slabdata 6 6 0 blkdev_integrity 144 144 112 36 1 : tunables 0 0 0 : slabdata 4 4 0 blkdev_queue 64 64 2024 16 8 : tunables 0 0 0 : slabdata 4 4 0 blkdev_requests 120 120 336 24 2 : tunables 0 0 0 : slabdata 5 5 0 fsnotify_event 156 156 104 39 1 : tunables 0 0 0 : slabdata 4 4 0 bip-256 7 7 4224 7 8 : tunables 0 0 0 : slabdata 1 1 0 bip-128 0 0 2176 15 8 : tunables 0 0 0 : slabdata 0 0 0 bip-64 0 0 1152 28 8 : tunables 0 0 0 : slabdata 0 0 0 bip-16 84 84 384 21 2 : tunables 0 0 0 : slabdata 4 4 0 sock_inode_cache 224 276 704 23 4 : tunables 0 0 0 : slabdata 12 12 0 file_lock_cache 88 88 184 22 1 : tunables 0 0 0 : slabdata 4 4 0 net_namespace 0 0 1920 17 8 : tunables 0 0 0 : slabdata 0 0 0 Acpi-ParseExt 640 672 72 56 1 : tunables 0 0 0 : slabdata 12 12 0 taskstats 48 48 328 24 2 : tunables 0 0 0 : slabdata 2 2 0 proc_inode_cache 1613 1750 640 25 4 : tunables 0 0 0 : slabdata 70 70 0 sigqueue 100 100 160 25 1 : tunables 0 0 0 : slabdata 4 4 0 radix_tree_node 22443 22475 560 29 4 : tunables 0 0 0 : slabdata 775 775 0 bdev_cache 72 72 896 18 4 : tunables 0 0 0 : slabdata 4 4 0 sysfs_dir_cache 9866 9894 80 51 1 : tunables 0 0 0 : slabdata 194 194 0 inode_cache 2268 2268 592 27 4 : tunables 0 0 0 : slabdata 84 84 0 dentry 285907 286062 192 21 1 : tunables 0 0 0 : slabdata 13622 13622 0 buffer_head 256447 257472 112 36 1 : tunables 0 0 0 : slabdata 7152 7152 0 vm_area_struct 1469 1541 176 23 1 : tunables 0 0 0 : slabdata 67 67 0 mm_struct 82 95 832 19 4 : tunables 0 0 0 : slabdata 5 5 0 files_cache 104 161 704 23 4 : tunables 0 0 0 : slabdata 7 7 0 signal_cache 163 187 960 17 4 : tunables 0 0 0 : slabdata 11 11 0 sighand_cache 145 165 2112 15 8 : tunables 0 0 0 : slabdata 11 11 0 task_xstate 118 140 576 28 4 : tunables 0 0 0 : slabdata 5 5 0 task_struct 128 165 5808 5 8 : tunables 0 0 0 : slabdata 33 33 0 anon_vma 731 896 32 128 1 : tunables 0 0 0 : slabdata 7 7 0 shared_policy_node 85 85 48 85 1 : tunables 0 0 0 : slabdata 1 1 0 numa_policy 170 170 24 170 1 : tunables 0 0 0 : slabdata 1 1 0 idr_layer_cache 240 240 544 30 4 : tunables 0 0 0 : slabdata 8 8 0 kmalloc-8192 27 32 8192 4 8 : tunables 0 0 0 : slabdata 8 8 0 kmalloc-4096 291 344 4096 8 8 : tunables 0 0 0 : slabdata 43 43 0 kmalloc-2048 225 240 2048 16 8 : tunables 0 0 0 : slabdata 15 15 0 kmalloc-1024 366 432 1024 16 4 : tunables 0 0 0 : slabdata 27 27 0 kmalloc-512 536 544 512 16 2 : tunables 0 0 0 : slabdata 34 34 0 kmalloc-256 406 528 256 16 1 : tunables 0 0 0 : slabdata 33 33 0 kmalloc-128 503 576 128 32 1 : tunables 0 0 0 : slabdata 18 18 0 kmalloc-64 3467 3712 64 64 1 : tunables 0 0 0 : slabdata 58 58 0 kmalloc-32 1520 1920 32 128 1 : tunables 0 0 0 : slabdata 15 15 0 kmalloc-16 3547 3840 16 256 1 : tunables 0 0 0 : slabdata 15 15 0 kmalloc-8 4607 4608 8 512 1 : tunables 0 0 0 : slabdata 9 9 0 kmalloc-192 4620 5313 192 21 1 : tunables 0 0 0 : slabdata 253 253 0 kmalloc-96 1780 1848 96 42 1 : tunables 0 0 0 : slabdata 44 44 0 kmem_cache_node 0 0 64 64 1 : tunables 0 0 0 : slabdata 0 0 0

Read the article

Monitoring System for the cloud?

- by Maxim Veksler

I need a monitoring system, much like ganglia / nagios that is build for the cloud. I need it to support : Adding / removing nodes dynamically. (Node shuts down, dose not imply node failure...) Dynamic node based categorization, meaning node can identify them self as being part of group X (ganglia gets this almost right, but lacks the dynamic part...) Does not require multicast support (generally not allowed in cloud based setups) Plugins for recent cool stuff such as Hadoop, Cassandra, Mongo would be cool. More features include: External API, web interface and co. I've looked at Ganglia, munin and they both seem be almost there (but not exactly). I would also go for reasonably priced Software as Service solution. I'm currently doing research, so Suggestions are highly appreciated. Thank you, Maxim

Read the article

SNMP keeps crashing

- by jldugger

We're using OpsView/Nagios to monitor our servers. We've added the SNMP service to all our servers and deployed the configuration via GPO, but one win2k3 server seems to have a problem; it crashes pretty regularly. The event log carries messages like: Event Type: Error Event Source: Service Control Manager Event Category: None Event ID: 7034 Date: 6/11/2009 Time: 7:11:49 PM User: N/A Computer: HOSTNNAME Description: The SNMP Service service terminated unexpectedly. It has done this 2 time(s). and also Event Type: Error Event Source: Application Error Event Category: (100) Event ID: 1000 Date: 6/11/2009 Time: 7:11:18 PM User: N/A Computer: HOSTNAME Description: Faulting application snmp.exe, version 5.2.3790.3959, faulting module ntdll.dll, version 5.2.3790.3959, fault address 0x000417af. Now, I could probably set it to simply restart on crash in perpetuity, but I think it's better to fix problems like this. Is this a known problem? If not, what should I do to diagnose it?

Read the article

Monitoring ASA packet loss via SNMP

- by dunxd

I want to monitor packet loss on my ASA 5505 VPN endpoints using SNMP. This is so I can graph the rates in Cacti and/or get alerts in Nagios. However, I am not sure what SNMP values I should use to measure packet loss. In the ASA I can run sh interface Internet stats to show traffic statistics for the interface connected to the Internet. This shows 1 minute and 5 minute drop rates. Are these measures an indicator of packet loss? Are there SNMP values I can access that correspond to those values? Should I be looking at different values? Is the ASA even able to measure packet loss?

Read the article

Spawn phone call from EC2 alerts

- by Matt

I have a system setup on AWS/EC2, it currently is using their CloudWatch alert system. The problem is this sends just to email, when ideally I would like this to be making a phone call and/or sending text messages to certain phone numbers when an alert fires (Note that I do not need the phone call to actually say anything, just call the person). We are trying to solve the problem that Amazon alerts are only useful if people are checking their email, which isnt always the case because all server problems love to happen at 4am on saturday... Please respond with any possible solutions/ideas, ideally I do not want to implement an entire monitoring system (IE: Nagios) on top of everything to handle this.

Read the article

Lightweight monitoring for a Windows XP laptop

- by kazanaki

Hello I have a windows XP laptop in a remote location. I would like to have an overview for CPU/Memory statistics from a remote location. Monitoring a specific service (a Tomcat instance) would be nice but not essential. I have seen the monitoring solutions (Nagios, cacti e.t.c) and they are all very heavy. I do not want to install mysql, web server and other stuff like that on the laptop. I don't even need a web solution at all. It could just be a simple command line app with a server port and on my machine another GUI application would connect there (and not a web browser) Is there something like this available?

Read the article

Disk space mismatch on OS X Server (Leopard)

- by John Gardeniers

My Nagios system sent me an alert to inform me that the disk space on one of the drives on our OS X server is very low. When I run df /Volumes/Apps/ I get /dev/disk0s3 117209520 114932472 2277048 99% /Volumes/Apps When I run du -c /Volumes/Apps it reports 11489944 total Why might there be such a vast difference? Even more importantly, how do I find the problem and what can I do about it? I'm essentially just a Windows admin, so am well out of my comfort zone here. I use a Mac but I'm not a Mac admin in any real sense of the word.

Read the article

Android software for the system administrator on the move

- by GruffTech

My company has over service through Verizon, and AT&T Service in the area is "shoddy" at its best, so I haven't been able to join the "iPhone party" like so many of my fellow system administrators have been able to. That being said, this week finally a phone I like has hit Verizon, the HTC Incredible. (I've been waiting for the Desire or Nexus One, but after seeing spec sheets and reviews, HTC Incredible comes out ahead anyway). So (finally) I'm looking for Android Apps that are "gotta-haves" for system administrators. I've found the bottom three. If there are others you prefer over these let me know. RDP Program - RemoteRDP SSH Client - ConnectBot Nagios - NagMonDroid Reply with your favorite Android App and why!

Read the article

CLI-Based monitoring tool for KVM

- by Pinnacle

I am developing a scheduler for running VMs on KVM. The scheduling has over-commitment of resources like memory and CPU. For this, I need a CLI-based monitoring tool that keeps me giving information about the resource usage of each VM, because it might be the case that due to over-provisioning of resources, VMs on a particular host are running very slowly depending on the benchmarks/programs each VM is running, and then I need to migrate a VM to another host and so on. I looked into libvirt-based tools like collects, MUNIN, Nagios-vert, etc.( http://libvirt.org/apps.html#monitoring ) I also looked into Ubuntu utility perf-kvm ( http://manpages.ubuntu.com/manpages/maverick/man1/perf-kvm.1.html ) I want to ask which CLI-based would be recommended by the community so that I can make a automated scheduler that takes care of the above situation.

Read the article

What are best monitoring tool customizable for cluster / distributed system?

- by Adil

I am working on a system having multiple servers. I am interested in monitoring some server specific data like CPU/memory usage, disk/filesystem usage, network traffic, system load etc. and some other my process specific data. What are available open source that can serve my purpose? If it provides to customize the parameter to be monitored and monitor your own data by creating plugin / agent. Any suggestions? I heard of Nagios, Zabbix and Pandora but not sure if they provide such interface.

Read the article

Restarting rsyslog re-sends logs again

- by Jay Taylor

I am running Ubuntu 12.04.1 LTS on EC2. I have a bunch of application servers which are configured to forward their logs to a central server via rsyslog. Since putting in Nagios monitoring on the log files on the central server, I've been getting alerts indicating that particular application servers are failing to forward their logs to the centralized server. Logging into the machines and restarting the rsyslog service fixes the problem. However, rsyslog then re-transmits the logs again, resulting in duplicates on the collector. Why is it doing this?

Read the article

Monitoring several remote servers over different VPNs

- by Ciaran

I'm a developer with about 20 different clients running our server application. I access each of the clients' servers remotely through VPN to provide support, updates, etc. Is there any tool available that I can set up locally that will connect through each of the VPNs automatically to allow me to monitor? The idea sounds very far fetched to me as the VPN software varies a good bit but maybe someone's had to do something similar before? It's been a few years since I last used Nagios but I think it'd be quite cool to have that set up pointing at each of the remote servers through VPN somehow.

Read the article

Server monitoring for medium scale UNIX network

- by nbartolomeo

I'm looking for suggestions for a good monitoring tools, or tools, to handle a mixed Linux (RedHat 4-5) and HPUX environment. Currently we are using Hobbit which is working reasonably well but it is becoming harder to keep track of what alerts are sent out for what servers. Features I'd like to see: Easy configuration of servers. The ability to monitor CPU, network, memory, and specific processes I've looked into Nagios but from what I have seen it won't be easy to set up the configuration for all of our servers ~200 and that without installing a plugin into each agent I won't be able to monitor processes.

Read the article

Network monitoring library, or objects, for a cloud

- by Andrew Smith

I am looking for library to support server / switch monitoring, to actually be able to check with the device if it's working OK. However this requires some sort of auto-detection and device support. Basically I need to automatically detect a new device, start monitoring it like CPU and PING. So how do I auto-detect the machine remotely, this is something I need library for. Rackspace has something like this - "Cloud Monitoring API". But is there anything opensource which can be used same way? The Nagios and others doesnt have such API, and the big and expensive systems are too big to handle in public cloud, so there must be some other network monitoring engine with API, which can add a new servers automatically and support user isolation for example so I dont see other servers except mine.

Read the article

How to manage configuration software installations of non-domain Windows XP machines?

- by Digi

I have a large set of unattended Windows XP machines who are not connected to a domain or even to each other. I am struggling to find any tools out there that I can use to deal with them in one application. I am hoping to find software that I can perhaps install a client on each machine, then have it essentially proxy out configuration information and possibly commands (install, uninstall, stop service, etc) across the whole network. The closest I've come is Nagios and its client, but it cannot be used to push files through and run commands remotely. Any suggestions?

Read the article

Debian Wheezy: installing from sources or repositories? upgrading to new software release?

- by user269842

a. I'm wondering for some software if it is wiser to install them from sources or from official repositories when available like: glpi inventory fusion inventory monitoring tools like nagios I tried both for glpi: compiled from sources and installing from repositories. I also installed zabbix from sources. b. What about new software releases providing enhancements: is it better to keep the release installed from the repositories /compiled or is their a 'best practice' like downloading the new software release and compiling it again (I really have no clue)? Could someone make it more clear for me? Thanks!

Read the article

Graphing/charting of CPU utilisation [on hold]

- by Peter

So nagios can be good at graphing particular resource utilisation or other metrics, but I'm looking for a tool that can output a chart or other graphical representation of how much CPU time/CPU utilisation /all/ services on a server are currently consuming. I think New Relic could probably achieve this to an extent, but I was wondering if there was a popular open source app used for this. In case I am explaining this in a bad way, my actual problem is that I have a shared server with suexec enabled (ie. httpd cgi running under multiple user accounts). I'd like to know which users are using the most CPU during periods of the day.

Read the article

Enterprise Level Monitoring Solution

- by Garthmeister J.

My company is currently looking to replace our current solution used for monitoring our web-based enterprise solutions for both up-time and performance. Please note this is not intended to be a network monitoring-type solution (internally we currently use Nagios). If anyone has a provider that they have had a positive experience with, it would be much appreciated. Here is a list of our requirements: • Must have a large number of probes/agents around the globe to be representative of our customer base • Must have a flexible scripting capability to automate multi-step user actions • 24 hour a day monitoring • Flexible alerting system • Report generation capability • Mimic browser specific monitoring (optional, not a must-have)

Read the article

Windows 2008 Server network issues

- by Snowflow

I have this one server that just doesn't want to be on the internet It's a new server, a twinblade, the other twin works, but not this one. It can connect fine to everythign else in the LAN, but cannot go out on the net It can be reached by ICMP requests over the net (the nagios server can probe it, but not ping it for instance), but not TCP Everything seems fine both in firewall and machine, i get no issues. Anyone care to help me out where i can start looking, i'm seriously confused. edit: it can ping gateway and through the sonicwall site to site VPN, it\s also able to resolve DNS. the only thing it can`t do is reach anything outside of LAN/VPN

Read the article

Too many files open issue (in CentOS)

- by Ram

Recently I ran into this issue in one of our production machines. The actual issue from PHP looked like this: fopen(dberror_20110308.txt): failed to open stream: Too many open files I am running LAMP stack along with memcache in this machine. I also run a couple of Java applications in this machine. While I did increase the limit on the number of files that can be opened to 10000 (from 1024), I would really like to know if there is an easy way to track this (# of files open at any moment) as a metric. I know lsof is a command which will list the file descriptors opened by processes. Wondering if there is any other better (in terms of report) way of tracking this using say, nagios.

Read the article

Newbie, deciding Python or Erlang

- by Joe

Hi Guys, I'm a Administrator (unix, Linux and some windows apps such as Exchange) by experience and have never worked on any programming language besides C# and scripting on Bash and lately on powershell. I'm starting out as a service provider and using multiple network/server monitoring tools based on open source (nagios, opennms etc) in order to monitor them. At this moment, being inspired by a design that I came up with, to do more than what is available with the open source at this time, I would like to start programming and test some of these ideas. The requirement is that a server software that captures a stream of data and store them in a database(CouchDB or MongoDB preferably) and the client side (agent installed on a server) would be sending this stream of data on a schedule of every 10 minutes or so. For these two core ideas, I have been reading about Python and Erlang besides ruby. I do plan to use either Amazon or Rackspace where the server platform would run. This gives me the scalability needed when we have more customers with many servers. For that reason alone, I thought Erlang was a better fit(I could be totally wrong, new to this game) and I understand that Erlang has limited support in some ways compared to Ruby or Python. But also I'm totally new to the programming realm of things and any advise would be appreciated grately. Jo

Read the article

The Mystery of the Vanishing Disk Space

- by Oddthinking

My disk space is dwindling by about 2GB a day! I only have a few more days before I run out of space. $ df -h Filesystem Size Used Avail Use% Mounted on /dev/sda4 143G 126G 11G 93% / udev 491M 4.0K 491M 1% /dev tmpfs 200M 696K 199M 1% /run none 5.0M 0 5.0M 0% /run/lock none 499M 144K 499M 1% /run/shm /dev/sda2 1.9G 580M 1.2G 33% /tmp /dev/sda1 92M 29M 58M 33% /boot I have been searching for the biggest directories/log files, deleting and compressing. But I am still losing the war. Finally, I realised I have a big misunderstanding: julian@server1:~$ sudo du -h / | tail -n 1 16G / All of my files in / only add up to 16 GB. That leaves 110 GB unaccounted for! Clearly I have a misunderstanding: I thought the '/dev/sda4' line represented all the files visible from '/'. What should I be reading to understand where the other storage has gone? More details: I have an Ubuntu 11.10 server, that was set-up by data-center staff. It is running my own code (which is fairly prolific with log files, but otherwise doesn't store much stuff on the drive) duplicity for backups (which tends to store a lot of signature files) various other standard services, like Apache, nagios, etc. They are very lightly used. It has been up for about 4 months without a reboot. I lied about the du output (simplified it for effect). It also complained about not being able to access GVFS and the du processes's own resources. I believe they are irrelevant: . du: cannot access `/home/julian/.gvfs': Permission denied du: cannot access `/proc/10841/task/10841/fd/4': No such file or directory du: cannot access `/proc/10841/task/10841/fdinfo/4': No such file or directory du: cannot access `/proc/10841/fd/4': No such file or directory du: cannot access `/proc/10841/fdinfo/4': No such file or directory

Read the article

Synchronizing 3 servers over IP

- by user93078

I'm setting up a medical server for a hospital that has doctors located in 3 different locations, meaning there would be 3 servers (1 in each location). All 3 servers would just have the following software: Ubuntu Server 12.04 minimal MySQL, PHP 5, Apache The medical software which would read/write to the MySQL database Remote admin apps like Nagios & Webmin Rsync for backup (rsync-over-ssh) as a cron job and the doctors at each location would access patient & billing data from their respective servers. What I'd like is, that each of these servers all have synchronized info (especially the mySQL database's) - let's say on an hourly basis each of these servers synchronize data to a common remote server and the data is then brought down to each of the servers. I know an easier way would be to have the medical app running on a remote web server, but since this is medical that we're talking about and knowing how common it is in our area for the net to go gown, I wouldn't like a web based scenatio. Is such a setup possible? Would this be the right way to do things or is there a better way to this? Would really appreciate views and comments (or how to set this up) on this.

Read the article

Is Eclipse Remote System Explorer broken on Windows?

- by Kev

I have the following setup on Windows 7 Ultimate x64: Eclipse Indigo 2.7.2 (Build: M20120208-0800) Remote System Explorer 3.3.2 (see screenshot) (Oracle/Sun) Java 1.6 Update 31 (x86) Despite all my best efforts I am unable to connect to a remote system (a Centos 5.6 server on my local LAN) using a Remote System Explorer SSH connection - I've tried both password authentication and using my SSH private key. Here is a screenshot of both the Eclipse error dialogue and what is logged in my /var/log/secure log file: /var/log/secure: Apr 1 12:00:21 nagios sshd[6176]: Received disconnect from 172.16.3.88: 3: com.jcraft.jsch.JSchException: Auth fail When I connect for the first time I do get prompted to verify the authenticity of the remote host and the RSA key fingerprint. But that's as far as things go. Performing the same operation with the same credentials on my Fedora Core 16 box (also running the same version of Eclipse and Java) to the same server is successful. This leads me to believe that RSE SSH support on Windows is either broken or there's some piece of the SSH-on-Windows puzzle I'm missing. Is this the case?

Search Results

Search found 334 results on 14 pages for 'nagios'.

Page 12/14 | < Previous Page | 8 9 10 11 12 13 14 | Next Page >

- by BLAKE

- by Mark L

- by Maxim Veksler

- by jldugger

- by dunxd

- by Matt

- by kazanaki

- by John Gardeniers

- by GruffTech

- by Pinnacle

- by Adil

- by Jay Taylor

- by Ciaran

- by nbartolomeo

- by Andrew Smith

- by Digi

- by user269842

- by Peter

- by Garthmeister J.

- by Snowflow

- by Ram

- by Joe

- by Oddthinking

- by user93078

- by Kev

< Previous Page | 8 9 10 11 12 13 14 | Next Page >