Search Results

Search found 4268 results on 171 pages for 'srikanth vm'.

  • Determining cause of high NFS/IO utilization without iotop

    - by Matt
    I have a server that is doing an NFSv4 export for users' home directories. There are roughly 25 users (mostly developers/analysts) and about 40 servers mounting the home directory export. Performance is miserable, with users often seeing multi-second lags for simple commands (like ls, or writing a small text file). Sometimes the home directory mount completely hangs for minutes, with users getting "permission denied" errors. The hardware is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 2.5” 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. The RAID controller is a Dell PERC H700 w/512MB cache (Linux sees this as an LSI MegaSAS 9260). The OS is CentOS 5.6; the home directory partition is ext3, with options “rw,data=journal,usrquota”. I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for the home directories. What I find curious, and suspicious, is that the sda device often has very high utilization, even though it only contains the OS. I would expect this virtual drive to be idle almost all the time. The system is not swapping, according to "free" and "vmstat". Why would there be major load on this device?

    Here is a 30-second snapshot from iostat:

        Time: 09:37:28 AM
        Device:   rrqm/s   wrqm/s     r/s      w/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await  svctm  %util
        sda         0.00    44.09    0.03   107.76      0.13    607.40    11.27     0.89    8.27   7.27  78.35
        sda1        0.00     0.00    0.00     0.00      0.00      0.00     0.00     0.00    0.00   0.00   0.00
        sda2        0.00    44.09    0.03   107.76      0.13    607.40    11.27     0.89    8.27   7.27  78.35
        sdb         0.00  2616.53    0.67   157.88      2.80  11098.83   140.04     8.57   54.08   4.21  66.68
        sdb1        0.00  2616.53    0.67   157.88      2.80  11098.83   140.04     8.57   54.08   4.21  66.68
        dm-0        0.00     0.00    0.03   151.82      0.13    607.26     8.00     1.25    8.23   5.16  78.35
        dm-1        0.00     0.00    0.00     0.00      0.00      0.00     0.00     0.00    0.00   0.00   0.00
        dm-2        0.00     0.00    0.67  2774.84      2.80  11099.37     8.00   474.30  170.89   0.24  66.84
        dm-3        0.00     0.00    0.67  2774.84      2.80  11099.37     8.00   474.30  170.89   0.24  66.84

    It looks like iotop is the ideal tool to sniff out these kinds of issues, but I'm on CentOS 5.6, which doesn't have a new enough kernel to support that program. I looked at "Determining which process is causing heavy disk I/O?", and besides iotop, one of the suggestions was to run "echo 1 > /proc/sys/vm/block_dump". I did that (after directing kernel messages to tmpfs). In about 13 minutes I had about 700k reads or writes, roughly half from kjournald and the other half from nfsd:

        # egrep " kernel: .*(READ|WRITE)" messages | wc -l
        768439
        # egrep " kernel: kjournald.*(READ|WRITE)" messages | wc -l
        403615
        # egrep " kernel: nfsd.*(READ|WRITE)" messages | wc -l
        314028

    For what it's worth, for the last hour, utilization has constantly been over 90% for the home directory drive.
    My 30-second iostat keeps showing output like this:

        Time: 09:36:30 PM
        Device:   rrqm/s   wrqm/s     r/s      w/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await  svctm  %util
        sda         0.00     6.46    0.20    11.33      0.80     71.71    12.58     0.24   20.53  14.37  16.56
        sda1        0.00     0.00    0.00     0.00      0.00      0.00     0.00     0.00    0.00   0.00   0.00
        sda2        0.00     6.46    0.20    11.33      0.80     71.71    12.58     0.24   20.53  14.37  16.56
        sdb       137.29     7.00  549.92     3.80  22817.19     43.19    82.57     3.02    5.45   1.74  96.32
        sdb1      137.29     7.00  549.92     3.80  22817.19     43.19    82.57     3.02    5.45   1.74  96.32
        dm-0        0.00     0.00    0.20    17.76      0.80     71.04     8.00     0.38   21.21   9.22  16.57
        dm-1        0.00     0.00    0.00     0.00      0.00      0.00     0.00     0.00    0.00   0.00   0.00
        dm-2        0.00     0.00  687.47    10.80  22817.19     43.19    65.48     4.62    6.61   1.43  99.81
        dm-3        0.00     0.00  687.47    10.80  22817.19     43.19    65.48     4.62    6.61   1.43  99.82
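
    The per-process egrep counts above can also be produced in a single pass, grouped by device as well, which might help show where each process's I/O lands. This is only a minimal Python sketch: the exact shape of a block_dump message is an assumption (the post does not show one), and the sample lines, host name and function name are made up for illustration.

```python
import re
from collections import Counter

# Assumed message shape (not shown in the post):
#   "... kernel: <comm>(<pid>): READ|WRITE block <n> on <device>"
EVENT_RE = re.compile(
    r"kernel: (?P<comm>.+?)\((?P<pid>\d+)\): "
    r"(?P<op>READ|WRITE) block \d+ on (?P<dev>\S+)"
)


def tally_block_dump(lines):
    """Count READ/WRITE events per (process name, operation, device)."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            key = (match.group("comm"), match.group("op"), match.group("dev"))
            counts[key] += 1
    return counts


# Hypothetical sample lines, made up for illustration only.
SAMPLE = [
    "Jun  1 09:40:01 nfs1 kernel: kjournald(913): WRITE block 4120 on dm-0",
    "Jun  1 09:40:01 nfs1 kernel: nfsd(2765): WRITE block 88123 on dm-2",
    "Jun  1 09:40:02 nfs1 kernel: kjournald(913): WRITE block 4128 on dm-0",
    "Jun  1 09:40:02 nfs1 kernel: nfsd(2766): READ block 90211 on dm-2",
    "Jun  1 09:40:03 nfs1 kernel: sshd(4021): dirtied inode 1234 (lastlog) on dm-0",
]

if __name__ == "__main__":
    # Print the busiest (process, operation, device) combinations first.
    for (comm, op, dev), n in tally_block_dump(SAMPLE).most_common():
        print(f"{n:8d}  {comm:12s} {op:5s} {dev}")
```

    Against a real log, the same function could be fed an open file object (for example `tally_block_dump(open("messages"))`), since it only iterates over lines.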

  • xen debian: domU can't get out side

    - by iftol
    Hi everybody. I'm a trainee sysadmin, and this is my first experience with virtualization. I have a server running Debian with Xen 3 and two physical interfaces: eth0 for the local network (10.0.0.1) and eth1 for the internet (194.X.X.4). I created 3 VMs (web server, mail server and database server) with local IP addresses 172.10.0.x/24. The first problem I had was that the domU couldn't ping dom0. I asked our ISP's sysadmin, and he said that he had forgotten to set up the bridging, so he created a bridge with 172.10.0.1/24. After that I was able to ping the real server (194.X.X.4), but I can't reach the outside from my VMs. How can I fix this issue?

    Real (physical) server, ifconfig:

        eth0      Link encap:Ethernet  HWaddr 23:26:34:84:ce:xe
                  inet adr:10.1.3.12  Bcast:10.1.3.255  Masque:255.255.255.0
                  adr inet6: fe80::226:b9ff:fe84:ceb4/64 Scope:Lien
                  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                  RX packets:412006 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:411296 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 lg file transmission:1000
                  RX bytes:31410957 (29.9 MiB)  TX bytes:31178370 (29.7 MiB)
                  Interruption:36 Mémoire:d6000000-d6012100

        eth1      Link encap:Ethernet  HWaddr 23:26:34:84:ce:xe
                  inet adr:194.x.x.4  Bcast:194.254.167.255  Masque:255.255.255.0
                  adr inet6: fe80::226:b9ff:fe84:ceb6/64 Scope:Lien
                  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                  RX packets:25872332 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:414578 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 lg file transmission:0
                  RX bytes:2642278343 (2.4 GiB)  TX bytes:35436775 (33.7 MiB)

        lo        Link encap:Boucle locale
                  inet adr:127.0.0.1  Masque:255.0.0.0
                  adr inet6: ::1/128 Scope:Hôte
                  UP LOOPBACK RUNNING  MTU:16436  Metric:1
                  RX packets:1308073 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:1308073 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 lg file transmission:0
                  RX bytes:109871395 (104.7 MiB)  TX bytes:109871395 (104.7 MiB)

        peth1     Link encap:Ethernet  HWaddr 23:26:34:84:ce:xe
                  UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
                  RX packets:31818694 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:414818 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 lg file transmission:1000
                  RX bytes:5197318822 (4.8 GiB)  TX bytes:37904897 (36.1 MiB)
                  Interruption:48 Mémoire:d8000000-d8012100

        vif281.0  Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
                  adr inet6: fe80::fcff:ffff:feff:ffff/64 Scope:Lien
                  UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
                  RX packets:207 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:298 errors:0 dropped:2 overruns:0 carrier:0
                  collisions:0 lg file transmission:32
                  RX bytes:24629 (24.0 KiB)  TX bytes:28404 (27.7 KiB)

        vif281.1  Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
                  adr inet6: fe80::fcff:ffff:feff:ffff/64 Scope:Lien
                  UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
                  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:45 errors:0 dropped:47063 overruns:0 carrier:0
                  collisions:0 lg file transmission:32
                  RX bytes:0 (0.0 B)  TX bytes:4449 (4.3 KiB)

        vif282.0  Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
                  adr inet6: fe80::fcff:ffff:feff:ffff/64 Scope:Lien
                  UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
                  RX packets:78 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:13 errors:0 dropped:1 overruns:0 carrier:0
                  collisions:0 lg file transmission:32
                  RX bytes:5041 (4.9 KiB)  TX bytes:714 (714.0 B)

        xenbr0    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
                  inet adr:172.10.0.1  Bcast:172.10.0.255  Masque:255.255.255.0
                  adr inet6: fe80::5c72:c6ff:fe49:7fe/64 Scope:Lien
                  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                  RX packets:7180 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:8615 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 lg file transmission:0
                  RX bytes:756804 (739.0 KiB)  TX bytes:791206 (772.6 KiB)

    brctl show:

        bridge name     bridge id               STP enabled     interfaces
        eth1            8000.0026b984ceb6       no              peth1
                                                                vif281.1
        xenbr0          8000.feffffffffff       no              vif281.0
                                                                vif282.0

    network-multi-bridge:

        /etc/xen/scripts/network-virtual start vifnum="0" bridgeip="172.10.0.1/24" brnet="172.10.0.0/24"

    VM webserver, ifconfig:

        eth0      Link encap:Ethernet  HWaddr 00:16:3E:42:33:70
                  inet addr:172.10.0.2  Bcast:172.10.0.255  Mask:255.255.255.0
                  inet6 addr: fe80::216:3eff:fe42:3370/64 Scope:Link
                  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                  RX packets:3 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:27 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 txqueuelen:1000
                  RX bytes:126 (126.0 b)  TX bytes:2036 (1.9 KiB)

        lo        Link encap:Local Loopback
                  inet addr:127.0.0.1  Mask:255.0.0.0
                  inet6 addr: ::1/128 Scope:Host
                  UP LOOPBACK RUNNING  MTU:16436  Metric:1
                  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 txqueuelen:0
                  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

    Thank you for your help.
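
    One detail in the addressing above may be worth checking: the guests use 172.10.0.x, while the private block reserved by RFC 1918 in that region is 172.16.0.0/12. The Python sketch below only tests addresses against the three RFC 1918 blocks. The helper name and the choice of addresses to test are illustrative, and whether this contributes to the connectivity problem is not established by the post; guests on a separate subnet would typically still need forwarding and NAT on dom0 to reach the internet.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the IPv4 address lies in one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in RFC1918_BLOCKS)


if __name__ == "__main__":
    # Addresses taken from the ifconfig output in the post.
    for label, addr in [
        ("dom0 eth0", "10.1.3.12"),
        ("xenbr0", "172.10.0.1"),
        ("webserver domU", "172.10.0.2"),
    ]:
        kind = "RFC 1918 private" if is_rfc1918(addr) else "not in an RFC 1918 block"
        print(f"{label:15s} {addr:12s} {kind}")
```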

  • Ubuntu 12.04 KVM running Ubuntu 12.04 with linux-image-virtual crash on boot

    - by D.Mill
    One of my VMs is stuck on "pause" in virsh. If I destroy and restart it, it will go to pause after a bit of time as "running". I can at best enter my username at login if I'm quick but it'll then shutdown. I don't know where to start with this so any help would be great!! I can access the VMs files via guestfish. the kern.log and syslog don't populate new lines. This is the last input I get from kern.log: Dec 13 11:21:08 soft201 kernel: imklog 5.8.6, log source = /proc/kmsg started. Dec 13 11:21:08 soft201 kernel: [ 0.000000] Initializing cgroup subsys cpuset Dec 13 11:21:08 soft201 kernel: [ 0.000000] Initializing cgroup subsys cpu Dec 13 11:21:08 soft201 kernel: [ 0.000000] Linux version 3.2.0-34-virtual (buildd@allspice) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #53-Ubuntu SMP Thu Nov 15 11:08:40 UTC 2012 (Ubuntu 3.2.0-34.53-virtual 3.2.33) Dec 13 11:21:08 soft201 kernel: [ 0.000000] Command line: root=UUID=61d48b48-a06a-48fb-842e-b38014086a93 ro quiet splash Dec 13 11:21:08 soft201 kernel: [ 0.000000] KERNEL supported cpus: Dec 13 11:21:08 soft201 kernel: [ 0.000000] Intel GenuineIntel Dec 13 11:21:08 soft201 kernel: [ 0.000000] AMD AuthenticAMD Dec 13 11:21:08 soft201 kernel: [ 0.000000] Centaur CentaurHauls Dec 13 11:21:08 soft201 kernel: [ 0.000000] BIOS-provided physical RAM map: Dec 13 11:21:08 soft201 kernel: [ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) Dec 13 11:21:08 soft201 kernel: [ 0.000000] BIOS-e820: 0000000000100000 - 00000000dfffc000 (usable) Dec 13 11:21:08 soft201 kernel: [ 0.000000] BIOS-e820: 00000000dfffc000 - 00000000e0000000 (reserved) Dec 13 11:21:08 soft201 kernel: [ 0.000000] BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved) Dec 13 11:21:08 soft201 kernel: [ 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved) Dec 13 11:21:08 soft201 kernel: [ 0.000000] BIOS-e820: 0000000100000000 - 0000000a20000000 (usable) Dec 13 11:21:08 soft201 kernel: [ 0.000000] NX (Execute Disable) 
protection: active Dec 13 11:21:08 soft201 kernel: [ 0.000000] DMI 2.4 present. Dec 13 11:21:08 soft201 kernel: [ 0.000000] DMI: Bochs Bochs, BIOS Bochs 01/01/2007 Dec 13 11:21:08 soft201 kernel: [ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved) Dec 13 11:21:08 soft201 kernel: [ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable) Dec 13 As you can see the last line gets cut off. I don't even know if this is that relevant. dmesg logs are empty. The qemu log for the VM returns this: 2012-12-13 12:29:47.584+0000: starting up LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 40960 -smp 14,sockets=14,cores=1,threads=1 -name numerink201 -uuid f4a889ed-a089-05d0-cc9d-9825ab1faeba -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/numerink201.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -drive file=/var/lib/libvirt/images/client.soft.fr/tmpcZAD9U.qcow2,if=none,id=drive-ide0-0-0,format=qcow2 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -fsdev local,security_model=none,id=fsdev-fs0,path=/home/shared_folders/soft201 -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=hostshare,bus=pci.0,addr=0x5 -netdev tap,fd=18,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:00:1d:b9:e7,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 char device redirected to /dev/pts/3 qemu: terminating on signal 15 from pid 28248 2012-12-13 12:30:14.455+0000: shutting down I've added more logging, libvirt.log gives me this: 2012-12-13 13:24:38.525+0000: 27694: info : libvirt version: 0.9.8 2012-12-13 13:24:38.525+0000: 27694: error : virExecWithHook:328 : Cannot find 
'pm-is-supported' in path: No such file or directory 2012-12-13 13:24:38.525+0000: 27694: warning : qemuCapsInit:856 : Failed to get host power management capabilities 2012-12-13 13:24:39.865+0000: 27694: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory 2012-12-13 13:24:39.865+0000: 27694: warning : lxcCapsInit:77 : Failed to get host power management capabilities 2012-12-13 13:24:39.866+0000: 27694: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory 2012-12-13 13:24:39.866+0000: 27694: warning : umlCapsInit:87 : Failed to get host power management capabilities I don't really know where to go from here. I'll post whatever info you require

  • django, mod_wsgi, MySQL High CPU - Problems

    - by Red Rover
    I am having a problem with an OSQA site. It is Django/Apache/mod_wsgi configured site. Every hour, the CPU spikes to 164% (Average) for task HTTPD. After 10 minutes, it frees back up. I have reviewed the logs, cron tables, made many config changes, but cannot track this problem down. Can someone please look at the information below and let me know if it is a configuration problem, or if anyone else has experienced this issue. Running TOP shows HTTPD using 165% of CPU VMware performance monitor also displays spikes. This happens every hour for 10 minutes. I have the following information from server status Server Version: Apache/2.2.15 (Unix) DAV/2 mod_wsgi/3.2 Python/2.6.6 Server Built: Feb 7 2012 09:50:15 Current Time: Sunday, 10-Jun-2012 21:44:29 EDT Restart Time: Sunday, 10-Jun-2012 19:44:51 EDT Parent Server Generation: 0 Server uptime: 1 hour 59 minutes 37 seconds Total accesses: 1088 - Total Traffic: 11.5 MB CPU Usage: u80.26 s243.8 cu0 cs0 - 4.52% CPU load .152 requests/sec - 1682 B/second - 10.8 kB/request 4 requests currently being processed, 11 idle workers ....._..........__......W....................................... ...................................C._..._....._L__._L_._....... ...................... Scoreboard Key: "_" Waiting for Connection, "S" Starting up, "R" Reading Request, "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup, "C" Closing connection, "L" Logging, "G" Gracefully finishing, "I" Idle cleanup of worker, "." Open slot with no current process Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request 0-0 - 0/0/34 . 0.42 327 17 0.0 0.00 0.67 127.0.0.1 osqa.informs.org OPTIONS * HTTP/1.0 1-0 - 0/0/22 . 0.31 339 32 0.0 0.00 0.26 127.0.0.1 osqa.informs.org OPTIONS * HTTP/1.0 2-0 - 0/0/22 . 0.65 358 10 0.0 0.00 0.31 127.0.0.1 osqa.informs.org OPTIONS * HTTP/1.0 3-0 - 0/0/31 . 1.03 378 31 0.0 0.00 0.60 127.0.0.1 osqa.informs.org OPTIONS * HTTP/1.0 4-0 - 0/0/20 . 
0.45 356 9 0.0 0.00 0.31 127.0.0.1 osqa.informs.org OPTIONS * HTTP/1.0 5-0 18852 0/16/34 _ 0.98 27 18120 0.0 0.37 0.62 69.180.250.36 osqa.informs.org GET /questions/289/what-is-the-difference-between-operations-re 6-0 - 0/0/32 . 0.94 309 29 0.0 0.00 0.64 127.0.0.1 osqa.informs.org OPTIONS * HTTP/1.0 7-0 - 0/0/31 . 1.15 382 32 0.0 0.00 0.75 127.0.0.1 osqa.informs.org OPTIONS * HTTP/1.0 8-0 - 0/0/21 . 0.28 403 19 0.0 0.00 0.20 127.0.0.1 osqa.informs.org OPTIONS * HTTP/1.0 9-0 - 0/0/32 . 1.37 288 16 0.0 0.00 0.60 127.0.0.1 osqa.informs.org OPTIONS * HTTP/1.0 10-0 - 0/0/33 . 1.72 383 16 0.0 0.00 0.40 127.0.0.1 osqa.informs.org OPTIONS * HTTP/1.0 I am running Django 1.3 This is a mod_wsgi configuration and copied is the wsgi.conf file: <IfModule !python_module> <IfModule !wsgi_module> LoadModule wsgi_module modules/mod_wsgi.so <IfModule wsgi_module> <Directory /var/www/osqa> Order allow,deny Allow from all #Deny from all </Directory> WSGISocketPrefix /var/run/wsgi WSGIPythonEggs /var/tmp WSGIDaemonProcess OSQA maximum-requests=10000 WSGIProcessGroup OSQA Alias /admin_media/ /usr/lib/python2.6/site-packages/Django-1.2.5-py2.6.egg/django/contrib/admin/media/ Alias /m/ /var/www/osqa/forum/skins/ Alias /upfiles/ /var/www/osqa/forum/upfiles/ <Directory /var/www/osqa/forum/skins> Order allow,deny Allow from all </Directory> WSGIScriptAlias / /var/www/osqa/osqa.wsgi </IfModule> </IfModule> </IfModule> This is the httpd.conf file Timeout 120 KeepAlive Off MaxKeepAliveRequests 100 MaxKeepAliveRequests 400 KeepAliveTimeout 3 <IfModule prefork.c> Startservers 15 MinSpareServers 10 MaxSpareServers 20 ServerLimit 50 MaxClients 50 MaxRequestsPerChild 0 </IfModule> <IfModule worker.c> StartServers 4 MaxClients 150 MinSpareThreads 25 MaxSpareThreads 75 ThreadsPerChild 25 MaxRequestsPerChild 0 </IfModule> We are using MySQL The server is an ESX4i, configured for the VM to use 4 CPUs and 8 GB Ram. Hyper threading is enabled, 2 physical CPU's, with 4 Logical. 
the CPU are Intel Xeon 2.8 GHz. Total memory is 12GB

  • Tomcat dying silently on regular basis

    - by Hendrik
    My tomcat (6.0.32, Java Sun 1.6.0_22-b04 on Ubuntu 10.04) keeps crashing multiple times daily without any specific output in catalina.out. This usually happens on high load (see top output). Update: The pid-file is properly removed when this happens. Update 2: No CATALINA_OPTS set, _JAVA_OPTS are: export _JAVA_OPTIONS="-Xms128m -Xmx1024m -XX:MaxPermSize=512m \ -XX:MinHeapFreeRatio=20 \ -XX:MaxHeapFreeRatio=40 \ -XX:NewSize=10m \ -XX:MaxNewSize=10m \ -XX:SurvivorRatio=6 \ -XX:TargetSurvivorRatio=80 \ -XX:+CMSClassUnloadingEnabled \ -Djava.awt.headless=true \ -Dcom.sun.management.jmxremote \ -Dcom.sun.management.jmxremote.port=37331 \ -Dcom.sun.management.jmxremote.ssl=false \ -Dcom.sun.management.jmxremote.authenticate=true \ -Djava.rmi.server.hostname=(myhostname) \ -Dcom.sun.management.jmxremote.password.file=/etc/java-6-sun/management/jmxremote.password \ -Dcom.sun.management.jmxremote.access.file=/etc/java-6-sun/management/jmxremote.access" Top: top - 12:40:03 up 9 days, 12:15, 3 users, load average: 30.00, 22.39, 21.91 Tasks: 89 total, 4 running, 85 sleeping, 0 stopped, 0 zombie Cpu(s): 53.2%us, 9.7%sy, 0.0%ni, 34.7%id, 1.5%wa, 0.0%hi, 0.8%si, 0.0%st Mem: 4194304k total, 3311304k used, 883000k free, 0k buffers Swap: 4194304k total, 0k used, 4194304k free, 0k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 25850 tomcat6 20 0 1981m 1.2g 11m S 161 29.6 11:41.56 java 12632 mysql 20 0 393m 97m 4452 S 141 2.4 1690:05 mysqld 14932 nobody 20 0 253m 44m 9152 R 56 1.1 3:26.57 php-cgi 7011 nobody 20 0 241m 31m 9124 S 30 0.8 1:35.96 php-cgi 10093 nobody 20 0 228m 18m 8520 S 25 0.5 2:29.97 php-cgi 27071 nobody 20 0 237m 28m 8640 S 11 0.7 3:13.72 php-cgi 3306 nobody 20 0 227m 16m 6736 R 7 0.4 2:29.83 php-cgi 7756 nobody 20 0 261m 58m 15m R 5 1.4 2:22.33 php-cgi 7129 www-data 20 0 3646m 7228 1896 S 2 0.2 0:36.65 nginx 2657 nobody 20 0 228m 18m 8540 S 1 0.5 1:59.51 php-cgi 7131 www-data 20 0 3645m 6464 1960 S 1 0.2 0:34.13 nginx 7140 www-data 20 0 3652m 12m 
1896 S 1 0.3 0:35.80 nginx 619 nobody 20 0 231m 29m 15m S 0 0.7 2:33.46 php-cgi 16552 nobody 20 0 250m 41m 8784 S 0 1.0 2:48.12 php-cgi 17134 nobody 20 0 239m 37m 16m S 0 0.9 2:32.86 php-cgi 21004 nobody 20 0 243m 34m 8700 S 0 0.8 1:19.85 php-cgi 26105 root 20 0 19220 1392 1060 R 0 0.0 0:00.82 top 32430 nobody 20 0 256m 47m 9196 S 0 1.2 2:19.01 php-cgi 314 nobody 20 0 256m 47m 8804 S 0 1.1 1:46.00 php-cgi 2111 nobody 20 0 253m 44m 9196 S 0 1.1 3:01.14 php-cgi 2142 root 20 0 26452 2564 868 S 0 0.1 0:00.56 screen 2144 root 20 0 19484 2012 1368 S 0 0.0 0:00.00 bash 2333 nobody 20 0 249m 41m 9160 S 0 1.0 1:10.33 php-cgi 2552 root 20 0 19484 2260 1620 S 0 0.1 0:00.01 bash 2587 nobody 20 0 258m 49m 9192 S 0 1.2 2:04.50 php-cgi 2684 root 20 0 4092 652 540 S 0 0.0 0:00.00 xvfb-run 2696 root 20 0 60720 13m 2352 S 0 0.3 0:09.12 Xvfb 2759 root 20 0 617m 12m 4676 S 0 0.3 0:00.66 node 3514 nobody 20 0 270m 61m 9216 S 0 1.5 3:13.69 php-cgi 5270 root 20 0 25164 1324 1036 S 0 0.0 0:00.01 screen 5402 nobody 20 0 227m 16m 8032 S 0 0.4 1:33.61 php-cgi 5765 root 20 0 81180 3820 3028 S 0 0.1 0:00.31 sshd 5798 nobody 20 0 242m 32m 9124 S 0 0.8 1:52.08 php-cgi 5856 root 20 0 19496 2292 1636 S 0 0.1 0:00.03 bash 6442 root 20 0 62332 20m 1960 S 0 0.5 0:30.58 mrtg 7082 root 20 0 88992 1916 1636 S 0 0.0 0:00.00 PassengerWatchd I can't find any concrete reason for it, no Exceptions or messages of a shutdown in catalina.out (and no other logs in tomcat's log dir). I can start up the service and it will run for a few days or just minutes before dying again. Is there somewhere else i could look for output? Could the kernel start killing threads due to a lack of ressources and by that bring the VM down?
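
    On the closing question: the Linux OOM killer terminates whole processes and writes a message to the kernel log when it does, so the kernel log (dmesg, /var/log/kern.log or syslog) is another place to look. The Python sketch below scans log lines for such messages. The regular expression is an assumption about the wording, which differs between kernel versions, and the sample lines are invented for illustration; they are not taken from the poster's system.

```python
import re

# Assumption: an OOM kill is reported with wording like
# "Killed process <pid> (<name>)"; the exact text varies by kernel version.
OOM_RE = re.compile(r"killed process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(lines):
    """Return (pid, process name) for every line that looks like an OOM kill."""
    hits = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "kernel: [823456.1] Out of memory: Kill process 25850 (java) score 301 or sacrifice child",
    "kernel: [823456.2] Killed process 25850 (java) total-vm:2028544kB, anon-rss:1258291kB",
    "kernel: [823457.0] eth0: link up",
]

if __name__ == "__main__":
    for pid, name in find_oom_kills(SAMPLE_LOG):
        print(f"OOM killer ended {name} (pid {pid})")
```

    An empty result does not rule out other causes; it only means no line matched the assumed wording.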

  • GNU/Linux swapping blocks system

    - by Ole Tange
    I have used GNU/Linux on systems from 4 MB RAM to 512 GB RAM. When they start swapping, most of the time you can still log in and kill off the offending process - you just have to be 100-1000 times more patient. On my new 32 GB system that has changed: it blocks when it starts swapping, sometimes with full disk activity but other times with no disk activity. To examine what might be the issue I have written this program. The idea is:

        1. grab 3% of the memory free right now
        2. if that caused swap to increase: stop
        3. keep the chunk used for 30 seconds by forking off
        4. goto 1

        #!/usr/bin/perl

        # Free memory in kB, from the "-/+ buffers/cache" line of free.
        sub freekb {
            my $free = `free|grep buffers/cache`;
            my @a = split / +/, $free;
            return $a[3];
        }

        # Swap in use in kB, from the "Swap:" line of free.
        sub swapkb {
            my $swap = `free|grep Swap:`;
            my @a = split / +/, $swap;
            return $a[2];
        }

        my $swap = swapkb();
        my $lastswap = $swap;
        my $free;
        while ($lastswap >= $swap) {
            print "$swap $free";
            $lastswap = $swap;
            $swap = swapkb();
            $free = freekb();
            # Allocate 3% of the memory that is currently free.
            my $used_mem = "x" x (1024 * $free * 0.03);
            # The child holds on to the chunk for 30 seconds, then exits.
            if (not fork()) {
                sleep 30;
                exit();
            }
        }
        print "Swap increased $swap $lastswap\n";

    Running the program forever ought to keep the system at the limit of swapping, while only grabbing a minimal amount of swap and doing so very slowly (i.e. a few MB at a time at most). If I run:

        forever free | stdbuf -o0 timestamp > freelog

    I ought to see swap slowly rising every second. (forever and timestamp are from https://github.com/ole-tange/tangetools). But that is not the behaviour I see: I see swap increasing in jumps, and the system is completely blocked during these jumps. Here the system is blocked for 30 seconds while swap usage increases by 1 GB:

        secs
        169.527 Swap: 18440184 154184 18286000
        170.531 Swap: 18440184 154184 18286000
        200.630 Swap: 18440184 1134240 17305944
        210.259 Swap: 18440184 1076228 17363956

    Blocked: 21 secs. Swap increase 2400 MB:

        307.773 Swap: 18440184 581324 17858860
        308.799 Swap: 18440184 597676 17842508
        330.103 Swap: 18440184 2503020 15937164
        331.106 Swap: 18440184 2502936 15937248

    Blocked: 20 secs. Swap increase 2200 MB:

        751.283 Swap: 18440184 885288 17554896
        752.286 Swap: 18440184 911676 17528508
        772.331 Swap: 18440184 3193532 15246652
        773.333 Swap: 18440184 1404540 17035644

    Blocked: 37 secs. Swap increase 2400 MB:

        904.068 Swap: 18440184 613108 17827076
        905.072 Swap: 18440184 610368 17829816
        942.424 Swap: 18440184 3014668 15425516
        942.610 Swap: 18440184 2073580 16366604

    This is bad enough, but what is even worse is that the system sometimes stops responding at all - even if I wait for hours. I have the feeling it is related to the swapping issue, but I cannot tell for sure. My first idea was to tweak /proc/sys/vm/swappiness from 60 to 0 or 100, just to see if that had any effect at all. 0 did not have an effect, but 100 did cause the problem to arise less often. How can I prevent the system from blocking for such a long time? Why does it decide to swap out 1-3 GB when less than 10 MB would suffice?
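
    The stalls in a freelog file like the one above can be picked out mechanically: any gap between consecutive timestamps that is much longer than the expected one second marks a period where the logger itself was blocked. A minimal Python sketch follows, assuming each line has the form "seconds Swap: total used free" as in the excerpt; the function names and the 5-second threshold are arbitrary illustrative choices.

```python
def parse_freelog(text):
    """Extract (timestamp in seconds, swap used in kB) from freelog lines."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: <secs> Swap: <total> <used> <free>
        if len(fields) == 5 and fields[1] == "Swap:":
            samples.append((float(fields[0]), int(fields[3])))
    return samples


def find_stalls(samples, max_gap=5.0):
    """Return (gap in seconds, change in swap used in kB) for each long gap."""
    stalls = []
    for (t0, used0), (t1, used1) in zip(samples, samples[1:]):
        gap = t1 - t0
        if gap > max_gap:
            stalls.append((round(gap, 3), used1 - used0))
    return stalls


# First block of samples quoted in the post.
FREELOG = """\
169.527 Swap: 18440184 154184 18286000
170.531 Swap: 18440184 154184 18286000
200.630 Swap: 18440184 1134240 17305944
210.259 Swap: 18440184 1076228 17363956
"""

if __name__ == "__main__":
    for gap, delta_kb in find_stalls(parse_freelog(FREELOG)):
        print(f"blocked {gap:.1f} s, swap used changed by {delta_kb / 1024:.0f} MB")
```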

  • Unable to connect to OpenVPN server

    - by Incognito
    I'm trying to get a working setup of OpenVPN on my VM and authenticate into it from a client. I'm not sure but it looks to me like it's socket related, as it's not set to LISTEN, and localhost seems wrong. I've never set up VPN before. # netstat -tulpn | grep vpn Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name udp 0 0 127.0.0.1:1194 0.0.0.0:* 24059/openvpn I don't think this is set up correctly. Here's some detail into what I've done. I have a VPS from MediaTemple: These are my interfaces before starting openvpn: lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:39482 errors:0 dropped:0 overruns:0 frame:0 TX packets:39482 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:3237452 (3.2 MB) TX bytes:3237452 (3.2 MB) venet0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 inet addr:127.0.0.1 P-t-P:127.0.0.1 Bcast:0.0.0.0 Mask:255.255.255.255 UP BROADCAST POINTOPOINT RUNNING NOARP MTU:1500 Metric:1 RX packets:4885284 errors:0 dropped:0 overruns:0 frame:0 TX packets:4679884 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:835278537 (835.2 MB) TX bytes:1989289617 (1.9 GB) venet0:0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 inet addr:205.[redacted] P-t-P:205.186.148.82 Bcast:0.0.0.0 Mask:255.255.255.255 UP BROADCAST POINTOPOINT RUNNING NOARP MTU:1500 Metric:1 I've followed this guide on setting up a basic server and getting a .p12 file, however, I was receiving an error that stated /dev/net/tun was missing, so I created it mkdir -p /dev/net mknod /dev/net/tun c 10 200 chmod 600 /dev/net/tun This resolved the error preventing the service from launching, however, I am unable to connect. 
On the server I've set up the myserver.conf file (as per the tutorial) to indicate local 127.0.0.1 (I've also attempted with the public IP address, perhaps I don't understand what they mean by local IP?). The server launches without error, this is what the log looks like when it starts: Sun Apr 1 17:21:27 2012 OpenVPN 2.1.3 x86_64-pc-linux-gnu [SSL] [LZO2] [EPOLL] [PKCS11] [MH] [PF_INET6] [eurephia] built on Mar 11 2011 Sun Apr 1 17:21:27 2012 IMPORTANT: OpenVPN's default port number is now 1194, based on an official port number assignment by IANA. OpenVPN 2.0-beta16 and earlier used 5000 as the default port. Sun Apr 1 17:21:27 2012 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts Sun Apr 1 17:21:27 2012 /usr/bin/openssl-vulnkey -q -b 1024 -m <modulus omitted> Sun Apr 1 17:21:27 2012 TUN/TAP device tun0 opened Sun Apr 1 17:21:27 2012 /sbin/ifconfig tun0 10.8.0.1 pointopoint 10.8.0.2 mtu 1500 Sun Apr 1 17:21:27 2012 GID set to openvpn Sun Apr 1 17:21:27 2012 UID set to openvpn Sun Apr 1 17:21:27 2012 UDPv4 link local (bound): [AF_INET]127.0.0.1:1194 Sun Apr 1 17:21:27 2012 UDPv4 link remote: [undef] Sun Apr 1 17:21:27 2012 Initialization Sequence Completed This creates a tun0 interface that looks like this: tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 inet addr:10.8.0.1 P-t-P:10.8.0.2 Mask:255.255.255.255 UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) And the netstat command still indicates the state is not set to LISTEN. On the client-side I've installed the p12 certs onto two devices (one is an android tablet, the other is an Ubuntu desktop). I don't see port 1194 as open either. 
Both clients install the cert files and then ask me for the L2TP secret (which was set on the file), but then they oddly ask me for a username and a password, which I don't know where I could possibly get those from. I attempted all of my logins, and some whacky guesses that were frantically pulling at straws. If there's any more information I could provide let me know.

  • Ubuntu 12 crashed and took down network

    - by Leopd
    We recently set up a new Ubuntu 12.04LTS server on our network. It's not fully configured so it's not doing much beyond sshd and a default apache2 install. But this evening it appears to have crashed. It wasn't responding to the network or the keyboard. But the worst part is, it took down the entire network. My knowledge of the network stack below OSI layer 3 is very limited, so the rest confuses me. When this machine was physically connected to the network, no other machine could connect to the outside internet. When things were broken, running arp showed that our gateway's IP address (10.0.1.1) was listed as "invalid." Unplugging the server from the network fixed the problem, and plugging it back in broke it again. So the crashed server was advertising itself as owning the gateway's IP address? There's nothing at all in syslog during the time when it was causing problems. Any ideas about how to figure out what went wrong or what we can do to prevent it from happening again? I'm hesitant to even put the machine back on the network right now.

    Update ** It crashed again, and I ran tcpdump -penn arp (thanks bahamat!) for several minutes and got this... (timestamps and duplicate lines removed)

        00:1e:65:f8:dc:24 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.1 tell 10.0.2.191, length 46
        00:1e:65:f8:dc:24 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.44 tell 10.0.2.191, length 46
        60:d8:19:d4:71:d6 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.1 tell 10.0.2.125, length 46
        d4:9a:20:04:e9:78 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.100, length 28

    Update 2 ** When the network is functioning properly, arping -c4 10.0.1.1 returns this:

        ARPING 10.0.1.1
        60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=0 time=267.982 usec
        60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=1 time=422.955 usec
        60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=2 time=299.215 usec
        60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=3 time=366.926 usec
        --- 10.0.1.1 statistics ---
        4 packets transmitted, 4 packets received, 0% unanswered (0 extra)

    When the bad server is plugged in, arping -c4 10.0.1.1 returns:

        ARPING 10.0.1.1
        --- 10.0.1.1 statistics ---
        4 packets transmitted, 0 packets received, 100% unanswered (0 extra)

    Context **

        10.0.x.x is the main subnet.
        10.0.1.1 is the main internet gateway
        10.0.1.44 is a printer
        10.0.2.* devices are all laptops / workstations

    I have no idea what's using the 192.168.x.x subnet -- your guesses are at least as good as mine. A VM on a workstation? A misconfigured WAP? Somebody re-sharing wifi? A machine that failed to DHCP? The offending ubuntu server's MAC address ends in cd:80 so isn't listed in the dump. It should DHCP to 10.0.3.3.

    Thanks for any help. This ARP stuff is all voodoo to me. Packets just go to IP addresses, right? ;)
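
    For a longer capture, the stray 192.168.x.x host can be isolated by listing every ARP requester whose claimed IP address falls outside the main subnet. The Python sketch below parses lines in the `tcpdump -penn arp` format quoted in the post. Treating "10.0.x.x" as 10.0.0.0/16 is an assumption, and the function name is illustrative. This only identifies out-of-subnet senders; it does not by itself explain why the gateway stops answering.

```python
import ipaddress
import re

# Matches ARP requests as printed by "tcpdump -penn arp" in the post.
ARP_RE = re.compile(
    r"^(?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5}) > \S+ ethertype ARP .*"
    r"Request who-has (?P<target>\S+) tell (?P<sender>[0-9.]+)"
)

# Assumption: "10.0.x.x is the main subnet" means a /16 network.
MAIN_SUBNET = ipaddress.ip_network("10.0.0.0/16")


def foreign_senders(lines, subnet=MAIN_SUBNET):
    """Return sorted (MAC, sender IP) pairs whose sender IP is outside subnet."""
    found = set()
    for line in lines:
        match = ARP_RE.match(line.strip())
        if match and ipaddress.ip_address(match.group("sender")) not in subnet:
            found.add((match.group("mac"), match.group("sender")))
    return sorted(found)


# The four capture lines quoted in the post.
CAPTURE = [
    "00:1e:65:f8:dc:24 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.1 tell 10.0.2.191, length 46",
    "00:1e:65:f8:dc:24 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.44 tell 10.0.2.191, length 46",
    "60:d8:19:d4:71:d6 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.1 tell 10.0.2.125, length 46",
    "d4:9a:20:04:e9:78 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.100, length 28",
]

if __name__ == "__main__":
    for mac, ip in foreign_senders(CAPTURE):
        print(f"{mac} claims {ip}, which is outside {MAIN_SUBNET}")
```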

  • Centos/OVH: public IP on KVM virtual machine

    - by Sébastien
    Since a few days, I'm trying to configure my KVM vm to have a public IP address, without any success. First, I'm on OVH, and you need to know they don't allow networking from different mac addresses. I have so registered a virtual mac address associated with my failover IP Here's my configuration: Guest wanted IP: 46.105.40.x Host IP: 176.31.240.x Host configuration dummy0 interface: ifcfg-dummy0 BOOTPROTO=static IPADDR=10.0.0.1 NETMASK=255.0.0.0 ONBOOT=yes NM_CONTROLLED=no ARP=yes BRIDGE=br0 br0 bridge: ifcfg-br0 DEVICE=br0 TYPE=Bridge DELAY=0 ONBOOT=yes BOOTPROTO=static IPADDR=192.168.1.1 NETMASK=255.255.255.0 PEERDNS=yes NM_CONTROLLED=no ARP=yes Failover ip is redirected to the br0 bridge with ip route add 46.105.40.xxx dev br0 > cat /proc/sys/net/ipv4/ip_forward 1 > cat /proc/sys/net/ipv4/conf/vnet0/proxy_arp 1 > route -n Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 176.31.240.254 0.0.0.0 UG 0 0 0 eth0 46.105.40.x 0.0.0.0 255.255.255.255 UH 0 0 0 br0 176.31.240.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 br0 Guest configuration: KVM: <interface type='bridge'> <mac address='02:00:00:30:22:05'/> <source bridge='br0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </interface> I've borrowed most of the OVH configuration here (in french, http://guides.ovh.com/BridgeClient) for the guest configuration eth0 interface: ifcfg-eth0 DEVICE="eth0" BOOTPROTO=none HWADDR="02:00:00:30:22:05" NM_CONTROLLED="yes" ONBOOT="yes" TYPE="Ethernet" UUID="e9138469-0d81-4ee6-b5ab-de0d7d17d1c8" USERCTL=no PEERDNS=yes IPADDR=46.105.40.xxx NETMASK=255.255.255.255 GATEWAY=176.31.240.254 ARP=yes For the routes, I have in route-eth0: 176.31.240.254 dev eth0 default via 176.31.240.254 dev eth0 With this configuration, I don't have any access to the internet. The only thing I can do is to ping the public ip of the host, nothing more. 
My conclusion is that the route does not work: when I run ping 8.8.8.8 on the guest, I see this on the host: > tcpdump -i vnet0 icmp tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on br0, link-type EN10MB (Ethernet), capture size 65535 bytes 13:38:09.009324 IP 46-105-40-xxx.kimsufi.com > google-public-dns-a.google.com: ICMP echo request, id 50183, seq 1, length 64 13:38:09.815344 IP 46-105-40-xxx.kimsufi.com > google-public-dns-a.google.com: ICMP echo request, id 50183, seq 2, length 64 I never get the ping reply, only the request. It seems guest-to-host communication is fine. On eth0: > tcpdump -i eth0 icmp tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes 13:39:40.240561 IP 46-105-40-xxx.kimsufi.com > google-public-dns-a.google.com: ICMP echo request, id 50439, seq 1, length 64 13:39:40.250161 IP google-public-dns-a.google.com > 46-105-40-xxx.kimsufi.com: ICMP echo reply, id 50439, seq 1, length 64 I see both the request and the reply on eth0, but the reply is not forwarded to the bridge. I really don't understand why; I thought that was the purpose of the route! iptables is disabled on both host and guest. I really hope some of you will be able to help me! Many thanks in advance, Sébastien
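The guest's route-eth0 file adds a host route to the gateway before the default route. A minimal Python sketch of why that ordering is needed, assuming a stand-in guest address (192.0.2.10 is a documentation address chosen for illustration, not the elided failover IP): with a 255.255.255.255 netmask no gateway can be on-link, so the explicit `176.31.240.254 dev eth0` entry has to come first. This only checks the addressing logic; it does not by itself explain the missing replies.

```python
import ipaddress

def gateway_is_on_link(guest_ip: str, netmask: str, gateway: str) -> bool:
    """Return True if the gateway falls inside the guest's own subnet."""
    network = ipaddress.ip_network(f"{guest_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(gateway) in network

GUEST_IP = "192.0.2.10"      # hypothetical stand-in for the failover IP
GATEWAY = "176.31.240.254"   # gateway from the ifcfg-eth0 excerpt

# With a /32 netmask the guest's "subnet" contains only itself.
needs_host_route = not gateway_is_on_link(GUEST_IP, "255.255.255.255", GATEWAY)
print("host route to gateway needed:", needs_host_route)
```

For comparison, a host inside 176.31.240.0/24 with a 255.255.255.0 netmask would reach the same gateway without any extra host route.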

    Read the article

  • opennms postgres connection slow

    - by krisdigitx
    I am running the OpenNMS application server on a physical server and the database on an ESXi VM. Recently the OpenNMS web console has been very slow to load, so I deleted most of the events from the database table. Now both servers have no load at all, and the psql connection from the application server to the database server is also very fast, but the OpenNMS web console is still slow. This is the strace output from the OpenNMS process: 18629 futex(0x2aaac77d8a84, FUTEX_WAIT_PRIVATE, 453, NULL <unfinished ...> 3015 futex(0x2aaabc4a2ee4, FUTEX_WAIT_PRIVATE, 323, NULL <unfinished ...> 10863 futex(0x2aaabbebaa94, FUTEX_WAIT_PRIVATE, 395, NULL <unfinished ...> 25260 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 10859 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 10982 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 3011 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 25260 futex(0x2aaae098fc28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> 10982 futex(0x2aaac0eaf928, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> 3011 futex(0x2aaab0cb1728, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> 10859 futex(0x2aaac062c328, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> 25260 <... futex resumed> ) = 0 10982 <... futex resumed> ) = 0 3011 <... futex resumed> ) = 0 10859 <... futex resumed> ) = 0 25260 futex(0x2aaabc38b6b4, FUTEX_WAIT_PRIVATE, 443, NULL <unfinished ...> 10982 futex(0x2aaabc5d7b94, FUTEX_WAIT_PRIVATE, 99, NULL <unfinished ...> 3011 futex(0x2aaac7c55334, FUTEX_WAIT_PRIVATE, 183, NULL <unfinished ...> 10859 futex(0x2aaabbb8c9d4, FUTEX_WAIT_PRIVATE, 347, NULL <unfinished ...> 10846 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 10846 futex(0x2aaae9022428, FUTEX_WAKE_PRIVATE, 1) = 0 10846 futex(0x2aaabe0030b4, FUTEX_WAIT_PRIVATE, 251, NULL <unfinished ...> 20281 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 14100 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 2925 <...
futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 10843 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 20281 futex(0x2aaac7e93628, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> 14100 futex(0x2aaac04e8c28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> 2925 futex(0x2aaaec085528, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> 10843 futex(0x2aaab20b0528, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> It shows lots of "Connection timed out" results. I think it is the connection between the Java application and the database that is causing the issue. Any ideas on how to troubleshoot this?
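One thing worth checking before blaming the database link: ETIMEDOUT is also what futex(2) returns when a timed wait expires, and strace prints the generic errno text "Connection timed out" for it regardless of the syscall. So these lines may simply be threads waking from timed waits. A rough Python sketch for tallying which syscalls actually produce the errors, using a few lines copied from the excerpt above as sample input (the regex is an assumption about this particular strace layout, not a general parser):

```python
import re
from collections import Counter

# A few lines taken from the strace excerpt in the question.
SAMPLE = """\
18629 futex(0x2aaac77d8a84, FUTEX_WAIT_PRIVATE, 453, NULL <unfinished ...>
25260 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
10859 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
25260 futex(0x2aaae098fc28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
25260 <... futex resumed> ) = 0
10846 futex(0x2aaae9022428, FUTEX_WAKE_PRIVATE, 1) = 0
"""

# Matches "TID name(...) = ret [ERRNO]" and "TID <... name resumed> ) = ret [ERRNO]".
LINE = re.compile(
    r"^\d+\s+(?:<\.\.\. (?P<resumed>\w+) resumed>|(?P<name>\w+)\()"
    r".*?=\s(?P<ret>-?\d+)(?:\s(?P<errno>[A-Z]+))?"
)

def tally(strace_text: str) -> Counter:
    """Count completed syscalls by (syscall name, errno or 'ok')."""
    counts = Counter()
    for line in strace_text.splitlines():
        m = LINE.match(line)
        if m is None:          # unfinished calls carry no return value yet
            continue
        syscall = m.group("resumed") or m.group("name")
        counts[(syscall, m.group("errno") or "ok")] += 1
    return counts

counts = tally(SAMPLE)
for key, n in sorted(counts.items()):
    print(key, n)
```

If every ETIMEDOUT in the full trace belongs to futex and none to connect, recv, or poll, the trace says little about the network path to PostgreSQL.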

    Read the article

  • Add a small RAID card? Will it help overall stability and performance of my nine hard drives?

    - by Ray
    Hi. Will I get any genuine extra performance and RAID stability if I insert a basic RAID card into a PCI-E x1 slot? I am considering the Adaptec 1220SA - 2 port SATA, PCI Express (x1), RAID 0/1. OK, it only supports two SATA drives. The purpose is to help support the eight internal hard drives (1TB each), a DVD drive and an external e-SATA connected 2TB hard drive by dealing with two of the internal hard drives. My current configuration of eight internal 1TB Barracuda (7200.12) SATA hard drives, one external 2TB SATA Western Digital Green Drive (e-SATA) and one DVD drive can already be supported by the Intel P55 & JMicron controllers on the ASUS motherboard: the Intel P55 (controls six HDDs, configured as three x RAID 1), and the JMicron (controls two HDDs as one RAID 1, as well as the DVD drive and the external SATA drive via the motherboard's e-SATA port). Bigger picture details: I have an ASUS motherboard designed for the LGA1156 type processor, and it includes the Intel P55 Express Chipset and JMicron. I am using the Intel Core i7-870 processor, and have 8GB DDR3 (1333) memory (four x 2GB Corsair DIMMs). Enough overall power. The power supply, a Corsair AX850, is more than sufficient for the system. The system will never need the full 850 watts (future: second graphics card). The RAID card would provide hardware RAID 1 for two of the eight internal drives. It would either reduce the load on the Intel P55 firmware RAID support, or replace the JMicron controller's RAID 1 set. I am busy installing the above configuration using Windows 7 Ultimate 64-bit as the OS. The RAID card is a last-minute addition to the plan. Is it worth spending the extra R700 - R900 on the Adaptec 1220SA, or an equivalent RAID card? I cannot afford to spend yet another R2000 - R3000 on a RAID card that would support many SATA2 hard drives with a better RAID level, for example RAID 5.
My issue & assumption: I am trusting that the Intel P55 chipset can properly handle six drives, configured as three x RAID 1. I am assuming that the JMicron can handle, using its red SATA ports, one RAID 1 (two HDDs). The DVD drive connects to the JMicron optical SATA port 1 (white port 1). White port 2 is not used. The e-SATA connection runs from the JMicron through the motherboard to an on-board (rear panel) e-SATA port. Am I being a little hopeful in only using the on-board Intel P55 and the JMicron? Is it a waste of money to install a RAID card that handles two SATA2 drives? Or is it wise to take a little pressure off the Intel P55? Obviously I am interested in data security, hence RAID 1, not RAID 0. RAID 5 would be nice. The CPU, an Intel Core i7-870, will provide the clout. Context to nine drives: I am using virtualisation with Windows 7 Ultimate. Bootable VMs. The operating system gets a mirror. Loaded apps get a mirror. The current design data is kept in another mirror, and another mirror is back-up one and/or VM territory. Then the external 2TB drive (via e-SATA) is the next layer of data security, and finally I use off-site data security. Thanks.

    Read the article

  • Bridged virtual interface is not available or visible to ifconfig.

    - by Omniwombat
    Hello all. I'm running Ubuntu 9.04, kernel 2.6.28-18, and vmware-server 2.0.1. I'm attempting to setup a virtual linux machine to use a bridged interface rather than NAT or host-only. Both NAT and host-only work just fine. When running vmware-config.pl, I set /dev/vmnet0 to bridge eth0, /dev/vmnet1 to host-only, and /dev/vmnet8 to NAT. When I run ifconfig -a I see the physical interface (eth0), vmnet1 and vmnet8 both of which are up and have IP addresses assigned to them. I also see other various interfaces that are not relevant here. In the web console, when I ask that the guest machine's network card be bridged, it states that a bridged setup is "Not available" and shows the disabled device icon. Inside the guest machine, I do have an eth0 interface which I can set to anything I like, however it can't see my external network, or the host. I do see errors in my vmware/hostd.log which state: "The network bridge on device vmnet0 is not running. The virtual machine will not be able to communicate with the host or with other machines on your network" which confirms the problem. vmnet-bridge is running, and I see the following in my process table: /usr/bin/vmnet-bridge -d /var/run/vmnet-bridge-0.pid -n 0 -i eth0 I confirm that the /var/run/vmnet-bridge-0.pid file is there and that it points to the correct process. I saw this question relating to Ubuntu 9.04 and bridged interfaces, in which the poster determined that the vsock library was not getting built due to a flaw in the vmware-config.pl script. I applied the patch, reran the script, and confirm that vsock.ko and vsock.o are in my /lib directory structure. vsock does show up in an lsmod. My /etc/vmware directory has /vmnet1 and /vmnet8 subdirectories. They contain configuration utilities for running DHCP and nat type services as expected. There is no vmnet0 subdirectory. My /etc/vmware/netmap.conf file DOES show entries for vmnet0; both the name and the device as I configured it from the script. 
My /dev directory contains devices vmnet0 through vmnet9. They have major device number 119, and minor device numbers 0 through 9. /proc/net/dev shows statistics for vmnet1 and vmnet8, but not vmnet0. I have a /proc/vmnet directory, but it's empty. When I start or stop the vmware service with /etc/init.d/vmware start, I see the following: Starting VMware services: Virtual machine monitor done Virtual machine communication interface done VM communication interface socket family: done Virtual ethernet done Bridged networking on /dev/vmnet0 done Host-only networking on /dev/vmnet1 (background) done DHCP server on /dev/vmnet1 done Host-only networking on /dev/vmnet8 (background) done DHCP server on /dev/vmnet8 done NAT service on /dev/vmnet8 done VMware Server Authentication Daemon (background) done Shared Memory Available done Starting VMware management services: VMware Server Host Agent (background) done VMware Virtual Infrastructure Web Access Starting VMware autostart virtual machines: Virtual machines done Nothing appears to be wrong there. What n00b thing am I doing such that vmnet0 and only vmnet0 does not show up in the interface list?

    Read the article

  • Why would VMWare go defunct? How to recover from/prevent it?

    - by Josh
    I am running VMWare Server 2.0.2 (Build 203138) on a dual-core Intel i5 system with Ubuntu Server 10.04 LTS (kernel 2.6.32-22-server #33-Ubuntu SMP). The disk subsystem is a software RAID5 array. The system has been set up for a little over a week. For the past 5 days I have been running at least 3 VMs (Linux and a variety of Windows OSes) with no issues whatsoever. But while I was installing Linux onto a new VM, suddenly all VMs became unresponsive, including the one I was installing to. I could not log in to the VMWare Management Interface, and the system was somewhat unresponsive via SSH. When I looked at top, I saw: top - 16:14:51 up 6 days, 1:49, 8 users, load average: 24.29, 24.33 17.54 Tasks: 203 total, 7 running, 195 sleeping, 0 stopped, 1 zombie Cpu(s): 0.2%us, 25.6%sy, 0.0%ni, 74.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8056656k total, 5927580k used, 2129076k free, 20320k buffers Swap: 7811064k total, 240216k used, 7570848k free, 5045884k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 21549 root 39 19 0 0 0 Z 100 0.0 15:02.44 [vmware-vmx] <defunct> 2115 root 20 0 0 0 0 S 1 0.0 170:32.08 [vmware-rtc] 2231 root 21 1 1494m 126m 100m S 1 1.6 892:58.05 /usr/lib/vmware/bin/vmware-vmx -# product=2; 2280 jnet 20 0 19320 1164 800 R 0 0.0 30:04.55 top 12236 root 20 0 833m 41m 34m S 0 0.5 88:34.24 /usr/lib/vmware/bin/vmware-vmx -# product=2; 1 root 20 0 23704 1476 920 S 0 0.0 0:00.80 /sbin/init 2 root 20 0 0 0 0 S 0 0.0 0:00.01 [kthreadd] 3 root RT 0 0 0 0 S 0 0.0 0:00.00 [migration/0] 4 root 20 0 0 0 0 S 0 0.0 0:00.84 [ksoftirqd/0] 5 root RT 0 0 0 0 S 0 0.0 0:00.00 [watchdog/0] 6 root RT 0 0 0 0 S 0 0.0 0:00.00 [migration/1] The VMWare process for the virtual machine I was installing into became a zombie. Yet it was still consuming 100% of the CPU time on one of the cores, and I couldn't reach it or any other virtual machine. (I was logged in to one virtual machine over SSH, another via X11, and a third via VNC. All three connections died.)
When I ran ps -ef and similar commands, I found that the defunct vmware-vmx process had its parent PID set to init (1). I also used lsof -p 21549 and found that the defunct process had no open files. Yet it was using 100% of CPU time... I was unable to kill any vmware-vmx processes, including the defunct one, even with kill -9. As a last resort I tried to reboot the box; however, shutdown, halt, reboot, and init 6 all failed to reboot or shut down, even when given appropriate --force settings. Ctrl+Alt+Del produced a message about rebooting on the console, but the system would not reboot. I had to hard power-cycle the box to resolve the situation. (See my other question, Should I worry about the integrity of my linux software RAID5 after a crash or kernel panic?) What would cause a scenario like this? What else could I have done to resolve it besides a hard reboot? What can I do to prevent such a situation in the future?

    Read the article

  • Linux software RAID6: rebuild slow

    - by Ole Tange
    I am trying to find the bottleneck in the rebuilding of a software raid6. ## Pause rebuilding when measuring raw I/O performance # echo 1 > /proc/sys/dev/raid/speed_limit_min # echo 1 > /proc/sys/dev/raid/speed_limit_max ## Drop caches so that does not interfere with measuring # sync ; echo 3 | tee /proc/sys/vm/drop_caches >/dev/null # time parallel -j0 "dd if=/dev/{} bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg 4000+0 records in 4000+0 records out 1048576000 bytes (1.0 GB) copied, 7.30336 s, 144 MB/s [... similar for each disk ...] # time parallel -j0 "dd if=/dev/{} skip=15000000 bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg 4000+0 records in 4000+0 records out 1048576000 bytes (1.0 GB) copied, 12.7991 s, 81.9 MB/s [... similar for each disk ...] So we can read sequentially at 140 MB/s in the outer tracks and 82 MB/s in the inner tracks on all the drives simultaneously. Sequential write performance is similar. This would lead me to expect a rebuild speed of 82 MB/s or more. # echo 800000 > /proc/sys/dev/raid/speed_limit_min # echo 800000 > /proc/sys/dev/raid/speed_limit_max # cat /proc/mdstat md2 : active raid6 sdbd[10](S) sdbc[9] sdbf[0] sdbm[8] sdbl[7] sdbk[6] sdbe[11] sdbj[4] sdbi[3](F) sdbh[2] sdbg[1] 27349121408 blocks super 1.2 level 6, 128k chunk, algorithm 2 [9/8] [UUU_UUUUU] [=========>...........] recovery = 47.3% (1849905884/3907017344) finish=855.9min speed=40054K/sec But we only get 40 MB/s. And often this drops to 30 MB/s. 
# iostat -dkx 1 sdbc 0.00 8023.00 0.00 329.00 0.00 33408.00 203.09 0.70 2.12 1.06 34.80 sdbd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdbe 13.00 0.00 8334.00 0.00 33388.00 0.00 8.01 0.65 0.08 0.06 47.20 sdbf 0.00 0.00 8348.00 0.00 33388.00 0.00 8.00 0.58 0.07 0.06 48.00 sdbg 16.00 0.00 8331.00 0.00 33388.00 0.00 8.02 0.71 0.09 0.06 48.80 sdbh 961.00 0.00 8314.00 0.00 37100.00 0.00 8.92 0.93 0.11 0.07 54.80 sdbj 70.00 0.00 8276.00 0.00 33384.00 0.00 8.07 0.78 0.10 0.06 48.40 sdbk 124.00 0.00 8221.00 0.00 33380.00 0.00 8.12 0.88 0.11 0.06 47.20 sdbl 83.00 0.00 8262.00 0.00 33380.00 0.00 8.08 0.96 0.12 0.06 47.60 sdbm 0.00 0.00 8344.00 0.00 33376.00 0.00 8.00 0.56 0.07 0.06 47.60 iostat says the disks are not 100% busy (but only 40-50%). This fits with the hypothesis that the max is around 80 MB/s. Since this is software raid the limiting factor could be CPU. top says: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 38520 root 20 0 0 0 0 R 64 0.0 2947:50 md2_raid6 6117 root 20 0 0 0 0 D 53 0.0 473:25.96 md2_resync So md2_raid6 and md2_resync are clearly busy taking up 64% and 53% of a CPU respectively, but not near 100%. The chunk size (128k) of the RAID was chosen after measuring which chunksize gave the least CPU penalty. If this speed is normal: What is the limiting factor? Can I measure that? If this speed is not normal: How can I find the limiting factor? Can I change that?
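As a sanity check on the numbers above, the finish estimate in /proc/mdstat can be reproduced from the block counts and the reported speed. A small Python sketch, assuming the counters are in 1 KiB blocks (which is consistent with the speed being given in K/sec); the 82000 K/sec figure is just the inner-track dd rate from the question restated in the same unit:

```python
import re

# Progress line copied from the /proc/mdstat output in the question.
MDSTAT_LINE = ("recovery = 47.3% (1849905884/3907017344) "
               "finish=855.9min speed=40054K/sec")

def parse_progress(line: str) -> tuple[int, int, int]:
    """Extract (blocks done, blocks total, speed in K/sec) from a progress line."""
    m = re.search(r"\((\d+)/(\d+)\).*speed=(\d+)K/sec", line)
    if m is None:
        raise ValueError("not a recovery progress line")
    done, total, speed = (int(g) for g in m.groups())
    return done, total, speed

def minutes_remaining(done: int, total: int, speed_k_per_sec: int) -> float:
    """Remaining rebuild time in minutes at a constant speed."""
    return (total - done) / speed_k_per_sec / 60

done, total, speed = parse_progress(MDSTAT_LINE)
eta_now = minutes_remaining(done, total, speed)
eta_at_disk_rate = minutes_remaining(done, total, 82000)
print(f"at {speed} K/sec: {eta_now:.0f} min; at 82000 K/sec: {eta_at_disk_rate:.0f} min")
```

At the current speed this works out to roughly 856 minutes, in line with the finish=855.9min that mdstat reports; at the measured inner-track disk rate the same remainder would take roughly 418 minutes, so the gap in question is about a factor of two.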

    Read the article

  • Looking for advice on Hyper-v storage replication

    - by Notre1
    I am designing a 2-host Hyper-V R2 cluster with 6-10 guests stored on an SMB iSCSI SAN device (probably Promise VessRAID). I will be getting at least two of the SAN devices and need to eliminate the storage as a single point of failure. Ideally, that would involve real-time failover for the storage, like the Windows failover clustering does for the hosts. This design will be used at around six of our sites, and I would like to allow for us to eventually set up a cluster at a colocation site and replicate each site's VMs there for DR. (Ideally a live multi-site cluster, but a manual import of the VMs would be fine for this sort of DR.) The tools that come with enterprise SANs, like EMC and NetApp, seem to be the most commonly used items for a Hyper-V cluster, but I can't afford their prices with my budget. Outside of them, the two tools that seem to be most common for Hyper-V storage replication are SteelEye (now SIOS) DataKeeper Cluster Edition and Double-Take Availability. Originally, I was planning on using Clustered Shared Volume(s) (CSV), but it seems like replication support for these is either not available or brand new in both these products. It looks like CSVs are supported in Double-Take 5.22, see this discussion, but I don't think I want to run something that new in production. Right now, it seems like the best option for me is not to implement CSVs, implement some sort of storage replication, and upgrade to CSVs at a later date once replicating them is more mature. I would love to have live migration, and CSVs are not required for live migration if you are using one LUN per VM, so I guess this is what I'll do. I would prefer to stick to using the Microsoft Windows Server and Hyper-V tools and features as much as possible.
From that standpoint, SteelEye looks more appealing than Double-Take because they make the DataKeeper volume(s) available to the Failover Clustering Manager and then failover clustering is all configured and managed through the native Microsoft tools. Double-Take says that "clustered Hyper-V hosts are not supported," and Double-Take Availability itself seems to be what is used for the actual clustering and failover. Does anyone know if any of these replication tools work with more than two hosts in the cluster? All the information I can find on the web only uses two hosts in its examples. Are there any better tools than SteelEye and Double-Take for doing what I am trying to do, which is to eliminate the storage as a single point of failure? Neverfail, AppAssure, and DataCore all seem to offer similar functionality, but they don't seem to be as popular as SteelEye and Double-Take. I have seen a number of people suggest using Starwind iSCSI SAN software for the shared storage, which includes replication (and CSV replication at that). There are a couple of reasons I have not seriously considered this route: 1) The company I work for is exclusively a Dell shop, and Dell does not have any servers that I can pack with more than six 3.5" SATA drives. 2) In the future, it could be advantageous for us not to be locked into a particular brand or type of storage, and third-party replication software products all allow replication to heterogeneous storage devices. I am pretty new to iSCSI and clustering, so please let me know if it looks like I am planning something that goes against best practices or overlooking/missing something.

    Read the article

  • Where's my memory?! Nginx + PHP-FPM front end webserver slows to a crawl...

    - by incredimike
    I'm not sure if I have a problem with a memory leak (as my hosting company suggests), or if we both need to read http://linuxatemyram.com. Maybe you clever people can help us out? This is a front-end webserver VM running essentially only nginx & php-fpm on RHEL 5.5. This server is powering Magento, a PHP eCommerce thinggy. The server is running in a shared environment, but we're changing that soon. Anyway.. after a reboot the server runs just fine, but within a day it will grind itself into nothingness. Pages will take literally 2 minutes to load, CPU spikes like crazy, etc.. The console is even sluggish when I SSH in. It's like my whole server is being brought to its knees. I've also been monitoring the DB server via top and tcpdumping incoming traffic. The DB stays idle for a good portion of that "slow" load time. When i start seeing queries coming from the front-end server, the page loads soon afterward. Here are some stats after me logging in during a slow-down, after restarting php-fpm: [mike@front01 ~]$ free -m total used free shared buffers cached Mem: 5963 5217 745 0 192 314 -/+ buffers/cache: 4711 1252 Swap: 4047 4 4042 [mike@front01 ~]$ top top - 11:38:55 up 2 days, 1:01, 3 users, load average: 0.06, 0.17, 0.21 Tasks: 131 total, 1 running, 130 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 0.3%sy, 0.0%ni, 99.3%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 6106800k total, 5361288k used, 745512k free, 199960k buffers Swap: 4144728k total, 4976k used, 4139752k free, 328480k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 31806 apache 15 0 601m 120m 37m S 0.0 2.0 0:22.23 php-fpm 31805 apache 15 0 549m 66m 31m S 0.0 1.1 0:14.54 php-fpm 31809 apache 16 0 547m 65m 32m S 0.0 1.1 0:12.84 php-fpm 32285 apache 15 0 546m 63m 33m S 0.0 1.1 0:09.22 
php-fpm 32373 apache 15 0 546m 62m 32m S 0.0 1.1 0:09.66 php-fpm 31808 apache 16 0 543m 60m 35m S 0.0 1.0 0:18.93 php-fpm 31807 apache 16 0 533m 49m 30m S 0.0 0.8 0:08.93 php-fpm 32092 apache 15 0 535m 48m 27m S 0.0 0.8 0:06.67 php-fpm 4392 root 18 0 194m 10m 7184 S 0.0 0.2 0:06.96 cvd 4064 root 15 0 154m 8304 4220 S 0.0 0.1 3:55.57 snmpd 4394 root 15 0 119m 5660 2944 S 0.0 0.1 0:02.84 EvMgrC 31804 root 15 0 519m 5180 932 S 0.0 0.1 0:00.46 php-fpm 4138 ntp 15 0 23396 5032 3904 S 0.0 0.1 0:02.38 ntpd 643 nginx 15 0 95276 4408 1524 S 0.0 0.1 0:01.15 nginx 5131 root 16 0 90128 3340 2600 S 0.0 0.1 0:01.41 sshd 28467 root 15 0 90128 3340 2600 S 0.0 0.1 0:00.35 sshd 32602 root 16 0 90128 3332 2600 S 0.0 0.1 0:00.36 sshd 1614 root 16 0 90128 3308 2588 S 0.0 0.1 0:00.02 sshd 2817 root 5 -10 7216 3140 1724 S 0.0 0.1 0:03.80 iscsid 4161 root 15 0 66948 2340 800 S 0.0 0.0 0:10.35 sendmail 1617 nicole 17 0 53876 2000 1516 S 0.0 0.0 0:00.02 sftp-server ... Is there anything else I should be looking at, or any more information that might be useful? I'm just a developer, but the slowdowns on this system worry me and make it hard to do my work.. Help me out, ServerFault!
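The linuxatemyram question comes down to how much of "used" is really buffers and page cache. A short Python sketch that recomputes the "-/+ buffers/cache" row from the "Mem:" row of the `free -m` output above (values in MB, copied from the question; the function name is only illustrative):

```python
def split_free_line(used: int, free: int, buffers: int, cached: int):
    """Recompute the '-/+ buffers/cache' row of `free` from its 'Mem:' row."""
    reclaimable = buffers + cached          # memory the kernel can hand back
    used_by_processes = used - reclaimable  # 'used' column of the -/+ row
    effectively_free = free + reclaimable   # 'free' column of the -/+ row
    return used_by_processes, reclaimable, effectively_free

# Mem: row from the question: used=5217, free=745, buffers=192, cached=314
apps, reclaimable, available = split_free_line(5217, 745, 192, 314)
print(f"used excluding cache: {apps} MB")
print(f"buffers + cache:      {reclaimable} MB")
print(f"free including cache: {available} MB")
```

On these figures only about 506 MB is buffers and cache, while roughly 4.7 GB counts as used even after subtracting them (free prints 1252 rather than 1251 for the last value because it rounds from kB). That suggests page cache alone does not account for the usage here, although it does not say what does.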

    Read the article

  • Using PHP to connect to RADIUS works on one server but not another

    - by JDS
    I have a fleet of webservers that serve a LAMP webapp broken into multiple customer apps by virtualhost/domain. The platform is Ubuntu 10.04 VM + PHP 5.3 + Apache 2.2.14, on top of VMware ESX (v4 I think). This stuff's not too important, though; I'm just setting up the background. I have one customer that connects to a RADIUS server for authentication. We've found that the app responds as if some number of web servers are configured correctly and some are not, i.e., apparently random authentication failures and successes, with no rhyme or reason. I did a lot of analysis of our fleet, and narrowed it down to the differences between two specific web servers. I'll call them "A" and "B". "A" works. "B" does not. "Works" means "connects to and gets authentication data successfully from the RADIUS server". Ultimately, I'm looking for the one thing that is different, and I've exhausted everything that I can come up with, so I'm looking for something else. Here are the things I've looked at: PHP package versions (all from Ubuntu repos). These are exactly the same across servers. PECL packages. There are no PECL packages that aren't installed by apt. Other libraries or packages. Nothing that was network-related or RADIUS-related was different among servers. (There were some minor package differences, though.) Network or hosting environment. I found that some of the working servers were in the same physical environment as some not-working ones (i.e. same ESX containers). So, probably, the physical network layer is not the problem. Test case. I created a test case as follows. It works on the working servers, and fails on the not-working servers, very consistently. <?php $radius = radius_auth_open(); $username = 'theusername'; $password = 'thepassword'; $hostname = '12.34.56.78'; $radius_secret = '39wmmvxghg'; if (! radius_add_server($radius,$hostname,0,$radius_secret,5,3)) { die('Radius Error 1: ' . radius_strerror($radius) . "\n"); } if (!
radius_create_request($radius,RADIUS_ACCESS_REQUEST)) { die('Radius Error 2: ' . radius_strerror($radius) . "\n"); } radius_put_attr($radius,RADIUS_USER_NAME,$username); radius_put_attr($radius,RADIUS_USER_PASSWORD,$password); switch (radius_send_request($radius)) { case RADIUS_ACCESS_ACCEPT: echo 'GOOD LOGIN'; break; case RADIUS_ACCESS_REJECT: echo 'BAD LOGIN'; break; case RADIUS_ACCESS_CHALLENGE: echo 'CHALLENGE REQUESTED'; break; default: die('Radius Error 3: ' . radius_strerror($radius) . "\n"); } ?>

    Read the article

  • Distributed and/or Parallel SSIS processing

    - by Jeff
    Background: Our company hosts SaaS DSS applications, where clients provide us data daily and/or weekly, which we process and merge into their existing database. During business hours, load on the servers is pretty minimal, as it's mostly users running simple pre-defined queries via the website, or running drill-through reports that mostly hit the SSAS OLAP cube. I manage the IT Operations Team, and so far this has presented an interesting "scaling" issue for us. For our daily-refreshed clients, the server is only "busy" for about 4-6 hrs at night. For our weekly-refresh clients, the server is only "busy" for maybe 8-10 hrs per week! We've done our best to use some simple methods of distributing the load by spreading the daily clients evenly among the servers such that we're not trying to process daily clients back-to-back overnight. But long-term this scaling strategy creates two notable issues. First, it's going to consume a pretty immense amount of hardware that sits idle for large periods of time. Second, it takes significant Production Support overhead to basically "schedule" the ETLs such that they don't overlap, and to move clients/schedules around if they outgrow the resources on a particular server or allocated time slot. As the title would imply, one option we've tried is running multiple SSIS packages in parallel, but in most cases this has yielded VERY inconsistent results. The most common failures are DTExec, SQL, and SSAS fighting for physical memory and throwing out-of-memory errors, and ETLs running 3, 4, or 5x longer than expected. So from my practical experience thus far, it seems like running multiple ETL packages on the same hardware isn't a good idea, but I can't be the first person who doesn't want to scale multiple ETLs around manual scheduling and sequential processing.
One option we've considered is virtualizing the servers, which obviously doesn't give you any additional resources, but moves the resource contention onto the hypervisor, which (in my experience) seems to manage simultaneous CPU/RAM/disk I/O a little more gracefully than letting DTExec, SQL, and SSAS battle it out within Windows. Question to the forum: are we missing something obvious here? Are there tools out there that can help manage running multiple SSIS packages on the same hardware? Would it be more "efficient" in terms of parallel execution if, instead of running DTExec, SQL, and SSAS on the same machine (with every machine running that configuration), we ran groups of three machines with SSIS running on one machine, SQL on another, and SSAS on a third? Obviously that would only make sense if we could process more than the three ETLs we were able to process on the machines independently. Another option we've considered is completely re-architecting our SSIS package to have one "master" package for all clients that attempts to intelligently choose a server based on how "busy" it already is in terms of CPU/memory/disk utilization, but that would be a herculean effort, and seems like we're trying to reinvent something that you would think someone would sell (although I haven't had any luck finding it). So in summary, are we missing an obvious solution for this, and does anyone know of any tools (free or for purchase, doesn't matter) that facilitate running multiple SSIS ETL packages in parallel and on multiple servers? (What I would call a "queue & node based" system, but that's not an official term.) Ultimately VMWare's Distributed Resource Scheduler addresses this, as you simply run a consistent number of clients per VM that you know will never conflict scheduling-wise, then leave it up to VMWare to move the VMs around to balance out hardware usage.
I'm definitely not against using VMware to do this, but since we're a 100% Microsoft app stack, it seems like -someone- out there would have solved this problem at the application layer instead of the hypervisor layer by checking on resource utilization at the OS, SQL, and SSAS levels. I'm open to ANY discussion on this, and remember no suggestion is too crazy or radical! :-) Right now, VMware is the only option we've found to get away from "manually" balancing our resources, so any suggestions that leave us on a pure Microsoft stack would be great. Thanks guys, Jeff
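
    The "queue & node based" idea described in this question can be sketched in a few lines. This is only an illustration under stated assumptions: every name below (EtlDispatcher, Node, Job, dispatch) is hypothetical and not part of SSIS, DTExec, or any Microsoft tool; each ETL run is assumed to come with a rough peak-memory estimate; and a real dispatcher would also have to launch the packages and release memory when they finish.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

/** Hypothetical sketch of a "queue & node" dispatcher; not an SSIS API. */
class EtlDispatcher {

    /** A worker server with a fixed memory budget (MB) for concurrent ETL runs. */
    static final class Node {
        final String name;
        final int capacityMb;
        int usedMb = 0;

        Node(String name, int capacityMb) {
            this.name = name;
            this.capacityMb = capacityMb;
        }

        int freeMb() {
            return capacityMb - usedMb;
        }
    }

    /** A queued ETL run with an assumed peak-memory estimate (MB). */
    static final class Job {
        final String client;
        final int estimatedMb;

        Job(String client, int estimatedMb) {
            this.client = client;
            this.estimatedMb = estimatedMb;
        }
    }

    /**
     * Drains the queue in FIFO order, placing each job on the node with the
     * most free memory. Stops at the first job that fits nowhere, so that job
     * keeps its place at the head of the queue for the next dispatch round.
     * Returns client -> node name for the jobs that were placed.
     */
    static Map<String, String> dispatch(Queue<Job> queue, List<Node> nodes) {
        Map<String, String> placements = new LinkedHashMap<>();
        while (!queue.isEmpty()) {
            Job next = queue.peek();
            Optional<Node> target = nodes.stream()
                    .filter(n -> n.freeMb() >= next.estimatedMb)
                    .max(Comparator.comparingInt(Node::freeMb));
            if (!target.isPresent()) {
                break; // nothing has room right now; leave the job queued
            }
            target.get().usedMb += next.estimatedMb;
            placements.put(next.client, target.get().name);
            queue.remove();
        }
        return placements;
    }

    public static void main(String[] args) {
        Queue<Job> queue = new ArrayDeque<>();
        queue.add(new Job("clientA", 3000));
        queue.add(new Job("clientB", 3000));
        queue.add(new Job("clientC", 3000));
        queue.add(new Job("clientD", 6000));

        List<Node> nodes = Arrays.asList(new Node("etl01", 8000), new Node("etl02", 4000));

        System.out.println("placed: " + dispatch(queue, nodes));
        System.out.println("still queued: " + queue.size());
    }
}
```

    Reserving memory up front, instead of letting packages start and then compete, is the design choice here: it trades some idle headroom for avoiding the out-of-memory contention between DTExec, SQL, and SSAS described above.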

  • Invalid or expired security context token in WCF web service

    - by Damian
    All, I have a WCF web service (let's call it service "B") hosted under IIS using a service account (VM, Windows 2003 SP2). The service exposes an endpoint that uses WSHttpBinding with the default values, except for maxReceivedMessageSize, maxBufferPoolSize, maxBufferSize and some of the timeouts, which have been increased. The web service has been load tested using the Visual Studio Load Test framework with around 800 concurrent users and passed all tests with no exceptions being thrown. The proxy in the unit test was created from configuration. There is a SharePoint application that uses the Office SharePoint Server Search service to call web services "A" and "B". The application gets data from service "A" to create a request that is sent to service "B". The response coming from service "B" is indexed for search. The proxy is created programmatically using the ChannelFactory. When service "A" takes less than 10 minutes, the calls to service "B" are successful. But when service "A" takes more time (~20 minutes), the calls to service "B" throw the following exception: Exception Message: An unsecured or incorrectly secured fault was received from the other party. See the inner FaultException for the fault code and detail Inner Exception Message: The message could not be processed. This is most likely because the action 'namespace/OperationName' is incorrect or because the message contains an invalid or expired security context token or because there is a mismatch between bindings. The security context token would be invalid if the service aborted the channel due to inactivity. To prevent the service from aborting idle sessions prematurely increase the Receive timeout on the service endpoint's binding. The binding settings are the same, and the clocks on both the client server and the web service server are synchronized with the Windows Time service, in the same time zone.
When I look at the server where web service "B" is hosted, I can see the following security errors being logged: Source: Security Category: Logon/Logoff Event ID: 537 User NT AUTHORITY\SYSTEM Logon Failure: Reason: An error occurred during logon Logon Type: 3 Logon Process: Kerberos Authentication Package: Kerberos Status code: 0xC000006D Substatus code: 0xC0000133 According to some of the blogs online, the status code means STATUS_LOGON_FAILURE and the substatus code means STATUS_TIME_DIFFERENCE_AT_DC, but I already checked both server and client clocks and they are synchronized. I also noticed that the security token seems to be cached somewhere on the client server, because they have another process that calls web service "B" using the same service account and successfully gets data the first time it is called. Then they start the process to update the Office SharePoint Server Search service indexes and it fails. Then if they call the first process again, it fails too. Has anyone experienced this type of problem or have any ideas? Regards, --Damian

  • FILESTREAM/FILETABLE Clarifications for Implementation

    - by user1209734
    Recently our team was looking at FILESTREAM to expand the capabilities of our proprietary application. The main purpose of this app is managing the various PDFs, images and documents for all of the parts we manufacture. Our ASP application uses a few third-party tools to allow viewing of these files. We currently have 980GB of data on the file server. We have around 200GB of binary data in SQL Server that we would like to extract, since it is not performing well; hence FILESTREAM seems to be a good compromise for the two major data storage/access issues. A few things are not exactly clear to us: (1) Can FILESTREAM store its data on a drive that is not locally attached? We already have a file server with a RAID 10 (1.5TB drives). This server stores all of the documents right now; would we have to move these drives to the SQL Server for FILESTREAM? That would be a tough bullet to bite, since the server is also doubling as the application server (two VMs on one physical server). (2) FILETABLE stores the common metadata about the files, but where is the full-text part of it stored to allow searching of files like doc/docx? Is this separate? Are you able to freely add criteria to search by? If so, any links to clarify would be appreciated. (3) Can FILETABLE be referenced in another table with a foreign key? Thank you in advance. EDIT: For those having these questions, this web video covered everything and more in terms of explaining FILESTREAM from 2008 to 2012 and the caveats to consider (I would seriously rep him if I could): http://channel9.msdn.com/Events/TechDays/Techdays-2012-the-Netherlands/2270 In conclusion, we will not be using FILESTREAM, as it would be way too big a change to justify the investment. EDIT 2: Update to #1 - After carefully assessing FileTable in addition to FILESTREAM we got a winning combination.
We did have to move the files over to the new server (it wasn't too painful since they were on the same VM). It honestly took more time to write an extraction tool to dump the binary data within SQL to the file system. Update to #2 - This was separate, but again Bob had an excellent webinar explaining this: http://channel9.msdn.com/Events/TechEd/Europe/2012/DBI411 Update to #3 - Using TFT inheritance we recycled the Docs table we had (minus the huge binary blobs), which required very few changes in our legacy apps. This was a huge plus for the developer team.

  • Mirth Transformer Error

    - by Ryan H
    I'm getting the following error when trying to convert HL7v3 to HL7v2. The message passed in is:

        <?xml version="1.0" encoding="UTF-8" standalone="no"?>
        <S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
          <S:Body>
            <PRPA_IN201306UV02 xmlns="urn:hl7-org:v3" xmlns:ns2="urn:gov:hhs:fha:nhinc:common:nhinccommon" xmlns:ns3="urn:gov:hhs:fha:nhinc:common:patientcorrelationfacade" xmlns:ns4="http://schemas.xmlsoap.org/ws/2004/08/addressing" ITSVersion="XML_1.0">
              <id extension="4ae5403:12752e71a17:-7b52" root="1.1.1"/>
              ...
            </PRPA_IN201306UV02>
          </S:Body>
        </S:Envelope>

    The error I get is:

        ERROR-300: Transformer error
        ERROR MESSAGE: Error evaluating transformer
        com.webreach.mirth.server.MirthJavascriptTransformerException:
        CHANNEL: v3v2ConversionResponseMessage
        CONNECTOR: sourceConnector
        SCRIPT SOURCE:
        LINE NUMBER: 5
        DETAILS: TypeError: The prefix "S" for element "S:Envelope" is not bound.
            at com.webreach.mirth.server.mule.transformers.JavaScriptTransformer.evaluateScript(JavaScriptTransformer.java:460)
            at com.webreach.mirth.server.mule.transformers.JavaScriptTransformer.transform(JavaScriptTransformer.java:356)
            at org.mule.transformers.AbstractEventAwareTransformer.doTransform(AbstractEventAwareTransformer.java:48)
            at org.mule.transformers.AbstractTransformer.transform(AbstractTransformer.java:197)
            at org.mule.transformers.AbstractTransformer.transform(AbstractTransformer.java:200)
            at org.mule.impl.MuleEvent.getTransformedMessage(MuleEvent.java:251)
            at org.mule.routing.inbound.SelectiveConsumer.isMatch(SelectiveConsumer.java:61)
            at org.mule.routing.inbound.InboundMessageRouter.route(InboundMessageRouter.java:83)
            at org.mule.providers.AbstractMessageReceiver$DefaultInternalMessageListener.onMessage(AbstractMessageReceiver.java:493)
            at org.mule.providers.AbstractMessageReceiver.routeMessage(AbstractMessageReceiver.java:272)
            at org.mule.providers.AbstractMessageReceiver.routeMessage(AbstractMessageReceiver.java:231)
            at com.webreach.mirth.connectors.vm.VMMessageReceiver.getMessages(VMMessageReceiver.java:207)
            at org.mule.providers.TransactedPollingMessageReceiver.poll(TransactedPollingMessageReceiver.java:108)
            at org.mule.providers.PollingMessageReceiver.run(PollingMessageReceiver.java:90)
            at org.mule.impl.work.WorkerContext.run(WorkerContext.java:290)
            at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
            at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
            at java.lang.Thread.run(Unknown Source)

    When I remove the S: prefix in front of Envelope and Body and redefine the namespace as the default, it gives me a new error: "TypeError: The prefix "xsi" for attribute "xsi:nil" associated with an element type "targetMessage" is not bound." referring to <targetMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>. It is as if Mirth can't handle a namespace being declared on the same element where its prefix is first used. Any suggestions would be useful.
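
    For comparison, the same "prefix is not bound" condition can be reproduced outside Mirth with the JDK's own namespace-aware parser. This is a minimal sketch and makes no claim about how Mirth parses messages internally: the class and method names (NamespaceBindingDemo, describe) are made up for illustration, and the XML strings are cut-down stand-ins for the message in the question. It suggests checking whether the text reaching the transformer still carries its xmlns declarations, since a conformant parser accepts a prefix declared on the same element that uses it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class; it does not use any Mirth API. */
class NamespaceBindingDemo {

    /**
     * Parses the text with namespace processing switched on and reports either
     * the root element's namespace and local name, or the parser's complaint.
     */
    static String describe(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            Element root = builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return "ok: {" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (SAXException e) {
            return "rejected: " + e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prefix declared on the element that uses it: accepted.
        System.out.println(describe(
                "<S:Envelope xmlns:S=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<S:Body/></S:Envelope>"));

        // Same markup with the xmlns:S declaration missing: rejected.
        System.out.println(describe("<S:Envelope><S:Body/></S:Envelope>"));

        // Attribute prefix declared on the same element: accepted.
        System.out.println(describe(
                "<targetMessage xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                + " xsi:nil=\"true\"/>"));
    }
}
```

    If the first and third cases are accepted here but the channel still reports the error, one possible explanation (an assumption, not something the question confirms) is that the script at line 5 builds XML from a fragment or string in which the declarations are no longer present.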

  • Eclipse Galileo won't start after OS X update to 10.6.3

    - by GC
    Hi All, I have just updated OS X to 10.6.3 and now Eclipse won't start. The logs show the following error, but I can't figure it out. Can anyone shed any light?

        !SESSION 2010-03-30 10:06:38.244 -----------------------------------------------
        eclipse.buildId=M20090917-0800
        java.version=1.6.0_17
        java.vendor=Apple Inc.
        BootLoader constants: OS=macosx, ARCH=x86, WS=cocoa, NL=en_US
        Framework arguments: -product org.eclipse.epp.package.php.product -keyring /Users/gav/.eclipse_keyring -showlocation
        Command-line arguments: -os macosx -ws cocoa -arch x86 -product org.eclipse.epp.package.php.product -keyring /Users/gav/.eclipse_keyring -showlocation

        !ENTRY org.eclipse.ui.workbench 2 0 2010-03-30 10:06:40.139
        !MESSAGE A handler conflict occurred. This may disable some commands.
        !SUBENTRY 1 org.eclipse.ui.workbench 2 0 2010-03-30 10:06:40.139
        !MESSAGE Conflict for 'com.aptana.ide.editors.views.actions.actionKeyCommand':
        HandlerActivation(commandId=com.aptana.ide.editors.views.actions.actionKeyCommand, handler=com.aptana.ide.editors.views.actions.ActionKeyCommandHandler, expression=,sourcePriority=0)
        HandlerActivation(commandId=com.aptana.ide.editors.views.actions.actionKeyCommand, handler=com.aptana.ide.editors.views.actions.ActionKeyCommandHandler, expression=,sourcePriority=0)

        !ENTRY org.eclipse.ui 4 0 2010-03-30 10:06:40.964
        !MESSAGE Unhandled event loop exception
        !STACK 0
        java.lang.NullPointerException
            at org.eclipse.swt.graphics.Device.getFontList(Device.java:369)
            at org.eclipse.jface.resource.FontRegistry.filterData(FontRegistry.java:465)
            at org.eclipse.jface.resource.FontRegistry.createFont(FontRegistry.java:499)
            at org.eclipse.jface.resource.FontRegistry.defaultFontRecord(FontRegistry.java:563)
            at org.eclipse.jface.resource.FontRegistry.defaultFontData(FontRegistry.java:575)
            at org.eclipse.jface.resource.FontRegistry.getFontData(FontRegistry.java:591)
            at org.eclipse.ui.internal.themes.ThemeElementHelper.installFont(ThemeElementHelper.java:116)
            at org.eclipse.ui.internal.themes.ThemeElementHelper.populateRegistry(ThemeElementHelper.java:59)
            at org.eclipse.ui.internal.Workbench$33.runWithException(Workbench.java:1482)
            at org.eclipse.ui.internal.StartupThreading$StartupRunnable.run(StartupThreading.java:31)
            at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
            at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:134)
            at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3405)
            at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3102)
            at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2316)
            at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2221)
            at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:500)
            at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
            at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:493)
            at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
            at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:113)
            at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:194)
            at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
            at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
            at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:368)
            at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:559)
            at org.eclipse.equinox.launcher.Main.basicRun(Main.java:514)
            at org.eclipse.equinox.launcher.Main.run(Main.java:1311)

    It looks like the update may have upgraded the Java version, possibly :S but I don't know if this can be rolled back even if it did update it.

        java version "1.6.0_17"
        Java(TM) SE Runtime Environment (build 1.6.0_17-b04-248-10M3025)
        Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01-101, mixed mode)

    Thanks in advance!

  • Managing database connections in an Android Activity

    - by Daniel Lew
    I have an application with a ListActivity that uses a CursorAdapter as its adapter. The ListActivity opens the database and does the querying for the CursorAdapter, which is all well and good, but I am having issues with figuring out when to close both the Cursor and the SQLiteDatabase. The way things are handled right now, if the user finishes the activity, I close the database and the cursor. However, this still ends up with the DalvikVM warning me that I've left a database open - for example, if the user hits the "home" button (leaving the activity in the task's stack), rather than the "back" button. If I close them during pause and then re-query during resume, then I don't get any errors, but then a user cannot return to the list without it requerying (and thus losing the user's place in the list). By this I mean, the user can click on any item in the list and open a new activity based on it, but will often want to hit "back" afterwards and return to the same place on the list. If I requery, then I cannot return the user back to the correct spot. What is the proper way to handle this issue? I want the list to remain scrolled properly, but I don't want the VM to keep complaining about unclosed databases. Edit: Here's a general outline of how I handle the code at the moment:

        public class MyListActivity extends ListActivity {
            private Cursor mCursor;
            private CursorAdapter mAdapter;

            protected void onCreate(Bundle savedInstanceState) {
                super.onCreate(savedInstanceState);
                mAdapter = new MyCursorAdapter(this);
                setListAdapter(mAdapter);
            }

            protected void onPause() {
                super.onPause();
                if (isFinishing()) {
                    mCursor.close();
                }
            }

            protected void onDestroy() {
                super.onDestroy();
                mCursor.close();
            }

            private void updateQuery() {
                // If we had a cursor open before, close it.
                if (mCursor != null) {
                    mCursor.close();
                }

                MyDbHelper dbHelper = new MyDbHelper(this);
                SQLiteDatabase db = dbHelper.getReadableDatabase();
                mCursor = db.query(...);
                mAdapter.changeCursor(mCursor);
                db.close();
            }
        }

    updateQuery() can be called multiple times because the user can filter the results via menu items (I left this part out of the code, as the problem still occurs even if the user does no filtering). Again, the issue is that when I hit home I get leak errors. Yet, after going home, I can go back to the app and find my list again - cursor fully intact.

  • SQL 2008 R2 login/network issue

    - by martinjd
    I have a new, clean install of Windows Server 2008 R2 (not a VM) that I have added to a Windows Server 2003 based domain using my account, which has domain admin rights. The domain functional level is 2003. I performed a clean install of SQL Server 2008 R2 using the same account. The installation completed without any errors. I logged into SSMS locally and attempted to add another domain account by clicking Search, Advanced and finding the user in the domain. When I return to the "Dialog - New" window and click OK, I receive the following error:

        Create failed for Login 'Domain\User'. (Microsoft.SqlServer.Smo)
        An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
        Windows NT user or group 'Domain\User' not found. Check the name again. (Microsoft SQL Server, Error: 15401)

    I have verified that the firewall is off, tried adding a different domain user, tried using SA to add a user, installed the hotfix for KB 976494, and verified that the Local Security Policy settings "Domain Member: Digitally encrypt or sign secure channel", "Domain Member: Digitally encrypt secure channel" and "Domain Member: Digitally sign secure channel" are disabled. None of this has made a difference. I can RDP to a Server 2003 server running SQL 2008 and add the same domain user without issue. Also, if I try to connect with SSMS to the SQL server from another system on the domain using my account, I get the following error:

        Login failed. The login is from an untrusted domain and cannot be used with Windows authentication. (Microsoft SQL Server, Error: 18452)

    and on the database server I see the following in the security event log:

        An account failed to log on.
        Subject:
            Security ID: NULL SID
            Account Name: -
            Account Domain: -
            Logon ID: 0x0
        Logon Type: 3
        Account For Which Logon Failed:
            Security ID: NULL SID
            Account Name: myUserName
            Account Domain: MYDOMAIN
        Failure Information:
            Failure Reason: An Error occured during Logon.
            Status: 0xc000018d
            Sub Status: 0x0
        Process Information:
            Caller Process ID: 0x0
            Caller Process Name: -
        Network Information:
            Workstation Name: MYWKS
            Source Network Address: -
            Source Port: -
        Detailed Authentication Information:
            Logon Process: NtLmSsp
            Authentication Package: NTLM
            Transited Services: -
            Package Name (NTLM only): -
            Key Length: 0

    I am sure that the "NULL SID" has some significant meaning, but I have no idea at this point what the issue could be.
