Search Results

Search found 6580 results on 264 pages for 'require'.

Page 92/264 | < Previous Page | 88 89 90 91 92 93 94 95 96 97 98 99 | Next Page >

Hadoop: Mapping binary files

- by restrictedinfinity

Typically in a the input file is capable of being partially read and processed by Mapper function (as in text files). Is there anything that can be done to handle binaries (say images, serialized objects) which would require all the blocks to be on same host, before the processing can start.

Read the article
Connecting debian and windows via IPsec VPN with Racoon and ipsec-tools

- by Michi Qne

I've some trouble with the IPsec configuration on my debian server (6 squeeze). This server should connect via IPsec VPN to an windows server, which is protected by an firewall. I've used racoon and ipsec-tools and this tutorial http://wiki.debian.org/IPsec. However, I am not quite sure, if this tutorial fits to my purpose, because of some differences: my Host and my gateway are the same server. So I don't have two different ip addresses. I guess, that's not a problem the other server is an windows system behind a firewall. Hopefully, not a problem the subnet of the windows system is /32 not /24. So I change it to /32. I worked through the tutorial step by step, but I wasn't able to route the ip. The following command didn't work for me: ip route add to 172.16.128.100/32 via XXX.XXX.XXX.XXX src XXX.XXX.XXX.XXX So I tried the following instead: ip route add to 172.16.128.100 .., which obviously not solved the problem. The next problem is the compression. The windows doesn't use a compression, but 'compression_algorithm none;' doesn't work with my racoon. So the current value is 'compression_algorithm deflate;' So my current result looks like this: When I am trying to ping the windows host (ping 172.16.128.100), I receive the following error message from ping: ping: sendmsg: Operation not permitted And racoon logs: racoon: ERROR: failed to get sainfo. After googling for a while I came to no conclusion, what's the solution. Does this error message mean that the first phase of IPsec works? I am thankful for any advice. I guess my configs might be helpful. My racoon.conf looks like this: path pre_shared_key "/etc/racoon/psk.txt"; remote YYY.YYY.YYY.YYY { exchange_mode main; proposal { lifetime time 8 hour; encryption_algorithm 3des; hash_algorithm sha1; authentication_method pre_shared_key; dh_group 2; } } sainfo address XXX.XXX.XXX.XXX/32 any address 172.16.128.100/32 any { pfs_group 2; lifetime time 8 hour; encryption_algorithm aes 256; authentication_algorithm hmac_sha1; compression_algorithm deflate; } And my ipsec-tools.conf looks like this: flush; spdflush; spdadd XXX.XXX.XXX.XXX/32 172.16.128.100/32 any -P out ipsec esp/tunnel/XXX.XXX.XXX.XXX-YYY.YYY.YYY.YYY/require; spdadd 172.16.128.100/32 XXX.XXX.XXX.XXX/32 any -P in ipsec esp/tunnel/YYY.YYY.YYY.YYY-XXX.XXX.XXX.XXX/require; If anyone has an advice, that would be awesome. Thanks in Advance. Greets, Michael It was a simple copy-and-paste error in an ip address.

Read the article
How should I ask for help in getting my emails to stop bouncing?

- by Gregg Williams

For several months, people have been telling me that emails they sent to me have been bouncing back, marked as undeliverable. The bounce message would contain portions like this: Final-Recipient: rfc822;[email protected] Action: failed Status: 5.7.1 Diagnostic-Code: smtp;550 5.7.1 <[email protected]>... Recipient declines email from 69.64.159.2, <spamhaus-xbl>, Ref: http://www.spamhaus.org/query/bl?ip=69.64.159.2 Clicking the link on the last line, the destination page told me that "this IP address is infected with/emitting spamware/spamtrojan traffic and needs to be fixed." I could temporarily de-list this node by clicking a link on that page, but it would get back on the list and more emails to me to bounce. I own a domain, innerpaths.net, and I normally use [email protected] for my email. I have my domain registrar, namecheap.com, forward all email from innerpaths.net to the email account [email protected]. (BTW, I had this same problem at a former registrar. I changed registrars, hoping that would fix the problem. It didn't.) Trying to isolate the problem, I asked namecheap.com what I should do. Their answer, though substantial, left me scratching my head: We have received feedback from our upstream provider which informed us that the mail server that you are trying to email subscribes to a 3rd party blacklist service which they appear to be listed on at the present time and is causing destination mail server to reject the messages. Being blocked with one of these services can happen to anyone for many reasons and is something that is beyond our control. 3rd party blacklist services require companies whose mail servers they have blacklisted, pay fees in order to be removed from their lists. As we cannot pay fees to blacklist services which require them for removal, you should contact your email provider and have them whitelist our mail server IP address: 69.64.157.73. My best guess is that I should email my ISP, sonic.net, tell them what is going on and ask them to whitelist the IP address 69.64.157.73. (If not, please let me know.) But I want to know what is going on and how email works. I understand that there's a device at location 69.64.159.2 that is doing something bad that causes the "destination mail server [sonic.net's, I assume --gw] to reject the messages." I know that email is sent through multiple devices in a way that eventually gets it to its destination. Beyond that, here are my questions: 1) I thought the Internet "routed around damage." Why does email starting at namecheap.com always (or is it 'sometimes'?) go through 69.64.159.2? 2) Who is the "upstream provider" that the namecheap.com representative mentions, and what is their role? 3) How does having sonic.net's whitelisting namecheap.com's mail server prevent my email being bounced by 69.64.159.2? I've searched the Internet for answers but have found nothing useful. Thanks for whatever answers you can provide.

Read the article
cherrypy fails to stop when puppet tries to ensure running and refresh it at the same time

- by ento

I am trying to manage a cherrypy service with puppet. However, when the configuration is applied, cherryd ends up with no PID file although the process is up and running. Since the PID file is lost I can no longer stop the process with /etc/init.d/mycherryd stop (unless I modify the handmade init script to lookup the PID with ps or something.) $ /etc/init.d/mycherryd status cherryd dead but subsys locked The problem seems to be that puppet is trying to refresh/restart cherryd (triggered by changes in configuration files) immediately after ensuring it's running (as specified in the manifest), and cherrypy fails to stop and start (restart) itself while still executing its startup process. Is there a clear cut solution to this problem? Is this a bug on the cherrypy side, or can I write a puppet manifest so refresh is called only after the service is up and running? Any suggestion welcome. cherrypy log See how cherrypy catches SIGTERM midway through startup and still starts to listen. :cherrypy.error[18666] 2010-02-12 13:10:23,551 INFO: ENGINE Listening for SIGHUP. :cherrypy.error[18666] 2010-02-12 13:10:23,552 INFO: ENGINE Listening for SIGTERM. :cherrypy.error[18666] 2010-02-12 13:10:23,552 INFO: ENGINE Listening for SIGUSR1. :cherrypy.error[18666] 2010-02-12 13:10:23,552 INFO: ENGINE Bus STARTING :cherrypy.error[18671] 2010-02-12 13:10:23,554 INFO: ENGINE Daemonized to PID: 18671 :cherrypy.error[18671] 2010-02-12 13:10:23,554 INFO: ENGINE PID 18671 written to '/var/mycherryd/cherry.pid'. :cherrypy.error[18671] 2010-02-12 13:10:23,555 INFO: ENGINE Started monitor thread '_TimeoutMonitor'. :cherrypy.error[18670] 2010-02-12 13:10:23,556 INFO: ENGINE Forking twice. :cherrypy.error[18666] 2010-02-12 13:10:23,557 INFO: ENGINE Forking once. :cherrypy.error[18671] 2010-02-12 13:10:23,716 INFO: ENGINE Caught signal SIGTERM. :cherrypy.error[18671] 2010-02-12 13:10:23,716 INFO: ENGINE Bus STOPPING :cherrypy.error[18671] 2010-02-12 13:10:23,716 INFO: ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('0.0.0.0', 12380)) already shut down :cherrypy.error[18671] 2010-02-12 13:10:23,717 INFO: ENGINE Stopped thread '_TimeoutMonitor'. :cherrypy.error[18671] 2010-02-12 13:10:23,717 INFO: ENGINE Bus STOPPED :cherrypy.error[18671] 2010-02-12 13:10:23,732 INFO: ENGINE Bus EXITING :cherrypy.error[18671] 2010-02-12 13:10:23,759 INFO: ENGINE PID file removed: '/var/mycherryd/cherry.pid'. :cherrypy.error[18671] 2010-02-12 13:10:23,782 INFO: ENGINE Bus EXITED :cherrypy.error[18671] 2010-02-12 13:10:23,792 INFO: ENGINE Serving on 0.0.0.0:12380 :cherrypy.error[18671] 2010-02-12 13:10:23,826 INFO: ENGINE Bus STARTED puppet log puppet tries to refresh the service immediately after ensuring it to be 'running'. Feb 12 13:10:22 localhost puppetd[8159]: (//mycherrypy/File[conffiles]) Scheduling refresh of Service[cherryd] Feb 12 13:10:22 localhost last message repeated 12 times Feb 12 13:10:23 localhost puppetd[8159]: (//mycherrypy/Service[mycherryd]/ensure) ensure changed 'stopped' to 'running' Feb 12 13:10:23 localhost puppetd[8159]: (//mycherrypy/Service[mycherryd]) Triggering 'refresh' from 13 dependencies Feb 12 13:11:23 localhost puppetd[8159]: (//mycherrypy/Service[mycherryd]) Failed to call refresh on Service[mycherryd]: Could not stop Service[mycherryd]: Execution of '/sbin/service mycherryd stop' returned 1: at /etc/puppet/manifests/mycherrypy.pp:161 Feb 12 13:11:24 localhost puppetd[8159]: Value of 'preferred_serialization_format' (pson) is invalid for report, using default (marshal) Feb 12 13:11:24 localhost puppetd[8159]: Finished catalog run in 99.25 seconds puppet manifest excerpt class mycherrypy { file { 'rpm': path => "/tmp/${apiserver}.i386.rpm", source => "${fileserver}/${apiserver}.i386.rpm"; 'conffiles': require => Package["${apiserver}"], path => "${service_home}/config/", ensure => present, source => "${fileserver}/config/", notify => Service["mycherryd"]; } package { "$apiserver": provider => 'rpm', source => "/tmp/${apiserver}.i386.rpm", ensure => latest; } service { "mycherryd": require => [File["conffiles"], Package["${apiserver}"]], ensure => running, provider => redhat, hasstatus => true; } }

Read the article
cherrypy fails to stop when puppet tries to ensure running and refresh it at the same time

- by ento

I am trying to manage a cherrypy service with puppet. However, when the configuration is applied, cherryd ends up with no PID file although the process is up and running. Since the PID file is lost I can no longer stop the process with /etc/init.d/mycherryd stop (unless I modify the handmade init script to lookup the PID with ps or something.) $ /etc/init.d/mycherryd status cherryd dead but subsys locked The problem seems to be that puppet is trying to refresh/restart cherryd (triggered by changes in configuration files) immediately after ensuring it's running (as specified in the manifest), and cherrypy fails to stop and start (restart) itself while still executing its startup process. Is there a clear cut solution to this problem? Is this a bug on the cherrypy side, or can I write a puppet manifest so refresh is called only after the service is up and running? Any suggestion welcome. cherrypy log See how cherrypy catches SIGTERM midway through startup and still starts to listen. :cherrypy.error[18666] 2010-02-12 13:10:23,551 INFO: ENGINE Listening for SIGHUP. :cherrypy.error[18666] 2010-02-12 13:10:23,552 INFO: ENGINE Listening for SIGTERM. :cherrypy.error[18666] 2010-02-12 13:10:23,552 INFO: ENGINE Listening for SIGUSR1. :cherrypy.error[18666] 2010-02-12 13:10:23,552 INFO: ENGINE Bus STARTING :cherrypy.error[18671] 2010-02-12 13:10:23,554 INFO: ENGINE Daemonized to PID: 18671 :cherrypy.error[18671] 2010-02-12 13:10:23,554 INFO: ENGINE PID 18671 written to '/var/mycherryd/cherry.pid'. :cherrypy.error[18671] 2010-02-12 13:10:23,555 INFO: ENGINE Started monitor thread '_TimeoutMonitor'. :cherrypy.error[18670] 2010-02-12 13:10:23,556 INFO: ENGINE Forking twice. :cherrypy.error[18666] 2010-02-12 13:10:23,557 INFO: ENGINE Forking once. :cherrypy.error[18671] 2010-02-12 13:10:23,716 INFO: ENGINE Caught signal SIGTERM. :cherrypy.error[18671] 2010-02-12 13:10:23,716 INFO: ENGINE Bus STOPPING :cherrypy.error[18671] 2010-02-12 13:10:23,716 INFO: ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('0.0.0.0', 12380)) already shut down :cherrypy.error[18671] 2010-02-12 13:10:23,717 INFO: ENGINE Stopped thread '_TimeoutMonitor'. :cherrypy.error[18671] 2010-02-12 13:10:23,717 INFO: ENGINE Bus STOPPED :cherrypy.error[18671] 2010-02-12 13:10:23,732 INFO: ENGINE Bus EXITING :cherrypy.error[18671] 2010-02-12 13:10:23,759 INFO: ENGINE PID file removed: '/var/mycherryd/cherry.pid'. :cherrypy.error[18671] 2010-02-12 13:10:23,782 INFO: ENGINE Bus EXITED :cherrypy.error[18671] 2010-02-12 13:10:23,792 INFO: ENGINE Serving on 0.0.0.0:12380 :cherrypy.error[18671] 2010-02-12 13:10:23,826 INFO: ENGINE Bus STARTED puppet log puppet tries to refresh the service immediately after ensuring it to be 'running'. Feb 12 13:10:22 localhost puppetd[8159]: (//mycherrypy/File[conffiles]) Scheduling refresh of Service[cherryd] Feb 12 13:10:22 localhost last message repeated 12 times Feb 12 13:10:23 localhost puppetd[8159]: (//mycherrypy/Service[mycherryd]/ensure) ensure changed 'stopped' to 'running' Feb 12 13:10:23 localhost puppetd[8159]: (//mycherrypy/Service[mycherryd]) Triggering 'refresh' from 13 dependencies Feb 12 13:11:23 localhost puppetd[8159]: (//mycherrypy/Service[mycherryd]) Failed to call refresh on Service[mycherryd]: Could not stop Service[mycherryd]: Execution of '/sbin/service mycherryd stop' returned 1: at /etc/puppet/manifests/mycherrypy.pp:161 Feb 12 13:11:24 localhost puppetd[8159]: Value of 'preferred_serialization_format' (pson) is invalid for report, using default (marshal) Feb 12 13:11:24 localhost puppetd[8159]: Finished catalog run in 99.25 seconds puppet manifest excerpt class mycherrypy { file { 'rpm': path => "/tmp/${apiserver}.i386.rpm", source => "${fileserver}/${apiserver}.i386.rpm"; 'conffiles': require => Package["${apiserver}"], path => "${service_home}/config/", ensure => present, source => "${fileserver}/config/", notify => Service["mycherryd"]; } package { "$apiserver": provider => 'rpm', source => "/tmp/${apiserver}.i386.rpm", ensure => latest; } service { "mycherryd": require => [File["conffiles"], Package["${apiserver}"]], ensure => running, provider => redhat, hasstatus => true; } }

Read the article
Linux Kernel not passing through multicast UDP packets

- by buecking

Recently I've set up a new Ubuntu Server 10.04 and noticed my UDP server is no longer able to see any multicast data sent to the interface, even after joining the multicast group. I've got the exact same set up on two other Ubuntu 8.04.4 LTS machines and there is no problem receiving data after joining the same multicast group. The ethernet card is a Broadcom netXtreme II BCM5709 and the driver used is: b $ ethtool -i eth1 driver: bnx2 version: 2.0.2 firmware-version: 5.0.11 NCSI 2.0.5 bus-info: 0000:01:00.1 I'm using smcroute to manage my multicast registrations. b$ smcroute -d b$ smcroute -j eth1 233.37.54.71 After joining the group ip maddr shows the newly added registration. b$ ip maddr 1: lo inet 224.0.0.1 inet6 ff02::1 2: eth0 link 33:33:ff:40:c6:ad link 01:00:5e:00:00:01 link 33:33:00:00:00:01 inet 224.0.0.1 inet6 ff02::1:ff40:c6ad inet6 ff02::1 3: eth1 link 01:00:5e:25:36:47 link 01:00:5e:25:36:3e link 01:00:5e:25:36:3d link 33:33:ff:40:c6:af link 01:00:5e:00:00:01 link 33:33:00:00:00:01 inet 233.37.54.71 <------- McastGroup. inet 224.0.0.1 inet6 ff02::1:ff40:c6af inet6 ff02::1 So far so good, I can see that I'm receiving data for this multicast group. b$ sudo tcpdump -i eth1 -s 65534 host 233.37.54.71 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth1, link-type EN10MB (Ethernet), capture size 65534 bytes 09:30:09.924337 IP 192.164.1.120.58848 > 233.37.54.71.15572: UDP, length 212 09:30:09.947547 IP 192.164.1.120.58848 > 233.37.54.71.15572: UDP, length 212 09:30:10.108378 IP 192.164.1.120.58866 > 233.37.54.71.15574: UDP, length 268 09:30:10.196841 IP 192.164.1.120.58848 > 233.37.54.71.15572: UDP, length 212 ... I can also confirm that the interface is receiving mcast packets. b $ ethtool -S eth1 | grep mcast_pack rx_mcast_packets: 103998 tx_mcast_packets: 33 Now here's the problem. When I try to capture the traffic using a simple ruby UDP server I receive zero data! Here's a simple server that reads data send on port 15572 and prints the first two characters. This works on the two 8.04.4 Ubuntu Servers, but not the 10.04 server. require 'socket' s = UDPSocket.new s.bind("", 15572) 5.times do text, sender = s.recvfrom(2) puts text end If I send a UDP packet crafted in ruby to localhost, the server receives it and prints out the first two characters. So I know that the server above is working correctly. irb(main):001:0> require 'socket' => true irb(main):002:0> s = UDPSocket.new => #<UDPSocket:0x7f3ccd6615f0> irb(main):003:0> s.send("I2 XXX", 0, 'localhost', 15572) When I check the protocol statistics I see that InMcastPkts is not increasing. While on the other 8.04 servers, on the same network, received a few thousands packets in 10 seconds. b $ netstat -sgu ; sleep 10 ; netstat -sgu IcmpMsg: InType3: 11 OutType3: 11 Udp: 446 packets received 4 packets to unknown port received. 0 packet receive errors 461 packets sent UdpLite: IpExt: InMcastPkts: 4654 <--------- Same as below OutMcastPkts: 3426 InBcastPkts: 9854 InOctets: -1691733021 OutOctets: 51187936 InMcastOctets: 145207 OutMcastOctets: 109680 InBcastOctets: 1246341 IcmpMsg: InType3: 11 OutType3: 11 Udp: 446 packets received 4 packets to unknown port received. 0 packet receive errors 461 packets sent UdpLite: IpExt: InMcastPkts: 4656 <-------------- Same as above OutMcastPkts: 3427 InBcastPkts: 9854 InOctets: -1690886265 OutOctets: 51188788 InMcastOctets: 145267 OutMcastOctets: 109712 InBcastOctets: 1246341 If I try forcing the interface into promisc mode nothing changes. At this point I'm stuck. I've confirmed the kernel config has multicast enabled. Perhaps there are other config options I should be checking? b $ grep CONFIG_IP_MULTICAST /boot/config-2.6.32-23-server CONFIG_IP_MULTICAST=y Any thoughts on where to go from here?

Read the article
Choice of an OS for a home ZFS NAS

- by OlafM

I am preparing a home NAS with an old Athlon 64 X2 3800+, 4 GB ECC RAM, Asus M2V MX motherboard, and a single 3 TB WDC Green (another one as mirror may be installed in the future). It's the cheapest solution I found that includes ECC memory and the higher energy consumption is offset by the lower (zero) cost of acquisition. The system will be used for: music storage and stream to other desktop computers; storage of the scanned dia slides (3-4k slides, 180 MB TIFF each one plus reduced quality JPEG version); stream of these photos to a local iPad 2 (maybe Plex App? not yet sure); (one additional) remote backup via rsync/ssh or ZFS send/receive. It will be controlled via remote ssh, maybe VNC, no monitor attached. Absolute requirement is a reliable ZFS solution, plus the ability to easily install packets/software/virtual machines and to update remotely (I will be the admin and I don't live near the NAS). I have mainly three options: NAS4free/FreeNAS OpenIndiana Solaris Express 11 (yeah yeah I know the license requirements, I will write a perl script on it to count it as development machine). Problems: NAS4free/FreeNAS (I tested only NAS4free) required embedded installation for remote upgrading, but full install for easy addition of software packets. Since I need at least AirVideo Server (linux/win) and Plex App (win/linux) to stream the photos and some videos to iPad (they both require virtualbox), but I cannot be there to install updates, NAS4free/FreeNAS are excluded. http://www.nas4free.org/general_information.html explains the issue: embedded can be remotely updated, full cannot. Solaris has also another advantage: Crashplan client supports Solaris and I'm already using it for other backups. I would like to leave the option open, even if I will be doing backups probably through zfs send/receive. NexentaStor was left out because zfs send/receive are not included in the free version. The question is now Solaris 11 Express over OpenIndiana. To ease the management, I will be using http://www.napp-it.org Which one would you suggest and why? I found lots of informations and it's difficult for me to decide. I think (from the napp-it manual) that Solaris has some additional options for SMB shares, but are they really needed at home? I think I won't even use ACLs, since normal unix-style permissions are enough. OpenIndiana has maybe more frequent updates (Solaris offers only security updates between releases), but again, do I need them? I don't think so. Moreover, this is a NAS that has to work and nothing else, I cannot risk having problems that require me to access the server. Isn't OpenIndiana a bit more... cutting edge (in the Solaris world)? I'm just asking, no need to focus on this for the answer :-) I would limit myself to these two options (SE11.1/OI) also because I will be making a NAS for me in the future (where high performances with Mac shares are also required) and Solaris has kernel support for AFP. I will use this server to gather experience as well. After this long question, thanks in advance! If you need additional info, let me know and I will update this post.

Read the article
Choice of an OS for a home ZFS NAS

- by OlafM

I am preparing a home NAS with an old Athlon 64 X2 3800+, 4 GB ECC RAM, Asus M2V MX motherboard, and a single 3 TB WDC Green (another one as mirror may be installed in the future). It's the cheapest solution I found that includes ECC memory and the higher energy consumption is offset by the lower (zero) cost of acquisition. The system will be used for: music storage and stream to other desktop computers; storage of the scanned dia slides (3-4k slides, 180 MB TIFF each one plus reduced quality JPEG version); stream of these photos to a local iPad 2 (maybe Plex App? not yet sure); (one additional) remote backup via rsync/ssh or ZFS send/receive. It will be controlled via remote ssh, maybe VNC, no monitor attached. Absolute requirement is a reliable ZFS solution, plus the ability to easily install packets/software/virtual machines and to update remotely (I will be the admin and I don't live near the NAS). I have mainly three options: NAS4free/FreeNAS OpenIndiana Solaris Express 11 (yeah yeah I know the license requirements, I will write a perl script on it to count it as development machine). Problems: NAS4free/FreeNAS (I tested only NAS4free) required embedded installation for remote upgrading, but full install for easy addition of software packets. Since I need at least AirVideo Server (linux/win) and Plex App (win/linux) to stream the photos and some videos to iPad (they both require virtualbox), but I cannot be there to install updates, NAS4free/FreeNAS are excluded. http://www.nas4free.org/general_information.html explains the issue: embedded can be remotely updated, full cannot. Solaris has also another advantage: Crashplan client supports Solaris and I'm already using it for other backups. I would like to leave the option open, even if I will be doing backups probably through zfs send/receive. NexentaStor was left out because zfs send/receive are not included in the free version. The question is now Solaris 11 Express over OpenIndiana. To ease the management, I will be using http://www.napp-it.org Which one would you suggest and why? I found lots of informations and it's difficult for me to decide. I think (from the napp-it manual) that Solaris has some additional options for SMB shares, but are they really needed at home? I think I won't even use ACLs, since normal unix-style permissions are enough. OpenIndiana has maybe more frequent updates (Solaris offers only security updates between releases), but again, do I need them? I don't think so. Moreover, this is a NAS that has to work and nothing else, I cannot risk having problems that require me to access the server. Isn't OpenIndiana a bit more... cutting edge (in the Solaris world)? I'm just asking, no need to focus on this for the answer :-) I would limit myself to these two options (SE11.1/OI) also because I will be making a NAS for me in the future (where high performances with Mac shares are also required) and Solaris has kernel support for AFP. I will use this server to gather experience as well. After this long question, thanks in advance! If you need additional info, let me know and I will update this post. UPDATES Given the first answers, I will strongly suggest the person paying the hardware to insert a second HD. Better 2x2TB than 1x3TB (3 TB is oversized anyway). I was trying to keep the initial costs down to spread them over a longer period, but better having something good from the beginning.

Read the article
Issue in nginx proxying to apache

- by Luis Masuelli

My current nginx configuration is as follows: specific configuration for (currently two) domains: server { listen 443 ssl; server_name studiotv.service.tebusco.lan phpmyadmin.service.tebusco.lan; ssl_certificate /home/administrador/nginx-confs/ssl/service.tebusco.lan.crt; ssl_certificate_key /home/administrador/nginx-confs/ssl/service.tebusco.lan.key; location / { proxy_pass http://127.0.0.1:8180; proxy_set_header Host $http_host:8180; } } default configuration for unmatched ssl connections: server { listen 443 default ssl; ssl_certificate /home/administrador/nginx-confs/ssl/service.tebusco.lan.crt; ssl_certificate_key /home/administrador/nginx-confs/ssl/service.tebusco.lan.key; location / { return 403; } } http configuration: server { listen 80; rewrite ^ https://$host$request_uri? permanent; } The intention is clear: Redirect http traffic to https. Proxy each https:// call from phpmyadmin.service.tebusco.lan and studiotv.service.tebusco.lan to apache2. This includes passing a host header, which is detected. Each unmatched ssl connection must return a 403 in nginx. Does not even reach apache2. In the apache2 side of the life, I have a default site, and a non-default site which will match studiotv.service.tebusco.lan: 000-default.conf file (available and enabled): <VirtualHost 127.0.0.1:8180> # The ServerName directive sets the request scheme, hostname and port that # the server uses to identify itself. This is used when creating # redirection URLs. In the context of virtual hosts, the ServerName # specifies what hostname must appear in the request's Host: header to # match this virtual host. For the default virtual host (this file) this # value is not decisive as it is used as a last resort host regardless. # However, you must set it for any further virtual host explicitly. ServerName localhost ServerAdmin webmaster@localhost DocumentRoot /var/www/html <Directory /var/www/html> Order deny,allow Require all granted </Directory> </VirtualHost> # vim: syntax=apache ts=4 sw=4 sts=4 sr noet studiotv.conf file (available and enabled): <VirtualHost *:8180> ServerName studiotv.service.tebusco.lan ServerAdmin [email protected] DocumentRoot /var/www/studiotv <Directory /var/www/studiotv/> Options -Indexes +FollowSymLinks AllowOverride None Order deny,allow Allow from all Require all granted </Directory> # Available loglevels: trace8, ..., trace1, debug, info, notice, warn, # error, crit, alert, emerg. # It is also possible to configure the loglevel for particular # modules, e.g. #LogLevel info ssl:warn # No usamos ${APACHE_LOG_DIR} sino en su lugar /var/log/<host> ErrorLog /var/log/apache2/studiotv/error.log CustomLog /var/log/apache2/studiotv/access.log combined </VirtualHost> # vim: syntax=apache ts=4 sw=4 sts=4 sr noet However, when I hit the browser with http://studiotv.service.tebusco.lan, the default php page is shown instead. Question: What am I missing? (apache 2.4.7, nginx 1.6.0, ubuntu server 14.04).

Read the article
How to make XAMPP virtual hosts accessible to VM's and other computers on LAN?

- by martin's

XAMPP running on Vista 64 Ultimate dev machine (don't think it matters). Machine / Browser configuration Safari, Firefox, Chrome and IE9 on dev machine IE7 and IE8 on separate XP Pro VM's (VMWare on dev machine) IE10 and Chrome on Windows 8 VM (VMware on dev machine) Safari, Firefox and Chrome running on a iMac (same network as dev) Safari, Firefox and Chrome running on a couple of Mac Pro's (same network as dev) IE7, IE8, IE9 running on other PC's on the same network as dev machine Development Configuration Multiple virtual hosts for different projects .local fake TLD for development No firewall restrictions on dev machine for Apache Some sites have .htaccess mapping www to non-www Port 80 is open in the dev machine's firewall Problem XAMPP local home page (http://192.168.1.98/xampp/) can be accessed from everywhere, real or virtual, by IP All .local sites can be accessed from the browsers on the dev machine. All .local sites can be accessed form the browsers in the XP VM's. Some .local sites cannot be accessed from IE10 or Chrome on the W8 VM Sites that cannot be accessed from W8 VM have a minimal .htaccess file No .local sites can be accessed from ANY machine (PC or Mac) on the LAN hosts on dev machine (relevant excerpt) 127.0.0.1 site1.local 127.0.0.1 site2.local 127.0.0.1 site3.local 127.0.0.1 site4.local 127.0.0.1 site5.local 127.0.0.1 site6.local 127.0.0.1 site7.local 127.0.0.1 site8.local 127.0.0.1 site9.local 192.168.1.98 site1.local 192.168.1.98 site2.local 192.168.1.98 site3.local 192.168.1.98 site4.local 192.168.1.98 site5.local 192.168.1.98 site6.local 192.168.1.98 site7.local 192.168.1.98 site8.local 192.168.1.98 site9.local httpd-vhosts.conf on dev machine (relevant excerpt) NameVirtualHost *:80 <VirtualHost *:80> ServerName localhost ServerAlias localhost *.localhost.* DocumentRoot D:/xampp/htdocs </VirtualHost> # ======================================== site1.local <VirtualHost *:80> ServerName site1.local ServerAlias site1.local *.site1.local DocumentRoot D:/xampp-sites/site1/public_html ErrorLog D:/xampp-sites/site1/logs/access.log CustomLog D:/xampp-sites/site1/logs/error.log combined <Directory D:/xampp-sites/site1> Options Indexes FollowSymLinks AllowOverride All Require all granted </Directory> </VirtualHost> NOTE: The above <VirtualHost *:80> block is repeated for each of the nine virtual hosts in the file, no sense in posting it here. hosts on all VM's and physical machines on the network (relevant excerpt) 127.0.0.1 localhost ::1 localhost 192.168.1.98 site1.local 192.168.1.98 site2.local 192.168.1.98 site3.local 192.168.1.98 site4.local 192.168.1.98 site5.local 192.168.1.98 site6.local 192.168.1.98 site7.local 192.168.1.98 site8.local 192.168.1.98 site9.local None of the VM's have any firewall blocks on http traffic. They can reach any site on the real Internet. The same is true of the real machines on the network. The biggest puzzle perhaps is that the W8 VM actually DOES reach some of the virtual hosts. It does NOT reach site2, site6 and site 9, all of which have this minimal .htaccess file. .htaccess file <IfModule mod_rewrite.c> RewriteEngine On RewriteCond %{HTTP_HOST} !^www\. RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L] </IfModule> Adding this file to any of the virtual hosts that do work on the W8 VM will break the site (only for W8 VM, not the XP VM's) and require a cache flush on the W8 VM before it will see the site again after deleting the file. Regardless of whether a .htaccess file exists or not, no machine on the same LAN can access anything other than the XAMPP home page via IP. Even with hosts files on all machines. I can ping any virtual host from any machine on the network and get a response from the correct IP address. I can't see anything in out Netgear router that might prevent one machine from reaching the other. Besides, once the local hosts file resolves to an ip address that's all that goes out onto the local network. I've gone through an extensive number of posts on both SO and as the result of Google searches. I can't say that I have found anything definitive anywhere.

Read the article
256 Windows Azure Worker Roles, Windows Kinect and a 90's Text-Based Ray-Tracer

- by Alan Smith

For a couple of years I have been demoing a simple render farm hosted in Windows Azure using worker roles and the Azure Storage service. At the start of the presentation I deploy an Azure application that uses 16 worker roles to render a 1,500 frame 3D ray-traced animation. At the end of the presentation, when the animation was complete, I would play the animation delete the Azure deployment. The standing joke with the audience was that it was that it was a “$2 demo”, as the compute charges for running the 16 instances for an hour was $1.92, factor in the bandwidth charges and it’s a couple of dollars. The point of the demo is that it highlights one of the great benefits of cloud computing, you pay for what you use, and if you need massive compute power for a short period of time using Windows Azure can work out very cost effective. The “$2 demo” was great for presenting at user groups and conferences in that it could be deployed to Azure, used to render an animation, and then removed in a one hour session. I have always had the idea of doing something a bit more impressive with the demo, and scaling it from a “$2 demo” to a “$30 demo”. The challenge was to create a visually appealing animation in high definition format and keep the demo time down to one hour. This article will take a run through how I achieved this. Ray Tracing Ray tracing, a technique for generating high quality photorealistic images, gained popularity in the 90’s with companies like Pixar creating feature length computer animations, and also the emergence of shareware text-based ray tracers that could run on a home PC. In order to render a ray traced image, the ray of light that would pass from the view point must be tracked until it intersects with an object. At the intersection, the color, reflectiveness, transparency, and refractive index of the object are used to calculate if the ray will be reflected or refracted. Each pixel may require thousands of calculations to determine what color it will be in the rendered image. Pin-Board Toys Having very little artistic talent and a basic understanding of maths I decided to focus on an animation that could be modeled fairly easily and would look visually impressive. I’ve always liked the pin-board desktop toys that become popular in the 80’s and when I was working as a 3D animator back in the 90’s I always had the idea of creating a 3D ray-traced animation of a pin-board, but never found the energy to do it. Even if I had a go at it, the render time to produce an animation that would look respectable on a 486 would have been measured in months. PolyRay Back in 1995 I landed my first real job, after spending three years being a beach-ski-climbing-paragliding-bum, and was employed to create 3D ray-traced animations for a CD-ROM that school kids would use to learn physics. I had got into the strange and wonderful world of text-based ray tracing, and was using a shareware ray-tracer called PolyRay. PolyRay takes a text file describing a scene as input and, after a few hours processing on a 486, produced a high quality ray-traced image. The following is an example of a basic PolyRay scene file. background Midnight_Blue static define matte surface { ambient 0.1 diffuse 0.7 } define matte_white texture { matte { color white } } define matte_black texture { matte { color dark_slate_gray } } define position_cylindrical 3 define lookup_sawtooth 1 define light_wood <0.6, 0.24, 0.1> define median_wood <0.3, 0.12, 0.03> define dark_wood <0.05, 0.01, 0.005> define wooden texture { noise surface { ambient 0.2 diffuse 0.7 specular white, 0.5 microfacet Reitz 10 position_fn position_cylindrical position_scale 1 lookup_fn lookup_sawtooth octaves 1 turbulence 1 color_map( [0.0, 0.2, light_wood, light_wood] [0.2, 0.3, light_wood, median_wood] [0.3, 0.4, median_wood, light_wood] [0.4, 0.7, light_wood, light_wood] [0.7, 0.8, light_wood, median_wood] [0.8, 0.9, median_wood, light_wood] [0.9, 1.0, light_wood, dark_wood]) } } define glass texture { surface { ambient 0 diffuse 0 specular 0.2 reflection white, 0.1 transmission white, 1, 1.5 }} define shiny surface { ambient 0.1 diffuse 0.6 specular white, 0.6 microfacet Phong 7 } define steely_blue texture { shiny { color black } } define chrome texture { surface { color white ambient 0.0 diffuse 0.2 specular 0.4 microfacet Phong 10 reflection 0.8 } } viewpoint { from <4.000, -1.000, 1.000> at <0.000, 0.000, 0.000> up <0, 1, 0> angle 60 resolution 640, 480 aspect 1.6 image_format 0 } light <-10, 30, 20> light <-10, 30, -20> object { disc <0, -2, 0>, <0, 1, 0>, 30 wooden } object { sphere <0.000, 0.000, 0.000>, 1.00 chrome } object { cylinder <0.000, 0.000, 0.000>, <0.000, 0.000, -4.000>, 0.50 chrome } After setting up the background and defining colors and textures, the viewpoint is specified. The “camera” is located at a point in 3D space, and it looks towards another point. The angle, image resolution, and aspect ratio are specified. Two lights are present in the image at defined coordinates. The three objects in the image are a wooden disc to represent a table top, and a sphere and cylinder that intersect to form a pin that will be used for the pin board toy in the final animation. When the image is rendered, the following image is produced. The pins are modeled with a chrome surface, so they reflect the environment around them. Note that the scale of the pin shaft is not correct, this will be fixed later. Modeling the Pin Board The frame of the pin-board is made up of three boxes, and six cylinders, the front box is modeled using a clear, slightly reflective solid, with the same refractive index of glass. The other shapes are modeled as metal. object { box <-5.5, -1.5, 1>, <5.5, 5.5, 1.2> glass } object { box <-5.5, -1.5, -0.04>, <5.5, 5.5, -0.09> steely_blue } object { box <-5.5, -1.5, -0.52>, <5.5, 5.5, -0.59> steely_blue } object { cylinder <-5.2, -1.2, 1.4>, <-5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, -1.2, 1.4>, <5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <-5.2, 5.2, 1.4>, <-5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, 5.2, 1.4>, <5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <0, -1.2, 1.4>, <0, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <0, 5.2, 1.4>, <0, 5.2, -0.74>, 0.2 steely_blue } In order to create the matrix of pins that make up the pin board I used a basic console application with a few nested loops to create two intersecting matrixes of pins, which models the layout used in the pin boards. The resulting image is shown below. The pin board contains 11,481 pins, with the scene file containing 23,709 lines of code. For the complete animation 2,000 scene files will be created, which is over 47 million lines of code. Each pin in the pin-board will slide out a specific distance when an object is pressed into the back of the board. This is easily modeled by setting the Z coordinate of the pin to a specific value. In order to set all of the pins in the pin-board to the correct position, a bitmap image can be used. The position of the pin can be set based on the color of the pixel at the appropriate position in the image. When the Windows Azure logo is used to set the Z coordinate of the pins, the following image is generated. The challenge now was to make a cool animation. The Azure Logo is fine, but it is static. Using a normal video to animate the pins would not work; the colors in the video would not be the same as the depth of the objects from the camera. In order to simulate the pin board accurately a series of frames from a depth camera could be used. Windows Kinect The Kenect controllers for the X-Box 360 and Windows feature a depth camera. The Kinect SDK for Windows provides a programming interface for Kenect, providing easy access for .NET developers to the Kinect sensors. The Kinect Explorer provided with the Kinect SDK is a great starting point for exploring Kinect from a developers perspective. Both the X-Box 360 Kinect and the Windows Kinect will work with the Kinect SDK, the Windows Kinect is required for commercial applications, but the X-Box Kinect can be used for hobby projects. The Windows Kinect has the advantage of providing a mode to allow depth capture with objects closer to the camera, which makes for a more accurate depth image for setting the pin positions. Creating a Depth Field Animation The depth field animation used to set the positions of the pin in the pin board was created using a modified version of the Kinect Explorer sample application. In order to simulate the pin board accurately, a small section of the depth range from the depth sensor will be used. Any part of the object in front of the depth range will result in a white pixel; anything behind the depth range will be black. Within the depth range the pixels in the image will be set to RGB values from 0,0,0 to 255,255,255. A screen shot of the modified Kinect Explorer application is shown below. The Kinect Explorer sample application was modified to include slider controls that are used to set the depth range that forms the image from the depth stream. This allows the fine tuning of the depth image that is required for simulating the position of the pins in the pin board. The Kinect Explorer was also modified to record a series of images from the depth camera and save them as a sequence JPEG files that will be used to animate the pins in the animation the Start and Stop buttons are used to start and stop the image recording. En example of one of the depth images is shown below. Once a series of 2,000 depth images has been captured, the task of creating the animation can begin. Rendering a Test Frame In order to test the creation of frames and get an approximation of the time required to render each frame a test frame was rendered on-premise using PolyRay. The output of the rendering process is shown below. The test frame contained 23,629 primitive shapes, most of which are the spheres and cylinders that are used for the 11,800 or so pins in the pin board. The 1280x720 image contains 921,600 pixels, but as anti-aliasing was used the number of rays that were calculated was 4,235,777, with 3,478,754,073 object boundaries checked. The test frame of the pin board with the depth field image applied is shown below. The tracing time for the test frame was 4 minutes 27 seconds, which means rendering the2,000 frames in the animation would take over 148 hours, or a little over 6 days. Although this is much faster that an old 486, waiting almost a week to see the results of an animation would make it challenging for animators to create, view, and refine their animations. It would be much better if the animation could be rendered in less than one hour. Windows Azure Worker Roles The cost of creating an on-premise render farm to render animations increases in proportion to the number of servers. The table below shows the cost of servers for creating a render farm, assuming a cost of $500 per server. Number of Servers Cost 1 $500 16 $8,000 256 $128,000 As well as the cost of the servers, there would be additional costs for networking, racks etc. Hosting an environment of 256 servers on-premise would require a server room with cooling, and some pretty hefty power cabling. The Windows Azure compute services provide worker roles, which are ideal for performing processor intensive compute tasks. With the scalability available in Windows Azure a job that takes 256 hours to complete could be perfumed using different numbers of worker roles. The time and cost of using 1, 16 or 256 worker roles is shown below. Number of Worker Roles Render Time Cost 1 256 hours $30.72 16 16 hours $30.72 256 1 hour $30.72 Using worker roles in Windows Azure provides the same cost for the 256 hour job, irrespective of the number of worker roles used. Provided the compute task can be broken down into many small units, and the worker role compute power can be used effectively, it makes sense to scale the application so that the task is completed quickly, making the results available in a timely fashion. The task of rendering 2,000 frames in an animation is one that can easily be broken down into 2,000 individual pieces, which can be performed by a number of worker roles. Creating a Render Farm in Windows Azure The architecture of the render farm is shown in the following diagram. The render farm is a hybrid application with the following components: · On-Premise o Windows Kinect – Used combined with the Kinect Explorer to create a stream of depth images. o Animation Creator – This application uses the depth images from the Kinect sensor to create scene description files for PolyRay. These files are then uploaded to the jobs blob container, and job messages added to the jobs queue. o Process Monitor – This application queries the role instance lifecycle table and displays statistics about the render farm environment and render process. o Image Downloader – This application polls the image queue and downloads the rendered animation files once they are complete. · Windows Azure o Azure Storage – Queues and blobs are used for the scene description files and completed frames. A table is used to store the statistics about the rendering environment. The architecture of each worker role is shown below. The worker role is configured to use local storage, which provides file storage on the worker role instance that can be use by the applications to render the image and transform the format of the image. The service definition for the worker role with the local storage configuration highlighted is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="CloudRay" > <WorkerRole name="CloudRayWorkerRole" vmsize="Small"> <Imports> </Imports> <ConfigurationSettings> <Setting name="DataConnectionString" /> </ConfigurationSettings> <LocalResources> <LocalStorage name="RayFolder" cleanOnRoleRecycle="true" /> </LocalResources> </WorkerRole> </ServiceDefinition> The two executable programs, PolyRay.exe and DTA.exe are included in the Azure project, with Copy Always set as the property. PolyRay will take the scene description file and render it to a Truevision TGA file. As the TGA format has not seen much use since the mid 90’s it is converted to a JPG image using Dave's Targa Animator, another shareware application from the 90’s. Each worker roll will use the following process to render the animation frames. 1. The worker process polls the job queue, if a job is available the scene description file is downloaded from blob storage to local storage. 2. PolyRay.exe is started in a process with the appropriate command line arguments to render the image as a TGA file. 3. DTA.exe is started in a process with the appropriate command line arguments convert the TGA file to a JPG file. 4. The JPG file is uploaded from local storage to the images blob container. 5. A message is placed on the images queue to indicate a new image is available for download. 6. The job message is deleted from the job queue. 7. The role instance lifecycle table is updated with statistics on the number of frames rendered by the worker role instance, and the CPU time used. The code for this is shown below. public override void Run() { // Set environment variables string polyRayPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), PolyRayLocation); string dtaPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), DTALocation); LocalResource rayStorage = RoleEnvironment.GetLocalResource("RayFolder"); string localStorageRootPath = rayStorage.RootPath; JobQueue jobQueue = new JobQueue("renderjobs"); JobQueue downloadQueue = new JobQueue("renderimagedownloadjobs"); CloudRayBlob sceneBlob = new CloudRayBlob("scenes"); CloudRayBlob imageBlob = new CloudRayBlob("images"); RoleLifecycleDataSource roleLifecycleDataSource = new RoleLifecycleDataSource(); Frames = 0; while (true) { // Get the render job from the queue CloudQueueMessage jobMsg = jobQueue.Get(); if (jobMsg != null) { // Get the file details string sceneFile = jobMsg.AsString; string tgaFile = sceneFile.Replace(".pi", ".tga"); string jpgFile = sceneFile.Replace(".pi", ".jpg"); string sceneFilePath = Path.Combine(localStorageRootPath, sceneFile); string tgaFilePath = Path.Combine(localStorageRootPath, tgaFile); string jpgFilePath = Path.Combine(localStorageRootPath, jpgFile); // Copy the scene file to local storage sceneBlob.DownloadFile(sceneFilePath); // Run the ray tracer. string polyrayArguments = string.Format("\"{0}\" -o \"{1}\" -a 2", sceneFilePath, tgaFilePath); Process polyRayProcess = new Process(); polyRayProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), polyRayPath); polyRayProcess.StartInfo.Arguments = polyrayArguments; polyRayProcess.Start(); polyRayProcess.WaitForExit(); // Convert the image string dtaArguments = string.Format(" {0} /FJ /P{1}", tgaFilePath, Path.GetDirectoryName (jpgFilePath)); Process dtaProcess = new Process(); dtaProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), dtaPath); dtaProcess.StartInfo.Arguments = dtaArguments; dtaProcess.Start(); dtaProcess.WaitForExit(); // Upload the image to blob storage imageBlob.UploadFile(jpgFilePath); // Add a download job. downloadQueue.Add(jpgFile); // Delete the render job message jobQueue.Delete(jobMsg); Frames++; } else { Thread.Sleep(1000); } // Log the worker role activity. roleLifecycleDataSource.Alive ("CloudRayWorker", RoleLifecycleDataSource.RoleLifecycleId, Frames); } } Monitoring Worker Role Instance Lifecycle In order to get more accurate statistics about the lifecycle of the worker role instances used to render the animation data was tracked in an Azure storage table. The following class was used to track the worker role lifecycles in Azure storage. public class RoleLifecycle : TableServiceEntity { public string ServerName { get; set; } public string Status { get; set; } public DateTime StartTime { get; set; } public DateTime EndTime { get; set; } public long SecondsRunning { get; set; } public DateTime LastActiveTime { get; set; } public int Frames { get; set; } public string Comment { get; set; } public RoleLifecycle() { } public RoleLifecycle(string roleName) { PartitionKey = roleName; RowKey = Utils.GetAscendingRowKey(); Status = "Started"; StartTime = DateTime.UtcNow; LastActiveTime = StartTime; EndTime = StartTime; SecondsRunning = 0; Frames = 0; } } A new instance of this class is created and added to the storage table when the role starts. It is then updated each time the worker renders a frame to record the total number of frames rendered and the total processing time. These statistics are used be the monitoring application to determine the effectiveness of use of resources in the render farm. Rendering the Animation The Azure solution was deployed to Windows Azure with the service configuration set to 16 worker role instances. This allows for the application to be tested in the cloud environment, and the performance of the application determined. When I demo the application at conferences and user groups I often start with 16 instances, and then scale up the application to the full 256 instances. The configuration to run 16 instances is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="16" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> About six minutes after deploying the application the first worker roles become active and start to render the first frames of the animation. The CloudRay Monitor application displays an icon for each worker role instance, with a number indicating the number of frames that the worker role has rendered. The statistics on the left show the number of active worker roles and statistics about the render process. The render time is the time since the first worker role became active; the CPU time is the total amount of processing time used by all worker role instances to render the frames. Five minutes after the first worker role became active the last of the 16 worker roles activated. By this time the first seven worker roles had each rendered one frame of the animation. With 16 worker roles u and running it can be seen that one hour and 45 minutes CPU time has been used to render 32 frames with a render time of just under 10 minutes. At this rate it would take over 10 hours to render the 2,000 frames of the full animation. In order to complete the animation in under an hour more processing power will be required. Scaling the render farm from 16 instances to 256 instances is easy using the new management portal. The slider is set to 256 instances, and the configuration saved. We do not need to re-deploy the application, and the 16 instances that are up and running will not be affected. Alternatively, the configuration file for the Azure service could be modified to specify 256 instances. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="256" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> Six minutes after the new configuration has been applied 75 new worker roles have activated and are processing their first frames. Five minutes later the full configuration of 256 worker roles is up and running. We can see that the average rate of frame rendering has increased from 3 to 12 frames per minute, and that over 17 hours of CPU time has been utilized in 23 minutes. In this test the time to provision 140 worker roles was about 11 minutes, which works out at about one every five seconds. We are now half way through the rendering, with 1,000 frames complete. This has utilized just under three days of CPU time in a little over 35 minutes. The animation is now complete, with 2,000 frames rendered in a little over 52 minutes. The CPU time used by the 256 worker roles is 6 days, 7 hours and 22 minutes with an average frame rate of 38 frames per minute. The rendering of the last 1,000 frames took 16 minutes 27 seconds, which works out at a rendering rate of 60 frames per minute. The frame counts in the server instances indicate that the use of a queue to distribute the workload has been very effective in distributing the load across the 256 worker role instances. The first 16 instances that were deployed first have rendered between 11 and 13 frames each, whilst the 240 instances that were added when the application was scaled have rendered between 6 and 9 frames each. Completed Animation I’ve uploaded the completed animation to YouTube, a low resolution preview is shown below. Pin Board Animation Created using Windows Kinect and 256 Windows Azure Worker Roles The animation can be viewed in 1280x720 resolution at the following link: http://www.youtube.com/watch?v=n5jy6bvSxWc Effective Use of Resources According to the CloudRay monitor statistics the animation took 6 days, 7 hours and 22 minutes CPU to render, this works out at 152 hours of compute time, rounded up to the nearest hour. As the usage for the worker role instances are billed for the full hour, it may have been possible to render the animation using fewer than 256 worker roles. When deciding the optimal usage of resources, the time required to provision and start the worker roles must also be considered. In the demo I started with 16 worker roles, and then scaled the application to 256 worker roles. It would have been more optimal to start the application with maybe 200 worker roles, and utilized the full hour that I was being billed for. This would, however, have prevented showing the ease of scalability of the application. The new management portal displays the CPU usage across the worker roles in the deployment. The average CPU usage across all instances is 93.27%, with over 99% used when all the instances are up and running. This shows that the worker role resources are being used very effectively. Grid Computing Scenarios Although I am using this scenario for a hobby project, there are many scenarios where a large amount of compute power is required for a short period of time. Windows Azure provides a great platform for developing these types of grid computing applications, and can work out very cost effective. · Windows Azure can provide massive compute power, on demand, in a matter of minutes. · The use of queues to manage the load balancing of jobs between role instances is a simple and effective solution. · Using a cloud-computing platform like Windows Azure allows proof-of-concept scenarios to be tested and evaluated on a very low budget. · No charges for inbound data transfer makes the uploading of large data sets to Windows Azure Storage services cost effective. (Transaction charges still apply.) Tips for using Windows Azure for Grid Computing Scenarios I found the implementation of a render farm using Windows Azure a fairly simple scenario to implement. I was impressed by ease of scalability that Azure provides, and by the short time that the application took to scale from 16 to 256 worker role instances. In this case it was around 13 minutes, in other tests it took between 10 and 20 minutes. The following tips may be useful when implementing a grid computing project in Windows Azure. · Using an Azure Storage queue to load-balance the units of work across multiple worker roles is simple and very effective. The design I have used in this scenario could easily scale to many thousands of worker role instances. · Windows Azure accounts are typically limited to 20 cores. If you need to use more than this, a call to support and a credit card check will be required. · Be aware of how the billing model works. You will be charged for worker role instances for the full clock our in which the instance is deployed. Schedule the workload to start just after the clock hour has started. · Monitor the utilization of the resources you are provisioning, ensure that you are not paying for worker roles that are idle. · If you are deploying third party applications to worker roles, you may well run into licensing issues. Purchasing software licenses on a per-processor basis when using hundreds of processors for a short time period would not be cost effective. · Third party software may also require installation onto the worker roles, which can be accomplished using start-up tasks. Bear in mind that adding a startup task and possible re-boot will add to the time required for the worker role instance to start and activate. An alternative may be to use a prepared VM and use VM roles. · Consider using the Windows Azure Autoscaling Application Block (WASABi) to autoscale the worker roles in your application. When using a large number of worker roles, the utilization must be carefully monitored, if the scaling algorithms are not optimal it could get very expensive!

Read the article
Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article
VS 2010 SP1 (Beta) and IIS Express

- by ScottGu

Last month we released the VS 2010 Service Pack 1 (SP1) Beta. You can learn more about the VS 2010 SP1 Beta from Jason Zander’s two blog posts about it, and from Scott Hanselman’s blog post that covers some of the new capabilities enabled with it. You can download and install the VS 2010 SP1 Beta here. IIS Express Earlier this summer I blogged about IIS Express. IIS Express is a free version of IIS 7.5 that is optimized for developer scenarios. We think it combines the ease of use of the ASP.NET Web Server (aka Cassini) currently built-into VS today with the full power of IIS. Specifically: It’s lightweight and easy to install (less than 5Mb download and a quick install) It does not require an administrator account to run/debug applications from Visual Studio It enables a full web-server feature set – including SSL, URL Rewrite, and other IIS 7.x modules It supports and enables the same extensibility model and web.config file settings that IIS 7.x support It can be installed side-by-side with the full IIS web server as well as the ASP.NET Development Server (they do not conflict at all) It works on Windows XP and higher operating systems – giving you a full IIS 7.x developer feature-set on all Windows OS platforms IIS Express (like the ASP.NET Development Server) can be quickly launched to run a site from a directory on disk. It does not require any registration/configuration steps. This makes it really easy to launch and run for development scenarios. Visual Studio 2010 SP1 adds support for IIS Express – and you can start to take advantage of this starting with last month’s VS 2010 SP1 Beta release. Downloading and Installing IIS Express IIS Express isn’t included as part of the VS 2010 SP1 Beta. Instead it is a separate ~4MB download which you can download and install using this link (it uses WebPI to install it). Once IIS Express is installed, VS 2010 SP1 will enable some additional IIS Express commands and dialog options that allow you to easily use it. Enabling IIS Express for Existing Projects Visual Studio today defaults to using the built-in ASP.NET Development Server (aka Cassini) when running ASP.NET Projects: Converting your existing projects to use IIS Express is really easy. You can do this by opening up the project properties dialog of an existing project, and then by clicking the “web” tab within it and selecting the “Use IIS Express” checkbox. Or even simpler, just right-click on your existing project, and select the “Use IIS Express…” menu command: And now when you run or debug your project you’ll see that IIS Express now starts up and runs automatically as your web-server: You can optionally right-click on the IIS Express icon within your system tray to see/browse all of sites and applications running on it: Note that if you ever want to revert back to using the ASP.NET Development Server you can do this by right-clicking the project again and then select the “Use Visual Studio Development Server” option (or go into the project properties, click the web tab, and uncheck IIS Express). This will revert back to the ASP.NET Development Server the next time you run the project. IIS Express Properties Visual Studio 2010 SP1 exposes several new IIS Express configuration options that you couldn’t previously set with the ASP.NET Development Server. Some of these are exposed via the property grid of your project (select the project node in the solution explorer and then change them via the property window): For example, enabling something like SSL support (which is not possible with the ASP.NET Development Server) can now be done simply by changing the “SSL Enabled” property to “True”: Once this is done IIS Express will expose both an HTTP and HTTPS endpoint for the project that we can use: SSL Self Signed Certs IIS Express ships with a self-signed SSL cert that it installs as part of setup – which removes the need for you to install your own certificate to use SSL during development. Once you change the above drop-down to enable SSL, you’ll be able to browse to your site with the appropriate https:// URL prefix and it will connect via SSL. One caveat with self-signed certificates, though, is that browsers (like IE) will go out of their way to warn you that they aren’t to be trusted: You can mark the certificate as trusted to avoid seeing dialogs like this – or just keep the certificate un-trusted and press the “continue” button when the browser warns you not to trust your local web server. Additional IIS Settings IIS Express uses its own per-user ApplicationHost.config file to configure default server behavior. Because it is per-user, it can be configured by developers who do not have admin credentials – unlike the full IIS. You can customize all IIS features and settings via it if you want ultimate server customization (for example: to use your own certificates for SSL instead of self-signed ones). We recommend storing all app specific settings for IIS and ASP.NET within the web.config file which is part of your project – since that makes deploying apps easier (since the settings can be copied with the application content). IIS (since IIS 7) no longer uses the metabase, and instead uses the same web.config configuration files that ASP.NET has always supported – which makes xcopy/ftp based deployment much easier. Making IIS Express your Default Web Server Above we looked at how we can convert existing sites that use the ASP.NET Developer Web Server to instead use IIS Express. You can configure Visual Studio to use IIS Express as the default web server for all new projects by clicking the Tools->Options menu command and opening up the Projects and Solutions->Web Projects node with the Options dialog: Clicking the “Use IIS Express for new file-based web site and projects” checkbox will cause Visual Studio to use it for all new web site and projects. Summary We think IIS Express makes it even easier to build, run and test web applications. It works with all versions of ASP.NET and supports all ASP.NET application types (including obviously both ASP.NET Web Forms and ASP.NET MVC applications). Because IIS Express is based on the IIS 7.5 codebase, you have a full web-server feature-set that you can use. This means you can build and run your applications just like they’ll work on a real production web-server. In addition to supporting ASP.NET, IIS Express also supports Classic ASP and other file-types and extensions supported by IIS – which also makes it ideal for sites that combine a variety of different technologies. Best of all – you do not need to change any code to take advantage of it. As you can see above, updating existing Visual Studio web projects to use it is trivial. You can begin to take advantage of IIS Express today using the VS 2010 SP1 Beta. Hope this helps, Scott

Read the article
How I do VCS

- by Wes McClure

After years of dabbling with different version control systems and techniques, I wanted to share some of what I like and dislike in a few blog posts. To start this out, I want to talk about how I use VCS in a team environment. These come in a series of tips or best practices that I try to follow. Note: This list is subject to change in the future. Always use some form of version control for all aspects of software development. Development is an evolution. Looking back at where we were is an invaluable asset in that process. This includes data schemas and documentation. Reverting / reapplying changes is absolutely critical for efficient development. The tools I use: Code: Hg (preferred), SVN Database: TSqlMigrations Documents: Sometimes in code repository, also SharePoint with versioning Always tag a commit (changeset) with comments This is a quick way to describe to someone else (or your future self) what the changeset entails. Be brief but courteous. One or two sentences about the task, not the actual changes. Use precommit hooks or setup the central repository to reject changes without comments. Link changesets to documentation If your project management system integrates with version control, or has a way to externally reference stories, tasks etc then leave a reference in the commit. This helps locate more information about the commit and/or related changesets. It’s best to have a precommit hook or system that requires this information, otherwise it’s easy to forget. Ability to work offline is required, including commits and history Yes this requires a DVCS locally but doesn’t require the central repository to be a DVCS. I prefer to use either Git or Hg but if it isn’t possible to migrate the central repository, it’s still possible for a developer to push / pull changes to that repository from a local Hg or Git repository. Never lock resources (files) in a central repository… Rude! We have merge tools for a reason, merging sucked a long time ago, it doesn’t anymore… stop locking files! This is unproductive, rude and annoying to other team members. Always review everything in your commit. Never ever commit a set of files without reviewing the changes in each. Never add a file without asking yourself, deep down inside, does this belong? If you leave to make changes during a review, start the review over when you come back. Never assume you didn’t touch a file, double check. This is another reason why you want to avoid large, infrequent commits. Requirements for tools Quickly show pending changes for the entire repository. Default action for a resource with pending changes is a diff. Pluggable diff & merge tool Produce a unified diff or a diff of all changes. This is helpful to bulk review changes instead of opening each file. The central repository is not your own personal dump yard. Breaking this rule is a sure fire way to get the F bomb dropped in front of your name, multiple times. If you turn on Visual Studio’s commit on closing studio option, I will personally break your fingers. By the way, the person(s) in charge of this feature should be fired and never be allowed near programming, ever again. Commit (integrate) to the central repository / branch frequently I try to do this before leaving each day, especially without a DVCS. One never knows when they might need to work from remote the following day. Never commit commented out code If it isn’t needed anymore, delete it! If you aren’t sure if it might be useful in the future, delete it! This is why we have history. If you don’t know why it’s commented out, figure it out and then either uncomment it or delete it. Don’t commit build artifacts, user preferences and temporary files. Build artifacts do not belong in VCS, everything in them is present in the code. (ie: bin\*, obj\*, *.dll, *.exe) User preferences are your settings, stop overriding my preferences files! (ie: *.suo and *.user files) Most tools allow you to ignore certain files and Hg/Git allow you to version this as an ignore file. Set this up as a first step when creating a new repository! Be polite when merging unresolved conflicts. Count to 10, cuss, grab a stress ball and realize it’s not a big deal. Actually, it’s an opportunity to let you know that someone else is working in the same area and you might want to communicate with them. Following the other rules, especially committing frequently, will reduce the likelihood of this. Suck it up, we all have to deal with this unintended consequence at times. Just be careful and GET FAMILIAR with your merge tool. It’s really not as scary as you think. I personally prefer KDiff3 as its merging capabilities rock. Don’t blindly merge and then blindly commit your changes, this is rude and unprofessional. Make sure you understand why the conflict occurred and which parts of the code you want to keep. Apply scrutiny when you commit a manual merge: review the diff! Make sure you test the changes (build and run automated tests) Become intimate with your version control system and the tools you use with it. Avoid trial and error as much as is possible, sit down and test the tool out, read some tutorials etc. Create test repositories and walk through common scenarios. Find the most efficient way to do your work. These tools will be used repetitively, so inefficiencies will add up. Sometimes this involves a mix of tools, both GUI and CLI. I like a combination of both Tortoise Hg and hg cli to get the job efficiently. Always tag releases Create a way to find a given release, whether this be in comments or an explicit tag / branch. This should be readily discoverable. Create release branches to patch bugs and then merge the changes back to other development branch(es). If using feature branches, strive for periodic integrations. Feature branches often cause forked code that becomes irreconcilable. Strive to re-integrate somewhat frequently with the branch this code will ultimately be merged into. This will avoid merge conflicts in the future. Feature branches are best when they are mutually exclusive of active development in other branches. Use and abuse local commits , at least one per task in a story. This builds a trail of changes in your local repository that can be pushed to a central repository when the story is complete. Never commit a broken build or failing tests to the central repository. It’s ok for a local commit to break the build and/or tests. In fact, I encourage this if it helps group the changes more logically. This is one of the main reasons I got excited about DVCS, when I wanted more than one changeset for a set of pending changes but some files could be grouped into both changesets (like solution file / project file changes). If you have more than a dozen outstanding changed resources, there should probably be more than one commit involved. Exceptions when maintaining code bases that require shotgun surgery, in this case, it’s a design smell :) Don’t version sensitive information Especially usernames / passwords There is one area I haven’t found a solution I like yet: versioning 3rd party libraries and/or code. I really dislike keeping any assemblies in the repository, but seems to be a common practice for external libraries. Please feel free to share your ideas about this below. -Wes

Read the article
My Thoughts On the Xbox 180

- by Chris Gardner

Originally posted on: http://geekswithblogs.net/freestylecoding/archive/2013/06/21/my-thoughts-on-the-xbox-180.aspx Everyone seems to be putting their 0.00237 cents into the wishing well over Microsoft's recent decision to reverse the DRM policy on the Xbox One. However, there have been a few issues that nobody has touched. As such, I have decided to dig 0.00237 cents out of my pocket. First, let me be clear about this point. I do not support the decision to reverse the DRM policy on the Xbox One. I wanted that point to be expressed first and unambiguously. I will say it again. I do not support the decision to reverse the DRM policy on the Xbox One. Now that I have that out of the way, let me go into my rationale. This decision removes most of the cool features that enticed me to pre-order the console. No, I didn't cancel my pre-order. There is still five months before the release of the console, and there is still a plethora of information that we, as consumers, do not have. With that, it should be noted that much of the talk in this post is speculation and rhetoric. I do not have any insider information that you do not possess. The persistent connection would have allowed the console to do many of the functions for which we have been begging. That demo where someone was playing Ryse, seamlessly accepted a multiplayer challenge in Killer Instinct, played the match (and a rematch,) and then jumped back into Ryse. That's gone, if you bought the game on disc. The new, DRM free system will require the disc in the system to play a game. That bullet point where one Xbox Live account could have up to 10 slave accounts so families could play together, no matter where they were located. That's gone as well. The promise of huge, expansive, dynamically changing worlds that was brought to us with the power of cloud computing. Well, "the people" didn't want there to be a forced, persistent connection. As such, developers can't rely on a connection and, as such, that feature is gone. This is akin to the removal of the hard drive on the Xbox 360. The list continues, but the enthusiast press has enumerated the list far better than I wish. All of this is because the Xbox team saw the HUGE success of Steam and decided to borrow a few ideas. Yes, Steam. The service that everyone hated for the first six months (for the same reasons the Xbox One is getting flack.) There was an initial growing pain. However, it is now lauded as the way games distribution should be handled. Unless you are Microsoft. I do find it curious that many of the features were originally announced for the PS4 during its unveiling. However, much of that was left strangely absent for Sony's E3 press conference. Instead, we received a single, static slide that basically said the exact opposite of Microsoft's plans. It is not farfetched to believe that slide came into existence during the approximately seven hours between the two media briefings. The thing that majorly annoys me over this whole kerfuffle is that the single thing that caused the call to arms is, really, not an issue. Microsoft never said they were going to block used sales. They said it was up to the publisher to make that decision. This would have allowed publishers to reclaim some of the costs of development in subsequent sales of the product. If you sell your game to GameStop for 7 USD, GameStop is going to sell it for 55 USD. That is 48 USD pure profit for them. Some publishers asked GameStop for a small cut. Was this a huge, money grubbing scheme? Well, yes, but the idea was that they have to handle server infrastructure for dormant accounts, etc. Of course, GameStop flatly refused, and the Online Pass was born. Fortunately, this trend didn’t last, and most publishers have stopped the practice. The ability to sell "licenses" has already begun to be challenged. Are you living in the EU? If so, companies must allow you to sell digital property. With this precedent in place, it's only a matter of time before other areas follow suit. If GameStop were smart, they should have immediately contacted every publisher out there to get the rights to become a clearing house for these licenses. Then, they keep their business model and could reduce their brick and mortar footprint. The digital landscape is changing. We need to not block this process. As Seth MacFarlane best said "Some issues are so important that you should drag people kicking and screaming." I believe this was said on an episode of Real Time with Bill Maher about the issue of Gay Marriages. Much like the original source, this is an issue that we need to drag people to the correct, progressive position. Microsoft, as a company, actually has the resources to weather the transition period. They have a great pool of first and second party developers that can leverage this new framework to prove the validity. Over time, the third party developers will get excited to use these tools. As an old C++ guy, I resisted C# for years. Now, I think it's one of the best languages I've ever used. I have a server room and a Co-Lo full of servers, so I originally didn't see the value in Azure. Now, I wish I could move every one of my projects into the cloud. I still LOVE getting physical packaging, which my music and games collection will proudly attest. However, I have started to see the value in pure digital, and have found ways to integrate this into the ways I consume those products. I can, honestly, understand how some parts of the population would be very apprehensive about this new landscape. There were valid arguments about people with no internet access. There are ways to combat these problems. These methods do not require us to throw the baby out with the bathwater. However, the number of people in the computer industry that I have seen cry foul is truly appalling. We are the forward looking people that help show how technology can improve people's lives. If we can't see the value of the brief pain involved with an exciting new ecosystem, than who will?

Read the article
General monitoring for SQL Server Analysis Services using Performance Monitor

- by Testas

A recent customer engagement required a setup of a monitoring solution for SSAS, due to the time restrictions placed upon this, native Windows Performance Monitor (Perfmon) and SQL Server Profiler Monitoring Tools was used as using a third party tool would have meant the customer providing an additional monitoring server that was not available.I wanted to outline the performance monitoring counters that was used to monitor the system on which SSAS was running. Due to the slow query performance that was occurring during certain scenarios, perfmon was used to establish if any pressure was being placed on the Disk, CPU or Memory subsystem when concurrent connections access the same query, and Profiler to pinpoint how the query was being managed within SSAS, profiler I will leave for another blogThis guide is not designed to provide a definitive list of what should be used when monitoring SSAS, different situations may require the addition or removal of counters as presented by the situation. However I hope that it serves as a good basis for starting your monitoring of SSAS. I would also like to acknowledge Chris Webb’s awesome chapters from “Expert Cube Development” that also helped shape my monitoring strategy:http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!6657.entrySimulating ConnectionsTo simulate the additional connections to the SSAS server whilst monitoring, I used ascmd to simulate multiple connections to the typical and worse performing queries that were identified by the customer. A similar sript can be downloaded from codeplex at http://www.codeplex.com/SQLSrvAnalysisSrvcs. File name: ASCMD_StressTestingScripts.zip. Performance MonitorWithin performance monitor, a counter log was created that contained the list of counters below. The important point to note when running the counter log is that the RUN AS property within the counter log properties should be changed to an account that has rights to the SSAS instance when monitoring MSAS counters. Failure to do so means that the counter log runs under the system account, no errors or warning are given while running the counter log, and it is not until you need to view the MSAS counters that they will not be displayed if run under the default account that has no right to SSAS. If your connection simulation takes hours, this could prove quite frustrating if not done beforehand JThe counters used…… Object Counter Instance Justification System Processor Queue legnth N/A Indicates how many threads are waiting for execution against the processor. If this counter is consistently higher than around 5 when processor utilization approaches 100%, then this is a good indication that there is more work (active threads) available (ready for execution) than the machine's processors are able to handle. System Context Switches/sec N/A Measures how frequently the processor has to switch from user- to kernel-mode to handle a request from a thread running in user mode. The heavier the workload running on your machine, the higher this counter will generally be, but over long term the value of this counter should remain fairly constant. If this counter suddenly starts increasing however, it may be an indicating of a malfunctioning device, especially if the Processor\Interrupts/sec\(_Total) counter on your machine shows a similar unexplained increase Process % Processor Time sqlservr Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process % Processor Time msmdsrv Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process Working Set sqlservr If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Process Working Set msmdsrv If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Processor % Processor Time _Total and individual cores measures the total utilization of your processor by all running processes. If multi-proc then be mindful only an average is provided Processor % Privileged Time _Total To see how the OS is handling basic IO requests. If kernel mode utilization is high, your machine is likely underpowered as it's too busy handling basic OS housekeeping functions to be able to effectively run other applications. Processor % User Time _Total To see how the applications is interacting from a processor perspective, a high percentage utilisation determine that the server is dealing with too many apps and may require increasing thje hardware or scaling out Processor Interrupts/sec _Total The average rate, in incidents per second, at which the processor received and serviced hardware interrupts. Shoulr be consistant over time but a sudden unexplained increase could indicate a device malfunction which can be confirmed using the System\Context Switches/sec counter Memory Pages/sec N/A Indicates the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays, this is the primary counter to watch for indication of possible insufficient RAM to meet your server's needs. A good idea here is to configure a perfmon alert that triggers when the number of pages per second exceeds 50 per paging disk on your system. May also want to see the configuration of the page file on the Server Memory Available Mbytes N/A is the amount of physical memory, in bytes, available to processes running on the computer. if this counter is greater than 10% of the actual RAM in your machine then you probably have more than enough RAM. monitor it regularly to see if any downward trend develops, and set an alert to trigger if it drops below 2% of the installed RAM. Physical Disk Disk Transfers/sec for each physical disk If it goes above 10 disk I/Os per second then you've got poor response time for your disk. Physical Disk Idle Time _total If Disk Transfers/sec is above 25 disk I/Os per second use this counter. which measures the percent time that your hard disk is idle during the measurement interval, and if you see this counter fall below 20% then you've likely got read/write requests queuing up for your disk which is unable to service these requests in a timely fashion. Physical Disk Disk queue legnth For the OLAP and SQL physical disk A value that is consistently less than 2 means that the disk system is handling the IO requests against the physical disk Network Interface Bytes Total/sec For the NIC Should be monitored over a period of time to see if there is anb increase/decrease in network utilisation Network Interface Current Bandwidth For the NIC is an estimate of the current bandwidth of the network interface in bits per second (BPS). MSAS 2005: Memory Memory Limit High KB N/A Shows (as a percentage) the high memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Limit Low KB N/A Shows (as a percentage) the low memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Usage KB N/A Displays the memory usage of the server process. MSAS 2005: Memory File Store KB N/A Displays the amount of memory that is reserved for the Cache. Note if total memory limit in the msmdsrv.ini is set to 0, no memory is reserved for the cache MSAS 2005: Storage Engine Query Queries from Cache Direct / sec N/A Displays the rate of queries answered from the cache directly MSAS 2005: Storage Engine Query Queries from Cache Filtered / Sec N/A Displays the Rate of queries answered by filtering existing cache entry. MSAS 2005: Storage Engine Query Queries from File / Sec N/A Displays the Rate of queries answered from files. MSAS 2005: Storage Engine Query Average time /query N/A Displays the average time of a query MSAS 2005: Connection Current connections N/A Displays the number of connections against the SSAS instance MSAS 2005: Connection Requests / sec N/A Displays the rate of query requests per second MSAS 2005: Locks Current Lock Waits N/A Displays thhe number of connections waiting on a lock MSAS 2005: Threads Query Pool job queue Length N/A The number of queries in the job queue MSAS 2005:Proc Aggregations Temp file bytes written/sec N/A Shows the number of bytes of data processed in a temporary file MSAS 2005:Proc Aggregations Temp file rows written/sec N/A Shows the number of bytes of data processed in a temporary file

Read the article
Changing an HTML Form's Target with jQuery

- by Rick Strahl

This is a question that comes up quite frequently: I have a form with several submit or link buttons and one or more of the buttons needs to open a new Window. How do I get several buttons to all post to the right window? If you're building ASP.NET forms you probably know that by default the Web Forms engine sends button clicks back to the server as a POST operation. A server form has a <form> tag which expands to this: <form method="post" action="default.aspx" id="form1"> Now you CAN change the target of the form and point it to a different window or frame, but the problem with that is that it still affects ALL submissions of the current form. If you multiple buttons/links and they need to go to different target windows/frames you can't do it easily through the <form runat="server"> tag. Although this discussion uses ASP.NET WebForms as an example, realistically this is a general HTML problem although likely more common in WebForms due to the single form metaphor it uses. In ASP.NET MVC for example you'd have more options by breaking out each button into separate forms with its own distinct target tag. However, even with that option it's not always possible to break up forms - for example if multiple targets are required but all targets require the same form data to the be posted. A common scenario here is that you might have a button (or link) that you click where you still want some server code to fire but at the end of the request you actually want to display the content in a new window. A common operation where this happens is report generation: You click a button and the server generates a report say in PDF format and you then want to display the PDF result in a new window without killing the content in the current window. Assuming you have other buttons on the same Page that need to post to base window how do you get the button click to go to a new window? Can't you just use a LinkButton or other Link Control? At first glance you might think an easy way to do this is to use an ASP.NET LinkButton to do this - after all a LinkButton creates a hyper link that CAN accept a target and it also posts back to the server, right? However, there's no Target property, although you can set the target HTML attribute easily enough. Code like this looks reasonable: <asp:LinkButton runat="server" ID="btnNewTarget" Text="New Target" target="_blank" OnClick="bnNewTarget_Click" /> But if you try this you'll find that it doesn't work. Why? Because ASP.NET creates postbacks with JavaScript code that operates on the current window/frame: <a id="btnNewTarget" target="_blank" href="javascript:__doPostBack('btnNewTarget','')">New Target</a> What happens with a target tag is that before the JavaScript actually executes a new window is opened and the focus shifts to the new window. The new window of course is empty and has no __doPostBack() function nor access to the old document. So when you click the link a new window opens but the window remains blank without content - no server postback actually occurs. Natch that idea. Setting the Form Target for a Button Control or LinkButton So, in order to send Postback link controls and buttons to another window/frame, both require that the target of the form gets changed dynamically when the button or link is clicked. Luckily this is rather easy to do however using a little bit of script code and jQuery. Imagine you have two buttons like this that should go to another window: <asp:LinkButton runat="server" ID="btnNewTarget" Text="New Target" OnClick="ClickHandler" /> <asp:Button runat="server" ID="btnButtonNewTarget" Text="New Target Button" OnClick="ClickHandler" /> ClickHandler in this case is any routine that generates the output you want to display in the new window. Generally this output will not come from the current page markup but is generated externally - like a PDF report or some report generated by another application component or tool. The output generally will be either generated by hand or something that was generated to disk to be displayed with Response.Redirect() or Response.TransmitFile() etc. Here's the dummy handler that just generates some HTML by hand and displays it: protected void ClickHandler(object sender, EventArgs e) { // Perform some operation that generates HTML or Redirects somewhere else Response.Write("Some custom output would be generated here (PDF, non-Page HTML etc.)"); // Make sure this response doesn't display the page content // Call Response.End() or Response.Redirect() Response.End(); } To route this oh so sophisticated output to an alternate window for both the LinkButton and Button Controls, you can use the following simple script code: <script type="text/javascript"> $("#btnButtonNewTarget,#btnNewTarget").click(function () { $("form").attr("target", "_blank"); }); </script> So why does this work where the target attribute did not? The difference here is that the script fires BEFORE the target is changed to the new window. When you put a target attribute on a link or form the target is changed as the very first thing before the link actually executes. IOW, the link literally executes in the new window when it's done this way. By attaching a click handler, though we're not navigating yet so all the operations the script code performs (ie. __doPostBack()) and the collection of Form variables to post to the server all occurs in the current page. By changing the target from within script code the target change fires as part of the form submission process which means it runs in the correct context of the current page. IOW - the input for the POST is from the current page, but the output is routed to a new window/frame. Just what we want in this scenario. Voila you can dynamically route output to the appropriate window.© Rick Strahl, West Wind Technologies, 2005-2011Posted in ASP.NET HTML jQuery

Read the article
SQL SERVER – Thinking about Deprecated, Discontinued Features and Breaking Changes while Upgrading to SQL Server 2012 – Guest Post by Nakul Vachhrajani

- by pinaldave

Nakul Vachhrajani is a Technical Specialist and systems development professional with iGATE having a total IT experience of more than 7 years. Nakul is an active blogger with BeyondRelational.com (150+ blogs), and can also be found on forums at SQLServerCentral and BeyondRelational.com. Nakul has also been a guest columnist for SQLAuthority.com and SQLServerCentral.com. Nakul presented a webcast on the “Underappreciated Features of Microsoft SQL Server” at the Microsoft Virtual Tech Days Exclusive Webcast series (May 02-06, 2011) on May 06, 2011. He is also the author of a research paper on Database upgrade methodologies, which was published in a CSI journal, published nationwide. In addition to his passion about SQL Server, Nakul also contributes to the academia out of personal interest. He visits various colleges and universities as an external faculty to judge project activities being carried out by the students. Disclaimer: The opinions expressed herein are his own personal opinions and do not represent his employer’s view in anyway. Blog | LinkedIn | Twitter | Google+ Let us hear the thoughts of Nakul in first person - Those who have been following my blogs would be aware that I am recently running a series on the database engine features that have been deprecated in Microsoft SQL Server 2012. Based on the response that I have received, I was quite surprised to know that most of the audience found these to be breaking changes, when in fact, they were not! It was then that I decided to write a little piece on how to plan your database upgrade such that it works with the next version of Microsoft SQL Server. Please note that the recommendations made in this article are high-level markers and are intended to help you think over the specific steps that you would need to take to upgrade your database. Refer the documentation – Understand the terms Change is the only constant in this world. Therefore, whenever customer requirements, newer architectures and designs require software vendors to make a change to the keywords, functions, etc; they ensure that they provide their end users sufficient time to migrate over to the new standards before dropping off the old ones. Microsoft does that too with it’s Microsoft SQL Server product. Whenever a new SQL Server release is announced, it comes with a list of the following features: Breaking changes These are changes that would break your currently running applications, scripts or functionalities that are based on earlier version of Microsoft SQL Server These are mostly features whose behavior has been changed keeping in mind the newer architectures and designs Lesson: These are the changes that you need to be most worried about! Discontinued features These features are no longer available in the associated version of Microsoft SQL Server These features used to be “deprecated” in the prior release Lesson: Without these changes, your database would not be compliant/may not work with the version of Microsoft SQL Server under consideration Deprecated features These features are those that are still available in the current version of Microsoft SQL Server, but are scheduled for removal in a future version. These may be removed in either the next version or any other future version of Microsoft SQL Server The features listed for deprecation will compose the list of discontinued features in the next version of SQL Server Lesson: Plan to make necessary changes required to remove/replace usage of the deprecated features with the latest recommended replacements Once a feature appears on the list, it moves from bottom to the top, i.e. it is first marked as “Deprecated” and then “Discontinued”. We know of “Breaking change” comes later on in the product life cycle. What this means is that if you want to know what features would not work with SQL Server 2012 (and you are currently using SQL Server 2008 R2), you need to refer the list of breaking changes and discontinued features in SQL Server 2012. Use the tools! There are a lot of tools and technologies around us, but it is rarely that I find teams using these tools religiously and to the best of their potential. Below are the top two tools, from Microsoft, that I use every time I plan a database upgrade. The SQL Server Upgrade Advisor Ever since SQL Server 2005 was announced, Microsoft provides a small, very light-weight tool called the “SQL Server upgrade advisor”. The upgrade advisor analyzes installed components from earlier versions of SQL Server, and then generates a report that identifies issues to fix either before or after you upgrade. The analysis examines objects that can be accessed, such as scripts, stored procedures, triggers, and trace files. Upgrade Advisor cannot analyze desktop applications or encrypted stored procedures. Refer the links towards the end of the post to know how to get the Upgrade Advisor. The SQL Server Profiler Another great tool that you can use is the one most SQL Server developers & administrators use often – the SQL Server profiler. SQL Server Profiler provides functionality to monitor the “Deprecation” event, which contains: Deprecation announcement – equivalent to features to be deprecated in a future release of SQL Server Deprecation final support – equivalent to features to be deprecated in the next release of SQL Server You can learn more using the links towards the end of the post. A basic checklist There are a lot of finer points that need to be taken care of when upgrading your database. But, it would be worth-while to identify a few basic steps in order to make your database compliant with the next version of SQL Server: Monitor the current application workload (on a test bed) via the Profiler in order to identify usage of features marked as Deprecated If none appear, you are all set! (This almost never happens) Note down all the offending queries and feature usages Run analysis sessions using the SQL Server upgrade advisor on your database Based on the inputs from the analysis report and Profiler trace sessions, Incorporate solutions for the breaking changes first Next, incorporate solutions for the discontinued features Revisit and document the upgrade strategy for your deployment scenarios Revisit the fall-back, i.e. rollback strategies in case the upgrades fail Because some programming changes are dependent upon the SQL server version, this may need to be done in consultation with the development teams Before any other enhancements are incorporated by the development team, send out the database changes into QA QA strategy should involve a comparison between an environment running the old version of SQL Server against the new one Because minimal application changes have gone in (essential changes for SQL Server version compliance only), this would be possible As an ongoing activity, keep incorporating changes recommended as per the deprecated features list As a DBA, update your coding standards to ensure that the developers are using ANSI compliant code – this code will require a change only if the ANSI standard changes Remember this: Change management is a continuous process. Keep revisiting the product release notes and incorporate recommended changes to stay prepared for the next release of SQL Server. May the power of SQL Server be with you! Links Referenced in this post Breaking changes in SQL Server 2012: Link Discontinued features in SQL Server 2012: Link Get the upgrade advisor from the Microsoft Download Center at: Link Upgrade Advisor page on MSDN: Link Profiler: Review T-SQL code to identify objects no longer supported by Microsoft: Link Upgrading to SQL Server 2012 by Vinod Kumar: Link Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Upgrade

Read the article
SQL SERVER – SSIS Look Up Component – Cache Mode – Notes from the Field #028

- by Pinal Dave

[Notes from Pinal]: Lots of people think that SSIS is all about arranging various operations together in one logical flow. Well, the understanding is absolutely correct, but the implementation of the same is not as easy as it seems. Similarly most of the people think lookup component is just component which does look up for additional information and does not pay much attention to it. Due to the same reason they do not pay attention to the same and eventually get very bad performance. Linchpin People are database coaches and wellness experts for a data driven world. In this 28th episode of the Notes from the Fields series database expert Tim Mitchell (partner at Linchpin People) shares very interesting conversation related to how to write a good lookup component with Cache Mode. In SQL Server Integration Services, the lookup component is one of the most frequently used tools for data validation and completion. The lookup component is provided as a means to virtually join one set of data to another to validate and/or retrieve missing values. Properly configured, it is reliable and reasonably fast. Among the many settings available on the lookup component, one of the most critical is the cache mode. This selection will determine whether and how the distinct lookup values are cached during package execution. It is critical to know how cache modes affect the result of the lookup and the performance of the package, as choosing the wrong setting can lead to poorly performing packages, and in some cases, incorrect results. Full Cache The full cache mode setting is the default cache mode selection in the SSIS lookup transformation. Like the name implies, full cache mode will cause the lookup transformation to retrieve and store in SSIS cache the entire set of data from the specified lookup location. As a result, the data flow in which the lookup transformation resides will not start processing any data buffers until all of the rows from the lookup query have been cached in SSIS. The most commonly used cache mode is the full cache setting, and for good reason. The full cache setting has the most practical applications, and should be considered the go-to cache setting when dealing with an untested set of data. With a moderately sized set of reference data, a lookup transformation using full cache mode usually performs well. Full cache mode does not require multiple round trips to the database, since the entire reference result set is cached prior to data flow execution. There are a few potential gotchas to be aware of when using full cache mode. First, you can see some performance issues – memory pressure in particular – when using full cache mode against large sets of reference data. If the table you use for the lookup is very large (either deep or wide, or perhaps both), there’s going to be a performance cost associated with retrieving and caching all of that data. Also, keep in mind that when doing a lookup on character data, full cache mode will always do a case-sensitive (and in some cases, space-sensitive) string comparison even if your database is set to a case-insensitive collation. This is because the in-memory lookup uses a .NET string comparison (which is case- and space-sensitive) as opposed to a database string comparison (which may be case sensitive, depending on collation). There’s a relatively easy workaround in which you can use the UPPER() or LOWER() function in the pipeline data and the reference data to ensure that case differences do not impact the success of your lookup operation. Again, neither of these present a reason to avoid full cache mode, but should be used to determine whether full cache mode should be used in a given situation. Full cache mode is ideally useful when one or all of the following conditions exist: The size of the reference data set is small to moderately sized The size of the pipeline data set (the data you are comparing to the lookup table) is large, is unknown at design time, or is unpredictable Each distinct key value(s) in the pipeline data set is expected to be found multiple times in that set of data Partial Cache When using the partial cache setting, lookup values will still be cached, but only as each distinct value is encountered in the data flow. Initially, each distinct value will be retrieved individually from the specified source, and then cached. To be clear, this is a row-by-row lookup for each distinct key value(s). This is a less frequently used cache setting because it addresses a narrower set of scenarios. Because each distinct key value(s) combination requires a relational round trip to the lookup source, performance can be an issue, especially with a large pipeline data set to be compared to the lookup data set. If you have, for example, a million records from your pipeline data source, you have the potential for doing a million lookup queries against your lookup data source (depending on the number of distinct values in the key column(s)). Therefore, one has to be keenly aware of the expected row count and value distribution of the pipeline data to safely use partial cache mode. Using partial cache mode is ideally suited for the conditions below: The size of the data in the pipeline (more specifically, the number of distinct key column) is relatively small The size of the lookup data is too large to effectively store in cache The lookup source is well indexed to allow for fast retrieval of row-by-row values No Cache As you might guess, selecting no cache mode will not add any values to the lookup cache in SSIS. As a result, every single row in the pipeline data set will require a query against the lookup source. Since no data is cached, it is possible to save a small amount of overhead in SSIS memory in cases where key values are not reused. In the real world, I don’t see a lot of use of the no cache setting, but I can imagine some edge cases where it might be useful. As such, it’s critical to know your data before choosing this option. Obviously, performance will be an issue with anything other than small sets of data, as the no cache setting requires row-by-row processing of all of the data in the pipeline. I would recommend considering the no cache mode only when all of the below conditions are true: The reference data set is too large to reasonably be loaded into SSIS memory The pipeline data set is small and is not expected to grow There are expected to be very few or no duplicates of the key values(s) in the pipeline data set (i.e., there would be no benefit from caching these values) Conclusion The cache mode, an often-overlooked setting on the SSIS lookup component, represents an important design decision in your SSIS data flow. Choosing the right lookup cache mode directly impacts the fidelity of your results and the performance of package execution. Know how this selection impacts your ETL loads, and you’ll end up with more reliable, faster packages. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SSIS

Read the article
Guide to MySQL & NoSQL, Webinar Q&A

- by Mat Keep

0 0 1 959 5469 Homework 45 12 6416 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} Yesterday we ran a webinar discussing the demands of next generation web services and how blending the best of relational and NoSQL technologies enables developers and architects to deliver the agility, performance and availability needed to be successful. Attendees posted a number of great questions to the MySQL developers, serving to provide additional insights into areas like auto-sharding and cross-shard JOINs, replication, performance, client libraries, etc. So I thought it would be useful to post those below, for the benefit of those unable to attend the webinar. Before getting to the Q&A, there are a couple of other resources that maybe useful to those looking at NoSQL capabilities within MySQL: - On-Demand webinar (coming soon!) - Slides used during the webinar - Guide to MySQL and NoSQL whitepaper - MySQL Cluster demo, including NoSQL interfaces, auto-sharing, high availability, etc. So here is the Q&A from the event Q. Where does MySQL Cluster fit in to the CAP theorem? A. MySQL Cluster is flexible. A single Cluster will prefer consistency over availability in the presence of network partitions. A pair of Clusters can be configured to prefer availability over consistency. A full explanation can be found on the MySQL Cluster & CAP Theorem blog post. Q. Can you configure the number of replicas? (the slide used a replication factor of 1) Yes. A cluster is configured by an .ini file. The option NoOfReplicas sets the number of originals and replicas: 1 = no data redundancy, 2 = one copy etc. Usually there's no benefit in setting it >2. Q. Interestingly most (if not all) of the NoSQL databases recommend having 3 copies of data (the replication factor). Yes, with configurable quorum based Reads and writes. MySQL Cluster does not need a quorum of replicas online to provide service. Systems that require a quorum need > 2 replicas to be able to tolerate a single failure. Additionally, many NoSQL systems take liberal inspiration from the original GFS paper which described a 3 replica configuration. MySQL Cluster avoids the need for a quorum by using a lightweight arbitrator. You can configure more than 2 replicas, but this is a tradeoff between incrementally improved availability, and linearly increased cost. Q. Can you have cross node group JOINS? Wouldn't that run into the risk of flooding the network? MySQL Cluster 7.2 supports cross nodegroup joins. A full cross-join can require a large amount of data transfer, which may bottleneck on network bandwidth. However, for more selective joins, typically seen with OLTP and light analytic applications, cross node-group joins give a great performance boost and network bandwidth saving over having the MySQL Server perform the join. Q. Are the details of the benchmark available anywhere? According to my calculations it results in approx. 350k ops/sec per processor which is the largest number I've seen lately The details are linked from Mikael Ronstrom's blog The benchmark uses a benchmarking tool we call flexAsynch which runs parallel asynchronous transactions. It involved 100 byte reads, of 25 columns each. Regarding the per-processor ops/s, MySQL Cluster is particularly efficient in terms of throughput/node. It uses lock-free minimal copy message passing internally, and maximizes ID cache reuse. Note also that these are in-memory tables, there is no need to read anything from disk. Q. Is access control (like table) planned to be supported for NoSQL access mode? Currently we have not seen much need for full SQL-like access control (which has always been overkill for web apps and telco apps). So we have no plans, though especially with memcached it is certainly possible to turn-on connection-level access control. But specifically table level controls are not planned. Q. How is the performance of memcached APi with MySQL against memcached+MySQL or any other Object Cache like Ecache with MySQL DB? With the memcache API we generally see a memcached response in less than 1 ms. and a small cluster with one memcached server can handle tens of thousands of operations per second. Q. Can .NET can access MemcachedAPI? Yes, just use a .Net memcache client such as the enyim or BeIT memcache libraries. Q. Is the row level locking applicable when you update a column through memcached API? An update that comes through memcached uses a row lock and then releases it immediately. Memcached operations like "INCREMENT" are actually pushed down to the data nodes. In most cases the locks are not even held long enough for a network round trip. Q. Has anyone published an example using something like PHP? I am assuming that you just use the PHP memcached extension to hook into the memcached API. Is that correct? Not that I'm aware of but absolutely you can use it with php or any of the other drivers Q. For beginner we need more examples. Take a look here for a fully worked example Q. Can I access MySQL using Cobol (Open Cobol) or C and if so where can I find the coding libraries etc? A. There is a cobol implementation that works well with MySQL, but I do not think it is Open Cobol. Also there is a MySQL C client library that is a standard part of every mysql distribution Q. Is there a place to go to find help when testing and/implementing the NoSQL access? If using Cluster then you can use the [email protected] alias or post on the MySQL Cluster forum Q. Are there any white papers on this? Yes - there is more detail in the MySQL Guide to NoSQL whitepaper If you have further questions, please don’t hesitate to use the comments below!

Read the article
CodePlex Daily Summary for Sunday, September 02, 2012

CodePlex Daily Summary for Sunday, September 02, 2012Popular ReleasesThisismyusername's codeplex page.: HTML5 Multitouch Example - Fruit Ninja in HTML5: This is an example of how you could create a game such as Fruit Ninja using HTML5's multitouch capabilities. This example isn't responsive enough, so I will be working on that, and it doesn't have great graphics, either. If I had my own webpage, I could store some graphics and upload the game there and it might look halfway decent, but here the fruits are just circles. I hope you enjoy reading the source code anyway.GmailDefaultMaker: GmailDefaultMaker 3.0.0.2: Add QQ Mail BugfixRuminate XNA 4.0 GUI: Release 1.1.1: Fixed bugs with Slider and TextBox. Added ComboBox.Confuser: Confuser build 76542: This is a build of changeset 76542.SharePoint Column & View Permission: SharePoint Column and View Permission v1.2: Version 1.2 of this project. If you will find any bugs please let me know at enti@zoznam.sk or post your findings in Issue TrackerMihmojsos OS: Mihmojsos OS 3 (Smart Rabbit): !Mihmojsos OS 3 Smart Rabbit Mihmojsos Smart Rabbit is now availableDotNetNuke Translator: 01.00.00 Beta: First release of the project.YNA: YNA 0.2 alpha: Wath's new since 0.1 alpha ? A lot of changes but there are the most interresting : StateManager is now better and faster Mouse events for all YnObjects (Sprites, Images, texts) A really big improvement for YnGroup Gamepad support And the news : Tiled Map support (need refactoring) Isometric tiled map support (need refactoring) Transition effect like "FadeIn" and "FadeOut" (YnTransition) Timers (YnTimer) Path management (YnPath, need more refactoring) Downloads All downloads...Audio Pitch & Shift: Audio Pitch And Shift 5.1.0.2: fixed several issues with streaming modeUrlPager: UrlPager 1.2: Fixed bug in which url parameters will lost after paging; ????????url???bug；Sofire Suite: Sofire v1.5.0.0: Sofire v1.5.0.0 ?? ???????? ?????： 1、?? 2、????EntLib.com????????: EntLib.com???????? v3.0: EntLib eCommerce Solution ???Microsoft .Net Framework?????????????????????。Coevery - Free CRM: Coevery 1.0.0.24: Add a sample database, and installation instructions.Math.NET Numerics: Math.NET Numerics v2.2.1: Major linear algebra rework since v2.1, now available on Codeplex as well (previous versions were only available via NuGet). Since v2.2.0: Student-T density more robust for very large degrees of freedom Sparse Kronecker product much more efficient (now leverages sparsity) Direct access to raw matrix storage implementations for advanced extensibility Now also separate package for signed core library with a strong name (we dropped strong names in v2.2.0) Also available as NuGet packages...Microsoft SQL Server Product Samples: Database: AdventureWorks Databases – 2012, 2008R2 and 2008: About this release This release consolidates AdventureWorks databases for SQL Server 2012, 2008R2 and 2008 versions to one page. Each zip file contains an mdf database file and ldf log file. This should make it easier to find and download AdventureWorks databases since all OLTP versions are on one page. There are no database schema changes. For each release of the product, there is a light-weight and full version of the AdventureWorks sample database. The light-weight version is denoted by ...Christoc's DotNetNuke Module Development Template: DotNetNuke Project Templates V1.1 for VS2012: This release is specifically for Visual Studio 2012 Support, distributed through the Visual Studio Extensions gallery at http://visualstudiogallery.msdn.microsoft.com/ After you build in Release mode the installable packages (source/install) can be found in the INSTALL folder now, within your module's folder, not the packages folder anymore Check out the blog post for all of the details about this release. http://www.dotnetnuke.com/Resources/Blogs/EntryId/3471/New-Visual-Studio-2012-Projec...Home Access Plus+: v8.0: v8.0.0901.1830 RELEASE CHANGED TO BETA Any issues, please log them on http://www.edugeek.net/forums/home-access-plus/ This is full release, NO upgrade ZIP will be provided as most files require replacing. To upgrade from a previous version, delete everything but your AppData folder, extract all but the AppData folder and run your HAP+ install Documentation is supplied in the Web Zip The Quota Services require executing a script to register the service, this can be found in there install ...Phalanger - The PHP Language Compiler for the .NET Framework: 3.0.0.3406 (September 2012): New features: Extended ReflectionClass libxml error handling, constants DateTime::modify(), DateTime::getOffset() TreatWarningsAsErrors MSBuild option OnlyPrecompiledCode configuration option; allows to use only compiled code Fixes: ArgsAware exception fix accessing .NET properties bug fix ASP.NET session handler fix for OutOfProc mode DateTime methods (WordPress posting fix) Phalanger Tools for Visual Studio: Visual Studio 2010 & 2012 New debugger engine, PHP-like debugging ...MabiCommerce: MabiCommerce 1.0.1: What's NewSetup now creates shortcuts Fix spelling errors Minor enhancement to the Map window.ScintillaNET: ScintillaNET 2.5.2: This release has been built from the 2.5 branch. Version 2.5.2 is functionally identical to the 2.5.1 release but also includes the XML documentation comments file generated by Visual Studio. It is not 100% comprehensive but it will give you Visual Studio IntelliSense for a large part of the API. Just make sure the ScintillaNET.xml file is in the same folder as the ScintillaNET.dll reference you're using in your projects. (The XML file does not need to be distributed with your application)....New ProjectsATSV: this is a student project for making a new silverlight UI Bookmark Collector: This project is a best practice example of how to use content items in DotNetNuke. It allows you to quickly and easily manage a listing of external links.BPVotingmachine: BP Vote SystemClean My Space: Sort your files in a fun and fast! With Clean My Space you can!CutePlatform: CutePlatform is a platform game based around the PlanetCute graphics pack from Daniel cook, make him a visit in www.lostgardem.comDancTeX: This project is targeting the integration of LaTeX into VisusalStudio. Epi Info™ Companion for Android: A mobile companion to the Epi Info™ 7 desktop tool for epidemiologic data collection and analysis.Flucene: Object Document Mapper for Lucene.Netfluentserializer: FluentSerializer is a library for .NET usable to create serialize/deserialize data through its fluent interface. The methods it creates are compiled.hongjiapp: hongjiappidealthings educational comprehensive administration system: ?????????????????????????????????????????????.Java Accounting Library: The project aims at providing a Financial Accounting Java Library which may be integrated to any other Java Application independent of its Backend Database.mycnblogs: mycnblogsNETPack: Lightweight and flexible packer for .NETRandom Useful Code: This project is where I will store various useful classes I have built over time. Only the code will be provided, no Library or the like.Suleymaniye Tavimi: Namaz vakitleri hesaplama uygulamasidir. Istenilen yer için hesaplama yapar.

Read the article
Set Context User Principal for Customized Authentication in SignalR

- by Shaun

Originally posted on: http://geekswithblogs.net/shaunxu/archive/2014/05/27/set-context-user-principal-for-customized-authentication-in-signalr.aspxCurrently I'm working on a single page application project which is built on AngularJS and ASP.NET WebAPI. When I need to implement some features that needs real-time communication and push notifications from server side I decided to use SignalR. SignalR is a project currently developed by Microsoft to build web-based, read-time communication application. You can find it here. With a lot of introductions and guides it's not a difficult task to use SignalR with ASP.NET WebAPI and AngularJS. I followed this and this even though it's based on SignalR 1. But when I tried to implement the authentication for my SignalR I was struggled 2 days and finally I got a solution by myself. This might not be the best one but it actually solved all my problem. In many articles it's said that you don't need to worry about the authentication of SignalR since it uses the web application authentication. For example if your web application utilizes form authentication, SignalR will use the user principal your web application authentication module resolved, check if the principal exist and authenticated. But in my solution my ASP.NET WebAPI, which is hosting SignalR as well, utilizes OAuth Bearer authentication. So when the SignalR connection was established the context user principal was empty. So I need to authentication and pass the principal by myself. Firstly I need to create a class which delivered from "AuthorizeAttribute", that will takes the responsible for authenticate when SignalR connection established and any method was invoked. 1: public class QueryStringBearerAuthorizeAttribute : AuthorizeAttribute 2: { 3: public override bool AuthorizeHubConnection(HubDescriptor hubDescriptor, IRequest request) 4: { 5: } 6: 7: public override bool AuthorizeHubMethodInvocation(IHubIncomingInvokerContext hubIncomingInvokerContext, bool appliesToMethod) 8: { 9: } 10: } The method "AuthorizeHubConnection" will be invoked when any SignalR connection was established. And here I'm going to retrieve the Bearer token from query string, try to decrypt and recover the login user's claims. 1: public override bool AuthorizeHubConnection(HubDescriptor hubDescriptor, IRequest request) 2: { 3: var dataProtectionProvider = new DpapiDataProtectionProvider(); 4: var secureDataFormat = new TicketDataFormat(dataProtectionProvider.Create()); 5: // authenticate by using bearer token in query string 6: var token = request.QueryString.Get(WebApiConfig.AuthenticationType); 7: var ticket = secureDataFormat.Unprotect(token); 8: if (ticket != null && ticket.Identity != null && ticket.Identity.IsAuthenticated) 9: { 10: // set the authenticated user principal into environment so that it can be used in the future 11: request.Environment["server.User"] = new ClaimsPrincipal(ticket.Identity); 12: return true; 13: } 14: else 15: { 16: return false; 17: } 18: } In the code above I created "TicketDataFormat" instance, which must be same as the one I used to generate the Bearer token when user logged in. Then I retrieve the token from request query string and unprotect it. If I got a valid ticket with identity and it's authenticated this means it's a valid token. Then I pass the user principal into request's environment property which can be used in nearly future. Since my website was built in AngularJS so the SignalR client was in pure JavaScript, and it's not support to set customized HTTP headers in SignalR JavaScript client, I have to pass the Bearer token through request query string. This is not a restriction of SignalR, but a restriction of WebSocket. For security reason WebSocket doesn't allow client to set customized HTTP headers from browser. Next, I need to implement the authentication logic in method "AuthorizeHubMethodInvocation" which will be invoked when any SignalR method was invoked. 1: public override bool AuthorizeHubMethodInvocation(IHubIncomingInvokerContext hubIncomingInvokerContext, bool appliesToMethod) 2: { 3: var connectionId = hubIncomingInvokerContext.Hub.Context.ConnectionId; 4: // check the authenticated user principal from environment 5: var environment = hubIncomingInvokerContext.Hub.Context.Request.Environment; 6: var principal = environment["server.User"] as ClaimsPrincipal; 7: if (principal != null && principal.Identity != null && principal.Identity.IsAuthenticated) 8: { 9: // create a new HubCallerContext instance with the principal generated from token 10: // and replace the current context so that in hubs we can retrieve current user identity 11: hubIncomingInvokerContext.Hub.Context = new HubCallerContext(new ServerRequest(environment), connectionId); 12: return true; 13: } 14: else 15: { 16: return false; 17: } 18: } Since I had passed the user principal into request environment in previous method, I can simply check if it exists and valid. If so, what I need is to pass the principal into context so that SignalR hub can use. Since the "User" property is all read-only in "hubIncomingInvokerContext", I have to create a new "ServerRequest" instance with principal assigned, and set to "hubIncomingInvokerContext.Hub.Context". After that, we can retrieve the principal in my Hubs through "Context.User" as below. 1: public class DefaultHub : Hub 2: { 3: public object Initialize(string host, string service, JObject payload) 4: { 5: var connectionId = Context.ConnectionId; 6: ... ... 7: var domain = string.Empty; 8: var identity = Context.User.Identity as ClaimsIdentity; 9: if (identity != null) 10: { 11: var claim = identity.FindFirst("Domain"); 12: if (claim != null) 13: { 14: domain = claim.Value; 15: } 16: } 17: ... ... 18: } 19: } Finally I just need to add my "QueryStringBearerAuthorizeAttribute" into the SignalR pipeline. 1: app.Map("/signalr", map => 2: { 3: // Setup the CORS middleware to run before SignalR. 4: // By default this will allow all origins. You can 5: // configure the set of origins and/or http verbs by 6: // providing a cors options with a different policy. 7: map.UseCors(CorsOptions.AllowAll); 8: var hubConfiguration = new HubConfiguration 9: { 10: // You can enable JSONP by uncommenting line below. 11: // JSONP requests are insecure but some older browsers (and some 12: // versions of IE) require JSONP to work cross domain 13: // EnableJSONP = true 14: EnableJavaScriptProxies = false 15: }; 16: // Require authentication for all hubs 17: var authorizer = new QueryStringBearerAuthorizeAttribute(); 18: var module = new AuthorizeModule(authorizer, authorizer); 19: GlobalHost.HubPipeline.AddModule(module); 20: // Run the SignalR pipeline. We're not using MapSignalR 21: // since this branch already runs under the "/signalr" path. 22: map.RunSignalR(hubConfiguration); 23: }); On the client side should pass the Bearer token through query string before I started the connection as below. 1: self.connection = $.hubConnection(signalrEndpoint); 2: self.proxy = self.connection.createHubProxy(hubName); 3: self.proxy.on(notifyEventName, function (event, payload) { 4: options.handler(event, payload); 5: }); 6: // add the authentication token to query string 7: // we cannot use http headers since web socket protocol doesn't support 8: self.connection.qs = { Bearer: AuthService.getToken() }; 9: // connection to hub 10: self.connection.start(); Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article
Free Document/Content Management System Using SharePoint 2010

- by KunaalKapoor

That’s right, it’s true. You can use the free version of SharePoint 2010 to meet your document and content management needs and even run your public facing website or an internal knowledge bank. SharePoint Foundation 2010 is free. It may not have all the features that you get in the enterprise license but it still has enough to cater to your needs to build a document management system and replace age old file shares or folders. I’ve built a dozen content management sites for internal and public use exploiting SharePoint. There are hundreds of web content management systems out there (see CMS Matrix). On one hand we have commercial platforms like SharePoint, SiteCore, and Ektron etc. which are the most frequently used and on the other hand there are free options like WordPress, Drupal, Joomla, and Plone etc. which are pretty common popular as well. But I would be very surprised if anyone was able to find a single CMS platform that is all things to all people. Infact not a lot of people consider SharePoint’s free version under the free CMS side but its high time organizations benefit from this. Through this blog post I wanted to present SharePoint Foundation as an option for running a FREE CMS platform. Even if you knew that there is a free version of SharePoint, what most people don’t realize is that SharePoint Foundation is a great option for running web sites of all kinds – not just team sites. It is a great option for many reasons, but in reality it is supported by Microsoft, and above all it is FREE (yay!), and it is extremely easy to get started. From a functionality perspective – it’s hard to beat SharePoint. Even the free version, SharePoint Foundation, offers simple data connectivity (through BCS), cross browser support, accessibility, support for Office Web Apps, blogs, wikis, templates, document support, health analyzer, support for presence, and MUCH more.I often get asked: “Can I use SharePoint 2010 as a document management system?” The answer really depends on · What are your specific requirements? · What systems you currently have in place for managing documents. · And of course how much money you have J Benefits? Not many large organizations have benefited from SharePoint yet. For some it has been an IT project to see what they can achieve with it, for others it has been used as a collaborative platform or in many cases an extended intranet. SharePoint 2010 has changed the game slightly as the improvements that Microsoft have made have been noted by organizations, and we are seeing a lot of companies starting to build specific business applications using SharePoint as the basis, and nearly every business process will require documents at some stage. If you require a document management system and have SharePoint in place then it can be a relatively straight forward decision to use SharePoint, as long as you have reviewed the considerations just discussed. The collaborative nature of SharePoint 2010 is also a massive advantage, as specific departmental or project sites can be created quickly and easily that allow workers to interact in a variety of different ways using one source of information. This also benefits an organization with regards to how they manage the knowledge that they have, as if all of their information is in one source then it is naturally easier to search and manage. Is SharePoint right for your organization? As just discussed, this can only be determined after defining your requirements and also planning a longer term strategy for how you will manage your documents and information. A key factor to look at is how the users would interact with the system and how much value would it get for your organization. The amount of data and documents that organizations are creating is increasing rapidly each year. Therefore the ability to archive this information, whilst keeping the ability to know what you have and where it is, is vital to any organizations management of their information life cycle. SharePoint is best used for the initial life of business documents where they need to be referenced and accessed after time. It is often beneficial to archive these to overcome for storage and performance issues. FREE CMS – SharePoint, Really? In order to show some of the completely of what comes with this free version of SharePoint 2010, I thought it would make sense to use Wikipedia (since every one trusts it as a credible source). Wikipedia shows that a web content management system typically has the following components: Document Management: - CMS software may provide a means of managing the life cycle of a document from initial creation time, through revisions, publication, archive, and document destruction. SharePoint is king when it comes to document management. Version history, exclusive check-out, security, publication, workflow, and so much more. Content Virtualization: - CMS software may provide a means of allowing each user to work within a virtual copy of the entire Web site, document set, and/or code base. This enables changes to multiple interdependent resources to be viewed and/or executed in-context prior to submission. Through the use of versioning, each content manager can preview, publish, and roll-back content of pages, wiki entries, blog posts, documents, or any other type of content stored in SharePoint. The idea of each user having an entire copy of the website virtualized is a bit odd to me – not sure why anyone would need that for anything but the simplest of websites. Automated Templates: - Create standard output templates that can be automatically applied to new and existing content, allowing the appearance of all content to be changed from one central place. Through the use of Master Pages and Themes, SharePoint provides the ability to change the entire look and feel of site. Of course, the older brother version of SharePoint – SharePoint Server 2010 – also introduces the concept of Page Layouts which allows page template level customization and even switching the layout of an individual page using different page templates. I think many organizations really think they want this but rarely end up using this bit of functionality. Easy Edits: - Once content is separated from the visual presentation of a site, it usually becomes much easier and quicker to edit and manipulate. Most WCMS software includes WYSIWYG editing tools allowing non-technical individuals to create and edit content. This is probably easier described with a screen cap of a vanilla SharePoint Foundation page in edit mode. Notice the page editing toolbar, the multiple layout options… It’s actually easier to use than Microsoft Word. Workflow management: - Workflow is the process of creating cycles of sequential and parallel tasks that must be accomplished in the CMS. For example, a content creator can submit a story, but it is not published until the copy editor cleans it up and the editor-in-chief approves it. Workflow, it’s in there. In fact, the same workflow engine is running under SharePoint Foundation that is running under the other versions of SharePoint. The primary difference is that with SharePoint Foundation – you need to configure the workflows yourself. Web Standards: - Active WCMS software usually receives regular updates that include new feature sets and keep the system up to current web standards. SharePoint is in the fourth major iteration under Microsoft with the 2010 release. In addition to the innovation that Microsoft continuously adds, you have the entire global ecosystem available. Scalable Expansion: - Available in most modern WCMSs is the ability to expand a single implementation (one installation on one server) across multiple domains. SharePoint Foundation can run multiple sites using multiple URLs on a single server install. Even more powerful, SharePoint Foundation is scalable and can be part of a multi-server farm to ensure that it will handle any amount of traffic that can be thrown at it. Delegation & Security: - Some CMS software allows for various user groups to have limited privileges over specific content on the website, spreading out the responsibility of content management. SharePoint Foundation provides very granular security capabilities. Read @ http://msdn.microsoft.com/en-us/library/ee537811.aspx Content Syndication: - CMS software often assists in content distribution by generating RSS and Atom data feeds to other systems. They may also e-mail users when updates are available as part of the workflow process. SharePoint Foundation nails it. With RSS syndication and email alerts available out of the box, content syndication is already in the platform. Multilingual Support: - Ability to display content in multiple languages. SharePoint Foundation 2010 supports more than 40 languages. Read More Read more @ http://msdn.microsoft.com/en-us/library/dd776256(v=office.12).aspxYou can download the free version from http://www.microsoft.com/en-us/download/details.aspx?id=5970

Read the article
Inheritance Mapping Strategies with Entity Framework Code First CTP5: Part 2 – Table per Type (TPT)

- by mortezam

In the previous blog post you saw that there are three different approaches to representing an inheritance hierarchy and I explained Table per Hierarchy (TPH) as the default mapping strategy in EF Code First. We argued that the disadvantages of TPH may be too serious for our design since it results in denormalized schemas that can become a major burden in the long run. In today’s blog post we are going to learn about Table per Type (TPT) as another inheritance mapping strategy and we'll see that TPT doesn’t expose us to this problem. Table per Type (TPT)Table per Type is about representing inheritance relationships as relational foreign key associations. Every class/subclass that declares persistent properties—including abstract classes—has its own table. The table for subclasses contains columns only for each noninherited property (each property declared by the subclass itself) along with a primary key that is also a foreign key of the base class table. This approach is shown in the following figure: For example, if an instance of the CreditCard subclass is made persistent, the values of properties declared by the BillingDetail base class are persisted to a new row of the BillingDetails table. Only the values of properties declared by the subclass (i.e. CreditCard) are persisted to a new row of the CreditCards table. The two rows are linked together by their shared primary key value. Later, the subclass instance may be retrieved from the database by joining the subclass table with the base class table. TPT Advantages The primary advantage of this strategy is that the SQL schema is normalized. In addition, schema evolution is straightforward (modifying the base class or adding a new subclass is just a matter of modify/add one table). Integrity constraint definition are also straightforward (note how CardType in CreditCards table is now a non-nullable column). Another much more important advantage is the ability to handle polymorphic associations (a polymorphic association is an association to a base class, hence to all classes in the hierarchy with dynamic resolution of the concrete class at runtime). A polymorphic association to a particular subclass may be represented as a foreign key referencing the table of that particular subclass. Implement TPT in EF Code First We can create a TPT mapping simply by placing Table attribute on the subclasses to specify the mapped table name (Table attribute is a new data annotation and has been added to System.ComponentModel.DataAnnotations namespace in CTP5): public abstract class BillingDetail { public int BillingDetailId { get; set; } public string Owner { get; set; } public string Number { get; set; } } [Table("BankAccounts")] public class BankAccount : BillingDetail { public string BankName { get; set; } public string Swift { get; set; } } [Table("CreditCards")] public class CreditCard : BillingDetail { public int CardType { get; set; } public string ExpiryMonth { get; set; } public string ExpiryYear { get; set; } } public class InheritanceMappingContext : DbContext { public DbSet<BillingDetail> BillingDetails { get; set; } } If you prefer fluent API, then you can create a TPT mapping by using ToTable() method: protected override void OnModelCreating(ModelBuilder modelBuilder) { modelBuilder.Entity<BankAccount>().ToTable("BankAccounts"); modelBuilder.Entity<CreditCard>().ToTable("CreditCards"); } Generated SQL For QueriesLet’s take an example of a simple non-polymorphic query that returns a list of all the BankAccounts: var query = from b in context.BillingDetails.OfType<BankAccount>() select b; Executing this query (by invoking ToList() method) results in the following SQL statements being sent to the database (on the bottom, you can also see the result of executing the generated query in SQL Server Management Studio): Now, let’s take an example of a very simple polymorphic query that requests all the BillingDetails which includes both BankAccount and CreditCard types: projects some properties out of the base class BillingDetail, without querying for anything from any of the subclasses: var query = from b in context.BillingDetails select new { b.BillingDetailId, b.Number, b.Owner }; -- var query = from b in context.BillingDetails select b; This LINQ query seems even more simple than the previous one but the resulting SQL query is not as simple as you might expect: -- As you can see, EF Code First relies on an INNER JOIN to detect the existence (or absence) of rows in the subclass tables CreditCards and BankAccounts so it can determine the concrete subclass for a particular row of the BillingDetails table. Also the SQL CASE statements that you see in the beginning of the query is just to ensure columns that are irrelevant for a particular row have NULL values in the returning flattened table. (e.g. BankName for a row that represents a CreditCard type) TPT ConsiderationsEven though this mapping strategy is deceptively simple, the experience shows that performance can be unacceptable for complex class hierarchies because queries always require a join across many tables. In addition, this mapping strategy is more difficult to implement by hand— even ad-hoc reporting is more complex. This is an important consideration if you plan to use handwritten SQL in your application (For ad hoc reporting, database views provide a way to offset the complexity of the TPT strategy. A view may be used to transform the table-per-type model into the much simpler table-per-hierarchy model.) SummaryIn this post we learned about Table per Type as the second inheritance mapping in our series. So far, the strategies we’ve discussed require extra consideration with regard to the SQL schema (e.g. in TPT, foreign keys are needed). This situation changes with the Table per Concrete Type (TPC) that we will discuss in the next post. References ADO.NET team blog Java Persistence with Hibernate book a { text-decoration: none; } a:visited { color: Blue; } .title { padding-bottom: 5px; font-family: Segoe UI; font-size: 11pt; font-weight: bold; padding-top: 15px; } .code, .typeName { font-family: consolas; } .typeName { color: #2b91af; } .padTop5 { padding-top: 5px; } .padTop10 { padding-top: 10px; } p.MsoNormal { margin-top: 0in; margin-right: 0in; margin-bottom: 10.0pt; margin-left: 0in; line-height: 115%; font-size: 11.0pt; font-family: "Calibri" , "sans-serif"; }

Read the article
HTG Explains: Why Does Rebooting a Computer Fix So Many Problems?

- by Chris Hoffman

Ask a geek how to fix a problem you’ve having with your Windows computer and they’ll likely ask “Have you tried rebooting it?” This seems like a flippant response, but rebooting a computer can actually solve many problems. So what’s going on here? Why does resetting a device or restarting a program fix so many problems? And why don’t geeks try to identify and fix problems rather than use the blunt hammer of “reset it”? This Isn’t Just About Windows Bear in mind that this soltion isn’t just limited to Windows computers, but applies to all types of computing devices. You’ll find the advice “try resetting it” applied to wireless routers, iPads, Android phones, and more. This same advice even applies to software — is Firefox acting slow and consuming a lot of memory? Try closing it and reopening it! Some Problems Require a Restart To illustrate why rebooting can fix so many problems, let’s take a look at the ultimate software problem a Windows computer can face: Windows halts, showing a blue screen of death. The blue screen was caused by a low-level error, likely a problem with a hardware driver or a hardware malfunction. Windows reaches a state where it doesn’t know how to recover, so it halts, shows a blue-screen of death, gathers information about the problem, and automatically restarts the computer for you . This restart fixes the blue screen of death. Windows has gotten better at dealing with errors — for example, if your graphics driver crashes, Windows XP would have frozen. In Windows Vista and newer versions of Windows, the Windows desktop will lose its fancy graphical effects for a few moments before regaining them. Behind the scenes, Windows is restarting the malfunctioning graphics driver. But why doesn’t Windows simply fix the problem rather than restarting the driver or the computer itself? Well, because it can’t — the code has encountered a problem and stopped working completely, so there’s no way for it to continue. By restarting, the code can start from square one and hopefully it won’t encounter the same problem again. Examples of Restarting Fixing Problems While certain problems require a complete restart because the operating system or a hardware driver has stopped working, not every problem does. Some problems may be fixable without a restart, though a restart may be the easiest option. Windows is Slow: Let’s say Windows is running very slowly. It’s possible that a misbehaving program is using 99% CPU and draining the computer’s resources. A geek could head to the task manager and look around, hoping to locate the misbehaving process an end it. If an average user encountered this same problem, they could simply reboot their computer to fix it rather than dig through their running processes. Firefox or Another Program is Using Too Much Memory: In the past, Firefox has been the poster child for memory leaks on average PCs. Over time, Firefox would often consume more and more memory, getting larger and larger and slowing down. Closing Firefox will cause it to relinquish all of its memory. When it starts again, it will start from a clean state without any leaked memory. This doesn’t just apply to Firefox, but applies to any software with memory leaks. Internet or Wi-Fi Network Problems: If you have a problem with your Wi-Fi or Internet connection, the software on your router or modem may have encountered a problem. Resetting the router — just by unplugging it from its power socket and then plugging it back in — is a common solution for connection problems. In all cases, a restart wipes away the current state of the software . Any code that’s stuck in a misbehaving state will be swept away, too. When you restart, the computer or device will bring the system up from scratch, restarting all the software from square one so it will work just as well as it was working before. “Soft Resets” vs. “Hard Resets” In the mobile device world, there are two types of “resets” you can perform. A “soft reset” is simply restarting a device normally — turning it off and then on again. A “hard reset” is resetting its software state back to its factory default state. When you think about it, both types of resets fix problems for a similar reason. For example, let’s say your Windows computer refuses to boot or becomes completely infected with malware. Simply restarting the computer won’t fix the problem, as the problem is with the files on the computer’s hard drive — it has corrupted files or malware that loads at startup on its hard drive. However, reinstalling Windows (performing a “Refresh or Reset your PC” operation in Windows 8 terms) will wipe away everything on the computer’s hard drive, restoring it to its formerly clean state. This is simpler than looking through the computer’s hard drive, trying to identify the exact reason for the problems or trying to ensure you’ve obliterated every last trace of malware. It’s much faster to simply start over from a known-good, clean state instead of trying to locate every possible problem and fix it. Ultimately, the answer is that “resetting a computer wipes away the current state of the software, including any problems that have developed, and allows it to start over from square one.” It’s easier and faster to start from a clean state than identify and fix any problems that may be occurring — in fact, in some cases, it may be impossible to fix problems without beginning from that clean state. Image Credit: Arria Belli on Flickr, DeclanTM on Flickr

Read the article

Search Results

Search found 6580 results on 264 pages for 'require'.

Page 92/264 | < Previous Page | 88 89 90 91 92 93 94 95 96 97 98 99 | Next Page >

- by restrictedinfinity

- by Michi Qne

- by Gregg Williams

- by ento

- by ento

- by buecking

- by OlafM

- by OlafM

- by Luis Masuelli

- by martin's

- by Alan Smith

- by user12620111

- by ScottGu

- by Wes McClure

- by Chris Gardner

- by Testas

- by Rick Strahl

- by pinaldave

- by Pinal Dave

- by Mat Keep

- by Shaun

- by KunaalKapoor

- by mortezam

- by Chris Hoffman

< Previous Page | 88 89 90 91 92 93 94 95 96 97 98 99 | Next Page >