program x - Page 140 - Developer IT

UDP blocked by Windows XP Firewall when sending to local machine

- by user36367

I work for a software development company but the issue doesn't seem to be programming-related. Here is my setup: Windows XP Professional with Service Pack 3, all updated Program that sends UDP datagrams Program that receives UDP datagrams Windows Firewall set to allow inbound UDP datagrams on a specific port (Scope: Subnet) If I send a UDP datagram on any port to other, similar machines, it goes through. If I send the UDP datagram to the same computer running the program that sends (whether using broadcast, localhost IP or the specific IP of the machine), the receiver program gets nothing. I've traced the problem down to the Windows XP Firewall, as Windows 7 does not have this problem (and I do not wish to sully my hands with Vista). If the exception I create for that UDP port in the WinXP firewall is set for a Scope of Subnet the datagram is blocked, but if I set it to All Computers or specifically enter my network settings (192.168.2.161 or 192.168.2.0/255.255.255.0) it works fine. Using different UDP ports makes no difference. I've tried different programs to reproduce this problem (ServerTalk to send and either IP Port Spy or PortPeeker to receive) to make sure it's not our code that's the issue, and those programs' datagrams were blocked as well. Also, that computer only has one network interface, so there are no additional network weirdness. I receive my IP from a DHCP server, so this is a straightforward setup. Given that it doesn't happen in Windows 7 I must assume it's a defect in the Windows XP Firewall, but I'd think someone else would have encountered this problem before. Has anyone encountered anything like this? Any ideas?

Read the article

UDP blocked by Windows XP Firewall when sending to local machine

- by user36367

Hi there, I work for a software development company but the issue doesn't seem to be programming-related. Here is my setup: - Windows XP Professional with Service Pack 3, all updated - Program that sends UDP datagrams - Program that receives UDP datagrams - Windows Firewall set to allow inbound UDP datagrams on a specific port (Scope: Subnet) If I send a UDP datagram on any port to other, similar machines, it goes through. If I send the UDP datagram to the same computer running the program that sends (whether using broadcast, localhost IP or the specific IP of the machine), the receiver program gets nothing. I've traced the problem down to the Windows XP Firewall, as Windows 7 does not have this problem (and I do not wish to sully my hands with Vista). If the exception I create for that UDP port in the WinXP firewall is set for a Scope of Subnet the datagram is blocked, but if I set it to All Computers or specifically enter my network settings (192.168.2.161 or 192.168.2.0/255.255.255.0) it works fine. Using different UDP ports makes no difference. I've tried different programs to reproduce this problem (ServerTalk to send and either IP Port Spy or PortPeeker to receive) to make sure it's not our code that's the issue, and those programs' datagrams were blocked as well. Also, that computer only has one network interface, so there are no additional network weirdness. I receive my IP from a DHCP server, so this is a straightforward setup. Given that it doesn't happen in Windows 7 I must assume it's a defect in the Windows XP Firewall, but I'd think someone else would have encountered this problem before. Has anyone encountered anything like this? Any ideas? Thanks in advance!

Read the article

iptables to block VPN-traffic if not through tun0

- by dacrow

I have a dedicated Webserver running Debian 6 and some Apache, Tomcat, Asterisk and Mail-stuff. Now we needed to add VPN support for a special program. We installed OpenVPN and registered with a VPN provider. The connection works well and we have a virtual tun0 interface for tunneling. To archive the goal for only tunneling a single program through VPN, we start the program with sudo -u username -g groupname command and added a iptables rule to mark all traffic coming from groupname iptables -t mangle -A OUTPUT -m owner --gid-owner groupname -j MARK --set-mark 42 Afterwards we tell iptables to to some SNAT and tell ip route to use special routing table for marked traffic packets. Problem: if the VPN failes, there is a chance that the special to-be-tunneled program communicates over the normal eth0 interface. Desired solution: All marked traffic should not be allowed to go directly through eth0, it has to go through tun0 first. I tried the following commands which didn't work: iptables -A OUTPUT -m owner --gid-owner groupname ! -o tun0 -j REJECT iptables -A OUTPUT -m owner --gid-owner groupname -o eth0 -j REJECT It might be the problem, that the above iptable-rules didn't work due to the fact, that the packets are first marked, then put into tun0 and then transmitted by eth0 while they are still marked.. I don't know how to de-mark them after in tun0 or to tell iptables, that all marked packet may pass eth0, if they where in tun0 before or if they going to the gateway of my VPN provider. Does someone has any idea to a solution? Some config infos: iptables -nL -v --line-numbers -t mangle Chain OUTPUT (policy ACCEPT 11M packets, 9798M bytes) num pkts bytes target prot opt in out source destination 1 591K 50M MARK all -- * * 0.0.0.0/0 0.0.0.0/0 owner GID match 1005 MARK set 0x2a 2 82812 6938K CONNMARK all -- * * 0.0.0.0/0 0.0.0.0/0 owner GID match 1005 CONNMARK save iptables -nL -v --line-numbers -t nat Chain POSTROUTING (policy ACCEPT 393 packets, 23908 bytes) num pkts bytes target prot opt in out source destination 1 15 1052 SNAT all -- * tun0 0.0.0.0/0 0.0.0.0/0 mark match 0x2a to:VPN_IP ip rule add from all fwmark 42 lookup 42 ip route show table 42 default via VPN_IP dev tun0

Read the article

Problem uninstalling and installing Java on new pc running Windows 7 64 bit os

- by Brian Gerrin

I have a new Dell Studio XPS running Windows 7 64 bit os. I am attending online classes which require IE 8 and Java version 6 build 20. The pc came with IE 8 32 bit and Java 6 build 21 already installed. I tried to uninstall Java using add and remove programs but after about 45 minutes of "Preparing to remove application" I got an error refering to a missing dll file and the uninstall failed. I used a third party program to remove Java and downloaded Java 6 build 20. My problem is when I try to install it I get the box telling me "Installing program ... this may take a few minutes" however after 30 to 45 minutes nothing has happend and there is no indication in the progress bar that anything is happening then all of a sudden the program bar is full and the program is supposedly installed. When I try to run it however it doesn't work. Someone help please! I can't get access to my classwork with out this! Thanks

Read the article

Soundtest works, but no other sounds and windows starts very slowly

- by Kristian Kari

So suddenly all sounds on my computer stopped working expect the sound test works, but when I go to control panel and audio it says there's no audio device. And audio players and audio on website doesn't work either. and Also windows starts very slowly. I'm not sure if this helps but I'll tell what happened before this. I downloaded this program from a little shady website, I hesitated and for a good reason I guess cos the moment I finished downloading it, my antivir detected it as unwanted program and put it on quarantine. I thought everything was fine, but had some problems with program (won't go to details about it cos it's not really related to audio) so rebooted the computer and I was suprised that windows started really really slowly. It took like 5 mins to load the desktop such. After that I noticed the sounds were missing, I immediately deleted the program since I thought it was causing the problem. But it didn't change much. I tried to find the solution with no success and now I'm just scanning again my computer. I'd be thankful if anyone would be able to help me. I might have left a thing or ten things out, so feel free to ask. oh and I'm using windows xp

Read the article

Reading email from Emacs VM using a secure server (Gmail)

- by Alan Wehmann

This is a question (see below) originally entered at https://answers.launchpad.net/vm/+question/108267 and upon the recommendation of Uday Reddy the question and answers are being moved here. The date of the original question was May 4, 2010. One subject of the question is use of the program stunnel with program View Mail (run within Emacs) on a PC running Microsoft Windows, in order to read email from a server that requires use of TSL/SSL (Gmail). See the related question, How to configure Emacs smtp for secure server for using a secure server, for sending email. The programs discussed are Emacs, VM (ViewMail) and stunnel. The platform under discussion is MS Windows. The original question was asked by usr345 on 2010-04-24: I tried to install vm on Windows, but when I tried to get the mail from gmail using ssl, an error emerges, emacs hanges-up. Here is the code from .emacs: (add-to-list 'load-path (expand-file-name "~/vm/lisp")) (add-to-list 'Info-default-directory-list (expand-file-name "~/vm/info")) (require 'vm-autoloads) (setq vm-primary-inbox "~/mail/inbox.mbox") (setq vm-crash-box "~/mail/inbox.crash.mbox") (setq vm-spool-files `((,vm-primary-inbox "pop-ssl:pop.gmail.com:995:pass:usr345:PASSWORD" ,vm-crash-box))) (setq vm-stunnel-program "g:/program files/stunnel/stunnel.exe") So, the question: How to configure pop-ssl on Windows?

Read the article

Use preforker(ruby gem) with supervisor

- by user1548832

I also asked same question on stackoverflow.com http://stackoverflow.com/questions/13871169/use-preforkerruby-gem-with-supervisor But, superuser.com might much help to me. Can anyone amswer this? I want to run a server program using preforker ruby gem with supervisor. But error has occured. I wrote a following test program using preforker. #!/usr/bin/env ruby require 'rubygems' require 'preforker' Preforker.new(:app_name => 'test-preforker', :timeout => 60, :workers => 1) do |master| while master.wants_me_alive? do puts "hello" sleep 10 end end.run And a following supervisor config. [program:test-preforker] command=/home/tkono/tmp/test-preforker.rb stdout_logfile_maxbytes=1MB stderr_logfile_maxbytes=1MB stdout_logfile=/var/log/%(program_name)s.log stderr_logfile=/var/log/%(program_name)s.log autorestart=true Then, reload supervisor. # supervisorctl reload Restarted supervisord Here is the log file of supervisor. 2012-12-13 17:50:47,161 CRIT Supervisor running as root (no user in config file) 2012-12-13 17:50:47,163 WARN Included extra file "/etc/supervisor.d/test-preforker.ini" during parsing 2012-12-13 17:50:47,209 INFO RPC interface 'supervisor' initialized 2012-12-13 17:50:47,213 CRIT Server 'unix_http_server' running without any HTTP authentication checking 2012-12-13 17:50:47,215 INFO supervisord started with pid 12437 2012-12-13 17:50:48,231 INFO spawned: 'test-preforker' with pid 12440 2012-12-13 17:50:48,233 INFO exited: test-preforker (exit status 1; not expected) 2012-12-13 17:50:49,248 INFO spawned: 'test-preforker' with pid 12441 2012-12-13 17:50:49,261 INFO exited: test-preforker (exit status 1; not expected) 2012-12-13 17:50:51,267 INFO spawned: 'test-preforker' with pid 12442 2012-12-13 17:50:51,284 INFO exited: test-preforker (exit status 1; not expected) 2012-12-13 17:50:54,305 INFO spawned: 'test-preforker' with pid 12443 2012-12-13 17:50:54,308 INFO exited: test-preforker (exit status 1; not expected) 2012-12-13 17:50:55,311 INFO gave up: test-preforker entered FATAL state, too many start retries too quickly Please tell me what is wrong? A program using preforker cannot run with supervisor? preforker https://github.com/dcadenas/preforker supervisor http://supervisord.org/index.html

Read the article

How to allow users to transfer files to other users on linux

- by Jon Bringhurst

We have an environment of a few thousand users running applications on about 40 clusters ranging in size from 20 compute nodes to 98,000 compute nodes. Users on these systems generate massive files (sometimes 1PB) controlled by traditional unix permissions (ACLs usually aren't available or practical due to the specialized nature of the filesystem). We currently have a program called "give", which is a suid-root program that allows a user to "give" a file to another user when group permissions are insufficient. So, a user would type something like the following to give a file to another user: > give username-to-give-to filename-to-give ... The receiving user can then use a command called "take" (part of the give program) to receive the file: > take filename-to-receive The permissions of the file are then effectively transferred over to the receiving user. This program has been around for years and we'd like to revisit things from a security and functional point of view. Our current plan of action is to remove the bit rot in our current implementation of "give" and package it up as an open source app before we redeploy it into production. Does anyone have another method they use to transfer extremely large files between users when only traditional unix permissions are available?

Read the article

Starting programs from terminal then exiting terminal exits started programs?

- by sherrellbc

I really was unsure how to phrase the question title. What I mean is that when I use the terminal to start a program, most of the time when the terminal is closed it also exits the programs started from it. Now this makes sense if we look at it from a hierarchical standpoint of the terminal being the parent process which spawns child processes, and any halt of the parent causes subsequent halting of the children as well. However, I've noticed this to not always be the case. For example, I downloaded Sublime Text Editor and created a symlink in PATH. I can start this program by issuing a sublime command from the terminal, but subsequent closure of the terminal program does nothing to sublime. However, other times either the child process that was started it also closed or it hangs up and causes problems. tl;dr: Is it always the case that programs started from a closed parent process will be closed when the parent is exited? And if so, is there way to start a program from the terminal and then close the terminal without exiting the started process? The whole point here is to start programs from the terminal so I do not overly-populate my desktop with symlinks.

Read the article

Exchange 2010 UR3 - customizing OWA logon page

- by STGdb

I have an Exchange 2010 UR3 deployment that I need to customize the OWA logon page for. I've created a new LGNTOPL.GIF file to replace the existing one in the folder: “C:\Program Files\Microsoft\Exchange Server\V14\ClientAccess\Owa\14.3.158.1\themes\resources” When I bring up OWA, I still get the original “Outlook Web App” logo. I’ve searched and found a couple of other instances of LGNTOPL.GIF in the directories: “C:\Program Files\Microsoft\Exchange Server\V14\ClientAccess\Owa\14.3.123.3\themes\resources” “C:\Program Files\Microsoft\Exchange Server\V14\ClientAccess\Owa\14.3.146.0\themes\resources” “C:\Program Files\Microsoft\Exchange Server\V14\ClientAccess\Owa\Current\themes\resources” I’ve replaced the LGNTOPL.GIF file in each of the above directories but got the same results. I’ve tried clearing my browser cache and even using multiple browsers from multiple PC’s but the same results. I’ve even tried making my GIF file the same pixel size as the original LGNTOPL.GIF logo but still the same results. I’ve tried restarting IIS on the CAS server and restarting the server but same results. Has something changed with Exchange 2010 UR3 when trying to customize OWA? I don't see anything documented about any change to OWA customization. Thanks

Read the article

Command line switching

- by Larry

I have read through some suggestions but I am just not technical enough to get this I think. I am a CAD designer and each file has 5 files associated with it. I have 3 sets of 5 files, and each set needs to go into its own zip file, placed on a separate server. For example: "C:\Program Files\7-zip\7z.exe" a file1.zip "O:\server2\map files\BC\BC.d*"-0 "C:\Program Files\7-zip\7z.exe" a file2.zip "O:\server2\map files\BC\ON.d*"-0 "C:\Program Files\7-zip\7z.exe" a file3.zip "O:\server2\map files\BC\AB.d*"-0 and I am in directory "S:\server\map files\provinces" (for example). These lines run within an existing batch file and by the time it reaches the 3 lines above, it's in the S: directory sample above. So it's looking on my pc for the 7-zip program, creating 3 zip file names which it does, but places those zip files on a separate server which it doesn't and the first zip file also includes all the other 10 files, the second zip file the same plus the first zip file, and the third the same with the other two zip files making me think the code isn't recognizing the part after file1.zip where I am trying to tell it what files to include and where to place the zip files. Ultimately, I want to either have the system create a new zip file if the old one was deleted, or copy the new files into the existing zip and overwrite any older files, and for these zip files to be placed in a separate location which is where we share our files with other personnel from within our company. The S: drive is for all originals, and O: is for sharing. Is there a list of all switching options with many different samples?

Read the article

Deleting old system folders from a drive that is no longer the windows installation drive

- by grenade

I dropped my laptop and was no longer able to boot. There were error messages about a corrupt boot record. Replacing the hard drive and reinstalling Win 7 was how I dealt with it. The old drive still appears to be good and I can read and write to it when I connect it as a second drive and mount as D:. However, if I try to recover the space being used by the windows, programdata, program files & program files(x86) folders, by deleting them I get error messages about needing permission from trustedinstaller. If I set myself as the owner of the folders and retry the delete I get error messages about needing permission from myself! Since I'm pretty sure that I have permission from myself to delete the folders, I can only assume that the OS or file system has gotten its panties twisted. I have tried shift, right click, delete from explorer and also if I run "del /f /s /q D:\Windows" from an admin command prompt, I get a succession of Access is denied messages as well. How do I delete D:\Windows, D:\ProgramData, D:\Program Files & D:\Program Files(x86) from a drive that is not the Windows installation drive?

Read the article

How can I disable flashing icons on Windows 7 taskbar?

- by Jebego

I set my Windows 7 taskbar to auto-hide. However, sometimes when a program changes or something new happens in a program, the taskbar will show its self, and its respective taskbar icon will begin flashing orange. Here's what I'm talking about: To make the taskbar hide again, I have click on the program before I can go back to what I was doing. Anyways, I personally find this very annoying, and would love to find a way to either: Prevent the taskbar from having such alerts. Prevent the taskbar from showing its self when it has such alerts. I've searched around quite a bit, and really only found answers to this for XP. I've also found another Stack Exchange Question looking for the same thing for Windows 7. However, none of the answers to the question were really what I'm looking for. I'm not looking to hide the taskbar, or control the number of flashes. However, this answer seems to be what I'm looking for, so I downloaded and tried out the program. It works perfectly, other than the fact that the start menu icon is always shown, regardless of the taskbar being set to auto-hide. So, any ideas on how to fix this problem?

Read the article

How can I stop outlook 2003 from crashing?

- by Xavierjazz

XP Outlook 2003 keeps crashing, sometimes freezing my whole computer. The STR: Have Outlook 2003 running (with the added "app" LOOKOUT for search and a pop mail as well as MS mail set up. The program loads and displays my reminders. I minimize the reminders. Outlook displays my email list. I have the "Reading pane" set to display right. There is often junk in my junk folder. When I click on the MS mail junk folder, there is sometimes junk with a blank description. Clicking on this to select and delete it is when the program is virtually certain to crash. Often when I reboot the program, the reading pane is again reset to the default, which is "no reading pane". If I change it back and then again click on the message the program often crashes. If I don't set the reading pane but select the message(s), they can be selected and removed. I then set the reading pane and things are okay for a period. This has been going on for some time now. As a part of trying to solve it, I did a deep scan with a number of "root kit" virus-removers. One did find 2 related root kit viruses and removed them. Ram seems okay, HDD shows okay. As I write this I realize that one thing I haven't tried is removing and re-installing LOOKOUT. I will do that now. Any other ideas or even better, solutions, would be most welcome.

Read the article

Exchange 2010 UR3 - customizing OWA logon page

- by STGdb

I have an Exchange 2010 UR3 deployment that I need to customize the OWA logon page for. I've created a new LGNTOPL.GIF file to replace the existing one in the folder: “C:\Program Files\Microsoft\Exchange Server\V14\ClientAccess\Owa\14.3.158.1\themes\resources” When I bring up OWA, I still get the original “Outlook Web App” logo. I’ve searched and found a couple of other instances of LGNTOPL.GIF in the directories: “C:\Program Files\Microsoft\Exchange Server\V14\ClientAccess\Owa\14.3.123.3\themes\resources” “C:\Program Files\Microsoft\Exchange Server\V14\ClientAccess\Owa\14.3.146.0\themes\resources” “C:\Program Files\Microsoft\Exchange Server\V14\ClientAccess\Owa\Current\themes\resources” I’ve replaced the LGNTOPL.GIF file in each of the above directories but got the same results. I’ve tried clearing my browser cache and even using multiple browsers from multiple PC’s but the same results. I’ve even tried making my GIF file the same pixel size as the original LGNTOPL.GIF logo but still the same results. I’ve tried restarting IIS on the CAS server and restarting the server but same results. Has something changed with Exchange 2010 UR3 when trying to customize OWA? I don't see anything documented about any change to OWA customization. Thanks

Read the article

Windows OS level file open event trigger

- by john

Colleagues, I have need to run a script/program on certain basic OS level events. In particular when a file in Windows is opened. The open may be read-only or to edit, and may be initiated by a number of means, either from windows explorer (open or ), be selected from a viewing or editing application from the native file chooser, or drag-n-drop into an editing or viewing application. Further, i need the trigger to "hold" the event from completing the action until the runtime on the program has completed. The event handler program may return a pass state, or fail state. If fail state has been returned, then the event must disallow the initially requested action. Lastly, I need to add to the file in question a property or attribute that will contain metadata that will be used by the above event trigger handler program to make a determination as to the pass/fail condition that will ultimately determine if the user is permitted to open the file. Please note that this is NOT a windows event log situation, but one at the OS level file open event. thanks very much for your help. regards, j

Read the article

using virtual machine like mySql server

- by ffmm

i'm developing a java program and i need a database. Now i'm using MAMP and it's pretty easy but i would have a virtual machine (ubuntu server) and i need to connect my java program with this virtual machine using vitualBox. the situation: I installed VirtualBox on my mac and I installed an ubuntu-server machine set "bridge adapter" in the network settings of VB I installed mysql on ubuntu-server and i created a simple database (all work well by ubuntu) doing ifconfig by ubuntu I get the ip: 192.168.1.217 so in the java program i made this function: public static Connection connect(String host, int port, String dbName, String user, String passwd) { Connection dbConnection = null; try { String dbString = null; Class.forName("com.mysql.jdbc.Driver").newInstance(); dbString = "jdbc:mysql://" + host + ":" + port + "/" + dbName; dbConnection = DriverManager.getConnection(dbString, user, passwd); } catch (Exception e) { System.err.println("Failed to connect with the DB"); e.printStackTrace(); } return dbConnection; } and in the main() i use: Connection con = connect(1, "192.168.1.217", 3306, "Ciao", "root", "cocacola"); 3306 was a default value. I don't know if is correct, it works on mamp, but…. how I can find the correct port that I have to use with VB? when I ran the program I get the catch excepion… what's wrong? ps: i have to install apache o something else?

Read the article

moving from WinXP to WinServer in VmWare

- by Alex

I have a Vmware machine for.Net application testing. Current setup: Host OS: win7 Guest OS: Right now the guest OS is Win Xp Pro x64, which runs great with just 1 gigabyte of RAM and 10 gigs of disk space. * This part can be skipped * As I said, there was a program that I needed to test, but unfortunately, by default, Vmware installs crappy display drivers(called SVGA II) on XP machines and there is NO way to upgrade them! This resulted in my program's error (the program used SlimDX (DirectX wrapper) to do some stuff..). Eventually I found out that display drivers most certainly is the problem. For example, Windows 7 virtual machine uses SVGA 3D drivers and I have NO problems running my SlimDX-based program. Now, regarding Windows Server 2008! Apparently, WDDM driver is supported by WS2008, which means that I'll be able to install SVGA 3D and to test my DX apps. * end of skip * The questions are: Will WS2008 be as smooth with just 1 gig of RAM just like Win XP was? Will 10 gigs of HDD be enough? Or the server requires more? Will I be able to install .Net ver. 4 on WS2008? Are there any limitations that I need to be aware of as a .Net programmer? EDIT: I was hoping that WS2008 is XP-based, not Vista-vased/W7-based. In comparison, W7 virtual machine with 2 gigs of RAM and 2 proc cores nearly kills my Host OS. Whereas, WinXp runs extremely fast even with 1 core and 1 gig of RAM. That's the main reason why I want to try WS2008..

Read the article

Adding SSD as boot drive to existing system

- by thegrinner

I recently bought two 128GB SSDs that I'm planning on adding (RAID 0) to a system I currently have on a 1TB HDD. I'm hoping to redo the disk space such that the SSDs act as the boot drive (only other items would be things I install there explicitly) while the majority of my system is on the HDD - documents, media, program files. Something like this: SSD = [ OS | Explicitly placed programs] HDD = [ Program Files | Media | Documents | etc] I have an external drive capable of holding all the data I want to save, so the backup isn't too much of a concern. What I'm worried about is how I should go about doing this - do I need to do a clean install on the SSDs, reformat the HDD, move things like Program Files/Users to the HDD, and then restore data (not full programs but things like saves)? Should I be using one of the regedit hacks I've seen around to change the default install directories instead of moving program files and users? Should I have the actual folders on the HDD and symlinks on the SSD? Or is there a better solution? Do I need to disconnect my HDD while doing the clean Windows install?

Read the article

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article

An Alphabet of Eponymous Aphorisms, Programming Paradigms, Software Sayings, Annoying Alliteration

- by Brian Schroer

Malcolm Anderson blogged about “Einstein’s Razor” yesterday, which reminded me of my favorite software development “law”, the name of which I can never remember. It took much Wikipedia-ing to find it (Hofstadter’s Law – see below), but along the way I compiled the following list: Amara’s Law: We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run. Brook’s Law: Adding manpower to a late software project makes it later. Clarke’s Third Law: Any sufficiently advanced technology is indistinguishable from magic. Law of Demeter: Each unit should only talk to its friends; don't talk to strangers. Einstein’s Razor: “Make things as simple as possible, but not simpler” is the popular paraphrase, but what he actually said was “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience”, an overly complicated quote which is an obvious violation of Einstein’s Razor. (You can tell by looking at a picture of Einstein that the dude was hardly an expert on razors or other grooming apparati.) Finagle's Law of Dynamic Negatives: Anything that can go wrong, will—at the worst possible moment. - O'Toole's Corollary: The perversity of the Universe tends towards a maximum. Greenspun's Tenth Rule: Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. (Morris’s Corollary: “…including Common Lisp”) Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law. Issawi’s Omelet Analogy: One cannot make an omelet without breaking eggs - but it is amazing how many eggs one can break without making a decent omelet. Jackson’s Rules of Optimization: Rule 1: Don't do it. Rule 2 (for experts only): Don't do it yet. Kaner’s Caveat: A program which perfectly meets a lousy specification is a lousy program. Liskov Substitution Principle (paraphrased): Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it Mason’s Maxim: Since human beings themselves are not fully debugged yet, there will be bugs in your code no matter what you do. Nils-Peter Nelson’s Nil I/O Rule: The fastest I/O is no I/O. Occam's Razor: The simplest explanation is usually the correct one. Parkinson’s Law: Work expands so as to fill the time available for its completion. Quentin Tarantino’s Pie Principle: “…you want to go home have a drink and go and eat pie and talk about it.” (OK, he was talking about movies, not software, but I couldn’t find a “Q” quote about software. And wouldn’t it be cool to write a program so great that the users want to eat pie and talk about it?) Raymond’s Rule: Computer science education cannot make anybody an expert programmer any more than studying brushes and pigment can make somebody an expert painter. Sowa's Law of Standards: Whenever a major organization develops a new system as an official standard for X, the primary result is the widespread adoption of some simpler system as a de facto standard for X. Turing’s Tenet: We shall do a much better programming job, provided we approach the task with a full appreciation of its tremendous difficulty, provided that we respect the intrinsic limitations of the human mind and approach the task as very humble programmers. Udi Dahan’s Race Condition Rule: If you think you have a race condition, you don’t understand the domain well enough. These rules didn’t exist in the age of paper, there is no reason for them to exist in the age of computers. When you have race conditions, go back to the business and find out actual rules. Van Vleck’s Kvetching: We know about as much about software quality problems as they knew about the Black Plague in the 1600s. We've seen the victims' agonies and helped burn the corpses. We don't know what causes it; we don't really know if there is only one disease. We just suffer -- and keep pouring our sewage into our water supply. Wheeler’s Law: All problems in computer science can be solved by another level of indirection... Except for the problem of too many layers of indirection. Wheeler also said “Compatibility means deliberately repeating other people's mistakes.”. The Wrong Road Rule of Mr. X (anonymous): No matter how far down the wrong road you've gone, turn back. Yourdon’s Rule of Two Feet: If you think your management doesn't know what it's doing or that your organisation turns out low-quality software crap that embarrasses you, then leave. Zawinski's Law of Software Envelopment: Every program attempts to expand until it can read mail. Zawinski is also responsible for “Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems.” He once commented about X Windows widget toolkits: “Using these toolkits is like trying to make a bookshelf out of mashed potatoes.”

Read the article

What's going on with INETA and the Regional Speakers Bureau?

- by Chris Williams

For those of you that have been waiting patiently (and not so patiently) I'm happy to say that we're very near completion on some changes/enhancements/improvements that will allow us to finally go live with the INETA Regional Speakers Bureau. I know quite a few of you have already registered, which is great (though some of you may need to come back and update your info) and we've had a few folks submit requests, mostly in a test capacity, but soon we'll be up and live. Here's how it breaks down. Be sure to read this, because things have changed a bit from when we initially announced it. 1. The majority of our speaker/event funding is going into the Regional Speakers Bureau. The National Bureau still exists, but it's a good bit smaller than it was before, and it's not an "every group" benefit anymore. We'll be using the National Bureau as more of a strategic task force, targeting high impact events and areas that need some community building love from INETA. These will be identified and handled on a case by case basis, and may include more than just user group events. 2. You're going to get more events per group, per year than you did before. Not only are we focusing more resources on this program, but we're also making a lot of efforts to use it more effectively. With the INETA Regional Speakers Bureau, you should be able to get 2-3 INETA speakers per year, on average. Not every geographical area will have exactly the same experience, but we're doing the best we can. 3. It's not a farm team program for the National Bureau. Unsurprisingly, I managed to offend a number of people when I previously made the comment that the Regional Speakers Bureau program was a farm team or stepping stone to the National Bureau. It was a poor choice of words. Anyone can participate in the Regional Speakers Bureau, and I look forward to working with all of you. 4. There is assistance for your efforts. The exact final details are still being hammered out, but expect it to look something like this: (all distances listed are based on a round trip) Distances < 120 miles = $0 121 miles - 240 miles = $50 (effectively 1 to 2 hours, each way) 241 miles - 360 miles = $100 (effectively 2 to 3 hours, each way) 361 miles - 480 miles = $200 (effectively 3 to 4 hours, each way) For those of you who travel a lot, we're working on a solution to handle group visits when you're away from home. These will (for now) be handled on a case by case basis. 5. We're going to make it as easy as possible to work with the program. In order to do this, we need a few things from you. For speakers, that means your home address. It also means (maybe) filling out a simple 1 line expense report via the INETA website. For user groups, it means making sure your meeting address is up to date as well. 6. Distances will be automatically calculated from your home of record to the user group event and back. We realize that this is not a perfect solution to every instance, but we're not paying you to speak at an event, and you won't be taxed on this money. It's simply some assistance to make your community efforts easier. Our way of saying thanks for everything you do. 7. Sounds good so far, what's the catch? There's always a catch, right? In this case there are two of them: 1) At this time, Microsoft employees are welcome to use the website to line up speaking engagements with user groups, but are not eligible for financial assistance. 2) Anyone can register and use the website to line up speaking engagements with user groups, however you must receive and maintain a net score of 3+ positive ratings (we're implementing a thumbs up / thumbs down system) in order to receive financial assistance. These ratings are provided by the User Group leaders after the meeting has taken place. 8. Involvement by the User Group leaders is a key factor in the success of this program. Your job isn't done once you request a speaker. After you've had your meeting, it's critical that you go back to the website and take a very small survey. Doing this ensures that the speaker gets rated (and compensated if eligible) and also ensures that you can make another request, since you won't be able to make a new request if you have an old one outstanding. 9. What about Canada? We're definitely working on that. Unfortunately nothing new to report on that front, other than to say that we're trying. So... this is where things stand currently. We're working very quickly to get this in place and get speakers and groups together. If you have any questions, please leave a comment below and I'll answer them as quickly as possible. If I've forgotten anything, or if things change, I'll update it here. Thanks, Chris G. Williams INETA Board of Directors

Read the article

Calculating the Size (in Bytes and MB) of a Oracle Coherence Cache

- by Ricardo Ferreira

The concept and usage of data grids are becoming very popular in this days since this type of technology are evolving very fast with some cool lead products like Oracle Coherence. Once for a while, developers need an programmatic way to calculate the total size of a specific cache that are residing in the data grid. In this post, I will show how to accomplish this using Oracle Coherence API. This example has been tested with 3.6, 3.7 and 3.7.1 versions of Oracle Coherence. To start the development of this example, you need to create a POJO ("Plain Old Java Object") that represents a data structure that will hold user data. This data structure will also create an internal fat so I call that should increase considerably the size of each instance in the heap memory. Create a Java class named "Person" as shown in the listing below. package com.oracle.coherence.domain; import java.io.Serializable; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Random; @SuppressWarnings("serial") public class Person implements Serializable { private String firstName; private String lastName; private List<Object> fat; private String email; public Person() { generateFat(); } public Person(String firstName, String lastName, String email) { setFirstName(firstName); setLastName(lastName); setEmail(email); generateFat(); } private void generateFat() { fat = new ArrayList<Object>(); Random random = new Random(); for (int i = 0; i < random.nextInt(18000); i++) { HashMap<Long, Double> internalFat = new HashMap<Long, Double>(); for (int j = 0; j < random.nextInt(10000); j++) { internalFat.put(random.nextLong(), random.nextDouble()); } fat.add(internalFat); } } public String getFirstName() { return firstName; } public void setFirstName(String firstName) { this.firstName = firstName; } public String getLastName() { return lastName; } public void setLastName(String lastName) { this.lastName = lastName; } public String getEmail() { return email; } public void setEmail(String email) { this.email = email; } } Now let's create a Java program that will start a data grid into Coherence and will create a cache named "People", that will hold people instances with sequential integer keys. Each person created in this program will trigger the execution of a custom constructor created in the People class that instantiates an internal fat (the random amount of data generated to increase the size of the object) for each person. Create a Java class named "CreatePeopleCacheAndPopulateWithData" as shown in the listing below. package com.oracle.coherence.demo; import com.oracle.coherence.domain.Person; import com.tangosol.net.CacheFactory; import com.tangosol.net.NamedCache; public class CreatePeopleCacheAndPopulateWithData { public static void main(String[] args) { // Asks Coherence for a new cache named "People"... NamedCache people = CacheFactory.getCache("People"); // Creates three people that will be putted into the data grid. Each person // generates an internal fat that should increase its size in terms of bytes... Person pessoa1 = new Person("Ricardo", "Ferreira", "[email protected]"); Person pessoa2 = new Person("Vitor", "Ferreira", "[email protected]"); Person pessoa3 = new Person("Vivian", "Ferreira", "[email protected]"); // Insert three people at the data grid... people.put(1, pessoa1); people.put(2, pessoa2); people.put(3, pessoa3); // Waits for 5 minutes until the user runs the Java program // that calculates the total size of the people cache... try { System.out.println("---> Waiting for 5 minutes for the cache size calculation..."); Thread.sleep(300000); } catch (InterruptedException ie) { ie.printStackTrace(); } } } Finally, let's create a Java program that, using the Coherence API and JMX, will calculate the total size of each cache that the data grid is currently managing. The approach used in this example was retrieve every cache that the data grid are currently managing, but if you are interested on an specific cache, the same approach can be used, you should only filter witch cache will be looked for. Create a Java class named "CalculateTheSizeOfPeopleCache" as shown in the listing below. package com.oracle.coherence.demo; import java.text.DecimalFormat; import java.util.Map; import java.util.Set; import java.util.TreeMap; import javax.management.MBeanServer; import javax.management.MBeanServerFactory; import javax.management.ObjectName; import com.tangosol.net.CacheFactory; public class CalculateTheSizeOfPeopleCache { @SuppressWarnings({ "unchecked", "rawtypes" }) private void run() throws Exception { // Enable JMX support in this Coherence data grid session... System.setProperty("tangosol.coherence.management", "all"); // Create a sample cache just to access the data grid... CacheFactory.getCache(MBeanServerFactory.class.getName()); // Gets the JMX server from Coherence data grid... MBeanServer jmxServer = getJMXServer(); // Creates a internal data structure that would maintain // the statistics from each cache in the data grid... Map cacheList = new TreeMap(); Set jmxObjectList = jmxServer.queryNames(new ObjectName("Coherence:type=Cache,*"), null); for (Object jmxObject : jmxObjectList) { ObjectName jmxObjectName = (ObjectName) jmxObject; String cacheName = jmxObjectName.getKeyProperty("name"); if (cacheName.equals(MBeanServerFactory.class.getName())) { continue; } else { cacheList.put(cacheName, new Statistics(cacheName)); } } // Updates the internal data structure with statistic data // retrieved from caches inside the in-memory data grid... Set<String> cacheNames = cacheList.keySet(); for (String cacheName : cacheNames) { Set resultSet = jmxServer.queryNames( new ObjectName("Coherence:type=Cache,name=" + cacheName + ",*"), null); for (Object resultSetRef : resultSet) { ObjectName objectName = (ObjectName) resultSetRef; if (objectName.getKeyProperty("tier").equals("back")) { int unit = (Integer) jmxServer.getAttribute(objectName, "Units"); int size = (Integer) jmxServer.getAttribute(objectName, "Size"); Statistics statistics = (Statistics) cacheList.get(cacheName); statistics.incrementUnit(unit); statistics.incrementSize(size); cacheList.put(cacheName, statistics); } } } // Finally... print the objects from the internal data // structure that represents the statistics from caches... cacheNames = cacheList.keySet(); for (String cacheName : cacheNames) { Statistics estatisticas = (Statistics) cacheList.get(cacheName); System.out.println(estatisticas); } } public MBeanServer getJMXServer() { MBeanServer jmxServer = null; for (Object jmxServerRef : MBeanServerFactory.findMBeanServer(null)) { jmxServer = (MBeanServer) jmxServerRef; if (jmxServer.getDefaultDomain().equals(DEFAULT_DOMAIN) || DEFAULT_DOMAIN.length() == 0) { break; } jmxServer = null; } if (jmxServer == null) { jmxServer = MBeanServerFactory.createMBeanServer(DEFAULT_DOMAIN); } return jmxServer; } private class Statistics { private long unit; private long size; private String cacheName; public Statistics(String cacheName) { this.cacheName = cacheName; } public void incrementUnit(long unit) { this.unit += unit; } public void incrementSize(long size) { this.size += size; } public long getUnit() { return unit; } public long getSize() { return size; } public double getUnitInMB() { return unit / (1024.0 * 1024.0); } public double getAverageSize() { return size == 0 ? 0 : unit / size; } public String toString() { StringBuffer sb = new StringBuffer(); sb.append("\nCache Statistics of '").append(cacheName).append("':\n"); sb.append(" - Total Entries of Cache -----> " + getSize()).append("\n"); sb.append(" - Used Memory (Bytes) --------> " + getUnit()).append("\n"); sb.append(" - Used Memory (MB) -----------> " + FORMAT.format(getUnitInMB())).append("\n"); sb.append(" - Object Average Size --------> " + FORMAT.format(getAverageSize())).append("\n"); return sb.toString(); } } public static void main(String[] args) throws Exception { new CalculateTheSizeOfPeopleCache().run(); } public static final DecimalFormat FORMAT = new DecimalFormat("###.###"); public static final String DEFAULT_DOMAIN = ""; public static final String DOMAIN_NAME = "Coherence"; } I've commented the overall example so, I don't think that you should get into trouble to understand it. Basically we are dealing with JMX. The first thing to do is enable JMX support for the Coherence client (ie, an JVM that will only retrieve values from the data grid and will not integrate the cluster) application. This can be done very easily using the runtime "tangosol.coherence.management" system property. Consult the Coherence documentation for JMX to understand the possible values that could be applied. The program creates an in memory data structure that holds a custom class created called "Statistics". This class represents the information that we are interested to see, which in this case are the size in bytes and in MB of the caches. An instance of this class is created for each cache that are currently managed by the data grid. Using JMX specific methods, we retrieve the information that are relevant for calculate the total size of the caches. To test this example, you should execute first the CreatePeopleCacheAndPopulateWithData.java program and after the CreatePeopleCacheAndPopulateWithData.java program. The results in the console should be something like this: 2012-06-23 13:29:31.188/4.970 Oracle Coherence 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded operational configuration from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/tangosol-coherence.xml" 2012-06-23 13:29:31.219/5.001 Oracle Coherence 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded operational overrides from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/tangosol-coherence-override-dev.xml" 2012-06-23 13:29:31.219/5.001 Oracle Coherence 3.6.0.4 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified 2012-06-23 13:29:31.266/5.048 Oracle Coherence 3.6.0.4 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified Oracle Coherence Version 3.6.0.4 Build 19111 Grid Edition: Development mode Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved. 2012-06-23 13:29:33.156/6.938 Oracle Coherence GE 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded Reporter configuration from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/reports/report-group.xml" 2012-06-23 13:29:33.500/7.282 Oracle Coherence GE 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded cache configuration from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/coherence-cache-config.xml" 2012-06-23 13:29:35.391/9.173 Oracle Coherence GE 3.6.0.4 <D4> (thread=Main Thread, member=n/a): TCMP bound to /192.168.177.133:8090 using SystemSocketProvider 2012-06-23 13:29:37.062/10.844 Oracle Coherence GE 3.6.0.4 <Info> (thread=Cluster, member=n/a): This Member(Id=2, Timestamp=2012-06-23 13:29:36.899, Address=192.168.177.133:8090, MachineId=55685, Location=process:244, Role=Oracle, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2) joined cluster "cluster:0xC4DB" with senior Member(Id=1, Timestamp=2012-06-23 13:29:14.031, Address=192.168.177.133:8088, MachineId=55685, Location=process:1128, Role=CreatePeopleCacheAndPopulateWith, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2) 2012-06-23 13:29:37.172/10.954 Oracle Coherence GE 3.6.0.4 <D5> (thread=Cluster, member=n/a): Member 1 joined Service Cluster with senior member 1 2012-06-23 13:29:37.188/10.970 Oracle Coherence GE 3.6.0.4 <D5> (thread=Cluster, member=n/a): Member 1 joined Service Management with senior member 1 2012-06-23 13:29:37.188/10.970 Oracle Coherence GE 3.6.0.4 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistributedCache with senior member 1 2012-06-23 13:29:37.188/10.970 Oracle Coherence GE 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Started cluster Name=cluster:0xC4DB Group{Address=224.3.6.0, Port=36000, TTL=4} MasterMemberSet ( ThisMember=Member(Id=2, Timestamp=2012-06-23 13:29:36.899, Address=192.168.177.133:8090, MachineId=55685, Location=process:244, Role=Oracle) OldestMember=Member(Id=1, Timestamp=2012-06-23 13:29:14.031, Address=192.168.177.133:8088, MachineId=55685, Location=process:1128, Role=CreatePeopleCacheAndPopulateWith) ActualMemberSet=MemberSet(Size=2, BitSetCount=2 Member(Id=1, Timestamp=2012-06-23 13:29:14.031, Address=192.168.177.133:8088, MachineId=55685, Location=process:1128, Role=CreatePeopleCacheAndPopulateWith) Member(Id=2, Timestamp=2012-06-23 13:29:36.899, Address=192.168.177.133:8090, MachineId=55685, Location=process:244, Role=Oracle) ) RecycleMillis=1200000 RecycleSet=MemberSet(Size=0, BitSetCount=0 ) ) TcpRing{Connections=[1]} IpMonitor{AddressListSize=0} 2012-06-23 13:29:37.891/11.673 Oracle Coherence GE 3.6.0.4 <D5> (thread=Invocation:Management, member=2): Service Management joined the cluster with senior service member 1 2012-06-23 13:29:39.203/12.985 Oracle Coherence GE 3.6.0.4 <D5> (thread=DistributedCache, member=2): Service DistributedCache joined the cluster with senior service member 1 2012-06-23 13:29:39.297/13.079 Oracle Coherence GE 3.6.0.4 <D4> (thread=DistributedCache, member=2): Asking member 1 for 128 primary partitions Cache Statistics of 'People': - Total Entries of Cache -----> 3 - Used Memory (Bytes) --------> 883920 - Used Memory (MB) -----------> 0.843 - Object Average Size --------> 294640 I hope that this post could save you some time when calculate the total size of Coherence cache became a requirement for your high scalable system using data grids. See you!

Read the article

Win32 and Win64 programming in C sources?

- by Nick Rosencrantz

I'm learning OpenGL with C and that makes me include the windows.h file in my project. I'd like to look at some more specific windows functions and I wonder if you can cite some good sources for learning the basics of Win32 and Win64 programming in C (or C++). I use MS Visual C++ and I prefer to stick with C even though much of the Windows API seems to be C++. I'd like my program to be portable and using some platform-indepedent graphics library like OpenGL I could make my program portable with some slight changes for window management. Could you direct me with some pointers to books or www links where I can find more info? I've already studied the OpenGL red book and the C programming language, what I'm looking for is the platform-dependent stuff and how to handle that since I run both Linux and Windows where I find the development environment Visual Studio is pretty good but the debugger gdb is not available on windows so it's a trade off which environment i'll choose in the end - Linux with gcc or Windows with MSVC. Here is the program that draws a graphics primitive with some use of windows.h This program is also runnable on Linux without changing the code that actually draws the graphics primitive: #include <windows.h> #include <gl/gl.h> LRESULT CALLBACK WindowProc(HWND, UINT, WPARAM, LPARAM); void EnableOpenGL(HWND hwnd, HDC*, HGLRC*); void DisableOpenGL(HWND, HDC, HGLRC); int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow) { WNDCLASSEX wcex; HWND hwnd; HDC hDC; HGLRC hRC; MSG msg; BOOL bQuit = FALSE; float theta = 0.0f; /* register window class */ wcex.cbSize = sizeof(WNDCLASSEX); wcex.style = CS_OWNDC; wcex.lpfnWndProc = WindowProc; wcex.cbClsExtra = 0; wcex.cbWndExtra = 0; wcex.hInstance = hInstance; wcex.hIcon = LoadIcon(NULL, IDI_APPLICATION); wcex.hCursor = LoadCursor(NULL, IDC_ARROW); wcex.hbrBackground = (HBRUSH)GetStockObject(BLACK_BRUSH); wcex.lpszMenuName = NULL; wcex.lpszClassName = "GLSample"; wcex.hIconSm = LoadIcon(NULL, IDI_APPLICATION);; if (!RegisterClassEx(&wcex)) return 0; /* create main window */ hwnd = CreateWindowEx(0, "GLSample", "OpenGL Sample", WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, CW_USEDEFAULT, 256, 256, NULL, NULL, hInstance, NULL); ShowWindow(hwnd, nCmdShow); /* enable OpenGL for the window */ EnableOpenGL(hwnd, &hDC, &hRC); /* program main loop */ while (!bQuit) { /* check for messages */ if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) { /* handle or dispatch messages */ if (msg.message == WM_QUIT) { bQuit = TRUE; } else { TranslateMessage(&msg); DispatchMessage(&msg); } } else { /* OpenGL animation code goes here */ glClearColor(0.0f, 0.0f, 0.0f, 0.0f); glClear(GL_COLOR_BUFFER_BIT); glPushMatrix(); glRotatef(theta, 0.0f, 0.0f, 1.0f); glBegin(GL_TRIANGLES); glColor3f(1.0f, 0.0f, 0.0f); glVertex2f(0.0f, 1.0f); glColor3f(0.0f, 1.0f, 0.0f); glVertex2f(0.87f, -0.5f); glColor3f(0.0f, 0.0f, 1.0f); glVertex2f(-0.87f, -0.5f); glEnd(); glPopMatrix(); SwapBuffers(hDC); theta += 1.0f; Sleep (1); } } /* shutdown OpenGL */ DisableOpenGL(hwnd, hDC, hRC); /* destroy the window explicitly */ DestroyWindow(hwnd); return msg.wParam; } LRESULT CALLBACK WindowProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam) { switch (uMsg) { case WM_CLOSE: PostQuitMessage(0); break; case WM_DESTROY: return 0; case WM_KEYDOWN: { switch (wParam) { case VK_ESCAPE: PostQuitMessage(0); break; } } break; default: return DefWindowProc(hwnd, uMsg, wParam, lParam); } return 0; } void EnableOpenGL(HWND hwnd, HDC* hDC, HGLRC* hRC) { PIXELFORMATDESCRIPTOR pfd; int iFormat; /* get the device context (DC) */ *hDC = GetDC(hwnd); /* set the pixel format for the DC */ ZeroMemory(&pfd, sizeof(pfd)); pfd.nSize = sizeof(pfd); pfd.nVersion = 1; pfd.dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER; pfd.iPixelType = PFD_TYPE_RGBA; pfd.cColorBits = 24; pfd.cDepthBits = 16; pfd.iLayerType = PFD_MAIN_PLANE; iFormat = ChoosePixelFormat(*hDC, &pfd); SetPixelFormat(*hDC, iFormat, &pfd); /* create and enable the render context (RC) */ *hRC = wglCreateContext(*hDC); wglMakeCurrent(*hDC, *hRC); } void DisableOpenGL (HWND hwnd, HDC hDC, HGLRC hRC) { wglMakeCurrent(NULL, NULL); wglDeleteContext(hRC); ReleaseDC(hwnd, hDC); }

Read the article

How do I resolve this exercise on C++? [closed]

- by user40630

(Card Shuffling and Dealing) Create a program to shuffle and deal a deck of cards. The program should consist of class Card, class DeckOfCards and a driver program. Class Card should provide: a) Data members face and suit of type int. b) A constructor that receives two ints representing the face and suit and uses them to initialize the data members. c) Two static arrays of strings representing the faces and suits. d) A toString function that returns the Card as a string in the form “face of suit.” You can use the + operator to concatenate strings. Class DeckOfCards should contain: a) A vector of Cards named deck to store the Cards. b) An integer currentCard representing the next card to deal. c) A default constructor that initializes the Cards in the deck. The constructor should use vector function push_back to add each Card to the end of the vector after the Card is created and initialized. This should be done for each of the 52 Cards in the deck. d) A shuffle function that shuffles the Cards in the deck. The shuffle algorithm should iterate through the vector of Cards. For each Card, randomly select another Card in the deck and swap the two Cards. e) A dealCard function that returns the next Card object from the deck. f) A moreCards function that returns a bool value indicating whether there are more Cards to deal. The driver program should create a DeckOfCards object, shuffle the cards, then deal the 52 cards. This above is the exercise I'm trying to solve. I'd be very much appreciated if someone could solve it and explain it to me. The main idea of the program is quite simple. What I don't get is how to build the constructor for the class DeckOfCards and how to generate the 52 cards of the deck with different suits and faces. Untill now I've managed to do this: #include <iostream> #include <vector> using namespace std; /* * */ /* a) Data members face and suit of type int. b) A constructor that receives two ints representing the face and suit and uses them to initialize the data members. c) Two static arrays of strings representing the faces and suits. d) A toString function that returns the Card as a string in the form “face of suit.” You can use the + operator to concatenate strings. */ class Card { public: Card(int, int); string toString(); private: int suit, face; static string faceNames[13]; static string suitNames[4]; }; string Card::faceNames[13] = {"Ace","Two","Three","Four","Five","Six","Seven","Eight","Nine","Ten","Queen","Jack","King"}; string Card::suitNames[4] = {"Diamonds","Clubs","Hearts","Spades"}; string Card::toString() { return faceNames[face]+" of "+suitNames[suit]; } Card::Card(int f, int s) :face(f), suit(s) { } /* Class DeckOfCards should contain: a) A vector of Cards named deck to store the Cards. b) An integer currentCard representing the next card to deal. c) A default constructor that initializes the Cards in the deck. The constructor should use vector function push_back to add each Card to the end of the vector after the Card is created and initialized. This should be done for each of the 52 Cards in the deck. d) A shuffle function that shuffles the Cards in the deck. The shuffle algorithm should iterate through the vector of Cards. For each Card, randomly select another Card in the deck and swap the two Cards. e) A dealCard function that returns the next Card object from the deck. f) A moreCards function that returns a bool value indicating whether there are more Cards to deal. */ class DeckOfCards { public: DeckOfCards(); void shuffleCards(); Card dealCard(); bool moreCards(); private: vector<Card> deck(52); int currentCard; }; int main(int argc, char** argv) { return 0; } DeckOfCards::DeckOfCards() { //I'm stuck here I have no idea of what to take out of here. //I still don't fully get the idea of class inside class and that's turning out as a problem. I try to find a way to set the suits and faces members of the class Card but I can't figure out how. for(int i=0; i<deck.size(); i++) { deck[i]//....There is no function to set them. They must be set when initialized. But how?? } } For easier reading: http://pastebin.com/pJeXMH0f

Search Results

Search found 21828 results on 874 pages for 'program x'.

Page 140/874 | < Previous Page | 136 137 138 139 140 141 142 143 144 145 146 147 | Next Page >

- by user36367

- by user36367

- by dacrow

- by Brian Gerrin

- by Kristian Kari

- by Alan Wehmann

- by user1548832

- by Jon Bringhurst

- by sherrellbc

- by STGdb

- by Larry

- by grenade

- by Jebego

- by Xavierjazz

- by STGdb

- by john

- by ffmm

- by Alex

- by thegrinner

- by rchrd

- by Brian Schroer

- by Chris Williams

- by Ricardo Ferreira

- by Nick Rosencrantz

- by user40630

< Previous Page | 136 137 138 139 140 141 142 143 144 145 146 147 | Next Page >