divide - Page 25 - Developer IT

Problem measuring N times the execution time of a code block

- by Nazgulled

EDIT: I just found my problem after writing this long post explaining every little detail... If someone can give me a good answer on what I'm doing wrong and how can I get the execution time in seconds (using a float with 5 decimal places or so), I'll mark that as accepted. Hint: The problem was on how I interpreted the clock_getttime() man page. Hi, Let's say I have a function named myOperation that I need to measure the execution time of. To measure it, I'm using clock_gettime() as it was recommend here in one of the comments. My teacher recommends us to measure it N times so we can get an average, standard deviation and median for the final report. He also recommends us to execute myOperation M times instead of just one. If myOperation is a very fast operation, measuring it M times allow us to get a sense of the "real time" it takes; cause the clock being used might not have the required precision to measure such operation. So, execution myOperation only one time or M times really depends if the operation itself takes long enough for the clock precision we are using. I'm having trouble dealing with that M times execution. Increasing M decreases (a lot) the final average value. Which doesn't make sense to me. It's like this, on average you take 3 to 5 seconds to travel from point A to B. But then you go from A to B and back to A 5 times (which makes it 10 times, cause A to B is the same as B to A) and you measure that. Than you divide by 10, the average you get is supposed to be the same average you take traveling from point A to B, which is 3 to 5 seconds. This is what I want my code to do, but it's not working. If I keep increasing the number of times I go from A to B and back A, the average will be lower and lower each time, it makes no sense to me. Enough theory, here's my code: #include <stdio.h> #include <time.h> #define MEASUREMENTS 1 #define OPERATIONS 1 typedef struct timespec TimeClock; TimeClock diffTimeClock(TimeClock start, TimeClock end) { TimeClock aux; if((end.tv_nsec - start.tv_nsec) < 0) { aux.tv_sec = end.tv_sec - start.tv_sec - 1; aux.tv_nsec = 1E9 + end.tv_nsec - start.tv_nsec; } else { aux.tv_sec = end.tv_sec - start.tv_sec; aux.tv_nsec = end.tv_nsec - start.tv_nsec; } return aux; } int main(void) { TimeClock sTime, eTime, dTime; int i, j; for(i = 0; i < MEASUREMENTS; i++) { printf(" » MEASURE %02d\n", i+1); clock_gettime(CLOCK_REALTIME, &sTime); for(j = 0; j < OPERATIONS; j++) { myOperation(); } clock_gettime(CLOCK_REALTIME, &eTime); dTime = diffTimeClock(sTime, eTime); printf(" - NSEC (TOTAL): %ld\n", dTime.tv_nsec); printf(" - NSEC (OP): %ld\n\n", dTime.tv_nsec / OPERATIONS); } return 0; } Notes: The above diffTimeClock function is from this blog post. I replaced my real operation with myOperation() because it doesn't make any sense to post my real functions as I would have to post long blocks of code, you can easily code a myOperation() with whatever you like to compile the code if you wish. As you can see, OPERATIONS = 1 and the results are: » MEASURE 01 - NSEC (TOTAL): 27456580 - NSEC (OP): 27456580 For OPERATIONS = 100 the results are: » MEASURE 01 - NSEC (TOTAL): 218929736 - NSEC (OP): 2189297 For OPERATIONS = 1000 the results are: » MEASURE 01 - NSEC (TOTAL): 862834890 - NSEC (OP): 862834 For OPERATIONS = 10000 the results are: » MEASURE 01 - NSEC (TOTAL): 574133641 - NSEC (OP): 57413 Now, I'm not a math wiz, far from it actually, but this doesn't make any sense to me whatsoever. I've already talked about this with a friend that's on this project with me and he also can't understand the differences. I don't understand why the value is getting lower and lower when I increase OPERATIONS. The operation itself should take the same time (on average of course, not the exact same time), no matter how many times I execute it. You could tell me that that actually depends on the operation itself, the data being read and that some data could already be in the cache and bla bla, but I don't think that's the problem. In my case, myOperation is reading 5000 lines of text from an CSV file, separating the values by ; and inserting those values into a data structure. For each iteration, I'm destroying the data structure and initializing it again. Now that I think of it, I also that think that there's a problem measuring time with clock_gettime(), maybe I'm not using it right. I mean, look at the last example, where OPERATIONS = 10000. The total time it took was 574133641ns, which would be roughly 0,5s; that's impossible, it took a couple of minutes as I couldn't stand looking at the screen waiting and went to eat something.

Read the article

C++: Declaration of template class member specialization (+ Doxygen bonus question!)

- by Ziv

When I specialize a (static) member function/constant in a template class, I'm confused as to where the declaration is meant to go. Here's an example of what I what to do - yoinked directly from IBM's reference on template specialization: template<class T> class X { public: static T v; static void f(T); }; template<class T> T X<T>::v = 0; template<class T> void X<T>::f(T arg) { v = arg; } template<> char* X<char*>::v = "Hello"; template<> void X<float>::f(float arg) { v = arg * 2; } int main() { X<char*> a, b; X<float> c; c.f(10); // X<float>::v now set to 20 } The question is, how do I divide this into header/cpp files? The generic implementation is obviously in the header, but what about the specialization? It can't go in the header file, because it's concrete, leading to multiple definition. But if it goes into the .cpp file, is code which calls X::f() aware of the specialization, or might it rely on the generic X::f()? So far I've got the specialization in the .cpp only, with no declaration in the header. I'm not having trouble compiling or even running my code (on gcc, don't remember the version at the moment), and it behaves as expected - recognizing the specialization. But A) I'm not sure this is correct, and I'd like to know what is, and B) my Doxygen documentation comes out wonky and very misleading (more on that in a moment). What seems most natural to me would be something like this, declaring the specialization in the header and defining it in the .cpp: ===XClass.hpp=== #ifndef XCLASS_HPP #define XCLASS_HPP template<class T> class X { public: static T v; static void f(T); }; template<class T> T X<T>::v = 0; template<class T> void X<T>::f(T arg) { v = arg; } /* declaration of specialized functions */ template<> char* X<char*>::v; template<> void X<float>::f(float arg); #endif ===XClass.cpp=== #include <XClass.hpp> /* concrete implementation of specialized functions */ template<> char* X<char*>::v = "Hello"; template<> void X<float>::f(float arg) { v = arg * 2; } ...but I have no idea if this is correct. The most immediate consequence of this issue, as I mentioned, is my Doxygen documentation, which doesn't seem to warm to the idea of member specialization, at least the way I'm defining it at the moment. It will always present only the first definition it finds of a function/constant, and I really need to be able to present the specializations as well. If I go so far as to re-declare the entire class, i.e. in the header: /* template declaration */ template<class T> class X { public: static T v; static void f(T); }; /* template member definition */ template<class T> T X<T>::v = 0; template<class T> void X<T>::f(T arg) { v = arg; } /* declaration of specialized CLASS (with definitions in .cpp) */ template<> class X<float> { public: static float v; static void f(float); }; then it will display the different variations of X as different classes (which is fine by me), but I don't know how to get the same effect when specializing only a few select members of the class. I don't know if this is a mistake of mine, or a limitation of Doxygen - any ideas? Thanks much, Ziv

Read the article

For Loop help In a Hash Cracker Homework.

- by aaron burns

On the homework I am working on we are making a hash cracker. I am implementing it so as to have my cracker. java call worker.java. Worker.java implements Runnable. Worker is to take the start and end of a list of char, the hash it is to crack, and the max length of the password that made the hash. I know I want to do a loop in run() BUT I cannot think of how I would do it so it would go to the given max pasword length. I have posted the code I have so far. Any directions or areas I should look into.... I thought there was a way to do this with a certain way to write the loop but I don't know or can't find the correct syntax. Oh.. also. In main I divide up so x amount of threads can be chosen and I know that as of write now it only works for an even number of the 40 possible char given. package HashCracker; import java.util.*; import java.security.MessageDigest; import java.security.NoSuchAlgorithmException; public class Cracker { // Array of chars used to produce strings public static final char[] CHARS = "abcdefghijklmnopqrstuvwxyz0123456789.,-!".toCharArray(); public static final int numOfChar=40; /* Given a byte[] array, produces a hex String, such as "234a6f". with 2 chars for each byte in the array. (provided code) */ public static String hexToString(byte[] bytes) { StringBuffer buff = new StringBuffer(); for (int i=0; i<bytes.length; i++) { int val = bytes[i]; val = val & 0xff; // remove higher bits, sign if (val<16) buff.append('0'); // leading 0 buff.append(Integer.toString(val, 16)); } return buff.toString(); } /* Given a string of hex byte values such as "24a26f", creates a byte[] array of those values, one byte value -128..127 for each 2 chars. (provided code) */ public static byte[] hexToArray(String hex) { byte[] result = new byte[hex.length()/2]; for (int i=0; i<hex.length(); i+=2) { result[i/2] = (byte) Integer.parseInt(hex.substring(i, i+2), 16); } return result; } public static void main(String args[]) throws NoSuchAlgorithmException { if(args.length==1)//Hash Maker { //create a byte array , meassage digestand put password into it //and get out a hash value printed to the screen using provided methods. byte[] myByteArray=args[0].getBytes(); MessageDigest hasher=MessageDigest.getInstance("SHA-1"); hasher.update(myByteArray); byte[] digestedByte=hasher.digest(); String hashValue=Cracker.hexToString(digestedByte); System.out.println(hashValue); } else//Hash Cracker { ArrayList<Thread> myRunnables=new ArrayList<Thread>(); int numOfThreads = Integer.parseInt(args[2]); int charPerThread=Cracker.numOfChar/numOfThreads; int start=0; int end=charPerThread-1; for(int i=0; i<numOfThreads; i++) { //creates, stores and starts threads. Runnable tempWorker=new Worker(start, end, args[1], Integer.parseInt(args[1])); Thread temp=new Thread(tempWorker); myRunnables.add(temp); temp.start(); start=end+1; end=end+charPerThread; } } } import java.util.*; public class Worker implements Runnable{ private int charStart; private int charEnd; private String Hash2Crack; private int maxLength; public Worker(int start, int end, String hashValue, int maxPWlength) { charStart=start; charEnd=end; Hash2Crack=hashValue; maxLength=maxPWlength; } public void run() { byte[] myHash2Crack_=Cracker.hexToArray(Hash2Crack); for(int i=charStart; i<charEnd+1; i++) { Cracker.numOfChar[i]////// this is where I am stuck. } } }

Read the article

How to Load Oracle Tables From Hadoop Tutorial (Part 5 - Leveraging Parallelism in OSCH)

- by Bob Hanckel

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Using OSCH: Beyond Hello World In the previous post we discussed a “Hello World” example for OSCH focusing on the mechanics of getting a toy end-to-end example working. In this post we are going to talk about how to make it work for big data loads. We will explain how to optimize an OSCH external table for load, paying particular attention to Oracle’s DOP (degree of parallelism), the number of external table location files we use, and the number of HDFS files that make up the payload. We will provide some rules that serve as best practices when using OSCH. The assumption is that you have read the previous post and have some end to end OSCH external tables working and now you want to ramp up the size of the loads. Using OSCH External Tables for Access and Loading OSCH external tables are no different from any other Oracle external tables. They can be used to access HDFS content using Oracle SQL: SELECT * FROM my_hdfs_external_table; or use the same SQL access to load a table in Oracle. INSERT INTO my_oracle_table SELECT * FROM my_hdfs_external_table; To speed up the load time, you will want to control the degree of parallelism (i.e. DOP) and add two SQL hints. ALTER SESSION FORCE PARALLEL DML PARALLEL 8; ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8; INSERT /*+ append pq_distribute(my_oracle_table, none) */ INTO my_oracle_table SELECT * FROM my_hdfs_external_table; There are various ways of either hinting at what level of DOP you want to use. The ALTER SESSION statements above force the issue assuming you (the user of the session) are allowed to assert the DOP (more on that in the next section). Alternatively you could embed additional parallel hints directly into the INSERT and SELECT clause respectively. /*+ parallel(my_oracle_table,8) *//*+ parallel(my_hdfs_external_table,8) */ Note that the "append" hint lets you load a target table by reserving space above a given "high watermark" in storage and uses Direct Path load. In other doesn't try to fill blocks that are already allocated and partially filled. It uses unallocated blocks. It is an optimized way of loading a table without incurring the typical resource overhead associated with run-of-the-mill inserts. The "pq_distribute" hint in this context unifies the INSERT and SELECT operators to make data flow during a load more efficient. Finally your target Oracle table should be defined with "NOLOGGING" and "PARALLEL" attributes. The combination of the "NOLOGGING" and use of the "append" hint disables REDO logging, and its overhead. The "PARALLEL" clause tells Oracle to try to use parallel execution when operating on the target table. Determine Your DOP It might feel natural to build your datasets in Hadoop, then afterwards figure out how to tune the OSCH external table definition, but you should start backwards. You should focus on Oracle database, specifically the DOP you want to use when loading (or accessing) HDFS content using external tables. The DOP in Oracle controls how many PQ slaves are launched in parallel when executing an external table. Typically the DOP is something you want to Oracle to control transparently, but for loading content from Hadoop with OSCH, it's something that you will want to control. Oracle computes the maximum DOP that can be used by an Oracle user. The maximum value that can be assigned is an integer value typically equal to the number of CPUs on your Oracle instances, times the number of cores per CPU, times the number of Oracle instances. For example, suppose you have a RAC environment with 2 Oracle instances. And suppose that each system has 2 CPUs with 32 cores. The maximum DOP would be 128 (i.e. 2*2*32). In point of fact if you are running on a production system, the maximum DOP you are allowed to use will be restricted by the Oracle DBA. This is because using a system maximum DOP can subsume all system resources on Oracle and starve anything else that is executing. Obviously on a production system where resources need to be shared 24x7, this can’t be allowed to happen. The use cases for being able to run OSCH with a maximum DOP are when you have exclusive access to all the resources on an Oracle system. This can be in situations when your are first seeding tables in a new Oracle database, or there is a time where normal activity in the production database can be safely taken off-line for a few hours to free up resources for a big incremental load. Using OSCH on high end machines (specifically Oracle Exadata and Oracle BDA cabled with Infiniband), this mode of operation can load up to 15TB per hour. The bottom line is that you should first figure out what DOP you will be allowed to run with by talking to the DBAs who manage the production system. You then use that number to derive the number of location files, and (optionally) the number of HDFS data files that you want to generate, assuming that is flexible. Rule 1: Find out the maximum DOP you will be allowed to use with OSCH on the target Oracle system Determining the Number of Location Files Let’s assume that the DBA told you that your maximum DOP was 8. You want the number of location files in your external table to be big enough to utilize all 8 PQ slaves, and you want them to represent equally balanced workloads. Remember location files in OSCH are metadata lists of HDFS files and are created using OSCH’s External Table tool. They also represent the workload size given to an individual Oracle PQ slave (i.e. a PQ slave is given one location file to process at a time, and only it will process the contents of the location file.) Rule 2: The size of the workload of a single location file (and the PQ slave that processes it) is the sum of the content size of the HDFS files it lists For example, if a location file lists 5 HDFS files which are each 100GB in size, the workload size for that location file is 500GB. The number of location files that you generate is something you control by providing a number as input to OSCH’s External Table tool. Rule 3: The number of location files chosen should be a small multiple of the DOP Each location file represents one workload for one PQ slave. So the goal is to keep all slaves busy and try to give them equivalent workloads. Obviously if you run with a DOP of 8 but have 5 location files, only five PQ slaves will have something to do and the other three will have nothing to do and will quietly exit. If you run with 9 location files, then the PQ slaves will pick up the first 8 location files, and assuming they have equal work loads, will finish up about the same time. But the first PQ slave to finish its job will then be rescheduled to process the ninth location file, potentially doubling the end to end processing time. So for this DOP using 8, 16, or 32 location files would be a good idea. Determining the Number of HDFS Files Let’s start with the next rule and then explain it: Rule 4: The number of HDFS files should try to be a multiple of the number of location files and try to be relatively the same size In our running example, the DOP is 8. This means that the number of location files should be a small multiple of 8. Remember that each location file represents a list of unique HDFS files to load, and that the sum of the files listed in each location file is a workload for one Oracle PQ slave. The OSCH External Table tool will look in an HDFS directory for a set of HDFS files to load. It will generate N number of location files (where N is the value you gave to the tool). It will then try to divvy up the HDFS files and do its best to make sure the workload across location files is as balanced as possible. (The tool uses a greedy algorithm that grabs the biggest HDFS file and delegates it to a particular location file. It then looks for the next biggest file and puts in some other location file, and so on). The tools ability to balance is reduced if HDFS file sizes are grossly out of balance or are too few. For example suppose my DOP is 8 and the number of location files is 8. Suppose I have only 8 HDFS files, where one file is 900GB and the others are 100GB. When the tool tries to balance the load it will be forced to put the singleton 900GB into one location file, and put each of the 100GB files in the 7 remaining location files. The load balance skew is 9 to 1. One PQ slave will be working overtime, while the slacker PQ slaves are off enjoying happy hour. If however the total payload (1600 GB) were broken up into smaller HDFS files, the OSCH External Table tool would have an easier time generating a list where each workload for each location file is relatively the same. Applying Rule 4 above to our DOP of 8, we could divide the workload into160 files that were approximately 10 GB in size. For this scenario the OSCH External Table tool would populate each location file with 20 HDFS file references, and all location files would have similar workloads (approximately 200GB per location file.) As a rule, when the OSCH External Table tool has to deal with more and smaller files it will be able to create more balanced loads. How small should HDFS files get? Not so small that the HDFS open and close file overhead starts having a substantial impact. For our performance test system (Exadata/BDA with Infiniband), I compared three OSCH loads of 1 TiB. One load had 128 HDFS files living in 64 location files where each HDFS file was about 8GB. I then did the same load with 12800 files where each HDFS file was about 80MB size. The end to end load time was virtually the same. However when I got ridiculously small (i.e. 128000 files at about 8MB per file), it started to make an impact and slow down the load time. What happens if you break rules 3 or 4 above? Nothing draconian, everything will still function. You just won’t be taking full advantage of the generous DOP that was allocated to you by your friendly DBA. The key point of the rules articulated above is this: if you know that HDFS content is ultimately going to be loaded into Oracle using OSCH, it makes sense to chop them up into the right number of files roughly the same size, derived from the DOP that you expect to use for loading. Next Steps So far we have talked about OLH and OSCH as alternative models for loading. That’s not quite the whole story. They can be used together in a way that provides for more efficient OSCH loads and allows one to be more flexible about scheduling on a Hadoop cluster and an Oracle Database to perform load operations. The next lesson will talk about Oracle Data Pump files generated by OLH, and loaded using OSCH. It will also outline the pros and cons of using various load methods. This will be followed up with a final tutorial lesson focusing on how to optimize OLH and OSCH for use on Oracle's engineered systems: specifically Exadata and the BDA. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

Read the article

I asked this yesterday, after the input given I'm still having trouble implementing..

- by Josh

I'm not sure how to fix this or what I did wrong, but whenever I enter in a value it just closes out the run prompt. So, seems I do have a problem somewhere in my coding. Whenever I run the program and input a variable, it always returns the same answer.."The content at location 76 is 0." On that note, someone told me that "I don't know, but I suspect that Program A incorrectly has a fixed address being branched to on instructions 10 and 11." - mctylr but I'm not sure how to fix that.. I'm trying to figure out how to incorporate this idea from R Samuel Klatchko.. I'm still not sure what I'm missing but I can't get it to work.. const int OP_LOAD = 3; const int OP_STORE = 4; const int OP_ADD = 5; ... const int OP_LOCATION_MULTIPLIER = 100; mem[0] = OP_LOAD * OP_LOCATION_MULTIPLIER + ...; mem[1] = OP_ADD * OP_LOCATION_MULTIPLIER + ...; operand = memory[ j ] % OP_LOCATION_MULTIPLIER; operation = memory[ j ] / OP_LOCATION_MULTIPLIER; I'm new to programming, I'm not the best, so I'm going for simplicity. Also this is an SML program. Anyway, this IS a homework assignment and I'm wanting a good grade on this. So I was looking for input and making sure this program will do what I'm hoping they are looking for. Anyway, here are the instructions: Write SML (Simpletron Machine language) programs to accomplish each of the following task: A) Use a sentinel-controlled loop to read positive number s and compute and print their sum. Terminate input when a neg number is entered. B) Use a counter-controlled loop to read seven numbers, some positive and some negative, and compute + print the avg. C) Read a series of numbers, and determine and print the largest number. The first number read indicates how many numbers should be processed. Without further a due, here is my program. All together. int main() { const int READ = 10; const int WRITE = 11; const int LOAD = 20; const int STORE = 21; const int ADD = 30; const int SUBTRACT = 31; const int DIVIDE = 32; const int MULTIPLY = 33; const int BRANCH = 40; const int BRANCHNEG = 41; const int BRANCHZERO = 41; const int HALT = 43; int mem[100] = {0}; //Making it 100, since simpletron contains a 100 word mem. int operation; //taking the rest of these variables straight out of the book seeing as how they were italisized. int operand; int accum = 0; // the special register is starting at 0 int j; // This is for part a, it will take in positive variables in a sent-controlled loop and compute + print their sum. Variables from example in text. memory [0] = 1010; memory [01] = 2009; memory [02] = 3008; memory [03] = 2109; memory [04] = 1109; memory [05] = 4300; memory [06] = 1009; j = 0; //Makes the variable j start at 0. while ( true ) { operand = memory[ j ]%100; // Finds the op codes from the limit on the memory (100) operation = memory[ j ]/100; //using a switch loop to set up the loops for the cases switch ( operation ){ case 10: //reads a variable into a word from loc. Enter in -1 to exit cout <<"\n Input a positive variable: "; cin >> memory[ operand ]; break; case 11: // takes a word from location cout << "\n\nThe content at location " << operand << "is " << memory[operand]; break; case 20:// loads accum = memory[ operand ]; break; case 21: //stores memory[ operand ] = accum; break; case 30: //adds accum += mem[operand]; break; case 31: // subtracts accum-= memory[ operand ]; break; case 32: //divides accum /=(memory[ operand ]); break; case 33: // multiplies accum*= memory [ operand ]; break; case 40: // Branches to location j = -1; break; case 41: //branches if acc. is < 0 if (accum < 0) j = 5; break; case 42: //branches if acc = 0 if (accum == 0) j = 5; break; case 43: // Program ends exit(0); break; } j++; } return 0; }

Read the article

Can anyone help me with this VHDL code (currently malfunctioning)?

- by xx77aBs

This code should be (and is) very simple, and I don't know what I am doing wrong. Here is description of what it should do: It should display a number on one 7-segment display. That number should be increased by one every time someone presses the push button. There is also reset button which sets the number to 0. That's it. Here is VHDL code: library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_ARITH.ALL; use IEEE.STD_LOGIC_UNSIGNED.ALL; entity PWM is Port ( cp_in : in STD_LOGIC; inc : in STD_LOGIC; rst: in std_logic; AN : out STD_LOGIC_VECTOR (3 downto 0); segments : out STD_LOGIC_VECTOR (6 downto 0)); end PWM; architecture Behavioral of PWM is signal cp: std_logic; signal CurrentPWMState: integer range 0 to 10; signal inco: std_logic; signal temp: std_logic_vector (3 downto 0); begin --cp = 100 Hz counter: entity djelitelj generic map (CountTo => 250000) port map (cp_in, cp); debounce: entity debounce port map (inc, cp, inco); temp <= conv_std_logic_vector(CurrentPWMState, 4); ss: entity decoder7seg port map (temp, segments); process (inco, rst) begin if inco = '1' then CurrentPWMState <= CurrentPWMState + 1; elsif rst='1' then CurrentPWMState <= 0; end if; end process; AN <= "1110"; end Behavioral; Entity djelitelj (the counter used to divide 50MHz clock): library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_ARITH.ALL; use IEEE.STD_LOGIC_UNSIGNED.ALL; entity PWM is Port ( cp_in : in STD_LOGIC; inc : in STD_LOGIC; rst: in std_logic; AN : out STD_LOGIC_VECTOR (3 downto 0); segments : out STD_LOGIC_VECTOR (6 downto 0)); end PWM; architecture Behavioral of PWM is signal cp: std_logic; signal CurrentPWMState: integer range 0 to 10; signal inco: std_logic; signal temp: std_logic_vector (3 downto 0); begin --cp = 100 Hz counter: entity djelitelj generic map (CountTo => 250000) port map (cp_in, cp); debounce: entity debounce port map (inc, cp, inco); temp <= conv_std_logic_vector(CurrentPWMState, 4); ss: entity decoder7seg port map (temp, segments); process (inco, rst) begin if inco = '1' then CurrentPWMState <= CurrentPWMState + 1; elsif rst='1' then CurrentPWMState <= 0; end if; end process; AN <= "1110"; end Behavioral; Debouncing entity: library IEEE; use IEEE.STD_LOGIC_1164.all; use IEEE.STD_LOGIC_ARITH.all; use IEEE.STD_LOGIC_UNSIGNED.all; ENTITY debounce IS PORT(pb, clock_100Hz : IN STD_LOGIC; pb_debounced : OUT STD_LOGIC); END debounce; ARCHITECTURE a OF debounce IS SIGNAL SHIFT_PB : STD_LOGIC_VECTOR(3 DOWNTO 0); BEGIN -- Debounce Button: Filters out mechanical switch bounce for around 40Ms. -- Debounce clock should be approximately 10ms process begin wait until (clock_100Hz'EVENT) AND (clock_100Hz = '1'); SHIFT_PB(2 Downto 0) <= SHIFT_PB(3 Downto 1); SHIFT_PB(3) <= NOT PB; If SHIFT_PB(3 Downto 0)="0000" THEN PB_DEBOUNCED <= '1'; ELSE PB_DEBOUNCED <= '0'; End if; end process; end a; And here is BCD to 7-segment decoder: library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_ARITH.ALL; use IEEE.STD_LOGIC_UNSIGNED.ALL; entity decoder7seg is port ( bcd: in std_logic_vector (3 downto 0); segm: out std_logic_vector (6 downto 0)); end decoder7seg; architecture Behavioral of decoder7seg is begin with bcd select segm<= "0000001" when "0000", -- 0 "1001111" when "0001", -- 1 "0010010" when "0010", -- 2 "0000110" when "0011", -- 3 "1001100" when "0100", -- 4 "0100100" when "0101", -- 5 "0100000" when "0110", -- 6 "0001111" when "0111", -- 7 "0000000" when "1000", -- 8 "0000100" when "1001", -- 9 "1111110" when others; -- just - character end Behavioral; Does anyone see where I made my mistake(s) ? I've tried that design on Spartan-3 Started board and it isn't working ... Every time I press the push button, I get crazy (random) values. The reset button is working properly. Thanks !!!!

Read the article

Simple prime number program - Weird issue with threads C#

- by Para

Hi! This is my code: using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading; namespace FirePrime { class Program { static bool[] ThreadsFinished; static bool[] nums; static bool AllThreadsFinished() { bool allThreadsFinished = false; foreach (var threadFinished in ThreadsFinished) { allThreadsFinished &= threadFinished; } return allThreadsFinished; } static bool isPrime(int n) { if (n < 2) { return false; } if (n == 2) { return true; } if (n % 2 == 0) { return false; } int d = 3; while (d * d <= n) { if (n % d == 0) { return false; } d += 2; } return true; } static void MarkPrimes(int startNumber,int stopNumber,int ThreadNr) { for (int j = startNumber; j < stopNumber; j++) nums[j] = isPrime(j); lock (typeof(Program)) { ThreadsFinished[ThreadNr] = true; } } static void Main(string[] args) { int nrNums = 100; int nrThreads = 10; //var threadStartNums = new List<int>(); ThreadsFinished = new bool[nrThreads]; nums = new bool[nrNums]; //var nums = new List<bool>(); nums[0] = false; nums[1] = false; for(int i=2;i<nrNums;i++) nums[i] = true; int interval = (int)(nrNums / nrThreads); //threadStartNums.Add(2); //int aux = firstStartNum; //int i = 2; //while (aux < interval) //{ // aux = interval*i; // i=i+1; // threadStartNums.Add(aux); //} int startNum = 0; for (int i = 0; i < nrThreads; i++) { var _thread = new System.Threading.Thread(() => MarkPrimes(startNum, Math.Min(startNum + interval, nrNums), i)); startNum = startNum + interval; //set the thread to run in the background _thread.IsBackground = true; //start our thread _thread.Start(); } while (!AllThreadsFinished()) { Thread.Sleep(1); } for (int i = 0; i < nrNums; i++) if(nums[i]) Console.WriteLine(i); } } } This should be a pretty simple program that is supposed to find and output the first nrNums prime numbers using nrThreads threads working in parallel. So, I just split nrNums into nrThreads equal chunks (well, the last one won't be equal; if nrThreads doesn't divide by nrNums, it will also contain the remainder, of course). I start nrThreads threads. They all test each number in their respective chunk and see if it is prime or not; they mark everything out in a bool array that keeps a tab on all the primes. The threads all turn a specific element in another boolean array ThreadsFinished to true when they finish. Now the weird part begins: The threads never all end. If I debug, I find that ThreadNr is not what I assign to it in the loop but another value. I guess this is normal since the threads execute afterwards and the counter (the variable i) is already increased by then but I cannot understand how to make the code be right. Can anyone help? Thank you in advance. P.S.: I know the algorithm is not very efficient; I am aiming at a solution using the sieve of Eratosthenes also with x given threads. But for now I can't even get this one to work and I haven't found any examples of any implementations of that algorithm anywhere in a language that I can understand.

Read the article

float addition 2.5 + 2.5 = 4.0? RPN

- by AJ Clou

The code below is my subprogram to do reverse polish notation calculations... basically +, -, *, and /. Everything works in the program except when I try to add 2.5 and 2.5 the program gives me 4.0... I think I have an idea why, but I'm not sure how to fix it... Right now I am reading all the numbers and operators in from command line as required by this assignment, then taking that string and using sscanf to get the numbers out of it... I am thinking that somehow the array that contains the three characters '2', '.', and '5', is not being totally converted to a float... instead i think just the '2' is. Could someone please take a look at my code and either confirm or deny this, and possibly tell me how to fix it so that i get the proper answer? Thank you in advance for any help! float fsm (char mystring[]) { int i = -1, j, k = 0, state = 0; float num1, num2, ans; char temp[10]; c_stack top; c_init_stack (&top); while (1) { switch (state) { case 0: i++; if ((mystring[i]) == ' ') { state = 0; } else if ((isdigit (mystring[i])) || (mystring[i] == '.')) { state = 1; } else if ((mystring[i]) == '\0') { state = 3; } else { state = 4; } break; case 1: temp[k] = mystring[i]; k++; i++; if ((isdigit (mystring[i])) || (mystring[i] == '.')) { state = 1; } else { state = 2; } break; case 2: temp[k] = '\0'; sscanf (temp, "%f", &num1); c_push (&top, num1); i--; k = 0; state = 0; break; case 3: ans = c_pop (&top); if (c_is_empty (top)) return ans; else { printf ("There are still items on the stack\n"); exit (0); case 4: num2 = c_pop (&top); num1 = c_pop (&top); if (mystring[i] == '+'){ ans = num1 + num2; return ans; } else if (mystring[i] == '-'){ ans = num1 - num2; return ans; } else if (mystring[i] == '*'){ ans = num1 * num2; return ans; } else if (mystring[i] == '/'){ if (num2){ ans = num1 / num2; return ans; } else{ printf ("Error: cannot divide by 0\n"); exit (0); } } c_push (&top, ans); state = 0; break; } } } } Here is my main program: #include <stdio.h> #include <stdlib.h> #include "boolean.h" #include "c_stack.h" #include <string.h> int main(int argc, char *argv[]) { char mystring[100]; int i; sscanf("", "%s", mystring); for (i=1; i<argc; i++){ strcat(mystring, argv[i]); strcat(mystring, " "); } printf("%.2f\n", fsm(mystring)); } and here is the header file with prototypes and the definition for c_stack: #include "boolean.h" #ifndef CSTACK_H #define CSTACK_H typedef struct c_stacknode{ char data; struct c_stacknode *next; } *c_stack; #endif void c_init_stack(c_stack *); boolean c_is_full(void); boolean c_is_empty(c_stack); void c_push(c_stack *,char); char c_pop(c_stack *); void print_c_stack(c_stack); boolean is_open(char); boolean is_brother(char, char); float fsm(char[]);

Read the article

Guidance: A Branching strategy for Scrum Teams

- by Martin Hinshelwood

Having a good branching strategy will save your bacon, or at least your code. Be careful when deviating from your branching strategy because if you do, you may be worse off than when you started! This is one possible branching strategy for Scrum teams and I will not be going in depth with Scrum but you can find out more about Scrum by reading the Scrum Guide and you can even assess your Scrum knowledge by having a go at the Scrum Open Assessment. You can also read SSW’s Rules to Better Scrum using TFS which have been developed during our own Scrum implementations. Acknowledgements Bill Heys – Bill offered some good feedback on this post and helped soften the language. Note: Bill is a VS ALM Ranger and co-wrote the Branching Guidance for TFS 2010 Willy-Peter Schaub – Willy-Peter is an ex Visual Studio ALM MVP turned blue badge and has been involved in most of the guidance including the Branching Guidance for TFS 2010 Chris Birmele – Chris wrote some of the early TFS Branching and Merging Guidance. Dr Paul Neumeyer, Ph.D Parallel Processes, ScrumMaster and SSW Solution Architect – Paul wanted to have feature branches coming from the release branch as well. We agreed that this is really a spin-off that needs own project, backlog, budget and Team. Scenario: A product is developed RTM 1.0 is released and gets great sales. Extra features are demanded but the new version will have double to price to pay to recover costs, work is approved by the guys with budget and a few sprints later RTM 2.0 is released. Sales a very low due to the pricing strategy. There are lots of clients on RTM 1.0 calling out for patches. As I keep getting Reverse Integration and Forward Integration mixed up and Bill keeps slapping my wrists I thought I should have a reminder: You still seemed to use reverse and/or forward integration in the wrong context. I would recommend reviewing your document at the end to ensure that it agrees with the common understanding of these terms merge (forward integration) from parent to child (same direction as the branch), and merge (reverse integration) from child to parent (the reverse direction of the branch). - one of my many slaps on the wrist from Bill Heys. As I mentioned previously we are using a single feature branching strategy in our current project. The single biggest mistake developers make is developing against the “Main” or “Trunk” line. This ultimately leads to messy code as things are added and never finished. Your only alternative is to NEVER check in unless your code is 100%, but this does not work in practice, even with a single developer. Your ADD will kick in and your half-finished code will be finished enough to pass the build and the tests. You do use builds don’t you? Sadly, this is a very common scenario and I have had people argue that branching merely adds complexity. Then again I have seen the other side of the universe ... branching structures from he... We should somehow convince everyone that there is a happy between no-branching and too-much-branching. - Willy-Peter Schaub, VS ALM Ranger, Microsoft A key benefit of branching for development is to isolate changes from the stable Main branch. Branching adds sanity more than it adds complexity. We do try to stress in our guidance that it is important to justify a branch, by doing a cost benefit analysis. The primary cost is the effort to do merges and resolve conflicts. A key benefit is that you have a stable code base in Main and accept changes into Main only after they pass quality gates, etc. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft The second biggest mistake developers make is branching anything other than the WHOLE “Main” line. If you branch parts of your code and not others it gets out of sync and can make integration a nightmare. You should have your Source, Assets, Build scripts deployment scripts and dependencies inside the “Main” folder and branch the whole thing. Some departments within MSFT even go as far as to add the environments used to develop the product in there as well; although I would not recommend that unless you have a massive SQL cluster to house your source code. We tried the “add environment” back in South-Africa and while it was “phenomenal”, especially when having to switch between environments, the disk storage and processing requirements killed us. We opted for virtualization to skin this cat of keeping a ready-to-go environment handy. - Willy-Peter Schaub, VS ALM Ranger, Microsoft I think people often think that you should have separate branches for separate environments (e.g. Dev, Test, Integration Test, QA, etc.). I prefer to think of deploying to environments (such as from Main to QA) rather than branching for QA). - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft You can read about SSW’s Rules to better Source Control for some additional information on what Source Control to use and how to use it. There are also a number of branching Anti-Patterns that should be avoided at all costs: You know you are on the wrong track if you experience one or more of the following symptoms in your development environment: Merge Paranoia—avoiding merging at all cost, usually because of a fear of the consequences. Merge Mania—spending too much time merging software assets instead of developing them. Big Bang Merge—deferring branch merging to the end of the development effort and attempting to merge all branches simultaneously. Never-Ending Merge—continuous merging activity because there is always more to merge. Wrong-Way Merge—merging a software asset version with an earlier version. Branch Mania—creating many branches for no apparent reason. Cascading Branches—branching but never merging back to the main line. Mysterious Branches—branching for no apparent reason. Temporary Branches—branching for changing reasons, so the branch becomes a permanent temporary workspace. Volatile Branches—branching with unstable software assets shared by other branches or merged into another branch. Note Branches are volatile most of the time while they exist as independent branches. That is the point of having them. The difference is that you should not share or merge branches while they are in an unstable state. Development Freeze—stopping all development activities while branching, merging, and building new base lines. Berlin Wall—using branches to divide the development team members, instead of dividing the work they are performing. -Branching and Merging Primer by Chris Birmele - Developer Tools Technical Specialist at Microsoft Pty Ltd in Australia In fact, this can result in a merge exercise no-one wants to be involved in, merging hundreds of thousands of change sets and trying to get a consolidated build. Again, we need to find a happy medium. - Willy-Peter Schaub on Merge Paranoia Merge conflicts are generally the result of making changes to the same file in both the target and source branch. If you create merge conflicts, you will eventually need to resolve them. Often the resolution is manual. Merging more frequently allows you to resolve these conflicts close to when they happen, making the resolution clearer. Waiting weeks or months to resolve them, the Big Bang approach, means you are more likely to resolve conflicts incorrectly. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft Figure: Main line, this is where your stable code lives and where any build has known entities, always passes and has a happy test that passes as well? Many development projects consist of, a single “Main” line of source and artifacts. This is good; at least there is source control . There are however a couple of issues that need to be considered. What happens if: you and your team are working on a new set of features and the customer wants a change to his current version? you are working on two features and the customer decides to abandon one of them? you have two teams working on different feature sets and their changes start interfering with each other? I just use labels instead of branches? That's a lot of “what if’s”, but there is a simple way of preventing this. Branching… In TFS, labels are not immutable. This does not mean they are not useful. But labels do not provide a very good development isolation mechanism. Branching allows separate code sets to evolve separately (e.g. Current with hotfixes, and vNext with new development). I don’t see how labels work here. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft Figure: Creating a single feature branch means you can isolate the development work on that branch. Its standard practice for large projects with lots of developers to use Feature branching and you can check the Branching Guidance for the latest recommendations from the Visual Studio ALM Rangers for other methods. In the diagram above you can see my recommendation for branching when using Scrum development with TFS 2010. It consists of a single Sprint branch to contain all the changes for the current sprint. The main branch has the permissions changes so contributors to the project can only Branch and Merge with “Main”. This will prevent accidental check-ins or checkouts of the “Main” line that would contaminate the code. The developers continue to develop on sprint one until the completion of the sprint. Note: In the real world, starting a new Greenfield project, this process starts at Sprint 2 as at the start of Sprint 1 you would have artifacts in version control and no need for isolation. Figure: Once the sprint is complete the Sprint 1 code can then be merged back into the Main line. There are always good practices to follow, and one is to always do a Forward Integration from Main into Sprint 1 before you do a Reverse Integration from Sprint 1 back into Main. In this case it may seem superfluous, but this builds good muscle memory into your developer’s work ethic and means that no bad habits are learned that would interfere with additional Scrum Teams being added to the Product. The process of completing your sprint development: The Team completes their work according to their definition of done. Merge from “Main” into “Sprint1” (Forward Integration) Stabilize your code with any changes coming from other Scrum Teams working on the same product. If you have one Scrum Team this should be quick, but there may have been bug fixes in the Release branches. (we will talk about release branches later) Merge from “Sprint1” into “Main” to commit your changes. (Reverse Integration) Check-in Delete the Sprint1 branch Note: The Sprint 1 branch is no longer required as its useful life has been concluded. Check-in Done But you are not yet done with the Sprint. The goal in Scrum is to have a “potentially shippable product” at the end of every Sprint, and we do not have that yet, we only have finished code. Figure: With Sprint 1 merged you can create a Release branch and run your final packaging and testing In 99% of all projects I have been involved in or watched, a “shippable product” only happens towards the end of the overall lifecycle, especially when sprints are short. The in-between releases are great demonstration releases, but not shippable. Perhaps it comes from my 80’s brain washing that we only ship when we reach the agreed quality and business feature bar. - Willy-Peter Schaub, VS ALM Ranger, Microsoft Although you should have been testing and packaging your code all the way through your Sprint 1 development, preferably using an automated process, you still need to test and package with stable unchanging code. This is where you do what at SSW we call a “Test Please”. This is first an internal test of the product to make sure it meets the needs of the customer and you generally use a resource external to your Team. Then a “Test Please” is conducted with the Product Owner to make sure he is happy with the output. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client? Figure: If you find a deviation from the expected result you fix it on the Release branch. If during your final testing or your “Test Please” you find there are issues or bugs then you should fix them on the release branch. If you can’t fix them within the time box of your Sprint, then you will need to create a Bug and put it onto the backlog for prioritization by the Product owner. Make sure you leave plenty of time between your merge from the development branch to find and fix any problems that are uncovered. This process is commonly called Stabilization and should always be conducted once you have completed all of your User Stories and integrated all of your branches. Even once you have stabilized and released, you should not delete the release branch as you would with the Sprint branch. It has a usefulness for servicing that may extend well beyond the limited life you expect of it. Note: Don't get forced by the business into adding features into a Release branch instead that indicates the unspoken requirement is that they are asking for a product spin-off. In this case you can create a new Team Project and branch from the required Release branch to create a new Main branch for that product. And you create a whole new backlog to work from. Figure: When the Team decides it is happy with the product you can create a RTM branch. Once you have fixed all the bugs you can, and added any you can’t to the Product Backlog, and you Team is happy with the result you can create a Release. This would consist of doing the final Build and Packaging it up ready for your Sprint Review meeting. You would then create a read-only branch that represents the code you “shipped”. This is really an Audit trail branch that is optional, but is good practice. You could use a Label, but Labels are not Auditable and if a dispute was raised by the customer you can produce a verifiable version of the source code for an independent party to check. Rare I know, but you do not want to be at the wrong end of a legal battle. Like the Release branch the RTM branch should never be deleted, or only deleted according to your companies legal policy, which in the UK is usually 7 years. Figure: If you have made any changes in the Release you will need to merge back up to Main in order to finalise the changes. Nothing is really ever done until it is in Main. The same rules apply when merging any fixes in the Release branch back into Main and you should do a reverse merge before a forward merge, again for the muscle memory more than necessity at this stage. Your Sprint is now nearly complete, and you can have a Sprint Review meeting knowing that you have made every effort and taken every precaution to protect your customer’s investment. Note: In order to really achieve protection for both you and your client you would add Automated Builds, Automated Tests, Automated Acceptance tests, Acceptance test tracking, Unit Tests, Load tests, Web test and all the other good engineering practices that help produce reliable software. Figure: After the Sprint Planning meeting the process begins again. Where the Sprint Review and Retrospective meetings mark the end of the Sprint, the Sprint Planning meeting marks the beginning. After you have completed your Sprint Planning and you know what you are trying to achieve in Sprint 2 you can create your new Branch to develop in. How do we handle a bug(s) in production that can’t wait? Although in Scrum the only work done should be on the backlog there should be a little buffer added to the Sprint Planning for contingencies. One of these contingencies is a bug in the current release that can’t wait for the Sprint to finish. But how do you handle that? Willy-Peter Schaub asked an excellent question on the release activities: In reality Sprint 2 starts when sprint 1 ends + weekend. Should we not cater for a possible parallelism between Sprint 2 and the release activities of sprint 1? It would introduce FI’s from main to sprint 2, I guess. Your “Figure: Merging print 2 back into Main.” covers, what I tend to believe to be reality in most cases. - Willy-Peter Schaub, VS ALM Ranger, Microsoft I agree, and if you have a single Scrum team then your resources are limited. The Scrum Team is responsible for packaging and release, so at least one run at stabilization, package and release should be included in the Sprint time box. If more are needed on the current production release during the Sprint 2 time box then resource needs to be pulled from Sprint 2. The Product Owner and the Team have four choices (in order of disruption/cost): Backlog: Add the bug to the backlog and fix it in the next Sprint Buffer Time: Use any buffer time included in the current Sprint to fix the bug quickly Make time: Remove a Story from the current Sprint that is of equal value to the time lost fixing the bug(s) and releasing. Note: The Team must agree that it can still meet the Sprint Goal. Cancel Sprint: Cancel the sprint and concentrate all resource on fixing the bug(s) Note: This can be a very costly if the current sprint has already had a lot of work completed as it will be lost. The choice will depend on the complexity and severity of the bug(s) and both the Product Owner and the Team need to agree. In this case we will go with option #2 or #3 as they are uncomplicated but severe bugs. Figure: Real world issue where a bug needs fixed in the current release. If the bug(s) is urgent enough then then your only option is to fix it in place. You can edit the release branch to find and fix the bug, hopefully creating a test so it can’t happen again. Follow the prior process and conduct an internal and customer “Test Please” before releasing. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client? Figure: After you have fixed the bug you need to ship again. You then need to again create an RTM branch to hold the version of the code you released in escrow. Figure: Main is now out of sync with your Release. We now need to get these new changes back up into the Main branch. Do a reverse and then forward merge again to get the new code into Main. But what about the branch, are developers not working on Sprint 2? Does Sprint 2 now have changes that are not in Main and Main now have changes that are not in Sprint 2? Well, yes… and this is part of the hit you take doing branching. But would this scenario even have been possible without branching? Figure: Getting the changes in Main into Sprint 2 is very important. The Team now needs to do a Forward Integration merge into their Sprint and resolve any conflicts that occur. Maybe the bug has already been fixed in Sprint 2, maybe the bug no longer exists! This needs to be identified and resolved by the developers before they continue to get further out of Sync with Main. Note: Avoid the “Big bang merge” at all costs. Figure: Merging Sprint 2 back into Main, the Forward Integration, and R0 terminates. Sprint 2 now merges (Reverse Integration) back into Main following the procedures we have already established. Figure: The logical conclusion. This then allows the creation of the next release. By now you should be getting the big picture and hopefully you learned something useful from this post. I know I have enjoyed writing it as I find these exploratory posts coupled with real world experience really help harden my understanding. Branching is a tool; it is not a silver bullet. Don’t over use it, and avoid “Anti-Patterns” where possible. Although the diagram above looks complicated I hope showing you how it is formed simplifies it as much as possible. Technorati Tags: Branching,Scrum,VS ALM,TFS 2010,VS2010

Read the article

ANTS CLR and Memory Profiler In Depth Review (Part 1 of 2 – CLR Profiler)

- by ToStringTheory

One of the things that people might not know about me, is my obsession to make my code as efficient as possible. Many people might not realize how much of a task or undertaking that this might be, but it is surely a task as monumental as climbing Mount Everest, except this time it is a challenge for the mind… In trying to make code efficient, there are many different factors that play a part – size of project or solution, tiers, language used, experience and training of the programmer, technologies used, maintainability of the code – the list can go on for quite some time. I spend quite a bit of time when developing trying to determine what is the best way to implement a feature to accomplish the efficiency that I look to achieve. One program that I have recently come to learn about – Red Gate ANTS Performance (CLR) and Memory profiler gives me tools to accomplish that job more efficiently as well. In this review, I am going to cover some of the features of the ANTS profiler set by compiling some hideous example code to test against. Notice As a member of the Geeks With Blogs Influencers program, one of the perks is the ability to review products, in exchange for a free license to the program. I have not let this affect my opinions of the product in any way, and Red Gate nor Geeks With Blogs has tried to influence my opinion regarding this product in any way. Introduction The ANTS Profiler pack provided by Red Gate was something that I had not heard of before receiving an email regarding an offer to review it for a license. Since I look to make my code efficient, it was a no brainer for me to try it out! One thing that I have to say took me by surprise is that upon downloading the program and installing it you fill out a form for your usual contact information. Sure enough within 2 hours, I received an email from a sales representative at Red Gate asking if she could help me to achieve the most out of my trial time so it wouldn’t go to waste. After replying to her and explaining that I was looking to review its feature set, she put me in contact with someone that setup a demo session to give me a quick rundown of its features via an online meeting. After having dealt with a massive ordeal with one of my utility companies and their complete lack of customer service, Red Gates friendly and helpful representatives were a breath of fresh air, and something I was thankful for. ANTS CLR Profiler The ANTS CLR profiler is the thing I want to focus on the most in this post, so I am going to dive right in now. Install was simple and took no time at all. It installed both the profiler for the CLR and Memory, but also visual studio extensions to facilitate the usage of the profilers (click any images for full size images): The Visual Studio menu options (under ANTS menu) Starting the CLR Performance Profiler from the start menu yields this window If you follow the instructions after launching the program from the start menu (Click File > New Profiling Session to start a new project), you are given a dialog with plenty of options for profiling: The New Session dialog. Lots of options. One thing I noticed is that the buttons in the lower right were half-covered by the panel of the application. If I had to guess, I would imagine that this is caused by my DPI settings being set to 125%. This is a problem I have seen in other applications as well that don’t scale well to different dpi scales. The profiler options give you the ability to profile: .NET Executable ASP.NET web application (hosted in IIS) ASP.NET web application (hosted in IIS express) ASP.NET web application (hosted in Cassini Web Development Server) SharePoint web application (hosted in IIS) Silverlight 4+ application Windows Service COM+ server XBAP (local XAML browser application) Attach to an already running .NET 4 process Choosing each option provides a varying set of other variables/options that one can set including options such as application arguments, operating path, record I/O performance performance counters to record (43 counters in all!), etc… All in all, they give you the ability to profile many different .Net project types, and make it simple to do so. In most cases of my using this application, I would be using the built in Visual Studio extensions, as they automatically start a new profiling project in ANTS with the options setup, and start your program, however RedGate has made it easy enough to profile outside of Visual Studio as well. On the flip side of this, as someone who lives most of their work life in Visual Studio, one thing I do wish is that instead of opening an entirely separate application/gui to perform profiling after launching, that instead they would provide a Visual Studio panel with the information, and integrate more of the profiling project information into Visual Studio. So, now that we have an idea of what options that the profiler gives us, its time to test its abilities and features. Horrendous Example Code – Prime Number Generator One of my interests besides development, is Physics and Math – what I went to college for. I have especially always been interested in prime numbers, as they are something of a mystery… So, I decided that I would go ahead and to test the abilities of the profiler, I would write a small program, website, and library to generate prime numbers in the quantity that you ask for. I am going to start off with some terrible code, and show how I would see the profiler being used as a development tool. First off, the IPrimes interface (all code is downloadable at the end of the post): interface IPrimes { IEnumerable<int> GetPrimes(int retrieve); } Simple enough, right? Anything that implements the interface will (hopefully) provide an IEnumerable of int, with the quantity specified in the parameter argument. Next, I am going to implement this interface in the most basic way: public class DumbPrimes : IPrimes { public IEnumerable<int> GetPrimes(int retrieve) { //store a list of primes already found var _foundPrimes = new List<int>() { 2, 3 }; //if i ask for 1 or two primes, return what asked for if (retrieve <= _foundPrimes.Count()) return _foundPrimes.Take(retrieve); //the next number to look at int _analyzing = 4; //since I already determined I don't have enough //execute at least once, and until quantity is sufficed do { //assume prime until otherwise determined bool isPrime = true; //start dividing at 2 //divide until number is reached, or determined not prime for (int i = 2; i < _analyzing && isPrime; i++) { //if (i) goes into _analyzing without a remainder, //_analyzing is NOT prime if (_analyzing % i == 0) isPrime = false; } //if it is prime, add to found list if (isPrime) _foundPrimes.Add(_analyzing); //increment number to analyze next _analyzing++; } while (_foundPrimes.Count() < retrieve); return _foundPrimes; } } This is the simplest way to get primes in my opinion. Checking each number by the straight definition of a prime – is it divisible by anything besides 1 and itself. I have included this code in a base class library for my solution, as I am going to use it to demonstrate a couple of features of ANTS. This class library is consumed by a simple non-MVVM WPF application, and a simple MVC4 website. I will not post the WPF code here inline, as it is simply an ObservableCollection<int>, a label, two textbox’s, and a button. Starting a new Profiling Session So, in Visual Studio, I have just completed my first stint developing the GUI and DumbPrimes IPrimes class, so now I want to check my codes efficiency by profiling it. All I have to do is build the solution (surprised initiating a profiling session doesn’t do this, but I suppose I can understand it), and then click the ANTS menu, followed by Profile Performance. I am then greeted by the profiler starting up and already monitoring my program live: You are provided with a realtime graph at the top, and a pane at the bottom giving you information on how to proceed. I am going to start by asking my program to show me the first 15000 primes: After the program finally began responding again (I did all the work on the main UI thread – how bad!), I stopped the profiler, which did kill the process of my program too. One important thing to note, is that the profiler by default wants to give you a lot of detail about the operation – line hit counts, time per line, percent time per line, etc… The important thing to remember is that this itself takes a lot of time. When running my program without the profiler attached, it can generate the 15000 primes in 5.18 seconds, compared to 74.5 seconds – almost a 1500 percent increase. While this may seem like a lot, remember that there is a trade off. It may be WAY more inefficient, however, I am able to drill down and make improvements to specific problem areas, and then decrease execution time all around. Analyzing the Profiling Session After clicking ‘Stop Profiling’, the process running my application stopped, and the entire execution time was automatically selected by ANTS, and the results shown below: Now there are a number of interesting things going on here, I am going to cover each in a section of its own: Real Time Performance Counter Bar (top of screen) At the top of the screen, is the real time performance bar. As your application is running, this will constantly update with the currently selected performance counters status. A couple of cool things to note are the fact that you can drag a selection around specific time periods to drill down the detail views in the lower 2 panels to information pertaining to only that period. After selecting a time period, you can bookmark a section and name it, so that it is easy to find later, or after reloaded at a later time. You can also zoom in, out, or fit the graph to the space provided – useful for drilling down. It may be hard to see, but at the top of the processor time graph below the time ticks, but above the red usage graph, there is a green bar. This bar shows at what times a method that is selected in the ‘Call tree’ panel is called. Very cool to be able to click on a method and see at what times it made an impact. As I said before, ANTS provides 43 different performance counters you can hook into. Click the arrow next to the Performance tab at the top will allow you to change between different counters if you have them selected: Method Call Tree, ADO.Net Database Calls, File IO – Detail Panel Red Gate really hit the mark here I think. When you select a section of the run with the graph, the call tree populates to fill a hierarchical tree of method calls, with information regarding each of the methods. By default, methods are hidden where the source is not provided (framework type code), however, Red Gate has integrated Reflector into ANTS, so even if you don’t have source for something, you can select a method and get the source if you want. Methods are also hidden where the impact is seen as insignificant – methods that are only executed for 1% of the time of the overall calling methods time; in other words, working on making them better is not where your efforts should be focused. – Smart! Source Panel – Detail Panel The source panel is where you can see line level information on your code, showing the code for the currently selected method from the Method Call Tree. If the code is not available, Reflector takes care of it and shows the code anyways! As you can notice, there does seem to be a problem with how ANTS determines what line is the actual line that a call is completed on. I have suspicions that this may be due to some of the inline code optimizations that the CLR applies upon compilation of the assembly. In a method with comments, the problem is much more severe: As you can see here, apparently the most offending code in my base library was a comment – *gasp*! Removing the comments does help quite a bit, however I hope that Red Gate works on their counter algorithm soon to improve the logic on positioning for statistics: I did a small test just to demonstrate the lines are correct without comments. For me, it isn’t a deal breaker, as I can usually determine the correct placements by looking at the application code in the region and determining what makes sense, but it is something that would probably build up some irritation with time. Feature – Suggest Method for Optimization A neat feature to really help those in need of a pointer, is the menu option under tools to automatically suggest methods to optimize/improve: Nice feature – clicking it filters the call tree and stars methods that it thinks are good candidates for optimization. I do wish that they would have made it more visible for those of use who aren’t great on sight: Process Integration I do think that this could have a place in my process. After experimenting with the profiler, I do think it would be a great benefit to do some development, testing, and then after all the bugs are worked out, use the profiler to check on things to make sure nothing seems like it is hogging more than its fair share. For example, with this program, I would have developed it, ran it, tested it – it works, but slowly. After looking at the profiler, and seeing the massive amount of time spent in 1 method, I might go ahead and try to re-implement IPrimes (I actually would probably rewrite the offending code, but so that I can distribute both sets of code easily, I’m just going to make another implementation of IPrimes). Using two pieces of knowledge about prime numbers can make this method MUCH more efficient – prime numbers fall into two buckets 6k+/-1 , and a number is prime if it is not divisible by any other primes before it: public class SmartPrimes : IPrimes { public IEnumerable<int> GetPrimes(int retrieve) { //store a list of primes already found var _foundPrimes = new List<int>() { 2, 3 }; //if i ask for 1 or two primes, return what asked for if (retrieve <= _foundPrimes.Count()) return _foundPrimes.Take(retrieve); //the next number to look at int _k = 1; //since I already determined I don't have enough //execute at least once, and until quantity is sufficed do { //assume prime until otherwise determined bool isPrime = true; int potentialPrime; //analyze 6k-1 //assign the value to potential potentialPrime = 6 * _k - 1; //if there are any primes that divise this, it is NOT a prime number //using PLINQ for quick boost isPrime = !_foundPrimes.AsParallel() .Any(prime => potentialPrime % prime == 0); //if it is prime, add to found list if (isPrime) _foundPrimes.Add(potentialPrime); if (_foundPrimes.Count() == retrieve) break; //analyze 6k+1 //assign the value to potential potentialPrime = 6 * _k + 1; //if there are any primes that divise this, it is NOT a prime number //using PLINQ for quick boost isPrime = !_foundPrimes.AsParallel() .Any(prime => potentialPrime % prime == 0); //if it is prime, add to found list if (isPrime) _foundPrimes.Add(potentialPrime); //increment k to analyze next _k++; } while (_foundPrimes.Count() < retrieve); return _foundPrimes; } } Now there are definitely more things I can do to help make this more efficient, but for the scope of this example, I think this is fine (but still hideous)! Profiling this now yields a happy surprise 27 seconds to generate the 15000 primes with the profiler attached, and only 1.43 seconds without. One important thing I wanted to call out though was the performance graph now: Notice anything odd? The %Processor time is above 100%. This is because there is now more than 1 core in the operation. A better label for the chart in my mind would have been %Core time, but to each their own. Another odd thing I noticed was that the profiler seemed to be spot on this time in my DumbPrimes class with line details in source, even with comments.. Odd. Profiling Web Applications The last thing that I wanted to cover, that means a lot to me as a web developer, is the great amount of work that Red Gate put into the profiler when profiling web applications. In my solution, I have a simple MVC4 application setup with 1 page, a single input form, that will output prime values as my WPF app did. Launching the profiler from Visual Studio as before, nothing is really different in the profiler window, however I did receive a UAC prompt for a Red Gate helper app to integrate with the web server without notification. After requesting 500, 1000, 2000, and 5000 primes, and looking at the profiler session, things are slightly different from before: As you can see, there are 4 spikes of activity in the processor time graph, but there is also something new in the call tree: That’s right – ANTS will actually group method calls by get/post operations, so it is easier to find out what action/page is giving the largest problems… Pretty cool in my mind! Overview Overall, I think that Red Gate ANTS CLR Profiler has a lot to offer, however I think it also has a long ways to go. 3 Biggest Pros: Ability to easily drill down from time graph, to method calls, to source code Wide variety of counters to choose from when profiling your application Excellent integration/grouping of methods being called from web applications by request – BRILLIANT! 3 Biggest Cons: Issue regarding line details in source view Nit pick – Processor time vs. Core time Nit pick – Lack of full integration with Visual Studio Ratings Ease of Use (7/10) – I marked down here because of the problems with the line level details and the extra work that that entails, and the lack of better integration with Visual Studio. Effectiveness (10/10) – I believe that the profiler does EXACTLY what it purports to do. Especially with its large variety of performance counters, a definite plus! Features (9/10) – Besides the real time performance monitoring, and the drill downs that I’ve shown here, ANTS also has great integration with ADO.Net, with the ability to show database queries run by your application in the profiler. This, with the line level details, the web request grouping, reflector integration, and various options to customize your profiling session I think create a great set of features! Customer Service (10/10) – My entire experience with Red Gate personnel has been nothing but good. their people are friendly, helpful, and happy! UI / UX (8/10) – The interface is very easy to get around, and all of the options are easy to find. With a little bit of poking around, you’ll be optimizing Hello World in no time flat! Overall (8/10) – Overall, I am happy with the Performance Profiler and its features, as well as with the service I received when working with the Red Gate personnel. I WOULD recommend you trying the application and seeing if it would fit into your process, BUT, remember there are still some kinks in it to hopefully be worked out. My next post will definitely be shorter (hopefully), but thank you for reading up to here, or skipping ahead! Please, if you do try the product, drop me a message and let me know what you think! I would love to hear any opinions you may have on the product. Code Feel free to download the code I used above – download via DropBox

Read the article

Building applications with WCF - Intro

- by skjagini

I am going to write series of articles using Windows Communication Framework (WCF) to develop client and server applications and this is the first part of that series. What is WCF As Juwal puts in his Programming WCF book, WCF provides an SDK for developing and deploying services on Windows, provides runtime environment to expose CLR types as services and consume services as CLR types. Building services with WCF is incredibly easy and it’s implementation provides a set of industry standards and off the shelf plumbing including service hosting, instance management, reliability, transaction management, security etc such that it greatly increases productivity Scenario: Lets consider a typical bank customer trying to create an account, deposit amount and transfer funds between accounts, i.e. checking and savings. To make it interesting, we are going to divide the functionality into multiple services and each of them working with database directly. We will run test cases with and without transactional support across services. In this post we will build contracts, services, data access layer, unit tests to verify end to end communication etc, nothing big stuff here and we dig into other features of the WCF in subsequent posts with incremental changes. In any distributed architecture we have two pieces i.e. services and clients. Services as the name implies provide functionality to execute various pieces of business logic on the server, and clients providing interaction to the end user. Services can be built with Web Services or with WCF. Service built on WCF have the advantage of binding independent, i.e. can run against TCP and HTTP protocol without any significant changes to the code. Solution Services Profile: For creating a new bank customer, getting details about existing customer ProfileContract ProfileService Checking Account: To get checking account balance, deposit or withdraw amount CheckingAccountContract CheckingAccountService Savings Account: To get savings account balance, deposit or withdraw amount SavingsAccountContract SavingsAccountService ServiceHost: To host services, i.e. running the services at particular address, binding and contract where client can connect to Client: Helps end user to use services like creating account and amount transfer between the accounts BankDAL: Data access layer to work with database BankDAL It’s no brainer not to use an ORM as many matured products are available currently in market including Linq2Sql, Entity Framework (EF), LLblGenPro etc. For this exercise I am going to use Entity Framework 4.0, CTP 5 with code first approach. There are two approaches when working with data, data driven and code driven. In data driven we start by designing tables and their constrains in database and generate entities in code while in code driven (code first) approach entities are defined in code and the metadata generated from the entities is used by the EF to create tables and table constrains. In previous versions the entity classes had to derive from EF specific base classes. In EF 4 it is not required to derive from any EF classes, the entities are not only persistence ignorant but also enable full test driven development using mock frameworks. Application consists of 3 entities, Customer entity which contains Customer details; CheckingAccount and SavingsAccount to hold the respective account balance. We could have introduced an Account base class for CheckingAccount and SavingsAccount which is certainly possible with EF mappings but to keep it simple we are just going to follow 1 –1 mapping between entity and table mappings. Lets start out by defining a class called Customer which will be mapped to Customer table, observe that the class is simply a plain old clr object (POCO) and has no reference to EF at all. using System; namespace BankDAL.Model { public class Customer { public int Id { get; set; } public string FullName { get; set; } public string Address { get; set; } public DateTime DateOfBirth { get; set; } } } In order to inform EF about the Customer entity we have to define a database context with properties of type DbSet<> for every POCO which needs to be mapped to a table in database. EF uses convention over configuration to generate the metadata resulting in much less configuration. using System.Data.Entity; namespace BankDAL.Model { public class BankDbContext: DbContext { public DbSet<Customer> Customers { get; set; } } } Entity constrains can be defined through attributes on Customer class or using fluent syntax (no need to muscle with xml files), CustomerConfiguration class. By defining constrains in a separate class we can maintain clean POCOs without corrupting entity classes with database specific information. using System; using System.Data.Entity.ModelConfiguration; namespace BankDAL.Model { public class CustomerConfiguration: EntityTypeConfiguration<Customer> { public CustomerConfiguration() { Initialize(); } private void Initialize() { //Setting the Primary Key this.HasKey(e => e.Id); //Setting required fields this.HasRequired(e => e.FullName); this.HasRequired(e => e.Address); //Todo: Can't create required constraint as DateOfBirth is not reference type, research it //this.HasRequired(e => e.DateOfBirth); } } } Any queries executed against Customers property in BankDbContext are executed against Cusomers table. By convention EF looks for connection string with key of BankDbContext when working with the context. We are going to define a helper class to work with Customer entity with methods for querying, adding new entity etc and these are known as repository classes, i.e., CustomerRepository using System; using System.Data.Entity; using System.Linq; using BankDAL.Model; namespace BankDAL.Repositories { public class CustomerRepository { private readonly IDbSet<Customer> _customers; public CustomerRepository(BankDbContext bankDbContext) { if (bankDbContext == null) throw new ArgumentNullException(); _customers = bankDbContext.Customers; } public IQueryable<Customer> Query() { return _customers; } public void Add(Customer customer) { _customers.Add(customer); } } } From the above code it is observable that the Query methods returns customers as IQueryable i.e. customers are retrieved only when actually used i.e. iterated. Returning as IQueryable also allows to execute filtering and joining statements from business logic using lamba expressions without cluttering the data access layer with tens of methods. Our CheckingAccountRepository and SavingsAccountRepository look very similar to each other using System; using System.Data.Entity; using System.Linq; using BankDAL.Model; namespace BankDAL.Repositories { public class CheckingAccountRepository { private readonly IDbSet<CheckingAccount> _checkingAccounts; public CheckingAccountRepository(BankDbContext bankDbContext) { if (bankDbContext == null) throw new ArgumentNullException(); _checkingAccounts = bankDbContext.CheckingAccounts; } public IQueryable<CheckingAccount> Query() { return _checkingAccounts; } public void Add(CheckingAccount account) { _checkingAccounts.Add(account); } public IQueryable<CheckingAccount> GetAccount(int customerId) { return (from act in _checkingAccounts where act.CustomerId == customerId select act); } } } The repository classes look very similar to each other for Query and Add methods, with the help of C# generics and implementing repository pattern (Martin Fowler) we can reduce the repeated code. Jarod from ElegantCode has posted an article on how to use repository pattern with EF which we will implement in the subsequent articles along with WCF Unity life time managers by Drew Contracts It is very easy to follow contract first approach with WCF, define the interface and append ServiceContract, OperationContract attributes. IProfile contract exposes functionality for creating customer and getting customer details. using System; using System.ServiceModel; using BankDAL.Model; namespace ProfileContract { [ServiceContract] public interface IProfile { [OperationContract] Customer CreateCustomer(string customerName, string address, DateTime dateOfBirth); [OperationContract] Customer GetCustomer(int id); } } ICheckingAccount contract exposes functionality for working with checking account, i.e., getting balance, deposit and withdraw of amount. ISavingsAccount contract looks the same as checking account. using System.ServiceModel; namespace CheckingAccountContract { [ServiceContract] public interface ICheckingAccount { [OperationContract] decimal? GetCheckingAccountBalance(int customerId); [OperationContract] void DepositAmount(int customerId,decimal amount); [OperationContract] void WithdrawAmount(int customerId, decimal amount); } } Services Having covered the data access layer and contracts so far and here comes the core of the business logic, i.e. services. .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } ProfileService implements the IProfile contract for creating customer and getting customer detail using CustomerRepository. using System; using System.Linq; using System.ServiceModel; using BankDAL; using BankDAL.Model; using BankDAL.Repositories; using ProfileContract; namespace ProfileService { [ServiceBehavior(IncludeExceptionDetailInFaults = true)] public class Profile: IProfile { public Customer CreateAccount( string customerName, string address, DateTime dateOfBirth) { Customer cust = new Customer { FullName = customerName, Address = address, DateOfBirth = dateOfBirth }; using (var bankDbContext = new BankDbContext()) { new CustomerRepository(bankDbContext).Add(cust); bankDbContext.SaveChanges(); } return cust; } public Customer CreateCustomer(string customerName, string address, DateTime dateOfBirth) { return CreateAccount(customerName, address, dateOfBirth); } public Customer GetCustomer(int id) { return new CustomerRepository(new BankDbContext()).Query() .Where(i => i.Id == id).FirstOrDefault(); } } } From the above code you shall observe that we are calling bankDBContext’s SaveChanges method and there is no save method specific to customer entity because EF manages all the changes centralized at the context level and all the pending changes so far are submitted in a batch and it is represented as Unit of Work. Similarly Checking service implements ICheckingAccount contract using CheckingAccountRepository, notice that we are throwing overdraft exception if the balance falls by zero. WCF has it’s own way of raising exceptions using fault contracts which will be explained in the subsequent articles. SavingsAccountService is similar to CheckingAccountService. using System; using System.Linq; using System.ServiceModel; using BankDAL.Model; using BankDAL.Repositories; using CheckingAccountContract; namespace CheckingAccountService { [ServiceBehavior(IncludeExceptionDetailInFaults = true)] public class Checking:ICheckingAccount { public decimal? GetCheckingAccountBalance(int customerId) { using (var bankDbContext = new BankDbContext()) { CheckingAccount account = (new CheckingAccountRepository(bankDbContext) .GetAccount(customerId)).FirstOrDefault(); if (account != null) return account.Balance; return null; } } public void DepositAmount(int customerId, decimal amount) { using(var bankDbContext = new BankDbContext()) { var checkingAccountRepository = new CheckingAccountRepository(bankDbContext); CheckingAccount account = (checkingAccountRepository.GetAccount(customerId)) .FirstOrDefault(); if (account == null) { account = new CheckingAccount() { CustomerId = customerId }; checkingAccountRepository.Add(account); } account.Balance = account.Balance + amount; if (account.Balance < 0) throw new ApplicationException("Overdraft not accepted"); bankDbContext.SaveChanges(); } } public void WithdrawAmount(int customerId, decimal amount) { DepositAmount(customerId, -1*amount); } } } BankServiceHost The host acts as a glue binding contracts with it’s services, exposing the endpoints. The services can be exposed either through the code or configuration file, configuration file is preferred as it allows run time changes to service behavior even after deployment. We have 3 services and for each of the service you need to define name (the class that implements the service with fully qualified namespace) and endpoint known as ABC, i.e. address, binding and contract. We are using netTcpBinding and have defined the base address with for each of the contracts .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } <system.serviceModel> <services> <service name="ProfileService.Profile"> <endpoint binding="netTcpBinding" contract="ProfileContract.IProfile"/> <host> <baseAddresses> <add baseAddress="net.tcp://localhost:1000/Profile"/> </baseAddresses> </host> </service> <service name="CheckingAccountService.Checking"> <endpoint binding="netTcpBinding" contract="CheckingAccountContract.ICheckingAccount"/> <host> <baseAddresses> <add baseAddress="net.tcp://localhost:1000/Checking"/> </baseAddresses> </host> </service> <service name="SavingsAccountService.Savings"> <endpoint binding="netTcpBinding" contract="SavingsAccountContract.ISavingsAccount"/> <host> <baseAddresses> <add baseAddress="net.tcp://localhost:1000/Savings"/> </baseAddresses> </host> </service> </services> </system.serviceModel> Have to open the services by creating service host which will handle the incoming requests from clients. using System; namespace ServiceHost { class Program { static void Main(string[] args) { CreateHosts(); Console.ReadLine(); } private static void CreateHosts() { CreateHost(typeof(ProfileService.Profile),"Profile Service"); CreateHost(typeof(SavingsAccountService.Savings), "Savings Account Service"); CreateHost(typeof(CheckingAccountService.Checking), "Checking Account Service"); } private static void CreateHost(Type type, string hostDescription) { System.ServiceModel.ServiceHost host = new System.ServiceModel.ServiceHost(type); host.Open(); if (host.ChannelDispatchers != null && host.ChannelDispatchers.Count != 0 && host.ChannelDispatchers[0].Listener != null) Console.WriteLine("Started: " + host.ChannelDispatchers[0].Listener.Uri); else Console.WriteLine("Failed to start:" + hostDescription); } } } BankClient The client has no knowledge about service business logic other than the functionality it exposes through the contract, end points and a proxy to work against. The endpoint data and server proxy can be generated by right clicking on the project reference and choosing ‘Add Service Reference’ and entering the service end point address. Or if you have access to source, you can manually reference contract dlls and update clients configuration file to point to the service end point if the server and client happens to be being built using .Net framework. One of the pros with the manual approach is you don’t have to work against messy code generated files. <system.serviceModel> <client> <endpoint name="tcpProfile" address="net.tcp://localhost:1000/Profile" binding="netTcpBinding" contract="ProfileContract.IProfile"/> <endpoint name="tcpCheckingAccount" address="net.tcp://localhost:1000/Checking" binding="netTcpBinding" contract="CheckingAccountContract.ICheckingAccount"/> <endpoint name="tcpSavingsAccount" address="net.tcp://localhost:1000/Savings" binding="netTcpBinding" contract="SavingsAccountContract.ISavingsAccount"/> </client> </system.serviceModel> The client uses a façade to connect to the services using System.ServiceModel; using CheckingAccountContract; using ProfileContract; using SavingsAccountContract; namespace Client { public class ProxyFacade { public static IProfile ProfileProxy() { return (new ChannelFactory<IProfile>("tcpProfile")).CreateChannel(); } public static ICheckingAccount CheckingAccountProxy() { return (new ChannelFactory<ICheckingAccount>("tcpCheckingAccount")) .CreateChannel(); } public static ISavingsAccount SavingsAccountProxy() { return (new ChannelFactory<ISavingsAccount>("tcpSavingsAccount")) .CreateChannel(); } } } With that in place, lets get our unit tests going using System; using System.Diagnostics; using BankDAL.Model; using NUnit.Framework; using ProfileContract; namespace Client { [TestFixture] public class Tests { private void TransferFundsFromSavingsToCheckingAccount(int customerId, decimal amount) { ProxyFacade.CheckingAccountProxy().DepositAmount(customerId, amount); ProxyFacade.SavingsAccountProxy().WithdrawAmount(customerId, amount); } private void TransferFundsFromCheckingToSavingsAccount(int customerId, decimal amount) { ProxyFacade.SavingsAccountProxy().DepositAmount(customerId, amount); ProxyFacade.CheckingAccountProxy().WithdrawAmount(customerId, amount); } [Test] public void CreateAndGetProfileTest() { IProfile profile = ProxyFacade.ProfileProxy(); const string customerName = "Tom"; int customerId = profile.CreateCustomer(customerName, "NJ", new DateTime(1982, 1, 1)).Id; Customer customer = profile.GetCustomer(customerId); Assert.AreEqual(customerName,customer.FullName); } [Test] public void DepositWithDrawAndTransferAmountTest() { IProfile profile = ProxyFacade.ProfileProxy(); string customerName = "Smith" + DateTime.Now.ToString("HH:mm:ss"); var customer = profile.CreateCustomer(customerName, "NJ", new DateTime(1982, 1, 1)); // Deposit to Savings ProxyFacade.SavingsAccountProxy().DepositAmount(customer.Id, 100); ProxyFacade.SavingsAccountProxy().DepositAmount(customer.Id, 25); Assert.AreEqual(125, ProxyFacade.SavingsAccountProxy().GetSavingsAccountBalance(customer.Id)); // Withdraw ProxyFacade.SavingsAccountProxy().WithdrawAmount(customer.Id, 30); Assert.AreEqual(95, ProxyFacade.SavingsAccountProxy().GetSavingsAccountBalance(customer.Id)); // Deposit to Checking ProxyFacade.CheckingAccountProxy().DepositAmount(customer.Id, 60); ProxyFacade.CheckingAccountProxy().DepositAmount(customer.Id, 40); Assert.AreEqual(100, ProxyFacade.CheckingAccountProxy().GetCheckingAccountBalance(customer.Id)); // Withdraw ProxyFacade.CheckingAccountProxy().WithdrawAmount(customer.Id, 30); Assert.AreEqual(70, ProxyFacade.CheckingAccountProxy().GetCheckingAccountBalance(customer.Id)); // Transfer from Savings to Checking TransferFundsFromSavingsToCheckingAccount(customer.Id,10); Assert.AreEqual(85, ProxyFacade.SavingsAccountProxy().GetSavingsAccountBalance(customer.Id)); Assert.AreEqual(80, ProxyFacade.CheckingAccountProxy().GetCheckingAccountBalance(customer.Id)); // Transfer from Checking to Savings TransferFundsFromCheckingToSavingsAccount(customer.Id, 50); Assert.AreEqual(135, ProxyFacade.SavingsAccountProxy().GetSavingsAccountBalance(customer.Id)); Assert.AreEqual(30, ProxyFacade.CheckingAccountProxy().GetCheckingAccountBalance(customer.Id)); } [Test] public void FundTransfersWithOverDraftTest() { IProfile profile = ProxyFacade.ProfileProxy(); string customerName = "Angelina" + DateTime.Now.ToString("HH:mm:ss"); var customerId = profile.CreateCustomer(customerName, "NJ", new DateTime(1972, 1, 1)).Id; ProxyFacade.SavingsAccountProxy().DepositAmount(customerId, 100); TransferFundsFromSavingsToCheckingAccount(customerId,80); Assert.AreEqual(20, ProxyFacade.SavingsAccountProxy().GetSavingsAccountBalance(customerId)); Assert.AreEqual(80, ProxyFacade.CheckingAccountProxy().GetCheckingAccountBalance(customerId)); try { TransferFundsFromSavingsToCheckingAccount(customerId,30); } catch (Exception e) { Debug.WriteLine(e.Message); } Assert.AreEqual(110, ProxyFacade.CheckingAccountProxy().GetCheckingAccountBalance(customerId)); Assert.AreEqual(20, ProxyFacade.SavingsAccountProxy().GetSavingsAccountBalance(customerId)); } } } We are creating a new instance of the channel for every operation, we will look into instance management and how creating a new instance of channel affects it in subsequent articles. The first two test cases deals with creation of Customer, deposit and withdraw of month between accounts. The last case, FundTransferWithOverDraftTest() is interesting. Customer starts with depositing $100 in SavingsAccount followed by transfer of $80 in to checking account resulting in $20 in savings account. Customer then initiates $30 transfer from Savings to Checking resulting in overdraft exception on Savings with $30 being deposited to Checking. As we are not running both the requests in transactions the customer ends up with more amount than what he started with $100. In subsequent posts we will look into transactions handling. Make sure the ServiceHost project is set as start up project and start the solution. Run the test cases either from NUnit client or TestDriven.Net/Resharper which ever is your favorite tool. Make sure you have updated the data base connection string in the ServiceHost config file to point to your local database

Read the article

Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article

Anyone have ideas for solving the "n items remaining" problem on Internet Explorer?

- by CMPalmer

In my ASP.Net app, which is javascript and jQuery heavy, but also uses master pages and .Net Ajax pieces, I am consistently seeing on the status bar of IE 6 (and occasionally IE 7) the message "2 items remaining" or "15 items remaining" followed by "loading somegraphicsfile.png|gif ." This message never goes away and may or may not prevent some page functionality from running (it certainly seems to bog down, but I'm not positive). I can cause this to happen 99% of the time by just refreshing an .aspx age, but the number of items and, sometimes, the file it mentions varies. Usually it is 2, 3, 12, 13, or 15. I've Googled for answers and there are several suggestions or explanations. Some of them haven't worked for us, and others aren't practical for us to implement or try. Here are some of the ideas/theories: IE isn't caching images right, so it repeatedly asks for the same image if the image is repeated on the page and the server assumes that it should be cached locally since it's already served it in that page context. IE displays the images correctly, but sits and waits for a server response that never comes. Typically the file it says it is waiting on is repeated on the page. The page is using PNG graphics with transparency. Indeed it is, but they are jQuery-UI Themeroller generated graphics which, according to the jQuery-UI folks, are IE safe. The jQuery-UI components are the only things using PNGs. All of our PNG references are in CSS, if that helps. I've changed some of the graphics from PNG to GIF, but it is just as likely to say it's waiting for somegraphicsfile.png as it is for somegraphicsfile.gif Images are being specified in CSS and/or JavaScript but are on things that aren't currently being displayed (display: none items for example). This may be true, but if it is, then I would think preloading images would work, but so far, adding a preloader doesn't do any good. IIS's caching policy is confusing the browser. If this is true, it is only Microsoft server SW having problems with Microsoft's browser (which doesn't surprise me at all). Unfortunately, I don't have much control over the IIS configuration that will be hosting the app. Has anyone seen this and found a way to combat it? Particularly on ASP.Net apps with jQuery and jQuery-UI? UPDATE One other data point: on at least one of the pages, just commenting out the jQuery-UI Datepicker component setup causes the problem to go away, but I don't think (or at least I'm not sure) if that fixes all of the pages. If it does "fix" them, I'll have to swap out plug-ins because that functionality needs to be there. There doesn't seem to be any open issues against jQuery-UI on IE6/7 currently... UPDATE 2 I checked the IIS settings and "enable content expiration" was not set on any of my folders. Unchecking that setting was a common suggestion for fixing this problem. I have another, simpler, page that I can consistently create the error on. I'm using the jQuery-UI 1.6rc6 file (although I've also tried jQuery-UI 1.7.1 with the same results). The problem only occurs when I refresh the page that contains the jQuery-UI Datepicker. If I comment out the Datepicker setup, the problem goes away. Here are a few things I notice when I do this: This page always says "(1 item remaining) Downloading picture http:///images/Calendar_scheduleHS.gif", but only when reloading. When I look at HTTP logging, I see that it requests that image from the server every time it is dynamically turned on, without regard to caching. All of the requests for that graphic are complete and return the graphic correctly. None are marked code 200 or 304 (indicating that the server is telling IE to use the cached version). Why it says waiting on that graphic when all of the requests have completed I have no idea. There is a single other graphic on the page (one of the UI PNG files) that has a code 304 (Not Modified). On another page where I managed to log HTTP traffic with "2 items remaining", two different graphic files (both UI PNGs) had a 304 as well (but neither was the one listed as "Downloading". This error is not innocuous - the page is not fully responsive. For example, if I click on one of the buttons which should execute a client-side action, the page refreshes. Going away from the page and coming back does not produce the error. I have moved the script and script references to the bottom of the content and this doesn't affect this problem. The script is still running in the $(document).ready() though (it's too hairy to divide out unless I absolutely have to). FINAL UPDATE AND ANSWER There were a lot of good answers and suggestions below, but none of them were exactly our problem. The closest one (and the one that led me to the solution) was the one about long running JavaScript, so I awarded the bounty there (I guess I could have answered it myself, but I'd rather reward info that leads to solutions). Here was our solution: We had multiple jQueryUI datepickers that were created on the $(document).ready event in script included from the ASP.Net master page. On this client page, a local script's $(document).ready event had script that destroyed the datepickers under certain conditions. We had to use "destroy" because the previous version of datepicker had a problem with "disable". When we upgraded to the latest version of jQuery UI (1.7.1) and replaced the "destroy"s with "disable"s for the datepickers, the problem went away (or mostly went away - if you do things too fast while the page is loading, it is still possible to get the "n items remaining" status). My theory as to what was happening goes like this: The page content loads and has 12 or so text boxes with the datepicker class. The master page script creates datepickers on those text boxes. IE queues up requests for each calendar graphic independently because IE doesn't know how to properly cache dynamic image requests. Before the requests get processed, the client area script destroys those datepickers so the graphics are no longer needed. IE is left with some number of orphaned requests that it doesn't know what to do with.

Read the article

Help with Java Program for Prime numbers

- by Ben

Hello everyone, I was wondering if you can help me with this program. I have been struggling with it for hours and have just trashed my code because the TA doesn't like how I executed it. I am completely hopeless and if anyone can help me out step by step, I would greatly appreciate it. In this project you will write a Java program that reads a positive integer n from standard input, then prints out the first n prime numbers. We say that an integer m is divisible by a non-zero integer d if there exists an integer k such that m = k d , i.e. if d divides evenly into m. Equivalently, m is divisible by d if the remainder of m upon (integer) division by d is zero. We would also express this by saying that d is a divisor of m. A positive integer p is called prime if its only positive divisors are 1 and p. The one exception to this rule is the number 1 itself, which is considered to be non-prime. A positive integer that is not prime is called composite. Euclid showed that there are infinitely many prime numbers. The prime and composite sequences begin as follows: Primes: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, … Composites: 1, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28, … There are many ways to test a number for primality, but perhaps the simplest is to simply do trial divisions. Begin by dividing m by 2, and if it divides evenly, then m is not prime. Otherwise, divide by 3, then 4, then 5, etc. If at any point m is found to be divisible by a number d in the range 2 d m-1, then halt, and conclude that m is composite. Otherwise, conclude that m is prime. A moment’s thought shows that one need not do any trial divisions by numbers d which are themselves composite. For instance, if a trial division by 2 fails (i.e. has non-zero remainder, so m is odd), then a trial division by 4, 6, or 8, or any even number, must also fail. Thus to test a number m for primality, one need only do trial divisions by prime numbers less than m. Furthermore, it is not necessary to go all the way up to m-1. One need only do trial divisions of m by primes p in the range 2 p m . To see this, suppose m 1 is composite. Then there exist positive integers a and b such that 1 < a < m, 1 < b < m, and m = ab . But if both a m and b m , then ab m, contradicting that m = ab . Hence one of a or b must be less than or equal to m . To implement this process in java you will write a function called isPrime() with the following signature: static boolean isPrime(int m, int[] P) This function will return true or false according to whether m is prime or composite. The array argument P will contain a sufficient number of primes to do the testing. Specifically, at the time isPrime() is called, array P must contain (at least) all primes p in the range 2 p m . For instance, to test m = 53 for primality, one must do successive trial divisions by 2, 3, 5, and 7. We go no further since 11 53 . Thus a precondition for the function call isPrime(53, P) is that P[0] = 2 , P[1] = 3 , P[2] = 5, and P[3] = 7 . The return value in this case would be true since all these divisions fail. Similarly to test m =143 , one must do trial divisions by 2, 3, 5, 7, and 11 (since 13 143 ). The precondition for the function call isPrime(143, P) is therefore P[0] = 2 , P[1] = 3 , P[2] = 5, P[3] = 7 , and P[4] =11. The return value in this case would be false since 11 divides 143. Function isPrime() should contain a loop that steps through array P, doing trial divisions. This loop should terminate when 2 either a trial division succeeds, in which case false is returned, or until the next prime in P is greater than m , in which case true is returned. Function main() in this project will read the command line argument n, allocate an int array of length n, fill the array with primes, then print the contents of the array to stdout according to the format described below. In the context of function main(), we will refer to this array as Primes[]. Thus array Primes[] plays a dual role in this project. On the one hand, it is used to collect, store, and print the output data. On the other hand, it is passed to function isPrime() to test new integers for primality. Whenever isPrime() returns true, the newly discovered prime will be placed at the appropriate position in array Primes[]. This process works since, as explained above, the primes needed to test an integer m range only up to m , and all of these primes (and more) will already be stored in array Primes[] when m is tested. Of course it will be necessary to initialize Primes[0] = 2 manually, then proceed to test 3, 4, … for primality using function isPrime(). The following is an outline of the steps to be performed in function main(). • Check that the user supplied exactly one command line argument which can be interpreted as a positive integer n. If the command line argument is not a single positive integer, your program will print a usage message as specified in the examples below, then exit. • Allocate array Primes[] of length n and initialize Primes[0] = 2 . • Enter a loop which will discover subsequent primes and store them as Primes[1] , Primes[2], Primes[3] , ……, Primes[n -1] . This loop should contain an inner loop which walks through successive integers and tests them for primality by calling function isPrime() with appropriate arguments. • Print the contents of array Primes[] to stdout, 10 to a line separated by single spaces. In other words Primes[0] through Primes[9] will go on line 1, Primes[10] though Primes[19] will go on line 2, and so on. Note that if n is not a multiple of 10, then the last line of output will contain fewer than 10 primes. Your program, which will be called Prime.java, will produce output identical to that of the sample runs below. (As usual % signifies the unix prompt.) % java Prime Usage: java Prime [PositiveInteger] % java Prime xyz Usage: java Prime [PositiveInteger] % java Prime 10 20 Usage: java Prime [PositiveInteger] % java Prime 75 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 % 3 As you can see, inappropriate command line argument(s) generate a usage message which is similar to that of many unix commands. (Try doing the more command with no arguments to see such a message.) Your program will include a function called Usage() having signature static void Usage() that prints this message to stderr, then exits. Thus your program will contain three functions in all: main(), isPrime(), and Usage(). Each should be preceded by a comment block giving it’s name, a short description of it’s operation, and any necessary preconditions (such as those for isPrime().) See examples on the webpage.

Read the article

how to store data in ram in verilog

- by anum

i am having a bit stream of 128 bits @ each posedge of clk,i.e.total 10 bit streams each of length 128 bits. i want to divide the 128 bit stream into 8, 8 bits n hve to store them in a ram / memory of width 8 bits. i did it by assigning 8, 8 bits to wires of size 8 bit.in this way there are 16 wires. and i am using dual port ram...wen i cal module of memory in stimulus.i don know how to give input....as i am hving 16 different wires naming from k1 to k16. **codeeee** // this is stimulus file module final_stim; reg [7:0] in,in_data; reg clk,rst_n,rd,wr,rd_data,wr_data; wire [7:0] out,out_wr, ouut; wire[7:0] d; integer i; //wire[7:0] xor_out; reg kld,f; reg [127:0]key; wire [127:0] key_expand; wire [7:0]out_data; reg [7:0] k; //wire [7:0] k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16; wire [7:0] out_data1; **//key_expand is da output which is giving 10 streams of size 128 bits.** assign k1=key_expand[127:120]; assign k2=key_expand[119:112]; assign k3=key_expand[111:104]; assign k4=key_expand[103:96]; assign k5=key_expand[95:88]; assign k6=key_expand[87:80]; assign k7=key_expand[79:72]; assign k8=key_expand[71:64]; assign k9=key_expand[63:56]; assign k10=key_expand[55:48]; assign k11=key_expand[47:40]; assign k12=key_expand[39:32]; assign k13=key_expand[31:24]; assign k14=key_expand[23:16]; assign k15=key_expand[15:8]; assign k16=key_expand[7:0]; **// then the module of memory is instanciated. //here k1 is sent as input.but i don know how to save the other values of k. //i tried to use for loop but it dint help** memory m1(clk,rst_n,rd, wr,k1,out_data1); aes_sbox b(out,d); initial begin clk=1'b1; rst_n=1'b0; #20 rst_n = 1; //rd=1'b1; wr_data=1'b1; in=8'hd4; #20 //rst_n=1'b1; in=8'h27; rd_data=1'b0; wr_data=1'b1; #20 in=8'h11; rd_data=1'b0; wr_data=1'b1; #20 in=8'hae; rd_data=1'b0; wr_data=1'b1; #20 in=8'he0; rd_data=1'b0; wr_data=1'b1; #20 in=8'hbf; rd_data=1'b0; wr_data=1'b1; #20 in=8'h98; rd_data=1'b0; wr_data=1'b1; #20 in=8'hf1; rd_data=1'b0; wr_data=1'b1; #20 in=8'hb8; rd_data=1'b0; wr_data=1'b1; #20 in=8'hb4; rd_data=1'b0; wr_data=1'b1; #20 in=8'h5d; rd_data=1'b0; wr_data=1'b1; #20 in=8'he5; rd_data=1'b0; wr_data=1'b1; #20 in=8'h1e; rd_data=1'b0; wr_data=1'b1; #20 in=8'h41; rd_data=1'b0; wr_data=1'b1; #20 in=8'h52; rd_data=1'b0; wr_data=1'b1; #20 in=8'h30; rd_data=1'b0; wr_data=1'b1; #20 wr_data=1'b0; #380 rd_data=1'b1; #320 rd_data = 1'b0; /////////////// #10 kld = 1'b1; key=128'h 2b7e151628aed2a6abf7158809cf4f3c; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; wr = 1'b1; rd = 1'b0; #10 wr = 1'b1; rd = 1'b1; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; wr = 1'b1; rd = 1'b1; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; wr = 1'b1; rd = 1'b1; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; wr = 1'b1; rd = 1'b1; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; wr = 1'b1; rd = 1'b1; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; wr = 1'b1; rd = 1'b1; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; wr = 1'b1; rd = 1'b1; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; wr = 1'b1; rd = 1'b1; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; wr = 1'b1; rd = 1'b1; #20 kld = 1'b0; key = 128'h 2b7e151628aed2a6abf7158809cf4f3c; wr = 1'b1; rd = 1'b1; #20 wr = 1'b0; #20 rd = 1'b1; #4880 f=1'b1; ///////////////////////////////////////////////// // out_data[i] end /*always@(*) begin while(i) mem[i]^mem1[i] ; i<=16; break; end*/ always #10 clk=~clk; always@(posedge clk) begin //$monitor($time," out_wr=%h,out_rd=%h\n ",out_wr,out); #10000 $stop; end endmodule

Read the article

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Search Results

Search found 616 results on 25 pages for 'divide'.

Page 25/25 | < Previous Page | 21 22 23 24 25

- by Nazgulled

- by Ziv

- by aaron burns

- by Bob Hanckel

- by Josh

- by xx77aBs

- by Para

- by AJ Clou

- by Martin Hinshelwood

- by ToStringTheory

- by skjagini

- by user12620111

- by CMPalmer

- by Ben

- by anum

- by rchrd

< Previous Page | 21 22 23 24 25