Search Results

Search found 10417 results on 417 pages for 'large'.

Page 73/417 | < Previous Page | 69 70 71 72 73 74 75 76 77 78 79 80  | Next Page >

  • Efficient storage in C#.net App

    - by Tommy
    I'm looking for the fastest, least memory consuming, stand alone storage method available for large amounts of data for my C# app. My initial thoughts: Sql: no. not stand alone XML in flat file: no. takes too long to parse large amounts of data Other Options? Basically what i'm looking for, is a way that i can load with my applications load, keep all the data in my app, and when the data in my app changes just update the storage location.

    Read the article

  • Dump to CSV/Postgres memory

    - by alex
    I have a large table (300 million lines) that I would like to dump to a csv - I need to do some processing that cannot be done with SQL. Right now I am using Squirrel as a client, and it does not apparently deal very well with large datasets - at least as far as I can tell from my own (limited) experience. If I run the query on the actual host, will it use less memory? Thanks for any help.

    Read the article

  • Create thumbnail and reduce image size

    - by oo
    I have very large images (jpg) and i want to write a csharp program to loop through the files and reduce the size of each image by 75%. I tried this: Image thumbNail = image.GetThumbnailImage(800, 600, null, new IntPtr()); but the file size is still very large. Is there anyway to create thumbnails and have the filesize be much smaller?

    Read the article

  • How to create a database deadlock using jdbc and JUNIT

    - by Isawpalmetto
    I am trying to create a database deadlock and I am using JUnit. I have two concurrent tests running which are both updating the same row in a table over and over again in a loop. My idea is that you update say row A in Table A and then row B in Table B over and over again in one test. Then at the same time you update row B table B and then row A Table A over and over again. From my understanding this should eventually result in a deadlock. Here is the code For the first test. public static void testEditCC() { try{ int rows = 0; int counter = 0; int large=10000000; Connection c=DataBase.getConnection(); while(counter<large) { int pid = 87855; int cCode = 655; String newCountry="Egypt"; int bpl = 0; stmt = c.createStatement(); rows = stmt.executeUpdate("UPDATE main " + //create lock on main table "SET BPL="+cCode+ "WHERE ID="+pid); rows = stmt.executeUpdate("UPDATE BPL SET DESCRIPTION='SomeWhere' WHERE ID=602"); //create lock on bpl table counter++; } assertTrue(rows == 1); //rows = stmt.executeUpdate("Insert into BPL (ID, DESCRIPTION) VALUES ("+cCode+", '"+newCountry+"')"); } catch(SQLException ex) { ex.printStackTrace(); //ex.getMessage(); } } And here is the code for the second test. public static void testEditCC() { try{ int rows = 0; int counter = 0; int large=10000000; Connection c=DataBase.getConnection(); while(counter<large) { int pid = 87855; int cCode = 655; String newCountry="Jordan"; int bpl = 0; stmt = c.createStatement(); //stmt.close(); rows = stmt.executeUpdate("UPDATE BPL SET DESCRIPTION='SomeWhere' WHERE ID=602"); //create lock on bpl table rows = stmt.executeUpdate("UPDATE main " + //create lock on main table "SET BPL="+cCode+ "WHERE ID="+pid); counter++; } assertTrue(rows == 1); //rows = stmt.executeUpdate("Insert into BPL (ID, DESCRIPTION) VALUES ("+cCode+", '"+newCountry+"')"); } catch(SQLException ex) { ex.printStackTrace(); } } I am running these two separate JUnit tests at the same time and am connecting to an apache Derby database that I am running in network mode within Eclipse. Can anyone help me figure out why a deadlock is not occurring? Perhaps I am using JUnit wrong.

    Read the article

  • Is there any way to make gcc print offending lines when it emits an error?

    - by Alex
    I have a large codebase that I've been tasked with porting to 64 bits. The code compiles, but it prints a very large amount of incompatible pointer warnings (as is to be expected.) Is there any way I can have gcc print the line on which the error occurs? At this point I'm just using gcc's error messages to try to track down assumptions that need to be modified, and having to look up every one is not fun.

    Read the article

  • How to pick random (small) data samples using Map/Reduce?

    - by Andrei Savu
    I want to write a map/reduce job to select a number of random samples from a large dataset based on a row level condition. I want to minimize the number of intermediate keys. Pseudocode: for each row if row matches condition put the row.id in the bucket if the bucket is not already large enough Have you done something like this? Is there any well known algorithm? A sample containing sequential rows is also good enough. Thanks.

    Read the article

  • How to call a set of variables functions based on class on a group of elements

    - by user1547007
    I have the following html code: <i class="small ele class1"></i> <i class="medium ele class1"></i> <i class="large ele class1"></i> <div class="clear"></div> <i class="small ele class2"></i> <i class="medium ele class2"></i> <i class="large ele class2"></i> <div class="clear"></div> <i class="small ele class3"></i> <i class="medium ele class3"></i> <i class="large ele class3"></i> <div class="clear"></div> <i class="small ele class4"></i> <i class="medium ele class4"></i> <i class="large ele class4"></i>? And my javascript looks like so: var resize = function(face, s) { var bb = face.getBBox(); console.log(bb); var w = bb.width; var h = bb.height; var max = w; if (h > max) { max = h; } var scale = s / max; var ox = -bb.x+((max-w)/2); var oy = -bb.y+((max-h)/2); console.log(s+' '+h+' '+bb.y); face.attr({ "transform": "s" + scale + "," + scale + ",0,0" + "t" + ox + "," + oy }); } $('.ele').each(function() { var s = $(this).innerWidth(); var paper = Raphael($(this)[0], s, s); var face = $(this).hasClass("class1") ? class1Generator(paper) : class4Generator(paper); /*switch (true) { case $(this).hasClass('class1'): class1Generator(paper); break; case $(this).hasClass('class2'): class2Generator(paper) break; case $(this).hasClass('class3'): class3Generator(paper) break; case $(this).hasClass('class4'): class4Generator(paper) break; }*/ resize(face, s); }); my question is, how could I make this line of code more scalable? I tried using a switch but The script below is calling two functions if one of the elements has a class, but what If i have 10 classes? I don't think is the best solution I created a jsFiddle http://jsfiddle.net/7uUgz/6/ //var face = $(this).hasClass("awesome") ? awesomeGenerator(paper) : awfulGenerator(paper);

    Read the article

  • Efficient storage in .Net App

    - by Tommy
    I'm looking for the fastest, least memory consuming, stand alone storage method available for large amounts of data for my C# app. My initial thoughts: Sql: no. not stand alone XML in flat file: no. takes too long to parse large amounts of data Other Options? Basically what i'm looking for, is a way that i can load with my applications load, keep all the data in my app, and when the data in my app changes just update the storage location.

    Read the article

  • GDI+ on 64bit systems

    - by LaZe
    Our system uses a lot of large Bitmaps (System.Drawing.Bitmap) and sometimes we run out of memory and gets a "Parameter is not valid" error. This makes sense, since it can be difficult to allocate large continuous chunk of memory. So the question is... If we upgraded the system to 64bit, would this problem go away?

    Read the article

  • Oracle - truncating a global temporary table

    - by superdario
    I am processing large amounts of data in iterations, each and iteration processes around 10-50 000 records. Because of such large number of records, I am inserting them into a global temporary table first, and then process it. Usually, each iteration takes 5-10 seconds. Would it be wise to truncate the global temporary table after each iteration so that each iteration can start off with an empty table? There are around 5000 iterations.

    Read the article

  • Is a string formatter that pulls variables from its calling scope bad practice?

    - by Eric
    I have some code that does an awful lot of string formatting, Often, I end up with code along the lines of: "...".format(x=x, y=y, z=z, foo=foo, ...) Where I'm trying to interpolate a large number of variables into a large string. Is there a good reason not to write a function like this that uses the inspect module to find variables to interpolate? import inspect def interpolate(s): return s.format(**inspect.currentframe().f_back.f_locals) def generateTheString(x): y = foo(x) z = x + y # more calculations go here return interpolate("{x}, {y}, {z}")

    Read the article

  • C#: String.IndexOf to FileStream.Seek

    - by pistacchio
    Hi, having a FileStream that I read with a StreamReader (it is a very large file), how can I set the Seek position of the FileStream to the first occurrence of a certain substring so that I can start reading this large file from a given point? Thanks

    Read the article

  • Internal drives vs USB-3 with external SSD or eSata with External SSD

    - by normstorm
    I have a need to carry VMWare Virtual Machines with me for work. These are very large files (each VM is 20GB or more) and I carry around about 40 to 50 VM's to simulate different software configurations for different client needs. Key: they won't fit on the internal hard drive of my current laptop. I currently execute the VM's from an external 7200RPM 2.5" USB-2 drive. I keep copies of the VM's on other 5400 external USB-2 drives. The VM's work from this drive, but they are slow, costing me much time and frustration. It can take upwards of 30 minutes just to make a copy of one of the VM's. They can take upwards of 10-15 minutes to fully launch and then they operate sluggishly. I am buying a new laptop (Core I7, 8GB RAM and other high-end specs). I intend to buy an SSD for the O/S volume (C:). This SSD will not be large enough to hold the VM's. I have always wanted a second internal hard drive to operate the VM's. To have two hard drives, though, I am finding that I will have to go to a 17" laptop which would be bulky/heavy. I am instead considering purchasing a 15" laptop with either an eSATA port or USB-3 ports and then purchasing two external drives. One of the drives might be an external SSD (maybe OCX brand) for operating the VM's and the other a 7400RPM 1TB hard drive for carrying around the VM's not currently in use. The question is which options would give me the biggest bang for the buck and the weight: 1) 2nd Internal SSD hard drive. This would mean buying a 17" laptop with two drive "bays". The first bay would hold an SSD drive for the C: drive. I would leave the first bay empty from the manufacture and then purchase/install an aftermarket SSD drive. This second SSD drive would have to be very large (256 GB), which would be expensive. I would still also need another external hard drive for carrying around the VM's not in use. 2) 2nd internal hard drive - 7400 RPM. Again, a 17" laptop would be required, but there are models available with on SSD drive for the C: drive and a second 7200 RPM hard drives. The second drive could probably be large enough to hold the VM's in use as well as those not in use. But would it be fast enough to drive the VM's? 3) USB-3 with External SSD. I could buy a 15" laptop with an SSD drive for the C: drive and a second hard drive for general files. I would operate the VM's from an external USB-3 SSD drive and have a third USB-3 external 7200 RPM drive for holding the VM's not in use. 4) eSATA with External SSD. Ditto, just eSATA instead of USB-3 5) USB-3 with External 7400 RPM drive. Ditto, but the drive running the VM's would be USB-3 attached 7400 RPM drives rather than SSD. 6) eSATA with External 7400 RPM drive. Dittor, but the drive running the VM's would be eSATA attached 7400 RPM drives rather than SSD. Any thoughts on this and any creative solutions?

    Read the article

  • PASS: Bylaw Changes

    - by Bill Graziano
    While you’re reading this, a post should be going up on the PASS blog on the plans to change our bylaws.  You should be able to find our old bylaws, our proposed bylaws and a red-lined version of the changes.  We plan to listen to feedback until March 31st.  At that point we’ll decide whether to vote on these changes or take other action. The executive summary is that we’re adding a restriction to prevent more than two people from the same company on the Board and eliminating the Board’s Officer Appointment Committee to have Officers directly elected by the Board.  This second change better matches how officer elections have been conducted in the past. The Gritty Details Our scope was to change bylaws to match how PASS actually works and tackle a limited set of issues.  Changing the bylaws is hard.  We’ve been working on these changes since the March board meeting last year.  At that meeting we met and talked through the issues we wanted to address.  In years past the Board has tried to come up with language and then we’ve discussed and negotiated to get to the result.  In March, we gave HQ guidance on what we wanted and asked them to come up with a starting point.  Hannes worked on building us an initial set of changes that we could work our way through.  Discussing changes like this over email is difficult wasn’t very productive.  We do a much better job on this at the in-person Board meetings.  Unfortunately there are only 2 or 3 of those a year. In August we met in Nashville and spent time discussing the changes.  That was also the day after we released the slate for the 2010 election. The discussion around that colored what we talked about in terms of these changes.  We talked very briefly at the Summit and again reviewed and revised the changes at the Board meeting in January.  This is the result of those changes and discussions. We made numerous small changes to clean up language and make wording more clear.  We also made two big changes. Director Employment Restrictions The first is that only two people from the same company can serve on the Board at the same time.  The actual language in section VI.3 reads: A maximum of two (2) Directors who are employed by, or who are joint owners or partners in, the same for-profit venture, company, organization, or other legal entity, may concurrently serve on the PASS Board of Directors at any time. The definition of “employed” is at the sole discretion of the Board. And what a mess this turns out to be in practice.  Our membership is a hodgepodge of interlocking relationships.  Let’s say three Board members get together and start a blog service for SQL Server bloggers.  It’s technically for-profit.  Let’s assume it makes $8 in the first year.  Does that trigger this clause?  (Technically yes.)  We had a horrible time trying to write language that covered everything.  All the sample bylaws that we found were just as vague as this. That led to the third clause in this section.  The first sentence reads: The Board of Directors reserves the right, strictly on a case-by-case basis, to overrule the requirements of Section VI.3 by majority decision for any single Director’s conflict of employment. We needed some way to handle the trivial issues and exercise some judgment.  It seems like a public vote is the best way.  This discloses the relationship and gets each Board member on record on the issue.   In practice I think this clause will rarely be used.  I think this entire section will only be invoked for actual employment issues and not for small side projects.  In either case we have the mechanisms in place to handle it in a public, transparent way. That’s the first and third clauses.  The second clause says that if your situation changes and you fall afoul of this restriction you need to notify the Board.  The clause further states that if this new job means a Board members violates the “two-per-company” rule the Board may request their resignation.  The Board can also  allow the person to continue serving with a majority vote.  I think this will also take some judgment.  Consider a person switching jobs that leads to three people from the same company.  I’m very likely to ask for someone to resign if all three are two weeks into a two year term.  I’m unlikely to ask anyone to resign if one is two weeks away from ending their term.  In either case, the decision will be a public vote that we can be held accountable for. One concern that was raised was whether this would affect someone choosing to accept a job.  I think that’s a choice for them to make.  PASS is clearly stating its intent that only two directors from any one organization should serve at any time.  Once these bylaws are approved, this policy should not come as a surprise to any potential or current Board members considering a job change.  This clause isn’t perfect.  The biggest hole is business relationships that aren’t defined above.  Let’s say that two employees from company “X” serve on the Board.  What happens if I accept a full-time consulting contract with that company?  Let’s assume I’m working directly for one of the two existing Board members.  That doesn’t violate section VI.3.  But I think it’s clearly the kind of relationship we’d like to prevent.  Unfortunately that was even harder to write than what we have now.  I fully expect that in the next revision of the bylaws we’ll address this.  It just didn’t make it into this one. Officer Elections The officer election process received a slightly different rewrite.  Our goal was to codify in the bylaws the actual process we used to elect the officers.  The officers are the President, Executive Vice-President (EVP) and Vice-President of Marketing.  The Immediate Past President (IPP) is also an officer but isn’t elected.  The IPP serves in that role for two years after completing their term as President.  We do that for continuity’s sake.  Some organizations have a President-elect that serves for one or two years.  The group that founded PASS chose to have an IPP. When I started on the Board, the Nominating Committee (NomCom) selected the slate for the at-large directors and the slate for the officers.  There was always one candidate for each officer position.  It wasn’t really an election so much as the NomCom decided who the next person would be for each officer position.  Behind the scenes the Board worked to select the best people for the role. In June 2009 that process was changed to bring it line with what actually happens.  An Officer Appointment Committee was created that was a subset of the Board.  That committee would take time to interview the candidates and present a slate to the Board for approval.  The majority vote of the Board would determine the officers for the next two years.  In practice the Board itself interviewed the candidates and conducted the elections.  That means it was time to change the bylaws again. Section VII.2 and VII.3 spell out the process used to select the officers.  We use the phrase “Officer Appointment” to separate it from the Director election but the end result is that the Board elects the officers.  Section VII.3 starts: Officers shall be appointed bi-annually by a majority of all the voting members of the Board of Directors. Everything else revolves around that sentence.  We use the word appoint but they truly are elected.  There are details in the bylaws for term limits, minimum requirements for President (1 prior term as an officer), tie breakers and filling vacancies. In practice we will have an election for President, then an election for EVP and then an election for VP Marketing.  That means that losing candidates will be able to fall down the ladder and run for the next open position.  Another point to note is that officers aren’t at-large directors.  That means if a current sitting officer loses all three elections they are off the Board.  Having Board member votes public will help with the transparency of this approach. This process has a number of positive and negatives.  The biggest concern I expect to hear is that our members don’t directly choose the officers.  I’m going to try and list all the positives and negatives of this approach. Many non-profits value continuity and are slower to change than a business.  On the plus side this promotes that.  On the negative side this promotes that.  If we change too slowly the members complain that we aren’t responsive.  If we change too quickly we make mistakes and fail at various things.  We’ve been criticized for both of those lately so I’m not entirely sure where to draw the line.  My rough assumption to this point is that we’re going too slow on governance and too quickly on becoming “more than a Summit.”  This approach creates competition in the officer elections.  If you are an at-large director there is no consequence to losing an election.  If you are an officer the only way to stay on the Board is to win an officer election or an at-large election.  If you are an officer and lose an election you can always run for the next office down.  This makes it very easy for multiple people to contest an election. There is value in a person moving through the officer positions up to the Presidency.  Having the Board select the officers promotes this.  The down side is that it takes a LOT of time to get to the Presidency.  We’ve had good people struggle with burnout.  We’ve had lots of discussion around this.  The process as we’ve described it here makes it possible for someone to move quickly through the ranks but doesn’t prevent people from working their way up through each role. We talked long and hard about having the officers elected by the members.  We had a self-imposed deadline to complete these changes prior to elections this summer. The other challenge was that our original goal was to make the bylaws reflect our actual process rather than create a new one.  I believe we accomplished this goal. We ran out of time to consider this option in the detail it needs.  Having member elections for officers needs a number of problems solved.  We would need a way for candidates to fall through the election.  This is what promotes competition.  Without this few people would risk an election and we’ll be back to one candidate per slot.  We need to do this without having multiple elections.  We may be able to copy what other organizations are doing but I was surprised at how little I could find on other organizations.  We also need a way for people that lose an officer election to win an at-large election.  Otherwise we’ll have very little competition for officers. This brings me to an area that I think we as a Board haven’t done a good job.  We haven’t built a strong process to tell you who is doing a good job and who isn’t.  This is a double-edged sword.  I don’t want to highlight Board members that are failing.  That’s not a good way to get people to volunteer and run for the Board.  But I also need a way let the members make an informed choice about who is doing a good job and would make a good officer.  Encouraging Board members to blog, publishing minutes and making votes public helps in that regard but isn’t the final answer.  I don’t know what the final answer is yet.  I do know that the Board members themselves are uniquely positioned to know which other Board members are doing good work.  They know who speaks up in meetings, who works to build consensus, who has good ideas and who works with the members.  What I Could Do Better I’ve learned a lot writing this about how we communicated with our members.  The next time we revise the bylaws I’d do a few things differently.  The biggest change would be to provide better documentation.  The March 2009 minutes provide a very detailed look into what changes we wanted to make to the bylaws.  Looking back, I’m a little surprised at how closely they matched our final changes and covered the various arguments.  If you just read those you’d get 90% of what we eventually changed.  Nearly everything else was just details around implementation.  I’d also consider publishing a scope document defining exactly what we were doing any why.  I think it really helped that we had a limited, defined goal in mind.  I don’t think we did a good job communicating that goal outside the meeting minutes though. That said, I wish I’d blogged more after the August and January meeting.  I think it would have helped more people to know that this change was coming and to be ready for it. Conclusion These changes address two big concerns that the Board had.  First, it prevents a single organization from dominating the Board.  Second, it codifies and clearly spells out how officers are elected.  This is the process that was previously followed but it was somewhat murky.  These changes bring clarity to this and clearly explain the process the Board will follow. We’re going to listen to feedback until March 31st.  At that time we’ll decide whether to approve these changes.  I’m also assuming that we’ll start another round of changes in the next year or two.  Are there other issues in the bylaws that we should tackle in the future?

    Read the article

  • Session memory – who’s this guy named Max and what’s he doing with my memory?

    - by extended_events
    SQL Server MVP Jonathan Kehayias (blog) emailed me a question last week when he noticed that the total memory used by the buffers for an event session was larger than the value he specified for the MAX_MEMORY option in the CREATE EVENT SESSION DDL. The answer here seems like an excellent subject for me to kick-off my new “401 – Internals” tag that identifies posts where I pull back the curtains a bit and let you peek into what’s going on inside the extended events engine. In a previous post (Option Trading: Getting the most out of the event session options) I explained that we use a set of buffers to store the event data before  we write the event data to asynchronous targets. The MAX_MEMORY along with the MEMORY_PARTITION_MODE defines how big each buffer will be. Theoretically, that means that I can predict the size of each buffer using the following formula: max memory / # of buffers = buffer size If it was that simple I wouldn’t be writing this post. I’ll take “boundary” for 64K Alex For a number of reasons that are beyond the scope of this blog, we create event buffers in 64K chunks. The result of this is that the buffer size indicated by the formula above is rounded up to the next 64K boundary and that is the size used to create the buffers. If you think visually, this means that the graph of your max_memory option compared to the actual buffer size that results will look like a set of stairs rather than a smooth line. You can see this behavior by looking at the output of dm_xe_sessions, specifically the fields related to the buffer sizes, over a range of different memory inputs: Note: This test was run on a 2 core machine using per_cpu partitioning which results in 5 buffers. (Seem my previous post referenced above for the math behind buffer count.) input_memory_kb total_regular_buffers regular_buffer_size total_buffer_size 637 5 130867 654335 638 5 130867 654335 639 5 130867 654335 640 5 196403 982015 641 5 196403 982015 642 5 196403 982015 This is just a segment of the results that shows one of the “jumps” between the buffer boundary at 639 KB and 640 KB. You can verify the size boundary by doing the math on the regular_buffer_size field, which is returned in bytes: 196403 – 130867 = 65536 bytes 65536 / 1024 = 64 KB The relationship between the input for max_memory and when the regular_buffer_size is going to jump from one 64K boundary to the next is going to change based on the number of buffers being created. The number of buffers is dependent on the partition mode you choose. If you choose any partition mode other than NONE, the number of buffers will depend on your hardware configuration. (Again, see the earlier post referenced above.) With the default partition mode of none, you always get three buffers, regardless of machine configuration, so I generated a “range table” for max_memory settings between 1 KB and 4096 KB as an example. start_memory_range_kb end_memory_range_kb total_regular_buffers regular_buffer_size total_buffer_size 1 191 NULL NULL NULL 192 383 3 130867 392601 384 575 3 196403 589209 576 767 3 261939 785817 768 959 3 327475 982425 960 1151 3 393011 1179033 1152 1343 3 458547 1375641 1344 1535 3 524083 1572249 1536 1727 3 589619 1768857 1728 1919 3 655155 1965465 1920 2111 3 720691 2162073 2112 2303 3 786227 2358681 2304 2495 3 851763 2555289 2496 2687 3 917299 2751897 2688 2879 3 982835 2948505 2880 3071 3 1048371 3145113 3072 3263 3 1113907 3341721 3264 3455 3 1179443 3538329 3456 3647 3 1244979 3734937 3648 3839 3 1310515 3931545 3840 4031 3 1376051 4128153 4032 4096 3 1441587 4324761 As you can see, there are 21 “steps” within this range and max_memory values below 192 KB fall below the 64K per buffer limit so they generate an error when you attempt to specify them. Max approximates True as memory approaches 64K The upshot of this is that the max_memory option does not imply a contract for the maximum memory that will be used for the session buffers (Those of you who read Take it to the Max (and beyond) know that max_memory is really only referring to the event session buffer memory.) but is more of an estimate of total buffer size to the nearest higher multiple of 64K times the number of buffers you have. The maximum delta between your initial max_memory setting and the true total buffer size occurs right after you break through a 64K boundary, for example if you set max_memory = 576 KB (see the green line in the table), your actual buffer size will be closer to 767 KB in a non-partitioned event session. You get “stepped up” for every 191 KB block of initial max_memory which isn’t likely to cause a problem for most machines. Things get more interesting when you consider a partitioned event session on a computer that has a large number of logical CPUs or NUMA nodes. Since each buffer gets “stepped up” when you break a boundary, the delta can get much larger because it’s multiplied by the number of buffers. For example, a machine with 64 logical CPUs will have 160 buffers using per_cpu partitioning or if you have 8 NUMA nodes configured on that machine you would have 24 buffers when using per_node. If you’ve just broken through a 64K boundary and get “stepped up” to the next buffer size you’ll end up with total buffer size approximately 10240 KB and 1536 KB respectively (64K * # of buffers) larger than max_memory value you might think you’re getting. Using per_cpu partitioning on large machine has the most impact because of the large number of buffers created. If the amount of memory being used by your system within these ranges is important to you then this is something worth paying attention to and considering when you configure your event sessions. The DMV dm_xe_sessions is the tool to use to identify the exact buffer size for your sessions. In addition to the regular buffers (read: event session buffers) you’ll also see the details for large buffers if you have configured MAX_EVENT_SIZE. The “buffer steps” for any given hardware configuration should be static within each partition mode so if you want to have a handy reference available when you configure your event sessions you can use the following code to generate a range table similar to the one above that is applicable for your specific machine and chosen partition mode. DECLARE @buf_size_output table (input_memory_kb bigint, total_regular_buffers bigint, regular_buffer_size bigint, total_buffer_size bigint) DECLARE @buf_size int, @part_mode varchar(8) SET @buf_size = 1 -- Set to the begining of your max_memory range (KB) SET @part_mode = 'per_cpu' -- Set to the partition mode for the table you want to generate WHILE @buf_size <= 4096 -- Set to the end of your max_memory range (KB) BEGIN     BEGIN TRY         IF EXISTS (SELECT * from sys.server_event_sessions WHERE name = 'buffer_size_test')             DROP EVENT SESSION buffer_size_test ON SERVER         DECLARE @session nvarchar(max)         SET @session = 'create event session buffer_size_test on server                         add event sql_statement_completed                         add target ring_buffer                         with (max_memory = ' + CAST(@buf_size as nvarchar(4)) + ' KB, memory_partition_mode = ' + @part_mode + ')'         EXEC sp_executesql @session         SET @session = 'alter event session buffer_size_test on server                         state = start'         EXEC sp_executesql @session         INSERT @buf_size_output (input_memory_kb, total_regular_buffers, regular_buffer_size, total_buffer_size)             SELECT @buf_size, total_regular_buffers, regular_buffer_size, total_buffer_size FROM sys.dm_xe_sessions WHERE name = 'buffer_size_test'     END TRY     BEGIN CATCH         INSERT @buf_size_output (input_memory_kb)             SELECT @buf_size     END CATCH     SET @buf_size = @buf_size + 1 END DROP EVENT SESSION buffer_size_test ON SERVER SELECT MIN(input_memory_kb) start_memory_range_kb, MAX(input_memory_kb) end_memory_range_kb, total_regular_buffers, regular_buffer_size, total_buffer_size from @buf_size_output group by total_regular_buffers, regular_buffer_size, total_buffer_size Thanks to Jonathan for an interesting question and a chance to explore some of the details of Extended Event internals. - Mike

    Read the article

  • SQL Server and Hyper-V Dynamic Memory Part 3

    - by SQLOS Team
    In parts 1 and 2 of this series we looked at the basics of Hyper-V Dynamic Memory and SQL Server memory management. In this part Serdar looks at configuration guidelines for SQL Server memory management. Part 3: Configuration Guidelines for Hyper-V Dynamic Memory and SQL Server Now that we understand SQL Server Memory Management and Hyper-V Dynamic Memory basics, let’s take a look at general configuration guidelines in order to utilize benefits of Hyper-V Dynamic Memory in your SQL Server VMs. Requirements Host Operating System Requirements Hyper-V Dynamic Memory feature is introduced with Windows Server 2008 R2 SP1. Therefore in order to use Dynamic Memory for your virtual machines, you need to have Windows Server 2008 R2 SP1 or Microsoft Hyper-V Server 2008 R2 SP1 in your Hyper-V host. Guest Operating System Requirements In addition to this Dynamic Memory is only supported in Standard, Web, Enterprise and Datacenter editions of windows running inside VMs. Make sure that your VM is running one of these editions. For additional requirements on each operating system see “Dynamic Memory Configuration Guidelines” here. SQL Server Requirements All versions of SQL Server support Hyper-V Dynamic Memory. However, only certain editions of SQL Server are aware of dynamically changing system memory. To have a truly dynamic environment for your SQL Server VMs make sure that you are running one of the SQL Server editions listed below: ·         SQL Server 2005 Enterprise ·         SQL Server 2008 Enterprise / Datacenter Editions ·         SQL Server 2008 R2 Enterprise / Datacenter Editions Configuration guidelines for other versions of SQL Server are covered below in the FAQ section. Guidelines for configuring Dynamic Memory Parameters Here is how to configure Dynamic Memory for your SQL VMs in a nutshell: Hyper-V Dynamic Memory Parameter Recommendation Startup RAM 1 GB + SQL Min Server Memory Maximum RAM > SQL Max Server Memory Memory Buffer % 5 Memory Weight Based on performance needs   Startup RAM In order to ensure that your SQL Server VMs can start correctly, ensure that Startup RAM is higher than configured SQL Min Server Memory for your VMs. Otherwise SQL Server service will need to do paging in order to start since it will not be able to see enough memory during startup. Also note that Startup Memory will always be reserved for your VMs. This will guarantee a certain level of performance for your SQL Servers, however setting this too high will limit the consolidation benefits you’ll get out of your virtualization environment. Maximum RAM This one is obvious. If you’ve configured SQL Max Server Memory for your SQL Server, make sure that Dynamic Memory Maximum RAM configuration is higher than this value. Otherwise your SQL Server will not grow to memory values higher than the value configured for Dynamic Memory. Memory Buffer % Memory buffer configuration is used to provision file cache to virtual machines in order to improve performance. Due to the fact that SQL Server is managing its own buffer pool, Memory Buffer setting should be configured to the lowest value possible, 5%. Configuring a higher memory buffer will prevent low resource notifications from Windows Memory Manager and it will prevent reclaiming memory from SQL Server VMs. Memory Weight Memory weight configuration defines the importance of memory to a VM. Configure higher values for the VMs that have higher performance requirements. VMs with higher memory weight will have more memory under high memory pressure conditions on your host. Questions and Answers Q1 – Which SQL Server memory model is best for Dynamic Memory? The best SQL Server model for Dynamic Memory is “Locked Page Memory Model”. This memory model ensures that SQL Server memory is never paged out and it’s also adaptive to dynamically changing memory in the system. This will be extremely useful when Dynamic Memory is attempting to remove memory from SQL Server VMs ensuring no SQL Server memory is paged out. You can find instructions on configuring “Locked Page Memory Model” for your SQL Servers here. Q2 – What about other SQL Server Editions, how should I configure Dynamic Memory for them? Other editions of SQL Server do not adapt to dynamically changing environments. They will determine how much memory they should allocate during startup and don’t change this value afterwards. Therefore make sure that you configure a higher startup memory for your VM because that will be all the memory that SQL Server utilize Tune Maximum Memory and Memory Buffer based on the other workloads running on the system. If there are no other workloads consider using Static Memory for these editions. Q3 – What if I have multiple SQL Server instances in a VM? Having multiple SQL Server instances in a VM is not a general recommendation for predictable performance, manageability and isolation. In order to achieve a predictable behavior make sure that you configure SQL Min Server Memory and SQL Max Server Memory for each instance in the VM. And make sure that: ·         Dynamic Memory Startup Memory is greater than the sum of SQL Min Server Memory values for the instances in the VM ·         Dynamic Memory Maximum Memory is greater than the sum of SQL Max Server Memory values for the instances in the VM Q4 – I’m using Large Page Memory Model for my SQL Server. Can I still use Dynamic Memory? The short answer is no. SQL Server does not dynamically change its memory size when configured with Large Page Memory Model. In virtualized environments Hyper-V provides large page support by default. Most of the time, Large Page Memory Model doesn’t bring any benefits to a SQL Server if it’s running in virtualized environments. Q5 – How do I monitor SQL performance when I’m trying Dynamic Memory on my VMs? Use the performance counters below to monitor memory performance for SQL Server: Process - Working Set: This counter is available in the VM via process performance counters. It represents the actual amount of physical memory being used by SQL Server process in the VM. SQL Server – Buffer Cache Hit Ratio: This counter is available in the VM via SQL Server counters. This represents the paging being done by SQL Server. A rate of 90% or higher is desirable. Conclusion These blog posts are a quick start to a story that will be developing more in the near future. We’re still continuing our testing and investigations to provide more detailed configuration guidelines with example performance numbers with a white paper in the upcoming months. Now it’s time to give SQL Server and Hyper-V Dynamic Memory a try. Use this guidelines to kick-start your environment. See what you think about it and let us know of your experiences. - Serdar Sutay Originally posted at http://blogs.msdn.com/b/sqlosteam/

    Read the article

  • SPARC T3-1 Record Results Running JD Edwards EnterpriseOne Day in the Life Benchmark with Added Batch Component

    - by Brian
    Using Oracle's SPARC T3-1 server for the application tier and Oracle's SPARC Enterprise M3000 server for the database tier, a world record result was produced running the Oracle's JD Edwards EnterpriseOne applications Day in the Life benchmark run concurrently with a batch workload. The SPARC T3-1 server based result has 25% better performance than the IBM Power 750 POWER7 server even though the IBM result did not include running a batch component. The SPARC T3-1 server based result has 25% better space/performance than the IBM Power 750 POWER7 server as measured by the online component. The SPARC T3-1 server based result is 5x faster than the x86-based IBM x3650 M2 server system when executing the online component of the JD Edwards EnterpriseOne 9.0.1 Day in the Life benchmark. The IBM result did not include a batch component. The SPARC T3-1 server based result has 2.5x better space/performance than the x86-based IBM x3650 M2 server as measured by the online component. The combination of SPARC T3-1 and SPARC Enterprise M3000 servers delivered a Day in the Life benchmark result of 5000 online users with 0.875 seconds of average transaction response time running concurrently with 19 Universal Batch Engine (UBE) processes at 10 UBEs/minute. The solution exercises various JD Edwards EnterpriseOne applications while running Oracle WebLogic Server 11g Release 1 and Oracle Web Tier Utilities 11g HTTP server in Oracle Solaris Containers, together with the Oracle Database 11g Release 2. The SPARC T3-1 server showed that it could handle the additional workload of batch processing while maintaining the same number of online users for the JD Edwards EnterpriseOne Day in the Life benchmark. This was accomplished with minimal loss in response time. JD Edwards EnterpriseOne 9.0.1 takes advantage of the large number of compute threads available in the SPARC T3-1 server at the application tier and achieves excellent response times. The SPARC T3-1 server consolidates the application/web tier of the JD Edwards EnterpriseOne 9.0.1 application using Oracle Solaris Containers. Containers provide flexibility, easier maintenance and better CPU utilization of the server leaving processing capacity for additional growth. A number of Oracle advanced technology and features were used to obtain this result: Oracle Solaris 10, Oracle Solaris Containers, Oracle Java Hotspot Server VM, Oracle WebLogic Server 11g Release 1, Oracle Web Tier Utilities 11g, Oracle Database 11g Release 2, the SPARC T3 and SPARC64 VII+ based servers. This is the first published result running both online and batch workload concurrently on the JD Enterprise Application server. No published results are available from IBM running the online component together with a batch workload. The 9.0.1 version of the benchmark saw some minor performance improvements relative to 9.0. When comparing between 9.0.1 and 9.0 results, the reader should take this into account when the difference between results is small. Performance Landscape JD Edwards EnterpriseOne Day in the Life Benchmark Online with Batch Workload This is the first publication on the Day in the Life benchmark run concurrently with batch jobs. The batch workload was provided by Oracle's Universal Batch Engine. System RackUnits Online Users Resp Time (sec) BatchConcur(# of UBEs) BatchRate(UBEs/m) Version SPARC T3-1, 1xSPARC T3 (1.65 GHz), Solaris 10 M3000, 1xSPARC64 VII+ (2.86 GHz), Solaris 10 4 5000 0.88 19 10 9.0.1 Resp Time (sec) — Response time of online jobs reported in seconds Batch Concur (# of UBEs) — Batch concurrency presented in the number of UBEs Batch Rate (UBEs/m) — Batch transaction rate in UBEs/minute. JD Edwards EnterpriseOne Day in the Life Benchmark Online Workload Only These results are for the Day in the Life benchmark. They are run without any batch workload. System RackUnits Online Users ResponseTime (sec) Version SPARC T3-1, 1xSPARC T3 (1.65 GHz), Solaris 10 M3000, 1xSPARC64 VII (2.75 GHz), Solaris 10 4 5000 0.52 9.0.1 IBM Power 750, 1xPOWER7 (3.55 GHz), IBM i7.1 4 4000 0.61 9.0 IBM x3650M2, 2xIntel X5570 (2.93 GHz), OVM 2 1000 0.29 9.0 IBM result from http://www-03.ibm.com/systems/i/advantages/oracle/, IBM used WebSphere Configuration Summary Hardware Configuration: 1 x SPARC T3-1 server 1 x 1.65 GHz SPARC T3 128 GB memory 16 x 300 GB 10000 RPM SAS 1 x Sun Flash Accelerator F20 PCIe Card, 92 GB 1 x 10 GbE NIC 1 x SPARC Enterprise M3000 server 1 x 2.86 SPARC64 VII+ 64 GB memory 1 x 10 GbE NIC 2 x StorageTek 2540 + 2501 Software Configuration: JD Edwards EnterpriseOne 9.0.1 with Tools 8.98.3.3 Oracle Database 11g Release 2 Oracle 11g WebLogic server 11g Release 1 version 10.3.2 Oracle Web Tier Utilities 11g Oracle Solaris 10 9/10 Mercury LoadRunner 9.10 with Oracle Day in the Life kit for JD Edwards EnterpriseOne 9.0.1 Oracle’s Universal Batch Engine - Short UBEs and Long UBEs Benchmark Description JD Edwards EnterpriseOne is an integrated applications suite of Enterprise Resource Planning (ERP) software. Oracle offers 70 JD Edwards EnterpriseOne application modules to support a diverse set of business operations. Oracle's Day in the Life (DIL) kit is a suite of scripts that exercises most common transactions of JD Edwards EnterpriseOne applications, including business processes such as payroll, sales order, purchase order, work order, and other manufacturing processes, such as ship confirmation. These are labeled by industry acronyms such as SCM, CRM, HCM, SRM and FMS. The kit's scripts execute transactions typical of a mid-sized manufacturing company. The workload consists of online transactions and the UBE workload of 15 short and 4 long UBEs. LoadRunner runs the DIL workload, collects the user’s transactions response times and reports the key metric of Combined Weighted Average Transaction Response time. The UBE processes workload runs from the JD Enterprise Application server. Oracle's UBE processes come as three flavors: Short UBEs < 1 minute engage in Business Report and Summary Analysis, Mid UBEs > 1 minute create a large report of Account, Balance, and Full Address, Long UBEs > 2 minutes simulate Payroll, Sales Order, night only jobs. The UBE workload generates large numbers of PDF files reports and log files. The UBE Queues are categorized as the QBATCHD, a single threaded queue for large UBEs, and the QPROCESS queue for short UBEs run concurrently. One of the Oracle Solaris Containers ran 4 Long UBEs, while another Container ran 15 short UBEs concurrently. The mixed size UBEs ran concurrently from the SPARC T3-1 server with the 5000 online users driven by the LoadRunner. Oracle’s UBE process performance metric is Number of Maximum Concurrent UBE processes at transaction rate, UBEs/minute. Key Points and Best Practices Two JD Edwards EnterpriseOne Application Servers and two Oracle Fusion Middleware WebLogic Servers 11g R1 coupled with two Oracle Fusion Middleware 11g Web Tier HTTP Server instances on the SPARC T3-1 server were hosted in four separate Oracle Solaris Containers to demonstrate consolidation of multiple application and web servers. See Also SPARC T3-1 oracle.com SPARC Enterprise M3000 oracle.com Oracle Solaris oracle.com JD Edwards EnterpriseOne oracle.com Oracle Database 11g Release 2 Enterprise Edition oracle.com Disclosure Statement Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 6/27/2011.

    Read the article

  • People, Process & Engagement: WebCenter Partner Keste

    - by Michael Snow
    v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} Within the WebCenter group here at Oracle, discussions about people, process and engagement cross over many vertical industries and products. Amidst our growing partner ecosystem, the community provides us insight into great customer use cases every day. Such is the case with our partner, Keste, who provides us a guest post on our blog today with an overview of their innovative solution for a customer in the transportation industry. Keste is an Oracle software solutions and development company headquartered in Dallas, Texas. As a Platinum member of the Oracle® PartnerNetwork, Keste designs, develops and deploys custom solutions that automate complex business processes. Seamless Customer Self-Service Experience in the Trucking Industry with Oracle WebCenter Portal  Keste, Oracle Platinum Partner Customer Overview Omnitracs, Inc., a Qualcomm company provides mobility solutions for trucking fleets to companies in the transportation industry. Omnitracs’ mobility services include basic communications such as text as well as advanced monitoring services such as GPS tracking, temperature tracking of perishable goods, load tracking and weighting distribution, and many others. Customer Business Needs Already the leading provider of mobility solutions for large trucking fleets, they chose to target smaller trucking fleets as new customers. However their existing high-touch customer support method would not be a cost effective or scalable method to manage and service these smaller customers. Omnitracs needed to provide several self-service features to make customer support more scalable while keeping customer satisfaction levels high and the costs manageable. The solution also had to be very intuitive and easy to use. The systems that Omnitracs sells to these trucking customers require professional installation and smaller customers need to track and schedule the installation. Information captured in Oracle eBusiness Suite needed to be readily available for new customers to track these purchases and delivery details. Omnitracs wanted a high impact User Interface to significantly improve customer experience with the ability to integrate with EBS, provisioning systems as well as CRM systems that were already implemented. Omnitracs also wanted to build an architecture platform that could potentially be extended to other Portals. Omnitracs’ stated goal was to deliver an “eBay-like” or “Amazon-like” experience for all of their customers so that they could reach a much broader market beyond their large company customer base. Solution Overview In order to manage the increased complexity, the growing support needs of global customers and improve overall product time-to-market in a cost-effective manner, IT began to deliver a self-service model. This self service model not only transformed numerous business processes but is also allowing the business to keep up with the growing demands of the (internal and external) customers. This solution was a customer service Portal that provided self service capabilities for large and small customers alike for Activation of mobility products, managing add-on applications for the devices (much like the Apple App Store), transferring services when trucks are sold to other companies as well as deactivation all without the involvement of a call service agent or sending multiple emails to different Omnitracs contacts. This is a conceptual view of the Customer Portal showing the details of the components that make up the solution. 12.00 The portal application for transactions was entirely built using ADF 11g R2. Omnitracs’ business had a pressing requirement to have a portal available 24/7 for its customers. Since there were interactions with EBS in the back-end, the downtimes on the EBS would negate this availability. Omnitracs devised a decoupling strategy at the database side for the EBS data. The decoupling of the database was done using Oracle Data Guard and completely insulated the solution from any eBusiness Suite down time. The customer has no knowledge whether eBS is running or not. Here are two sample screenshots of the portal application built in Oracle ADF. Customer Benefits The Customer Portal not only provided the scalability to grow the business but also provided the seamless integration with other disparate applications. Some of the key benefits are: Improved Customer Experience: With a modern look and feel and a Portal that has the aspects of an App Store, the customer experience was significantly improved. Page response times went from several seconds to sub-second for all of the pages. Enabled new product launches: After successfully dominating the large fleet market, Omnitracs now has a scalable solution to sell and manage smaller fleet customers giving them a huge advantage over their nearest competitors. Dozens of new customers have been acquired via this portal through an onboarding process that now takes minutes Seamless Integrations Improves Customer Support: ADF 11gR2 allowed Omnitracs to bring a diverse list of applications into one integrated solution. This provided a seamless experience for customers to route them from Marketing focused application to a customer-oriented portal. Internally, it also allowed Sales Representatives to have an integrated flow for taking a prospect through the various steps to onboard them as a customer. Key integrations included: Unity Core Salesforce.com Merchant e-Solution for credit card Custom Omnitracs Applications like CUPS and AUTO Security utilizing OID and OVD Back end integration with EBS (Data Guard) and iQ Database Business Impact Significant business impacts were realized through the launch of customer portal. It not only allows the business to push through in underserved segments, but also reduces the time it needs to spend on customer support—allowing the business to focus more on sales and identifying the market for new products. Some of the Immediate Benefits are The entire onboarding process is now completely automated and now completes in minutes. This represents an 85% productivity improvement over their previous processes. And it was 160 times faster! With the success of this self-service solution, the business is now targeting about 3X customer growth in the next five years. This represents a tripling of their overall customer base and significant downstream revenue for the ongoing services. 90%+ improvement of customer onboarding and management process by utilizing, single sign on integration using OID/OAM solution, performance improvements and new self-service functionality Unified login for all Customers, Partners and Internal Users enables login to a common portal and seamless access to all other integrated applications targeted at the respective audience Significantly improved customer experience with a better look and feel with a more user experience focused Portal screens. Helped sales of the new product by having an easy way of ordering and activating the product. Data Guard helped increase availability of the Portal to 99%+ and make it independent of EBS downtime. This gave customers the feel of high availability of the portal application. Some of the anticipated longer term Benefits are: Platform that can be leveraged to launch any new product introduction and enable all product teams to reach new customers and new markets Easy integration with content management to allow business owners more control of the product catalog Overall reduced TCO with standardization of the Oracle platform Managed IT support cost savings through optimization of technology skills needed to support and modify this solution ------------------------------------------------------------ 12.00 Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 -"/ /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Times New Roman","serif";}

    Read the article

  • Design pattern for cost calculator app?

    - by Anders Svensson
    Hi, I have a problem that I’ve tried to get help for before, but I wasn’t able to solve it then, so I’m trying to simplify the problem now to see if I can get some more concrete help with this because it is driving me crazy… Basically, I have a working (more complex) version of this application, which is a project cost calculator. But because I am at the same time trying to learn to design my applications better, I would like some input on how I could improve this design. Basically the main thing I want is input on the conditionals that (here) appear repeated in two places. The suggestions I got before was to use the strategy pattern or factory pattern. I also know about the Martin Fowler book with the suggestion to Refactor conditional with polymorphism. I understand that principle in his simpler example. But how can I do either of these things here (if any would be suitable)? The way I see it, the calculation is dependent on a couple of conditions: 1. What kind of service is it, writing or analysis? 2. Is the project small, medium or large? (Please note that there may be other parameters as well, equally different, such as “are the products new or previously existing?” So such parameters should be possible to add, but I tried to keep the example simple with only two parameters to be able to get concrete help) So refactoring with polymorphism would imply creating a number of subclasses, which I already have for the first condition (type of service), and should I really create more subclasses for the second condition as well (size)? What would that become, AnalysisSmall, AnalysisMedium, AnalysisLarge, WritingSmall, etc…??? No, I know that’s not good, I just don’t see how to work with that pattern anyway else? I see the same problem basically for the suggestions of using the strategy pattern (and the factory pattern as I see it would just be a helper to achieve the polymorphism above). So please, if anyone has concrete suggestions as to how to design these classes the best way I would be really grateful! Please also consider whether I have chosen the objects correctly too, or if they need to be redesigned. (Responses like "you should consider the factory pattern" will obviously not be helpful... I've already been down that road and I'm stumped at precisely how in this case) Regards, Anders The code (very simplified, don’t mind the fact that I’m using strings instead of enums, not using a config file for data etc, that will be done as necessary in the real application once I get the hang of these design problems): public abstract class Service { protected Dictionary<string, int> _hours; protected const int SMALL = 2; protected const int MEDIUM = 8; public int NumberOfProducts { get; set; } public abstract int GetHours(); } public class Writing : Service { public Writing(int numberOfProducts) { NumberOfProducts = numberOfProducts; _hours = new Dictionary<string, int> { { "small", 125 }, { "medium", 100 }, { "large", 60 } }; } public override int GetHours() { if (NumberOfProducts <= SMALL) return _hours["small"] * NumberOfProducts; if (NumberOfProducts <= MEDIUM) return (_hours["small"] * SMALL) + (_hours["medium"] * (NumberOfProducts - SMALL)); return (_hours["small"] * SMALL) + (_hours["medium"] * (MEDIUM - SMALL)) + (_hours["large"] * (NumberOfProducts - MEDIUM)); } } public class Analysis : Service { public Analysis(int numberOfProducts) { NumberOfProducts = numberOfProducts; _hours = new Dictionary<string, int> { { "small", 56 }, { "medium", 104 }, { "large", 200 } }; } public override int GetHours() { if (NumberOfProducts <= SMALL) return _hours["small"]; if (NumberOfProducts <= MEDIUM) return _hours["medium"]; return _hours["large"]; } } public partial class Form1 : Form { public Form1() { InitializeComponent(); List<int> quantities = new List<int>(); for (int i = 0; i < 100; i++) { quantities.Add(i); } comboBoxNumberOfProducts.DataSource = quantities; } private void comboBoxNumberOfProducts_SelectedIndexChanged(object sender, EventArgs e) { Service writing = new Writing((int) comboBoxNumberOfProducts.SelectedItem); Service analysis = new Analysis((int) comboBoxNumberOfProducts.SelectedItem); labelWriterHours.Text = writing.GetHours().ToString(); labelAnalysisHours.Text = analysis.GetHours().ToString(); } }

    Read the article

  • Design advice for avoiding change in several classes

    - by Anders Svensson
    Hi, I'm trying to figure out how to design a small application more elegantly, and make it more resistant to change. Basically it is a sort of project price calculator, and the problem is that there are many parameters that can affect the pricing. I'm trying to avoid cluttering the code with a lot of if-clauses for each parameter, but still I have e.g. if-clauses in two places checking for the value of the size parameter. I have the Head First Design Patterns book, and have tried to find ideas there, but the closest I got was the decorator pattern, which has an example where starbuzz coffee sets prices depending first on condiments added, and then later in an exercise by adding a size parameter (Tall, Grande, Venti). But that didn't seem to help, because adding that parameter still seemed to add if-clause complexity in a lot of places (and this being an exercise they didn't explain that further). What I am trying to avoid is having to change several classes if a parameter were to change or a new parameter added, or at least change in as few places as possible (there's some fancy design principle word for this that I don't rememeber :-)). Here below is the code. Basically it calculates the price for a project that has the tasks "Writing" and "Analysis" with a size parameter and different pricing models. There will be other parameters coming in later too, like "How new is the product?" (New, 1-5 years old, 6-10 years old), etc. Any advice on the best design would be greatly appreciated, whether a "design pattern" or just good object oriented principles that would make it resistant to change (e.g. adding another size, or changing one of the size values, and only have to change in one place rather than in several if-clauses): public class Project { private readonly int _numberOfProducts; protected Size _size; public Task Analysis { get; set; } public Task Writing { get; set; } public Project(int numberOfProducts) { _numberOfProducts = numberOfProducts; _size = GetSize(); Analysis = new AnalysisTask(numberOfProducts, _size); Writing = new WritingTask(numberOfProducts, _size); } private Size GetSize() { if (_numberOfProducts <= 2) return Size.small; if (_numberOfProducts <= 8) return Size.medium; return Size.large; } public double GetPrice() { return Analysis.GetPrice() + Writing.GetPrice(); } } public abstract class Task { protected readonly int _numberOfProducts; protected Size _size; protected double _pricePerHour; protected Dictionary<Size, int> _hours; public abstract int TotalHours { get; } public double Price { get; set; } protected Task(int numberOfProducts, Size size) { _numberOfProducts = numberOfProducts; _size = size; } public double GetPrice() { return _pricePerHour * TotalHours; } } public class AnalysisTask : Task { public AnalysisTask(int numberOfProducts, Size size) : base(numberOfProducts, size) { _pricePerHour = 850; _hours = new Dictionary<Size, int>() { { Size.small, 56 }, { Size.medium, 104 }, { Size.large, 200 } }; } public override int TotalHours { get { return _hours[_size]; } } } public class WritingTask : Task { public WritingTask(int numberOfProducts, Size size) : base(numberOfProducts, size) { _pricePerHour = 650; _hours = new Dictionary<Size, int>() { { Size.small, 125 }, { Size.medium, 100 }, { Size.large, 60 } }; } public override int TotalHours { get { if (_size == Size.small) return _hours[_size] * _numberOfProducts; if (_size == Size.medium) return (_hours[Size.small] * 2) + (_hours[Size.medium] * (_numberOfProducts - 2)); return (_hours[Size.small] * 2) + (_hours[Size.medium] * (8 - 2)) + (_hours[Size.large] * (_numberOfProducts - 8)); } } } public enum Size { small, medium, large } public partial class Form1 : Form { public Form1() { InitializeComponent(); List<int> quantities = new List<int>(); for (int i = 0; i < 100; i++) { quantities.Add(i); } comboBoxNumberOfProducts.DataSource = quantities; } private void comboBoxNumberOfProducts_SelectedIndexChanged(object sender, EventArgs e) { Project project = new Project((int)comboBoxNumberOfProducts.SelectedItem); labelPrice.Text = project.GetPrice().ToString(); labelWriterHours.Text = project.Writing.TotalHours.ToString(); labelAnalysisHours.Text = project.Analysis.TotalHours.ToString(); } } At the end is a simple current calling code in the change event for a combobox that set size... (BTW, I don't like the fact that I have to use several dots to get to the TotalHours at the end here either, as far as I can recall, that violates the "principle of least knowledge" or "the law of demeter", so input on that would be appreciated too, but it's not the main point of the question) Regards, Anders

    Read the article

  • Big Data – Buzz Words: What is MapReduce – Day 7 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned what is Hadoop. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – MapReduce. What is MapReduce? MapReduce was designed by Google as a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. Though, MapReduce was originally Google proprietary technology, it has been quite a generalized term in the recent time. MapReduce comprises a Map() and Reduce() procedures. Procedure Map() performance filtering and sorting operation on data where as procedure Reduce() performs a summary operation of the data. This model is based on modified concepts of the map and reduce functions commonly available in functional programing. The library where procedure Map() and Reduce() belongs is written in many different languages. The most popular free implementation of MapReduce is Apache Hadoop which we will explore tomorrow. Advantages of MapReduce Procedures The MapReduce Framework usually contains distributed servers and it runs various tasks in parallel to each other. There are various components which manages the communications between various nodes of the data and provides the high availability and fault tolerance. Programs written in MapReduce functional styles are automatically parallelized and executed on commodity machines. The MapReduce Framework takes care of the details of partitioning the data and executing the processes on distributed server on run time. During this process if there is any disaster the framework provides high availability and other available modes take care of the responsibility of the failed node. As you can clearly see more this entire MapReduce Frameworks provides much more than just Map() and Reduce() procedures; it provides scalability and fault tolerance as well. A typical implementation of the MapReduce Framework processes many petabytes of data and thousands of the processing machines. How do MapReduce Framework Works? A typical MapReduce Framework contains petabytes of the data and thousands of the nodes. Here is the basic explanation of the MapReduce Procedures which uses this massive commodity of the servers. Map() Procedure There is always a master node in this infrastructure which takes an input. Right after taking input master node divides it into smaller sub-inputs or sub-problems. These sub-problems are distributed to worker nodes. A worker node later processes them and does necessary analysis. Once the worker node completes the process with this sub-problem it returns it back to master node. Reduce() Procedure All the worker nodes return the answer to the sub-problem assigned to them to master node. The master node collects the answer and once again aggregate that in the form of the answer to the original big problem which was assigned master node. The MapReduce Framework does the above Map () and Reduce () procedure in the parallel and independent to each other. All the Map() procedures can run parallel to each other and once each worker node had completed their task they can send it back to master code to compile it with a single answer. This particular procedure can be very effective when it is implemented on a very large amount of data (Big Data). The MapReduce Framework has five different steps: Preparing Map() Input Executing User Provided Map() Code Shuffle Map Output to Reduce Processor Executing User Provided Reduce Code Producing the Final Output Here is the Dataflow of MapReduce Framework: Input Reader Map Function Partition Function Compare Function Reduce Function Output Writer In a future blog post of this 31 day series we will explore various components of MapReduce in Detail. MapReduce in a Single Statement MapReduce is equivalent to SELECT and GROUP BY of a relational database for a very large database. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – HDFS. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • OWB 11gR2 - Find and Search Metadata in Designer

    - by David Allan
    Here are some tools and techniques for finding objects, specifically in the design repository. There are ways of navigating and collating objects that are useful for day to day development and build-time usage - this includes features out of the box and utilities constructed on top. There are a variety of techniques to navigate and find objects in the repository, the first 3 are out of the box, the 4th is an expert utility. Navigating by the tree, grouping by project and module - ok if you are aware of the exact module/folder that objects reside in. The structure panel is a useful way of finding parts of an object, especially when large rather than using the canvas. In large scale projects it helps to have accelerators (either find or collections below). Advanced find to search by name - 11gR2 included a find capability specifically for large scale projects. There were improvements in both the tree search and the object editors (including highlighting in mapping for example). So you can now do regular expression based search and quickly navigate to objects within a repository. Collections - logically organize your objects into virtual folders by shortcutting the actual objects. This is useful for a range of things since all the OWB services operate on collections too (export/import, validation, deployment). See the post here for new collection functionality in 11gR2. Reports for searching by type, updated on, updated by etc. Useful for activities such as periodic incremental actions (deploy all mappings changed in the past week). The report style view is useful since I can quickly see who changed what and when. You can see all the audit details for objects within each objects property inspector, but its useful to just get all objects changed today or example, all objects changed since my last build etc. This utility combines both UI extensions via experts and the public views on the repository. In the figure to the right you see the contextual option 'Object Search' which invokes the utility, you can see I have quite a number of modules within my project. Figure out all the potential objects which have been changed is not simple. The utility is an expert which provides this kind of search capability. The utility provides a report of the objects in the design repository which satisfy some filter criteria. The type of criteria includes; objects updated in the last n days optionally filter the objects updated by user filter the user by project and by type (table/mappings etc.) The search dialog appears with these options, you can multi-select the object types, so for example you can select TABLE and MAPPING. Its also possible to search across projects if need be. If you have multiple users using the repository you can define the OWB user name in the 'Updated by' property to restrict the report to just that user also. Finally there is a search name that will be used for some of the options such as building a collection - this name is used for the collection to be built. In the example I have done, I've just searched my project for all process flows and mappings that users have updated in the last 7 days. The results of the query are returned in a table containing the object names, types, full path and audit details. The columns are sort-able, you can sort the results by name, type, path etc. One of the cool things here, is that you can then perform operations on these objects - such as edit them, export single selection or entire results to MDL, create a collection from the results (now you have a saved set of references in the repository, you could do deploy/export etc.), create a deployment script from the results...or even add in your own ideas! You see from this that you can do bulk operations on sets of objects based on search results. So for example selecting the 'Build Collection' option creates a collection with all of the objects from my search, you can subsequently deploy/generate/maintain this collection of objects. Under the hood of the expert if just basic OMB commands from the product and the use of the public views on the design repository. You can see how easy it is to build up macro-like capabilities that will help you do day-to-day as well as build like tasks on sets of objects.

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • My Take on Hadoop World 2011

    - by Jean-Pierre Dijcks
    I’m sure some of you have read pieces about Hadoop World and I did see some headlines which were somewhat, shall we say, interesting? I thought the keynote by Larry Feinsmith of JP Morgan Chase & Co was one of the highlights of the conference for me. The reason was very simple, he addressed some real use cases outside of internet and ad platforms. The following are my notes, since the keynote was recorded I presume you can go and look at Hadoopworld.com at some point… On the use cases that were mentioned: ETL – how can I do complex data transformation at scale Doing Basel III liquidity analysis Private banking – transaction filtering to feed [relational] data marts Common Data Platform – a place to keep data that is (or will be) valuable some day, to someone, somewhere 360 Degree view of customers – become pro-active and look at events across lines of business. For example make sure the mortgage folks know about direct deposits being stopped into an account and ensure the bank is pro-active to service the customer Treasury and Security – Global Payment Hub [I think this is really consolidation of data to cross reference activity across business and geographies] Data Mining Bypass data engineering [I interpret this as running a lot of a large data set rather than on samples] Fraud prevention – work on event triggers, say a number of failed log-ins to the website. When they occur grab web logs, firewall logs and rules and start to figure out who is trying to log in. Is this me, who forget his password, or is it someone in some other country trying to guess passwords Trade quality analysis – do a batch analysis or all trades done and run them through an analysis or comparison pipeline One of the key requests – if you can say it like that – was for vendors and entrepreneurs to make sure that new tools work with existing tools. JPMC has a large footprint of BI Tools and Big Data reporting and tools should work with those tools, rather than be separate. Security and Entitlement – how to protect data within a large cluster from unwanted snooping was another topic that came up. I thought his Elephant ears graph was interesting (couldn’t actually read the points on it, but the concept certainly made some sense) and it was interesting – when asked to show hands – how the audience did not (!) think that RDBMS and Hadoop technology would overlap completely within a few years. Another interesting session was the session from Disney discussing how Disney is building a DaaS (Data as a Service) platform and how Hadoop processing capabilities are mixed with Database technologies. I thought this one of the best sessions I have seen in a long time. It discussed real use case, where problems existed, how they were solved and how Disney planned some of it. The planning focused on three things/phases: Determine the Strategy – Design a platform and evangelize this within the organization Focus on the people – Hire key people, grow and train the staff (and do not overload what you have with new things on top of their day-to-day job), leverage a partner with experience Work on Execution of the strategy – Implement the platform Hadoop next to the other technologies and work toward the DaaS platform This kind of fitted with some of the Linked-In comments, best summarized in “Think Platform – Think Hadoop”. In other words [my interpretation], step back and engineer a platform (like DaaS in the Disney example), then layer the rest of the solutions on top of this platform. One general observation, I got the impression that we have knowledge gaps left and right. On the one hand are people looking for more information and details on the Hadoop tools and languages. On the other I got the impression that the capabilities of today’s relational databases are underestimated. Mostly in terms of data volumes and parallel processing capabilities or things like commodity hardware scale-out models. All in all I liked this conference, it was great to chat with a wide range of people on Oracle big data, on big data, on use cases and all sorts of other stuff. Just hope they get a set of bigger rooms next time… and yes, I hope I’m going to be back next year!

    Read the article

< Previous Page | 69 70 71 72 73 74 75 76 77 78 79 80  | Next Page >