Search Results

Search found 68 results on 3 pages for 'kk'.

Page 3/3 | < Previous Page | 1 2 3 

  • How to tweak the performance of Bit blit on Barco monitors?

    - by krishna
    Hi, The performance of bit blit on Small monitor(16 bpp,60Hz,1280X1024 resolution) it gives 0.9909ms. The performance of big monitors(8bpp,60hz,2048X5260) it gives 52.315ms . I use SRCCOPY to do the bit blit operation.how we can optimize the performance of bit blit on big monitor? Please share your thoughts. Thanks kk

    Read the article

  • Performance Tuning a High-Load Apache Server

    - by futureal
    I am looking to understand some server performance problems I am seeing with a (for us) heavily loaded web server. The environment is as follows: Debian Lenny (all stable packages + patched to security updates) Apache 2.2.9 PHP 5.2.6 Amazon EC2 large instance The behavior we're seeing is that the web typically feels responsive, but with a slight delay to begin handling a request -- sometimes a fraction of a second, sometimes 2-3 seconds in our peak usage times. The actual load on the server is being reported as very high -- often 10.xx or 20.xx as reported by top. Further, running other things on the server during these times (even vi) is very slow, so the load is definitely up there. Oddly enough Apache remains very responsive, other than that initial delay. We have Apache configured as follows, using prefork: StartServers 5 MinSpareServers 5 MaxSpareServers 10 MaxClients 150 MaxRequestsPerChild 0 And KeepAlive as: KeepAlive On MaxKeepAliveRequests 100 KeepAliveTimeout 5 Looking at the server-status page, even at these times of heavy load we are rarely hitting the client cap, usually serving between 80-100 requests and many of those in the keepalive state. That tells me to rule out the initial request slowness as "waiting for a handler" but I may be wrong. Amazon's CloudWatch monitoring tells me that even when our OS is reporting a load of 15, our instance CPU utilization is between 75-80%. Example output from top: top - 15:47:06 up 31 days, 1:38, 8 users, load average: 11.46, 7.10, 6.56 Tasks: 221 total, 28 running, 193 sleeping, 0 stopped, 0 zombie Cpu(s): 66.9%us, 22.1%sy, 0.0%ni, 2.6%id, 3.1%wa, 0.0%hi, 0.7%si, 4.5%st Mem: 7871900k total, 7850624k used, 21276k free, 68728k buffers Swap: 0k total, 0k used, 0k free, 3750664k cached The majority of the processes look like: 24720 www-data 15 0 202m 26m 4412 S 9 0.3 0:02.97 apache2 24530 www-data 15 0 212m 35m 4544 S 7 0.5 0:03.05 apache2 24846 www-data 15 0 209m 33m 4420 S 7 0.4 0:01.03 apache2 24083 www-data 15 0 211m 35m 4484 S 7 0.5 0:07.14 apache2 24615 www-data 15 0 212m 35m 4404 S 7 0.5 0:02.89 apache2 Example output from vmstat at the same time as the above: procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 8 0 0 215084 68908 3774864 0 0 154 228 5 7 32 12 42 9 6 21 0 198948 68936 3775740 0 0 676 2363 4022 1047 56 16 9 15 23 0 0 169460 68936 3776356 0 0 432 1372 3762 835 76 21 0 0 23 1 0 140412 68936 3776648 0 0 280 0 3157 827 70 25 0 0 20 1 0 115892 68936 3776792 0 0 188 8 2802 532 68 24 0 0 6 1 0 133368 68936 3777780 0 0 752 71 3501 878 67 29 0 1 0 1 0 146656 68944 3778064 0 0 308 2052 3312 850 38 17 19 24 2 0 0 202104 68952 3778140 0 0 28 90 2617 700 44 13 33 5 9 0 0 188960 68956 3778200 0 0 8 0 2226 475 59 17 6 2 3 0 0 166364 68956 3778252 0 0 0 21 2288 386 65 19 1 0 And finally, output from Apache's server-status: Server uptime: 31 days 2 hours 18 minutes 31 seconds Total accesses: 60102946 - Total Traffic: 974.5 GB CPU Usage: u209.62 s75.19 cu0 cs0 - .0106% CPU load 22.4 requests/sec - 380.3 kB/second - 17.0 kB/request 107 requests currently being processed, 6 idle workers C.KKKW..KWWKKWKW.KKKCKK..KKK.KKKK.KK._WK.K.K.KKKKK.K.R.KK..C.C.K K.C.K..WK_K..KKW_CK.WK..W.KKKWKCKCKW.W_KKKKK.KKWKKKW._KKK.CKK... KK_KWKKKWKCKCWKK.KKKCK.......................................... ................................................................ From my limited experience I draw the following conclusions/questions: We may be allowing far too many KeepAlive requests I do see some time spent waiting for IO in the vmstat although not consistently and not a lot (I think?) so I am not sure this is a big concern or not, I am less experienced with vmstat Also in vmstat, I see in some iterations a number of processes waiting to be served, which is what I am attributing the initial page load delay on our web server to, possibly erroneously We serve a mixture of static content (75% or higher) and script content, and the script content is often fairly processor intensive, so finding the right balance between the two is important; long term we want to move statics elsewhere to optimize both servers but our software is not ready for that today I am happy to provide additional information if anybody has any ideas, the other note is that this is a high-availability production installation so I am wary of making tweak after tweak, and is why I haven't played with things like the KeepAlive value myself yet.

    Read the article

  • Connection Reset by Peer error with Apache and JBoss 7.1.1

    - by vikingz
    We are seeing errors on some of our QA testing scripts that intermittently throw Connection Reset By Peer errors. The Test scripts submit requests via F5 which forwards requests to Apache (2.2.21) with a mod_jk load_balancer with the following setting for each worker in the worker.property worker1 props worker.worker1.type=ajp13 worker.worker1.port=8109 worker.worker1.lbfactor=1 worker.worker1.host=skunkhost1.com worker.worker1.connection_pool_timeout=30 and here is what is in the JBoss domain.xml for the AJP port from JBoss 7.1.1 <unbounded-queue-thread-pool name="SKUNKY.APP.AJP"> <max-threads count="300"/> <keepalive-time time="3" unit="minutes"/> </unbounded-queue-thread-pool> Here is httpd.conf Timeout 300 KeepAlive On KeepAliveTimeout 15 MaxKeepAliveRequests 100 TraceEnable Off My question is that is it posisbe that apache times out and closes the connection while jboss is still ready and working on the request? What might be causing the Connection Reset By Peer error?what am i missing here? Any help is majorly appreciated!! Sincerely KK

    Read the article

  • apache port number

    - by user983223
    For each development sites I want to have a unique port number. For instance, domain.com:1234 This is what I have in my httpd.conf file. After restart the page domain.com:1234 is not showing in the browser. Is there anything else that I need to do besides what I have already done to make this work? Listen *:1234 <VirtualHost *:1234> DocumentRoot /var/www/dev_sites/test ServerName domain.com:1234 </VirtualHost> It looks like if I go to my local hostname (kk.local:1234) it shows. Is there some sort of dns that I need to do? I really don't want to go into godaddy everytime I add a development site. Is there a way around that?

    Read the article

  • Subdomain still times out after set up a month ago

    - by user8137
    I'm a newbie on this and this has probably been asked already but the subjects online were close but too vague in there answers so I've probably really messed this up. I would really appreciate specific step by step instructions. This is what I'd like to do: use the subdomain www.high-res.domain.com to be accessed by external customers with specific permissions to access the site (like ftp). We use Network Solutions to house domain.com. We recently added a new ip address to point to www.high-res.domain.com. I gave the ip address to the company that hosts our website. I pinged www.high-res.domain.com and it points to the correct ip address but still times out. It’s been a few weeks now and when you ping it, it still times out. C:ping XXX.XXX.X.XXX Pinging XXX.XXX.X.XXX with 32 bytes of data: Request timed out. Request timed out. Request timed out. Request timed out. Ping statistics for XXX.XXX.X.XXX: Packets: Sent = 4, Received = 0, Lost = 4 (100% loss). Tracert times out as well. I even went to DNS tools and a few other sites for checking this and it shows the same thing. I recently went into the DNSmgmt on our server (wink2k3sp1) and created an A record under the DomainDnsZones which translated to a Cname when you look at it. Under the Domain it has two entries one to the subdomain and the other to the website host each with separate ip addresses. Is this correct? The website people are too busy on another project to research it further and my friends haven't gotten back to me. Please help. Thanks KK

    Read the article

  • XSLT and possible alternatives [on hold]

    - by wirrbel
    I had a look at XSLT for transforming one XML file into another one (HTML, etc.). Now while I see that there are benefits to XSLT (being a standardized and used tool) I am reluctant for a couple of reasons XSLT processors seem to be quite huge / resource hungry XML is a bad notation for programming and thats what XSLT is all about. It do not want to troll XSLT here though I just want to point out what I dislike about it to give you an idea of what I would expect from an alternative. Having some Lisp background I wonder whether there are better ways for tree-structure transformations based upon some lisp. I have seen references to DSSSL, sadly most links about DSSSL are dead so its already challenging to see some code that illustrates it. Is DSSSL still in use? I remember that I had installed openjade once when checking out docbook stuff. Jeff Atwood's blog post seems to hint upon using Ruby instead of XSLT. Are there any sane ways to do XML transformations similar to XSLT in a non-xml programming language? I would be open for input on Useful libraries for scripting languages that facilitate XML transformations especially (but not exclusively) lisp-like transformation languages, or Ruby, etc. A few things I found so far: A couple of places on the web have pointed out Linq as a possible alternative. Quite generally I any kind of classifications, also from those who have had the best XSLT experience. For scheme http://cs.brown.edu/~sk/Publications/Papers/Published/kk-sxslt/ and http://www.okmij.org/ftp/Scheme/xml.html

    Read the article

  • Select child of earlier clicked item in jQuery

    - by koko
    I have clicked on a div. In this div is an image: <div class="grid_2 shareContent" id="facebook_45"> <a href="#"><img class="facebook" src="http://roepingen.kk/skins/admin/default/images/social/facebook.png" alt="Facebook not shared" width="32px" height="32px" /></a> </div> How can I change the image in the div? I have the clicked item saved in the variable 'clicked'. If possible I'd like to delete the link around the image also.

    Read the article

  • Google search results are invalid

    - by Rufus
    I'm writing a program that lets a user perform a Google search. When the result comes back, all of the links in the search results are links not to other sites but to Google, and if the user clicks on one, the page is fetched not from the other site but from Google. Can anyone explain how to fix this problem? My Google URL consists of this: http://google.com/search?q=gargle But this is what I get back when the user clicks on the Wikipedia search result, which was http://www.google.com/url?q=http://en.wikipedia.org/wiki/Gargling&sa=U&ei=_4vkT5y555Wh6gGBeOzECg&ved=0CBMQejAe&usg=AFQjeNHd1eRV8Xef3LGeH6AvGxt-AF-Yjw <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html lang="en" dir="ltr" class="client-nojs" xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Gargling - Wikipedia, the free encyclopedia</title> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <meta http-equiv="Content-Style-Type" content="text/css" /> <meta name="generator" content="MediaWiki 1.20wmf5" /> <meta http-equiv="last-modified" content="Fri, 09 Mar 2012 12:34:19 +0000" /> <meta name="last-modified-timestamp" content="1331296459" /> <meta name="last-modified-range" content="0" /> <link rel="alternate" type="application/x-wiki" title="Edit this page" > <link rel="edit" title="Edit this page" > <link rel="apple-touch-icon" > <link rel="shortcut icon" > <link rel="search" type="application/opensearchdescription+xml" > <link rel="EditURI" type="application/rsd+xml" > <link rel="copyright" > <link rel="alternate" type="application/atom+xml" title="Wikipedia Atom feed" > <link rel="stylesheet" href="//bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&amp;lang=en&amp;modules=ext.gadget.teahouse%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%2Cshared%7Cskins.vector&amp;only=styles&amp;skin=vector&amp;*" type="text/css" media="all" /> <style type="text/css" media="all">#mwe-lastmodified { display: none; }</style><meta name="ResourceLoaderDynamicStyles" content="" /> <link rel="stylesheet" href="//bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&amp;lang=en&amp;modules=site&amp;only=styles&amp;skin=vector&amp;*" type="text/css" media="all" /> <style type="text/css" media="all">a:lang(ar),a:lang(ckb),a:lang(fa),a:lang(kk-arab),a:lang(mzn),a:lang(ps),a:lang(ur){text-decoration:none} /* cache key: enwiki:resourceloader:filter:minify-css:7:d5a1bf6cbd05fc6cc2705e47f52062dc */</style>

    Read the article

  • How do I extract a postcode from one column in SSIS using regular expression

    - by Aphillippe
    I'm trying to use a custom regex clean transformation (information found here ) to extract a post code from a mixed address column (Address3) and move it to a new column (Post Code) Example of incoming data: Address3: "London W12 9LZ" Incoming data could be any combination of place names with a post code at the start, middle or end (or not at all). Desired outcome: Address3: "London" Post Code: "W12 9LZ" Essentially, in plain english, "move (not copy) any post code found from address3 into Post Code". My regex skills aren't brilliant but I've managed to get as far as extracting the post code and getting it into its own column using the following regex, matching from Address3 and replacing into Post Code: Match Expression: (?<stringOUT>([A-PR-UWYZa-pr-uwyz]([0-9]{1,2}|([A-HK-Ya-hk-y][0-9]|[A-HK-Ya-hk-y][0-9] ([0-9]|[ABEHMNPRV-Yabehmnprv-y]))|[0-9][A-HJKS-UWa-hjks-uw])\ {0,1}[0-9][ABD-HJLNP-UW-Zabd-hjlnp-uw-z]{2}|([Gg][Ii][Rr]\ 0[Aa][Aa])|([Ss][Aa][Nn]\ {0,1}[Tt][Aa]1)|([Bb][Ff][Pp][Oo]\ {0,1}([Cc]\/[Oo]\ )?[0-9]{1,4})|(([Aa][Ss][Cc][Nn]|[Bb][Bb][Nn][Dd]|[BFSbfs][Ii][Qq][Qq]|[Pp][Cc][Rr][Nn]|[Ss][Tt][Hh][Ll]|[Tt][Dd][Cc][Uu]|[Tt][Kk][Cc][Aa])\ {0,1}1[Zz][Zz]))) Replace Expression: ${stringOUT} So this leaves me with: Address3: "London W12 9LZ" Post Code: "W12 9LZ" My next thought is to keep the above match/replace, then add another to match anything that doesn't match the above regex. I think it might be a negative lookahead but I can't seem to make it work. I'm using SSIS 2008 R2 and I think the regex clean transformation uses .net regex implementation. Thanks.

    Read the article

  • How do I find a source code position from an address given by a crash in Window CE

    - by Shane MacLaughlin
    I have a Windows mobile 4.0 application, written using EVC++ 4.0 SP4 with MFC, that is exhibiting a random occasional crash in the field. e.g. Exception ox800000002 at 00112584. It does not happen under various emulators and simulators, hence is very difficult to trace using a debugger. The crash throws up and address and exception type. Given that I have the PDB is there any way to track this address to the source. I can't recompile using VC++ 8 as it doesn't support the mobile 4 SDK. My guess is that without a stack trace I'm not going to have much joy, as the chances are that the exception may not be in my source. Worth a try all the same. Edit As suggested, I have looked at the address in the context of the .MAP file for the program. This reveals the following Address Publics by Value Rva+Base Lib:Object 0001:00000000 ?GetUnduValue@@YANMM@Z 00011000 f 7Par.obj ' ' ' 0001:001124b8 ?OnLButtonUp@CGXGridUserDragSelectRangeImp@@UAAHPAVCGXGridCore@@AAVCPoint@@AAI@Z 001234b8 f gxseldrg.obj 0001:001126d8 ?OnSelDragStart@CGXGridUserDragSelectRangeImp@@UAAHPAVCGXGridCore@@KK@Z 001236d8 f gxseldrg.obj Which suggests the error occured during CGXGridUserDragSelectRangeImp::OnLButtonUp(), which seems a bit odd as I don't think there was a mouse / keyboard / screen button pressed at the time. Could be the stack got fragged before the crash got reported, and I'm wasting my time. I'll recompile with assembler output to try to isolate it to a given line, but don't hold out much hope :( Does the fact that the map file reports segmented addresses e.g. 0001:xxxxxxxxx and the crash report unsegmented addresses mean I have to carry out some computation to get the map address from the crash address?

    Read the article

  • Jquery- Get the value of first td in table

    - by Jerry
    Hello. I am trying to get the value of first td in each tr when a users clicks "click". The result below will output aa ,ee or ii. I was thinking about using clesest('tr')..but it always output "Object object". Not sure what to do on this one. Thanks. My html is <table> <tr> <td>aa</td> <td>bb</td> <td>cc</td> <td>dd</td> <td><a href="#" class="hit">click</a></td> </tr> <tr> <td>ee</td> <td>ff</td> <td>gg</td> <td>hh</td> <td><a href="#" class="hit">click</a></td> </tr> <tr> <td>ii</td> <td>jj</td> <td>kk</td> <td>ll</td> <td><a href="#" class="hit">click</a></td> </tr> </table> Jquery $(".hit").click(function(){ var value=$(this).// not sure what to do here alert(value) ; });

    Read the article

  • LVS Configuration issue (Using piranha Tool)

    - by PravinG
    I have configured LVS on cent os using piranha tool .I am using vip of internal n/w as gateway for real server we have two NIC one having exteranl Ip and other for internal n/w which is on 192.168.3.0/24 network. But I am not able to connect from client it shows connection refused error . Please suggest iptables rules for private n public n/w to communicate. May be I am missing this . Iptables rules that we have added are : iptables -t nat -A POSTROUTING -p tcp -s 192.168.3.0/24 --sport 5000 -j MASQUERADE this is my ipconfig: eth0 Link encap:Ethernet HWaddr 00:00:E8:F6:74:DA inet addr:122.166.233.133 Bcast:122.166.233.255 Mask:255.255.255.0 inet6 addr: fe80::200:e8ff:fef6:74da/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:94433 errors:0 dropped:0 overruns:0 frame:0 TX packets:130966 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:9469972 (9.0 MiB) TX bytes:19929308 (19.0 MiB) Interrupt:16 Base address:0x2000 eth0:1 Link encap:Ethernet HWaddr 00:00:E8:F6:74:DA inet addr:122.166.233.136 Bcast:122.166.233.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Interrupt:16 Base address:0x2000 eth1 Link encap:Ethernet HWaddr 00:E0:20:14:F9:2D inet addr:192.168.3.1 Bcast:192.168.3.255 Mask:255.255.255.0 inet6 addr: fe80::2e0:20ff:fe14:f92d/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:123718 errors:0 dropped:0 overruns:0 frame:0 TX packets:148856 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:18738556 (17.8 MiB) TX bytes:11697153 (11.1 MiB) Interrupt:17 Memory:60000400-600004ff eth1:1 Link encap:Ethernet HWaddr 00:E0:20:14:F9:2D inet addr:192.168.3.10 Bcast:192.168.3.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Interrupt:17 Memory:60000400-600004ff eth2 Link encap:Ethernet HWaddr 00:16:76:6E:D1:D2 UP BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) Interrupt:21 Base address:0xa500 and ipvsadm -ln command [root@abts-kk-static-133 ~]# ipvsadm -ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 122.166.233.136:5000 wlc TCP 122.166.233.136:5004 wlc lvs server routing table Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1 122.166.233.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0 169.254.0.0 0.0.0.0 255.255.0.0 U 1003 0 0 eth0 169.254.0.0 0.0.0.0 255.255.0.0 U 1004 0 0 eth1 0.0.0.0 122.166.233.1 0.0.0.0 UG 0 0 0 eth0 real 1 real 2 we have configured various ports from 5000:5008 . Do we need to this iptables for all ports? Suggest me how should I solve this issue.

    Read the article

  • jquery masked input with asp.net control problem

    - by Eyla
    Hi, I have problem while using jquery maskedinput with asp.net textbox. I have a check box that when I check will set a mask to the textbox and when uncheck it change the mask. the problem that when the focus is lost before the mask completed to be filled the text box will empty. how can I fix the problem??? here is my code: <html xmlns="http://www.w3.org/1999/xhtml"> <head runat="server"> <title>Untitled Page</title> <script src="js/jquery-1.4.1.js" type="text/javascript"></script> <script src="js/jquery.maskedinput-1.2.2.js" type="text/javascript"></script> <script type="text/javascript"> function mycheck() { if ($('#<%=chk.ClientID %>').is(':checked')) { $("#<%=txt.ClientID %>").unmask(); $("#<%=txt.ClientID %>").val(""); $("#<%=txt.ClientID %>").mask("999999999999"); } else { $("#<%=txt.ClientID %>").unmask(); $("#<%=txt.ClientID %>").val(""); $("#<%=txt.ClientID %>").mask("(999)999-9999"); } } </script> <style type="text/css"> #form1 { margin-top: 0px; } </style> </head> <body> <form id="form1" runat="server"> <asp:ScriptManager ID="ScriptManager1" runat="server" /> <div> <p> <asp:CheckBox ID="chk" runat="server" CssClass="kk" onclick="mycheck()" /> </p> <p> <asp:TextBox ID="txt" runat="server" CssClass="tt" ></asp:TextBox> </p> </div> </form> </body> </html>

    Read the article

  • Can someone help me with this Java Chess game please?

    - by Chris Edwards
    Hey guys, Please can someone have a look at this code and let me know whether I am on the right track with the "check_somefigure_move"s and the "check_black/white_promotion"s please? And also any other help you can give would be greatly appreciated! Thanks! P.S. I know the code is not the best implementation, but its a template I have to follow :( Code: class Moves { private final Board B; private boolean regular; public Moves(final Board b) { B = b; regular = regular_position(); } public boolean get_regular_position() { return regular; } public void set_regular_position(final boolean new_reg) { regular = new_reg; } // checking whether B represents a "normal" position or not; // if not, then only simple checks regarding move-correctness should // be performed, only checking the direct characteristics of the figure // moved; // checks whether there is exactly one king of each colour, there are // no more figures than promotions allow, and there are no pawns on the // first or last rank; public boolean regular_position() { int[] counts = new int[256]; for (char file = 'a'; file <= 'h'; ++file) for (char rank = '1'; rank <= '8'; ++rank) ++counts[(int) B.get(file,rank)]; if (counts[Board.white_king] != 1 || counts[Board.black_king] != 1) return false; if (counts[Board.white_pawn] > 8 || counts[Board.black_pawn] > 8) return false; int count_w_promotions = 0; count_w_promotions += Math.max(counts[Board.white_queen]-1,0); count_w_promotions += Math.max(counts[Board.white_rook]-2,0); count_w_promotions += Math.max(counts[Board.white_bishop]-2,0); count_w_promotions += Math.max(counts[Board.white_knight]-2,0); if (count_w_promotions > 8 - counts[Board.white_pawn]) return false; int count_b_promotions = 0; count_b_promotions += Math.max(counts[Board.black_queen]-1,0); count_b_promotions += Math.max(counts[Board.black_rook]-2,0); count_b_promotions += Math.max(counts[Board.black_bishop]-2,0); count_b_promotions += Math.max(counts[Board.black_knight]-2,0); if (count_b_promotions > 8 - counts[Board.black_pawn]) return false; for (char file = 'a'; file <= 'h'; ++file) { final char fig1 = B.get(file,'1'); if (fig1 == Board.white_pawn || fig1 == Board.black_pawn) return false; final char fig8 = B.get(file,'8'); if (fig8 == Board.white_pawn || fig8 == Board.black_pawn) return false; } return true; } public boolean check_normal_white_move(final char file0, final char rank0, final char file1, final char rank1) { if (! Board.is_valid_white_figure(B.get(file0,rank0))) return false; if (! B.is_empty(file1,rank1) && ! Board.is_valid_black_figure(B.get(file1,rank1))) return false; if (B.get_active_colour() != 'w') return false; if (! check_move_simple(file0,rank0,file1,rank1)) return false; if (! regular) return true; final Board test_board = new Board(B); test_board.normal_white_move_0(file0,rank0,file1,rank1); final Moves test_move = new Moves(test_board); final char[] king_pos = test_move.white_king_position(); assert(king_pos.length == 2); return test_move.black_not_attacking(king_pos[0],king_pos[1]); } public boolean check_normal_black_move(final char file0, final char rank0, final char file1, final char rank1) { // ADDED THE CHECK NORMAL BLACK MOVE BASED ON THE CHECK NORMAL WHITE MOVE if (! Board.is_valid_black_figure(B.get(file0,rank0))) return false; if (! B.is_empty(file1,rank1) && ! Board.is_valid_white_figure(B.get(file1,rank1))) return false; if (B.get_active_colour() != 'b') return false; if (! check_move_simple(file0,rank0,file1,rank1)) return false; if (! regular) return true; final Board test_board = new Board(B); test_board.normal_black_move_0(file0,rank0,file1,rank1); final Moves test_move = new Moves(test_board); final char[] king_pos = test_move.black_king_position(); assert(king_pos.length == 2); return test_move.white_not_attacking(king_pos[0],king_pos[1]); } // for checking a normal move by just applying the move-rules private boolean check_move_simple(final char file0, final char rank0, final char file1, final char rank1) { final char fig = B.get(file0,rank0); if (fig == Board.white_king || fig == Board.black_king) return check_king_move(file0,rank0,file1,rank1); if (fig == Board.white_queen || fig == Board.black_queen) return check_queen_move(file0,rank0,file1,rank1); if (fig == Board.white_rook || fig == Board.black_rook) return check_rook_move(file0,rank0,file1,rank1); if (fig == Board.white_bishop || fig == Board.black_bishop) return check_bishop_move(file0,rank0,file1,rank1); if (fig == Board.white_knight || fig == Board.black_knight) return check_knight_move(file0,rank0,file1,rank1); if (fig == Board.white_pawn) return check_white_pawn_move(file0,rank0,file1,rank1); else return check_black_pawn_move(file0,rank0,file1,rank1); } private boolean check_king_move(final char file0, final char rank0, final char file1, final char rank1) { // ADDED KING MOVE int fileChange = file0 - file1; int rankChange = rank0 - rank1; return fileChange <= 1 && fileChange >= -1 && rankChange <= 1 && rankChange >= -1; } private boolean check_queen_move(final char file0, final char rank0, final char file1, final char rank1) { // ADDED QUEEN MOVE int fileChange = file0 - file1; int rankChange = rank0 - rank1; return fileChange <=8 && fileChange >= -8 && rankChange <= 8 && rankChange >= -8; } private boolean check_rook_move(final char file0, final char rank0, final char file1, final char rank1) { // ADDED ROOK MOVE int fileChange = file0 - file1; int rankChange = rank0 - rank1; return fileChange <=8 || fileChange >= -8 || rankChange <= 8 || rankChange >= -8; } private boolean check_bishop_move(final char file0, final char rank0, final char file1, final char rank1) { // ADDED BISHOP MOVE int fileChange = file0 - file1; int rankChange = rank0 - rank1; return fileChange <= 8 && rankChange <= 8 || fileChange <= 8 && rankChange >= -8 || fileChange >= -8 && rankChange >= -8 || fileChange >= -8 && rankChange <= 8; } private boolean check_knight_move(final char file0, final char rank0, final char file1, final char rank1) { // ADDED KNIGHT MOVE int fileChange = file0 - file1; int rankChange = rank0 - rank1; /* IS THIS THE CORRECT WAY? * return fileChange <= 1 && rankChange <= 2 || fileChange <= 1 && rankChange >= -2 || fileChange <= 2 && rankChange <= 1 || fileChange <= 2 && rankChange >= -1 || fileChange >= -1 && rankChange <= 2 || fileChange >= -1 && rankChange >= -2 || fileChange >= -2 && rankChange <= 1 || fileChange >= -2 && rankChange >= -1;*/ // OR IS THIS? return fileChange <= 1 || fileChange >= -1 || fileChange <= 2 || fileChange >= -2 && rankChange <= 1 || rankChange >= - 1 || rankChange <= 2 || rankChange >= -2; } private boolean check_white_pawn_move(final char file0, final char rank0, final char file1, final char rank1) { // ADDED PAWN MOVE int fileChange = file0 - file1; int rankChange = rank0 - rank1; return fileChange == 0 && rankChange <= 1; } private boolean check_black_pawn_move(final char file0, final char rank0, final char file1, final char rank1) { // ADDED PAWN MOVE int fileChange = file0 - file1; int rankChange = rank0 - rank1; return fileChange == 0 && rankChange >= -1; } public boolean check_white_kingside_castling() { // only demonstration code: final char c = B.get_white_castling(); if (c == '-' || c == 'q') return false; if (B.get_active_colour() == 'b') return false; if (B.get('e','1') != 'K') return false; if (! black_not_attacking('e','1')) return false; if (! free_white('f','1')) return false; // XXX return true; } public boolean check_white_queenside_castling() { // only demonstration code: final char c = B.get_white_castling(); if (c == '-' || c == 'k') return false; if (B.get_active_colour() == 'b') return false; // ADDED BASED ON KINGSIDE CASTLING if (B.get('e','1') != 'Q') return false; if (! black_not_attacking('e','1')) return false; if (! free_white('f','1')) return false; // XXX return true; } public boolean check_black_kingside_castling() { // only demonstration code: final char c = B.get_black_castling(); if (c == '-' || c == 'q') return false; if (B.get_active_colour() == 'w') return false; // ADDED BASED ON CHECK WHITE if (B.get('e','8') != 'K') return false; if (! black_not_attacking('e','8')) return false; if (! free_white('f','8')) return false; // XXX return true; } public boolean check_black_queenside_castling() { // only demonstration code: final char c = B.get_black_castling(); if (c == '-' || c == 'k') return false; if (B.get_active_colour() == 'w') return false; // ADDED BASED ON KINGSIDE CASTLING if (B.get('e','8') != 'Q') return false; if (! black_not_attacking('e','8')) return false; if (! free_white('f','8')) return false; // XXX return true; } public boolean check_white_promotion(final char pawn_file, final char figure) { // XXX // ADDED CHECKING FOR CORRECT FIGURE AND POSITION - ALTHOUGH IT SEEMS AS THOUGH // PAWN_FILE SHOULD BE PAWN_RANK, AS IT IS THE REACHING OF THE END RANK THAT // CAUSES PROMOTION OF A PAWN, NOT FILE if (figure == P && pawn_file == 8) { return true; } else return false; } public boolean check_black_promotion(final char pawn_file, final char figure) { // XXX // ADDED CHECKING FOR CORRECT FIGURE AND POSITION if (figure == p && pawn_file == 1) { return true; } else return false; } // checks whether black doesn't attack the field: public boolean black_not_attacking(final char file, final char rank) { // XXX return true; } public boolean free_white(final char file, final char rank) { // XXX return black_not_attacking(file,rank) && B.is_empty(file,rank); } // checks whether white doesn't attack the field: public boolean white_not_attacking(final char file, final char rank) { // XXX return true; } public boolean free_black(final char file, final char rank) { // XXX return white_not_attacking(file,rank) && B.is_empty(file,rank); } public char[] white_king_position() { for (char file = 'a'; file <= 'h'; ++file) for (char rank = '1'; rank <= '8'; ++rank) if (B.get(file,rank) == Board.white_king) { char[] result = new char[2]; result[0] = file; result[1] = rank; return result; } return new char[0]; } public char[] black_king_position() { for (char file = 'a'; file <= 'h'; ++file) for (char rank = '1'; rank <= '8'; ++rank) if (B.get(file,rank) == Board.black_king) { char[] result = new char[2]; result[0] = file; result[1] = rank; return result; } return new char[0]; } public static void main(final String[] args) { // checking regular_position { Moves m = new Moves(new Board()); assert(m.regular_position()); m = new Moves(new Board("8/8/8/8/8/8/8/8 w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("KK6/8/8/8/8/8/8/8 w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("kk6/8/8/8/8/8/8/8 w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("Kk6/8/8/8/8/8/8/8 w - - 0 1")); assert(m.regular_position()); m = new Moves(new Board("Kk6/qqqqqqqq/QQQQQQQQ/Q7/q7/rrbbnn2/RRBBNN2/8 w - - 0 1")); assert(m.regular_position()); m = new Moves(new Board("Kk6/qqqqqqqq/QQQQQQQQ/Q7/q7/rrbbnn2/RRBBNN2/n7 w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("Kk6/qqqqqqqq/QQQQQQQQ/Q7/q7/rrbbnn2/RRBBNN2/N7 w - - 0 1")); m = new Moves(new Board("Kk6/qqqqqqqq/QQQQQQQQ/Q7/q7/rrbbnn2/RRBBNN2/b7 w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("Kk6/qqqqqqqq/QQQQQQQQ/Q7/q7/rrbbnn2/RRBBNN2/B7 w - - 0 1")); m = new Moves(new Board("Kk6/qqqqqqqq/QQQQQQQQ/Q7/q7/rrbbnn2/RRBBNN2/r7 w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("Kk6/qqqqqqqq/QQQQQQQQ/Q7/q7/rrbbnn2/RRBBNN2/R7 w - - 0 1")); m = new Moves(new Board("Kk6/qqqqqqqq/QQQQQQQQ/Q7/q7/rrbbnn2/RRBBNN2/q7 w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("Kk6/qqqqqqqq/QQQQQQQQ/Q7/q7/rrbbnn2/RRBBNN2/Q7 w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("Kkp5/8/8/8/8/8/8/8 w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("KkP5/8/8/8/8/8/8/8 w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("Kk6/8/8/8/8/8/8/7p w - - 0 1")); assert(!m.regular_position()); m = new Moves(new Board("Kk6/8/8/8/8/8/8/7P w - - 0 1")); assert(!m.regular_position()); } // checking check_white/black_king/queenside_castling { Moves m = new Moves(new Board("4k2r/8/8/8/8/8/8/4K2R w Kk - 0 1")); assert(!m.check_white_kingside_castling()); assert(!m.check_black_kingside_castling()); assert(!m.check_white_queenside_castling()); assert(!m.check_black_queenside_castling()); m = new Moves(new Board("4k2r/8/8/8/8/8/8/4K2R b Kk - 0 1")); assert(!m.check_white_kingside_castling()); assert(!m.check_black_kingside_castling()); assert(!m.check_white_queenside_castling()); assert(!m.check_black_queenside_castling()); m = new Moves(new Board("4k2r/4pppp/8/8/8/8/4PPPP/4K2R w KQkq - 0 1")); assert(m.check_white_kingside_castling()); assert(!m.check_black_kingside_castling()); assert(!m.check_white_queenside_castling()); assert(!m.check_black_queenside_castling()); m = new Moves(new Board("4k2r/4pppp/8/8/8/8/4PPPP/4K2R b KQkq - 0 1")); assert(!m.check_white_kingside_castling()); assert(m.check_black_kingside_castling()); assert(!m.check_white_queenside_castling()); assert(!m.check_black_queenside_castling()); m = new Moves(new Board("r3k3/8/8/8/8/8/8/R3K3 w Qq - 0 1")); assert(!m.check_white_kingside_castling()); assert(!m.check_black_kingside_castling()); assert(!m.check_white_queenside_castling()); assert(!m.check_black_queenside_castling()); m = new Moves(new Board("r3k3/8/8/8/8/8/8/R3K3 b Qq - 0 1")); assert(!m.check_white_kingside_castling()); assert(!m.check_black_kingside_castling()); assert(!m.check_white_queenside_castling()); assert(!m.check_black_queenside_castling()); m = new Moves(new Board("r3k3/p7/8/8/8/8/8/R3K3 w Qq - 0 1")); assert(!m.check_white_kingside_castling()); assert(!m.check_black_kingside_castling()); assert(m.check_white_queenside_castling()); assert(!m.check_black_queenside_castling()); m = new Moves(new Board("r3k3/p7/8/8/8/8/8/R3K3 b Qq - 0 1")); assert(!m.check_white_kingside_castling()); assert(!m.check_black_kingside_castling()); assert(!m.check_white_queenside_castling()); assert(m.check_black_queenside_castling()); m = new Moves(new Board("r3k3/p7/8/8/8/n7/8/R3K3 w Qq - 0 1")); assert(!m.check_white_kingside_castling()); assert(!m.check_black_kingside_castling()); assert(!m.check_white_queenside_castling()); assert(!m.check_black_queenside_castling()); m = new Moves(new Board("r3k3/p7/B7/8/8/8/8/R3K3 b Qq - 0 1")); assert(!m.check_white_kingside_castling()); assert(!m.check_black_kingside_castling()); assert(!m.check_white_queenside_castling()); assert(!m.check_black_queenside_castling()); // XXX } } }

    Read the article

  • how to use serial port in UDK using windows DLL and DLLBind directive?

    - by Shayan Abbas
    I want to use serial port in UDK, For that purpose i use a windows DLL and DLLBind directive. I have a thread in windows DLL for serial port data recieve event. My problem is: this thread doesn't work properly. Please Help me. below is my code SerialPortDLL Code: // SerialPortDLL.cpp : Defines the exported functions for the DLL application. // #include "stdafx.h" #include "Cport.h" extern "C" { // This is an example of an exported variable //SERIALPORTDLL_API int nSerialPortDLL=0; // This is an example of an exported function. //SERIALPORTDLL_API int fnSerialPortDLL(void) //{ // return 42; //} CPort *sp; __declspec(dllexport) void Open(wchar_t* portName) { sp = new CPort(portName); //MessageBox(0,L"ha ha!!!",L"ha ha",0); //MessageBox(0,portName,L"ha ha",0); } __declspec(dllexport) void Close() { sp->Close(); MessageBox(0,L"ha ha!!!",L"ha ha",0); } __declspec(dllexport) wchar_t *GetData() { return sp->GetData(); } __declspec(dllexport) unsigned int GetDSR() { return sp->getDSR(); } __declspec(dllexport) unsigned int GetCTS() { return sp->getCTS(); } __declspec(dllexport) unsigned int GetRing() { return sp->getRing(); } } CPort class code: #include "stdafx.h" #include "CPort.h" #include "Serial.h" CSerial serial; HANDLE HandleOfThread; LONG lLastError = ERROR_SUCCESS; bool fContinue = true; HANDLE hevtOverlapped; HANDLE hevtStop; OVERLAPPED ov = {0}; //char szBuffer[101] = ""; wchar_t *szBuffer = L""; wchar_t *data = L""; DWORD WINAPI ThreadHandler( LPVOID lpParam ) { // Keep reading data, until an EOF (CTRL-Z) has been received do { MessageBox(0,L"ga ga!!!",L"ga ga",0); //Sleep(10); // Wait for an event lLastError = serial.WaitEvent(&ov); if (lLastError != ERROR_SUCCESS) { //LOG( " Unable to wait for a COM-port event" ); } // Setup array of handles in which we are interested HANDLE ahWait[2]; ahWait[0] = hevtOverlapped; ahWait[1] = hevtStop; // Wait until something happens switch (::WaitForMultipleObjects(sizeof(ahWait)/sizeof(*ahWait),ahWait,FALSE,INFINITE)) { case WAIT_OBJECT_0: { // Save event const CSerial::EEvent eEvent = serial.GetEventType(); // Handle break event if (eEvent & CSerial::EEventBreak) { //LOG( " ### BREAK received ###" ); } // Handle CTS event if (eEvent & CSerial::EEventCTS) { //LOG( " ### Clear to send %s ###", serial.GetCTS() ? "on":"off" ); } // Handle DSR event if (eEvent & CSerial::EEventDSR) { //LOG( " ### Data set ready %s ###", serial.GetDSR() ? "on":"off" ); } // Handle error event if (eEvent & CSerial::EEventError) { switch (serial.GetError()) { case CSerial::EErrorBreak: /*LOG( " Break condition" );*/ break; case CSerial::EErrorFrame: /*LOG( " Framing error" );*/ break; case CSerial::EErrorIOE: /*LOG( " IO device error" );*/ break; case CSerial::EErrorMode: /*LOG( " Unsupported mode" );*/ break; case CSerial::EErrorOverrun: /*LOG( " Buffer overrun" );*/ break; case CSerial::EErrorRxOver: /*LOG( " Input buffer overflow" );*/ break; case CSerial::EErrorParity: /*LOG( " Input parity error" );*/ break; case CSerial::EErrorTxFull: /*LOG( " Output buffer full" );*/ break; default: /*LOG( " Unknown" );*/ break; } } // Handle ring event if (eEvent & CSerial::EEventRing) { //LOG( " ### RING ###" ); } // Handle RLSD/CD event if (eEvent & CSerial::EEventRLSD) { //LOG( " ### RLSD/CD %s ###", serial.GetRLSD() ? "on" : "off" ); } // Handle data receive event if (eEvent & CSerial::EEventRecv) { // Read data, until there is nothing left DWORD dwBytesRead = 0; do { // Read data from the COM-port lLastError = serial.Read(szBuffer,33,&dwBytesRead); if (lLastError != ERROR_SUCCESS) { //LOG( "Unable to read from COM-port" ); } if( dwBytesRead == 33 && szBuffer[0]=='$' ) { // Finalize the data, so it is a valid string szBuffer[dwBytesRead] = '\0'; ////LOG( "\n%s\n", szBuffer ); data = szBuffer; } } while (dwBytesRead > 0); } } break; case WAIT_OBJECT_0+1: { // Set the continue bit to false, so we'll exit fContinue = false; } break; default: { // Something went wrong //LOG( "Error while calling WaitForMultipleObjects" ); } break; } } while (fContinue); MessageBox(0,L"kka kk!!!",L"kka ga",0); return 0; } CPort::CPort(wchar_t *portName) { // Attempt to open the serial port (COM2) //lLastError = serial.Open(_T(portName),0,0,true); lLastError = serial.Open(portName,0,0,true); if (lLastError != ERROR_SUCCESS) { //LOG( "Unable to open COM-port" ); } // Setup the serial port (115200,8N1, which is the default setting) lLastError = serial.Setup(CSerial::EBaud115200,CSerial::EData8,CSerial::EParNone,CSerial::EStop1); if (lLastError != ERROR_SUCCESS) { //LOG( "Unable to set COM-port setting" ); } // Register only for the receive event lLastError = serial.SetMask(CSerial::EEventBreak | CSerial::EEventCTS | CSerial::EEventDSR | CSerial::EEventError | CSerial::EEventRing | CSerial::EEventRLSD | CSerial::EEventRecv); if (lLastError != ERROR_SUCCESS) { //LOG( "Unable to set COM-port event mask" ); } // Use 'non-blocking' reads, because we don't know how many bytes // will be received. This is normally the most convenient mode // (and also the default mode for reading data). lLastError = serial.SetupReadTimeouts(CSerial::EReadTimeoutNonblocking); if (lLastError != ERROR_SUCCESS) { //LOG( "Unable to set COM-port read timeout" ); } // Create a handle for the overlapped operations hevtOverlapped = ::CreateEvent(0,TRUE,FALSE,0);; if (hevtOverlapped == 0) { //LOG( "Unable to create manual-reset event for overlapped I/O" ); } // Setup the overlapped structure ov.hEvent = hevtOverlapped; // Open the "STOP" handle hevtStop = ::CreateEvent(0,TRUE,FALSE,_T("Overlapped_Stop_Event")); if (hevtStop == 0) { //LOG( "Unable to create manual-reset event for stop event" ); } HandleOfThread = CreateThread( NULL, 0, ThreadHandler, 0, 0, NULL); } CPort::~CPort() { //fContinue = false; //CloseHandle( HandleOfThread ); //serial.Close(); } void CPort::Close() { fContinue = false; CloseHandle( HandleOfThread ); serial.Close(); } wchar_t *CPort::GetData() { return data; } bool CPort::getCTS() { return serial.GetCTS(); } bool CPort::getDSR() { return serial.GetDSR(); } bool CPort::getRing() { return serial.GetRing(); } Unreal Script Code: class MyPlayerController extends GamePlayerController DLLBind(SerialPortDLL); dllimport final function Open(string portName); dllimport final function Close(); dllimport final function string GetData();

    Read the article

  • JUnit testing, exception in threa main

    - by Crystal
    I am new to JUnit and am trying to follow my prof's example. I have a Person class and a PersonTest class. When I try to compile PersonTest.java, I get the following error: Exception in thread "main" java.lang.NoSuchMethodError: main I am not really sure why since I followed his example. Person.java public class Person implements Comparable { String firstName; String lastName; String telephone; String email; public Person() { firstName = ""; lastName = ""; telephone = ""; email = ""; } public Person(String firstName) { this.firstName = firstName; } public Person(String firstName, String lastName, String telephone, String email) { this.firstName = firstName; this.lastName = lastName; this.telephone = telephone; this.email = email; } public String getFirstName() { return firstName; } public void setFirstName(String firstName) { this.firstName = firstName; } public String getLastName() { return lastName; } public void setLastName(String lastName) { this.lastName = lastName; } public String getTelephone() { return telephone; } public void setTelephone(String telephone) { this.telephone = telephone; } public String getEmail() { return email; } public void setEmail(String email) { this.email = email; } public int compareTo(Object o) { String s1 = this.lastName + this.firstName; String s2 = ((Person) o).lastName + ((Person) o).firstName; return s1.compareTo(s2); } public boolean equals(Object otherObject) { // a quick test to see if the objects are identical if (this == otherObject) { return true; } // must return false if the explicit parameter is null if (otherObject == null) { return false; } if (!(otherObject instanceof Person)) { return false; } Person other = (Person) otherObject; return firstName.equals(other.firstName) && lastName.equals(other.lastName) && telephone.equals(other.telephone) && email.equals(other.email); } public int hashCode() { return this.email.toLowerCase().hashCode(); } public String toString() { return getClass().getName() + "[firstName = " + firstName + '\n' + "lastName = " + lastName + '\n' + "telephone = " + telephone + '\n' + "email = " + email + "]"; } } PersonTest.java import org.junit.Test; // JDK 5.0 annotation support import static org.junit.Assert.assertTrue; // Using JDK 5.0 static imports import static org.junit.Assert.assertFalse; // Using JDK 5.0 static imports import junit.framework.JUnit4TestAdapter; // Need this to be compatible with old test driver public class PersonTest { /** A test to verify equals method. */ @Test public void checkEquals() { Person p1 = new Person("jj", "aa", "[email protected]", "1112223333"); assertTrue(p1.equals(p1)); // first check in equals method assertFalse(p1.equals(null)); // second check in equals method assertFalse(p1.equals(new Object())); // third chk in equals method Person p2 = new Person("jj", "aa", "[email protected]", "1112223333"); assertTrue(p1.equals(p2)); // check for deep comparison p1 = new Person("jj", "aa", "[email protected]", "1112223333"); p2 = new Person("kk", "aa", "[email protected]", "1112223333"); assertFalse(p1.equals(p2)); // check for deep comkparison } }

    Read the article

  • top tweets WebLogic Partner Community – September 2012

    - by JuergenKress
    Send your tweets @wlscommunity #WebLogicCommunity and follow us at http://twitter.com/wlscommunity Oracle Exalogic? VIDEO: Oracle Public Cloud Built on Exalogic!, http://www.youtube.com/watch?v=UGzjDloUw_s&feature=plcp … oracleopenworld #NetBeans Community Day at #JavaOne http://ow.ly/dunFL Oracle Cloud Zone Building an enterprise Cloud? Have Oracle show you the RIGHT way to plan, deploy and monitor enterprise clouds.... http://fb.me/286978S4S OTNArchBeat? Oracle Exalogic X2-2 walk-through with Brad Cameron | @jvzoggel http://pub.vitrue.com/yE7d Oracle Technet? Stash your cash. September OTN Member Offers - discounts on books, more | OTN Blog http://pub.vitrue.com/yTr9 C2B2 Consulting? C2B2 is Speaking at @UKOUG App Server Middleware SIG Meeting 'Real Life #WebLogic Performance Tuning' http://www.c2b2.co.uk/ukoug_application_server_middleware_sig_meeting … @wlscommunity JAXenter.com? From yesterday, @smeyen offers his views on the next generation #Java - do you agree? http://jaxenter.com/next-gen-java-we-don-t-need-another-revolutionary-44334.html … Markus Eisele? Awesome: professor from ITU uploads her programming lectures to #YouTube. Programming classes without having to pay! http://bit.ly/UtkJIW Adam Bien? Real World Java EE 6 Patterns--Rethinking Best Practices Reloaded: A completely rewritten, second, iteration of ... http://bit.ly/Qc3xTH Markus Eisele [blog] #PrimeFaces Push with #Atmosphere on #GlassFish 3.1.2.2 http://goo.gl/fb/jPDzA Lucas Jellema? Forms community event at Oracle Open World - Tuesday, 2nd Oct with the BIG names in Forms - see: http://oracleformsinfo.wordpress.com/2012/08/28/ask-the-product-manager-join-us-at-the-oracle-forms-community-event-at-openworld-2012/ … WebLogic Community WebLogic & Coherence & Cloud presentations for customer meetings http://wp.me/p1LMIb-kw Adam Bien? New Book: Rethinking Best Practices with Java EE 6 is out: http://realworldpatterns.com (fully rewritten, re-edited and reformatted) WebLogic Community? Want to become and WebLogic 12c expert? free WebLogic 12c partner bootcamps &ndash;new locations: Madrid Spain http://wp.me/p1LMIb-kK WebLogic Community? Promote Your WebLogic events at http://oracle.com http://wp.me/p1LMIb-ku OracleBlogs Gartner review Oracle ADF http://ow.ly/1mgkCV Simon Haslam Next #ukoug App Server & MW SIG on 10 Oct: http://www.ukoug.org/events/ukoug-application-server-and-middleware-sig-meeting8/ … Hopefully plenty of good admin stuff! Michel Schildmeijer My book "WebLogic 12c; First look" has been reviewed again..see http://www.amazon.com/review/R28L6E3CC9RPMK/ref=cm_cr_pr_perm?ie=UTF8&ASIN=1849687188&linkCode=&nodeID=&tag= … … Markus Eisele? #Weblogic 11g Interactive Quick Reference Map: http://bit.ly/Ugsq52 #wls #oracle #reference /via @TonyvanEsch Marc? Playing with #syslog server and #weblogic. Is there a simple how-to to configure all the logging from #WLS to #syslog-ng WebLogic Community Java update http://wp.me/p1LMIb-kI WebLogic Community top tweets WebLogic Partner Community &ndash; August 2012 http://wp.me/p1LMIb-kA Andrejus Baranovskis? Oracle University Training: ADF/WebCenter 11g Development in Depth | Andrejus Baranovskis http://fb.me/253ZTS2zp OracleSupport_WLS? How neat is a free tool that allows you to inspect and debug traffic from virtually any application? http://pub.vitrue.com/vXdP WebLogic Community WebLogic Partner Community Newsletter August 2012 http://wp.me/p1LMIb-kn OTNArchBeat Integrating Coherence & Java EE 6 Applications using ActiveCache | Ricardo Ferreira http://pub.vitrue.com/rwGg Adam Bien? Thanks for attending the #javaee #techtalk "Enterprise Java 2.0" I pushed the project and slides to: http://kenai.com/projects/javaee-patterns/sources/hg/show/hacks/techtalk2012?rev=429 … JDeveloper & ADF? How to service-enable Oracle ADF Business Components http://ht.ly/1mcfsZ OracleSupport_WLS? Do you know that #WebLogic 12.1.1.0 is certified for production with JDK 7? @ http://pub.vitrue.com/35Kn Andreas Koop? My latest upload : WebLogic Administration und Deployment mit WLST on @slideshare http://www.slideshare.net/multikoop/weblogic-administration-und-deployment-mit-wlst … OTNArchBeat? Demo for OPN: Oracle Coherence Management with EM Cloud Control 12c http://pub.vitrue.com/reoo Markus Eisele? [blog] Java Champions at #JavaOne 2012 http://goo.gl/fb/Ibb6N #javachampion OracleBlogs Buy This Book!: Oracle Exalogic Elastic Cloud Handbook http://ow.ly/1malM1 WebLogic Community? Coherence Management with EM Cloud Control 12c &ndash;demo for partners http://wp.me/p1LMIb-iE Arun Gupta? Learn how Java can help Internet of Things at Java Embedded at JavaOne: http://bit.ly/POBizh WebLogic Community? Follow WebLogicCommunity on facebook http://www.facebook.com/WebLogicCommunity … #WebLogicCommunity WebLogic Community? Building Java EE in the Cloud–Webcast August 30th 2012 https://weblogiccommunity.wordpress.com/2012/08/27/building-java-ee-in-the-cloudwebcast-august-30th-2012/ … #WebLogicCommunity #Java #oracle #opn WebLogic Community? Call for WebLogic Community newsletter content. Please send @wlscommunity #WebLogicCommunity OracleSupport_WLS? The #weblogic wasp: lots of tips, Q&A and examples http://pub.vitrue.com/v0bw Frank Nimphius? Free ADF Best Practices Webinar by Andrejus Baranovskis for ODTUG (18, 2012 12:00 PM - 1:00 PM EDT) http://bit.ly/OiSWbi ADF Code Corner Webcast- Friday September 14, 8:30 AM - 9.00 AM (CET) - ADF as a basis of Fusion Apps (in English) - with Chris Muir: http://bit.ly/OiQVMb Oracle WebLogic? New blog post: Developing Custom User Principal Object http://pub.vitrue.com/ltam JAX London? Just 4days left to get in on the early bird special, don't miss out!! http://jaxlondon.com/ #JAXLondon #Java WebLogic Community Building Java EE in the Cloud&ndash;Webcast August 30th 2012 http://wp.me/p1LMIb-kE Andrejus Baranovskis? New Record Master-Detail Validation and ADF BC Groovy Use Case http://fb.me/1D2NEIl8g JAX London? Don't miss out!!! Only 6 days left to make use of our early bird offer #JAXLondon #JAVA http://jaxlondon.com/ Michel Schildmeijer Qualogy launches Proof of Concept Center for Oracle Fusion Applications http://www.qualogy.com/qualogy-launches-proof-of-concept-center-for-oracle-fusion-applications/ … via @Qualogy_news OracleSupport_WLS ?Need to troubleshoot redeployment failure in #Weblogic? Check this http://pub.vitrue.com/auhz OracleEnterpriseMgr? Blog : Managing Oracle #Exalogic Elastic #Cloud with Oracle Enterprise Manager Ops Center http://ow.ly/dd40e #em12c ODTUG? Want free advanced technical ADF training?Join @andrejusb for an @odtug webinar! check out his blog for more info http://bit.ly/SvKJDq chriscmuir Oracle Open World 2012 and ADF EMG http://zite.to/QyusZE OTNArchBeat? Boost your infrastructure with Coherence into the Cloud | Nino Guarnacci http://pub.vitrue.com/v3aJ WebLogic Community? Presentations & Training material OFM Summer Camps & Impressions & Feedback http://wp.me/p1LMIb-ks Arun Gupta? Java EE 6 pocket guide by O'Reilly available for pre-order from Amazon: http://amzn.to/O6YyoP and B&N: http://bit.ly/NjWLk1 OTNArchBeat Joining the Existing Cluster in Coherence | A. Fuat Sungur http://pub.vitrue.com/6uLh Andrejus Baranovskis Sample Application for Switching Application Module Data Sources http://fb.me/1PSURUzch OTNArchBeat Oracle WebLogic DevCast: Building Java EE in the Cloud - August 30 - 10am PT/ 1pm ET http://pub.vitrue.com/xXg0 OTNArchBeat? GlassFish Community Event at #javaone - Sept 30 -11am – 1pm -Moscone South. Register Now! http://pub.vitrue.com/p2f5 OracleSupport_WLS? Connecting To HTTPS Site Using Simple Java Program When Using Proxy http://pub.vitrue.com/stVv Michel Schildmeijer? Before you go to #OOW take the sneak preview of WebLogic 12c with you: http://www.qualogy.com/ga-nog-niet-naar-oow-en-neem-mee-weblogic-12c/ … via @Qualogy_news Simon Haslam? Even more great ADF content at #oow2012 this year including a packed ADF EMG day on Sunday: https://blogs.oracle.com/onesizedoesntfitall/entry/the_year_after_the_year … OracleBlogs ExaLogic trainings for partners http://ow.ly/1m6a5D Robin? First presentation on DOAG conference (thanks to @Steffen2042) "Weblogic Server for Dummies". Now I´m pretty excited :) http://www.doag.org/de/events/konferenzen/doag-2012.html … Markus Eisele There is a #facebook page for the upcoming #Java Mission Control (JRockit Mission Control for #Hotspot)! ttp://on.fb.me/Q31oyA Adam Bien? The almost free #javaee workshop in Rapperswil has only 60 registrations so far: http://www.adam-bien.com/roller/abien/entry/enterprise_java_2_0_swiss … What's the problem? :-) WebLogic Community ExaLogic trainings for partners http://wp.me/p1LMIb-iC OracleBlogs How to install Oracle Weblogic Server using Generic Package installer? http://ow.ly/1m5ms7 OracleSupport_WLS #Weblogic Server new blog post - Developing Custom User Principal Object http://pub.vitrue.com/ltam OracleBlogs? Architects and Architecture at JavaOne 2012 http://ow.ly/1m4oS5 WebLogic Community Are you WebLogic or Application Grid Specialized? Do you get Recognized? Get your plaque https://weblogiccommunity.wordpress.com/2012/08/21/plaques-weblogic-application-grid-specialization/ … #WebLogicCommunity #opn WebLogic Community? Plaques WebLogic & Application Grid Specialization http://wp.me/p1LMIb-iA JDeveloper & ADF? First Steps With Oracle Application Testing Suite: Recording a Test With OpenScript http://dlvr.it/222npy WebLogic Partner Community For regular information become a member in the WebLogic Partner Community please visit: http://www.oracle.com/partners/goto/wls-emea ( OPN account required). If you need support with your account please contact the Oracle Partner Business Center. Blog Twitter LinkedIn Mix Forum Wiki Technorati Tags: twitter,WebLogic,WebLogic Community,Oracle,OPN,Jürgen Kress

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

< Previous Page | 1 2 3