results - Page 244 - Developer IT

Using a "white list" for extracting terms for Text Mining

- by [email protected]

In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms. However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM TOKEN DATA MINING DATAMINING ORACLE DATA MINING ORACLEDATAMINING 11G ORACLE11G JAVA JAVA CRM CRM CUSTOMER RELATIONSHIP MANAGEMENT CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term VARCHAR2(100), query_string VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE cursor c2 is select token, term from my_term_token_map; BEGIN for r_c2 in c2 loop CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term); EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')' using r_c2.token; end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example: 'ORACLEDATAMINING' SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on my_thesaurus_rules(query_string) indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

Read the article

SQL SERVER – 5 Tips for Improving Your Data with expressor Studio

- by pinaldave

It’s no secret that bad data leads to bad decisions and poor results. However, how do you prevent dirty data from taking up residency in your data store? Some might argue that it’s the responsibility of the person sending you the data. While that may be true, in practice that will rarely hold up. It doesn’t matter how many times you ask, you will get the data however they decide to provide it. So now you have bad data. What constitutes bad data? There are quite a few valid answers, for example: Invalid date values Inappropriate characters Wrong data Values that exceed a pre-set threshold While it is certainly possible to write your own scripts and custom SQL to identify and deal with these data anomalies, that effort often takes too long and becomes difficult to maintain. Instead, leveraging an ETL tool like expressor Studio makes the data cleansing process much easier and faster. Below are some tips for leveraging expressor to get your data into tip-top shape. Tip 1: Build reusable data objects with embedded cleansing rules One of the new features in expressor Studio 3.2 is the ability to define constraints at the metadata level. Using expressor’s concept of Semantic Types, you can define reusable data objects that have embedded logic such as constraints for dealing with dirty data. Once defined, they can be saved as a shared atomic type and then re-applied to other data attributes in other schemas. As you can see in the figure above, I’ve defined a constraint on zip code. I can then save the constraint rules I defined for zip code as a shared atomic type called zip_type for example. The next time I get a different data source with a schema that also contains a zip code field, I can simply apply the shared atomic type (shown below) and the previously defined constraints will be automatically applied. Tip 2: Unlock the power of regular expressions in Semantic Types Another powerful feature introduced in expressor Studio 3.2 is the option to use regular expressions as a constraint. A regular expression is used to identify patterns within data. The patterns could be something as simple as a date format or something much more complex such as a street address. For example, I could define that a valid IP address should be made up of 4 numbers, each 0 to 255, and separated by a period. So 192.168.23.123 might be a valid IP address whereas 888.777.0.123 would not be. How can I account for this using regular expressions? A very simple regular expression that would look for any 4 sets of 3 digits separated by a period would be: ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ Alternatively, the following would be the exact check for truly valid IP addresses as we had defined above: ^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$ . In expressor, we would enter this regular expression as a constraint like this: Here we select the corrective action to be ‘Escalate’, meaning that the expressor Dataflow operator will decide what to do. Some of the options include rejecting the offending record, skipping it, or aborting the dataflow. Tip 3: Email pattern expressions that might come in handy In the example schema that I am using, there’s a field for email. Email addresses are often entered incorrectly because people are trying to avoid spam. While there are a lot of different ways to define what constitutes a valid email address, a quick search online yields a couple of really useful regular expressions for validating email addresses: This one is short and sweet: \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b (Source: http://www.regular-expressions.info/) This one is more specific about which characters are allowed: ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ (Source: http://regexlib.com/REDetails.aspx?regexp_id=26 ) Tip 4: Reject “dirty data” for analysis or further processing Yet another feature introduced in expressor Studio 3.2 is the ability to reject records based on constraint violations. To capture reject records on input, simply specify Reject Record in the Error Handling setting for the Read File operator. Then attach a Write File operator to the reject port of the Read File operator as such: Next, in the Write File operator, you can configure the expressor operator in a similar way to the Read File. The key difference would be that the schema needs to be derived from the upstream operator as shown below: Once configured, expressor will output rejected records to the file you specified. In addition to the rejected records, expressor also captures some diagnostic information that will be helpful towards identifying why the record was rejected. This makes diagnosing errors much easier! Tip 5: Use a Filter or Transform after the initial cleansing to finish the job Sometimes you may want to predicate the data cleansing on a more complex set of conditions. For example, I may only be interested in processing data containing males over the age of 25 in certain zip codes. Using an expressor Filter operator, you can define the conditional logic which isolates the records of importance away from the others. Alternatively, the expressor Transform operator can be used to alter the input value via a user defined algorithm or transformation. It also supports the use of conditional logic and data can be rejected based on constraint violations. However, the best tip I can leave you with is to not constrain your solution design approach – expressor operators can be combined in many different ways to achieve the desired results. For example, in the expressor Dataflow below, I can post-process the reject data from the Filter which did not meet my pre-defined criteria and, if successful, Funnel it back into the flow so that it gets written to the target table. I continue to be impressed that expressor offers all this functionality as part of their FREE expressor Studio desktop ETL tool, which you can download from here. Their Studio ETL tool is absolutely free and they are very open about saying that if you want to deploy their software on a dedicated Windows Server, you need to purchase their server software, whose pricing is posted on their website. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article

PeopleSoft Upgrades, Fusion, & BI for Leading European PeopleSoft Applications Customers

- by Mark Rosenberg

With so many industry-leading services firms around the globe managing their businesses with PeopleSoft, it’s always an adventure setting up times and meetings for us to keep in touch with them, especially those outside of North America who often do not get to join us at Oracle OpenWorld. Fortunately, during the first two weeks of May, Nigel Woodland (Oracle’s Service Industries Director for the EMEA region) and I successfully blocked off our calendars to visit seven different customers spanning four countries in Western Europe. We met executives and leaders at four Staffing industry firms, two Professional Services firms that engage in consulting and auditing, and a Financial Services firm. As we shared the latest information regarding product capabilities and plans, we also gained valuable insight into the hot technology topics facing these businesses. What we heard was both informative and inspiring, and I suspect other Oracle PeopleSoft applications customers can benefit from one or more of the following observations from our trip. Great IT Plans Get Executed When You Respect the Users Each of our visits followed roughly the same pattern. After introductions, Nigel outlined Oracle’s product and technology strategy, including a discussion of how we at Oracle invest in each layer of the “technology stack” to provide customers with unprecedented business management capabilities and choice. Then, I provided the specifics of the PeopleSoft product line’s investment strategy, detailing the dramatic number of rich usability and functionality enhancements added to release 9.1 since its general availability in 2009 and the game-changing capabilities slated for 9.2. What was most exciting about each of these discussions was that shortly after my talking about what customers can do with release 9.1 right now to drive up user productivity and satisfaction, I saw the wheels turning in the minds of our audiences. Business analyst and end user-configurable tools and technologies, such as WorkCenters and the Related Action Framework, that provide the ability to tailor a “central command center” to the exact needs of each recruiter, biller, and every other role in the organization were exactly what each of our customers had been looking for. Every one of our audiences agreed that these tools which demonstrate a respect for the user would finally help IT pole vault over the wall of resistance that users had often raised in the past. With these new user-focused capabilities, IT is positioned to definitively partner with the business, instead of drag the business along, to unlock the value of their investment in PeopleSoft. This topic of respecting the user emerged during our very first visit, which was at Vital Services Group at their Head Office “The Mill” in Manchester, England. (If you are a student of architecture and are ever in Manchester, you should stop in to see this amazingly renovated old mill building.) I had just finished explaining our PeopleSoft 9.2 roadmap, and Mike Code, PeopleSoft Systems Manager for this innovative staffing company, said, “Mark, the new features you’ve shown us in 9.1/9.2 are very relevant to our business. As we forge ahead with the 9.1 upgrade, the ability to configure a targeted user interface with WorkCenters, Related Actions, Pivot Grids, and Alerts will enable us to satisfy the business that this upgrade is for them and will deliver tangible benefits. In fact, you’ve highlighted that we need to start talking to the business to keep up the momentum to start reviewing the 9.2 upgrade after we get to 9.1, because as much as 9.1 and PeopleTools 8.52 offers, what you’ve shown us for 9.2 is what we’ve envisioned was ultimately possible with our investment in PeopleSoft applications.” We also received valuable feedback about our investment for the Staffing industry when we visited with Hans Wanders, CIO of Randstad (the second largest Staffing company in the world) in the Netherlands. After our visit, Hans noted, “It was very interesting to see how the PeopleSoft applications have developed. I was truly impressed by many of the new developments.” Hans and Mike, sincere thanks for the validation that our team’s hard work and dedication to “respecting the users” is worth the effort! Co-existence of PeopleSoft and Fusion Applications Just Makes Sense As a “product person,” one of the most rewarding things about visiting customers is that they actually want to talk to me. Sometimes, they want to discuss a product area that we need to enhance; other times, they are interested in learning how to extract more value from their applications; and still others, they want to tell me how they are using the applications to drive real value for the business. During this trip, I was very pleased to hear that several of our customers not only thought the co-existence of Fusion applications alongside PeopleSoft applications made sense in theory, but also that they were aggressively looking at how to deploy one or more Fusion applications alongside their PeopleSoft HCM and FSCM applications. The most common deployment plan in the works by three of the organizations is to upgrade to PeopleSoft 9.1 or 9.2, and then adopt one of the new Fusion HCM applications, such as Fusion Performance Management or the full suite of Fusion Talent Management. For example, during an applications upgrade planning discussion with the staffing company Hays plc., Mark Thomas, who is Hays’ UK IT Director, commented, “We are very excited about where we can go with the latest versions of the PeopleSoft applications in conjunction with Fusion Talent Management.” Needless to say, this news was very encouraging, because it reiterated that our applications investment strategy makes good business sense for our customers. Next Generation Business Intelligence Is the Key to the Future The third, and perhaps most exciting, lesson I learned during this journey is that our audiences already know that the latest generation of Business Intelligence technologies will be the “secret sauce” for organizations to transform business in radical ways. While a number of the organizations we visited on the trip have deployed or are deploying Oracle Business Intelligence Enterprise Edition and the associated analytics applications to provide dashboards of easy-to-understand, user-configurable metrics that help optimize business performance according to current operating procedures, what’s most exciting to them is being able to use Business Intelligence to change the way an organization does business, grows revenue, and makes a profit. In particular, several executives we met asked whether we can help them minimize the need to have perfectly structured data and at the same time generate analytics that improve order fulfillment decision-making. To them, the path to future growth lies in having the ability to analyze unstructured data rapidly and intuitively and leveraging technology’s ability to detect patterns that a human cannot reasonably be expected to see. For illustrative purposes, here is a good example of a business problem where analyzing a combination of structured and unstructured data can produce better results. If you have a resource manager trying to decide which person would be the best fit for an assignment in terms of ensuring (a) client satisfaction, (b) the individual’s satisfaction with the work, (c) least travel distance, and (d) highest margin, you traditionally compare resource qualifications to assignment needs, calculate margins on past work with the client, and measure distances. To perform these comparisons, you are likely to need the organization to have profiles setup, people ranked against profiles, margin targets setup, margins measured, distances setup, distances measured, and more. As you can imagine, this requires organizations to plan and implement data setup, capture, and quality management initiatives to ensure that dependable information is available to support resourcing analysis and decisions. In the fast-paced, tight-budget world in which most organizations operate today, the effort and discipline required to maintain high-quality, structured data like those described in the above example are certainly not desirable and in some cases are not feasible. You can imagine how intrigued our audiences were when I informed them that we are ready to help them analyze volumes of unstructured data, detect trends, and produce recommendations. Our discussions delved into examples of how the firms could leverage Oracle’s Secure Enterprise Search and Endeca technologies to keyword search against, compare, and learn from unstructured resource and assignment data. We also considered examples of how they could employ Oracle Real-Time Decisions to generate statistically significant recommendations based on similar resourcing scenarios that have produced the desired satisfaction and profit margin results. --- Although I had almost no time for sight-seeing during this trip to Europe, I have to say that it may have been one of the most energizing and engaging trips of my career. Showing these dedicated customers how they can give every user a uniquely tailored set of tools and address business problems in ways that have to date been impossible made the journey across the Atlantic more than worth it. If any of these three topics intrigue you, I’d recommend you contact your Oracle applications representative to arrange for more detailed discussions with the appropriate members of our organization.

Read the article

Quartz.Net Windows Service Configure Logging

- by Tarun Arora

In this blog post I’ll be covering, Logging for Quartz.Net Windows Service 01 – Why doesn’t Quartz.Net Windows Service log by default 02 – Configuring Quartz.Net windows service for logging to eventlog, file, console, etc 03 – Results: Logging in action If you are new to Quartz.Net I would recommend going through, A brief Introduction to Quartz.net Walkthrough of Installing & Testing Quartz.Net as a Windows Service Writing & Scheduling your First HelloWorld job with Quartz.Net 01 – Why doesn’t Quartz.Net Windows Service log by default If you are trying to figure out why… The Quartz.Net windows service isn’t logging The Quartz.Net windows service isn’t writing anything to the event log The Quartz.Net windows service isn’t writing anything to a file How do I configure Quartz.Net windows service to use log4Net How do I change the level of logging for Quartz.Net Look no further, This blog post should help you answer these questions. Quartz.NET uses the Common.Logging framework for all of its logging needs. If you navigate to the directory where Quartz.Net Windows Service is installed (I have the service installed in C:\Program Files (x86)\Quartz.net, you can find out the location by looking at the properties of the service) and open ‘Quartz.Server.exe.config’ you’ll see that the Quartz.Net is already set up for logging to ConsoleAppender and EventLogAppender, but only ‘ConsoleAppender’ is set up as active. So, unless you have the console associated to the Quartz.Net service you won’t be able to see any logging. <log4net> <appender name="ConsoleAppender" type="log4net.Appender.ConsoleAppender"> <layout type="log4net.Layout.PatternLayout"> <conversionPattern value="%d [%t] %-5p %l - %m%n" /> </layout> </appender> <appender name="EventLogAppender" type="log4net.Appender.EventLogAppender"> <layout type="log4net.Layout.PatternLayout"> <conversionPattern value="%d [%t] %-5p %l - %m%n" /> </layout> </appender> <root> <level value="INFO" /> <appender-ref ref="ConsoleAppender" />   </root> </log4net> Problem: In the configuration above Quartz.Net Windows Service only has ConsoleAppender active. So, no logging will be done to EventLog. More over the RollingFileAppender isn’t setup at all. So, Quartz.Net will not log to an application trace log file. 02 – Configuring Quartz.Net windows service for logging to eventlog, file, console, etc Let’s change this behaviour by changing the config file… In the below config file, I have added the RollingFileAppender. This will configure Quartz.Net service to write to a log file. (<appender name="GeneralLog" type="log4net.Appender.RollingFileAppender">) I have specified the location for the log file (<arg key="configFile" value="Trace/application.log.txt"/>) I have enabled the EventLogAppender and RollingFileAppender to be written to by Quartz. Net windows service Changed the default level of logging from ‘Info’ to ‘All’. This means all activity performed by Quartz.Net Windows service will be logged. You might want to tune this back to ‘Debug’ or ‘Info’ later as logging ‘All’ will produce too much data to the logs. (<level value="ALL"/>) Since I have changed the logging level to ‘All’, I have added applicationSetting to remove logging log4Net internal debugging. (<add key="log4net.Internal.Debug" value="false"/>) <?xml version="1.0" encoding="utf-8" ?> <configuration> <configSections> <section name="quartz" type="System.Configuration.NameValueSectionHandler, System, Version=1.0.5000.0,Culture=neutral, PublicKeyToken=b77a5c561934e089" /> <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net" /> <sectionGroup name="common"> <section name="logging" type="Common.Logging.ConfigurationSectionHandler, Common.Logging" /> </sectionGroup> </configSections> <common> <logging> <factoryAdapter type="Common.Logging.Log4Net.Log4NetLoggerFactoryAdapter, Common.Logging.Log4net"> <arg key="configType" value="INLINE" /> <arg key="configFile" value="Trace/application.log.txt"/> <arg key="level" value="ALL" /> </factoryAdapter> </logging> </common> <appSettings> <add key="log4net.Internal.Debug" value="false"/> </appSettings> <log4net> <appender name="ConsoleAppender" type="log4net.Appender.ConsoleAppender"> <layout type="log4net.Layout.PatternLayout"> <conversionPattern value="%d [%t] %-5p %l - %m%n" /> </layout> </appender> <appender name="EventLogAppender" type="log4net.Appender.EventLogAppender"> <layout type="log4net.Layout.PatternLayout"> <conversionPattern value="%d [%t] %-5p %l - %m%n" /> </layout> </appender> <appender name="GeneralLog" type="log4net.Appender.RollingFileAppender"> <file value="Trace/application.log.txt"/> <appendToFile value="true"/> <maximumFileSize value="1024KB"/> <rollingStyle value="Size"/> <layout type="log4net.Layout.PatternLayout"> <conversionPattern value="%d{HH:mm:ss} [%t] %-5p %c - %m%n"/> </layout> </appender> <root> <level value="ALL" /> <appender-ref ref="ConsoleAppender" /> <appender-ref ref="EventLogAppender" /> <appender-ref ref="GeneralLog"/> </root> </log4net> </configuration> Note – Please ensure you restart the Quartz.Net Windows service for the config changes to be picked up by the service 03 – Results: Logging in action Once you start the Quartz.Net Windows Service up, the logging should be initiated to write all activities in the Console, EventLog and File… See screen shots below… Figure – Quartz.Net Windows Service logging all activity to the event log Figure – Quartz.Net Windows Service logging all activity to the application log file Where is the output from log4Net ConsoleAppender? As a default behaviour, the console isn't available in windows services, web services, windows forms. The output will simply be dismissed. Unless you are running the process interactively. Which you can do by firing up Quartz.Server.exe –i to see the output This was fourth in the series of posts on enterprise scheduling using Quartz.net, in the next post I’ll be covering troubleshooting why a scheduled task hasn’t fired on Quartz.net windows service. All Quartz.Net specific blog posts can listed here. Thank you for taking the time out and reading this blog post. If you enjoyed the post, remember to subscribe to http://feeds.feedburner.com/TarunArora. Stay tuned!

Read the article

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article

MySQL Cluster 7.2: Over 8x Higher Performance than Cluster 7.1

- by Mat Keep

0 0 1 893 5092 Homework 42 11 5974 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} Summary The scalability enhancements delivered by extensions to multi-threaded data nodes enables MySQL Cluster 7.2 to deliver over 8x higher performance than the previous MySQL Cluster 7.1 release on a recent benchmark What’s New in MySQL Cluster 7.2 MySQL Cluster 7.2 was released as GA (Generally Available) in February 2012, delivering many enhancements to performance on complex queries, new NoSQL Key / Value API, cross-data center replication and ease-of-use. These enhancements are summarized in the Figure below, and detailed in the MySQL Cluster New Features whitepaper Figure 1: Next Generation Web Services, Cross Data Center Replication and Ease-of-Use Once of the key enhancements delivered in MySQL Cluster 7.2 is extensions made to the multi-threading processes of the data nodes. Multi-Threaded Data Node Extensions The MySQL Cluster 7.2 data node is now functionally divided into seven thread types: 1) Local Data Manager threads (ldm). Note – these are sometimes also called LQH threads. 2) Transaction Coordinator threads (tc) 3) Asynchronous Replication threads (rep) 4) Schema Management threads (main) 5) Network receiver threads (recv) 6) Network send threads (send) 7) IO threads Each of these thread types are discussed in more detail below. MySQL Cluster 7.2 increases the maximum number of LDM threads from 4 to 16. The LDM contains the actual data, which means that when using 16 threads the data is more heavily partitioned (this is automatic in MySQL Cluster). Each LDM thread maintains its own set of data partitions, index partitions and REDO log. The number of LDM partitions per data node is not dynamically configurable, but it is possible, however, to map more than one partition onto each LDM thread, providing flexibility in modifying the number of LDM threads. The TC domain stores the state of in-flight transactions. This means that every new transaction can easily be assigned to a new TC thread. Testing has shown that in most cases 1 TC thread per 2 LDM threads is sufficient, and in many cases even 1 TC thread per 4 LDM threads is also acceptable. Testing also demonstrated that in some instances where the workload needed to sustain very high update loads it is necessary to configure 3 to 4 TC threads per 4 LDM threads. In the previous MySQL Cluster 7.1 release, only one TC thread was available. This limit has been increased to 16 TC threads in MySQL Cluster 7.2. The TC domain also manages the Adaptive Query Localization functionality introduced in MySQL Cluster 7.2 that significantly enhanced complex query performance by pushing JOIN operations down to the data nodes. Asynchronous Replication was separated into its own thread with the release of MySQL Cluster 7.1, and has not been modified in the latest 7.2 release. To scale the number of TC threads, it was necessary to separate the Schema Management domain from the TC domain. The schema management thread has little load, so is implemented with a single thread. The Network receiver domain was bound to 1 thread in MySQL Cluster 7.1. With the increase of threads in MySQL Cluster 7.2 it is also necessary to increase the number of recv threads to 8. This enables each receive thread to service one or more sockets used to communicate with other nodes the Cluster. The Network send thread is a new thread type introduced in MySQL Cluster 7.2. Previously other threads handled the sending operations themselves, which can provide for lower latency. To achieve highest throughput however, it has been necessary to create dedicated send threads, of which 8 can be configured. It is still possible to configure MySQL Cluster 7.2 to a legacy mode that does not use any of the send threads – useful for those workloads that are most sensitive to latency. The IO Thread is the final thread type and there have been no changes to this domain in MySQL Cluster 7.2. Multiple IO threads were already available, which could be configured to either one thread per open file, or to a fixed number of IO threads that handle the IO traffic. Except when using compression on disk, the IO threads typically have a very light load. Benchmarking the Scalability Enhancements The scalability enhancements discussed above have made it possible to scale CPU usage of each data node to more than 5x of that possible in MySQL Cluster 7.1. In addition, a number of bottlenecks have been removed, making it possible to scale data node performance by even more than 5x. Figure 2: MySQL Cluster 7.2 Delivers 8.4x Higher Performance than 7.1 The flexAsynch benchmark was used to compare MySQL Cluster 7.2 performance to 7.1 across an 8-node Intel Xeon x5670-based cluster of dual socket commodity servers (6 cores each). As the results demonstrate, MySQL Cluster 7.2 delivers over 8x higher performance per data nodes than MySQL Cluster 7.1. More details of this and other benchmarks will be published in a new whitepaper – coming soon, so stay tuned! In a following blog post, I’ll provide recommendations on optimum thread configurations for different types of server processor. You can also learn more from the Best Practices Guide to Optimizing Performance of MySQL Cluster Conclusion MySQL Cluster has achieved a range of impressive benchmark results, and set in context with the previous 7.1 release, is able to deliver over 8x higher performance per node. As a result, the multi-threaded data node extensions not only serve to increase performance of MySQL Cluster, they also enable users to achieve significantly improved levels of utilization from current and future generations of massively multi-core, multi-thread processor designs.

Read the article

Combination of Operating Mode and Commit Strategy

- by Kevin Yang

If you want to populate a source into multiple targets, you may also want to ensure that every row from the source affects all targets uniformly (or separately). Let’s consider the Example Mapping below. If a row from SOURCE causes different changes in multiple targets (TARGET_1, TARGET_2 and TARGET_3), for example, it can be successfully inserted into TARGET_1 and TARGET_3, but failed to be inserted into TARGET_2, and the current Mapping Property TLO (target load order) is “TARGET_1 -> TARGET_2 -> TARGET_3”. What should Oracle Warehouse Builder do, in order to commit the appropriate data to all affected targets at the same time? If it doesn’t behave as you intended, the data could become inaccurate and possibly unusable. Example Mapping In OWB, we can use Mapping Configuration Commit Strategies and Operating Modes together to achieve this kind of requirements. Below we will explore the combination of these two features and how they affect the results in the target tables Before going to the example, let’s review some of the terms we will be using (Details can be found in white paper Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide11g Release 2): Operating Modes: Set-Based Mode: Warehouse Builder generates a single SQL statement that processes all data and performs all operations. Row-Based Mode: Warehouse Builder generates statements that process data row by row. The select statement is in a SQL cursor. All subsequent statements are PL/SQL. Row-Based (Target Only) Mode: Warehouse Builder generates a cursor select statement and attempts to include as many operations as possible in the cursor. For each target, Warehouse Builder inserts each row into the target separately. Commit Strategies: Automatic: Warehouse Builder loads and then automatically commits data based on the mapping design. If the mapping has multiple targets, Warehouse Builder commits and rolls back each target separately and independently of other targets. Use the automatic commit when the consequences of multiple targets being loaded unequally are not great or are irrelevant. Automatic correlated: It is a specialized type of automatic commit that applies to PL/SQL mappings with multiple targets only. Warehouse Builder considers all targets collectively and commits or rolls back data uniformly across all targets. Use the correlated commit when it is important to ensure that every row in the source affects all affected targets uniformly. Manual: select manual commit control for PL/SQL mappings when you want to interject complex business logic, perform validations, or run other mappings before committing data. Combination of the commit strategy and operating mode To understand the effects of each combination of operating mode and commit strategy, I’ll illustrate using the following example Mapping. Firstly we insert 100 rows into the SOURCE table and make sure that the 99th row and 100th row have the same ID value. And then we create a unique key constraint on ID column for TARGET_2 table. So while running the example mapping, OWB tries to load all 100 rows to each of the targets. But the mapping should fail to load the 100th row to TARGET_2, because it will violate the unique key constraint of table TARGET_2. With different combinations of Commit Strategy and Operating Mode, here are the results ¦ Set-based/ Correlated Commit: Configuration of Example mapping: Result: What’s happening: A single error anywhere in the mapping triggers the rollback of all data. OWB encounters the error inserting into Target_2, it reports an error for the table and does not load the row. OWB rolls back all the rows inserted into Target_1 and does not attempt to load rows to Target_3. No rows are added to any of the target tables. ¦ Row-based/ Correlated Commit: Configuration of Example mapping: Result: What’s happening: OWB evaluates each row separately and loads it to all three targets. Loading continues in this way until OWB encounters an error loading row 100th to Target_2. OWB reports the error and does not load the row. It rolls back the row 100th previously inserted into Target_1 and does not attempt to load row 100 to Target_3. Then, if there are remaining rows, OWB will continue loading them, resuming with loading rows to Target_1. The mapping completes with 99 rows inserted into each target. ¦ Set-based/ Automatic Commit: Configuration of Example mapping: Result: What’s happening: When OWB encounters the error inserting into Target_2, it does not load any rows and reports an error for the table. It does, however, continue to insert rows into Target_3 and does not roll back the rows previously inserted into Target_1. The mapping completes with one error message for Target_2, no rows inserted into Target_2, and 100 rows inserted into Target_1 and Target_3 separately. ¦ Row-based/Automatic Commit: Configuration of Example mapping: Result: What’s happening: OWB evaluates each row separately for loading into the targets. Loading continues in this way until OWB encounters an error loading row 100 to Target_2 and reports the error. OWB does not roll back row 100th from Target_1, does insert it into Target_3. If there are remaining rows, it will continue to load them. The mapping completes with 99 rows inserted into Target_2 and 100 rows inserted into each of the other targets. Note: Automatic Correlated commit is not applicable for row-based (target only). If you design a mapping with the row-based (target only) and correlated commit combination, OWB runs the mapping but does not perform the correlated commit. In set-based mode, correlated commit may impact the size of your rollback segments. Space for rollback segments may be a concern when you merge data (insert/update or update/insert). Correlated commit operates transparently with PL/SQL bulk processing code. The correlated commit strategy is not available for mappings run in any mode that are configured for Partition Exchange Loading or that include a Queue, Match Merge, or Table Function operator. If you want to practice in your own environment, you can follow the steps: 1. Import the MDL file: commit_operating_mode.mdl 2. Fix the location for oracle module ORCL and deploy all tables under it. 3. Insert sample records into SOURCE table, using below plsql code: begin for i in 1..99 loop insert into source values(i, 'col_'||i); end loop; insert into source values(99, 'col_99'); end; 4. Configure MAPPING_1 to any combinations of operating mode and commit strategy you want to test. And make sure feature TLO of mapping is open. 5. Deploy Mapping “MAPPING_1”. 6. Run the mapping and check the result.

Read the article

C#: Does an IDisposable in a Halted Iterator Dispose?

- by James Michael Hare

If that sounds confusing, let me give you an example. Let's say you expose a method to read a database of products, and instead of returning a List<Product> you return an IEnumerable<Product> in iterator form (yield return). This accomplishes several good things: The IDataReader is not passed out of the Data Access Layer which prevents abstraction leak and resource leak potentials. You don't need to construct a full List<Product> in memory (which could be very big) if you just want to forward iterate once. If you only want to consume up to a certain point in the list, you won't incur the database cost of looking up the other items. This could give us an example like: 1: // a sample data access object class to do standard CRUD operations. 2: public class ProductDao 3: { 4: private DbProviderFactory _factory = SqlClientFactory.Instance 5: 6: // a method that would retrieve all available products 7: public IEnumerable<Product> GetAvailableProducts() 8: { 9: // must create the connection 10: using (var con = _factory.CreateConnection()) 11: { 12: con.ConnectionString = _productsConnectionString; 13: con.Open(); 14: 15: // create the command 16: using (var cmd = _factory.CreateCommand()) 17: { 18: cmd.Connection = con; 19: cmd.CommandText = _getAllProductsStoredProc; 20: cmd.CommandType = CommandType.StoredProcedure; 21: 22: // get a reader and pass back all results 23: using (var reader = cmd.ExecuteReader()) 24: { 25: while(reader.Read()) 26: { 27: yield return new Product 28: { 29: Name = reader["product_name"].ToString(), 30: ... 31: }; 32: } 33: } 34: } 35: } 36: } 37: } The database details themselves are irrelevant. I will say, though, that I'm a big fan of using the System.Data.Common classes instead of your provider specific counterparts directly (SqlCommand, OracleCommand, etc). This lets you mock your data sources easily in unit testing and also allows you to swap out your provider in one line of code. In fact, one of the shared components I'm most proud of implementing was our group's DatabaseUtility library that simplifies all the database access above into one line of code in a thread-safe and provider-neutral way. I went with my own flavor instead of the EL due to the fact I didn't want to force internal company consumers to use the EL if they didn't want to, and it made it easy to allow them to mock their database for unit testing by providing a MockCommand, MockConnection, etc that followed the System.Data.Common model. One of these days I'll blog on that if anyone's interested. Regardless, you often have situations like the above where you are consuming and iterating through a resource that must be closed once you are finished iterating. For the reasons stated above, I didn't want to return IDataReader (that would force them to remember to Dispose it), and I didn't want to return List<Product> (that would force them to hold all products in memory) -- but the first time I wrote this, I was worried. What if you never consume the last item and exit the loop? Are the reader, command, and connection all disposed correctly? Of course, I was 99.999999% sure the creators of C# had already thought of this and taken care of it, but inspection in Reflector was difficult due to the nature of the state machines yield return generates, so I decided to try a quick example program to verify whether or not Dispose() will be called when an iterator is broken from outside the iterator itself -- i.e. before the iterator reports there are no more items. So I wrote a quick Sequencer class with a Dispose() method and an iterator for it. Yes, it is COMPLETELY contrived: 1: // A disposable sequence of int -- yes this is completely contrived... 2: internal class Sequencer : IDisposable 3: { 4: private int _i = 0; 5: private readonly object _mutex = new object(); 6: 7: // Constructs an int sequence. 8: public Sequencer(int start) 9: { 10: _i = start; 11: } 12: 13: // Gets the next integer 14: public int GetNext() 15: { 16: lock (_mutex) 17: { 18: return _i++; 19: } 20: } 21: 22: // Dispose the sequence of integers. 23: public void Dispose() 24: { 25: // force output immediately (flush the buffer) 26: Console.WriteLine("Disposed with last sequence number of {0}!", _i); 27: Console.Out.Flush(); 28: } 29: } And then I created a generator (infinite-loop iterator) that did the using block for auto-Disposal: 1: // simply defines an extension method off of an int to start a sequence 2: public static class SequencerExtensions 3: { 4: // generates an infinite sequence starting at the specified number 5: public static IEnumerable<int> GetSequence(this int starter) 6: { 7: // note the using here, will call Dispose() when block terminated. 8: using (var seq = new Sequencer(starter)) 9: { 10: // infinite loop on this generator, means must be bounded by caller! 11: while(true) 12: { 13: yield return seq.GetNext(); 14: } 15: } 16: } 17: } This is really the same conundrum as the database problem originally posed. Here we are using iteration (yield return) over a large collection (infinite sequence of integers). If we cut the sequence short by breaking iteration, will that using block exit and hence, Dispose be called? Well, let's see: 1: // The test program class 2: public class IteratorTest 3: { 4: // The main test method. 5: public static void Main() 6: { 7: Console.WriteLine("Going to consume 10 of infinite items"); 8: Console.Out.Flush(); 9: 10: foreach(var i in 0.GetSequence()) 11: { 12: // could use TakeWhile, but wanted to output right at break... 13: if(i >= 10) 14: { 15: Console.WriteLine("Breaking now!"); 16: Console.Out.Flush(); 17: break; 18: } 19: 20: Console.WriteLine(i); 21: Console.Out.Flush(); 22: } 23: 24: Console.WriteLine("Done with loop."); 25: Console.Out.Flush(); 26: } 27: } So, what do we see? Do we see the "Disposed" message from our dispose, or did the Dispose get skipped because from an "eyeball" perspective we should be locked in that infinite generator loop? Here's the results: 1: Going to consume 10 of infinite items 2: 0 3: 1 4: 2 5: 3 6: 4 7: 5 8: 6 9: 7 10: 8 11: 9 12: Breaking now! 13: Disposed with last sequence number of 11! 14: Done with loop. Yes indeed, when we break the loop, the state machine that C# generates for yield iterate exits the iteration through the using blocks and auto-disposes the IDisposable correctly. I must admit, though, the first time I wrote one, I began to wonder and that led to this test. If you've never seen iterators before (I wrote a previous entry here) the infinite loop may throw you, but you have to keep in mind it is not a linear piece of code, that every time you hit a "yield return" it cedes control back to the state machine generated for the iterator. And this state machine, I'm happy to say, is smart enough to clean up the using blocks correctly. I suspected those wily guys and gals at Microsoft engineered it well, and I wasn't disappointed. But, I've been bitten by assumptions before, so it's good to test and see. Yes, maybe you knew it would or figured it would, but isn't it nice to know? And as those campy 80s G.I. Joe cartoon public service reminders always taught us, "Knowing is half the battle...". Technorati Tags: C#,.NET

Read the article

SQL SERVER – Faster SQL Server Databases and Applications – Power and Control with SafePeak Caching Options

- by Pinal Dave

Update: This blog post is written based on the SafePeak, which is available for free download. Today, I’d like to examine more closely one of my preferred technologies for accelerating SQL Server databases, SafePeak. Safepeak’s software provides a variety of advanced data caching options, techniques and tools to accelerate the performance and scalability of SQL Server databases and applications. I’d like to look more closely at some of these options, as some of these capabilities could help you address lagging database and performance on your systems. To better understand the available options, it is best to start by understanding the difference between the usual “Basic Caching” vs. SafePeak’s “Dynamic Caching”. Basic Caching Basic Caching (or the stale and static cache) is an ability to put the results from a query into cache for a certain period of time. It is based on TTL, or Time-to-live, and is designed to stay in cache no matter what happens to the data. For example, although the actual data can be modified due to DML commands (update/insert/delete), the cache will still hold the same obsolete query data. Meaning that with the Basic Caching is really static / stale cache. As you can tell, this approach has its limitations. Dynamic Caching Dynamic Caching (or the non-stale cache) is an ability to put the results from a query into cache while maintaining the cache transaction awareness looking for possible data modifications. The modifications can come as a result of: DML commands (update/insert/delete), indirect modifications due to triggers on other tables, executions of stored procedures with internal DML commands complex cases of stored procedures with multiple levels of internal stored procedures logic. When data modification commands arrive, the caching system identifies the related cache items and evicts them from cache immediately. In the dynamic caching option the TTL setting still exists, although its importance is reduced, since the main factor for cache invalidation (or cache eviction) become the actual data updates commands. Now that we have a basic understanding of the differences between “basic” and “dynamic” caching, let’s dive in deeper. SafePeak: A comprehensive and versatile caching platform SafePeak comes with a wide range of caching options. Some of SafePeak’s caching options are automated, while others require manual configuration. Together they provide a complete solution for IT and Data managers to reach excellent performance acceleration and application scalability for a wide range of business cases and applications. Automated caching of SQL Queries: Fully/semi-automated caching of all “read” SQL queries, containing any types of data, including Blobs, XMLs, Texts as well as all other standard data types. SafePeak automatically analyzes the incoming queries, categorizes them into SQL Patterns, identifying directly and indirectly accessed tables, views, functions and stored procedures; Automated caching of Stored Procedures: Fully or semi-automated caching of all read” stored procedures, including procedures with complex sub-procedure logic as well as procedures with complex dynamic SQL code. All procedures are analyzed in advance by SafePeak’s Metadata-Learning process, their SQL schemas are parsed – resulting with a full understanding of the underlying code, objects dependencies (tables, views, functions, sub-procedures) enabling automated or semi-automated (manually review and activate by a mouse-click) cache activation, with full understanding of the transaction logic for cache real-time invalidation; Transaction aware cache: Automated cache awareness for SQL transactions (SQL and in-procs); Dynamic SQL Caching: Procedures with dynamic SQL are pre-parsed, enabling easy cache configuration, eliminating SQL Server load for parsing time and delivering high response time value even in most complicated use-cases; Fully Automated Caching: SQL Patterns (including SQL queries and stored procedures) that are categorized by SafePeak as “read and deterministic” are automatically activated for caching; Semi-Automated Caching: SQL Patterns categorized as “Read and Non deterministic” are patterns of SQL queries and stored procedures that contain reference to non-deterministic functions, like getdate(). Such SQL Patterns are reviewed by the SafePeak administrator and in usually most of them are activated manually for caching (point and click activation); Fully Dynamic Caching: Automated detection of all dependent tables in each SQL Pattern, with automated real-time eviction of the relevant cache items in the event of “write” commands (a DML or a stored procedure) to one of relevant tables. A default setting; Semi Dynamic Caching: A manual cache configuration option enabling reducing the sensitivity of specific SQL Patterns to “write” commands to certain tables/views. An optimization technique relevant for cases when the query data is either known to be static (like archive order details), or when the application sensitivity to fresh data is not critical and can be stale for short period of time (gaining better performance and reduced load); Scheduled Cache Eviction: A manual cache configuration option enabling scheduling SQL Pattern cache eviction based on certain time(s) during a day. A very useful optimization technique when (for example) certain SQL Patterns can be cached but are time sensitive. Example: “select customers that today is their birthday”, an SQL with getdate() function, which can and should be cached, but the data stays relevant only until 00:00 (midnight); Parsing Exceptions Management: Stored procedures that were not fully parsed by SafePeak (due to too complex dynamic SQL or unfamiliar syntax), are signed as “Dynamic Objects” with highest transaction safety settings (such as: Full global cache eviction, DDL Check = lock cache and check for schema changes, and more). The SafePeak solution points the user to the Dynamic Objects that are important for cache effectiveness, provides easy configuration interface, allowing you to improve cache hits and reduce cache global evictions. Usually this is the first configuration in a deployment; Overriding Settings of Stored Procedures: Override the settings of stored procedures (or other object types) for cache optimization. For example, in case a stored procedure SP1 has an “insert” into table T1, it will not be allowed to be cached. However, it is possible that T1 is just a “logging or instrumentation” table left by developers. By overriding the settings a user can allow caching of the problematic stored procedure; Advanced Cache Warm-Up: Creating an XML-based list of queries and stored procedure (with lists of parameters) for periodically automated pre-fetching and caching. An advanced tool allowing you to handle more rare but very performance sensitive queries pre-fetch them into cache allowing high performance for users’ data access; Configuration Driven by Deep SQL Analytics: All SQL queries are continuously logged and analyzed, providing users with deep SQL Analytics and Performance Monitoring. Reduce troubleshooting from days to minutes with database objects and SQL Patterns heat-map. The performance driven configuration helps you to focus on the most important settings that bring you the highest performance gains. Use of SafePeak SQL Analytics allows continuous performance monitoring and analysis, easy identification of bottlenecks of both real-time and historical data; Cloud Ready: Available for instant deployment on Amazon Web Services (AWS). As you can see, there are many options to configure SafePeak’s SQL Server database and application acceleration caching technology to best fit a lot of situations. If you’re not familiar with their technology, they offer free-trial software you can download that comes with a free “help session” to help get you started. You can access the free trial here. Also, SafePeak is available to use on Amazon Cloud. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article

CodePlex Daily Summary for Wednesday, May 19, 2010

CodePlex Daily Summary for Wednesday, May 19, 2010New Projects3FD - Framework For Fast Development: This is a C++ framework that provides a solid error handling structure, garbage collection, multi-threading and portability between compilers. The ...ali test project: test projectAttribute Builder: The Attribute Builder builds an attribute from a lambda expression because it can.BDK0008: it is a food lovers websitecgdigest: cg digest template for non-profit orgCokmez: Bilmuh cokmez duyuru sistemiDot Game: It is a dot game that our Bangladeshi people used to play at their childhood time and their last time when they are poor for working.ESRI Javascript .NET Integration: Visual Studio project that shows how to integrate the Esri Javascript API with .NET Exchange 2010 RBAC Editor (RBAC GUI): Exchange 2010 RBAC Editor (RBAC GUI) Developed in C# and using Powershell behind the scenes RBAC tool to simplfy RBAC administrationFile Validator (Validador de Archivos): Componente que permite realizar la validación de archivos (txt, imagenes, PDF, etc) actualmente solo tiene implementado la parte de los txt, permit...Grip 09 Lab4: GripjPageFlipper: This is a wonderful implementation of page flipper entirely based on HTML 5 <canvas> tag. It means that it can work in any browser that supports HT...Main project: Index bird families and associated species. Malware Analysis and Can Handler: MACH is a tool to organize and catalog your malware analysis canned responses, and to track the topic response lifecycle for forum experts.Perf Web: Performance team web sitePiPiBugNet: PiPiBugNet是一套全新的开源Bug管理系统。 PiPiBugNet代码基于ASP.NET 2.0平台开发，编程语言为C#。 PiPiBugNet界面基于Ext JS设计，提供了极佳的用户体验。RemoteDesktop: integrated remote console, desktop and chat utilityRuneScape emulation done right.: RuneScape emulator.Sandkasten: SandkastenSilverlight Metro Theme: Metro Theme for Silverlight.Silverlight Stereoscopy: Stereoscopy with Silverlight.Twitivia: Twitivia is an online trivia service that runs through twitter and is being used as an example set of projects. C#, MVC, Windows Services, Linq ...XPool: A simple school project.New ReleasesDot Game: 'Dot Game' first release: Dot Game first release This is the 'Dot Game' first release.DotNetNuke® Store: 02.01.35: What's New in this release? Bugs corrected: - Fixed a resource for the header in the Category list of the Store Admin module. - Added several test...ESRI Javascript .NET Integration: Map search results in a DataView: Visual Studio 2010 example showing how to pass Map results back to ASP.NET for use in a DataView.Exchange 2010 RBAC Editor (RBAC GUI): RBAC Editor: This binary is still beta (0.0.9.1) but in most case it's very stableExtending C# editor - Outlining, classification: first revision: a couple of bug has been eliminated, performance improvementFloe IRC Client: Floe IRC Client 2010-05 R6: Corrected bug where text would be unexpectedly copied to the clipboard.Floe IRC Client: Floe IRC Client 2010-05 R7: - Fixed bug where text would show up in a query window with someone if they said something on a channel that you are both present on.Free Silverlight & WPF Chart Control - Visifire: Visifire SL and WPF Charts v3.0.9 GA released: Hi, Today we have released the final version of Visifire v3.0.9 which contains the following enhancements: * Two new properties ActualAxisMin...Free Silverlight & WPF Chart Control - Visifire: Visifire SL and WPF Charts v3.5.2 GA Released: Hi, Today we have released the final version of Visifire v3.5.2 which contains the following enhancements: Two new properties ActualAxisMinimum a...HB Batch Encoder Mk 2: HB Batch Encoder Mk2 v1.02: Added .mov support.jPageFlipper: jPageFlipper 0.9: This is an initial community preview of jPageFlipper. It's not ready for production usage but has almost all functionality implemented.linq.js - LINQ for JavaScript: ver 2.1.0.0: Add Class Dictionary Lookup Grouping OrderedEnumerable Add Method ToDictionary MemoizeAll Share Let Add Overload ...Microsoft Research Biology Extension for Excel: MSR Biology Extension for Excel - M9: M9 Release includes the following updates to the previous release: > Import / Export support from Excel for multiple file formats > Bug fixes and ...Nifty CSharp Tools: Event Watcher: Event Watcher!Paint.NET Bulk Image Processor: Paint.NET Bulk Image Processor v1.0: This is the initial release of the Paint.NET Bulk Image processor plugin. All feedback is welcome.PiPiBugNet: PiPiBugNet架构设计: PiPiBugNet架构设计，未包含功能实现RuneScape emulation done right.: rc0: Release cantidate 0.Rx Contrib: V1.6: Adding CCR queue as adapter for the ReactiveQueue credits goes to Yuval Mazor http://blogs.microsoft.co.il/blogs/yuvmaz/Silverlight Metro Theme: Silverlight Metro Theme Alpha 1: Silverlight Metro Theme Alpha 1Silverlight Stereoscopy: Silverlight Stereoscopy Alpha 1: Silverlight Stereoscopy Alpha 20100518Stratosphere: Stratosphere 1.0.6.0: Introduced support for batch put Introduced Support for conditional updates and consistent read Added support for select conditions Brought t...VCC: Latest build, v2.1.30518.0: Automatic drop of latest buildVideo Downloader: Example Program - 1.1: Example Program showing the features of the DLL and what can be achieved using it. For DLL Version 1.1.Video Downloader: Version 1.1: Version 1.1 See Home Page for usage and more information regarding new features. Please remember changes at You-Tube can prevent this software from...WatchersNET.TagCloud: WatchersNET.TagCloud 01.06.00: Whats New New Tag Mode: Show Tags from Ventrian.com NewsArticles Module New Tag Mode: Show Tags from Ventrian.com SimpleGallery Module Hyperlin...Windows Double Explorer: WDE v0.4: -optimization -switch to new vst2010 -viewer close now by pressing escape -reorder tabs -send selected fullname or shortnames via email (eye button...Most Popular ProjectsRawrWBFS ManagerAJAX Control ToolkitMicrosoft SQL Server Product Samples: DatabaseSilverlight ToolkitWindows Presentation Foundation (WPF)patterns & practices – Enterprise LibraryMicrosoft SQL Server Community & SamplesPHPExcelASP.NETMost Active Projectspatterns & practices – Enterprise LibraryRawrPHPExcelGMap.NET - Great Maps for Windows Forms & PresentationCustomer Portal Accelerator for Microsoft Dynamics CRMBlogEngine.NETWindows Azure Command-line Tools for PHP DevelopersCassiniDev - Cassini 3.5/4.0 Developers EditionSQL Server PowerShell ExtensionsFluent Ribbon Control Suite

Read the article

Optimizing AES modes on Solaris for Intel Westmere

- by danx

Optimizing AES modes on Solaris for Intel Westmere Review AES is a strong method of symmetric (secret-key) encryption. It is a U.S. FIPS-approved cryptographic algorithm (FIPS 197) that operates on 16-byte blocks. AES has been available since 2001 and is widely used. However, AES by itself has a weakness. AES encryption isn't usually used by itself because identical blocks of plaintext are always encrypted into identical blocks of ciphertext. This encryption can be easily attacked with "dictionaries" of common blocks of text and allows one to more-easily discern the content of the unknown cryptotext. This mode of encryption is called "Electronic Code Book" (ECB), because one in theory can keep a "code book" of all known cryptotext and plaintext results to cipher and decipher AES. In practice, a complete "code book" is not practical, even in electronic form, but large dictionaries of common plaintext blocks is still possible. Here's a diagram of encrypting input data using AES ECB mode: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 What's the solution to the same cleartext input producing the same ciphertext output? The solution is to further process the encrypted or decrypted text in such a way that the same text produces different output. This usually involves an Initialization Vector (IV) and XORing the decrypted or encrypted text. As an example, I'll illustrate CBC mode encryption: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ IV >----->(XOR) +------------->(XOR) +---> . . . . | | | | | | | | \/ | \/ | AESKey-->(AES Encryption) | AESKey-->(AES Encryption) | | | | | | | | | \/ | \/ | CipherTextOutput ------+ CipherTextOutput -------+ Block 1 Block 2 The steps for CBC encryption are: Start with a 16-byte Initialization Vector (IV), choosen randomly. XOR the IV with the first block of input plaintext Encrypt the result with AES using a user-provided key. The result is the first 16-bytes of output cryptotext. Use the cryptotext (instead of the IV) of the previous block to XOR with the next input block of plaintext Another mode besides CBC is Counter Mode (CTR). As with CBC mode, it also starts with a 16-byte IV. However, for subsequent blocks, the IV is just incremented by one. Also, the IV ix XORed with the AES encryption result (not the plain text input). Here's an illustration: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ IV >----->(XOR) IV + 1 >---->(XOR) IV + 2 ---> . . . . | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 Optimization Which of these modes can be parallelized? ECB encryption/decryption can be parallelized because it does more than plain AES encryption and decryption, as mentioned above. CBC encryption can't be parallelized because it depends on the output of the previous block. However, CBC decryption can be parallelized because all the encrypted blocks are known at the beginning. CTR encryption and decryption can be parallelized because the input to each block is known--it's just the IV incremented by one for each subsequent block. So, in summary, for ECB, CBC, and CTR modes, encryption and decryption can be parallelized with the exception of CBC encryption. How do we parallelize encryption? By interleaving. Usually when reading and writing data there are pipeline "stalls" (idle processor cycles) that result from waiting for memory to be loaded or stored to or from CPU registers. Since the software is written to encrypt/decrypt the next data block where pipeline stalls usually occurs, we can avoid stalls and crypt with fewer cycles. This software processes 4 blocks at a time, which ensures virtually no waiting ("stalling") for reading or writing data in memory. Other Optimizations Besides interleaving, other optimizations performed are Loading the entire key schedule into the 128-bit %xmm registers. This is done once for per 4-block of data (since 4 blocks of data is processed, when present). The following is loaded: the entire "key schedule" (user input key preprocessed for encryption and decryption). This takes 11, 13, or 15 registers, for AES-128, AES-192, and AES-256, respectively The input data is loaded into another %xmm register The same register contains the output result after encrypting/decrypting Using SSSE 4 instructions (AESNI). Besides the aesenc, aesenclast, aesdec, aesdeclast, aeskeygenassist, and aesimc AESNI instructions, Intel has several other instructions that operate on the 128-bit %xmm registers. Some common instructions for encryption are: pxor exclusive or (very useful), movdqu load/store a %xmm register from/to memory, pshufb shuffle bytes for byte swapping, pclmulqdq carry-less multiply for GCM mode Combining AES encryption/decryption with CBC or CTR modes processing. Instead of loading input data twice (once for AES encryption/decryption, and again for modes (CTR or CBC, for example) processing, the input data is loaded once as both AES and modes operations occur at in the same function Performance Everyone likes pretty color charts, so here they are. I ran these on Solaris 11 running on a Piketon Platform system with a 4-core Intel Clarkdale processor @3.20GHz. Clarkdale which is part of the Westmere processor architecture family. The "before" case is Solaris 11, unmodified. Keep in mind that the "before" case already has been optimized with hand-coded Intel AESNI assembly. The "after" case has combined AES-NI and mode instructions, interleaved 4 blocks at-a-time. « For the first table, lower is better (milliseconds). The first table shows the performance improvement using the Solaris encrypt(1) and decrypt(1) CLI commands. I encrypted and decrypted a 1/2 GByte file on /tmp (swap tmpfs). Encryption improved by about 40% and decryption improved by about 80%. AES-128 is slighty faster than AES-256, as expected. The second table shows more detail timings for CBC, CTR, and ECB modes for the 3 AES key sizes and different data lengths. » The results shown are the percentage improvement as shown by an internal PKCS#11 microbenchmark. And keep in mind the previous baseline code already had optimized AESNI assembly! The keysize (AES-128, 192, or 256) makes little difference in relative percentage improvement (although, of course, AES-128 is faster than AES-256). Larger data sizes show better improvement than 128-byte data. Availability This software is in Solaris 11 FCS. It is available in the 64-bit libcrypto library and the "aes" Solaris kernel module. You must be running hardware that supports AESNI (for example, Intel Westmere and Sandy Bridge, microprocessor architectures). The easiest way to determine if AES-NI is available is with the isainfo(1) command. For example, $ isainfo -v 64-bit amd64 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu No special configuration or setup is needed to take advantage of this software. Solaris libraries and kernel automatically determine if it's running on AESNI-capable machines and execute the correctly-tuned software for the current microprocessor. Summary Maximum throughput of AES cipher modes can be achieved by combining AES encryption with modes processing, interleaving encryption of 4 blocks at a time, and using Intel's wide 128-bit %xmm registers and instructions. References "Block cipher modes of operation", Wikipedia Good overview of AES modes (ECB, CBC, CTR, etc.) "Advanced Encryption Standard", Wikipedia "Current Modes" describes NIST-approved block cipher modes (ECB,CBC, CFB, OFB, CCM, GCM)

Read the article

Recap: Oracle Fusion Middleware Strategies Driving Business Innovation

- by Harish Gaur

Hasan Rizvi, Executive Vice President of Oracle Fusion Middleware & Java took the stage on Tuesday to discuss how Oracle Fusion Middleware helps enable business innovation. Through a series of product demos and customer showcases, Hassan demonstrated how Oracle Fusion Middleware is a complete platform to harness the latest technological innovations (cloud, mobile, social and Fast Data) throughout the application lifecycle. Fig 1: Oracle Fusion Middleware is the foundation of business innovation This Session included 4 demonstrations to illustrate these strategies: 1. Build and deploy native mobile applications using Oracle ADF Mobile 2. Empower business user to model processes, design user interface and have rich mobile experience for process interaction using Oracle BPM Suite PS6. 3. Create collaborative user experience and integrate social sign-on using Oracle WebCenter Portal, Oracle WebCenter Content, Oracle Social Network & Oracle Identity Management 11g R2 4. Deploy and manage business applications on Oracle Exalogic Nike, LA Department of Water & Power and Nintendo joined Hasan on stage to share how their organizations are leveraging Oracle Fusion Middleware to enable business innovation. Managing Performance in the Wrld of Social and Mobile How do you provide predictable scalability and performance for an application that monitors active lifestyle of 8 million users on a daily basis? Nike’s answer is Oracle Coherence, a component of Oracle Fusion Middleware and Oracle Exadata. Fig 2: Oracle Coherence enabled data grid improves performance of Nike+ Digital Sports Platform Nicole Otto, Sr. Director of Consumer Digital Technology discussed the vision of the Nike+ platform, a platform which represents a shift for NIKE from a "product" to a "product +" experience. There are currently nearly 8 million users in the Nike+ system who are using digitally-enabled Nike+ devices. Once data from the Nike+ device is transmitted to Nike+ application, users access the Nike+ website or via the Nike mobile applicatoin, seeing metrics around their daily active lifestyle and even engage in socially compelling experiences to compare, compete or collaborate their data with their friends. Nike expects the number of users to grow significantly this year which will drive an explosion of data and potential new experiences. To deal with this challenge, Nike envisioned building a shared platform that would drive a consumer-centric model for the company. Nike built this new platform using Oracle Coherence and Oracle Exadata. Using Coherence, Nike built a data grid tier as a distributed cache, thereby provide low-latency access to most recent and relevant data to consumers. Nicole discussed how Nike+ Digital Sports Platform is unique in the way that it utilizes the Coherence Grid. Nike takes advantage of Coherence as a traditional cache using both cache-aside and cache-through patterns. This new tier has enabled Nike to create a horizontally scalable distributed event-driven processing architecture. Current data grid volume is approximately 150,000 request per minute with about 40 million objects at any given time on the grid. Improving Customer Experience Across Multiple Channels Customer experience is on top of every CIO's mind. Customer Experience needs to be consistent and secure across multiple devices consumers may use. This is the challenge Matt Lampe, CIO of Los Angeles Department of Water & Power (LADWP) was faced with. Despite being the largest utilities company in the country, LADWP had been relying on a 38 year old customer information system for serving its customers. Their prior system had been unable to keep up with growing customer demands. Last year, LADWP embarked on a journey to improve customer experience for 1.6million LA DWP customers using Oracle WebCenter platform. Figure 3: Multi channel & Multi lingual LADWP.com built using Oracle WebCenter & Oracle Identity Management platform Matt shed light on his efforts to drive customer self-service across 3 dimensions – new website, new IVR platform and new bill payment service. LADWP has built a new portal to increase customer self-service while reducing the transactions via IVR. LADWP's website is powered Oracle WebCenter Portal and is accessible by desktop and mobile devices. By leveraging Oracle WebCenter, LADWP eliminated the need to build, format, and maintain individual mobile applications or websites for different devices. Their entire content is managed using Oracle WebCenter Content and secured using Oracle Identity Management. This new portal automated their paper based processes to web based workflows for customers. This includes automation of Self Service implemented through My Account - like Bill Pay, Payment History, Bill History and Usage Analysis. LADWP's solution went live in April 2012. Matt indicated that LADWP's Self-Service Portal has greatly improved customer satisfaction. In a JD Power Associates website satisfaction survey, results indicate rankings have climbed by 25+ points, marking a remarkable increase in user experience. Bolstering Performance and Simplifying Manageability of Business Applications Ingvar Petursson, Senior Vice Preisdent of IT at Nintendo America joined Hasan on-stage to discuss their choice of Exalogic. Nintendo had significant new requirements coming their way for business systems, both internal and external, in the years to come, especially with new products like the WiiU on the horizon this holiday season. Nintendo needed a platform that could give them performance, availability and ease of management as they deploy business systems. Ingvar selected Engineered Systems for two reasons: 1. High performance 2. Ease of management Figure 4: Nintendo relies on Oracle Exalogic to run ATG eCommerce, Oracle e-Business Suite and several business applications Nintendo made a decision to run their business applications (ATG eCommerce, E-Business Suite) and several Fusion Middleware components on the Exalogic platform. What impressed Ingvar was the "stress” testing results during evaluation. Oracle Exalogic could handle their 3-year load estimates for many functions, which was better than Nintendo expected without any hardware expansion. Faster Processing of Big Data Middleware plays an increasingly important role in Big Data. Last year, we announced at OpenWorld the introduction of Oracle Data Integrator for Hadoop and Oracle Loader for Hadoop which helps in the ability to move, transform, load data to and from Big Data Appliance to Exadata. This year, we’ve added new capabilities to find, filter, and focus data using Oracle Event Processing. This product can natively integrate with Big Data Appliance or runs standalone. Hasan briefly discussed how NTT Docomo, largest mobile operator in Japan, leverages Oracle Event Processing & Oracle Coherence to process mobile data (from 13 million smartphone users) at a speed of 700K events per second before feeding it Hadoop for distributed processing of big data. Figure 5: Mobile traffic data processing at NTT Docomo with Oracle Event Processing & Oracle Coherence

Read the article

14+ WordPress Portfolio Themes

- by Edward

There are various portfolio themes for WordPress out there, with this collection we are trying to help you choose the best one. These themes can be used to create any type of personal, photography, art or corporate portfolio. Display 3 in 1 Display 3 in 1 – Business & Portfolio WordPress Theme. Features a fantastic 3D Image slideshow that can be controlled from your backend with a custom tool. The Theme has a huge wordpress custom backend (8 additional Admin Pages) that make customization of the Theme easy for those who dont know much about coding or wordpress. Price: $40 View Demo Download DeepFocus Tempting features such as automatic separation of blog and portfolio content by template, publishing of most important information on homepage, styles to choose from and many more such features. It also provides for page templates for blog, portfolio, blog archive, tags etc. It has the best feature that helps you to manage everything from one place. Price: $39 (Package includes more than 55 themes) View Demo Download SimplePress Simple, yet awesome. One of the best portfolio theme. Price: $39 (Package includes more than 55 themes) View Demo Download Graphix Graphix is one of best word press portfolio themes. It is most suited to aspiring designers, developers, artists and photographers who’d like a framework theme, which has a great-looking portfolio with a feature-rich blog. It has theme option page, 5-color style, SEO option, featured content blocks, drop down multi-level menu, social profile link custom widgets, custom post, custom page template etc. Price: $69 Single & $149 Developer Package View Demo Download Bizznizz It boasts of many features such as custom homepage, custom post types, custom widgets, portfolio templates, alternative styles and many more. View Demo Download Showtime Ultimate WordPress Theme for you to create your web portfolio, It has 3 different styles for you to choose from. Price: $40 View Demo Download Montana WP Horizontal Portfolio Theme Montana Theme – WP Horizontal Portfolio Theme, best suited for creative studios to showcase design, photography, illustration, paintings and art. Price: $30 View Demo Download OverALL OverALL Premium WordPress Blog & Portfolio Theme, is low priced & has amazing tons of features. Price: $17 View Demo Download Habitat Habitat – Blog and Portfolio Theme. Unique Portfolio Sorting/Filtering with a custom jQuery script (each entry supports multiple images or a video) Multiple Featured Images for each post to generate individual Slideshows per Post, or the option to directly embed video content from youtube, vimeo, hulu etc. Price: $35 View Demo Download Fresh Folio Fresh Folio from WooThemes, can be used as both portfolio and a premium WordPress theme. The theme is a remix of the Fresh News Theme and Proud Folio Theme which combines all the best elements of the respective blog and portfolio style themes. View Demo Download Fresh Folio Features: Can be used to create an impressive portfolio. 7 diverse theme styles to choose from (default, blue, red, grunge light, grunge floral, antique, blue creamer, nightlife) The template will automatically (visually) separate your blog & portfolio content, making this an amazing theme for aspiring designers, developers, artists, photographers etc. Unique page templates types for the portfolio, blog, blog archives, tags & search results. Integrated Theme Options (for WordPress) to tweak the layout, colour scheme etc. for the theme Optional Automatic Image Resize, which is used to dynamically create the thumbnails and featured images Includes Widget enabled Sidebars. eGallery eGallery is a theme made to transform your wordpress blog into a fully functional online portfolio. Theme is perfectly designed to emphasize the artwork you choose to showcase. The design has been greatly enhanced using javascript, and is easy to implement. Price: $39 (Package includes more than 55 themes) View Demo Download ProudFolio ProudFolio is a portfolio premium WordPress theme from Woo Themes. The theme is for designers, developers, artists and photographers who would like a showcase theme which would depict as a portfolio and also serves a purpose of blog. ProudFolio puts a strong emphasis on the portfolio pieces, allowing for decent-sized thumbnails, huge fullscreen views via Lightbox, and full details on the single page. The theme file also contains a choice of three different background images and color schemes. Price: $70 Single $150 Developer License View Demo Download Features: The template will automatically (visually) separate your blog & portfolio content. An unique homepage layout, which publishes only the most important information; Unique page templates for the portfolio, blog, blog archives, tags & search results. Integrated Theme Options (for WordPress) to tweak the layout, colour scheme etc. for the theme; Built-in video panel, which you can use to publish any web-based Flash videos; Automatic Image Resize, which is used to dynamically create the thumbnails and featured images; Custom Page Templates for Archives, Sitemap & Image Gallery; Built-in Gravatar Support for Authors & Comments; Integrated Banner Management script to display randomized banner ads of your choice site-wide; Pretty drop down navigation everywhere; and Widget Enabled Sidebars. Porftolio WordPress Theme A FREE wordpress theme designed for web portfolios and (for now) just for web portfolios. It is coming with an Administrative Panel from where you can edit the head quote text, you can edit all theme colors, font families, font sizes and you can fill a curriculum vitae and display it into a special page. Theme demo and download can be found here Viz | Biz Viz | Biz is a premium WordPress photo gallery and portfolio theme designed specifically for photographers, graphic designers and web designers who want to display their creative work online, market their services, as well as have a typical text blog, using the power and flexibility of WordPress. It is priced for $79.95. Theme Features: Premium quality portfolio template Custom logo uploader to replace the standard graphic with your own unique look from the WP Dashboard Integrated blog component (front images are custom fields and thumbnails, but you can also have a typical blog) Four tabbed feature areas (About Me, Services, Recent Posts, and Tags) Two home page feature photos (You choose which photos to feature using a WP category) Manage your online portfolio through the WordPress CMS Crop two sizes of your work: One for the front page thumbnails and another full size version and upload to WP Search engine optimized. Related posts:14 WordPress Photo Blog & Portfolio Themes 6 PhotoBlog Portfolio WordPress Themes Professional WordPress Business Themes

Read the article

Microsoft Business Intelligence Seminar 2011

- by DavidWimbush

I was lucky enough to attend the maiden presentation of this at Microsoft Reading yesterday. It was pretty gripping stuff not only because of what was said but also because of what could only be hinted at. Here's what I took away from the day. (Disclaimer: I'm not a BI guru, just a reasonably experienced BI developer, so I may have misunderstood or misinterpreted a few things. Particularly when so much of the talk was about the vision and subtle hints of what is coming. Please comment if you think I've got anything wrong. I'm also not going to even try to cover Master Data Services as I struggled to imagine how you would actually use it.) I was a bit worried when I learned that the whole day was going to be presented by one guy but Rafal Lukawiecki is a very engaging speaker. He's going to be presenting this about 20 times around the world over the coming months. If you get a chance to hear him speak, I say go for it. No doubt some of the hints will become clearer as Denali gets closer to RTM. Firstly, things are definitely happening in the SQL Server Reporting and BI world. Traditionally IT would build a data warehouse, then cubes on top of that, and then publish them in a structured and controlled way. But, just as with many IT projects in general, by the time it's finished the business has moved on and the system no longer meets their requirements. This not sustainable and something more agile is needed but there has to be some control. Apparently we're going to be hearing the catchphrase 'Balancing agility with control' a lot. More users want more access to more data. Can they define what they want? Of course not, but they'll recognise it when they see it. It's estimated that only 28% of potential BI users have meaningful access to the data they need, so there is a real pent-up demand. The answer looks like: give them some self-service tools so they can experiment and see what works, and then IT can help to support the results. It's estimated that 32% of Excel users are comfortable with its analysis tools such as pivot tables. It's the power user's preferred tool. Why fight it? That's why PowerPivot is an Excel add-in and that's why they released a Data Mining add-in for it as well. It does appear that the strategy is going to be to use Reporting Services (in SharePoint mode), PowerPivot, and possibly something new (smiles and hints but no details) to create reports and explore data. Everything will be published and managed in SharePoint which gives users the ability to mash-up, share and socialise what they've found out. SharePoint also gives IT tools to understand what people are looking at and where to concentrate effort. If PowerPivot report X becomes widely used, it's time to check that it shows what they think it does and perhaps get it a bit more under central control. There was more SharePoint detail that went slightly over my head regarding where Excel Services and Excel Web Application fit in, the differences between them, and the suggestion that it is likely they will one day become one (but not in the immediate future). That basic pattern is set to be expanded upon by further exploiting Vertipaq (the columnar indexing engine that enables PowerPivot to store and process a lot of data fast and in a small memory footprint) to provide scalability 'from the desktop to the data centre', and some yet to be detailed advances in 'frictionless deployment' (part of which is about making the difference between local and the cloud pretty much irrelevant). Excel looks like becoming Microsoft's primary BI client. It already has: the ability to consume cubes strong visualisation tools slicers (which are part of Excel not PowerPivot) a data mining add-in PowerPivot A major hurdle for self-service BI is presenting the data in a consumable format. You can't just give users PowerPivot and a server with a copy of the OLTP database(s). Building cubes is labour intensive and doesn't always give the user what they need. This is where the BI Semantic Model (BISM) comes in. I gather it's a layer of metadata you define that can combine multiple data sources (and types of data source) into a clear 'interface' that users can work with. It comes with a new query language called DAX. SSAS cubes are unlikely to go away overnight because, with their pre-calculated results, they are still the most efficient way to work with really big data sets. A few other random titbits that came up: Reporting Services is going to get some good new stuff in Denali. Keep an eye on www.projectbotticelli.com for the slides. You can also view last year's seminar sessions which covered a lot of the same ground as far as the overall strategy is concerned. They plan to add more material as Denali's features are publicly exposed. Check out the PASS keynote address for a showing of Yahoo's SQL BI servers. Apparently they wheeled the rack out on stage still plugged in and running! Check out the Excel 2010 Data Mining Add-Ins. 32 bit only at present but 64 bit is on the way. There are lots of data sets, many of them free, at the Windows Azure Marketplace Data Market (where you can also get ESRI shape files). If you haven't already seen it, have a look at the Silverlight Pivot Viewer (http://weblogs.asp.net/scottgu/archive/2010/06/29/silverlight-pivotviewer-now-available.aspx). The Bing Maps Data Connector is worth a look if you're into spatial stuff (http://www.bing.com/community/site_blogs/b/maps/archive/2010/07/13/data-connector-sql-server-2008-spatial-amp-bing-maps.aspx).

Read the article

Dell Studio 1737 Overheating

- by Sean

I am using a Dell Studio 1737 laptop. I have been running Linux and have ran Windows recently for a very long time. I upgraded to the 10.10 distribution and since that distro, it seems that for some reason all Linuxes want to push my laptop to extremes. I have recently upgraded to Ubuntu 12.04 since I heart that it contains kernel fixes for overheating issues. 12.04 will actually eventually cool the system, but that is after the fans run to the point it sounds like a jet aircraft taking off and the laptop makes my hands sweat. In trying to combat the heat problems I have done the following: I installed the propriatery driver for my ATI Mobility HD 3600. I have tried both the one in the Additional Drivers and also tried ATI's latest greatest version. If I don't install this my laptop will overheat and shut off in minutes. Both seem to perform similarly, but the heat problem remains. I have tried limiting the CPU by installing the CPUFreq Indicator. This does help keep the machine from shutting off, but the heat is still uncomfortable to be around the machine. I usually run in power saver mode or run the cpu at 1.6 GHZ just to error on safety. I ran sensors-detect and here are the results: sean@sean-Studio-1737:~$ sudo sensors-detect # sensors-detect revision 5984 (2011-07-10 21:22:53 +0200) # System: Dell Inc. Studio 1737 (laptop) # Board: Dell Inc. 0F237N This program will help you determine which kernel modules you need to load to use lm_sensors most effectively. It is generally safe and recommended to accept the default answers to all questions, unless you know what you're doing. Some south bridges, CPUs or memory controllers contain embedded sensors. Do you want to scan for them? This is totally safe. (YES/no): y Module cpuid loaded successfully. Silicon Integrated Systems SIS5595... No VIA VT82C686 Integrated Sensors... No VIA VT8231 Integrated Sensors... No AMD K8 thermal sensors... No AMD Family 10h thermal sensors... No AMD Family 11h thermal sensors... No AMD Family 12h and 14h thermal sensors... No AMD Family 15h thermal sensors... No AMD Family 15h power sensors... No Intel digital thermal sensor... Success! (driver `coretemp') Intel AMB FB-DIMM thermal sensor... No VIA C7 thermal sensor... No VIA Nano thermal sensor... No Some Super I/O chips contain embedded sensors. We have to write to standard I/O ports to probe them. This is usually safe. Do you want to scan for Super I/O sensors? (YES/no): y Probing for Super-I/O at 0x2e/0x2f Trying family `National Semiconductor/ITE'... No Trying family `SMSC'... No Trying family `VIA/Winbond/Nuvoton/Fintek'... No Trying family `ITE'... No Probing for Super-I/O at 0x4e/0x4f Trying family `National Semiconductor/ITE'... Yes Found `ITE IT8512E/F/G Super IO' (but not activated) Some hardware monitoring chips are accessible through the ISA I/O ports. We have to write to arbitrary I/O ports to probe them. This is usually safe though. Yes, you do have ISA I/O ports even if you do not have any ISA slots! Do you want to scan the ISA I/O ports? (YES/no): y Probing for `National Semiconductor LM78' at 0x290... No Probing for `National Semiconductor LM79' at 0x290... No Probing for `Winbond W83781D' at 0x290... No Probing for `Winbond W83782D' at 0x290... No Lastly, we can probe the I2C/SMBus adapters for connected hardware monitoring devices. This is the most risky part, and while it works reasonably well on most systems, it has been reported to cause trouble on some systems. Do you want to probe the I2C/SMBus adapters now? (YES/no): y Using driver `i2c-i801' for device 0000:00:1f.3: Intel ICH9 Module i2c-i801 loaded successfully. Module i2c-dev loaded successfully. Now follows a summary of the probes I have just done. Just press ENTER to continue: Driver `coretemp': * Chip `Intel digital thermal sensor' (confidence: 9) To load everything that is needed, add this to /etc/modules: #----cut here---- # Chip drivers coretemp #----cut here---- If you have some drivers built into your kernel, the list above will contain too many modules. Skip the appropriate ones! Do you want to add these lines automatically to /etc/modules? (yes/NO)y Successful! Monitoring programs won't work until the needed modules are loaded. You may want to run 'service module-init-tools start' to load them. Unloading i2c-dev... OK Unloading i2c-i801... OK Unloading cpuid... OK sean@sean-Studio-1737:~$ sudo service module-init-tools start module-init-tools stop/waiting I also tried installing i8k but that didn't work since it didn't seem to be able to communicate with the hardware (probably for different kind of device). Also I ran acpi -V and here are the results: Battery 0: Full, 100% Battery 0: design capacity 613 mAh, last full capacity 260 mAh = 42% Adapter 0: on-line Thermal 0: ok, 49.0 degrees C Thermal 0: trip point 0 switches to mode critical at temperature 100.0 degrees C Thermal 1: ok, 48.0 degrees C Thermal 1: trip point 0 switches to mode critical at temperature 100.0 degrees C Thermal 2: ok, 51.0 degrees C Thermal 2: trip point 0 switches to mode critical at temperature 100.0 degrees C Cooling 0: LCD 0 of 15 Cooling 1: Processor 0 of 10 Cooling 2: Processor 0 of 10 I have hit a wall and don't know what to do now. Any advice is appreciated.

Read the article

SQL SERVER – SSIS Look Up Component – Cache Mode – Notes from the Field #028

- by Pinal Dave

[Notes from Pinal]: Lots of people think that SSIS is all about arranging various operations together in one logical flow. Well, the understanding is absolutely correct, but the implementation of the same is not as easy as it seems. Similarly most of the people think lookup component is just component which does look up for additional information and does not pay much attention to it. Due to the same reason they do not pay attention to the same and eventually get very bad performance. Linchpin People are database coaches and wellness experts for a data driven world. In this 28th episode of the Notes from the Fields series database expert Tim Mitchell (partner at Linchpin People) shares very interesting conversation related to how to write a good lookup component with Cache Mode. In SQL Server Integration Services, the lookup component is one of the most frequently used tools for data validation and completion. The lookup component is provided as a means to virtually join one set of data to another to validate and/or retrieve missing values. Properly configured, it is reliable and reasonably fast. Among the many settings available on the lookup component, one of the most critical is the cache mode. This selection will determine whether and how the distinct lookup values are cached during package execution. It is critical to know how cache modes affect the result of the lookup and the performance of the package, as choosing the wrong setting can lead to poorly performing packages, and in some cases, incorrect results. Full Cache The full cache mode setting is the default cache mode selection in the SSIS lookup transformation. Like the name implies, full cache mode will cause the lookup transformation to retrieve and store in SSIS cache the entire set of data from the specified lookup location. As a result, the data flow in which the lookup transformation resides will not start processing any data buffers until all of the rows from the lookup query have been cached in SSIS. The most commonly used cache mode is the full cache setting, and for good reason. The full cache setting has the most practical applications, and should be considered the go-to cache setting when dealing with an untested set of data. With a moderately sized set of reference data, a lookup transformation using full cache mode usually performs well. Full cache mode does not require multiple round trips to the database, since the entire reference result set is cached prior to data flow execution. There are a few potential gotchas to be aware of when using full cache mode. First, you can see some performance issues – memory pressure in particular – when using full cache mode against large sets of reference data. If the table you use for the lookup is very large (either deep or wide, or perhaps both), there’s going to be a performance cost associated with retrieving and caching all of that data. Also, keep in mind that when doing a lookup on character data, full cache mode will always do a case-sensitive (and in some cases, space-sensitive) string comparison even if your database is set to a case-insensitive collation. This is because the in-memory lookup uses a .NET string comparison (which is case- and space-sensitive) as opposed to a database string comparison (which may be case sensitive, depending on collation). There’s a relatively easy workaround in which you can use the UPPER() or LOWER() function in the pipeline data and the reference data to ensure that case differences do not impact the success of your lookup operation. Again, neither of these present a reason to avoid full cache mode, but should be used to determine whether full cache mode should be used in a given situation. Full cache mode is ideally useful when one or all of the following conditions exist: The size of the reference data set is small to moderately sized The size of the pipeline data set (the data you are comparing to the lookup table) is large, is unknown at design time, or is unpredictable Each distinct key value(s) in the pipeline data set is expected to be found multiple times in that set of data Partial Cache When using the partial cache setting, lookup values will still be cached, but only as each distinct value is encountered in the data flow. Initially, each distinct value will be retrieved individually from the specified source, and then cached. To be clear, this is a row-by-row lookup for each distinct key value(s). This is a less frequently used cache setting because it addresses a narrower set of scenarios. Because each distinct key value(s) combination requires a relational round trip to the lookup source, performance can be an issue, especially with a large pipeline data set to be compared to the lookup data set. If you have, for example, a million records from your pipeline data source, you have the potential for doing a million lookup queries against your lookup data source (depending on the number of distinct values in the key column(s)). Therefore, one has to be keenly aware of the expected row count and value distribution of the pipeline data to safely use partial cache mode. Using partial cache mode is ideally suited for the conditions below: The size of the data in the pipeline (more specifically, the number of distinct key column) is relatively small The size of the lookup data is too large to effectively store in cache The lookup source is well indexed to allow for fast retrieval of row-by-row values No Cache As you might guess, selecting no cache mode will not add any values to the lookup cache in SSIS. As a result, every single row in the pipeline data set will require a query against the lookup source. Since no data is cached, it is possible to save a small amount of overhead in SSIS memory in cases where key values are not reused. In the real world, I don’t see a lot of use of the no cache setting, but I can imagine some edge cases where it might be useful. As such, it’s critical to know your data before choosing this option. Obviously, performance will be an issue with anything other than small sets of data, as the no cache setting requires row-by-row processing of all of the data in the pipeline. I would recommend considering the no cache mode only when all of the below conditions are true: The reference data set is too large to reasonably be loaded into SSIS memory The pipeline data set is small and is not expected to grow There are expected to be very few or no duplicates of the key values(s) in the pipeline data set (i.e., there would be no benefit from caching these values) Conclusion The cache mode, an often-overlooked setting on the SSIS lookup component, represents an important design decision in your SSIS data flow. Choosing the right lookup cache mode directly impacts the fidelity of your results and the performance of package execution. Know how this selection impacts your ETL loads, and you’ll end up with more reliable, faster packages. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SSIS

Read the article

The Birth of a Method - Where did OUM come from?

- by user702549

It seemed fitting to start this blog entry with the OUM vision statement. The vision for the Oracle® Unified Method (OUM) is to support the entire Enterprise IT lifecycle, including support for the successful implementation of every Oracle product. Well, it’s that time of year again; we just finished testing and packaging OUM 5.6. It will be released for general availability to qualifying customers and partners this month. Because of this, I’ve been reflecting back on how the birth of Oracle’s Unified method - OUM came about. As the Release Director of OUM, I’ve been honored to package every method release. No, maybe you’d say it’s not so special. Of course, anyone can use packaging software to create an .exe file. But to me, it is pretty special, because so many people work together to make each release come about. The rich content that results is what makes OUM’s history worth talking about. To me, professionally speaking, working on OUM, well it’s been “a labor of love”. My youngest child was just 8 years old when OUM was born, and she’s now in High School! Watching her grow and change has been fascinating, if you ask her, she’s grown up hearing about OUM. My son would often walk into my home office and ask “How is OUM today, Mom?” I am one of many people that take care of OUM, and have watched the method “mature” over these last 6 years. Maybe that makes me a "Method Mom" (someone in one of my classes last year actually said this outloud) but there are so many others who collaborate and care about OUM Development. I’ve thought about writing this blog entry for a long time just to reflect on how far the Method has come. Each release, as I prepare the OUM Contributors list, I see how many people’s experience and ideas it has taken to create this wealth of knowledge, process and task guidance as well as templates and examples. If you’re wondering how many people, just go into OUM select the resources button on the top of most pages of the method, and on that resources page click the ABOUT link. So now back to my nostalgic moment as I finished release 5.6 packaging. I reflected back, on all the things that happened that cause OUM to become not just a dream but to actually come to fruition. Here are some key conditions that make it possible for each release of the method: A vision to have one method instead of many methods, thereby focusing on deeper, richer content People within Oracle’s consulting Organization willing to contribute to OUM providing Subject Matter Experts who are willing to write down and share what they know. Oracle’s continued acquisition of software companies, the need to assimilate high quality existing materials from these companies The need to bring together people from very different backgrounds and provide a common language to support Oracle Product implementations that often involve multiple product families What came first, and then what was the strategy? Initially OUM 4.0 was based on Oracle’s J2EE Custom Development Method (JCDM), it was a good “backbone” (work breakdown structure) it was Unified Process based, and had good content around UML as well as custom software development. But it needed to be extended in order to achieve the OUM Vision. What happened after that was to take in the “best of the best”, the legacy and acquired methods were scheduled for assimilation into OUM, one release after another. We incrementally built OUM. We didn’t want to lose any of the expertise that was reflected in AIM (Oracle’s legacy Application Implementation Method), Compass (People Soft’s Application implementation method) and so many more. When was OUM born? OUM 4.1 published April 30, 2006. This release allowed Oracles Advanced Technology groups to begin the very first implementations of Fusion Middleware. In the early days of the Method we would prepare several releases a year. Our iterative release development cycle began and continues to be refined with each Method release. Now we typically see one major release each year. The OUM release development cycle is not unlike many Oracle Implementation projects in that we need to gather requirements, prioritize, prepare the content, test package and then go production. Typically we develop an OUM release MoSCoW (must have, should have, could have, and won’t have) right after the prior release goes out. These are the high level requirements. We break the timeframe into increments, frequent checkpoints that help us assess the content and progress is measured through frequent checkpoints. We work as a team to prioritize what should be done in each increment. Yes, the team provides the estimates for what can be done within a particular increment. We sometimes have Method Development workshops (physically or virtually) to accelerate content development on a particular subject area, that is where the best content results. As the written content nears the final stages, it goes through edit and evaluation through peer reviews, and then moves into the release staging environment. Then content freeze and testing of the method pack take place. This iterative cycle is run using the OUM artifacts that make sense “fit for purpose”, project plans, MoSCoW lists, Test plans are just a few of the OUM work products we use on a Method Release project. In 2007 OUM 4.3, 4.4 and 4.5 were published. With the release of 4.5 our Custom BI Method (Data Warehouse Method FastTrack) was assimilated into OUM. These early releases helped us align Oracle’s Unified method with other industry standards Then in 2008 we made significant changes to the OUM “Backbone” to support Applications Implementation projects with that went to the OUM 5.0 release. Now things started to get really interesting. Next we had some major developments in the Envision focus area in the area of Enterprise Architecture. We acquired some really great content from the former BEA, Liquid Enterprise Method (LEM) along with some SMEs who were willing to work at bringing this content into OUM. The Service Oriented Architecture content in OUM is extensive and can help support the successful implementation of Fusion Middleware, as well as Fusion Applications. Of course we’ve developed a wealth of OUM training materials that work also helps to improve the method content. It is one thing to write “how to”, and quite another to be able to teach people how to use the materials to improve the success of their projects. I’ve learned so much by teaching people how to use OUM. What's next? So here toward the end of 2012, what’s in store in OUM 5.6, well, I’m sure you won’t be surprised the answer is Cloud Computing. More details to come in the next couple of weeks! The best part of being involved in the development of OUM is to see how many people have “adopted” OUM over these six years, Clients, Partners, and Oracle Consultants. The content just gets better with each release. I’d love to hear your comments on how OUM has evolved, and ideas for new content you’d like to see in the upcoming releases.

Read the article

Das T5-4 TPC-H Ergebnis naeher betrachtet

- by Stefan Hinker

Inzwischen haben vermutlich viele das neue TPC-H Ergebnis der SPARC T5-4 gesehen, das am 7. Juni bei der TPC eingereicht wurde. Die wesentlichen Punkte dieses Benchmarks wurden wie gewohnt bereits von unserer Benchmark-Truppe auf "BestPerf" zusammengefasst. Es gibt aber noch einiges mehr, das eine naehere Betrachtung lohnt. Skalierbarkeit Das TPC raet von einem Vergleich von TPC-H Ergebnissen in unterschiedlichen Groessenklassen ab. Aber auch innerhalb der 3000GB-Klasse ist es interessant: SPARC T4-4 mit 4 CPUs (32 Cores mit 3.0 GHz) liefert 205,792 QphH. SPARC T5-4 mit 4 CPUs (64 Cores mit 3.6 GHz) liefert 409,721 QphH. Das heisst, es fehlen lediglich 1863 QphH oder 0.45% zu 100% Skalierbarkeit, wenn man davon ausgeht, dass die doppelte Anzahl Kerne das doppelte Ergebnis liefern sollte. Etwas anspruchsvoller, koennte man natuerlich auch einen Faktor von 2.4 erwarten, wenn man die hoehere Taktrate mit beruecksichtigt. Das wuerde die Latte auf 493901 QphH legen. Dann waere die SPARC T5-4 bei 83%. Damit stellt sich die Frage: Was hat hier nicht skaliert? Vermutlich der Plattenspeicher! Auch hier lohnt sich eine naehere Betrachtung: Plattenspeicher Im Bericht auf BestPerf und auch im Full Disclosure Report der TPC stehen einige interessante Details zum Plattenspeicher und der Konfiguration. In der Konfiguration der SPARC T4-4 wurden 12 2540-M2 Arrays verwendet, die jeweils ca. 1.5 GB/s Durchsatz liefert, insgesamt also eta 18 GB/s. Dabei waren die Arrays offensichtlich mit jeweils 2 Kabeln pro Array direkt an die 24 8GBit FC-Ports des Servers angeschlossen. Mit den 2x 8GBit Ports pro Array koennte man so ein theoretisches Maximum von 2GB/s erreichen. Tatsaechlich wurden 1.5GB/s geliefert, was so ziemlich dem realistischen Maximum entsprechen duerfte. Fuer den Lauf mit der SPARC T5-4 wurden doppelt so viele Platten verwendet. Dafuer wurden die 2540-M2 Arrays mit je einem zusaetzlichen Plattentray erweitert. Mit dieser Konfiguration wurde dann (laut BestPerf) ein Maximaldurchsatz von 33 GB/s erreicht - nicht ganz das doppelte des SPARC T4-4 Laufs. Um tatsaechlich den doppelten Durchsatz (36 GB/s) zu liefern, haette jedes der 12 Arrays 3 GB/s ueber seine 4 8GBit Ports liefern muessen. Im FDR stehen nur 12 dual-port FC HBAs, was die Verwendung der Brocade FC Switches erklaert: Es wurden alle 4 8GBit ports jedes Arrays an die Switches angeschlossen, die die Datenstroeme dann in die 24 16GBit HBA ports des Servers buendelten. Das theoretische Maximum jedes Storage-Arrays waere nun 4 GB/s. Wenn man jedoch den Protokoll- und "Realitaets"-Overhead mit einrechnet, sind die tatsaechlich gelieferten 2.75 GB/s gar nicht schlecht. Mit diesen Zahlen im Hinterkopf ist die Verdopplung des SPARC T4-4 Ergebnisses eine gute Leistung - und gleichzeitig eine gute Erklaerung, warum nicht bis zum 2.4-fachen skaliert wurde. Nebenbei bemerkt: Weder die SPARC T4-4 noch die SPARC T5-4 hatten in der gemessenen Konfiguration irgendwelche Flash-Devices. Mitbewerb Seit die T4 Systeme auf dem Markt sind, bemuehen sich unsere Mitbewerber redlich darum, ueberall den Eindruck zu hinterlassen, die Leistung des SPARC CPU-Kerns waere weiterhin mangelhaft. Auch scheinen sie ueberzeugt zu sein, dass (ueber)grosse Caches und hohe Taktraten die einzigen Schluessel zu echter Server Performance seien. Wenn ich mir nun jedoch die oeffentlichen TPC-H Ergebnisse ansehe, sehe ich dies: TPC-H @3000GB, Non-Clustered Systems System QphH SPARC T5-4 3.6 GHz SPARC T5 4/64 – 2048 GB 409,721.8 SPARC T4-4 3.0 GHz SPARC T4 4/32 – 1024 GB 205,792.0 IBM Power 780 4.1 GHz POWER7 8/32 – 1024 GB 192,001.1 HP ProLiant DL980 G7 2.27 GHz Intel Xeon X7560 8/64 – 512 GB 162,601.7 Kurz zusammengefasst: Mit 32 Kernen (mit 3 GHz und 4MB L3 Cache), liefert die SPARC T4-4 mehr QphH@3000GB ab als IBM mit ihrer 32 Kern Power7 (bei 4.1 GHz und 32MB L3 Cache) und auch mehr als HP mit einem 64 Kern Intel Xeon System (2.27 GHz und 24MB L3 Cache). Ich frage mich, wo genau SPARC hier mangelhaft ist? Nun koennte man natuerlich argumentieren, dass beide Ergebnisse nicht gerade neu sind. Nun, in Ermangelung neuerer Ergebnisse kann man ja mal ein wenig spekulieren: IBMs aktueller Performance Report listet die o.g. IBM Power 780 mit einem rPerf Wert von 425.5. Ein passendes Nachfolgesystem mit Power7+ CPUs waere die Power 780+ mit 64 Kernen, verfuegbar mit 3.72 GHz. Sie wird mit einem rPerf Wert von 690.1 angegeben, also 1.62x mehr. Wenn man also annimmt, dass Plattenspeicher nicht der limitierende Faktor ist (IBM hat mit 177 SSDs getestet, sie duerfen das gerne auf 400 erhoehen) und IBMs eigene Leistungsabschaetzung zugrunde legt, darf man ein theoretisches Ergebnis von 311398 QphH@3000GB erwarten. Das waere dann allerdings immer noch weit von dem Ergebnis der SPARC T5-4 entfernt, und gerade in der von IBM so geschaetzen "per core" Metric noch weniger vorteilhaft. In der x86-Welt sieht es nicht besser aus. Leider gibt es von Intel keine so praktischen rPerf-Tabellen. Daher muss ich hier fuer eine Schaetzung auf SPECint_rate2006 zurueckgreifen. (Ich bin kein grosser Fan von solchen Kreuz- und Querschaetzungen. Insb. SPECcpu ist nicht besonders geeignet, um Datenbank-Leistung abzuschaetzen, da fast kein IO im Spiel ist.) Das o.g. HP System wird bei SPEC mit 1580 CINT2006_rate gelistet. Das bis einschl. 2013-06-14 beste Resultat fuer den neuen Intel Xeon E7-4870 mit 8 CPUs ist 2180 CINT2006_rate. Das ist immerhin 1.38x besser. (Wenn man nur die Taktrate beruecksichtigen wuerde, waere man bei 1.32x.) Hier weiter zu rechnen, ist muessig, aber fuer die ungeduldigen Leser hier eine kleine tabellarische Zusammenfassung: TPC-H @3000GB Performance Spekulationen System QphH* Verbesserung gegenueber der frueheren Generation SPARC T4-4 32 cores SPARC T4 205,792 2x SPARC T5-464 cores SPARC T5 409,721 IBM Power 780 32 cores Power7 192,001 1.62x IBM Power 780+ 64 cores Power7+ 311,398* HP ProLiant DL980 G764 cores Intel Xeon X7560 162,601 1.38x HP ProLiant DL980 G780 cores Intel Xeon E7-4870 224,348* * Keine echten Resultate - spekulative Werte auf der Grundlage von rPerf (Power7+) oder SPECint_rate2006 (HP) Natuerlich sind IBM oder HP herzlich eingeladen, diese Werte zu widerlegen. Aber stand heute warte ich noch auf aktuelle Benchmark Veroffentlichungen in diesem Datensegment. Was koennen wir also zusammenfassen? Es gibt einige Hinweise, dass der Plattenspeicher der begrenzende Faktor war, der die SPARC T5-4 daran hinderte, auf jenseits von 2x zu skalieren Der Mythos, dass SPARC Kerne keine Leistung bringen, ist genau das - ein Mythos. Wie sieht es umgekehrt eigentlich mit einem TPC-H Ergebnis fuer die Power7+ aus? Cache ist nicht der magische Performance-Schalter, fuer den ihn manche Leute offenbar halten. Ein System, eine CPU-Architektur und ein Betriebsystem jenseits einer gewissen Grenze zu skalieren ist schwer. In der x86-Welt scheint es noch ein wenig schwerer zu sein. Was fehlt? Nun, das Thema Preis/Leistung ueberlasse ich gerne den Verkaeufern ;-) Und zu guter Letzt: Nein, ich habe mich nicht ins Marketing versetzen lassen. Aber manchmal kann ich mich einfach nicht zurueckhalten... Disclosure Statements The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org, results as of 6/7/13. Prices are in USD. SPARC T5-4 409,721.8 QphH@3000GB, $3.94/QphH@3000GB, available 9/24/13, 4 processors, 64 cores, 512 threads; SPARC T4-4 205,792.0 QphH@3000GB, $4.10/QphH@3000GB, available 5/31/12, 4 processors, 32 cores, 256 threads; IBM Power 780 QphH@3000GB, 192,001.1 QphH@3000GB, $6.37/QphH@3000GB, available 11/30/11, 8 processors, 32 cores, 128 threads; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB available 10/13/10, 8 processors, 64 cores, 128 threads. SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results as of June 18, 2013 from www.spec.org. HP ProLiant DL980 G7 (2.27 GHz, Intel Xeon X7560): 1580 SPECint_rate2006; HP ProLiant DL980 G7 (2.4 GHz, Intel Xeon E7-4870): 2180 SPECint_rate2006,

Read the article

Elegance, thy Name is jQuery

- by SGWellens

So, I'm browsing though some questions over on the Stack Overflow website and I found a good jQuery question just a few minutes old. Here is a link to it. It was a tough question; I knew that by answering it, I could learn new stuff and reinforce what I already knew: Reading is good, doing is better. Maybe I could help someone in the process too. I cut and pasted the HTML from the question into my Visual Studio IDE and went back to Stack Overflow to reread the question. Dang, someone had already answered it! And it was a great answer. I never even had a chance to start analyzing the issue. Now I know what a one-legged man feels like in an ass-kicking contest. Nevertheless, since the question and answer were so interesting, I decided to dissect them and learn as much as possible. The HTML consisted of some divs separated by h3 headings. Note the elements are laid out sequentially with no programmatic grouping: <h3 class="heading">Heading 1</h3> <div>Content</div> <div>More content</div> <div>Even more content</div><h3 class="heading">Heading 2</h3> <div>some content</div> <div>some more content</div><h3 class="heading">Heading 3</h3> <div>other content</div></form></body> The requirement was to wrap a div around each h3 heading and the subsequent divs grouping them into sections. Why? I don't know, I suppose if you screen-scrapped some HTML from another site, you might want to reformat it before displaying it on your own. Anyways… Here is the marvelously, succinct posted answer: $('.heading').each(function(){ $(this).nextUntil('.heading').andSelf().wrapAll('<div class="section">');}); I was familiar with all the parts except for nextUntil and andSelf. But, I'll analyze the whole answer for completeness. I'll do this by rewriting the posted answer in a different style and adding a boat-load of comments: function Test(){ // $Sections is a jQuery object and it will contain three elements var $Sections = $('.heading'); // use each to iterate over each of the three elements $Sections.each(function () { // $this is a jquery object containing the current element // being iterated var $this = $(this); // nextUntil gets the following sibling elements until it reaches // an element with the CSS class 'heading' // andSelf adds in the source element (this) to the collection $this = $this.nextUntil('.heading').andSelf(); // wrap the elements with a div $this.wrapAll('<div class="section" >'); });} The code here doesn't look nearly as concise and elegant as the original answer. However, unless you and your staff are jQuery masters, during development it really helps to work through algorithms step by step. You can step through this code in the debugger and examine the jQuery objects to make sure one step is working before proceeding on to the next. It's much easier to debug and troubleshoot when each logical coding step is a separate line of code. Note: You may think the original code runs much faster than this version. However, the time difference is trivial: Not enough to worry about: Less than 1 millisecond (tested in IE and FF). Note: You may want to jam everything into one line because it results in less traffic being sent to the client. That is true. However, most Internet servers now compress HTML and JavaScript by stripping out comments and white space (go to Bing or Google and view the source). This feature should be enabled on your server: Let the server compress your code, you don't need to do it. Free Career Advice: Creating maintainable code is Job One—Maximum Priority—The Prime Directive. If you find yourself suddenly transferred to customer support, it may be that the code you are writing is not as readable as it could be and not as readable as it should be. Moving on… I created a CSS class to enhance the results: .section{ background-color: yellow; border: 2px solid black; margin: 5px;} Here is the rendered output before: …and after the jQuery code runs. Pretty Cool! But, while playing with this code, the logic of nextUntil began to bother me: What happens in the last section? What stops elements from being collected since there are no more elements with the .heading class? The answer is nothing. In this case it stopped collecting elements because it was at the end of the page. But what if there were additional HTML elements? I added an anchor tag and another div to the HTML: <h3 class="heading">Heading 1</h3> <div>Content</div> <div>More content</div> <div>Even more content</div><h3 class="heading">Heading 2</h3> <div>some content</div> <div>some more content</div><h3 class="heading">Heading 3</h3> <div>other content</div><a>this is a link</a><div>unrelated div</div> </form></body> The code as-is will include both the anchor and the unrelated div. This isn't what we want. My first attempt to correct this used the filter parameter of the nextUntil function: nextUntil('.heading', 'div') This will only collect div elements. But it merely skipped the anchor tag and it still collected the unrelated div: The problem is we need a way to tell the nextUntil function when to stop. CSS selectors to the rescue! nextUntil('.heading, a') This tells nextUntil to stop collecting elements when it gets to an element with a .heading class OR when it gets to an anchor tag. In this case it solved the problem. FYI: The comma operator in a CSS selector allows multiple criteria. Bingo! One final note, we could have broken the code down even more: We could have replaced the andSelf function here: $this = $this.nextUntil('.heading, a').andSelf(); With this: // get all the following siblings and then add the current item$this = $this.nextUntil('.heading, a');$this.add(this); But in this case, the andSelf function reads real nice. In my opinion. Here's a link to a jsFiddle if you want to play with it. I hope someone finds this useful Steve Wellens CodeProject

Read the article

Inheritance Mapping Strategies with Entity Framework Code First CTP5: Part 2 – Table per Type (TPT)

- by mortezam

In the previous blog post you saw that there are three different approaches to representing an inheritance hierarchy and I explained Table per Hierarchy (TPH) as the default mapping strategy in EF Code First. We argued that the disadvantages of TPH may be too serious for our design since it results in denormalized schemas that can become a major burden in the long run. In today’s blog post we are going to learn about Table per Type (TPT) as another inheritance mapping strategy and we'll see that TPT doesn’t expose us to this problem. Table per Type (TPT)Table per Type is about representing inheritance relationships as relational foreign key associations. Every class/subclass that declares persistent properties—including abstract classes—has its own table. The table for subclasses contains columns only for each noninherited property (each property declared by the subclass itself) along with a primary key that is also a foreign key of the base class table. This approach is shown in the following figure: For example, if an instance of the CreditCard subclass is made persistent, the values of properties declared by the BillingDetail base class are persisted to a new row of the BillingDetails table. Only the values of properties declared by the subclass (i.e. CreditCard) are persisted to a new row of the CreditCards table. The two rows are linked together by their shared primary key value. Later, the subclass instance may be retrieved from the database by joining the subclass table with the base class table. TPT Advantages The primary advantage of this strategy is that the SQL schema is normalized. In addition, schema evolution is straightforward (modifying the base class or adding a new subclass is just a matter of modify/add one table). Integrity constraint definition are also straightforward (note how CardType in CreditCards table is now a non-nullable column). Another much more important advantage is the ability to handle polymorphic associations (a polymorphic association is an association to a base class, hence to all classes in the hierarchy with dynamic resolution of the concrete class at runtime). A polymorphic association to a particular subclass may be represented as a foreign key referencing the table of that particular subclass. Implement TPT in EF Code First We can create a TPT mapping simply by placing Table attribute on the subclasses to specify the mapped table name (Table attribute is a new data annotation and has been added to System.ComponentModel.DataAnnotations namespace in CTP5): public abstract class BillingDetail { public int BillingDetailId { get; set; } public string Owner { get; set; } public string Number { get; set; } } [Table("BankAccounts")] public class BankAccount : BillingDetail { public string BankName { get; set; } public string Swift { get; set; } } [Table("CreditCards")] public class CreditCard : BillingDetail { public int CardType { get; set; } public string ExpiryMonth { get; set; } public string ExpiryYear { get; set; } } public class InheritanceMappingContext : DbContext { public DbSet<BillingDetail> BillingDetails { get; set; } } If you prefer fluent API, then you can create a TPT mapping by using ToTable() method: protected override void OnModelCreating(ModelBuilder modelBuilder) { modelBuilder.Entity<BankAccount>().ToTable("BankAccounts"); modelBuilder.Entity<CreditCard>().ToTable("CreditCards"); } Generated SQL For QueriesLet’s take an example of a simple non-polymorphic query that returns a list of all the BankAccounts: var query = from b in context.BillingDetails.OfType<BankAccount>() select b; Executing this query (by invoking ToList() method) results in the following SQL statements being sent to the database (on the bottom, you can also see the result of executing the generated query in SQL Server Management Studio): Now, let’s take an example of a very simple polymorphic query that requests all the BillingDetails which includes both BankAccount and CreditCard types: projects some properties out of the base class BillingDetail, without querying for anything from any of the subclasses: var query = from b in context.BillingDetails select new { b.BillingDetailId, b.Number, b.Owner }; -- var query = from b in context.BillingDetails select b; This LINQ query seems even more simple than the previous one but the resulting SQL query is not as simple as you might expect: -- As you can see, EF Code First relies on an INNER JOIN to detect the existence (or absence) of rows in the subclass tables CreditCards and BankAccounts so it can determine the concrete subclass for a particular row of the BillingDetails table. Also the SQL CASE statements that you see in the beginning of the query is just to ensure columns that are irrelevant for a particular row have NULL values in the returning flattened table. (e.g. BankName for a row that represents a CreditCard type) TPT ConsiderationsEven though this mapping strategy is deceptively simple, the experience shows that performance can be unacceptable for complex class hierarchies because queries always require a join across many tables. In addition, this mapping strategy is more difficult to implement by hand— even ad-hoc reporting is more complex. This is an important consideration if you plan to use handwritten SQL in your application (For ad hoc reporting, database views provide a way to offset the complexity of the TPT strategy. A view may be used to transform the table-per-type model into the much simpler table-per-hierarchy model.) SummaryIn this post we learned about Table per Type as the second inheritance mapping in our series. So far, the strategies we’ve discussed require extra consideration with regard to the SQL schema (e.g. in TPT, foreign keys are needed). This situation changes with the Table per Concrete Type (TPC) that we will discuss in the next post. References ADO.NET team blog Java Persistence with Hibernate book a { text-decoration: none; } a:visited { color: Blue; } .title { padding-bottom: 5px; font-family: Segoe UI; font-size: 11pt; font-weight: bold; padding-top: 15px; } .code, .typeName { font-family: consolas; } .typeName { color: #2b91af; } .padTop5 { padding-top: 5px; } .padTop10 { padding-top: 10px; } p.MsoNormal { margin-top: 0in; margin-right: 0in; margin-bottom: 10.0pt; margin-left: 0in; line-height: 115%; font-size: 11.0pt; font-family: "Calibri" , "sans-serif"; }

Read the article

Elegance, thy Name is jQuery

- by SGWellens

So, I'm browsing though some questions over on the Stack Overflow website and I found a good jQuery question just a few minutes old. Here is a link to it. It was a tough question; I knew that by answering it, I could learn new stuff and reinforce what I already knew: Reading is good, doing is better. Maybe I could help someone in the process too. I cut and pasted the HTML from the question into my Visual Studio IDE and went back to Stack Overflow to reread the question. Dang, someone had already answered it! And it was a great answer. I never even had a chance to start analyzing the issue. Now I know what a one-legged man feels like in an ass-kicking contest. Nevertheless, since the question and answer were so interesting, I decided to dissect them and learn as much as possible. The HTML consisted of some divs separated by h3 headings. Note the elements are laid out sequentially with no programmatic grouping: <h3 class="heading">Heading 1</h3> <div>Content</div> <div>More content</div> <div>Even more content</div><h3 class="heading">Heading 2</h3> <div>some content</div> <div>some more content</div><h3 class="heading">Heading 3</h3> <div>other content</div></form></body> The requirement was to wrap a div around each h3 heading and the subsequent divs grouping them into sections. Why? I don't know, I suppose if you screen-scrapped some HTML from another site, you might want to reformat it before displaying it on your own. Anyways… Here is the marvelously, succinct posted answer: $('.heading').each(function(){ $(this).nextUntil('.heading').andSelf().wrapAll('<div class="section">');}); I was familiar with all the parts except for nextUntil and andSelf. But, I'll analyze the whole answer for completeness. I'll do this by rewriting the posted answer in a different style and adding a boat-load of comments: function Test(){ // $Sections is a jQuery object and it will contain three elements var $Sections = $('.heading'); // use each to iterate over each of the three elements $Sections.each(function () { // $this is a jquery object containing the current element // being iterated var $this = $(this); // nextUntil gets the following sibling elements until it reaches // an element with the CSS class 'heading' // andSelf adds in the source element (this) to the collection $this = $this.nextUntil('.heading').andSelf(); // wrap the elements with a div $this.wrapAll('<div class="section" >'); });} The code here doesn't look nearly as concise and elegant as the original answer. However, unless you and your staff are jQuery masters, during development it really helps to work through algorithms step by step. You can step through this code in the debugger and examine the jQuery objects to make sure one step is working before proceeding on to the next. It's much easier to debug and troubleshoot when each logical coding step is a separate line. Note: You may think the original code runs much faster than this version. However, the time difference is trivial: Not enough to worry about: Less than 1 millisecond (tested in IE and FF). Note: You may want to jam everything into one line because it results in less traffic being sent to the client. That is true. However, most Internet servers now compress HTML and JavaScript by stripping out comments and white space (go to Bing or Google and view the source). This feature should be enabled on your server: Let the server compress your code, you don't need to do it. Free Career Advice: Creating maintainable code is Job One—Maximum Priority—The Prime Directive. If you find yourself suddenly transferred to customer support, it may be that the code you are writing is not as readable as it could be and not as readable as it should be. Moving on… I created a CSS class to see the results: .section{ background-color: yellow; border: 2px solid black; margin: 5px;} Here is the rendered output before: …and after the jQuery code runs. Pretty Cool! But, while playing with this code, the logic of nextUntil began to bother me: What happens in the last section? What stops elements from being collected since there are no more elements with the .heading class? The answer is nothing. In this case it stopped because it was at the end of the page. But what if there were additional HTML elements? I added an anchor tag and another div to the HTML: <h3 class="heading">Heading 1</h3> <div>Content</div> <div>More content</div> <div>Even more content</div><h3 class="heading">Heading 2</h3> <div>some content</div> <div>some more content</div><h3 class="heading">Heading 3</h3> <div>other content</div><a>this is a link</a><div>unrelated div</div> </form></body> The code as-is will include both the anchor and the unrelated div. This isn't what we want. My first attempt to correct this used the filter parameter of the nextUntil function: nextUntil('.heading', 'div') This will only collect div elements. But it merely skipped the anchor tag and it still collected the unrelated div: The problem is we need a way to tell the nextUntil function when to stop. CSS selectors to the rescue: nextUntil('.heading, a') This tells nextUntil to stop collecting sibling elements when it gets to an element with a .heading class OR when it gets to an anchor tag. In this case it solved the problem. FYI: The comma operator in a CSS selector allows multiple criteria. Bingo! One final note, we could have broken the code down even more: We could have replaced the andSelf function here: $this = $this.nextUntil('.heading, a').andSelf(); With this: // get all the following siblings and then add the current item$this = $this.nextUntil('.heading, a');$this.add(this); But in this case, the andSelf function reads real nice. In my opinion. Here's a link to a jsFiddle if you want to play with it. I hope someone finds this useful Steve Wellens CodeProject

Read the article

RiverTrail - JavaScript GPPGU Data Parallelism

- by JoshReuben

Where is WebCL ? The Khronos WebCL working group is working on a JavaScript binding to the OpenCL standard so that HTML 5 compliant browsers can host GPGPU web apps – e.g. for image processing or physics for WebGL games - http://www.khronos.org/webcl/ . While Nokia & Samsung have some protype WebCL APIs, Intel has one-upped them with a higher level of abstraction: RiverTrail. Intro to RiverTrail Intel Labs JavaScript RiverTrail provides GPU accelerated SIMD data-parallelism in web applications via a familiar JavaScript programming paradigm. It extends JavaScript with simple deterministic data-parallel constructs that are translated at runtime into a low-level hardware abstraction layer. With its high-level JS API, programmers do not have to learn a new language or explicitly manage threads, orchestrate shared data synchronization or scheduling. It has been proposed as a draft specification to ECMA a (known as ECMA strawman). RiverTrail runs in all popular browsers (except I.E. of course). To get started, download a prebuilt version https://github.com/downloads/RiverTrail/RiverTrail/rivertrail-0.17.xpi , install Intel's OpenCL SDK http://www.intel.com/go/opencl and try out the interactive River Trail shell http://rivertrail.github.com/interactive For a video overview, see http://www.youtube.com/watch?v=jueg6zB5XaM . ParallelArray the ParallelArray type is the central component of this API & is a JS object that contains ordered collections of scalars – i.e. multidimensional uniform arrays. A shape property describes the dimensionality and size– e.g. a 2D RGBA image will have shape [height, width, 4]. ParallelArrays are immutable & fluent – they are manipulated by invoking methods on them which produce new ParallelArray objects. ParallelArray supports several constructors over arrays, functions & even the canvas. // Create an empty Parallel Array var pa = new ParallelArray(); // pa0 = <> // Create a ParallelArray out of a nested JS array. // Note that the inner arrays are also ParallelArrays var pa = new ParallelArray([ [0,1], [2,3], [4,5] ]); // pa1 = <<0,1>, <2,3>, <4.5>> // Create a two-dimensional ParallelArray with shape [3, 2] using the comprehension constructor var pa = new ParallelArray([3, 2], function(iv){return iv[0] * iv[1];}); // pa7 = <<0,0>, <0,1>, <0,2>> // Create a ParallelArray from canvas. This creates a PA with shape [w, h, 4], var pa = new ParallelArray(canvas); // pa8 = CanvasPixelArray ParallelArray exposes fluent API functions that take an elemental JS function for data manipulation: map, combine, scan, filter, and scatter that return a new ParallelArray. Other functions are scalar - reduce returns a scalar value & get returns the value located at a given index. The onus is on the developer to ensure that the elemental function does not defeat data parallelization optimization (avoid global var manipulation, recursion). For reduce & scan, order is not guaranteed - the onus is on the dev to provide an elemental function that is commutative and associative so that scan will be deterministic – E.g. Sum is associative, but Avg is not. map Applies a provided elemental function to each element of the source array and stores the result in the corresponding position in the result array. The map method is shape preserving & index free - can not inspect neighboring values. // Adding one to each element. var source = new ParallelArray([1,2,3,4,5]); var plusOne = source.map(function inc(v) { return v+1; }); //<2,3,4,5,6> combine Combine is similar to map, except an index is provided. This allows elemental functions to access elements from the source array relative to the one at the current index position. While the map method operates on the outermost dimension only, combine, can choose how deep to traverse - it provides a depth argument to specify the number of dimensions it iterates over. The elemental function of combine accesses the source array & the current index within it - element is computed by calling the get method of the source ParallelArray object with index i as argument. It requires more code but is more expressive. var source = new ParallelArray([1,2,3,4,5]); var plusOne = source.combine(function inc(i) { return this.get(i)+1; }); reduce reduces the elements from an array to a single scalar result – e.g. Sum. // Calculate the sum of the elements var source = new ParallelArray([1,2,3,4,5]); var sum = source.reduce(function plus(a,b) { return a+b; }); scan Like reduce, but stores the intermediate results – return a ParallelArray whose ith elements is the results of using the elemental function to reduce the elements between 0 and I in the original ParallelArray. // do a partial sum var source = new ParallelArray([1,2,3,4,5]); var psum = source.scan(function plus(a,b) { return a+b; }); //<1, 3, 6, 10, 15> scatter a reordering function - specify for a certain source index where it should be stored in the result array. An optional conflict function can prevent an exception if two source values are assigned the same position of the result: var source = new ParallelArray([1,2,3,4,5]); var reorder = source.scatter([4,0,3,1,2]); // <2, 4, 5, 3, 1> // if there is a conflict use the max. use 33 as a default value. var reorder = source.scatter([4,0,3,4,2], 33, function max(a, b) {return a>b?a:b; }); //<2, 33, 5, 3, 4> filter // filter out values that are not even var source = new ParallelArray([1,2,3,4,5]); var even = source.filter(function even(iv) { return (this.get(iv) % 2) == 0; }); // <2,4> Flatten used to collapse the outer dimensions of an array into a single dimension. pa = new ParallelArray([ [1,2], [3,4] ]); // <<1,2>,<3,4>> pa.flatten(); // <1,2,3,4> Partition used to restore the original shape of the array. var pa = new ParallelArray([1,2,3,4]); // <1,2,3,4> pa.partition(2); // <<1,2>,<3,4>> Get return value found at the indices or undefined if no such value exists. var pa = new ParallelArray([0,1,2,3,4], [10,11,12,13,14], [20,21,22,23,24]) pa.get([1,1]); // 11 pa.get([1]); // <10,11,12,13,14>

Read the article

The Presentation Isn't Over Until It's Over

- by Phil Factor

The senior corporate dignitaries settled into their seats looking important in a blue-suited sort of way. The lights dimmed as I strode out in front to give my presentation. I had ten vital minutes to make my pitch. I was about to dazzle the top management of a large software company who were considering the purchase of my software product. I would present them with a dazzling synthesis of diagrams, graphs, followed by a live demonstration of my software projected from my laptop. My preparation had been meticulous: It had to be: A year’s hard work was at stake, so I’d prepared it to perfection. I stood up and took them all in, with a gaze of sublime confidence. Then the laptop expired. There are several possible alternative plans of action when this happens A. Stare at the smoking laptop vacuously, flapping ones mouth slowly up and down B. Stand frozen like a statue, locked in indecision between fright and flight. C. Run out of the room, weeping D. Pretend that this was all planned E. Abandon the presentation in favour of a stilted and tedious dissertation about the software F. Shake your fist at the sky, and curse the sense of humour of your preferred deity I started for a few seconds on plan B, normally referred to as the ‘Rabbit in the headlamps of the car’ technique. Suddenly, a little voice inside my head spoke. It spoke the famous inane words of Yogi Berra; ‘The game isn't over until it's over.’ ‘Too right’, I thought. What to do? I ran through the alternatives A-F inclusive in my mind but none appealed to me. I was completely unprepared for this. Nowadays, longevity has since taught me more than I wanted to know about the wacky sense of humour of fate, and I would have taken two laptops. I hadn’t, but decided to do the presentation anyway as planned. I started out ignoring the dead laptop, but pretending, instead that it was still working. The audience looked startled. They were expecting plan B to be succeeded by plan C, I suspect. They weren’t used to denial on this scale. After my introductory talk, which didn’t require any visuals, I came to the diagram that described the application I’d written. I’d taken ages over it and it was hot stuff. Well, it would have been had it been projected onto the screen. It wasn’t. Before I describe what happened then, I must explain that I have thespian tendencies. My triumph as Professor Higgins in My Fair Lady at the local operatic society is now long forgotten, but I remember at the time of my finest performance, the moment that, glancing up over the vast audience of moist-eyed faces at the during the poignant scene between Eliza and Higgins at the end, I realised that I had a talent that one day could possibly be harnessed for commercial use I just talked about the diagram as if it was there, but throwing in some extra description. The audience nodded helpfully when I’d done enough. Emboldened, I began a sort of mime, well, more of a ballet, to represent each slide as I came to it. Heaven knows I’d done my preparation and, in my mind’s eye, I could see every detail, but I had to somehow project the reality of that vision to the audience, much the same way any actor playing Macbeth should do the ghost of Banquo. My desperation gave me a manic energy. If you’ve ever demonstrated a windows application entirely by mime, gesture and florid description, you’ll understand the scale of the challenge, but then I had nothing to lose. With a brief sentence of description here and there, and arms flailing whilst outlining the size and shape of graphs and diagrams, I used the many tricks of mime, gesture and body-language learned from playing Captain Hook, or the Sheriff of Nottingham in pantomime. I set out determinedly on my desperate venture. There wasn’t time to do anything but focus on the challenge of the task: the world around me narrowed down to ten faces and my presentation: ten souls who had to be hypnotized into seeing a Windows application: one that was slick, well organized and functional I don’t remember the details. Eight minutes of my life are gone completely. I was a thespian berserker. I know however that I followed the basic plan of building the presentation in a carefully controlled crescendo until the dazzling finale where the results were displayed on-screen. ‘And here you see the results, neatly formatted and grouped carefully to enhance the significance of the figures, together with running trend-graphs!’ I waved a mime to signify an animated window-opening, and looked up, in my first pause, to gaze defiantly at the audience. It was a sight I’ll never forget. Ten pairs of eyes were gazing in rapt attention at the imaginary window, and several pairs of eyes were glancing at the imaginary graphs and figures. I hadn’t had an audience like that since my starring role in Beauty and the Beast. At that moment, I realized that my desperate ploy might work. I sat down, slightly winded, when my ten minutes were up. For the first and last time in my life, the audience of a ‘PowerPoint’ presentation burst into spontaneous applause. ‘Any questions?’ ‘Yes, Have you got an agent?’ Yes, in case you’re wondering, I got the deal. They bought the software product from me there and then. However, it was a life-changing experience for me and I have never ever again trusted technology as part of a presentation. Even if things can’t go wrong, they’ll go wrong and they’ll kill the flow of what you’re presenting. if you can’t do something without the techno-props, then you shouldn’t do it. The greatest lesson of all is that great presentations require preparation and ‘stage-presence’ rather than fancy graphics. They’re a great supporting aid, but they should never dominate to the point that you’re lost without them.

Read the article

When is a Seek not a Seek?

- by Paul White

The following script creates a single-column clustered table containing the integers from 1 to 1,000 inclusive. IF OBJECT_ID(N'tempdb..#Test', N'U') IS NOT NULL DROP TABLE #Test ; GO CREATE TABLE #Test ( id INTEGER PRIMARY KEY CLUSTERED ); ; INSERT #Test (id) SELECT V.number FROM master.dbo.spt_values AS V WHERE V.[type] = N'P' AND V.number BETWEEN 1 AND 1000 ; Let’s say we need to find the rows with values from 100 to 170, excluding any values that divide exactly by 10. One way to write that query would be: SELECT T.id FROM #Test AS T WHERE T.id IN ( 101,102,103,104,105,106,107,108,109, 111,112,113,114,115,116,117,118,119, 121,122,123,124,125,126,127,128,129, 131,132,133,134,135,136,137,138,139, 141,142,143,144,145,146,147,148,149, 151,152,153,154,155,156,157,158,159, 161,162,163,164,165,166,167,168,169 ) ; That query produces a pretty efficient-looking query plan: Knowing that the source column is defined as an INTEGER, we could also express the query this way: SELECT T.id FROM #Test AS T WHERE T.id >= 101 AND T.id <= 169 AND T.id % 10 > 0 ; We get a similar-looking plan: If you look closely, you might notice that the line connecting the two icons is a little thinner than before. The first query is estimated to produce 61.9167 rows – very close to the 63 rows we know the query will return. The second query presents a tougher challenge for SQL Server because it doesn’t know how to predict the selectivity of the modulo expression (T.id % 10 > 0). Without that last line, the second query is estimated to produce 68.1667 rows – a slight overestimate. Adding the opaque modulo expression results in SQL Server guessing at the selectivity. As you may know, the selectivity guess for a greater-than operation is 30%, so the final estimate is 30% of 68.1667, which comes to 20.45 rows. The second difference is that the Clustered Index Seek is costed at 99% of the estimated total for the statement. For some reason, the final SELECT operator is assigned a small cost of 0.0000484 units; I have absolutely no idea why this is so, or what it models. Nevertheless, we can compare the total cost for both queries: the first one comes in at 0.0033501 units, and the second at 0.0034054. The important point is that the second query is costed very slightly higher than the first, even though it is expected to produce many fewer rows (20.45 versus 61.9167). If you run the two queries, they produce exactly the same results, and both complete so quickly that it is impossible to measure CPU usage for a single execution. We can, however, compare the I/O statistics for a single run by running the queries with STATISTICS IO ON: Table '#Test'. Scan count 63, logical reads 126, physical reads 0. Table '#Test'. Scan count 01, logical reads 002, physical reads 0. The query with the IN list uses 126 logical reads (and has a ‘scan count’ of 63), while the second query form completes with just 2 logical reads (and a ‘scan count’ of 1). It is no coincidence that 126 = 63 * 2, by the way. It is almost as if the first query is doing 63 seeks, compared to one for the second query. In fact, that is exactly what it is doing. There is no indication of this in the graphical plan, or the tool-tip that appears when you hover your mouse over the Clustered Index Seek icon. To see the 63 seek operations, you have click on the Seek icon and look in the Properties window (press F4, or right-click and choose from the menu): The Seek Predicates list shows a total of 63 seek operations – one for each of the values from the IN list contained in the first query. I have expanded the first seek node to show the details; it is seeking down the clustered index to find the entry with the value 101. Each of the other 62 nodes expands similarly, and the same information is contained (even more verbosely) in the XML form of the plan. Each of the 63 seek operations starts at the root of the clustered index B-tree and navigates down to the leaf page that contains the sought key value. Our table is just large enough to need a separate root page, so each seek incurs 2 logical reads (one for the root, and one for the leaf). We can see the index depth using the INDEXPROPERTY function, or by using the a DMV: SELECT S.index_type_desc, S.index_depth FROM sys.dm_db_index_physical_stats ( DB_ID(N'tempdb'), OBJECT_ID(N'tempdb..#Test', N'U'), 1, 1, DEFAULT ) AS S ; Let’s look now at the Properties window when the Clustered Index Seek from the second query is selected: There is just one seek operation, which starts at the root of the index and navigates the B-tree looking for the first key that matches the Start range condition (id >= 101). It then continues to read records at the leaf level of the index (following links between leaf-level pages if necessary) until it finds a row that does not meet the End range condition (id <= 169). Every row that meets the seek range condition is also tested against the Residual Predicate highlighted above (id % 10 > 0), and is only returned if it matches that as well. You will not be surprised that the single seek (with a range scan and residual predicate) is much more efficient than 63 singleton seeks. It is not 63 times more efficient (as the logical reads comparison would suggest), but it is around three times faster. Let’s run both query forms 10,000 times and measure the elapsed time: DECLARE @i INTEGER, @n INTEGER = 10000, @s DATETIME = GETDATE() ; SET NOCOUNT ON; SET STATISTICS XML OFF; ; WHILE @n > 0 BEGIN SELECT @i = T.id FROM #Test AS T WHERE T.id IN ( 101,102,103,104,105,106,107,108,109, 111,112,113,114,115,116,117,118,119, 121,122,123,124,125,126,127,128,129, 131,132,133,134,135,136,137,138,139, 141,142,143,144,145,146,147,148,149, 151,152,153,154,155,156,157,158,159, 161,162,163,164,165,166,167,168,169 ) ; SET @n -= 1; END ; PRINT DATEDIFF(MILLISECOND, @s, GETDATE()) ; GO DECLARE @i INTEGER, @n INTEGER = 10000, @s DATETIME = GETDATE() ; SET NOCOUNT ON ; WHILE @n > 0 BEGIN SELECT @i = T.id FROM #Test AS T WHERE T.id >= 101 AND T.id <= 169 AND T.id % 10 > 0 ; SET @n -= 1; END ; PRINT DATEDIFF(MILLISECOND, @s, GETDATE()) ; On my laptop, running SQL Server 2008 build 4272 (SP2 CU2), the IN form of the query takes around 830ms and the range query about 300ms. The main point of this post is not performance, however – it is meant as an introduction to the next few parts in this mini-series that will continue to explore scans and seeks in detail. When is a seek not a seek? When it is 63 seeks © Paul White 2011 email: [email protected] twitter: @SQL_kiwi

Read the article

Behavior Driven Development (BDD) and DevExpress XAF

- by Patrick Liekhus

So in my previous posts I showed you how I used EDMX to quickly build my business objects within XPO and XAF. But how do you test whether your business objects are actually doing what you want and verify that your business logic is correct? Well I was reading my monthly MSDN magazine last last year and came across an article about using SpecFlow and WatiN to build BDD tests. So why not use these same techniques to write SpecFlow style scripts and have them generate EasyTest scripts for use with XAF. Let me outline and show a few things below. I plan on releasing this code in a short while, I just wanted to preview what I was thinking. Before we begin… First, if you have not read the article in MSDN, here is the link to the article that I found my inspiration. It covers the overview of BDD vs. TDD, how to write some of the SpecFlow syntax and how use the “Steps” logic to create your own tests. Second, if you have not heard of EasyTest from DevExpress I strongly recommend you review it here. It basically takes the power of XAF and the beauty of your application and allows you to create text based files to execute automated commands within your application. Why would we do this? Because as you will see below, the cucumber syntax is easier for business analysts to interpret and digest the business rules from. You can find most of the information you will need on Cucumber syntax within The Secret Ninja Cucumber Scrolls located here. The basics of the syntax are that Given X When Y Then Z. For example, Given I am at the login screen When I enter my login credentials Then I expect to see the home screen. Pretty easy syntax to follow. Finally, we will need to download and install SpecFlow. You can find it on their website here. Once you have this installed then let’s write our first test. Let’s get started… So where to start. Create a new testing project within your solution. I typically call this with a similar naming convention as used by XAF, my project name .FunctionalTests (i.e. AlbumManager.FunctionalTests). Remove the basic test that is created for you. We will not use the default test but rather create our own SpecFlow “Feature” files. Add a new item to your project and select the SpecFlow Feature file under C#. Name your feature file as you do your class files after the test they are performing. Now you can crack open your new feature file and write the actual test. Make sure to have your Ninja Scrolls from above as it provides valuable resources on how to write your test syntax. In this test below you can see how I defined the documentation in the Feature section. This is strictly for our purposes of readability and do not effect the test. The next section is the Scenario Outline which is considered a test template. You can see the brackets <> around the fields that will be filled in for each test. So in the example below you can see that Given I am starting a new test and the application is open. This means I want a new EasyTest file and the windows application generated by XAF is open. Next When I am at the Albums screen tells XAF to navigate to the Albums list view. And I click the New:Album button, tells XAF to click the new button on the list grid. And I enter the following information tells XAF which fields to complete with the mapped values. And I click the Save and Close button causes the record to be saved and the detail form to be closed. Then I verify results tests the input data against what is visible in the grid to ensure that your record was created. The Scenarios section gives each test a unique name and then fills in the values for each test. This way you can use the same test to make multiple passes with different data. Almost there. Now we must save the feature file and the BDD tests will be written using standard unit test syntax. This is all handled for you by SpecFlow so just save the file. What you will see in your Test List Editor is a unit test for each of the above scenarios you just built. You can now use standard unit testing frameworks to execute the test as you desire. As you would expect then, these BDD SpecFlow tests can be automated into your build process to ensure that your business requirements are satisfied each and every time. How does it work? What we have done is to intercept the testing logic at runtime to interpret the SpecFlow syntax into EasyTest syntax. This is the basic StepDefinitions that we are working on now. We expect to put these on CodePlex within the next few days. You can always override and make your own rules as you see fit for your project. Follow the MSDN magazine above to start your own. You can see part of our implementation below. As you can gather from the MSDN article and the code sample below, we have created our own common rules to build the above syntax. The code implementation for these rules basically saves your information from the feature file into an EasyTest file format. It then executes the EasyTest file and parses the XML results of the test. If the test succeeds the test is passed. If the test fails, the EasyTest failure message is logged and the screen shot (as captured by EasyTest) is saved for your review. Again we are working on getting this code ready for mass consumption, but at this time it is not ready. We will post another message when it is ready with all details about usage and setup. Thanks

Search Results

Search found 13889 results on 556 pages for 'results'.

Page 244/556 | < Previous Page | 240 241 242 243 244 245 246 247 248 249 250 251 | Next Page >

- by [email protected]

- by pinaldave

- by Mark Rosenberg

- by Tarun Arora

- by rchrd

- by Mat Keep

- by Kevin Yang

- by James Michael Hare

- by Pinal Dave

- by danx

- by Harish Gaur

- by Edward

- by DavidWimbush

- by Sean

- by Pinal Dave

- by user702549

- by Stefan Hinker

- by SGWellens

- by mortezam

- by SGWellens

- by JoshReuben

- by Phil Factor

- by Paul White

- by Patrick Liekhus

< Previous Page | 240 241 242 243 244 245 246 247 248 249 250 251 | Next Page >