Search Results

Search found 1228 results on 50 pages for 'comparing'.

Page 46/50 | < Previous Page | 42 43 44 45 46 47 48 49 50 | Next Page >

SPARC T3-1 Record Results Running JD Edwards EnterpriseOne Day in the Life Benchmark with Added Batch Component

- by Brian

Using Oracle's SPARC T3-1 server for the application tier and Oracle's SPARC Enterprise M3000 server for the database tier, a world record result was produced running the Oracle's JD Edwards EnterpriseOne applications Day in the Life benchmark run concurrently with a batch workload. The SPARC T3-1 server based result has 25% better performance than the IBM Power 750 POWER7 server even though the IBM result did not include running a batch component. The SPARC T3-1 server based result has 25% better space/performance than the IBM Power 750 POWER7 server as measured by the online component. The SPARC T3-1 server based result is 5x faster than the x86-based IBM x3650 M2 server system when executing the online component of the JD Edwards EnterpriseOne 9.0.1 Day in the Life benchmark. The IBM result did not include a batch component. The SPARC T3-1 server based result has 2.5x better space/performance than the x86-based IBM x3650 M2 server as measured by the online component. The combination of SPARC T3-1 and SPARC Enterprise M3000 servers delivered a Day in the Life benchmark result of 5000 online users with 0.875 seconds of average transaction response time running concurrently with 19 Universal Batch Engine (UBE) processes at 10 UBEs/minute. The solution exercises various JD Edwards EnterpriseOne applications while running Oracle WebLogic Server 11g Release 1 and Oracle Web Tier Utilities 11g HTTP server in Oracle Solaris Containers, together with the Oracle Database 11g Release 2. The SPARC T3-1 server showed that it could handle the additional workload of batch processing while maintaining the same number of online users for the JD Edwards EnterpriseOne Day in the Life benchmark. This was accomplished with minimal loss in response time. JD Edwards EnterpriseOne 9.0.1 takes advantage of the large number of compute threads available in the SPARC T3-1 server at the application tier and achieves excellent response times. The SPARC T3-1 server consolidates the application/web tier of the JD Edwards EnterpriseOne 9.0.1 application using Oracle Solaris Containers. Containers provide flexibility, easier maintenance and better CPU utilization of the server leaving processing capacity for additional growth. A number of Oracle advanced technology and features were used to obtain this result: Oracle Solaris 10, Oracle Solaris Containers, Oracle Java Hotspot Server VM, Oracle WebLogic Server 11g Release 1, Oracle Web Tier Utilities 11g, Oracle Database 11g Release 2, the SPARC T3 and SPARC64 VII+ based servers. This is the first published result running both online and batch workload concurrently on the JD Enterprise Application server. No published results are available from IBM running the online component together with a batch workload. The 9.0.1 version of the benchmark saw some minor performance improvements relative to 9.0. When comparing between 9.0.1 and 9.0 results, the reader should take this into account when the difference between results is small. Performance Landscape JD Edwards EnterpriseOne Day in the Life Benchmark Online with Batch Workload This is the first publication on the Day in the Life benchmark run concurrently with batch jobs. The batch workload was provided by Oracle's Universal Batch Engine. System RackUnits Online Users Resp Time (sec) BatchConcur(# of UBEs) BatchRate(UBEs/m) Version SPARC T3-1, 1xSPARC T3 (1.65 GHz), Solaris 10 M3000, 1xSPARC64 VII+ (2.86 GHz), Solaris 10 4 5000 0.88 19 10 9.0.1 Resp Time (sec) — Response time of online jobs reported in seconds Batch Concur (# of UBEs) — Batch concurrency presented in the number of UBEs Batch Rate (UBEs/m) — Batch transaction rate in UBEs/minute. JD Edwards EnterpriseOne Day in the Life Benchmark Online Workload Only These results are for the Day in the Life benchmark. They are run without any batch workload. System RackUnits Online Users ResponseTime (sec) Version SPARC T3-1, 1xSPARC T3 (1.65 GHz), Solaris 10 M3000, 1xSPARC64 VII (2.75 GHz), Solaris 10 4 5000 0.52 9.0.1 IBM Power 750, 1xPOWER7 (3.55 GHz), IBM i7.1 4 4000 0.61 9.0 IBM x3650M2, 2xIntel X5570 (2.93 GHz), OVM 2 1000 0.29 9.0 IBM result from http://www-03.ibm.com/systems/i/advantages/oracle/, IBM used WebSphere Configuration Summary Hardware Configuration: 1 x SPARC T3-1 server 1 x 1.65 GHz SPARC T3 128 GB memory 16 x 300 GB 10000 RPM SAS 1 x Sun Flash Accelerator F20 PCIe Card, 92 GB 1 x 10 GbE NIC 1 x SPARC Enterprise M3000 server 1 x 2.86 SPARC64 VII+ 64 GB memory 1 x 10 GbE NIC 2 x StorageTek 2540 + 2501 Software Configuration: JD Edwards EnterpriseOne 9.0.1 with Tools 8.98.3.3 Oracle Database 11g Release 2 Oracle 11g WebLogic server 11g Release 1 version 10.3.2 Oracle Web Tier Utilities 11g Oracle Solaris 10 9/10 Mercury LoadRunner 9.10 with Oracle Day in the Life kit for JD Edwards EnterpriseOne 9.0.1 Oracle’s Universal Batch Engine - Short UBEs and Long UBEs Benchmark Description JD Edwards EnterpriseOne is an integrated applications suite of Enterprise Resource Planning (ERP) software. Oracle offers 70 JD Edwards EnterpriseOne application modules to support a diverse set of business operations. Oracle's Day in the Life (DIL) kit is a suite of scripts that exercises most common transactions of JD Edwards EnterpriseOne applications, including business processes such as payroll, sales order, purchase order, work order, and other manufacturing processes, such as ship confirmation. These are labeled by industry acronyms such as SCM, CRM, HCM, SRM and FMS. The kit's scripts execute transactions typical of a mid-sized manufacturing company. The workload consists of online transactions and the UBE workload of 15 short and 4 long UBEs. LoadRunner runs the DIL workload, collects the user’s transactions response times and reports the key metric of Combined Weighted Average Transaction Response time. The UBE processes workload runs from the JD Enterprise Application server. Oracle's UBE processes come as three flavors: Short UBEs < 1 minute engage in Business Report and Summary Analysis, Mid UBEs > 1 minute create a large report of Account, Balance, and Full Address, Long UBEs > 2 minutes simulate Payroll, Sales Order, night only jobs. The UBE workload generates large numbers of PDF files reports and log files. The UBE Queues are categorized as the QBATCHD, a single threaded queue for large UBEs, and the QPROCESS queue for short UBEs run concurrently. One of the Oracle Solaris Containers ran 4 Long UBEs, while another Container ran 15 short UBEs concurrently. The mixed size UBEs ran concurrently from the SPARC T3-1 server with the 5000 online users driven by the LoadRunner. Oracle’s UBE process performance metric is Number of Maximum Concurrent UBE processes at transaction rate, UBEs/minute. Key Points and Best Practices Two JD Edwards EnterpriseOne Application Servers and two Oracle Fusion Middleware WebLogic Servers 11g R1 coupled with two Oracle Fusion Middleware 11g Web Tier HTTP Server instances on the SPARC T3-1 server were hosted in four separate Oracle Solaris Containers to demonstrate consolidation of multiple application and web servers. See Also SPARC T3-1 oracle.com SPARC Enterprise M3000 oracle.com Oracle Solaris oracle.com JD Edwards EnterpriseOne oracle.com Oracle Database 11g Release 2 Enterprise Edition oracle.com Disclosure Statement Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 6/27/2011.

Read the article
Building KPIs to monitor your business Its not really about the Technology

When I have discussions with people about Business Intelligence, one of the questions the inevitably come up is about building KPIs and how to accomplish that. From a technical level the concept of a KPI is very simple, almost too simple in that it is like the tip of an iceberg floating above the water. The key to that iceberg is not really the tip, but the mass of the iceberg that is hidden beneath the surface upon which the tip sits. The analogy of the iceberg is not meant to indicate that the foundation of the KPI is overly difficult or complex. The disparity in size in meant to indicate that the larger thing that needs to be defined is not the technical tip, but the underlying business definition of what the KPI means. From a technical perspective the KPI consists of primarily the following items: Actual Value This is the actual value data point that is being measured. An example would be something like the amount of sales. Target Value This is the target goal for the KPI. This is a number that can be measured against Actual Value. An example would be $10,000 in monthly sales. Target Indicator Range This is the definition of ranges that define what type of indicator the user will see comparing the Actual Value to the Target Value. Most often this is defined by stoplight, but can be any indicator that is going to show a status in a quick fashion to the user. Typically this would be something like: Red Light = Actual Value more than 5% below target; Yellow Light = Within 5% of target either direction; Green Light = More than 5% higher than Target Value Status\Trend Indicator This is an optional attribute of a KPI that is typically used to show some kind of trend. The vast majority of these indicators are used to show some type of progress against a previous period. As an example, the status indicator might be used to show how the monthly sales compare to last month. With this type of indicator there needs to be not only a definition of what the ranges are for your status indictor, but then also what value the number needs to be compared against. So now we have an idea of what data points a KPI consists of from a technical perspective lets talk a bit about tools. As you can see technically there is not a whole lot to them and the choice of technology is not as important as the definition of the KPIs, which we will get to in a minute. There are many different types of tools in the Microsoft BI stack that you can use to expose your KPI to the business. These include Performance Point, SharePoint, Excel, and SQL Reporting Services. There are pluses and minuses to each technology and the right technology is based a lot on your goals and how you want to deliver the information to the users. Additionally, there are other non-Microsoft tools that can be used to expose KPI indicators to your business users. Regardless of the technology used as your front end, the heavy lifting of KPI is in the business definition of the values and benchmarks for that KPI. The discussion about KPIs is very dependent on the history of an organization and how much they are exposed to the attributes of a KPI. Often times when discussing KPIs with a business contact who has not been exposed to KPIs the discussion tends to also be a session educating the business user about what a KPI is and what goes into the definition of a KPI. The majority of times the business user has an idea of what their actual values are and they have been tracking those numbers for some time, generally in Excel and all manually. So they will know the amount of sales last month along with sales two years ago in the same month. Where the conversation tends to get stuck is when you start discussing what the target value should be. The actual value is answering the What and How much questions. When you are talking about the Target values you are asking the question Is this number good or bad. Typically, the user will know whether or not the value is good or bad, but most of the time they are not able to quantify what is good or bad. Their response is usually something like I just know. Because they have been watching the sales quantity for years now, they can tell you that a 5% decrease in sales this month might actually be a good thing, maybe because the salespeople are all waiting until next month when the new versions come out. It can sometimes be very hard to break the business people of this habit. One of the fears generally is that the status indicator is not subjective. Thus, in the scenario above, the business user is going to be fearful that their boss, just looking at a negative red indicator, is going to haul them out to the woodshed for a bad month. But, on the flip side, if all you are displaying is the amount of sales, only a person with knowledge of last month sales and the target amount for this month would have any idea if $10,000 in sales is good or not. Here is where a key point about KPIs needs to be communicated to both the business user and any user who might be viewing the results of that KPI. The KPI is just one tool that is used to report on business performance. The KPI is meant as a quick indicator of one business statistic. It is not meant to tell the entire story. It does not answer the question Why. Its primary purpose is to objectively and quickly expose an area of the business that might warrant more review. There is always going to be the need to do further analysis on any potential negative or neutral KPI. So, hopefully, once you have convinced your business user to come up with some target numbers and ranges for status indicators, you then need to take the next step and help them answer the Why question. The main question here to ask is, Okay, you see the indicator and you need to discover why the number is what is, where do you go?. The answer is usually a combination of sources. A sales manager might have some of the following items at their disposal (Marketing report showing a decrease in the promotional discounts for the month, Pricing Report showing the reduction of prices of older models, an Inventory Report showing the discontinuation of a particular product line, or a memo showing the ending of a large affiliate partnership. The answers to the question Why are never as simple as a single indicator value. Bring able to quickly get to this information is all about designing how a user accesses the KPIs and then also how easily they can get to the additional information they need. This is where a Dashboard mentality can come in handy. For example, the business user can have a dashboard that shows their KPIs, but also has links to some of the common reports that they run regarding Sales Data. The users boss may have the same KPIs on their dashboard, but instead of links to individual reports they are going to have a link to a status report that was created by the user that pulls together all the data about the KPI in a summary format the users boss can review. So some of the key things to think about when building or evaluating KPIs for your organization: Technology should not be the driving factor KPIs are of little value without some indicator for whether a value is good, bad or neutral. KPIs only give an answer to the Is this number good\bad? question Make sure the ability to drill into the Why of a KPI is close at hand and relevant to the user who is viewing the KPI. The KPI is a key business tool when defined properly to help monitor business performance across the enterprise in an objective and consistent manner. At times it might feel like the process of defining the business aspects of a KPI can sometimes be arduous, the payoff in the end can far outweigh the costs. Some of the benefits of going through this process are a better understanding of the key metrics for an organization and the measure of those metrics and a consistent snapshot of business performance that can be utilized across the organization. And I think that these are benefits to any organization regardless of the technology or the implementation.Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
C#: LINQ vs foreach - Round 1.

- by James Michael Hare

So I was reading Peter Kellner's blog entry on Resharper 5.0 and its LINQ refactoring and thought that was very cool. But that raised a point I had always been curious about in my head -- which is a better choice: manual foreach loops or LINQ? The answer is not really clear-cut. There are two sides to any code cost arguments: performance and maintainability. The first of these is obvious and quantifiable. Given any two pieces of code that perform the same function, you can run them side-by-side and see which piece of code performs better. Unfortunately, this is not always a good measure. Well written assembly language outperforms well written C++ code, but you lose a lot in maintainability which creates a big techncial debt load that is hard to offset as the application ages. In contrast, higher level constructs make the code more brief and easier to understand, hence reducing technical cost. Now, obviously in this case we're not talking two separate languages, we're comparing doing something manually in the language versus using a higher-order set of IEnumerable extensions that are in the System.Linq library. Well, before we discuss any further, let's look at some sample code and the numbers. First, let's take a look at the for loop and the LINQ expression. This is just a simple find comparison: // find implemented via LINQ public static bool FindViaLinq(IEnumerable<int> list, int target) { return list.Any(item => item == target); } // find implemented via standard iteration public static bool FindViaIteration(IEnumerable<int> list, int target) { foreach (var i in list) { if (i == target) { return true; } } return false; } Okay, looking at this from a maintainability point of view, the Linq expression is definitely more concise (8 lines down to 1) and is very readable in intention. You don't have to actually analyze the behavior of the loop to determine what it's doing. So let's take a look at performance metrics from 100,000 iterations of these methods on a List<int> of varying sizes filled with random data. For this test, we fill a target array with 100,000 random integers and then run the exact same pseudo-random targets through both searches. List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower Any 10 26 0.00046 30.00% Iteration 10 20 0.00023 - Any 100 116 0.00201 18.37% Iteration 100 98 0.00118 - Any 1000 1058 0.01853 16.78% Iteration 1000 906 0.01155 - Any 10,000 10,383 0.18189 17.41% Iteration 10,000 8843 0.11362 - Any 100,000 104,004 1.8297 18.27% Iteration 100,000 87,941 1.13163 - The LINQ expression is running about 17% slower for average size collections and worse for smaller collections. Presumably, this is due to the overhead of the state machine used to track the iterators for the yield returns in the LINQ expressions, which seems about right in a tight loop such as this. So what about other LINQ expressions? After all, Any() is one of the more trivial ones. I decided to try the TakeWhile() algorithm using a Count() to get the position stopped like the sample Pete was using in his blog that Resharper refactored for him into LINQ: // Linq form public static int GetTargetPosition1(IEnumerable<int> list, int target) { return list.TakeWhile(item => item != target).Count(); } // traditionally iterative form public static int GetTargetPosition2(IEnumerable<int> list, int target) { int count = 0; foreach (var i in list) { if(i == target) { break; } ++count; } return count; } Once again, the LINQ expression is much shorter, easier to read, and should be easier to maintain over time, reducing the cost of technical debt. So I ran these through the same test data: List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Iteration 100,000 81635 0.81635 - Wow! I expected some overhead due to the state machines iterators produce, but 90% slower? That seems a little heavy to me. So then I thought, well, what if TakeWhile() is not the right tool for the job? The problem is TakeWhile returns each item for processing using yield return, whereas our for-loop really doesn't care about the item beyond using it as a stop condition to evaluate. So what if that back and forth with the iterator state machine is the problem? Well, we can quickly create an (albeit ugly) lambda that uses the Any() along with a count in a closure (if a LINQ guru knows a better way PLEASE let me know!), after all , this is more consistent with what we're trying to do, we're trying to find the first occurence of an item and halt once we find it, we just happen to be counting on the way. This mostly matches Any(). // a new method that uses linq but evaluates the count in a closure. public static int TakeWhileViaLinq2(IEnumerable<int> list, int target) { int count = 0; list.Any(item => { if(item == target) { return true; } ++count; return false; }); return count; } Now how does this one compare? List<T> On 100,000 Iterations Method Size Total (ms) Per Iteration (ms) % Slower TakeWhile 10 41 0.00041 128% Any w/Closure 10 23 0.00023 28% Iteration 10 18 0.00018 - TakeWhile 100 171 0.00171 88% Any w/Closure 100 116 0.00116 27% Iteration 100 91 0.00091 - TakeWhile 1000 1604 0.01604 94% Any w/Closure 1000 1101 0.01101 33% Iteration 1000 825 0.00825 - TakeWhile 10,000 15765 0.15765 92% Any w/Closure 10,000 10802 0.10802 32% Iteration 10,000 8204 0.08204 - TakeWhile 100,000 156950 1.5695 92% Any w/Closure 100,000 108378 1.08378 33% Iteration 100,000 81635 0.81635 - Much better! It seems that the overhead of TakeAny() returning each item and updating the state in the state machine is drastically reduced by using Any() since Any() iterates forward until it finds the value we're looking for -- for the task we're attempting to do. So the lesson there is, make sure when you use a LINQ expression you're choosing the best expression for the job, because if you're doing more work than you really need, you'll have a slower algorithm. But this is true of any choice of algorithm or collection in general. Even with the Any() with the count in the closure it is still about 30% slower, but let's consider that angle carefully. For a list of 100,000 items, it was the difference between 1.01 ms and 0.82 ms roughly in a List<T>. That's really not that bad at all in the grand scheme of things. Even running at 90% slower with TakeWhile(), for the vast majority of my projects, an extra millisecond to save potential errors in the long term and improve maintainability is a small price to pay. And if your typical list is 1000 items or less we're talking only microseconds worth of difference. It's like they say: 90% of your performance bottlenecks are in 2% of your code, so over-optimizing almost never pays off. So personally, I'll take the LINQ expression wherever I can because they will be easier to read and maintain (thus reducing technical debt) and I can rely on Microsoft's development to have coded and unit tested those algorithm fully for me instead of relying on a developer to code the loop logic correctly. If something's 90% slower, yes, it's worth keeping in mind, but it's really not until you start get magnitudes-of-order slower (10x, 100x, 1000x) that alarm bells should really go off. And if I ever do need that last millisecond of performance? Well then I'll optimize JUST THAT problem spot. To me it's worth it for the readability, speed-to-market, and maintainability.

Read the article
Making sense of S.M.A.R.T

- by James

First of all, I think everyone knows that hard drives fail a lot more than the manufacturers would like to admit. Google did a study that indicates that certain raw data attributes that the S.M.A.R.T status of hard drives reports can have a strong correlation with the future failure of the drive. We find, for example, that after their first scan error, drives are 39 times more likely to fail within 60 days than drives with no such errors. First errors in re- allocations, offline reallocations, and probational counts are also strongly correlated to higher failure probabil- ities. Despite those strong correlations, we find that failure prediction models based on SMART parameters alone are likely to be severely limited in their prediction accuracy, given that a large fraction of our failed drives have shown no SMART error signals whatsoever. Seagate seems like it is trying to obscure this information about their drives by claiming that only their software can accurately determine the accurate status of their drive and by the way their software will not tell you the raw data values for the S.M.A.R.T attributes. Western digital has made no such claim to my knowledge but their status reporting tool does not appear to report raw data values either. I've been using HDtune and smartctl from smartmontools in order to gather the raw data values for each attribute. I've found that indeed... I am comparing apples to oranges when it comes to certain attributes. I've found for example that most Seagate drives will report that they have many millions of read errors while western digital 99% of the time shows 0 for read errors. I've also found that Seagate will report many millions of seek errors while Western Digital always seems to report 0. Now for my question. How do I normalize this data? Is Seagate producing millions of errors while Western digital is producing none? Wikipedia's article on S.M.A.R.T status says that manufacturers have different ways of reporting this data. Here is my hypothesis: I think I found a way to normalize (is that the right term?) the data. Seagate drives have an additional attribute that Western Digital drives do not have (Hardware ECC Recovered). When you subtract the Read error count from the ECC Recovered count, you'll probably end up with 0. This seems to be equivalent to Western Digitals reported "Read Error" count. This means that Western Digital only reports read errors that it cannot correct while Seagate counts up all read errors and tells you how many of those it was able to fix. I had a Seagate drive where the ECC Recovered count was less than the Read error count and I noticed that many of my files were becoming corrupt. This is how I came up with my hypothesis. The millions of seek errors that Seagate produces are still a mystery to me. Please confirm or correct my hypothesis if you have additional information. Here is the smart status of my western digital drive just so you can see what I'm talking about: james@ubuntu:~$ sudo smartctl -a /dev/sda smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: WDC WD1001FALS-00E3A0 Serial Number: WD-WCATR0258512 Firmware Version: 05.01D05 User Capacity: 1,000,204,886,016 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 8 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Thu Jun 10 19:52:28 2010 PDT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0 3 Spin_Up_Time 0x0027 179 175 021 Pre-fail Always - 4033 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 270 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 1468 10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 262 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 46 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 223 194 Temperature_Celsius 0x0022 105 102 000 Old_age Always - 42 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

Read the article
SQL SERVER – Weekly Series – Memory Lane – #032

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Complete Series of Database Coding Standards and Guidelines SQL SERVER Database Coding Standards and Guidelines – Introduction SQL SERVER – Database Coding Standards and Guidelines – Part 1 SQL SERVER – Database Coding Standards and Guidelines – Part 2 SQL SERVER Database Coding Standards and Guidelines Complete List Download Explanation and Example – SELF JOIN When all of the data you require is contained within a single table, but data needed to extract is related to each other in the table itself. Examples of this type of data relate to Employee information, where the table may have both an Employee’s ID number for each record and also a field that displays the ID number of an Employee’s supervisor or manager. To retrieve the data tables are required to relate/join to itself. Insert Multiple Records Using One Insert Statement – Use of UNION ALL This is very interesting question I have received from new developer. How can I insert multiple values in table using only one insert? Now this is interesting question. When there are multiple records are to be inserted in the table following is the common way using T-SQL. Function to Display Current Week Date and Day – Weekly Calendar Straight blog post with script to find current week date and day based on the parameters passed in the function. 2008 In my beginning years, I have almost same confusion as many of the developer had in their earlier years. Here are two of the interesting question which I have attempted to answer in my early year. Even if you are experienced developer may be you will still like to read following two questions: Order Of Column In Index Order of Conditions in WHERE Clauses Example of DISTINCT in Aggregate Functions Have you ever used DISTINCT with the Aggregation Function? Here is a simple example about how users can do it. Create a Comma Delimited List Using SELECT Clause From Table Column Straight to script example where I explained how to do something easy and quickly. Compound Assignment Operators SQL SERVER 2008 has introduced new concept of Compound Assignment Operators. Compound Assignment Operators are available in many other programming languages for quite some time. Compound Assignment Operators is operator where variables are operated upon and assigned on the same line. PIVOT and UNPIVOT Table Examples Here is a very interesting question – the answer to the question can be YES or NO both. “If we PIVOT any table and UNPIVOT that table do we get our original table?” Read the blog post to get the explanation of the question above. 2009 What is Interim Table – Simple Definition of Interim Table The interim table is a table that is generated by joining two tables and not the final result table. In other words, when two tables are joined they create an interim table as resultset but the resultset is not final yet. It may be possible that more tables are about to join on the interim table, and more operations are still to be applied on that table (e.g. Order By, Having etc). Besides, it may be possible that there is no interim table; sometimes final table is what is generated when the query is run. 2010 Stored Procedure and Transactions If Stored Procedure is transactional then, it should roll back complete transactions when it encounters any errors. Well, that does not happen in this case, which proves that Stored Procedure does not only provide just the transactional feature to a batch of T-SQL. Generate Database Script for SQL Azure When talking about SQL Azure the most common complaint I hear is that the script generated from stand-along SQL Server database is not compatible with SQL Azure. This was true for some time for sure but not any more. If you have SQL Server 2008 R2 installed you can follow the guideline below to generate a script which is compatible with SQL Azure. Convert IN to EXISTS – Performance Talk It is NOT necessary that every time when IN is replaced by EXISTS it gives better performance. However, in our case listed above it does for sure give better performance. You can read about this subject in the associated blog post. Subquery or Join – Various Options – SQL Server Engine Knows the Best Every single time whenever there is a performance tuning exercise, I hear the conversation from developer where some prefer subquery and some prefer join. In this two part blog post, I explain the same in the detail with examples. Part 1 | Part 2 Merge Operations – Insert, Update, Delete in Single Execution MERGE is a new feature that provides an efficient way to do multiple DML operations. In earlier versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions; however, at present, by using the MERGE statement, we can include the logic of such data changes in one statement that even checks when the data is matched and then just update it, and similarly, when the data is unmatched, it is inserted. 2011 Puzzle – Statistics are not updated but are Created Once Here is the quick scenario about my setup. Create Table Insert 1000 Records Check the Statistics Now insert 10 times more 10,000 indexes Check the Statistics – it will be NOT updated – WHY? Question to You – When to use Function and When to use Stored Procedure Personally, I believe that they are both different things - they cannot be compared. I can say, it will be like comparing apples and oranges. Each has its own unique use. However, they can be used interchangeably at many times and in real life (i.e., production environment). I have personally seen both of these being used interchangeably many times. This is the precise reason for asking this question. 2012 In year 2012 I had two interesting series ran on the blog. If there is no fun in learning, the learning becomes a burden. For the same reason, I had decided to build a three part quiz around SEQUENCE. The quiz was to identify the next value of the sequence. I encourage all of you to take part in this fun quiz. Guess the Next Value – Puzzle 1 Guess the Next Value – Puzzle 2 Guess the Next Value – Puzzle 3 Guess the Next Value – Puzzle 4 Simple Example to Configure Resource Governor – Introduction to Resource Governor Resource Governor is a feature which can manage SQL Server Workload and System Resource Consumption. We can limit the amount of CPU and memory consumption by limiting /governing /throttling on the SQL Server. If there are different workloads running on SQL Server and each of the workload needs different resources or when workloads are competing for resources with each other and affecting the performance of the whole server resource governor is a very important task. Tricks to Replace SELECT * with Column Names – SQL in Sixty Seconds #017 – Video Retrieves unnecessary columns and increases network traffic When a new columns are added views needs to be refreshed manually Leads to usage of sub-optimal execution plan Uses clustered index in most of the cases instead of using optimal index It is difficult to debug SQL SERVER – Load Generator – Free Tool From CodePlex The best part of this SQL Server Load Generator is that users can run multiple simultaneous queries again SQL Server using different login account and different application name. The interface of the tool is extremely easy to use and very intuitive as well. A Puzzle – Swap Value of Column Without Case Statement Let us assume there is a single column in the table called Gender. The challenge is to write a single update statement which will flip or swap the value in the column. For example if the value in the gender column is ‘male’ swap it with ‘female’ and if the value is ‘female’ swap it with ‘male’. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SSIS: Deploying OLAP cubes using C# script tasks and AMO

- by DrJohn

As part of the continuing series on Building dynamic OLAP data marts on-the-fly, this blog entry will focus on how to automate the deployment of OLAP cubes using SQL Server Integration Services (SSIS) and Analysis Services Management Objects (AMO). OLAP cube deployment is usually done using the Analysis Services Deployment Wizard. However, this option was dismissed for a variety of reasons. Firstly, invoking external processes from SSIS is fraught with problems as (a) it is not always possible to ensure SSIS waits for the external program to terminate; (b) we cannot log the outcome properly and (c) it is not always possible to control the server's configuration to ensure the executable works correctly. Another reason for rejecting the Deployment Wizard is that it requires the 'answers' to be written into four XML files. These XML files record the three things we need to change: the name of the server, the name of the OLAP database and the connection string to the data mart. Although it would be reasonably straight forward to change the content of the XML files programmatically, this adds another set of complication and level of obscurity to the overall process. When I first investigated the possibility of using C# to deploy a cube, I was surprised to find that there are no other blog entries about the topic. I can only assume everyone else is happy with the Deployment Wizard! SSIS "forgets" assembly references If you build your script task from scratch, you will have to remember how to overcome one of the major annoyances of working with SSIS script tasks: the forgetful nature of SSIS when it comes to assembly references. Basically, you can go through the process of adding an assembly reference using the Add Reference dialog, but when you close the script window, SSIS "forgets" the assembly reference so the script will not compile. After repeating the operation several times, you will find that SSIS only remembers the assembly reference when you specifically press the Save All icon in the script window. This problem is not unique to the AMO assembly and has certainly been a "feature" since SQL Server 2005, so I am not amazed it is still present in SQL Server 2008 R2! Sample Package So let's take a look at the sample SSIS package I have provided which can be downloaded from here: DeployOlapCubeExample.zip Below is a screenshot after a successful run. Connection Managers The package has three connection managers: AsDatabaseDefinitionFile is a file connection manager pointing to the .asdatabase file you wish to deploy. Note that this can be found in the bin directory of you OLAP database project once you have clicked the "Build" button in Visual Studio TargetOlapServerCS is an Analysis Services connection manager which identifies both the deployment server and the target database name. SourceDataMart is an OLEDB connection manager pointing to the data mart which is to act as the source of data for your cube. This will be used to replace the connection string found in your .asdatabase file Once you have configured the connection managers, the sample should run and deploy your OLAP database in a few seconds. Of course, in a production environment, these connection managers would be associated with package configurations or set at runtime. When you run the sample, you should see that the script logs its activity to the output screen (see screenshot above). If you configure logging for the package, then these messages will also appear in your SSIS logging. Sample Code Walkthrough Next let's walk through the code. The first step is to parse the connection string provided by the TargetOlapServerCS connection manager and obtain the name of both the target OLAP server and also the name of the OLAP database. Note that the target database does not have to exist to be referenced in an AS connection manager, so I am using this as a convenient way to define both properties. We now connect to the server and check for the existence of the OLAP database. If it exists, we drop the database so we can re-deploy. svr.Connect(olapServerName); if (svr.Connected) { // Drop the OLAP database if it already exists Database db = svr.Databases.FindByName(olapDatabaseName); if (db != null) { db.Drop(); } // rest of script } Next we start building the XMLA command that will actually perform the deployment. Basically this is a small chuck of XML which we need to wrap around the large .asdatabase file generated by the Visual Studio build process. // Start generating the main part of the XMLA command XmlDocument xmlaCommand = new XmlDocument(); xmlaCommand.LoadXml(string.Format("<Batch Transaction='false' xmlns='http://schemas.microsoft.com/analysisservices/2003/engine'><Alter AllowCreate='true' ObjectExpansion='ExpandFull'><Object><DatabaseID>{0}</DatabaseID></Object><ObjectDefinition/></Alter></Batch>", olapDatabaseName)); Next we need to merge two XML files which we can do by simply using setting the InnerXml property of the ObjectDefinition node as follows: // load OLAP Database definition from .asdatabase file identified by connection manager XmlDocument olapCubeDef = new XmlDocument(); olapCubeDef.Load(Dts.Connections["AsDatabaseDefinitionFile"].ConnectionString); // merge the two XML files by obtain a reference to the ObjectDefinition node oaRootNode.InnerXml = olapCubeDef.InnerXml; One hurdle I had to overcome was removing detritus from the .asdabase file left by the Visual Studio build. Through an iterative process, I found I needed to remove several nodes as they caused the deployment to fail. The XMLA error message read "Cannot set read-only node: CreatedTimestamp" or similar. In comparing the XMLA generated with by the Deployment Wizard with that generated by my code, these read-only nodes were missing, so clearly I just needed to strip them out. This was easily achieved using XPath to find the relevant XML nodes, of which I show one example below: foreach (XmlNode node in rootNode.SelectNodes("//ns1:CreatedTimestamp", nsManager)) { node.ParentNode.RemoveChild(node); } Now we need to change the database name in both the ID and Name nodes using code such as: XmlNode databaseID = xmlaCommand.SelectSingleNode("//ns1:Database/ns1:ID", nsManager); if (databaseID != null) databaseID.InnerText = olapDatabaseName; Finally we need to change the connection string to point at the relevant data mart. Again this is easily achieved using XPath to search for the relevant nodes and then replace the content of the node with the new name or connection string. XmlNode connectionStringNode = xmlaCommand.SelectSingleNode("//ns1:DataSources/ns1:DataSource/ns1:ConnectionString", nsManager); if (connectionStringNode != null) { connectionStringNode.InnerText = Dts.Connections["SourceDataMart"].ConnectionString; } Finally we need to perform the deployment using the Execute XMLA command and check the returned XmlaResultCollection for errors before setting the Dts.TaskResult. XmlaResultCollection oResults = svr.Execute(xmlaCommand.InnerXml); // check for errors during deployment foreach (Microsoft.AnalysisServices.XmlaResult oResult in oResults) { foreach (Microsoft.AnalysisServices.XmlaMessage oMessage in oResult.Messages) { if ((oMessage.GetType().Name == "XmlaError")) { FireError(oMessage.Description); HadError = true; } } } If you are not familiar with XML programming, all this may all seem a bit daunting, but perceiver as the sample code is pretty short. If you would like the script to process the OLAP database, simply uncomment the lines in the vicinity of Process method. Of course, you can extend the script to perform your own custom processing and to even synchronize the database to a front-end server. Personally, I like to keep the deployment and processing separate as the code can become overly complex for support staff.If you want to know more, come see my session at the forthcoming SQLBits conference.

Read the article
SQL SERVER – SSIS Look Up Component – Cache Mode – Notes from the Field #028

- by Pinal Dave

[Notes from Pinal]: Lots of people think that SSIS is all about arranging various operations together in one logical flow. Well, the understanding is absolutely correct, but the implementation of the same is not as easy as it seems. Similarly most of the people think lookup component is just component which does look up for additional information and does not pay much attention to it. Due to the same reason they do not pay attention to the same and eventually get very bad performance. Linchpin People are database coaches and wellness experts for a data driven world. In this 28th episode of the Notes from the Fields series database expert Tim Mitchell (partner at Linchpin People) shares very interesting conversation related to how to write a good lookup component with Cache Mode. In SQL Server Integration Services, the lookup component is one of the most frequently used tools for data validation and completion. The lookup component is provided as a means to virtually join one set of data to another to validate and/or retrieve missing values. Properly configured, it is reliable and reasonably fast. Among the many settings available on the lookup component, one of the most critical is the cache mode. This selection will determine whether and how the distinct lookup values are cached during package execution. It is critical to know how cache modes affect the result of the lookup and the performance of the package, as choosing the wrong setting can lead to poorly performing packages, and in some cases, incorrect results. Full Cache The full cache mode setting is the default cache mode selection in the SSIS lookup transformation. Like the name implies, full cache mode will cause the lookup transformation to retrieve and store in SSIS cache the entire set of data from the specified lookup location. As a result, the data flow in which the lookup transformation resides will not start processing any data buffers until all of the rows from the lookup query have been cached in SSIS. The most commonly used cache mode is the full cache setting, and for good reason. The full cache setting has the most practical applications, and should be considered the go-to cache setting when dealing with an untested set of data. With a moderately sized set of reference data, a lookup transformation using full cache mode usually performs well. Full cache mode does not require multiple round trips to the database, since the entire reference result set is cached prior to data flow execution. There are a few potential gotchas to be aware of when using full cache mode. First, you can see some performance issues – memory pressure in particular – when using full cache mode against large sets of reference data. If the table you use for the lookup is very large (either deep or wide, or perhaps both), there’s going to be a performance cost associated with retrieving and caching all of that data. Also, keep in mind that when doing a lookup on character data, full cache mode will always do a case-sensitive (and in some cases, space-sensitive) string comparison even if your database is set to a case-insensitive collation. This is because the in-memory lookup uses a .NET string comparison (which is case- and space-sensitive) as opposed to a database string comparison (which may be case sensitive, depending on collation). There’s a relatively easy workaround in which you can use the UPPER() or LOWER() function in the pipeline data and the reference data to ensure that case differences do not impact the success of your lookup operation. Again, neither of these present a reason to avoid full cache mode, but should be used to determine whether full cache mode should be used in a given situation. Full cache mode is ideally useful when one or all of the following conditions exist: The size of the reference data set is small to moderately sized The size of the pipeline data set (the data you are comparing to the lookup table) is large, is unknown at design time, or is unpredictable Each distinct key value(s) in the pipeline data set is expected to be found multiple times in that set of data Partial Cache When using the partial cache setting, lookup values will still be cached, but only as each distinct value is encountered in the data flow. Initially, each distinct value will be retrieved individually from the specified source, and then cached. To be clear, this is a row-by-row lookup for each distinct key value(s). This is a less frequently used cache setting because it addresses a narrower set of scenarios. Because each distinct key value(s) combination requires a relational round trip to the lookup source, performance can be an issue, especially with a large pipeline data set to be compared to the lookup data set. If you have, for example, a million records from your pipeline data source, you have the potential for doing a million lookup queries against your lookup data source (depending on the number of distinct values in the key column(s)). Therefore, one has to be keenly aware of the expected row count and value distribution of the pipeline data to safely use partial cache mode. Using partial cache mode is ideally suited for the conditions below: The size of the data in the pipeline (more specifically, the number of distinct key column) is relatively small The size of the lookup data is too large to effectively store in cache The lookup source is well indexed to allow for fast retrieval of row-by-row values No Cache As you might guess, selecting no cache mode will not add any values to the lookup cache in SSIS. As a result, every single row in the pipeline data set will require a query against the lookup source. Since no data is cached, it is possible to save a small amount of overhead in SSIS memory in cases where key values are not reused. In the real world, I don’t see a lot of use of the no cache setting, but I can imagine some edge cases where it might be useful. As such, it’s critical to know your data before choosing this option. Obviously, performance will be an issue with anything other than small sets of data, as the no cache setting requires row-by-row processing of all of the data in the pipeline. I would recommend considering the no cache mode only when all of the below conditions are true: The reference data set is too large to reasonably be loaded into SSIS memory The pipeline data set is small and is not expected to grow There are expected to be very few or no duplicates of the key values(s) in the pipeline data set (i.e., there would be no benefit from caching these values) Conclusion The cache mode, an often-overlooked setting on the SSIS lookup component, represents an important design decision in your SSIS data flow. Choosing the right lookup cache mode directly impacts the fidelity of your results and the performance of package execution. Know how this selection impacts your ETL loads, and you’ll end up with more reliable, faster packages. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SSIS

Read the article
Cost Comparison Hard Disk Drive to Solid State Drive on Price per Gigabyte - dispelling a myth!

- by tonyrogerson

It is often said that Hard Disk Drive storage is significantly cheaper per GiByte than Solid State Devices – this is wholly inaccurate within the database space. People need to look at the cost of the complete solution and not just a single component part in isolation to what is really required to meet the business requirement. Buying a single Hitachi Ultrastar 600GB 3.5” SAS 15Krpm hard disk drive will cost approximately £239.60 (http://scan.co.uk, 22nd March 2012) compared to an OCZ 600GB Z-Drive R4 CM84 PCIe costing £2,316.54 (http://scan.co.uk, 22nd March 2012); I’ve not included FusionIO ioDrive because there is no public pricing available for it – something I never understand and personally when companies do this I immediately think what are they hiding, luckily in FusionIO’s case the product is proven though is expensive compared to OCZ enterprise offerings. On the face of it the single 15Krpm hard disk has a price per GB of £0.39, the SSD £3.86; this is what you will see in the press and this is what sales people will use in comparing the two technologies – do not be fooled by this bullshit people! What is the requirement? The requirement is the database will have a static size of 400GB kept static through archiving so growth and trim will balance the database size, the client requires resilience, there will be several hundred call centre staff querying the database where queries will read a small amount of data but there will be no hot spot in the data so the randomness will come across the entire 400GB of the database, estimates predict that the IOps required will be approximately 4,000IOps at peak times, because it’s a call centre system the IO latency is important and must remain below 5ms per IO. The balance between read and write is 70% read, 30% write. The requirement is now defined and we have three of the most important pieces of the puzzle – space required, estimated IOps and maximum latency per IO. Something to consider with regard SQL Server; write activity requires synchronous IO to the storage media specifically the transaction log; that means the write thread will wait until the IO is completed and hardened off until the thread can continue execution, the requirement has stated that 30% of the system activity will be write so we can expect a high amount of synchronous activity. The hardware solution needs to be defined; two possible solutions: hard disk or solid state based; the real question now is how many hard disks are required to achieve the IO throughput, the latency and resilience, ditto for the solid state. Hard Drive solution On a test on an HP DL380, P410i controller using IOMeter against a single 15Krpm 146GB SAS drive, the throughput given on a transfer size of 8KiB against a 40GiB file on a freshly formatted disk where the partition is the only partition on the disk thus the 40GiB file is on the outer edge of the drive so more sectors can be read before head movement is required: For 100% sequential IO at a queue depth of 16 with 8 worker threads 43,537 IOps at an average latency of 2.93ms (340 MiB/s), for 100% random IO at the same queue depth and worker threads 3,733 IOps at an average latency of 34.06ms (34 MiB/s). The same test was done on the same disk but the test file was 130GiB: For 100% sequential IO at a queue depth of 16 with 8 worker threads 43,537 IOps at an average latency of 2.93ms (340 MiB/s), for 100% random IO at the same queue depth and worker threads 528 IOps at an average latency of 217.49ms (4 MiB/s). From the result it is clear random performance gets worse as the disk fills up – I’m currently writing an article on short stroking which will cover this in detail. Given the work load is random in nature looking at the random performance of the single drive when only 40 GiB of the 146 GB is used gives near the IOps required but the latency is way out. Luckily I have tested 6 x 15Krpm 146GB SAS 15Krpm drives in a RAID 0 using the same test methodology, for the same test above on a 130 GiB for each drive added the performance boost is near linear, for each drive added throughput goes up by 5 MiB/sec, IOps by 700 IOps and latency reducing nearly 50% per drive added (172 ms, 94 ms, 65 ms, 47 ms, 37 ms, 30 ms). This is because the same 130GiB is spread out more as you add drives 130 / 1, 130 / 2, 130 / 3 etc. so implicit short stroking is occurring because there is less file on each drive so less head movement required. The best latency is still 30 ms but we have the IOps required now, but that’s on a 130GiB file and not the 400GiB we need. Some reality check here: a) the drive randomness is more likely to be 50/50 and not a full 100% but the above has highlighted the effect randomness has on the drive and the more a drive fills with data the worse the effect. For argument sake let us assume that for the given workload we need 8 disks to do the job, for resilience reasons we will need 16 because we need to RAID 1+0 them in order to get the throughput and the resilience, RAID 5 would degrade performance. Cost for hard drives: 16 x £239.60 = £3,833.60 For the hard drives we will need disk controllers and a separate external disk array because the likelihood is that the server itself won’t take the drives, a quick spec off DELL for a PowerVault MD1220 which gives the dual pathing with 16 disks 146GB 15Krpm 2.5” disks is priced at £7,438.00, note its probably more once we had two controller cards to sit in the server in, racking etc. Minimum cost taking the DELL quote as an example is therefore: {Cost of Hardware} / {Storage Required} £7,438.60 / 400 = £18.595 per GB £18.59 per GiB is a far cry from the £0.39 we had been told by the salesman and the myth. Yes, the storage array is composed of 16 x 146 disks in RAID 10 (therefore 8 usable) giving an effective usable storage availability of 1168GB but the actual storage requirement is only 400 and the extra disks have had to be purchased to get the IOps up. Solid State Drive solution A single card significantly exceeds the IOps and latency required, for resilience two will be required. ( £2,316.54 * 2 ) / 400 = £11.58 per GB With the SSD solution only two PCIe sockets are required, no external disk units, no additional controllers, no redundant controllers etc. Conclusion I hope by showing you an example that the myth that hard disk drives are cheaper per GiB than Solid State has now been dispelled - £11.58 per GB for SSD compared to £18.59 for Hard Disk. I’ve not even touched on the running costs, compare the costs of running 18 hard disks, that’s a lot of heat and power compared to two PCIe cards!Just a quick note: I've left a fair amount of information out due to this being a blog! If in doubt, email me :)I'll also deal with the myth that SSD's wear out at a later date as well - that's just way over done still, yes, 5 years ago, but now - no.

Read the article
SQL University: What and why of database testing

- by Mladen Prajdic

This is a post for a great idea called SQL University started by Jorge Segarra also famously known as SqlChicken on Twitter. It’s a collection of blog posts on different database related topics contributed by several smart people all over the world. So this week is mine and we’ll be talking about database testing and refactoring. In 3 posts we’ll cover: SQLU part 1 - What and why of database testing SQLU part 2 - What and why of database refactoring SQLU part 2 – Tools of the trade With that out of the way let us sharpen our pencils and get going. Why test a database The sad state of the industry today is that there is very little emphasis on testing in general. Test driven development is still a small niche of the programming world while refactoring is even smaller. The cause of this is the inability of developers to convince themselves and their managers that writing tests is beneficial. At the moment they are mostly viewed as waste of time. This is because the average person (let’s not fool ourselves, we’re all average) is unable to think about lower future costs in relation to little more current work. It’s orders of magnitude easier to know about the current costs in relation to current amount of work. That’s why programmers convince themselves testing is a waste of time. However we have to ask ourselves what tests are really about? Maybe finding bugs? No, not really. If we introduce bugs, we’re likely to write test around those bugs too. But yes we can find some bugs with tests. The main point of tests is to have reproducible repeatability in our systems. By having a code base largely covered by tests we can know with better certainty what a small code change can break in other parts of the system. By having repeatability we can make code changes with confidence, since we know we’ll see what breaks in other tests. And here comes the inability to estimate future costs. By spending just a few more hours writing those tests we’d know instantly what broke where. Imagine we fix a reported bug. We check-in the code, deploy it and the users are happy. Until we get a call 2 weeks later about a certain monthly process has stopped working. What we don’t know is that this process was developed by a long gone coworker and for some reason it relied on that same bug we’ve happily fixed. There’s no way we could’ve known that. We say OK and go in and fix the monthly process. But what we have no clue about is that there’s this ETL job that relied on data from that monthly process. Now that we’ve fixed the process it’s giving unexpected (yet correct since we fixed it) data to the ETL job. So we have to fix that too. But there’s this part of the app we coded that relies on data from that exact ETL job. And just like that we enter the “Loop of maintenance horror”. With the loop eventually comes blame. Here’s a nice tip for all developers and DBAs out there: If you make a mistake man up and admit to it. All of the above is valid for any kind of software development. Keeping this in mind the database is nothing other than just a part of the application. But a big part! One reason why testing a database is even more important than testing an application is that one database is usually accessed from multiple applications and processes. This makes it the central and vital part of the enterprise software infrastructure. Knowing all this can we really afford not to have tests? What to test in a database Now that we’ve decided we’ll dive into this testing thing we have to ask ourselves what needs to be tested? The short answer is: everything. The long answer is: read on! There are 2 main ways of doing tests: Black box and White box testing. Black box testing means we have no idea how the system internals are built and we only have access to it’s inputs and outputs. With it we test that the internal changes to the system haven’t caused the input/output behavior of the system to change. The most important thing to test here are the edge conditions. It’s where most programs break. Having good edge condition tests we can be more confident that the systems changes won’t break. White box testing has the full knowledge of the system internals. With it we test the internal system changes, different states of the application, etc… White and Black box tests should be complementary to each other as they are very much interconnected. Testing database routines includes testing stored procedures, views, user defined functions and anything you use to access the data with. Database routines are your input/output interface to the database system. They count as black box testing. We test then for 2 things: Data and schema. When testing schema we only care about the columns and the data types they’re returning. After all the schema is the contract to the out side systems. If it changes we usually have to change the applications accessing it. One helpful T-SQL command when doing schema tests is SET FMTONLY ON. It tells the SQL Server to return only empty results sets. This speeds up tests because it doesn’t return any data to the client. After we’ve validated the schema we have to test the returned data. There no other way to do this but to have expected data known before the tests executes and comparing that data to the database routine output. Testing Authentication and Authorization helps us validate who has access to the SQL Server box (Authentication) and who has access to certain database objects (Authorization). For desktop applications and windows authentication this works well. But the biggest problem here are web apps. They usually connect to the database as a single user. Please ensure that that user is not SA or an account with admin privileges. That is just bad. Load testing ensures us that our database can handle peak loads. One often overlooked tool for load testing is Microsoft’s OSTRESS tool. It’s part of RML utilities (x86, x64) for SQL Server and can help determine if our database server can handle loads like 100 simultaneous users each doing 10 requests per second. SQL Profiler can also help us here by looking at why certain queries are slow and what to do to fix them. One particular problem to think about is how to begin testing existing databases. First thing we have to do is to get to know those databases. We can’t test something when we don’t know how it works. To do this we have to talk to the users of the applications accessing the database, run SQL Profiler to see what queries are being run, use existing documentation to decipher all the object relationships, etc… The way to approach this is to choose one part of the database (say a logical grouping of tables that go together) and filter our traces accordingly. Once we’ve done that we move on to the next grouping and so on until we’ve covered the whole database. Then we move on to the next one. Database Testing is a topic that we can spent many hours discussing but let this be a nice intro to the world of database testing. See you in the next post.

Read the article
Why do we need different CPU architecture for server & mini/mainframe & mixed-core?

- by claws

Hello, I was just wondering what other CPU architectures are available other than INTEL & AMD. So, found List of CPU architectures on Wikipedia. It categorizes notable CPU architectures into following categories. Embedded CPU architectures Microcomputer CPU architectures Workstation/Server CPU architectures Mini/Mainframe CPU architectures Mixed core CPU architectures I was analyzing the purposes and have few doubts. I taking Microcomputer CPU (PC) architecture as reference and comparing others. Embedded CPU architecture: They are a completely new world. Embedded systems are small & do very specific task mostly real time & low power consuming so we do not need so many & such wide registers available in a microcomputer CPU (typical PC). In other words we do need a new small & tiny architecture. Hence new architecture & new instruction RISC. The above point also clarifies why do we need a separate operating system (RTOS). Workstation/Server CPU architectures I don't know what is a workstation. Someone clarify regarding the workstation. As of the server. It is dedicated to run a specific software (server software like httpd, mysql etc.). Even if other processes run we need to give server process priority therefore there is a need for new scheduling scheme and thus we need operating system different than general purpose one. If you have any more points for the need of server OS please mention. But I don't get why do we need a new CPU Architecture. Why cant Microcomputer CPU architecture do the job. Can someone please clarify? Mini/Mainframe CPU architectures Again I don't know what are these & what miniframes or mainframes used for? I just know they are very big and occupy complete floor. But I never read about some real world problems they are trying to solve. If any one working on one of these. Share your knowledge. Can some one clarify its purpose & why is it that microcomputer CPU archicture not suitable for it? Is there a new kind of operating system for this too? Why? Mixed core CPU architectures Never heard of these. If possible please keep your answer in this format: XYZ CPU architectures Purpose of XYZ Need for a new architecture. why can't current microcomputer CPU architecture work? They go upto 3GHZ & have upto 8 cores. Need for a new Operating System Why do we need a new kind of operating system for this kind of archictures?

Read the article
HTG Explains: Should You Build Your Own PC?

- by Chris Hoffman

There was a time when every geek seemed to build their own PC. While the masses bought eMachines and Compaqs, geeks built their own more powerful and reliable desktop machines for cheaper. But does this still make sense? Building your own PC still offers as much flexibility in component choice as it ever did, but prebuilt computers are available at extremely competitive prices. Building your own PC will no longer save you money in most cases. The Rise of Laptops It’s impossible to look at the decline of geeks building their own PCs without considering the rise of laptops. There was a time when everyone seemed to use desktops — laptops were more expensive and significantly slower in day-to-day tasks. With the diminishing importance of computing power — nearly every modern computer has more than enough power to surf the web and use typical programs like Microsoft Office without any trouble — and the rise of laptop availability at nearly every price point, most people are buying laptops instead of desktops. And, if you’re buying a laptop, you can’t really build your own. You can’t just buy a laptop case and start plugging components into it — even if you could, you would end up with an extremely bulky device. Ultimately, to consider building your own desktop PC, you have to actually want a desktop PC. Most people are better served by laptops. Benefits to PC Building The two main reasons to build your own PC have been component choice and saving money. Building your own PC allows you to choose all the specific components you want rather than have them chosen for you. You get to choose everything, including the PC’s case and cooling system. Want a huge case with room for a fancy water-cooling system? You probably want to build your own PC. In the past, this often allowed you to save money — you could get better deals by buying the components yourself and combining them, avoiding the PC manufacturer markup. You’d often even end up with better components — you could pick up a more powerful CPU that was easier to overclock and choose more reliable components so you wouldn’t have to put up with an unstable eMachine that crashed every day. PCs you build yourself are also likely more upgradable — a prebuilt PC may have a sealed case and be constructed in such a way to discourage you from tampering with the insides, while swapping components in and out is generally easier with a computer you’ve built on your own. If you want to upgrade your CPU or replace your graphics card, it’s a definite benefit. Downsides to Building Your Own PC It’s important to remember there are downsides to building your own PC, too. For one thing, it’s just more work — sure, if you know what you’re doing, building your own PC isn’t that hard. Even for a geek, researching the best components, price-matching, waiting for them all to arrive, and building the PC just takes longer. Warranty is a more pernicious problem. If you buy a prebuilt PC and it starts malfunctioning, you can contact the computer’s manufacturer and have them deal with it. You don’t need to worry about what’s wrong. If you build your own PC and it starts malfunctioning, you have to diagnose the problem yourself. What’s malfunctioning, the motherboard, CPU, RAM, graphics card, or power supply? Each component has a separate warranty through its manufacturer, so you’ll have to determine which component is malfunctioning before you can send it off for replacement. Should You Still Build Your Own PC? Let’s say you do want a desktop and are willing to consider building your own PC. First, bear in mind that PC manufacturers are buying in bulk and getting a better deal on each component. They also have to pay much less for a Windows license than the $120 or so it would cost you to to buy your own Windows license. This is all going to wipe out the cost savings you’ll see — with everything all told, you’ll probably spend more money building your own average desktop PC than you would picking one up from Amazon or the local electronics store. If you’re an average PC user that uses your desktop for the typical things, there’s no money to be saved from building your own PC. But maybe you’re looking for something higher end. Perhaps you want a high-end gaming PC with the fastest graphics card and CPU available. Perhaps you want to pick out each individual component and choose the exact components for your gaming rig. In this case, building your own PC may be a good option. As you start to look at more expensive, high-end PCs, you may start to see a price gap — but you may not. Let’s say you wanted to blow thousands of dollars on a gaming PC. If you’re looking at spending this kind of money, it would be worth comparing the cost of individual components versus a prebuilt gaming system. Still, the actual prices may surprise you. For example, if you wanted to upgrade Dell’s $2293 Alienware Aurora to include a second NVIDIA GeForce GTX 780 graphics card, you’d pay an additional $600 on Alienware’s website. The same graphics card costs $650 on Amazon or Newegg, so you’d be spending more money building the system yourself. Why? Dell’s Alienware gets bulk discounts you can’t get — and this is Alienware, which was once regarded as selling ridiculously overpriced gaming PCs to people who wouldn’t build their own. Building your own PC still allows you to get the most freedom when choosing and combining components, but this is only valuable to a small niche of gamers and professional users — most people, even average gamers, would be fine going with a prebuilt system. If you’re an average person or even an average gamer, you’ll likely find that it’s cheaper to purchase a prebuilt PC rather than assemble your own. Even at the very high end, components may be more expensive separately than they are in a prebuilt PC. Enthusiasts who want to choose all the individual components for their dream gaming PC and want maximum flexibility may want to build their own PCs. Even then, building your own PC these days is more about flexibility and component choice than it is about saving money. In summary, you probably shouldn’t build your own PC. If you’re an enthusiast, you may want to — but only a small minority of people would actually benefit from building their own systems. Feel free to compare prices, but you may be surprised which is cheaper. Image Credit: Richard Jones on Flickr, elPadawan on Flickr, Richard Jones on Flickr

Read the article
Waterfall Model (SDLC) vs. Prototyping Model

The characters in the fable of the Tortoise and the Hare can easily be used to demonstrate the similarities and differences between the Waterfall and Prototyping software development models. This children fable is about a race between a consistently slow moving but steadfast turtle and an extremely fast but unreliable rabbit. After closely comparing each character’s attributes in correlation with both software development models, a trend seems to appear in that the Waterfall closely resembles the Tortoise in that Waterfall Model is typically a slow moving process that is broken up in to multiple sequential steps that must be executed in a standard linear pattern. The Tortoise can be quoted several times in the story saying “Slow and steady wins the race.” This is the perfect mantra for the Waterfall Model in that this model is seen as a cumbersome and slow moving. Waterfall Model Phases Requirement Analysis & Definition This phase focuses on defining requirements for a project that is to be developed and determining if the project is even feasible. Requirements are collected by analyzing existing systems and functionality in correlation with the needs of the business and the desires of the end users. The desired output for this phase is a list of specific requirements from the business that are to be designed and implemented in the subsequent steps. In addition this phase is used to determine if any value will be gained by completing the project. System Design This phase focuses primarily on the actual architectural design of a system, and how it will interact within itself and with other existing applications. Projects at this level should be viewed at a high level so that actual implementation details are decided in the implementation phase. However major environmental decision like hardware and platform decision are typically decided in this phase. Furthermore the basic goal of this phase is to design an application at the system level in those classes, interfaces, and interactions are defined. Additionally decisions about scalability, distribution and reliability should also be considered for all decisions. The desired output for this phase is a functional design document that states all of the architectural decisions that have been made in regards to the project as well as a diagrams like a sequence and class diagrams. Software Design This phase focuses primarily on the refining of the decisions found in the functional design document. Classes and interfaces are further broken down in to logical modules based on the interfaces and interactions previously indicated. The output of this phase is a formal design document. Implementation / Coding This phase focuses primarily on implementing the previously defined modules in to units of code. These units are developed independently are intergraded as the system is put together as part of a whole system. Software Integration & Verification This phase primarily focuses on testing each of the units of code developed as well as testing the system as a whole. There are basic types of testing at this phase and they include: Unit Test and Integration Test. Unit Test are built to test the functionality of a code unit to ensure that it preforms its desired task. Integration testing test the system as a whole because it focuses on results of combining specific units of code and validating it against expected results. The output of this phase is a test plan that includes test with expected results and actual results. System Verification This phase primarily focuses on testing the system as a whole in regards to the list of project requirements and desired operating environment. Operation & Maintenance his phase primarily focuses on handing off the competed project over to the customer so that they can verify that all of their requirements have been met based on their original requirements. This phase will also validate the correctness of their requirements and if any changed need to be made. In addition, any problems not resolved in the previous phase will be handled in this section. The Waterfall Model’s linear and sequential methodology does offer a project certain advantages and disadvantages. Advantages of the Waterfall Model Simplistic to implement and execute for projects and/or company wide Limited demand on resources Large emphasis on documentation Disadvantages of the Waterfall Model Completed phases cannot be revisited regardless if issues arise within a project Accurate requirement are never gather prior to the completion of the requirement phase due to the lack of clarification in regards to client’s desires. Small changes or errors that arise in applications may cause additional problems The client cannot change any requirements once the requirements phase has been completed leaving them no options for changes as they see their requirements changes as the customers desires change. Excess documentation Phases are cumbersome and slow moving Learn more about the Major Process in the Sofware Development Life Cycle and Waterfall Model. Conversely, the Hare shares similar traits with the prototyping software development model in that ideas are rapidly converted to basic working examples and subsequent changes are made to quickly align the project with customers desires as they are formulated and as software strays from the customers vision. The basic concept of prototyping is to eliminate the use of well-defined project requirements. Projects are allowed to grow as the customer needs and request grow. Projects are initially designed according to basic requirements and are refined as requirement become more refined. This process allows customer to feel their way around the application to ensure that they are developing exactly what they want in the application This model also works well for determining the feasibility of certain approaches in regards to an application. Prototypes allow for quickly developing examples of implementing specific functionality based on certain techniques. Advantages of Prototyping Active participation from users and customers Allows customers to change their mind in specifying requirements Customers get a better understanding of the system as it is developed Earlier bug/error detection Promotes communication with customers Prototype could be used as final production Reduced time needed to develop applications compared to the Waterfall method Disadvantages of Prototyping Promotes constantly redefining project requirements that cause major system rewrites Potential for increased complexity of a system as scope of the system expands Customer could believe the prototype as the working version. Implementation compromises could increase the complexity when applying updates and or application fixes When companies trying to decide between the Waterfall model and Prototype model they need to evaluate the benefits and disadvantages for both models. Typically smaller companies or projects that have major time constraints typically head for more of a Prototype model approach because it can reduce the time needed to complete the project because there is more of a focus on building a project and less on defining requirements and scope prior to the start of a project. On the other hand, Companies with well-defined requirements and time allowed to generate proper documentation should steer towards more of a waterfall model because they are in a position to obtain clarified requirements and have to design and optimal solution prior to the start of coding on a project.

Read the article
DRY and SRP

- by Timothy Klenke

Originally posted on: http://geekswithblogs.net/TimothyK/archive/2014/06/11/dry-and-srp.aspxKent Beck’s XP Simplicity Rules (aka Four Rules of Simple Design) are a prioritized list of rules that when applied to your code generally yield a great design. As you’ll see from the above link the list has slightly evolved over time. I find today they are usually listed as: All Tests Pass Don’t Repeat Yourself (DRY) Express Intent Minimalistic These are prioritized. If your code doesn’t work (rule 1) then everything else is forfeit. Go back to rule one and get the code working before worrying about anything else. Over the years the community have debated whether the priority of rules 2 and 3 should be reversed. Some say a little duplication in the code is OK as long as it helps express intent. I’ve debated it myself. This recent post got me thinking about this again, hence this post. I don’t think it is fair to compare “Expressing Intent” against “DRY”. This is a comparison of apples to oranges. “Expressing Intent” is a principal of code quality. “Repeating Yourself” is a code smell. A code smell is merely an indicator that there might be something wrong with the code. It takes further investigation to determine if a violation of an underlying principal of code quality has actually occurred. For example “using nouns for method names”, “using verbs for property names”, or “using Booleans for parameters” are all code smells that indicate that code probably isn’t doing a good job at expressing intent. They are usually very good indicators. But what principle is the code smell of Duplication pointing to and how good of an indicator is it? Duplication in the code base is bad for a couple reasons. If you need to make a change and that needs to be made in a number of locations it is difficult to know if you have caught all of them. This can lead to bugs if/when one of those locations is overlooked. By refactoring the code to remove all duplication there will be left with only one place to change, thereby eliminating this problem. With most projects the code becomes the single source of truth for a project. If a production code base is inconsistent with a five year old requirements or design document the production code that people are currently living with is usually declared as the current reality (or truth). Requirement or design documents at this age in a project life cycle are usually of little value. Although comparing production code to external documentation is usually straight forward, duplication within the code base muddles this declaration of truth. When code is duplicated small discrepancies will creep in between the two copies over time. The question then becomes which copy is correct? As different factions debate how the software should work, trust in the software and the team behind it erodes. The code smell of Duplication points to a violation of the “Single Source of Truth” principle. Let me define that as: A stakeholder’s requirement for a software change should never cause more than one class to change. Violation of the Single Source of Truth principle will always result in duplication in the code. However, the inverse is not always true. Duplication in the code does not necessarily indicate that there is a violation of the Single Source of Truth principle. To illustrate this, let’s look at a retail system where the system will (1) send a transaction to a bank and (2) print a receipt for the customer. Although these are two separate features of the system, they are closely related. The reason for printing the receipt is usually to provide an audit trail back to the bank transaction. Both features use the same data: amount charged, account number, transaction date, customer name, retail store name, and etcetera. Because both features use much of the same data, there is likely to be a lot of duplication between them. This duplication can be removed by making both features use the same data access layer. Then start coming the divergent requirements. The receipt stakeholder wants a change so that the account number has the last few digits masked out to protect the customer’s privacy. That can be solve with a small IF statement whilst still eliminating all duplication in the system. Then the bank wants to take a picture of the customer as well as capture their signature and/or PIN number for enhanced security. Then the receipt owner wants to pull data from a completely different system to report the customer’s loyalty program point total. After a while you realize that the two stakeholders have somewhat similar, but ultimately different responsibilities. They have their own reasons for pulling the data access layer in different directions. Then it dawns on you, the Single Responsibility Principle: There should never be more than one reason for a class to change. In this example we have two stakeholders giving two separate reasons for the data access class to change. It is clear violation of the Single Responsibility Principle. That’s a problem because it can often lead the project owner pitting the two stakeholders against each other in a vein attempt to get them to work out a mutual single source of truth. But that doesn’t exist. There are two completely valid truths that the developers need to support. How is this to be supported and honour the Single Responsibility Principle? The solution is to duplicate the data access layer and let each stakeholder control their own copy. The Single Source of Truth and Single Responsibility Principles are very closely related. SST tells you when to remove duplication; SRP tells you when to introduce it. They may seem to be fighting each other, but really they are not. The key is to clearly identify the different responsibilities (or sources of truth) over a system. Sometimes there is a single person with that responsibility, other times there are many. This can be especially difficult if the same person has dual responsibilities. They might not even realize they are wearing multiple hats. In my opinion Single Source of Truth should be listed as the second rule of simple design with Express Intent at number three. Investigation of the DRY code smell should yield to the proper application SST, without violating SRP. When necessary leave duplication in the system and let the class names express the different people that are responsible for controlling them. Knowing all the people with responsibilities over a system is the higher priority because you’ll need to know this before you can express it. Although it may be a code smell when there is duplication in the code, it does not necessarily mean that the coder has chosen to be expressive over DRY or that the code is bad.

Read the article
RIF PRD: Presentation syntax issues

- by Charles Young

Over Christmas I got to play a bit with the W3C RIF PRD and came across a few issues which I thought I would record for posterity. Specifically, I was working on a grammar for the presentation syntax using a GLR grammar parser tool (I was using the current CTP of ‘M’ (MGrammer) and Intellipad – I do so hope the MS guys don’t kill off M and Intellipad now they have dropped the other parts of SQL Server Modelling). I realise that the presentation syntax is non-normative and that any issues with it do not therefore compromise the standard. However, presentation syntax is useful in its own right, and it would be great to iron out any issues in a future revision of the standard. The main issues are actually not to do with the grammar at all, but rather with the ‘running example’ in the RIF PRD recommendation. I started with the code provided in Example 9.1. There are several discrepancies when compared with the EBNF rules documented in the standard. Broadly the problems can be categorised as follows: · Parenthesis mismatch – the wrong number of parentheses are used in various places. For example, in GoldRule, the RHS of the rule (the ‘Then’) is nested in the LHS (‘the If’). In NewCustomerAndWidgetRule, the RHS is orphaned from the LHS. Together with additional incorrect parenthesis, this leads to orphanage of UnknownStatusRule from the entire Document. · Invalid use of parenthesis in ‘Forall’ constructs. Parenthesis should not be used to enclose formulae. Removal of the invalid parenthesis gave me a feeling of inconsistency when comparing formulae in Forall to formulae in If. The use of parenthesis is not actually inconsistent in these two context, but in an If construct it ‘feels’ as if you are enclosing formulae in parenthesis in a LISP-like fashion. In reality, the parenthesis is simply being used to group subordinate syntax elements. The fact that an If construct can contain only a single formula as an immediate child adds to this feeling of inconsistency. · Invalid representation of compact URIs (CURIEs) in the context of Frame productions. In several places the URIs are not qualified with a namespace prefix (‘ex1:’). This conflicts with the definition of CURIEs in the RIF Datatypes and Built-Ins 1.0 document. Here are the productions: CURIE ::= PNAME_LN | PNAME_NS PNAME_LN ::= PNAME_NS PN_LOCAL PNAME_NS ::= PN_PREFIX? ':' PN_LOCAL ::= ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* PN_CHARS)? PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040] PN_CHARS_U ::= PN_CHARS_BASE | '_' PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)? The more I look at CURIEs, the more my head hurts! The RIF specification allows prefixes and colons without local names, which surprised me. However, the CURIE Syntax 1.0 working group note specifically states that this form is supported…and then promptly provides a syntactic definition that seems to preclude it! However, on (much) deeper inspection, it appears that ‘ex1:’ (for example) is allowed, but would really represent a ‘fragment’ of the ‘reference’, rather than a prefix! Ouch! This is so completely ambiguous that it surely calls into question the whole CURIE specification. In any case, RIF does not allow local names without a prefix. · Missing ‘External’ specifiers for built-in functions and predicates. The EBNF specification enforces this for terms within frames, but does not appear to enforce (what I believe is) the correct use of External on built-in predicates. In any case, the running example only specifies ‘External’ once on the predicate in UnknownStatusRule. External() is required in several other places. · The List used on the LHS of UnknownStatusRule is comma-delimited. This is not supported by the EBNF definition. Similarly, the argument list of pred:list-contains is illegally comma-delimited. · Unnecessary use of conjunction around a single formula in DiscountRule. This is strictly legal in the EBNF, but redundant. All the above issues concern the presentation syntax used in the running example. There are a few minor issues with the grammar itself. Note that Michael Kiefer stated in his paper “Rule Interchange Format: The Framework” that: “The presentation syntax of RIF … is an abstract syntax and, as such, it omits certain details that might be important for unambiguous parsing.” · The grammar cannot differentiate unambiguously between strategies and priorities on groups. A processor is forced to resolve this by detecting the use of IRIs and integers. This could easily be fixed in the grammar. · The grammar cannot unambiguously parse the ‘->’ operator in frames. Specifically, ‘-’ characters are allowed in PN_LOCAL names and hence a parser cannot determine if ‘status->’ is (‘status’ ‘->’) or (‘status-’ ‘>’). One way to fix this is to amend the PN_LOCAL production as follows: PN_LOCAL ::= ( PN_CHARS_U | [0-9] ) ((PN_CHARS|'.')* ((PN_CHARS)-('-')))? However, unilaterally changing the definition of this production, which is defined in the SPARQL Query Language for RDF specification, makes me uncomfortable. · I assume that the presentation syntax is case-sensitive. I couldn’t find this stated anywhere in the documentation, but function/predicate names do appear to be documented as being case-sensitive. · The EBNF does not specify whitespace handling. A couple of productions (RULE and ACTION_BLOCK) are crafted to enforce the use of whitespace. This is not necessary. It seems inconsistent with the rest of the specification and can cause parsing issues. In addition, the Const production exhibits whitespaces issues. The intention may have been to disallow the use of whitespace around ‘^^’, but any direct implementation of the EBNF will probably allow whitespace between ‘^^’ and the SYMSPACE. Of course, I am being a little nit-picking about all this. On the whole, the EBNF translated very smoothly and directly to ‘M’ (MGrammar) and proved to be fairly complete. I have encountered far worse issues when translating other EBNF specifications into usable grammars. I can’t imagine there would be any difficulty in implementing the same grammar in Antlr, COCO/R, gppg, XText, Bison, etc. A general observation, which repeats a point made above, is that the use of parenthesis in the presentation syntax can feel inconsistent and un-intuitive. It isn’t actually inconsistent, but I think the presentation syntax could be improved by adopting braces, rather than parenthesis, to delimit subordinate syntax elements in a similar way to so many programming languages. The familiarity of braces would communicate the structure of the syntax more clearly to people like me. If braces were adopted, parentheses could be retained around ‘var (frame | ‘new()’) constructs in action blocks. This use of parenthesis feels very LISP-like, and I think that this is my issue. It’s as if the presentation syntax represents the deformed love-child of LISP and C. In some places (specifically, action blocks), parenthesis is used in a LISP-like fashion. In other places it is used like braces in C. I find this quite confusing. Here is a corrected version of the running example (Example 9.1) in compliant presentation syntax: Document( Prefix( ex1 <http://example.com/2009/prd2> ) (* ex1:CheckoutRuleset *) Group rif:forwardChaining ( (* ex1:GoldRule *) Group 10 ( Forall ?customer such that And(?customer # ex1:Customer ?customer[ex1:status->"Silver"]) (Forall ?shoppingCart such that ?customer[ex1:shoppingCart->?shoppingCart] (If Exists ?value (And(?shoppingCart[ex1:value->?value] External(pred:numeric-greater-than-or-equal(?value 2000)))) Then Do(Modify(?customer[ex1:status->"Gold"]))))) (* ex1:DiscountRule *) Group ( Forall ?customer such that ?customer # ex1:Customer (If Or( ?customer[ex1:status->"Silver"] ?customer[ex1:status->"Gold"]) Then Do ((?s ?customer[ex1:shoppingCart-> ?s]) (?v ?s[ex1:value->?v]) Modify(?s [ex1:value->External(func:numeric-multiply (?v 0.95))])))) (* ex1:NewCustomerAndWidgetRule *) Group ( Forall ?customer such that And(?customer # ex1:Customer ?customer[ex1:status->"New"] ) (If Exists ?shoppingCart ?item (And(?customer[ex1:shoppingCart->?shoppingCart] ?shoppingCart[ex1:containsItem->?item] ?item # ex1:Widget ) ) Then Do( (?s ?customer[ex1:shoppingCart->?s]) (?val ?s[ex1:value->?val]) (?voucher ?customer[ex1:voucher->?voucher]) Retract(?customer[ex1:voucher->?voucher]) Retract(?voucher) Modify(?s[ex1:value->External(func:numeric-multiply(?val 0.90))])))) (* ex1:UnknownStatusRule *) Group ( Forall ?customer such that ?customer # ex1:Customer (If Not(Exists ?status (And(?customer[ex1:status->?status] External(pred:list-contains(List("New" "Bronze" "Silver" "Gold") ?status)) ))) Then Do( Execute(act:print(External(func:concat("New customer: " ?customer)))) Assert(?customer[ex1:status->"New"])))) ) ) I hope that helps someone out there :-)

Read the article
General Policies and Procedures for Maintaining the Value of Data Assets

Here is a general list for policies and procedures regarding maintaining the value of data assets. Data Backup Policies and Procedures Backups are very important when dealing with data because there is always the chance of losing data due to faulty hardware or a user activity. So the need for a strategic backup system should be mandatory for all companies. This being said, in the real world some companies that I have worked for do not really have a good data backup plan. Typically when companies tend to take this kind of approach in data backups usually the data is not really recoverable. Unfortunately when companies do not regularly test their backup plans they get a false sense of security because they think that they are covered. However, I can tell you from personal and professional experience that a backup plan/system is never fully implemented until it is regularly tested prior to the time when it actually needs to be used. Disaster Recovery Plan Expanding on Backup Policies and Procedures, a company needs to also have a disaster recovery plan in order to protect its data in case of a catastrophic disaster. Disaster recovery plans typically encompass how to restore all of a company’s data and infrastructure back to a restored operational status. Most Disaster recovery plans also include time estimates on how long each step of the disaster recovery plan should take to be executed. It is important to note that disaster recovery plans are never fully implemented until they have been tested just like backup plans. Disaster recovery plans should be tested regularly so that the business can be confident in not losing any or minimal data due to a catastrophic disaster. Firewall Policies and Content Filters One way companies can protect their data is by using a firewall to separate their internal network from the outside. Firewalls allow for enabling or disabling network access as data passes through it by applying various defined restrictions. Furthermore firewalls can also be used to prevent access from the internal network to the outside by these same factors. Common Firewall Restrictions Destination/Sender IP Address Destination/Sender Host Names Domain Names Network Ports Companies can also desire to restrict what their network user’s view on the internet through things like content filters. Content filters allow a company to track what webpages a person has accessed and can also restrict user’s access based on established rules set up in the content filter. This device and/or software can block access to domains or specific URLs based on a few factors. Common Content Filter Criteria Known malicious sites Specific Page Content Page Content Theme Anti-Virus/Mal-ware Polices Fortunately, most companies utilize antivirus programs on all computers and servers for good reason, virus have been known to do the following: Corrupt/Invalidate Data, Destroy Data, and Steal Data. Anti-Virus applications are a great way to prevent any malicious application from being able to gain access to a company’s data. However, anti-virus programs must be constantly updated because new viruses are always being created, and the anti-virus vendors need to distribute updates to their applications so that they can catch and remove them. Data Validation Policies and Procedures Data validation is very important to ensure that only accurate information is stored. The existence of invalid data can cause major problems when businesses attempt to use data for knowledge based decisions and for performance reporting. Data Scrubbing Policies and Procedures Data scrubbing is valuable to companies in one of two ways. The first can be used to clean data prior to being analyzed for report generation. The second is that it allows companies to remove things like personally Identifiable information from its data prior to transmit it between multiple environments or if the information is sent to an external location. An example of this can be seen with medical records in regards to HIPPA laws that prohibit the storage of specific personal and medical information. Additionally, I have professionally run in to a scenario where the Canadian government does not allow any Canadian’s personal information to be stored on a server not located in Canada. Encryption Practices The use of encryption is very valuable when a company needs to any personal information. This allows users with the appropriated access levels to view or confirm the existence or accuracy of data within a system by either decrypting the information or encrypting a piece of data and comparing it to the stored version. Additionally, if for some unforeseen reason the data got in to the wrong hands then they would have to first decrypt the data before they could even be able to read it. Encryption just adds and additional layer of protection around data itself. Standard Normalization Practices The use of standard data normalization practices is very important when dealing with data because it can prevent allot of potential issues by eliminating the potential for unnecessary data duplication. Issues caused by data duplication include excess use of data storage, increased chance for invalidated data, and over use of data processing. Network and Database Security/Access Policies Every company has some form of network/data access policy even if they have none. These policies help secure data from being seen by inappropriate users along with preventing the data from being updated or deleted by users. In addition, without a good security policy there is a large potential for data to be corrupted by unassuming users or even stolen. Data Storage Policies Data storage polices are very important depending on how they are implemented especially when a company is trying to utilize them in conjunction with other policies like Data Backups. I have worked at companies where all network user folders are constantly backed up, and if a user wanted to ensure the existence of a piece of data in the form of a file then they had to store that file in their network folder. Conversely, I have also worked in places where when a user logs on or off of the network there entire user profile is backed up. Training Policies One of the biggest ways to prevent data loss and ensure that data will remain a company asset is through training. The practice of properly train employees on how to work with in systems that access data is crucial when trying to ensure a company’s data will remain an asset. Users need to be trained on how to manipulate a company’s data in order to perform their tasks to reduce the chances of invalidating data.

Read the article
Execution plan warnings–The final chapter

- by Dave Ballantyne

In my previous posts (here and here), I showed examples of some of the execution plan warnings that have been added to SQL Server 2012. There is one other warning that is of interest to me : “Unmatched Indexes”. Firstly, how do I know this is the final one ? The plan is an XML document, right ? So that means that it can have an accompanying XSD. As an XSD is a schema definition, we can poke around inside it to find interesting things that *could* be in the final XML file. The showplan schema is stored in the folder Microsoft SQL Server\110\Tools\Binn\schemas\sqlserver\2004\07\showplan and by comparing schemas over releases you can get a really good idea of any new functionality that has been added. Here is the section of the Sql Server 2012 showplan schema that has been interesting me so far : <xsd:complexType name="AffectingConvertWarningType"> <xsd:annotation> <xsd:documentation>Warning information for plan-affecting type conversion</xsd:documentation> </xsd:annotation> <xsd:sequence>  </xsd:sequence> <xsd:attribute name="ConvertIssue" use="required"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="Cardinality Estimate" /> <xsd:enumeration value="Seek Plan" />  </xsd:restriction> </xsd:simpleType> </xsd:attribute> <xsd:attribute name="Expression" type ="xsd:string" use="required" /></xsd:complexType><xsd:complexType name="WarningsType"> <xsd:annotation> <xsd:documentation>List of all possible iterator or query specific warnings (e.g. hash spilling, no join predicate)</xsd:documentation> </xsd:annotation> <xsd:choice minOccurs="1" maxOccurs="unbounded"> <xsd:element name="ColumnsWithNoStatistics" type="shp:ColumnReferenceListType" minOccurs="0" maxOccurs="1" /> <xsd:element name="SpillToTempDb" type="shp:SpillToTempDbType" minOccurs="0" maxOccurs="unbounded" /> <xsd:element name="Wait" type="shp:WaitWarningType" minOccurs="0" maxOccurs="unbounded" /> <xsd:element name="PlanAffectingConvert" type="shp:AffectingConvertWarningType" minOccurs="0" maxOccurs="unbounded" /> </xsd:choice> <xsd:attribute name="NoJoinPredicate" type="xsd:boolean" use="optional" /> <xsd:attribute name="SpatialGuess" type="xsd:boolean" use="optional" /> <xsd:attribute name="UnmatchedIndexes" type="xsd:boolean" use="optional" /> <xsd:attribute name="FullUpdateForOnlineIndexBuild" type="xsd:boolean" use="optional" /></xsd:complexType> I especially like the “to be extended here” comment, high hopes that we will see more of these in the future. So “Unmatched Indexes” was a warning that I couldn’t get and many thanks must go to Fabiano Amorim (b|t) for showing me the way. Filtered indexes were introduced in Sql Server 2008 and are really useful if you only need to index only a portion of the data within a table. However, if your SQL code uses a variable as a predicate on the filtered data that matches the filtered condition, then the filtered index cannot be used as, naturally, the value in the variable may ( and probably will ) change and therefore will need to read data outside the index. As an aside, you could use option(recompile) here , in which case the optimizer will build a plan specific to the variable values and use the filtered index, but that can bring about other problems. To demonstrate this warning, we need to generate some test data : DROP TABLE #TestTab1GOCREATE TABLE #TestTab1 (Col1 Int not null, Col2 Char(7500) not null, Quantity Int not null)GOINSERT INTO #TestTab1 VALUES (1,1,1),(1,2,5),(1,2,10),(1,3,20), (2,1,101),(2,2,105),(2,2,110),(2,3,120)GO and then add a filtered index CREATE INDEX ixFilter ON #TestTab1 (Col1)WHERE Quantity = 122 Now if we execute SELECT COUNT(*) FROM #TestTab1 WHERE Quantity = 122 We will see the filtered index being scanned But if we parameterize the query DECLARE @i INT = 122SELECT COUNT(*) FROM #TestTab1 WHERE Quantity = @i The plan is very different a table scan, as the value of the variable used in the predicate can change at run time, and also we see the familiar warning triangle. If we now look at the properties pane, we will see two pieces of information “Warnings” and “UnmatchedIndexes”. So, handily, we are being told which filtered index is not being used due to parameterization.

Read the article
The True Cost of a Solution

- by D'Arcy Lussier

I had a Twitter chat recently with someone suggesting Oracle and SQL Server were losing out to OSS (Open Source Software) in the enterprise due to their issues with scaling or being too generic (one size fits all). I challenged that a bit, as my experience with enterprise sized clients has been different – adverse to OSS but receptive to an established vendor. The response I got was: Found it easier to influence change by showing how X can’t solve our problems or X is extremely costly to scale. Money talks. I think this is definitely the right approach for anyone pitching an alternate or alien technology as part of a solution: identify the issue, identify the solution, then present pros and cons including a cost/benefit analysis. What can happen though is we get tunnel vision and don’t present a full view of the costs associated with a solution. An “Acura”te Example (I’m so clever…) This is my dream vehicle, a Crystal Black Pearl coloured Acura MDX with the SH-AWD package! We’re a family of 4 (5 if my daughters ever get their wish of adding a dog), and I’ve always wanted a luxury type of vehicle, so this is a perfect replacement in a few years when our Rav 4 has hit the 8 – 10 year mark. MSRP – $62,890 But as we all know, that’s not *really* the cost of the vehicle. There’s taxes and fees added on, there’s the extended warranty if I choose to purchase it, there’s the finance rate that needs to be factored in… MSRP – $62,890 Taxes – $7,546 Warranty - $2,500 SubTotal – $72,936 Finance Charge – $ 1094.04 Grand Total – $74,030 Well! Glad we did that exercise – we discovered an extra $11k added on to the MSRP! Well now we have our true price…or do we? Lifetime of the Vehicle I’m expecting to have this vehicle for 7 – 10 years. While the hard cost of the vehicle is known and dealt with, the costs to run and maintain the vehicle are on top of this. I did some research, and here’s what I’ve found: Fuel and Mileage Gas prices are high as it is for regular fuel, but getting into an MDX will require that I *only* purchase premium fuel, which comes at a premium price. I need to expect my bill at the pump to be higher. Comparing the MDX to my 2007 Rav4 also shows I’ll be gassing up more often. The Rav4 has a city MPG of 21, while the MDX plummets to 16! The MDX does have a bigger fuel tank though, so all in all the number of times I hit the pumps might even out. Still, I estimate I’ll be spending approximately $8000 – $10000 more on gas over a 10 year period than my current Rav4. Service Options Limited Although I have options with my Toyota here in Winnipeg (we have 4 Toyota dealerships), I do go to my original dealer for any service work. Still, I like the fact that I have options. However, there’s only one Acura dealership in all of Winnipeg! So if, for whatever reason, I’m not satisfied with the level of service I’m stuck. Non Warranty Service Work Also let’s not forget that there’s a bulk of work required every year that is *not* covered under warranty – oil changes, tire rotations, brake pads, etc. I expect I’ll need to get new tires at the 5 years mark as well, which can easily be $1200 – $1500 (I just paid $1000 for new tires for the Rav4 and we’re at the 5 year mark). Now these aren’t going to be *new* costs that I’m not used to from our existing vehicles, but they should still be factored in. I’d budget $500/year, or $5000 over the 10 years I’ll own the vehicle. Final Assessment So let’s re-assess the true cost of my dream MDX: MSRP $62,890 Taxes $7,546 Warranty $2,500 Finance Charge $1094 Gas $10,000 Service Work $5000 Grand Total $89,030 So now I have a better idea of 10 year cost overall, and I’ve identified some concerns with local service availability. And there’s now much more to consider over the original $62,890 price tag. Tying This Back to Technology Solutions The process that we just went through is no different than what organizations do when considering implementing a new system, technology, or technology based solution, within their environments. It’s easy to tout the short term cost savings of particular product/platform/technology in a vacuum. But its when you consider the wider impact that the true cost comes into play. Let’s create a scenario: A company is not happy with its current data reporting suite. An employee suggests moving to an open source solution. The selling points are: - Because its open source its free - The organization would have access to the source code so they could alter it however they wished - It provided features not available with the current reporting suite At first this sounds great to the management and executive, but then they start asking some questions and uncover more information: - The OSS product is built on a technology not used anywhere within the organization - There are no vendors offering product support for the OSS product - The OSS product requires a specific server platform to operate on, one that’s not standard in the organization All of a sudden, the true cost of implementing this solution is starting to become clearer. The company might save money on licensing costs, but their training costs would increase significantly – developers would need to learn how to develop in the technology the OSS solution was built on, IT staff must learn how to set up and maintain a new server platform within their existing infrastructure, and if a problem was found there was no vendor to contact for support. The true cost of implementing a “free” OSS solution is actually spinning up a project to implement it within the organization – no small cost. And that’s just the short-term cost. Now the organization must ensure they maintain trained staff who can make changes to the OSS reporting solution and IT staff that will stay knowledgeable in the new server platform. If those skills are very niche, then higher labour costs could be incurred if those people are hard to find or if trained employees use that knowledge as leverage for higher pay. Maybe a vendor exists that will contract out support, but then there are those costs to consider as well. And let’s not forget end-user training – in our example, anyone that runs reports will need to be trained on how to use the new system. Here’s the Point We still tend to look at software in an “off the shelf” kind of way. It’s very easy to say “oh, this product is better than vendor x’s product – and its free because its OSS!” but the reality is that implementing any new technology within an organization has a cost regardless of the retail price of the product. Training, integration, support – these are real costs that impact an organization and span multiple departments. Whether you’re pitching an improved business process, a new system, or a new technology, you need to consider the bigger picture costs of implementation. What you define as success (in our example, having better reporting functionality) might not be what others define as success if implementing your solution causes them issues. A true enterprise solution needs to consider the entire enterprise.

Read the article
Advice for a distracted, unhappy, recently graduated programmer? [closed]

- by Re-Invent

I graduated 4 months ago. I had offers from a few good places to work at. At the same time I wanted to stick to building a small software business of my own, still have some ideas with good potential, some half done projects frozen in my github. But due to social pressures, I chose a job, the pay is great, but I am half-passionate about it. A small team of smart folks building useful product, working out contracts across the world. I've started finding it extremely boring. Boring to the extent that I skip 2-3 days a week together not doing work. Neither do I spend that time progressing any of my own projects. Yes, I feel stupid at the way I'm wasting time, but I don't understand exactly why is it happening. It's as if all the excitement has been drained. What can I do about it? Long version: School - I was in third standard. Only students, 6th grade had access to computer labs. I once peeked into the lab from the little door opening. No hard-disks, MS DOS on 5 1/2 inch floppies. I asked a senior student to play some sound in BASIC. He used PLAY to compose a tune. Boy! I was so excited, I was jumping from within. Back home, asked my brother to teach me some programming. We bought a book "MODERN All About GW-BASIC for Schools & Colleges". The book had everything, right from printing, to taking input, file i/o, game programming, machine level support, etc. I was in 6th standard, wrote my first game - a wheel of fortune, rotated the wheel by manipulating 16 color palette's definition. Got internet soon, got hooked to QuickBasic programming community. Made some more games "007 in Danger", "Car Crush 2" for submission to allbasiccode archives. I was extremely excited about all this. My interests now swayed into "hacking" (computer security). Taught myself some perl, found it annoying, learnt PHP and a bit of SQL. Also taught myself Visual Basic one of the winters and wrote a pacman clone with Direct X. By the time I was in 10th standard, I created some evil tools using visual basic, php and mysql and eventually landed myself into an unpaid side-job at a government facility, building evil tools for them. It was a dream come true for crackers of that time. And so was I, still very excited. Things changed soon, last two years of school were not so great as I was balancing preps for college, work at govt. and studies for school at same time. College - College was opposite of all I had wished it to be. I imagined it to be a place where I'd spend my 4 years building something awesome. It was rather an epitome of rote learning, attendance, rules, busy schedules, ban on personal laptops, hardly any hackers surrounding you and shit like that. We had to take permissions to even introduce some cultural/creative activities in our annual schedule. The labs won't be open on weekends because the lab employees had to have their leaves. Yes, a horrible place for someone like me. I still managed to pull out a project with a friend over 2 months. Showed it to people high in the academia hierarchy. They were immensely impressed, we proposed to allow personal computers for students. They made up half-assed reasons and didn't agree. We felt frustrated. And so on, I still managed to teach myself new languages, do new projects of my own, do an intern at the same govt. facility, start a small business for sometime, give a talk at a conference I'm passionate about, win game-dev and hacking contest at most respected colleges, solve good deal of programming contest problems, etc. At the same time I was not content with all these restrictions, great emphasis on rote learning, and sheer wastage of time due to college. I never felt I was overdoing, but now I feel I burnt myself out. During my last days at college, I did an intern at a bigco. While I spent my time building prototypes for certain LBS, the other interns around me, even a good friend, was just skipping time. I thought maybe, in a few weeks he would put in some serious efforts at work assigned to him, but all he did was to find creative ways to skip work, hide his face from manager, engage people in talks if they try to question his progress, etc. I tried a few time to get him on track, but it seems all he wanted was to "not to work hard at all and still reap the fruits". I don't know how others take such people, but I find their vicinity very very poisonous to one's own motivation and productivity. Over that, the place where I come from, HRs don't give much value to what have you done past 4 years. So towards the end of out intern, we all were offered work at the bigco, but the slacker, even after not writing more than 200 lines of code was made a much better offer. I felt enraged instantly - "Is this how the corp world treats someone who does fruitful, if not extra-ordinary work form them for past 6 months?". Yes, I did try to negotiate and debate. The bigcos seem blind due to departmentalization of responsibilities and many layers of management. I decided not to be in touch with any characters of that depressing play. Probably the busy time I had at college, ignoring friends, ignoring fun and squeezing every bit of free time for myself is also responsible. Probably this is what has drained all my willingness to work for anyone. I find my day job boring, at the same time I with to maintain it for financial reasons. I feel a bit burnt out, unsatisfied and at the same time an urge to quit working for someone else and start finishing my frozen side-projects (which may be profitable). Though I haven't got much to support myself with food, office, internet bills, etc in savings. I still have my day job, but I don't find it very interesting, even though the pay is higher than the slacker, I don't find money to be a great motivator here. I keep comparing myself to my past version. I wonder how to get rid of this and reboot myself back to the way I was in school days - excited about it, tinkering, building, learning new things daily, and NOT BORED?

Read the article
ANTS CLR and Memory Profiler In Depth Review (Part 2 of 2 – Memory Profiler)

- by ToStringTheory

One of the things that people might not know about me, is my obsession to make my code as efficient as possible. Many people might not realize how much of a task or undertaking that this might be, but it is surely a task as monumental as climbing Mount Everest, except this time it is a challenge for the mind… In trying to make code efficient, there are many different factors that play a part – size of project or solution, tiers, language used, experience and training of the programmer, technologies used, maintainability of the code – the list can go on for quite some time. I spend quite a bit of time when developing trying to determine what is the best way to implement a feature to accomplish the efficiency that I look to achieve. One program that I have recently come to learn about – Red Gate ANTS Performance (CLR) and Memory profiler gives me tools to accomplish that job more efficiently as well. In this review, I am going to cover some of the features of the ANTS memory profiler set by compiling some hideous example code to test against. Notice As a member of the Geeks With Blogs Influencers program, one of the perks is the ability to review products, in exchange for a free license to the program. I have not let this affect my opinions of the product in any way, and Red Gate nor Geeks With Blogs has tried to influence my opinion regarding this product in any way. Introduction – Part 2 In my last post, I reviewed the feature packed Red Gate ANTS Performance Profiler. Separate from the Red Gate Performance Profiler is the Red Gate ANTS Memory Profiler – a simple, easy to use utility for checking how your application is handling memory management… A tool that I wish I had had many times in the past. This post will be focusing on the ANTS Memory Profiler and its tool set. The memory profiler has a large assortment of features just like the Performance Profiler, with the new session looking nearly exactly alike: ANTS Memory Profiler Memory profiling is not something that I have to do very often… In the past, the few cases I’ve had to find a memory leak in an application I have usually just had to trace the code of the operations being performed to look for oddities… Sadly, I have come across more undisposed/non-using’ed IDisposable objects, usually from ADO.Net than I would like to ever see. Support is not fun, however using ANTS Memory Profiler makes this task easier. For this round of testing, I am going to use the same code from my previous example, using the WPF application. This time, I will choose the ‘Profile Memory’ option from the ANTS menu in Visual Studio, which launches the solution in its currently configured state/start-up project, and then launches the ANTS Memory Profiler to help. It prepopulates all of the fields with the current project information, and all I have to do is select the ‘Start Profiling’ option. When the window comes up, it is actually quite barren, just giving ideas on how to work the profiler. You start by getting to the point in your application that you want to profile, and then taking a ‘Memory Snapshot’. This performs a full garbage collection, and snapshots the managed heap. Using the same WPF app as before, I will go ahead and take a snapshot now. As you can see, ANTS is already giving me lots of information regarding the snapshot, however this is just a snapshot. The whole point of the profiler is to perform an action, usually one where a memory problem is being noticed, and then take another snapshot and perform a diff between them to see what has changed. I am going to go ahead and generate 5000 primes, and then take another snapshot: As you can see, ANTS is already giving me a lot of new information about this snapshot compared to the last. Information such as difference in memory usage, fragmentation, class usage, etc… If you take more snapshots, you can use the dropdown at the top to set your actual comparison snapshots. If you beneath the timeline, you will see a breadcrumb trail showing how best to approach profiling memory using ANTS. When you first do the comparison, you start on the Summary screen. You can either use the charts at the bottom, or switch to the class list screen to get to the next step. Here is the class list screen: As you can see, it lists information about all of the instances between the snapshots, as well as at the bottom giving you a way to filter by telling ANTS what your problem is. I am going to go ahead and select the Int16[] to look at the Instance Categorizer Using the instance categorizer, you can travel backwards to see where all of the instances are coming from. It may be hard to see in this image, but hopefully the lightbox (click on it) will help: I can see that all of these instances are rooted to the application through the UI TextBlock control. This image will probably be even harder to see, however using the ‘Instance Retention Graph’, you can trace an objects memory inheritance up the chain to see its roots as well. This is a simple example, as this is simply a known element. Usually you would be profiling an actual problem, and comparing those differences. I know in the past, I have spotted a problem where a new context was created per page load, and it was rooted into the application through an event. As the application began to grow, performance and reliability problems started to emerge. A tool like this would have been a great way to identify the problem quickly. Overview Overall, I think that the Red Gate ANTS Memory Profiler is a great utility for debugging those pesky leaks. 3 Biggest Pros: Easy to use interface with lots of options for configuring profiling session Intuitive and helpful interface for drilling down from summary, to instance, to root graphs ANTS provides an API for controlling the profiler. Not many options, but still helpful. 2 Biggest Cons: Inability to automatically snapshot the memory by interval Lack of complete integration with Visual Studio via an extension panel Ratings Ease of Use (9/10) – I really do believe that they have brought simplicity to the once difficult task of memory profiling. I especially liked how it stepped you further into the drilldown by directing you towards the best options. Effectiveness (10/10) – I believe that the profiler does EXACTLY what it purports to do. Features (7/10) – A really great set of features all around in the application, however, I would like to see some ability for automatically triggering snapshots based on intervals or framework level items such as events. Customer Service (10/10) – My entire experience with Red Gate personnel has been nothing but good. their people are friendly, helpful, and happy! UI / UX (9/10) – The interface is very easy to get around, and all of the options are easy to find. With a little bit of poking around, you’ll be optimizing Hello World in no time flat! Overall (9/10) – Overall, I am happy with the Memory Profiler and its features, as well as with the service I received when working with the Red Gate personnel. Thank you for reading up to here, or skipping ahead – I told you it would be shorter! Please, if you do try the product, drop me a message and let me know what you think! I would love to hear any opinions you may have on the product. Code Feel free to download the code I used above – download via DropBox

Read the article
LINQ – SequenceEqual() method

- by nmarun

I have been looking at LINQ extension methods and have blogged about what I learned from them in my blog space. Next in line is the SequenceEqual() method. Here’s the description about this method: “Determines whether two sequences are equal by comparing the elements by using the default equality comparer for their type.” Let’s play with some code: 1: int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 }; 2: // int[] numbersCopy = numbers; 3: int[] numbersCopy = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 }; 4: 5: Console.WriteLine(numbers.SequenceEqual(numbersCopy)); This gives an output of ‘True’ – basically compares each of the elements in the two arrays and returns true in this case. The result is same even if you uncomment line 2 and comment line 3 (I didn’t need to say that now did I?). So then what happens for custom types? For this, I created a Product class with the following definition: 1: class Product 2: { 3: public int ProductId { get; set; } 4: public string Name { get; set; } 5: public string Category { get; set; } 6: public DateTime MfgDate { get; set; } 7: public Status Status { get; set; } 8: } 9: 10: public enum Status 11: { 12: Active = 1, 13: InActive = 2, 14: OffShelf = 3, 15: } In my calling code, I’m just adding a few product items: 1: private static List<Product> GetProducts() 2: { 3: return new List<Product> 4: { 5: new Product 6: { 7: ProductId = 1, 8: Name = "Laptop", 9: Category = "Computer", 10: MfgDate = new DateTime(2003, 4, 3), 11: Status = Status.Active, 12: }, 13: new Product 14: { 15: ProductId = 2, 16: Name = "Compact Disc", 17: Category = "Water Sport", 18: MfgDate = new DateTime(2009, 12, 3), 19: Status = Status.InActive, 20: }, 21: new Product 22: { 23: ProductId = 3, 24: Name = "Floppy", 25: Category = "Computer", 26: MfgDate = new DateTime(1993, 3, 7), 27: Status = Status.OffShelf, 28: }, 29: }; 30: } Now for the actual check: 1: List<Product> products1 = GetProducts(); 2: List<Product> products2 = GetProducts(); 3: 4: Console.WriteLine(products1.SequenceEqual(products2)); This one returns ‘False’ and the reason is simple – this one checks for reference equality and the products in the both the lists get different ‘memory addresses’ (sounds like I’m talking in ‘C’). In order to modify this behavior and return a ‘True’ result, we need to modify the Product class as follows: 1: class Product : IEquatable<Product> 2: { 3: public int ProductId { get; set; } 4: public string Name { get; set; } 5: public string Category { get; set; } 6: public DateTime MfgDate { get; set; } 7: public Status Status { get; set; } 8: 9: public override bool Equals(object obj) 10: { 11: return Equals(obj as Product); 12: } 13: 14: public bool Equals(Product other) 15: { 16: //Check whether the compared object is null. 17: if (ReferenceEquals(other, null)) return false; 18: 19: //Check whether the compared object references the same data. 20: if (ReferenceEquals(this, other)) return true; 21: 22: //Check whether the products' properties are equal. 23: return ProductId.Equals(other.ProductId) 24: && Name.Equals(other.Name) 25: && Category.Equals(other.Category) 26: && MfgDate.Equals(other.MfgDate) 27: && Status.Equals(other.Status); 28: } 29: 30: // If Equals() returns true for a pair of objects 31: // then GetHashCode() must return the same value for these objects. 32: // read why in the following articles: 33: // http://geekswithblogs.net/akraus1/archive/2010/02/28/138234.aspx 34: // http://stackoverflow.com/questions/371328/why-is-it-important-to-override-gethashcode-when-equals-method-is-overriden-in-c 35: public override int GetHashCode() 36: { 37: //Get hash code for the ProductId field. 38: int hashProductId = ProductId.GetHashCode(); 39: 40: //Get hash code for the Name field if it is not null. 41: int hashName = Name == null ? 0 : Name.GetHashCode(); 42: 43: //Get hash code for the ProductId field. 44: int hashCategory = Category.GetHashCode(); 45: 46: //Get hash code for the ProductId field. 47: int hashMfgDate = MfgDate.GetHashCode(); 48: 49: //Get hash code for the ProductId field. 50: int hashStatus = Status.GetHashCode(); 51: //Calculate the hash code for the product. 52: return hashProductId ^ hashName ^ hashCategory & hashMfgDate & hashStatus; 53: } 54: 55: public static bool operator ==(Product a, Product b) 56: { 57: // Enable a == b for null references to return the right value 58: if (ReferenceEquals(a, b)) 59: { 60: return true; 61: } 62: // If one is null and the other not. Remember a==null will lead to Stackoverflow! 63: if (ReferenceEquals(a, null)) 64: { 65: return false; 66: } 67: return a.Equals((object)b); 68: } 69: 70: public static bool operator !=(Product a, Product b) 71: { 72: return !(a == b); 73: } 74: } Now THAT kinda looks overwhelming. But lets take one simple step at a time. Ok first thing you’ve noticed is that the class implements IEquatable<Product> interface – the key step towards achieving our goal. This interface provides us with an ‘Equals’ method to perform the test for equality with another Product object, in this case. This method is called in the following situations: when you do a ProductInstance.Equals(AnotherProductInstance) and when you perform actions like Contains<T>, IndexOf() or Remove() on your collection Coming to the Equals method defined line 14 onwards. The two ‘if’ blocks check for null and referential equality using the ReferenceEquals() method defined in the Object class. Line 23 is where I’m doing the actual check on the properties of the Product instances. This is what returns the ‘True’ for us when we run the application. I have also overridden the Object.Equals() method which calls the Equals() method of the interface. One thing to remember is that anytime you override the Equals() method, its’ a good practice to override the GetHashCode() method and overload the ‘==’ and the ‘!=’ operators. For detailed information on this, please read this and this. Since we’ve overloaded the operators as well, we get ‘True’ when we do actions like: 1: Console.WriteLine(products1.Contains(products2[0])); 2: Console.WriteLine(products1[0] == products2[0]); This completes the full circle on the SequenceEqual() method. See the code used in the article here.

Read the article
Need help identiying a nasty rootkit in Windows

- by goofrider

I have a nasty rootkit that not tools seem to be able to idenity. I know for sure it's a rootkit, but I can figure out which rootkit it is. Here's what I gathered so far: It creates multiple copies of itself in %HOME%\Local Settings\Temp with names like Q.EXE, IAJARZ.exe, etc., and install them as hidden services. These EXE have SysInternals identifiers in them so they're definitely rootkits. It hooked very deep in the system, including file read/write, security policies, registry read/write, and possibly WinSock/TCP/IP. When going to Sophos.com to download their software, the rootkit inject something called Microsoft Ajax Tootkit into the page, which injects code into the email submission form in order to redirect it. (EDIT: I might have panicked. Looks like Sophos does use an AJAZ email form, their form is just broken on Chrome so it looked like a mail form injection attack, the link is http://www.sophos.com/en-us/products/free-tools/virus-removal-tool/download.aspx ) Super-Antispyware found a lot of spyware cookies, in the name of .kaspersky.2o7.net, etc. (just chedk 2o7.net, looks like it's a legit ad company) I tried comparing DNS lookup from the infected systems and from system in other physical locations, no DNS redirections it seems. I used dd to copy the MBR and compared it with the MBR provided by ms-sys package, no differences so it's not infecting MBR. No antivirus or rootkit scanner be able to identify it. Most of them can't even find it. I tried scanning, in-situ (normal mode), in safe mode, and boot to linux live CD. Scanners used: Avast, Sophos anti rootkit, Kasersky TDSSKiller, GMER, RootkitRevealer, and many others. Kaspersky reported some unsigned system files that ought to be signed (e.g. tcpip.sys), and reported a number of MD5 mismatches. But otherwise couldn't identify anything based on signature. When running Sysinternal RootkitRevealer and Sophos AntiRootkit, CPU usage goes up to 100% and gets stucked. The Rootkit is blocking them. When trying running/installing HiJackThis, RootkitRevealer and some other scanners, it tells me system security policy prevent running/installing it. The list of malicious acitivities go on and on. here's a sample of logs from all my scans. In particular, aswSnx.SYS, apnenfno.sys and PROCMON20.SYS has a huge number of hooks. It's hard to tell if the rootkit replaced legit program files like aswSnx.SYS (from Avast) and PROCMON20.SYS (from Sysinternal Process Monitor). I can't find whether apnenfno.sys is from a legit program. Help to identify it is appreciated. Trend Micro RootkitBuster ------ [HIDDEN_REGISTRY][Hidden Reg Value]: KeyPath : HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\sptd\Cfg Root : 586bfc0 SubKey : Cfg ValueName : g0 Data : 38 23 E8 D0 BF F2 2D 6F ... ValueType : 3 AccessType: 0 FullLength: 61 DataSize : 32 [HOOKED_SERVICE_API]: Service API : ZwCreateMutant Image Path : C:\WINDOWS\System32\Drivers\aswSnx.SYS OriginalHandler : 0x8061758e CurrentHandler : 0xaa66cce8 ServiceNumber : 0x2b ModuleName : aswSnx.SYS SDTType : 0x0 [HOOKED_SERVICE_API]: Service API : ZwCreateThread Image Path : c:\windows\system32\drivers\apnenfno.sys OriginalHandler : 0x805d1038 CurrentHandler : 0xaa5f118c ServiceNumber : 0x35 ModuleName : apnenfno.sys SDTType : 0x0 [HOOKED_SERVICE_API]: Service API : ZwDeleteKey Image Path : C:\WINDOWS\system32\Drivers\PROCMON20.SYS OriginalHandler : 0x80624472 CurrentHandler : 0xa709b0f8 ServiceNumber : 0x3f ModuleName : PROCMON20.SYS SDTType : 0x0 HiJackThis ------ O23 - Service: JWAHQAGZ - Sysinternals - www.sysinternals.com - C:\DOCUME~1\jeff\LOCALS~1\Temp\JWAHQAGZ.exe O23 - Service: LHIJ - Sysinternals - www.sysinternals.com - C:\DOCUME~1\jeff\LOCALS~1\Temp\LHIJ.exe Kaspersky TDSSKiller ------ 21:05:58.0375 3936 C:\WINDOWS\system32\ati2sgag.exe - copied to quarantine 21:05:59.0217 3936 ATI Smart ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:05:59.0342 3936 C:\WINDOWS\system32\BUFADPT.SYS - copied to quarantine 21:05:59.0856 3936 BUFADPT ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:05:59.0965 3936 C:\Program Files\CrashPlan\CrashPlanService.exe - copied to quarantine 21:06:00.0152 3936 CrashPlanService ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:00.0246 3936 C:\WINDOWS\system32\epmntdrv.sys - copied to quarantine 21:06:00.0433 3936 epmntdrv ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:00.0464 3936 C:\WINDOWS\system32\EuGdiDrv.sys - copied to quarantine 21:06:00.0526 3936 EuGdiDrv ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:00.0604 3936 C:\Program Files\Common Files\Macrovision Shared\FLEXnet Publisher\FNPLicensingService.exe - copied to quarantine 21:06:01.0181 3936 FLEXnet Licensing Service ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:01.0321 3936 C:\Program Files\AddinForUNCFAT\UNCFATDMS.exe - copied to quarantine 21:06:01.0430 3936 OTFSDMS ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:01.0492 3936 C:\WINDOWS\system32\DRIVERS\tcpip.sys - copied to quarantine 21:06:01.0539 3936 Tcpip ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:01.0601 3936 C:\DOCUME~1\jeff\LOCALS~1\Temp\TULPUWOX.exe - copied to quarantine 21:06:01.0664 3936 HKLM\SYSTEM\ControlSet003\services\TULPUWOX - will be deleted on reboot 21:06:01.0664 3936 C:\DOCUME~1\jeff\LOCALS~1\Temp\TULPUWOX.exe - will be deleted on reboot 21:06:01.0664 3936 TULPUWOX ( UnsignedFile.Multi.Generic ) - User select action: Delete 21:06:01.0757 3936 C:\WINDOWS\system32\Drivers\usbaapl.sys - copied to quarantine 21:06:01.0866 3936 USBAAPL ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:01.0913 3936 C:\Program Files\VMware\VMware Player\vmware-authd.exe - copied to quarantine 21:06:02.0443 3936 VMAuthdService ( UnsignedFile.Multi.Generic ) - User select action: Quarantine 21:06:02.0443 3936 vmount2 ( UnsignedFile.Multi.Generic ) - skipped by user 21:06:02.0443 3936 vmount2 ( UnsignedFile.Multi.Generic ) - User select action: Skip 21:06:02.0459 3936 vstor2 ( UnsignedFile.Multi.Generic ) - skipped by user 21:06:02.0459 3936 vstor2 ( UnsignedFile.Multi.Generic ) - User select action: Skip

Read the article
Paying great programmers more than average programmers

- by Kelly French

It's fairly well recognized that some programmers are up to 10 times more productive than others. Joel mentions this topic on his blog. There is a whole blog devoted to the idea of the "10x productive programmer". In years since the original study, the general finding that "There are order-of-magnitude differences among programmers" has been confirmed by many other studies of professional programmers (Curtis 1981, Mills 1983, DeMarco and Lister 1985, Curtis et al. 1986, Card 1987, Boehm and Papaccio 1988, Valett and McGarry 1989, Boehm et al 2000). Fred Brooks mentions the wide range in the quality of designers in his "No Silver Bullet" article, The differences are not minor--they are rather like the differences between Salieri and Mozart. Study after study shows that the very best designers produce structures that are faster, smaller, simpler, cleaner, and produced with less effort. The differences between the great and the average approach an order of magnitude. The study that Brooks cites is: H. Sackman, W.J. Erikson, and E.E. Grant, "Exploratory Experimental Studies Comparing Online and Offline Programming Performance," Communications of the ACM, Vol. 11, No. 1 (January 1968), pp. 3-11. The way programmers are paid by employers these days makes it almost impossible to pay the great programmers a large multiple of what the entry-level salary is. When the starting salary for a just-graduated entry-level programmer, we'll call him Asok (From Dilbert), is $40K, even if the top programmer, we'll call him Linus, makes $120K that is only a multiple of 3. I'd be willing to be that Linus does much more than 3 times what Asok does, so why wouldn't we expect him to get paid more as well? Here is a quote from Stroustrup: "The companies are complaining because they are hurting. They can't produce quality products as cheaply, as reliably, and as quickly as they would like. They correctly see a shortage of good developers as a part of the problem. What they generally don't see is that inserting a good developer into a culture designed to constrain semi-skilled programmers from doing harm is pointless because the rules/culture will constrain the new developer from doing anything significantly new and better." This leads to two questions. I'm excluding self-employed programmers and contractors. If you disagree that's fine but please include your rationale. It might be that the self-employed or contract programmers are where you find the top-10 earners, but please provide a explanation/story/rationale along with any anecdotes. [EDIT] I thought up some other areas in which talent/ability affects pay. Financial traders (commodities, stock, derivatives, etc.) designers (fashion, interior decorators, architects, etc.) professionals (doctor, lawyer, accountant, etc.) sales Questions: Why aren't the top 1% of programmers paid like A-list movie stars? What would the industry be like if we did pay the "Smart and gets things done" programmers 6, 8, or 10 times what an intern makes? [Footnote: I posted this question after submitting it to the Stackoverflow podcast. It was included in episode 77 and I've written more about it as a Codewright's Tale post 'Of Rockstars and Bricklayers'] Epilogue: It's probably unfair to exclude contractors and the self-employed. One aspect of the highest earners in other fields is that they are free-agents. The competition for their skills is what drives up their earning power. This means they can not be interchangeable or otherwise treated as a plug-and-play resource. I liked the example in one answer of a major league baseball team trying to field two first-basemen. Also, something that Joel mentioned in the Stackoverflow podcast (#77). There are natural dynamics to shrink any extreme performance/pay ranges between the highs and lows. One is the peer pressure of organizations to pay within a given range, another is the likelyhood that the high performer will realize their undercompensation and seek greener pastures.

Read the article
Fitting it together, database, reporting, applications in C#

- by alvonellos

Introduction Preamble I was hesitant to post this, since it's an application whose intricate details are defined elsewhere, and answers may not be helpful to others. Within the past few weeks (I was actually going to write a blog post about this after I finished) I've discovered that the barrier I'm encountering is one that's actually quite common for newer developers. This question is not so much about a specific thing as it is about piecing those things together. I've searched the internet far and wide, and found many tutorials on how to create applications that are kind of similar to what I'm looking for. I've also looked at hiring another, more experienced, developer to help me along, but all I've gotten are unqualified candidates that don't have the experience necessary and won't take care of the client or project like I will. I'd rather have the project never transpire than to release a solution that is half-baked. I've asked professors at my school, but they've not turned up answers to my question. I'm an experienced developer, and I've written many applications that are -- very abstractly -- close to what I'm doing, but my experiences from those applications aren't giving me enough leverage to solve this particular problem. I just hope that posting this article isn't a mistake for me to write. Project Description I have a project I'm working on for a client that is a rewrite of an application, originally written in Foxpro 2.6 by someone before me, that performs some analysis (which, sadly, I'm not allowed to disclose as per of my employment contract) on financial data. One day, after a long talk between the client and I -- where he intimately described his frustrations with all the bugs I've been hacking out of this code for 6 months now -- he told me to just rewrite it and gave me a month to write a good 1/8 of this 65k LOC Foxpro monstrosity. this 65k line of code foxpro monstrosity. It'll take me a good 3 - 6 months to rewrite this software (I know things the original programmer did not, like inheritance) going as I am right now, but I'm quickly discovering that I'm going to need to use databases. Prior to this contract I didn't even know about foxpro, and so I've had to learn foxpro on the fly, write procedures and make modifications to the database. I've actually come to like it, and this project would be rewritten in Foxpro if it were still a supported language, because over the past few months, I've come to like the features of Foxpro that make it so easy to develop data-driven applications. I once perfomed an experiment, comparing C# to Foxpro. What took me 45 minutes in C# took me two in Foxpro, and I knew C# prior to Foxpro. I was hoping to leverage the power of C#, but it intimidates me that in foxpro, you can have one line of code and be using a database. Prior to this, I have never written any serious database development from scratch. All the applications that I've written are in a different league. They are either completely data-naive or data-naive enough that I can get away with not using a database through serialization or by designing algorithms that work with the data in a manner that is stateless, so there is no need to worry about databases. I've come to realize, very quickly, that serialization and my efficacy with data structures has been my crutch all these years that's prevented me from adventuring into databases, and has consequently hindered my success in real-world programming. Sure, I've written some database stuff in Perl and Python, and I've done forms and worked with relational databases and tables, I'm a wizard in Access and Excel (seriously) and can do just about anything, but it just feels unnatural writing SQL code in another language... I don't mind writing SQL, and I don't It's that bridge between the database and the program code that drives me absolutely bonkers. I hope I'm not the only one to think this, but it bothers me that I have to create statements like the following string sSql = "SELECT * from tablename" When there's really no reason for that kind of unchecked language binding between two languages and two API's. Don't get my wrong, SQL is great, but I don't like the idea that, when executing commands on a SQL database, that one must intermix database and application software, and there's no database independence, which means that different versions of different databases can break code. This isn't very nice. The nicest thing about Foxpro is the cohesiveness between programming language and database. It's so easy, and Foxpro makes it easy, because the tool just fits the task. I can see why so many developers have created a career with this language, because it lowered the barrier of entry to data-driven applications that so many businesses need. It was wonderful. For my purposes today, with the demands and need for community support, extensibility, and language features, Foxpro isn't a solution that I feel would be the right tool for the job. I'm also worried about working too heavy with the database, because I've seen data-driven .NET applications have issues with database caches, running out of memory, and objects in the database not being collected. (Memory leaks) And OH the queries. Which one, how, and why? There are a plethora of different ways that a database can be setup, I think I counted 5 or 6 different kinds of database applications alone that I can chose from. That is a great mountain for me to climb when I don't even know where to begin when it comes to writing data-driven applications. The problem isn't that I don't know SQL or that I don't know C#. I know both and have worked with both extensively. It's making them work together that's the problem, and it's something I've never done in C# before. Reports The client likes paper. The data needs to be printed out in a format that is extensible, layered, and easy to use. I have never done reporting before, and so this is a bit of a problem. From the data source comes crystal reports, and so there's a dependency on the database, from what I understand. Code reuse A large part of the design decision that I've gone through so far is to break the task of writing a piece of this software into routines and modular DLL's and so forth such that much of the code can be reused. For example, when I setup this database, I want to be able to reuse the same database code over and over again. I also want to make sure that when the day comes that another developer is here, that he/she will be able to pick up just where I left off. The quicker I develop these applications, the better off I am. Tasks & Goals In my project, I need to write routines that apply algorithms and look for predefined patterns in financial data. Additionally, I need to simulate trading based on predefined algorithms and data. Then I need to prepare reports on that data. Additionally, I need to have a way to change the code base for this application quickly and effectively, without hacking together some band-aid solution for a problem that really needs a trauma ward. Special Considerations The solution must be fast, run quickly on existing hardware, and not be too much of a pain to maintain and write. I understand that anything I write I'm married to -- I'm responsible for the things that I write because my reputation and livelihood is dependent on it. Do I really need a database? What about performance? Performance was such a big issue that I hand wrote a data structure that is capable of performing 2 billion operations, using a total of 4 gigs of memory in under 1/4 of a second using the standard core two duo processor. I could not find a similar, pre-written data structure in C# to perform this task. What setup do I use in terms of database? What about reporting? I'd prefer to have PDF's generated, but I'd like to be able to visually sketch those reports and then just have a ReportFactory of some sort, that when I pass some variables in, it just does that data. About Me I'm a lone developer for a small business in this area. This is the first time I've done this and I've never had the breadth and depth of my knowledge tested. I'm incredibly frustrated with this project because I feel incredibly overwhelmed with the task at hand. I'm looking for that entry level point where I can draw a line and say "this is what I need to do" Conclusion I may have not been clear enough on my post. I'm still new to this whole thing, and I've been doing my best to contribute back to the community that I've leached so much knowledge from. I'd be glad to edit my post and add more information if possible. I'm looking for a big-picture solution or design process that helps me get off the ground in this world of data-driven applications, because I have a feeling that it's going to be concentric to my entire career as a programmer for some time. Specifically, if you didn't get it from the rest of the post (I may not have been clear enough) I really need some guidance as to where to go in terms of the design decisions for this project. Some things that'll be useful will be a pro/con list for the different kinds of database projects available in VS2010. I've tried, but generating that list has been as hard as solving the problem itself... If you could walk a developer writing a data-driven application for the first time in C#, how would you do that? Where would you point them to?

Read the article
CodePlex Daily Summary for Saturday, March 10, 2012

CodePlex Daily Summary for Saturday, March 10, 2012Popular ReleasesPlayer Framework by Microsoft: Player Framework for Windows 8 Metro (BETA): Player Framework for HTML/JavaScript and XAML/C# Metro Style Applications.WPF Application Framework (WAF): WAF for .NET 4.5 (Experimental): Version: 2.5.0.440 (Experimental): This is an experimental release! It can be used to investigate the new .NET Framework 4.5 features. The ideas shown in this release might come in a future release (after 2.5) of the WPF Application Framework (WAF). More information can be found in this dicussion post. Requirements .NET Framework 4.5 (The package contains a solution file for Visual Studio 11) The unit test projects require Visual Studio 11 Professional Changelog All: Upgrade all proje...SSH.NET Library: 2012.3.9: There are still few outstanding issues I wanted to include in this release but since its been a while and there are few new features already I decided to create a new release now. New Features Add SOCKS4, SOCKS5 and HTTP Proxy support when connecting to remote server. For silverlight only IP address can be used for server address when using proxy. Add dynamic port forwarding support using ForwardedPortDynamic class. Add new ShellStream class to work with SSH Shell. Add supports for mu...Test Case Import Utilities for Visual Studio 2010 and Visual Studio 11 Beta: V1.2 RTM: This release (V1.2 RTM) includes: Support for connecting to Hosted Team Foundation Server Preview. Support for connecting to Team Foundation Server 11 Beta. Fix to issue with read-only attribute being set for LinksMapping-ReportFile which may have led to problems when saving the report file. Fix to issue with “related links” not being set properly in certain conditions. Fix to ensure that tool works fine when the Excel file contained rich text data. Note: Data is still imported in pl...Audio Pitch & Shift: Audio Pitch And Shift 3.5.0: Modules (mod, xm, it, etc..) supportcallisto: callisto 2.0.19: BUG FIX: Autorun.load() function in scripting now has sandboxed path (Thanks Mikey!) BUG FIX: UserObject.Name property now allows full 20 byte string replacements. FEATURE REQUEST: File.* script functions now allow file extensions.EntitiesToDTOs - Entity Framework DTO Generator: EntitiesToDTOs.v2.1: Changelog Fixed template file access issue on Win7. Fix on configuration load when target project was not found and "Use project default namespace" was checked. Minor fix on loading latest configuration at startup. Minor fix in VisualStudioHelper class. DTO's properties accessors are now in one line. Improvements in PropertyHelper to get a cleaner and more performant code. Added Website project type as a not supported project type. Using Error List pane from VS IDE to show Enti...DotNetNuke® Community Edition CMS: 06.01.04: Major Highlights Fixed issue with loading the splash page skin in the login, privacy and terms of use pages Fixed issue when searching for words with special characters in them Fixed redirection issue when the user does not have permissions to access a resource Fixed issue when clearing the cache using the ClearHostCache() function Fixed issue when displaying the site structure in the link to page feature Fixed issue when inline editing the title of modules Fixed issue with ...Mayhem: Mayhem Developer Preview: This is the developer preview of Mayhem. Enjoy!Team Foundation Server Process Template Customization Guidance: v1 - For Visual Studio 11: Welcome to the BETA release of the Team Foundation Server Process Template Customization preview. As this is a BETA release and the quality bar for the final Release has not been achieved, we value your candid feedback and recommend that you do not use or deploy these BETA artifacts in a production environment. Quality-Bar Details Documentation has been reviewed by Visual Studio ALM Rangers Documentation has not been through an independent technical review Documentation has not been rev...Magelia WebStore Open-source Ecommerce software: Magelia WebStore 1.2: Medium trust compliant lot of small change for medium trust compliance full refactoring of user management refactoring of Client Refactoring of user management Magelia.WebStore.Client no longer reference Magelia.WebStore.Services.Contract Refactoring page category multi parent category added copy category feature added Refactoring page catalog copy catalog feature added variant management improvement ability to define a default variant for a variable product ability to ord...PDFsharp - A .NET library for processing PDF: PDFsharp and MigraDoc Foundation 1.32: PDFsharp and MigraDoc Foundation 1.32 is a stable version that fixes a few bugs that were found with version 1.31. Version 1.32 includes solutions for Visual Studio 2010 only (but it should be possible to add the project files to existing solutions for VS 2005 or VS 2008). Users of VS 2005 or VS 2008 can still download version 1.31 with the solutions for those versions that allow them to easily try the samples that are included. While it may create smaller PDF files than version 1.30 because...Terminals: Version 2.0 - Release: Changes since version 1.9a:New art works New usability in Organize favorites window Improved usability of imports/exports and scans Large number of fixes Improvements in single instance mode Comparing November beta 4, this corrects: New application icons Doesn't show Logon error codes Fixed command line arguments exception for single instance mode Fixed detaching of tabs improved usability in detached window Fixed option settings for Capture manager Fixed system tray noti...MFCMAPI: March 2012 Release: Build: 15.0.0.1032 Full release notes at SGriffin's blog. If you just want to run the MFCMAPI or MrMAPI, get the executables. If you want to debug them, get the symbol files and the source. The 64 bit builds will only work on a machine with Outlook 2010 64 bit installed. All other machines should use the 32 bit builds, regardless of the operating system. Facebook BadgeTortoiseHg: TortoiseHg 2.3.1: bugfix releaseSimple Injector: Simple Injector v1.4.1: This release adds two small improvements to the SimpleInjector.Extensions.dll. No changes have been made to the core library. New features and improvements in this release for the SimpleInjector.Extensions.dll The RegisterManyForOpenGeneric extension methods now accept non-generic decorator, as long as they implement the given open generic service type. GetTypesToRegister methods added to the OpenGenericBatchRegistrationExtensions class which allows to customize the behavior. Note that the...CommonLibrary: Code: CodeVidCoder: 1.3.1: Updated HandBrake core to 0.9.6 release (svn 4472). Removed erroneous "None" container choice. Change some logic and help text to stop assuming you have to pick the VIDEO_TS folder for a DVD scan. This should make previewing DVD titles on the Queue Multiple Titles window possible when you've picked the root DVD directory.Google Books Downloader for Windows: Google Books Downloader: Google Books Downloader 1.8ExtAspNet: ExtAspNet v3.1.0: ExtAspNet - ?? ExtJS ??? ASP.NET 2.0 ???，????? AJAX ?????????? ExtAspNet ????? ExtJS ??? ASP.NET 2.0 ???，????? AJAX ??????????。 ExtAspNet ??????? JavaScript，?? CSS，?? UpdatePanel，?? ViewState，?? WebServices ???????。 ??????: IE 7.0, Firefox 3.6, Chrome 3.0, Opera 10.5, Safari 3.0+ ????：Apache License 2.0 (Apache) ??：http://extasp.net/ ??：http://bbs.extasp.net/ ??：http://extaspnet.codeplex.com/ ??：http://sanshi.cnblogs.com/ ????： +2012-03-04 v3.1.0 -??Hidden???????（〓?〓）。 -?PageManager??...New ProjectsAres Backup: Ares Backup is a Backup software which can save bytediffs and provides several storage plugins.BackItUp: Backup-Tool für Visual Studio Projektebinbin domain: binbin domain Blexus Service Plattform: Some cool stuff about Wcf Services. - can communicate files - can communicate xaml objects (generate dynamically Gui) CardPlay - a Solitaire Framework for .Net: CardPlay is a C# framework for developing Solitaire card games. The solution includes a sample WPF client along with over 100 games.Cloud Files Upload: Windows application to script cloud file uploads.Code First API Library, Scaffolding & Guidance for Coded UI Tests: Code first Coded UI Tests for web apps. Library, Scaffolding and Guidance.CPEBook by FMUG & TPAY: CPEBook by MUG & TPAY Projet dot NET CPEBookDot Net Application String Resources Viewer: Dot Net Application String Resources ViewerFilter for SharePoint Web Settings Page: This solution show a simple way to integrate a filter box by using jquery, a global farm feature and a simple delegate control for AdditionalPageHead.Google Books Downloader for Windows: Save Google books in PDF, JPEG or PNG format.GUIToolkit: C++ Windowless GUI，DirectUIHarvest Sports: Harvest SportsInfoPath Analyzer: InfoPath Analyzer makes InfoPath form development and troubleshooting much easier. You're easy to find the relationship between controls and data fields, search data fields or controls by name, edit InfoPath inner html directly.Kinectsignlanguage project: This project will help kinect be used for sign language to speech so that sign language people can be understood while talking to important people. LotteryVote: ????manager123: Trying out CodePlexMcRegister: McRegister is an asp.net mvc 3 razor website that enables you to register users on your minecraft server it works in conjunction with a minecraft mod called EasyAuth.MetroTipi: HelloTipi Sous l'interface Metro de Windows 8Microsoft AppFactory: AppFactory is a powerful data-driven build system for Windows Phone (and soon Windows 8) projects. Its purpose is to help developers start with template projects and turn them into suites of applications.MiniStock: MiniStock is an experimental reference architecture for scalable cloud-based architectures. Implemented in .net.online book shopping: online book shoppingScopa: Carousel Team Scopa per WP7 XNA testtom03092012tfs01: testtom03092012tfs01UsingLib: A library of automatically removed utilities: 1.Changing cursor to hourglass in Windows Forms 2.Logging It's developed in C#WHMCS Library: WHMCS is an all-in-one client management, billing & support solution for online businesses. Handling everything from signup to termination, WHMCS is a powerful business automation tool that puts you firmly in control. The WHMCS Library is a .NET wrapper for the WHMCS API. Written entirely in C# but really easy to port over to VB.NET. Coming from a VB.NET background we tried hard to make sure porting would be simple for VB.NET community members. ZoneEditService: Windows service to update ZoneEdit for dynamic dns.

Read the article
SQLAuthority Book Review – DBA Survivor: Become a Rock Star DBA

- by pinaldave

DBA Survivor: Become a Rock Star DBA – Thomas LaRock Link to Amazon Link to Flipkart First of all, I thank all my readers when I wrote that I could not get this book in any local book stores, because they offered me to send a copy of this good book. A very special mention goes to Sripada and Jayesh for they gave so much effort in finding my home address and sending me the hard copy. Before, I did not have the copy of the book, but now I have two of it already! It surprises me how my readers were able to find my home address, which I have not publicly shared. Quick Review: This is indeed a one easy-to-read and fun book. We all work day and night with technology yet we should not forget to show our love and care for our family at home. For our souls that starve for peace and guidance, this one book is the “it” book for all the technology enthusiasts. Though this book was specifically written for DBAs, the reach is not limited to DBAs only because the lessons incorporated in it actually applies to all. This is one of the most motivating technical books I have read. Detailed Review: Let us go over a few questions first: Who wants to be as famous as rockstars in the field of Database Administration? How can one learn what it takes to become a top notch software developer? If you are a beginner in your field, how will you go to next level? Your boss may be very kind or like Dilbert’s Boss, what will you do? How do you keep growing when Eco-system around you does not support you? You are almost at top but there is someone else at the TOP, what do you do and how do you avoid office politics? As a database developer what should be your basic responsibility? and many more… I was able to completely read book in one sitting and I loved it. Before I continue with my opinion, I want to echo the opinion of Kevin Kline who has written the Forward of the book. He has truly suggested that “You hold in your hands a collection of insights and wisdom on the topic of database administration gained through many years of hard-won experience, long nights of study, and direct mentorship under some of the industry’s most talented database professionals and information technology (IT) experts.” Today, IT field is getting bigger and better, while talking about terabytes of the database becomes “more” normal every single day. The gods and demigods of database professionals are taking care of these large scale databases and are carefully maintaining them. In this world, there are only a few beginnings on the first step. There are many experts in different technology fields who are asked to address the issues with databases. There is YOU and ME, who is just new to this work. So we ask ourselves WHERE to begin and HOW to begin. We adore and follow the religion of our rockstars, but oftentimes we really have no idea about their background and their struggles. Every rockstar has his success story which needs to be digested before learning his tricks and tips. This book starts with the same note and teaches the two most important lessons for anybody who wants to be a DBA Rockstar – to focus on their single goal of learning and to excel the technology. The story starts with three simple guidelines – Get Prepared, Get Trained, Get Certified. Once a person learns the skills, and then, it would be about time that he needs to enrich or to improve those skills you have learned. I am sure that the right opportunity will come finding themselves and they will not have to go run behind it. However, the real challenge for any person is the first day or first week. A new employee, no matter how much experienced he is, sometimes has no clue about what should one do at new job. Chapter 2 and chapter 3 precisely talk about what one should do as soon as the new job begins. It is also written with keeping the fact in focus that each job can be very much different but there are few infrastructure setups and programming concepts are the same. Learning basics of database was really interesting. I like to focus on the roots of any technology. It is important to understand the structure of the database before suggesting what indexes needs to be created, the same way this book covers the most essential knowledge one must learn by most database developers. I think the title of the fourth chapter is my favorite sentence in this book. I can see that I will be saying this again and again in the future – “A Development Server Is a Production Server to a Developer“. I have worked in the software industry for almost 8 years now and I have seen so many developers sitting on their chairs and waiting for instructions from their lead about how to improve the code or what to do the next. When I talk to them, I suggest that the experiment with their server and try various techniques. I think they all should understand that for them, a development server is their production server and needs to pay proper attention to the code from the beginning. There should be NO any inappropriate code from the beginning. One has to fully focus and give their best, if they are not sure they should ask but should do something and stay active. Chapter 5 and 6 talks about two essential skills for any developer and database administration – what are the ethics of developers when they are working with production server and how to support software which is running on the production server. I have met many people who know the theory by heart but when put in front of keyboard they do not know where to start. The first thing they do opening the browser and searching online, instead of opening SQL Server Management Studio. This can very well happen to anybody who is experienced as well. Chapter 5 and 6 addresses that situation as well includes the handy scripts which can solve almost all the basic trouble shooting issues. “Where’s the Buffet?” By far, this is the best chapter in this book. If you have ever met me, you would know that I love food. I think after reading this chapter, I felt Thomas has written this just keeping me in mind. I think there will be many other people who feel the same way, too. Even my wife who read this chapter thought this was specifically written for me. I will not talk any more about this chapter as this is one must read chapter. And of course this is about real ‘FOOD‘. I am an SQL Server Trainer and Consultant and I totally agree with the point made in the chapter 8 of this book. Yes, it says here that what is necessary to train employees and people. Millions of dollars worth the labor is continuously done in the world which has faults and incorrect. Once something goes wrong, very expensive consultant comes in and fixes the problem. This whole cycle which can be stopped and improved if proper training is done. There is plenty of free trainings available as well, if one cannot afford paid training. “Connect. Learn. Share” – I think this is a great summary and bird’s eye view of this book. Networking is the key. Everything which is discussed in this book can be taken to next level if one properly uses this tips and continuously grow with it. Connecting with others, helping learn each other and building the good knowledge sharing environment should be the goal of everyone. Before I end the review I want to share a real experience. I have personally met one DBA who has worked in a single department in a company for so long that when he was put in a different department in his company due to closing that department, he could not adjust and quit the job despite the same people and company around him. Adjusting in the new environment gets much tougher as one person gets more and more experienced. This book precisely addresses the same issue along with their solutions. I just cannot stop comparing the book with my personal journey. I found so many things which are coincidently in the book is written as how we developer and DBA think. I must express special thanks to Thomas for taking time in his personal life and write this book for us. This book is indeed a book for everybody who wants to grow healthy in the tough and competitive environment. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Book Review, SQLAuthority News, SQLServer, T SQL, Technology

Read the article

Search Results

Search found 1228 results on 50 pages for 'comparing'.

Page 46/50 | < Previous Page | 42 43 44 45 46 47 48 49 50 | Next Page >

- by Brian

- by James Michael Hare

- by James

- by Pinal Dave

- by DrJohn

- by Pinal Dave

- by tonyrogerson

- by Mladen Prajdic

- by claws

- by Chris Hoffman

- by Timothy Klenke

- by Charles Young

- by Dave Ballantyne

- by D'Arcy Lussier

- by Re-Invent

- by ToStringTheory

- by nmarun

- by goofrider

- by Kelly French

- by alvonellos

- by pinaldave

< Previous Page | 42 43 44 45 46 47 48 49 50 | Next Page >