Search Results

Search found 6152 results on 247 pages for 'known'.

Page 227/247 | < Previous Page | 223 224 225 226 227 228 229 230 231 232 233 234 | Next Page >

SQL SERVER – Faster SQL Server Databases and Applications – Power and Control with SafePeak Caching Options

- by Pinal Dave

Update: This blog post is written based on the SafePeak, which is available for free download. Today, I’d like to examine more closely one of my preferred technologies for accelerating SQL Server databases, SafePeak. Safepeak’s software provides a variety of advanced data caching options, techniques and tools to accelerate the performance and scalability of SQL Server databases and applications. I’d like to look more closely at some of these options, as some of these capabilities could help you address lagging database and performance on your systems. To better understand the available options, it is best to start by understanding the difference between the usual “Basic Caching” vs. SafePeak’s “Dynamic Caching”. Basic Caching Basic Caching (or the stale and static cache) is an ability to put the results from a query into cache for a certain period of time. It is based on TTL, or Time-to-live, and is designed to stay in cache no matter what happens to the data. For example, although the actual data can be modified due to DML commands (update/insert/delete), the cache will still hold the same obsolete query data. Meaning that with the Basic Caching is really static / stale cache. As you can tell, this approach has its limitations. Dynamic Caching Dynamic Caching (or the non-stale cache) is an ability to put the results from a query into cache while maintaining the cache transaction awareness looking for possible data modifications. The modifications can come as a result of: DML commands (update/insert/delete), indirect modifications due to triggers on other tables, executions of stored procedures with internal DML commands complex cases of stored procedures with multiple levels of internal stored procedures logic. When data modification commands arrive, the caching system identifies the related cache items and evicts them from cache immediately. In the dynamic caching option the TTL setting still exists, although its importance is reduced, since the main factor for cache invalidation (or cache eviction) become the actual data updates commands. Now that we have a basic understanding of the differences between “basic” and “dynamic” caching, let’s dive in deeper. SafePeak: A comprehensive and versatile caching platform SafePeak comes with a wide range of caching options. Some of SafePeak’s caching options are automated, while others require manual configuration. Together they provide a complete solution for IT and Data managers to reach excellent performance acceleration and application scalability for a wide range of business cases and applications. Automated caching of SQL Queries: Fully/semi-automated caching of all “read” SQL queries, containing any types of data, including Blobs, XMLs, Texts as well as all other standard data types. SafePeak automatically analyzes the incoming queries, categorizes them into SQL Patterns, identifying directly and indirectly accessed tables, views, functions and stored procedures; Automated caching of Stored Procedures: Fully or semi-automated caching of all read” stored procedures, including procedures with complex sub-procedure logic as well as procedures with complex dynamic SQL code. All procedures are analyzed in advance by SafePeak’s Metadata-Learning process, their SQL schemas are parsed – resulting with a full understanding of the underlying code, objects dependencies (tables, views, functions, sub-procedures) enabling automated or semi-automated (manually review and activate by a mouse-click) cache activation, with full understanding of the transaction logic for cache real-time invalidation; Transaction aware cache: Automated cache awareness for SQL transactions (SQL and in-procs); Dynamic SQL Caching: Procedures with dynamic SQL are pre-parsed, enabling easy cache configuration, eliminating SQL Server load for parsing time and delivering high response time value even in most complicated use-cases; Fully Automated Caching: SQL Patterns (including SQL queries and stored procedures) that are categorized by SafePeak as “read and deterministic” are automatically activated for caching; Semi-Automated Caching: SQL Patterns categorized as “Read and Non deterministic” are patterns of SQL queries and stored procedures that contain reference to non-deterministic functions, like getdate(). Such SQL Patterns are reviewed by the SafePeak administrator and in usually most of them are activated manually for caching (point and click activation); Fully Dynamic Caching: Automated detection of all dependent tables in each SQL Pattern, with automated real-time eviction of the relevant cache items in the event of “write” commands (a DML or a stored procedure) to one of relevant tables. A default setting; Semi Dynamic Caching: A manual cache configuration option enabling reducing the sensitivity of specific SQL Patterns to “write” commands to certain tables/views. An optimization technique relevant for cases when the query data is either known to be static (like archive order details), or when the application sensitivity to fresh data is not critical and can be stale for short period of time (gaining better performance and reduced load); Scheduled Cache Eviction: A manual cache configuration option enabling scheduling SQL Pattern cache eviction based on certain time(s) during a day. A very useful optimization technique when (for example) certain SQL Patterns can be cached but are time sensitive. Example: “select customers that today is their birthday”, an SQL with getdate() function, which can and should be cached, but the data stays relevant only until 00:00 (midnight); Parsing Exceptions Management: Stored procedures that were not fully parsed by SafePeak (due to too complex dynamic SQL or unfamiliar syntax), are signed as “Dynamic Objects” with highest transaction safety settings (such as: Full global cache eviction, DDL Check = lock cache and check for schema changes, and more). The SafePeak solution points the user to the Dynamic Objects that are important for cache effectiveness, provides easy configuration interface, allowing you to improve cache hits and reduce cache global evictions. Usually this is the first configuration in a deployment; Overriding Settings of Stored Procedures: Override the settings of stored procedures (or other object types) for cache optimization. For example, in case a stored procedure SP1 has an “insert” into table T1, it will not be allowed to be cached. However, it is possible that T1 is just a “logging or instrumentation” table left by developers. By overriding the settings a user can allow caching of the problematic stored procedure; Advanced Cache Warm-Up: Creating an XML-based list of queries and stored procedure (with lists of parameters) for periodically automated pre-fetching and caching. An advanced tool allowing you to handle more rare but very performance sensitive queries pre-fetch them into cache allowing high performance for users’ data access; Configuration Driven by Deep SQL Analytics: All SQL queries are continuously logged and analyzed, providing users with deep SQL Analytics and Performance Monitoring. Reduce troubleshooting from days to minutes with database objects and SQL Patterns heat-map. The performance driven configuration helps you to focus on the most important settings that bring you the highest performance gains. Use of SafePeak SQL Analytics allows continuous performance monitoring and analysis, easy identification of bottlenecks of both real-time and historical data; Cloud Ready: Available for instant deployment on Amazon Web Services (AWS). As you can see, there are many options to configure SafePeak’s SQL Server database and application acceleration caching technology to best fit a lot of situations. If you’re not familiar with their technology, they offer free-trial software you can download that comes with a free “help session” to help get you started. You can access the free trial here. Also, SafePeak is available to use on Amazon Cloud. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Writing the tests for FluentPath

- by Bertrand Le Roy

Writing the tests for FluentPath is a challenge. The library is a wrapper around a legacy API (System.IO) that wasn’t designed to be easily testable. If it were more testable, the sensible testing methodology would be to tell System.IO to act against a mock file system, which would enable me to verify that my code is doing the expected file system operations without having to manipulate the actual, physical file system: what we are testing here is FluentPath, not System.IO. Unfortunately, that is not an option as nothing in System.IO enables us to plug a mock file system in. As a consequence, we are left with few options. A few people have suggested me to abstract my calls to System.IO away so that I could tell FluentPath – not System.IO – to use a mock instead of the real thing. That in turn is getting a little silly: FluentPath already is a thin abstraction around System.IO, so layering another abstraction between them would double the test surface while bringing little or no value. I would have to test that new abstraction layer, and that would bring us back to square one. Unless I’m missing something, the only option I have here is to bite the bullet and test against the real file system. Of course, the tests that do that can hardly be called unit tests. They are more integration tests as they don’t only test bits of my code. They really test the successful integration of my code with the underlying System.IO. In order to write such tests, the techniques of BDD work particularly well as they enable you to express scenarios in natural language, from which test code is generated. Integration tests are being better expressed as scenarios orchestrating a few basic behaviors, so this is a nice fit. The Orchard team has been successfully using SpecFlow for integration tests for a while and I thought it was pretty cool so that’s what I decided to use. Consider for example the following scenario: Scenario: Change extension Given a clean test directory When I change the extension of bar\notes.txt to foo Then bar\notes.txt should not exist And bar\notes.foo should exist This is human readable and tells you everything you need to know about what you’re testing, but it is also executable code. What happens when SpecFlow compiles this scenario is that it executes a bunch of regular expressions that identify the known Given (set-up phases), When (actions) and Then (result assertions) to identify the code to run, which is then translated into calls into the appropriate methods. Nothing magical. Here is the code generated by SpecFlow: [NUnit.Framework.TestAttribute()] [NUnit.Framework.DescriptionAttribute("Change extension")] public virtual void ChangeExtension() { TechTalk.SpecFlow.ScenarioInfo scenarioInfo = new TechTalk.SpecFlow.ScenarioInfo("Change extension", ((string[])(null))); #line 6 this.ScenarioSetup(scenarioInfo); #line 7 testRunner.Given("a clean test directory"); #line 8 testRunner.When("I change the extension of " + "bar\\notes.txt to foo"); #line 9 testRunner.Then("bar\\notes.txt should not exist"); #line 10 testRunner.And("bar\\notes.foo should exist"); #line hidden testRunner.CollectScenarioErrors();} The #line directives are there to give clues to the debugger, because yes, you can put breakpoints into a scenario: The way you usually write tests with SpecFlow is that you write the scenario first, let it fail, then write the translation of your Given, When and Then into code if they don’t already exist, which results in running but failing tests, and then you write the code to make your tests pass (you implement the scenario). In the case of FluentPath, I built a simple Given method that builds a simple file hierarchy in a temporary directory that all scenarios are going to work with: [Given("a clean test directory")] public void GivenACleanDirectory() { _path = new Path(SystemIO.Path.GetTempPath()) .CreateSubDirectory("FluentPathSpecs") .MakeCurrent(); _path.GetFileSystemEntries() .Delete(true); _path.CreateFile("foo.txt", "This is a text file named foo."); var bar = _path.CreateSubDirectory("bar"); bar.CreateFile("baz.txt", "bar baz") .SetLastWriteTime(DateTime.Now.AddSeconds(-2)); bar.CreateFile("notes.txt", "This is a text file containing notes."); var barbar = bar.CreateSubDirectory("bar"); barbar.CreateFile("deep.txt", "Deep thoughts"); var sub = _path.CreateSubDirectory("sub"); sub.CreateSubDirectory("subsub"); sub.CreateFile("baz.txt", "sub baz") .SetLastWriteTime(DateTime.Now); sub.CreateFile("binary.bin", new byte[] {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0xFF}); } Then, to implement the scenario that you can read above, I had to write the following When: [When("I change the extension of (.*) to (.*)")] public void WhenIChangeTheExtension( string path, string newExtension) { var oldPath = Path.Current.Combine(path.Split('\\')); oldPath.Move(p => p.ChangeExtension(newExtension)); } As you can see, the When attribute is specifying the regular expression that will enable the SpecFlow engine to recognize what When method to call and also how to map its parameters. For our scenario, “bar\notes.txt” will get mapped to the path parameter, and “foo” to the newExtension parameter. And of course, the code that verifies the assumptions of the scenario: [Then("(.*) should exist")] public void ThenEntryShouldExist(string path) { Assert.IsTrue(_path.Combine(path.Split('\\')).Exists); } [Then("(.*) should not exist")] public void ThenEntryShouldNotExist(string path) { Assert.IsFalse(_path.Combine(path.Split('\\')).Exists); } These steps should be written with reusability in mind. They are building blocks for your scenarios, not implementation of a specific scenario. Think small and fine-grained. In the case of the above steps, I could reuse each of those steps in other scenarios. Those tests are easy to write and easier to read, which means that they also constitute a form of documentation. Oh, and SpecFlow is just one way to do this. Rob wrote a long time ago about this sort of thing (but using a different framework) and I highly recommend this post if I somehow managed to pique your interest: http://blog.wekeroad.com/blog/make-bdd-your-bff-2/ And this screencast (Rob always makes excellent screencasts): http://blog.wekeroad.com/mvc-storefront/kona-3/ (click the “Download it here” link)

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
Does Test Driven Development (TDD) improve Quality and Correctness? (Part 1)

- by David V. Corbin

Since the dawn of the computer age, various methodologies have been introduced to improve quality and reduce cost. In this posting, I will by sharing my experiences with Test Driven Development; both its benefits and limitations. To start this topic, we need to agree on what TDD is. The first is to define each of the three words as used in this context. Test - An item or action which measures something in some quantifiable form. Driven - The primary motivation or focus of a series of activities (process) Development - All phases of a software project/product from concept through delivery. The above are very simple definitions that result in the following: "TDD is a process where the primary focus is on measuring and quantifying all aspects of the creation of a (software) product." There are many places where TDD is used outside of software development, even though it is not known by this name. Consider the (conventional) education process that most of us grew up on. The focus was to get the best grades as measured by different tests. Many of these tests measured rote memorization and not understanding of the subject matter. The result of this that many people graduated with high scores but without "quality and correctness" in their ability to utilize the subject matter (of course, the flip side is true where certain people DID understand the material but were not very good at taking this type of test). Returning to software development, let us look at some common scenarios. While these items are generally applicable regardless of platform, language and tools; the remainder of this post will utilize Microsoft Visual Studio and Team Foundation Server (TFS) for examples. It should be realized that everyone does at least some aspect of TDD. At the most rudimentary level, getting a program to compile involves a "pass/fail" measurement (is the syntax valid) that drives their ability to proceed further (run the program). Other developers may create "Unit Tests" in the belief that having a test for every method/property of a class and good code coverage is the goal of TDD. These items may be helpful and even important, but really only address a small aspect of the overall effort. To see TDD in a bigger view, lets identify the various activities that are part of the Software Development LifeCycle. These are going to be presented in a Waterfall style for simplicity, but each item also occurs within Iterative methodologies such as Agile/Scrum. the key ones here are: Requirements Gathering Architecture Design Implementation Quality Assurance Can each of these items be subjected to a process which establishes metrics (quantified metrics) that reflect both the quality and correctness of each item? It should be clear that conventional Unit Tests do not apply to all of these items; at best they can verify that a local aspect (e.g. a Class/Method) of implementation matches the (test writers perspective of) the appropriate design document. So what can we do? For each of area, the goal is to create tests that are quantifiable and durable. The ability to quantify the measurements (beyond a simple pass/fail) is critical to tracking progress(eventually measuring the level of success that has been achieved) and for providing clear information on what items need to be addressed (along with the appropriate time to address them - in varying levels of detail) . Durability is important so that the test can be reapplied (ideally in an automated fashion) over the entire cycle. Returning for a moment back to our "education example", one must also be careful of how the tests are organized and how the measurements are taken. If a test is in a multiple choice format, there is a significant statistical probability that a correct answer might be the result of a random guess. Also, in many situations, having the student simply provide a final answer can obscure many important elements. For example, on a math test, having the student simply provide a numeric answer (rather than showing the methodology) may result in a complete mismatch between the process and the result. It is hard to determine which is worse: The student who makes a simple arithmetric error at one step of a long process (resulting in a wrong answer) or The student who (without providing the "workflow") uses a completely invalid approach, yet still comes up with the right number. The "Wrong Process"/"Right Answer" is probably the single biggest problem in software development. Even very simple items can suffer from this. As an example consider the following code for a "straight line" calculation....Is it correct? (for Integral Points) int Solve(int m, int b, int x) { return m * x + b; } Most people would respond "Yes". But let's take the question one step further... Is it correct for all possible values of m,b,x??? (no fair if you cheated by being focused on the bolded text!) Without additional information regarding constrains on "the possible values of m,b,x" the answer must be NO, there is the risk of overflow/wraparound that will produce an incorrect result! To properly answer this question (i.e. Test the Code), one MUST be able to backtrack from the implementation through the design, and architecture all the way back to the requirements. And the requirement itself must be tested against the stakeholder(s). It is only when the bounding conditions are defined that it is possible to determine if the code is "Correct" and has "Quality". Yet, how many of us (myself included) have written such code without even thinking about it. In many canses we (think we) "know" what the bounds are, and that the code will be correct. As we all know, requirements change, "code reuse" causes implementations to be applied to different scenarios, etc. This leads directly to the types of system failures that plague so many projects. This approach to TDD is much more holistic than ones which start by focusing on the details. The fundamental concepts still apply: Each item should be tested. The test should be defined/implemented before (or concurrent with) the definition/implementation of the actual item. We also add concepts that expand the scope and alter the style by recognizing: There are many things beside "lines of code" that benefit from testing (measuring/evaluating in a formal way) Correctness and Quality can not be solely measured by "correct results" In the future parts, we will examine in greater detail some of the techniques that can be applied to each of these areas....

Read the article
The Data Scientist

- by BuckWoody

A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

Read the article
Use BGInfo to Build a Database of System Information of Your Network Computers

- by Sysadmin Geek

One of the more popular tools of the Sysinternals suite among system administrators is BGInfo which tacks real-time system information to your desktop wallpaper when you first login. For obvious reasons, having information such as system memory, available hard drive space and system up time (among others) right in front of you is very convenient when you are managing several systems. A little known feature about this handy utility is the ability to have system information automatically saved to a SQL database or some other data file. With a few minutes of setup work you can easily configure BGInfo to record system information of all your network computers in a centralized storage location. You can then use this data to monitor or report on these systems however you see fit. BGInfo Setup If you are familiar with BGInfo, you can skip this section. However, if you have never used this tool, it takes just a few minutes to setup in order to capture the data you are looking for. When you first open BGInfo, a timer will be counting down in the upper right corner. Click the countdown button to keep the interface up so we can edit the settings. Now edit the information you want to capture from the available fields on the right. Since all the output will be redirected to a central location, don’t worry about configuring the layout or formatting. Configuring the Storage Database BGInfo supports the ability to store information in several database formats: SQL Server Database, Access Database, Excel and Text File. To configure this option, open File > Database. Using a Text File The simplest, and perhaps most practical, option is to store the BGInfo data in a comma separated text file. This format allows for the file to be opened in Excel or imported into a database. To use a text file or any other file system type (Excel or MS Access), simply provide the UNC to the respective file. The account running the task to write to this file will need read/write access to both the share and NTFS file permissions. When using a text file, the only option is to have BGInfo create a new entry each time the capture process is run which will add a new line to the respective CSV text file. Using a SQL Database If you prefer to have the data dropped straight into a SQL Server database, BGInfo support this as well. This requires a bit of additional configuration, but overall it is very easy. The first step is to create a database where the information will be stored. Additionally, you will want to create a user account to fill data into this table (and this table only). For your convenience, this script creates a new database and user account (run this as Administrator on your SQL Server machine): @SET Server=%ComputerName%.@SET Database=BGInfo@SET UserName=BGInfo@SET Password=passwordSQLCMD -S “%Server%” -E -Q “Create Database [%Database%]“SQLCMD -S “%Server%” -E -Q “Create Login [%UserName%] With Password=N’%Password%’, DEFAULT_DATABASE=[%Database%], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF”SQLCMD -S “%Server%” -E -d “%Database%” -Q “Create User [%UserName%] For Login [%UserName%]“SQLCMD -S “%Server%” -E -d “%Database%” -Q “EXEC sp_addrolemember N’db_owner’, N’%UserName%’” Note the SQL user account must have ‘db_owner’ permissions on the database in order for BGInfo to work correctly. This is why you should have a SQL user account specifically for this database. Next, configure BGInfo to connect to this database by clicking on the SQL button. Fill out the connection properties according to your database settings. Select the option of whether or not to only have one entry per computer or keep a history of each system. The data will then be dropped directly into a table named “BGInfoTable” in the respective database. Configure User Desktop Options While the primary function of BGInfo is to alter the user’s desktop by adding system info as part of the wallpaper, for our use here we want to leave the user’s wallpaper alone so this process runs without altering any of the user’s settings. Click the Desktops button. Configure the Wallpaper modifications to not alter anything. Preparing the Deployment Now we are all set for deploying the configuration to the individual machines so we can start capturing the system data. If you have not done so already, click the Apply button to create the first entry in your data repository. If all is configured correctly, you should be able to open your data file or database and see the entry for the respective machine. Now click the File > Save As menu option and save the configuration as “BGInfoCapture.bgi”. Deploying to Client Machines Deployment to the respective client machines is pretty straightforward. No installation is required as you just need to copy the BGInfo.exe and the BGInfoCapture.bgi to each machine and place them in the same directory. Once in place, just run the command: BGInfo.exe BGInfoCapture.bgi /Timer:0 /Silent /NoLicPrompt Of course, you probably want to schedule the capture process to run on a schedule. This command creates a Scheduled Task to run the capture process at 8 AM every morning and assumes you copied the required files to the root of your C drive: SCHTASKS /Create /SC DAILY /ST 08:00 /TN “System Info” /TR “C:\BGInfo.exe C:\BGInfoCapture.bgi /Timer:0 /Silent /NoLicPrompt” Adjust as needed, but the end result is the scheduled task command should look something like this: Download BGInfo from Sysinternals Latest Features How-To Geek ETC How To Create Your Own Custom ASCII Art from Any Image How To Process Camera Raw Without Paying for Adobe Photoshop How Do You Block Annoying Text Message (SMS) Spam? How to Use and Master the Notoriously Difficult Pen Tool in Photoshop HTG Explains: What Are the Differences Between All Those Audio Formats? How To Use Layer Masks and Vector Masks to Remove Complex Backgrounds in Photoshop Bring Summer Back to Your Desktop with the LandscapeTheme for Chrome and Iron The Prospector – Home Dash Extension Creates a Whole New Browsing Experience in Firefox KinEmote Links Kinect to Windows Why Nobody Reads Web Site Privacy Policies [Infographic] Asian Temple in the Snow Wallpaper 10 Weird Gaming Records from the Guinness Book

Read the article
The Shift: how Orchard painlessly shifted to document storage, and how it’ll affect you

- by Bertrand Le Roy

We’ve known it all along. The storage for Orchard content items would be much more efficient using a document database than a relational one. Orchard content items are composed of parts that serialize naturally into infoset kinds of documents. Storing them as relational data like we’ve done so far was unnatural and requires the data for a single item to span multiple tables, related through 1-1 relationships. This means lots of joins in queries, and a great potential for Select N+1 problems. Document databases, unfortunately, are still a tough sell in many places that prefer the more familiar relational model. Being able to x-copy Orchard to hosters has also been a basic constraint in the design of Orchard. Combine those with the necessity at the time to run in medium trust, and with license compatibility issues, and you’ll find yourself with very few reasonable choices. So we went, a little reluctantly, for relational SQL stores, with the dream of one day transitioning to document storage. We have played for a while with the idea of building our own document storage on top of SQL databases, and Sébastien implemented something more than decent along those lines, but we had a better way all along that we didn’t notice until recently… In Orchard, there are fields, which are named properties that you can add dynamically to a content part. Because they are so dynamic, we have been storing them as XML into a column on the main content item table. This infoset storage and its associated API are fairly generic, but were only used for fields. The breakthrough was when Sébastien realized how this existing storage could give us the advantages of document storage with minimal changes, while continuing to use relational databases as the substrate. public bool CommercialPrices { get { return this.Retrieve(p => p.CommercialPrices); } set { this.Store(p => p.CommercialPrices, value); } } This code is very compact and efficient because the API can infer from the expression what the type and name of the property are. It is then able to do the proper conversions for you. For this code to work in a content part, there is no need for a record at all. This is particularly nice for site settings: one query on one table and you get everything you need. This shows how the existing infoset solves the data storage problem, but you still need to query. Well, for those properties that need to be filtered and sorted on, you can still use the current record-based relational system. This of course continues to work. We do however provide APIs that make it trivial to store into both record properties and the infoset storage in one operation: public double Price { get { return Retrieve(r => r.Price); } set { Store(r => r.Price, value); } } This code looks strikingly similar to the non-record case above. The difference is that it will manage both the infoset and the record-based storages. The call to the Store method will send the data in both places, keeping them in sync. The call to the Retrieve method does something even cooler: if the property you’re looking for exists in the infoset, it will return it, but if it doesn’t, it will automatically look into the record for it. And if that wasn’t cool enough, it will take that value from the record and store it into the infoset for the next time it’s required. This means that your data will start automagically migrating to infoset storage just by virtue of using the code above instead of the usual: public double Price { get { return Record.Price; } set { Record.Price = value; } } As your users browse the site, it will get faster and faster as Select N+1 issues will optimize themselves away. If you preferred, you could still have explicit migration code, but it really shouldn’t be necessary most of the time. If you do already have code using QueryHints to mitigate Select N+1 issues, you might want to reconsider those, as with the new system, you’ll want to avoid joins that you don’t need for filtering or sorting, further optimizing your queries. There are some rare cases where the storage of the property must be handled differently. Check out this string[] property on SearchSettingsPart for example: public string[] SearchedFields { get { return (Retrieve<string>("SearchedFields") ?? "") .Split(new[] {',', ' '}, StringSplitOptions.RemoveEmptyEntries); } set { Store("SearchedFields", String.Join(", ", value)); } } The array of strings is transformed by the property accessors into and from a comma-separated list stored in a string. The Retrieve and Store overloads used in this case are lower-level versions that explicitly specify the type and name of the attribute to retrieve or store. You may be wondering what this means for code or operations that look directly at the database tables instead of going through the new infoset APIs. Even if there is a record, the infoset version of the property will win if it exists, so it is necessary to keep the infoset up-to-date. It’s not very complicated, but definitely something to keep in mind. Here is what a product record looks like in Nwazet.Commerce for example: And here is the same data in the infoset: The infoset is stored in Orchard_Framework_ContentItemRecord or Orchard_Framework_ContentItemVersionRecord, depending on whether the content type is versionable or not. A good way to find what you’re looking for is to inspect the record table first, as it’s usually easier to read, and then get the item record of the same id. Here is the detailed XML document for this product: <Data> <ProductPart Inventory="40" Price="18" Sku="pi-camera-box" OutOfStockMessage="" AllowBackOrder="false" Weight="0.2" Size="" ShippingCost="null" IsDigital="false" /> <ProductAttributesPart Attributes="" /> <AutoroutePart DisplayAlias="camera-box" /> <TitlePart Title="Nwazet Pi Camera Box" /> <BodyPart Text="[...]" /> <CommonPart CreatedUtc="2013-09-10T00:39:00Z" PublishedUtc="2013-09-14T01:07:47Z" /> </Data> The data is neatly organized under each part. It is easy to see how that document is all you need to know about that content item, all in one table. If you want to modify that data directly in the database, you should be careful to do it in both the record table and the infoset in the content item record. In this configuration, the record is now nothing more than an index, and will only be used for sorting and filtering. Of course, it’s perfectly fine to mix record-backed properties and record-less properties on the same part. It really depends what you think must be sorted and filtered on. In turn, this potentially simplifies migrations considerably. So here it is, the great shift of Orchard to document storage, something that Orchard has been designed for all along, and that we were able to implement with a satisfying and surprising economy of resources. Expect this code to make its way into the 1.8 version of Orchard when that’s available.

Read the article
CodePlex Daily Summary for Sunday, March 28, 2010

CodePlex Daily Summary for Sunday, March 28, 2010New ProjectsFeed Tracker: Feed Tracker allows you to track your favorite feeds (RSS 2.0 and Atom 1.0) and open them up directly in your browser.FIM 2010 Resource Management Client: The Forefront Identity Management 2010 Resource Management Client is a library to communicate with the FIM 2010 web service. The development langu...Infection Protection: A game about controlling disease outbreak in a city. Developed for OGPC 2010, using Qt.OrthoLab: Homepage of Orthocone open-source laboratory.Paragliding ThermalMarker: Paragliding / Hanggliding Windows Application that receives waypoint files and returns only the thermals that get triggered more often in a place.RSSFalls: RssFalls makes it easier for developers to download RSS or Podcast enclosures.String Library for C++ Language: StrLib++ is a string library for C++ language. for now it support only ANSI strings, later Unicode support will added for UT8, UTF16 and UTF32 for...Sweeper: Sweeper is a Visual Studio 2008 add-in for C# that takes care of many of the trivial code-formatting issues that developers run into - particularly...System.Common: A .Net library that provides methods, properties and more that the .Net Framework doesn't provide.Tiveriad: The framework is designed to help you more easily build modular Windows applicationT-Shirts Online: Online shop build in Silverlight 4 using DIBS as payment module.New ReleasesArkSwitch: ArkSwitch v1.1.3: This release has some important changes. Thanks to MichyPrima for helping with some of the code. 1. Improved theming to more easily support multip...Catharsis: Catharsis 2.5 on catarsa.com: The Catharsis framework has finally its own portal http://catarsa.com The latest release version is 2.5 - string names of properties are not any ...EffiProz - A Pure C# Database: EffiProz CF 1.0: EffiProz for .Net compact framework.Encrypted Notes: Encrypted Notes 1.6: This is the latest version of Encrypted Notes (1.6). It has an installer - it will create a directory 'CPascoe' in My Documents. Once you have ext...Extend SmallBasic: Teaching Extensions v.009: Added Pentagon Crazy Recipe QuizGapi.NET - .NET (C#) wrapper for Google API: Gapi.NET 0.5.0.0: - Fixed some minor bugs. - Add minor features. - Performance improvement. See code check-ins for detailed informationHouseFly experimental controls: HouseFly experimental control: Alpha version of HouseFly experimental controlsiTuner - The iTunes Companion: iTuner 1.2.3738 Beta 2: V1.2 allows you to synchronize one or more iTunes playlists to a USB MP3 player. Beta 2 resolves all known issues. This continues the evolution ye...jQuery Library for SharePoint Web Services: SPServices 0.5.4: IMPORTANT NOTE: This release is in an alpha state. You should only download it if you know what you are getting and are interested in testing it f...JSINQ - LINQ to Objects for JavaScript: JSINQ 1.0: This is the first stable release of JSINQ. It is fully compatible with the 0.9 beta release. It contains the following new features: Now supports ...MSBuild Mercurial Tasks: 1.0.0 Beta: First release of the application. This version integrates all the basic functionalities of Mercurial as defined in the Use Case 1.Open Portal Foundation: Open Portal Foundation V1.4.2: What's news? noscript template was updated naming convention for layout autogenerated controls now use the "master" prefix. The documentation ce...Open Portal Foundation: Open Portal Foundation V1.4.4: What's news? ASP .NET Master page support for custom aspx integrated pages New usercontrols for ASCX, ASPX and Master page integration : Link: f...Paint.NET PSD Plugin: 1.5.0: RLE compression is now working fully on save. File sizes are now competitive with Photoshop's. Saving takes about twice as long with RLE compressi...Paragliding ThermalMarker: ThermalMarker_Alfa0.1: Release Alfa 1Simple Service Locator: Simple Service Locator v0.7: The Simple Service Locator is an easy-to-use Inversion of Control library that is a complete implementation of the Common Service Locator interface...String Library for C++ Language: Release 0.9: version 0.9 beta release DO NOT USE IN SERIOUS PROJECTS this release use default application heap, and because visual studio is using special debug...Sweeper: Sweeper Alpha 1: SweeperA Visual Studio Add-in for C# Code Formatting - Visual Studio 2008 Includes: A UI for options, Enable or disable any specific task you want ...T-Shirts Online: 1.0: First release of the online shop.Twilio Server Library for .NET (TSL.NET): v0.1.0 Beta: This is the first release of TSL.NET. This v0.1.0 release is a Beta. Subsequent builds will be posted as v0.1.x and release-candidate Betas will be...Vr30 OS: Facebook 1.0: Connect you to Facebook without your web browser.Vr30 OS: SkyBlog 1.0: SkyBlog without web browser.Vr30 OS: YouTube 1.0: Youtube without web browserWeb Image Resize Handler: Web Image Resize, Zoom, Rotate and Greyscale v.1.0: Efficient Web Image Resize, Zoom, Rotate and Greyscale cacheing handler for ASP.Net.WinXound: WinXound 3.3.0 Beta 1 for Mac OsX: This is the first Beta release for Apple Mac OsX (Universal Binary). DEBUG HELP NEEDED ! Please signal bugs, suggestions or feedback to: stefano_b...Most Popular ProjectsMetaSharpRawrWBFS ManagerASP.NET Ajax LibraryMicrosoft SQL Server Product Samples: DatabaseSilverlight ToolkitAJAX Control ToolkitLiveUpload to FacebookWindows Presentation Foundation (WPF)ASP.NETMost Active ProjectsRawrjQuery Library for SharePoint Web ServicesManaged Extensibility FrameworkBlogEngine.NETMicrosoft Biology Foundationpatterns & practices: Composite WPF and SilverlightLINQ to TwitterFarseer Physics EngineTable2ClassNB_Store - Free DotNetNuke Ecommerce Catalog Module

Read the article
Anatomy of a .NET Assembly - PE Headers

- by Simon Cooper

Today, I'll be starting a look at what exactly is inside a .NET assembly - how the metadata and IL is stored, how Windows knows how to load it, and what all those bytes are actually doing. First of all, we need to understand the PE file format. PE files .NET assemblies are built on top of the PE (Portable Executable) file format that is used for all Windows executables and dlls, which itself is built on top of the MSDOS executable file format. The reason for this is that when .NET 1 was released, it wasn't a built-in part of the operating system like it is nowadays. Prior to Windows XP, .NET executables had to load like any other executable, had to execute native code to start the CLR to read & execute the rest of the file. However, starting with Windows XP, the operating system loader knows natively how to deal with .NET assemblies, rendering most of this legacy code & structure unnecessary. It still is part of the spec, and so is part of every .NET assembly. The result of this is that there are a lot of structure values in the assembly that simply aren't meaningful in a .NET assembly, as they refer to features that aren't needed. These are either set to zero or to certain pre-defined values, specified in the CLR spec. There are also several fields that specify the size of other datastructures in the file, which I will generally be glossing over in this initial post. Structure of a PE file Most of a PE file is split up into separate sections; each section stores different types of data. For instance, the .text section stores all the executable code; .rsrc stores unmanaged resources, .debug contains debugging information, and so on. Each section has a section header associated with it; this specifies whether the section is executable, read-only or read/write, whether it can be cached... When an exe or dll is loaded, each section can be mapped into a different location in memory as the OS loader sees fit. In order to reliably address a particular location within a file, most file offsets are specified using a Relative Virtual Address (RVA). This specifies the offset from the start of each section, rather than the offset within the executable file on disk, so the various sections can be moved around in memory without breaking anything. The mapping from RVA to file offset is done using the section headers, which specify the range of RVAs which are valid within that section. For example, if the .rsrc section header specifies that the base RVA is 0x4000, and the section starts at file offset 0xa00, then an RVA of 0x401d (offset 0x1d within the .rsrc section) corresponds to a file offset of 0xa1d. Because each section has its own base RVA, each valid RVA has a one-to-one mapping with a particular file offset. PE headers As I said above, most of the header information isn't relevant to .NET assemblies. To help show what's going on, I've created a diagram identifying all the various parts of the first 512 bytes of a .NET executable assembly. I've highlighted the relevant bytes that I will refer to in this post: Bear in mind that all numbers are stored in the assembly in little-endian format; the hex number 0x0123 will appear as 23 01 in the diagram. The first 64 bytes of every file is the DOS header. This starts with the magic number 'MZ' (0x4D, 0x5A in hex), identifying this file as an executable file of some sort (an .exe or .dll). Most of the rest of this header is zeroed out. The important part of this header is at offset 0x3C - this contains the file offset of the PE signature (0x80). Between the DOS header & PE signature is the DOS stub - this is a stub program that simply prints out 'This program cannot be run in DOS mode.\r\n' to the console. I will be having a closer look at this stub later on. The PE signature starts at offset 0x80, with the magic number 'PE\0\0' (0x50, 0x45, 0x00, 0x00), identifying this file as a PE executable, followed by the PE file header (also known as the COFF header). The relevant field in this header is in the last two bytes, and it specifies whether the file is an executable or a dll; bit 0x2000 is set for a dll. Next up is the PE standard fields, which start with a magic number of 0x010b for x86 and AnyCPU assemblies, and 0x20b for x64 assemblies. Most of the rest of the fields are to do with the CLR loader stub, which I will be covering in a later post. After the PE standard fields comes the NT-specific fields; again, most of these are not relevant for .NET assemblies. The one that is is the highlighted Subsystem field, and specifies if this is a GUI or console app - 0x20 for a GUI app, 0x30 for a console app. Data directories & section headers After the PE and COFF headers come the data directories; each directory specifies the RVA (first 4 bytes) and size (next 4 bytes) of various important parts of the executable. The only relevant ones are the 2nd (Import table), 13th (Import Address table), and 15th (CLI header). The Import and Import Address table are only used by the startup stub, so we will look at those later on. The 15th points to the CLI header, where the CLR-specific metadata begins. After the data directories comes the section headers; one for each section in the file. Each header starts with the section's ASCII name, null-padded to 8 bytes. Again, most of each header is irrelevant, but I've highlighted the base RVA and file offset in each header. In the diagram, you can see the following sections: .text: base RVA 0x2000, file offset 0x200 .rsrc: base RVA 0x4000, file offset 0xa00 .reloc: base RVA 0x6000, file offset 0x1000 The .text section contains all the CLR metadata and code, and so is by far the largest in .NET assemblies. The .rsrc section contains the data you see in the Details page in the right-click file properties page, but is otherwise unused. The .reloc section contains address relocations, which we will look at when we study the CLR startup stub. What about the CLR? As you can see, most of the first 512 bytes of an assembly are largely irrelevant to the CLR, and only a few bytes specify needed things like the bitness (AnyCPU/x86 or x64), whether this is an exe or dll, and the type of app this is. There are some bytes that I haven't covered that affect the layout of the file (eg. the file alignment, which determines where in a file each section can start). These values are pretty much constant in most .NET assemblies, and don't affect the CLR data directly. Conclusion To summarize, the important data in the first 512 bytes of a file is: DOS header. This contains a pointer to the PE signature. DOS stub, which we'll be looking at in a later post. PE signature PE file header (aka COFF header). This specifies whether the file is an exe or a dll. PE standard fields. This specifies whether the file is AnyCPU/32bit or 64bit. PE NT-specific fields. This specifies what type of app this is, if it is an app. Data directories. The 15th entry (at offset 0x168) contains the RVA and size of the CLI header inside the .text section. Section headers. These are used to map between RVA and file offset. The important one is .text, which is where all the CLR data is stored. In my next post, we'll start looking at the metadata used by the CLR directly, which is all inside the .text section.

Read the article
Emit Knowledge - social network for knowledge sharing

- by hajan

Emit Knowledge, as the words refer - it's a social network for emitting / sharing knowledge from users by users. Those who can benefit the most out of this network is perhaps all of YOU who have something to share with others and contribute to the knowledge world. I've been closely communicating with the core team of this very, very interesting, brand new social network (with specific purpose!) about the concept, idea and the vision they have for their product and I can say with a lot of confidence that this network has real potential to become something from which we will all benefit. I won't speak much about that and would prefer to give you link and try it yourself - http://www.emitknowledge.com Mainly, through the past few months I've been testing this network and it is getting improved all the time. The user experience is great, you can easily find out what you need and it follows some known patterns that are common for all social networks. They have some real good ideas and plans that are already under development for the next updates of their product. You can do micro blogging or you can do regular normal blogging… it’s up to you, and the way it works, it is seamless. Here is a short Question and Answers (QA) interview I made with the lead of the team, Marijan Nikolovski: 1. Can you please explain us briefly, what is Emit Knowledge? Emit Knowledge is a brand new knowledge based social network, delivering quality content from users to users. We believe that people’s knowledge, experience and professional thoughts compose quality content, worth sharing among millions around the world. Therefore, we created the platform that matches people’s need to share and gain knowledge in the most suitable and comfortable way. Easy to work with, Emit Knowledge lets you to smoothly craft and emit knowledge around the globe. 2. How 'old' is Emit Knowledge? In hamster’s years we are almost five years old start-up :). Just kidding. We’ve released our public beta about three months ago. Our official release date is 27 of June 2012. 3. How did you come up with this idea? Everything started from a simple idea to solve a complex problem. We’ve seen that the social web has become polluted with data and is on the right track to lose its base principles – socialization and common cause. That was our start point. We’ve gathered the team, drew some sketches and started to mind map the idea. After several idea refactoring’s Emit Knowledge was born. 4. Is there any competition out there in the market? Currently we don't have any competitors that share the same cause. What makes our platform different is the ideology that our product promotes and the functionalities that our platform offers for easy socialization based on interests and knowledge sharing. 5. What are the main technologies used to build Emit Knowledge? Emit Knowledge was built on a heterogeneous pallet of technologies. Currently, we have four of separation: UI – Built on ASP.NET MVC3 and Knockout.js; Messaging infrastructure – Build on top of RabbitMQ; Background services – Our in-house solution for job distribution, orchestration and processing; Data storage – Build on top of MongoDB; What are the main reasons you've chosen ASP.NET MVC? Since all of our team members are .NET engineers, the decision was very natural. ASP.NET MVC is the only Microsoft web stack that sticks to the HTTP behavioral standards. It is easy to work with, have a tiny learning curve and everyone who is familiar with the HTTP will understand its architecture and convention without any difficulties. 6. What are the main reasons for choosing ASP.NET MVC? Since all of our team members are .NET engineers, the decision was very natural. ASP.NET MVC is the only Microsoft web stack that sticks to the HTTP behavioral standards. It is easy to work with, have a tiny learning curve and everyone who is familiar with the HTTP will understand its architecture and convention without any difficulties. 7. Did you use some of the latest Microsoft technologies? If yes, which ones? Yes, we like to rock the cutting edge tech house. Currently we are using Microsoft’s latest technologies like ASP.NET MVC, Web API (work in progress) and the best for the last; we are utilizing Windows Azure IaaS to the bone. 8. Can you please tell us shortly, what would be the benefit of regular bloggers in other blogging platforms to join Emit Knowledge? Well, unless you are some of the smoking ace gurus whose blogs are followed by a large number of users, our platform offers knowledge based segregated community equipped with tools that will enable both current and future users to expand their relations and to self-promote in the community based on their activity and knowledge sharing. 10. I see you are working very intensively and there is already integration with some third-party services to make the process of sharing and emitting knowledge easier, which services did you integrate until now and what do you plan do to next? We have “reemit” functionality for internal sharing and we also support external services like: Twitter; LinkedIn; Facebook; For the regular bloggers we have an extra cream, Windows Live Writer support for easy blog posts emitting. 11. What should we expect next? Currently, we are working on a new fancy community feature. This means that we are going to support user groups to be formed. So for all existing communities and user groups out there, wait us a little bit, we are coming for rescue :). One of the top next features they are developing is the Community Feature. It means, if you have your own User Group, Community Group or any other Group on which you and your users are mostly blogging or sharing (emitting) knowledge in various ways, Emit Knowledge as a platform will help you have everything you need to promote your group, make new followers and host all the necessary stuff that you have had need of. I would invite you to try the network and start sharing knowledge in a way that will help you gather new followers and spread your knowledge faster, easier and in a more efficient way! Let’s Emit Knowledge!

Read the article
Big Data Matters with ODI12c

- by Madhu Nair

contributed by Mike Eisterer On October 17th, 2013, Oracle announced the release of Oracle Data Integrator 12c (ODI12c). This release signifies improvements to Oracle’s Data Integration portfolio of solutions, particularly Big Data integration. Why Big Data = Big Business Organizations are gaining greater insights and actionability through increased storage, processing and analytical benefits offered by Big Data solutions. New technologies and frameworks like HDFS, NoSQL, Hive and MapReduce support these benefits now. As further data is collected, analytical requirements increase and the complexity of managing transformations and aggregations of data compounds and organizations are in need for scalable Data Integration solutions. ODI12c provides enterprise solutions for the movement, translation and transformation of information and data heterogeneously and in Big Data Environments through: The ability for existing ODI and SQL developers to leverage new Big Data technologies. A metadata focused approach for cataloging, defining and reusing Big Data technologies, mappings and process executions. Integration between many heterogeneous environments and technologies such as HDFS and Hive. Generation of Hive Query Language. Working with Big Data using Knowledge Modules ODI12c provides developers with the ability to define sources and targets and visually develop mappings to effect the movement and transformation of data. As the mappings are created, ODI12c leverages a rich library of prebuilt integrations, known as Knowledge Modules (KMs). These KMs are contextual to the technologies and platforms to be integrated. Steps and actions needed to manage the data integration are pre-built and configured within the KMs. The Oracle Data Integrator Application Adapter for Hadoop provides a series of KMs, specifically designed to integrate with Big Data Technologies. The Big Data KMs include: Check Knowledge Module Reverse Engineer Knowledge Module Hive Transform Knowledge Module Hive Control Append Knowledge Module File to Hive (LOAD DATA) Knowledge Module File-Hive to Oracle (OLH-OSCH) Knowledge Module Nothing to beat an Example: To demonstrate the use of the KMs which are part of the ODI Application Adapter for Hadoop, a mapping may be defined to move data between files and Hive targets. The mapping is defined by dragging the source and target into the mapping, performing the attribute (column) mapping (see Figure 1) and then selecting the KM which will govern the process. In this mapping example, movie data is being moved from an HDFS source into a Hive table. Some of the attributes, such as “CUSTID to custid”, have been mapped over. Figure 1 Defining the Mapping Before the proper KM can be assigned to define the technology for the mapping, it needs to be added to the ODI project. The Big Data KMs have been made available to the project through the KM import process. Generally, this is done prior to defining the mapping. Figure 2 Importing the Big Data Knowledge Modules Following the import, the KMs are available in the Designer Navigator. v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} Normal 0 false false false EN-US ZH-TW X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Figure 3 The Project View in Designer, Showing Installed IKMs Once the KM is imported, it may be assigned to the mapping target. This is done by selecting the Physical View of the mapping and examining the Properties of the Target. In this case MOVIAPP_LOG_STAGE is the target of our mapping. Figure 4 Physical View of the Mapping and Assigning the Big Data Knowledge Module to the Target Alternative KMs may have been selected as well, providing flexibility and abstracting the logical mapping from the physical implementation. Our mapping may be applied to other technologies as well. The mapping is now complete and is ready to run. We will see more in a future blog about running a mapping to load Hive. To complete the quick ODI for Big Data Overview, let us take a closer look at what the IKM File to Hive is doing for us. ODI provides differentiated capabilities by defining the process and steps which normally would have to be manually developed, tested and implemented into the KM. As shown in figure 5, the KM is preparing the Hive session, managing the Hive tables, performing the initial load from HDFS and then performing the insert into Hive. HDFS and Hive options are selected graphically, as shown in the properties in Figure 4. Figure 5 Process and Steps Managed by the KM What’s Next Big Data being the shape shifting business challenge it is is fast evolving into the deciding factor between market leaders and others. Now that an introduction to ODI and Big Data has been provided, look for additional blogs coming soon using the Knowledge Modules which make up the Oracle Data Integrator Application Adapter for Hadoop: Importing Big Data Metadata into ODI, Testing Data Stores and Loading Hive Targets Generating Transformations using Hive Query language Loading Oracle from Hadoop Sources For more information now, please visit the Oracle Data Integrator Application Adapter for Hadoop web site, http://www.oracle.com/us/products/middleware/data-integration/hadoop/overview/index.html Do not forget to tune in to the ODI12c Executive Launch webcast on the 12th to hear more about ODI12c and GG12c. Normal 0 false false false EN-US ZH-TW X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";}

Read the article
Anatomy of a .NET Assembly - CLR metadata 1

- by Simon Cooper

Before we look at the bytes comprising the CLR-specific data inside an assembly, we first need to understand the logical format of the metadata (For this post I only be looking at simple pure-IL assemblies; mixed-mode assemblies & other things complicates things quite a bit). Metadata streams Most of the CLR-specific data inside an assembly is inside one of 5 streams, which are analogous to the sections in a PE file. The name of each section in a PE file starts with a ., and the name of each stream in the CLR metadata starts with a #. All but one of the streams are heaps, which store unstructured binary data. The predefined streams are: #~ Also called the metadata stream, this stream stores all the information on the types, methods, fields, properties and events in the assembly. Unlike the other streams, the metadata stream has predefined contents & structure. #Strings This heap is where all the namespace, type & member names are stored. It is referenced extensively from the #~ stream, as we'll be looking at later. #US Also known as the user string heap, this stream stores all the strings used in code directly. All the strings you embed in your source code end up in here. This stream is only referenced from method bodies. #GUID This heap exclusively stores GUIDs used throughout the assembly. #Blob This heap is for storing pure binary data - method signatures, generic instantiations, that sort of thing. Items inside the heaps (#Strings, #US, #GUID and #Blob) are indexed using a simple binary offset from the start of the heap. At that offset is a coded integer giving the length of that item, then the item's bytes immediately follow. The #GUID stream is slightly different, in that GUIDs are all 16 bytes long, so a length isn't required. Metadata tables The #~ stream contains all the assembly metadata. The metadata is organised into 45 tables, which are binary arrays of predefined structures containing information on various aspects of the metadata. Each entry in a table is called a row, and the rows are simply concatentated together in the file on disk. For example, each row in the TypeRef table contains: A reference to where the type is defined (most of the time, a row in the AssemblyRef table). An offset into the #Strings heap with the name of the type An offset into the #Strings heap with the namespace of the type. in that order. The important tables are (with their table number in hex): 0x2: TypeDef 0x4: FieldDef 0x6: MethodDef 0x14: EventDef 0x17: PropertyDef Contains basic information on all the types, fields, methods, events and properties defined in the assembly. 0x1: TypeRef The details of all the referenced types defined in other assemblies. 0xa: MemberRef The details of all the referenced members of types defined in other assemblies. 0x9: InterfaceImpl Links the types defined in the assembly with the interfaces that type implements. 0xc: CustomAttribute Contains information on all the attributes applied to elements in this assembly, from method parameters to the assembly itself. 0x18: MethodSemantics Links properties and events with the methods that comprise the get/set or add/remove methods of the property or method. 0x1b: TypeSpec 0x2b: MethodSpec These tables provide instantiations of generic types and methods for each usage within the assembly. There are several ways to reference a single row within a table. The simplest is to simply specify the 1-based row index (RID). The indexes are 1-based so a value of 0 can represent 'null'. In this case, which table the row index refers to is inferred from the context. If the table can't be determined from the context, then a particular row is specified using a token. This is a 4-byte value with the most significant byte specifying the table, and the other 3 specifying the 1-based RID within that table. This is generally how a metadata table row is referenced from the instruction stream in method bodies. The third way is to use a coded token, which we will look at in the next post. So, back to the bytes Now we've got a rough idea of how the metadata is logically arranged, we can now look at the bytes comprising the start of the CLR data within an assembly: The first 8 bytes of the .text section are used by the CLR loader stub. After that, the CLR-specific data starts with the CLI header. I've highlighted the important bytes in the diagram. In order, they are: The size of the header. As the header is a fixed size, this is always 0x48. The CLR major version. This is always 2, even for .NET 4 assemblies. The CLR minor version. This is always 5, even for .NET 4 assemblies, and seems to be ignored by the runtime. The RVA and size of the metadata header. In the diagram, the RVA 0x20e4 corresponds to the file offset 0x2e4 Various flags specifying if this assembly is pure-IL, whether it is strong name signed, and whether it should be run as 32-bit (this is how the CLR differentiates between x86 and AnyCPU assemblies). A token pointing to the entrypoint of the assembly. In this case, 06 (the last byte) refers to the MethodDef table, and 01 00 00 refers to to the first row in that table. (after a gap) RVA of the strong name signature hash, which comes straight after the CLI header. The RVA 0x2050 corresponds to file offset 0x250. The rest of the CLI header is mainly used in mixed-mode assemblies, and so is zeroed in this pure-IL assembly. After the CLI header comes the strong name hash, which is a SHA-1 hash of the assembly using the strong name key. After that comes the bodies of all the methods in the assembly concatentated together. Each method body starts off with a header, which I'll be looking at later. As you can see, this is a very small assembly with only 2 methods (an instance constructor and a Main method). After that, near the end of the .text section, comes the metadata, containing a metadata header and the 5 streams discussed above. We'll be looking at this in the next post. Conclusion The CLI header data doesn't have much to it, but we've covered some concepts that will be important in later posts - the logical structure of the CLR metadata and the overall layout of CLR data within the .text section. Next, I'll have a look at the contents of the #~ stream, and how the table data is arranged on disk.

Read the article
Fraud Detection with the SQL Server Suite Part 2

- by Dejan Sarka

This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

Read the article
Integrating Oracle Hyperion Smart View Data Queries with MS Word and Power Point

- by Andreea Vaduva

Untitled Document table { border: thin solid; } Most Smart View users probably appreciate that they can use just one add-in to access data from the different sources they might work with, like Oracle Essbase, Oracle Hyperion Planning, Oracle Hyperion Financial Management and others. But not all of them are aware of the options to integrate data analyses not only in Excel, but also in MS Word or Power Point. While in the past, copying and pasting single numbers or tables from a recent analysis in Excel made the pasted content a static snapshot, copying so called Data Points now creates dynamic, updateable references to the data source. It also provides additional nice features, which can make life easier and less stressful for Smart View users. So, how does this option work: after building an ad-hoc analysis with Smart View as usual in an Excel worksheet, any area including data cells/numbers from the database can be highlighted in order to copy data points - even single data cells only. TIP It is not necessary to highlight and copy the row or column descriptions Next from the Smart View ribbon select Copy Data Point. Then transfer to the Word or Power Point document into which the selected content should be copied. Note that in these Office programs you will find a menu item Smart View;from it select the Paste Data Point icon. The copied details from the Excel report will be pasted, but showing #NEED_REFRESH in the data cells instead of the original numbers. =After clicking the Refresh icon on the Smart View menu the data will be retrieved and displayed. (Maybe at that moment a login window pops up and you need to provide your credentials.) It works in the same way if you just copy one single number without any row or column descriptions, for example in order to incorporate it into a continuous text: Before refresh: After refresh: From now on for any subsequent updates of the data shown in your documents you only need to refresh data by clicking the Refresh button on the Smart View menu, without copying and pasting the context or content again. As you might realize, trying out this feature on your own, there won’t be any Point of View shown in the Office document. Also you have seen in the example, where only a single data cell was copied, that there aren’t any member names or row/column descriptions copied, which are usually required in an ad-hoc report in order to exactly define where data comes from or how data is queried from the source. Well, these definitions are not visible, but they are transferred to the Word or Power Point document as well. They are stored in the background for each individual data cell copied and can be made visible by double-clicking the data cell as shown in the following screen shot (but which is taken from another context). So for each cell/number the complete connection information is stored along with the exact member/cell intersection from the database. And that’s not all: you have the chance now to exchange the members originally selected in the Point of View (POV) in the Excel report. Remember, at that time we had the following selection: By selecting the Manage POV option from the Smart View meny in Word or Power Point… … the following POV Manager – Queries window opens: You can now change your selection for each dimension from the original POV by either double-clicking the dimension member in the lower right box under POV: or by selecting the Member Selector icon on the top right hand side of the window. After confirming your changes you need to refresh your document again. Be aware, that this will update all (!) numbers taken from one and the same original Excel sheet, even if they appear in different locations in your Office document, reflecting your recent changes in the POV. TIP Build your original report already in a way that dimensions you might want to change from within Word or Power Point are placed in the POV. And there is another really nice feature I wouldn’t like to miss mentioning: Using Dynamic Data Points in the way described above, you will never miss or need to search again for your original Excel sheet from which values were taken and copied as data points into an Office document. Because from even only one single data cell Smart View is able to recreate the entire original report content with just a few clicks: Select one of the numbers from within your Word or Power Point document by double-clicking. Then select the Visualize in Excel option from the Smart View menu. Excel will open and Smart View will rebuild the entire original report, including POV settings, and retrieve all data from the most recent actual state of the database. (It might be necessary to provide your credentials before data is displayed.) However, in order to make this work, an active online connection to your databases on the server is necessary and at least read access to the retrieved data. But apart from this, your newly built Excel report is fully functional for ad-hoc analysis and can be used in the common way for drilling, pivoting and all the other known functions and features. So far about embedding Dynamic Data Points into Office documents and linking them back into Excel worksheets. You can apply this in the described way with ad-hoc analyses directly on Essbase databases or using Hyperion Planning and Hyperion Financial Management ad-hoc web forms. If you are also interested in other new features and smart enhancements in Essbase or Hyperion Planning stay tuned for coming articles or check our training courses and web presentations. You can find general information about offerings for the Essbase and Planning curriculum or other Oracle-Hyperion products here (please make sure to select your country/region at the top of this page) or in the OU Learning paths section , where Planning, Essbase and other Hyperion products can be found under the Fusion Middleware heading (again, please select the right country/region). Or drop me a note directly: [email protected] . About the Author: Bernhard Kinkel started working for Hyperion Solutions as a Presales Consultant and Consultant in 1998 and moved to Hyperion Education Services in 1999. He joined Oracle University in 2007 where he is a Principal Education Consultant. Based on these many years of working with Hyperion products he has detailed product knowledge across several versions. He delivers both classroom and live virtual courses. His areas of expertise are Oracle/Hyperion Essbase, Oracle Hyperion Planning and Hyperion Web Analysis.

Read the article
PostSharp, Obfuscation, and IL

- by Simon Cooper

Aspect-oriented programming (AOP) is a relatively new programming paradigm. Originating at Xerox PARC in 1994, the paradigm was first made available for general-purpose development as an extension to Java in 2001. From there, it has quickly been adapted for use in all the common languages used today. In the .NET world, one of the primary AOP toolkits is PostSharp. Attributes and AOP Normally, attributes in .NET are entirely a metadata construct. Apart from a few special attributes in the .NET framework, they have no effect whatsoever on how a class or method executes within the CLR. Only by using reflection at runtime can you access any attributes declared on a type or type member. PostSharp changes this. By declaring a custom attribute that derives from PostSharp.Aspects.Aspect, applying it to types and type members, and running the resulting assembly through the PostSharp postprocessor, you can essentially declare 'clever' attributes that change the behaviour of whatever the aspect has been applied to at runtime. A simple example of this is logging. By declaring a TraceAttribute that derives from OnMethodBoundaryAspect, you can automatically log when a method has been executed: public class TraceAttribute : PostSharp.Aspects.OnMethodBoundaryAspect { public override void OnEntry(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Entering {0}.{1}.", method.DeclaringType.FullName, method.Name)); } public override void OnExit(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Leaving {0}.{1}.", method.DeclaringType.FullName, method.Name)); } } [Trace] public void MethodToLog() { ... } Now, whenever MethodToLog is executed, the aspect will automatically log entry and exit, without having to add the logging code to MethodToLog itself. PostSharp Performance Now this does introduce a performance overhead - as you can see, the aspect allows access to the MethodBase of the method the aspect has been applied to. If you were limited to C#, you would be forced to retrieve each MethodBase instance using Type.GetMethod(), matching on the method name and signature. This is slow. Fortunately, PostSharp is not limited to C#. It can use any instruction available in IL. And in IL, you can do some very neat things. Ldtoken C# allows you to get the Type object corresponding to a specific type name using the typeof operator: Type t = typeof(Random); The C# compiler compiles this operator to the following IL: ldtoken [mscorlib]System.Random call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle( valuetype [mscorlib]System.RuntimeTypeHandle) The ldtoken instruction obtains a special handle to a type called a RuntimeTypeHandle, and from that, the Type object can be obtained using GetTypeFromHandle. These are both relatively fast operations - no string lookup is required, only direct assembly and CLR constructs are used. However, a little-known feature is that ldtoken is not just limited to types; it can also get information on methods and fields, encapsulated in a RuntimeMethodHandle or RuntimeFieldHandle: // get a MethodBase for String.EndsWith(string) ldtoken method instance bool [mscorlib]System.String::EndsWith(string) call class [mscorlib]System.Reflection.MethodBase [mscorlib]System.Reflection.MethodBase::GetMethodFromHandle( valuetype [mscorlib]System.RuntimeMethodHandle) // get a FieldInfo for the String.Empty field ldtoken field string [mscorlib]System.String::Empty call class [mscorlib]System.Reflection.FieldInfo [mscorlib]System.Reflection.FieldInfo::GetFieldFromHandle( valuetype [mscorlib]System.RuntimeFieldHandle) These usages of ldtoken aren't usable from C# or VB, and aren't likely to be added anytime soon (Eric Lippert's done a blog post on the possibility of adding infoof, methodof or fieldof operators to C#). However, PostSharp deals directly with IL, and so can use ldtoken to get MethodBase objects quickly and cheaply, without having to resort to string lookups. The kicker However, there are problems. Because ldtoken for methods or fields isn't accessible from C# or VB, it hasn't been as well-tested as ldtoken for types. This has resulted in various obscure bugs in most versions of the CLR when dealing with ldtoken and methods, and specifically, generic methods and methods of generic types. This means that PostSharp was behaving incorrectly, or just plain crashing, when aspects were applied to methods that were generic in some way. So, PostSharp has to work around this. Without using the metadata tokens directly, the only way to get the MethodBase of generic methods is to use reflection: Type.GetMethod(), passing in the method name as a string along with information on the signature. Now, this works fine. It's slower than using ldtoken directly, but it works, and this only has to be done for generic methods. Unfortunately, this poses problems when the assembly is obfuscated. PostSharp and Obfuscation When using ldtoken, obfuscators don't affect how PostSharp operates. Because the ldtoken instruction directly references the type, method or field within the assembly, it is unaffected if the name of the object is changed by an obfuscator. However, the indirect loading used for generic methods was breaking, because that uses the name of the method when the assembly is put through the PostSharp postprocessor to lookup the MethodBase at runtime. If the name then changes, PostSharp can't find it anymore, and the assembly breaks. So, PostSharp needs to know about any changes an obfuscator does to an assembly. The way PostSharp does this is by adding another layer of indirection. When PostSharp obfuscation support is enabled, it includes an extra 'name table' resource in the assembly, consisting of a series of method & type names. When PostSharp needs to lookup a method using reflection, instead of encoding the method name directly, it looks up the method name at a fixed offset inside that name table: MethodBase genericMethod = typeof(ContainingClass).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: get_Prop1 21: set_Prop1 22: DoFoo 23: GetWibble When the assembly is later processed by an obfuscator, the obfuscator can replace all the method and type names within the name table with their new name. That way, the reflection lookups performed by PostSharp will now use the new names, and everything will work as expected: MethodBase genericMethod = typeof(#kGy).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: #kkA 21: #zAb 22: #EF5a 23: #2tg As you can see, this requires direct support by an obfuscator in order to perform these rewrites. Dotfuscator supports it, and now, starting with SmartAssembly 6.6.4, SmartAssembly does too. So, a relatively simple solution to a tricky problem, with some CLR bugs thrown in for good measure. You don't see those every day!

Read the article
The SSIS tuning tip that everyone misses

- by Rob Farley

I know that everyone misses this, because I’m yet to find someone who doesn’t have a bit of an epiphany when I describe this. When tuning Data Flows in SQL Server Integration Services, people see the Data Flow as moving from the Source to the Destination, passing through a number of transformations. What people don’t consider is the Source, getting the data out of a database. Remember, the source of data for your Data Flow is not your Source Component. It’s wherever the data is, within your database, probably on a disk somewhere. You need to tune your query to optimise it for SSIS, and this is what most people fail to do. I’m not suggesting that people don’t tune their queries – there’s plenty of information out there about making sure that your queries run as fast as possible. But for SSIS, it’s not about how fast your query runs. Let me say that again, but in bolder text: The speed of an SSIS Source is not about how fast your query runs. If your query is used in a Source component for SSIS, the thing that matters is how fast it starts returning data. In particular, those first 10,000 rows to populate that first buffer, ready to pass down the rest of the transformations on its way to the Destination. Let’s look at a very simple query as an example, using the AdventureWorks database: We’re picking the different Weight values out of the Product table, and it’s doing this by scanning the table and doing a Sort. It’s a Distinct Sort, which means that the duplicates are discarded. It'll be no surprise to see that the data produced is sorted. Obvious, I know, but I'm making a comparison to what I'll do later. Before I explain the problem here, let me jump back into the SSIS world... If you’ve investigated how to tune an SSIS flow, then you’ll know that some SSIS Data Flow Transformations are known to be Blocking, some are Partially Blocking, and some are simply Row transformations. Take the SSIS Sort transformation, for example. I’m using a larger data set for this, because my small list of Weights won’t demonstrate it well enough. Seven buffers of data came out of the source, but none of them could be pushed past the Sort operator, just in case the last buffer contained the data that would be sorted into the first buffer. This is a blocking operation. Back in the land of T-SQL, we consider our Distinct Sort operator. It’s also blocking. It won’t let data through until it’s seen all of it. If you weren’t okay with blocking operations in SSIS, why would you be happy with them in an execution plan? The source of your data is not your OLE DB Source. Remember this. The source of your data is the NCIX/CIX/Heap from which it’s being pulled. Picture it like this... the data flowing from the Clustered Index, through the Distinct Sort operator, into the SELECT operator, where a series of SSIS Buffers are populated, flowing (as they get full) down through the SSIS transformations. Alright, I know that I’m taking some liberties here, because the two queries aren’t the same, but consider the visual. The data is flowing from your disk and through your execution plan before it reaches SSIS, so you could easily find that a blocking operation in your plan is just as painful as a blocking operation in your SSIS Data Flow. Luckily, T-SQL gives us a brilliant query hint to help avoid this. OPTION (FAST 10000) This hint means that it will choose a query which will optimise for the first 10,000 rows – the default SSIS buffer size. And the effect can be quite significant. First let’s consider a simple example, then we’ll look at a larger one. Consider our weights. We don’t have 10,000, so I’m going to use OPTION (FAST 1) instead. You’ll notice that the query is more expensive, using a Flow Distinct operator instead of the Distinct Sort. This operator is consuming 84% of the query, instead of the 59% we saw from the Distinct Sort. But the first row could be returned quicker – a Flow Distinct operator is non-blocking. The data here isn’t sorted, of course. It’s in the same order that it came out of the index, just with duplicates removed. As soon as a Flow Distinct sees a value that it hasn’t come across before, it pushes it out to the operator on its left. It still has to maintain the list of what it’s seen so far, but by handling it one row at a time, it can push rows through quicker. Overall, it’s a lot more work than the Distinct Sort, but if the priority is the first few rows, then perhaps that’s exactly what we want. The Query Optimizer seems to do this by optimising the query as if there were only one row coming through: This 1 row estimation is caused by the Query Optimizer imagining the SELECT operation saying “Give me one row” first, and this message being passed all the way along. The request might not make it all the way back to the source, but in my simple example, it does. I hope this simple example has helped you understand the significance of the blocking operator. Now I’m going to show you an example on a much larger data set. This data was fetching about 780,000 rows, and these are the Estimated Plans. The data needed to be Sorted, to support further SSIS operations that needed that. First, without the hint. ...and now with OPTION (FAST 10000): A very different plan, I’m sure you’ll agree. In case you’re curious, those arrows in the top one are 780,000 rows in size. In the second, they’re estimated to be 10,000, although the Actual figures end up being 780,000. The top one definitely runs faster. It finished several times faster than the second one. With the amount of data being considered, these numbers were in minutes. Look at the second one – it’s doing Nested Loops, across 780,000 rows! That’s not generally recommended at all. That’s “Go and make yourself a coffee” time. In this case, it was about six or seven minutes. The faster one finished in about a minute. But in SSIS-land, things are different. The particular data flow that was consuming this data was significant. It was being pumped into a Script Component to process each row based on previous rows, creating about a dozen different flows. The data flow would take roughly ten minutes to run – ten minutes from when the data first appeared. The query that completes faster – chosen by the Query Optimizer with no hints, based on accurate statistics (rather than pretending the numbers are smaller) – would take a minute to start getting the data into SSIS, at which point the ten-minute flow would start, taking eleven minutes to complete. The query that took longer – chosen by the Query Optimizer pretending it only wanted the first 10,000 rows – would take only ten seconds to fill the first buffer. Despite the fact that it might have taken the database another six or seven minutes to get the data out, SSIS didn’t care. Every time it wanted the next buffer of data, it was already available, and the whole process finished in about ten minutes and ten seconds. When debugging SSIS, you run the package, and sit there waiting to see the Debug information start appearing. You look for the numbers on the data flow, and seeing operators going Yellow and Green. Without the hint, I’d sit there for a minute. With the hint, just ten seconds. You can imagine which one I preferred. By adding this hint, it felt like a magic wand had been waved across the query, to make it run several times faster. It wasn’t the case at all – but it felt like it to SSIS.

Read the article
Oracle Tutor: Top 10 to Implement Sustainable Policies and Procedures

- by emily.chorba(at)oracle.com

Overview Your organization (executives, managers, and employees) understands the value of having written business process documents (process maps, procedures, instructions, reference documents, and form abstracts). Policies and procedures should be documented because they help to reduce the range of individual decisions and encourage management by exception: the manager only needs to give special attention to unusual problems, not covered by a specific policy or procedure. As more and more procedures are written to cover recurring situations, managers will begin to make decisions which will be consistent from one functional area to the next.Companies should take a project management approach when implementing an environment for a sustainable documentation program and do the following:1. Identify an Executive Champion2. Put together a winning team3. Assign ownership4. Centralize publishing5. Establish the Document Maintenance Process Up Front6. Document critical activities only7. Document actual practice8. Minimize documentation9. Support continuous improvement10. Keep it simple 1. Identify an Executive ChampionAppoint a top down driver. Select one key individual to be a mentor for the procedure planning team. The individual should be a senior manager, such as your company president, CIO, CFO, the vice-president of quality, manufacturing, or engineering. Written policies and procedures can be important supportive aids when known to express the thinking for the chief executive officer and / or the president and to have his or her full support. 2. Put Together a Winning TeamChoose a strong Project Management Leader and staff the procedure planning team with management members from cross functional groups. Make sure team members have the responsibility - and the authority - to make things happen.The winning team should consist of the Documentation Project Manager, Document Owners (one for each functional area), a Document Controller, and Document Specialists (as needed). The Tutor Implementation Guide has complete job descriptions for these roles. 3. Assign Ownership It is virtually impossible to keep process documentation simple and meaningful if employees who are far removed from the activity itself create it. It is impossible to keep documentation up-to-date when responsibility for the document is not clearly understood.Key to the Tutor methodology, therefore, is the concept of ownership. Each document has a single owner, who is responsible for ensuring that the document is necessary and that it reflects actual practice. The owner must be a person who is knowledgeable about the activity and who has the authority to build consensus among the persons who participate in the activity as well as the authority to define or change the way an activity is performed. The owner must be an advocate of the performers and negotiate, not dictate practices.In the Tutor environment, a document's owner is the only person with the authority to approve an update to that document. 4. Centralize Publishing Although it is tempting (especially in a networked environment and with document management software solutions) to decentralize the control of all documents -- with each owner updating and distributing his own -- Tutor promotes centralized publishing by assigning the Document Administrator (gate keeper) to manage the updates and distribution of the procedures library. 5. Establish a Document Maintenance Process Up Front (and stick to it) Everyone in your organization should know they are invited to suggest changes to procedures and should understand exactly what steps to take to do so. Tutor provides a set of procedures to help your company set up a healthy document control system. There are many document management products available to automate some of the document change and maintenance steps. Depending on the size of your organization, a simple document management system can reduce the effort it takes to track and distribute document changes and updates. Whether your company decides to store the written policies and procedures on a file server or in a database, the essential tasks for maintaining documents are the same, though some tasks are automated. 6. Document Critical Activities Only The best way to keep your documentation simple is to reduce the number of process documents to a bare minimum and to include in those documents only as much detail as is absolutely necessary. The first step to reducing process documentation is to document only those activities that are deemed critical. Not all activities require documentation. In fact, some critical activities cannot and should not be standardized. Others may be sufficiently documented with an instruction or a checklist and may not require a procedure. A document should only be created when it enhances the performance of the employee performing the activity. If it does not help the employee, then there is no reason to maintain the document. Activities that represent little risk (such as project status), activities that cannot be defined in terms of specific tasks (such as product research), and activities that can be performed in a variety of ways (such as advertising) often do not require documentation. Sometimes, an activity will evolve to the point where documentation is necessary. For example, an activity performed by single employee may be straightforward and uncomplicated -- that is, until the activity is performed by multiple employees. Sometimes, it is the interaction between co-workers that necessitates documentation; sometimes, it is the complexity or the diversity of the activity.7. Document Actual Practices The only reason to maintain process documentation is to enhance the performance of the employee performing the activity. And documentation can only enhance performance if it reflects reality -- that is, current best practice. Documentation that reflects an unattainable ideal or outdated practices will end up on the shelf, unused and forgotten.Documenting actual practice means (1) auditing the activity to understand how the work is really performed, (2) identifying best practices with employees who are involved in the activity, (3) building consensus so that everyone agrees on a common method, and (4) recording that consensus.8. Minimize Documentation One way to keep it simple is to document at the highest level possible. That is, include in your documents only as much detail as is absolutely necessary.When writing a document, you should ask yourself, What is the purpose of this document? That is, what problem will it solve?By focusing on this question, you can target the critical information.• What questions are the end users likely to have?• What level of detail is required?• Is any of this information extraneous to the document's purpose? Short, concise documents are user friendly and they are easier to keep up to date. 9. Support Continuous Improvement Employees who perform an activity are often in the best position to identify improvements to the process. In other words, continuous improvement is a natural byproduct of the work itself -- but only if the improvements are communicated to all employees who are involved in the process, and only if there is consensus among those employees.Traditionally, process documentation has been used to dictate performance, to limit employees' actions. In the Tutor environment, process documents are used to communicate improvements identified by employees. How does this work? The Tutor methodology requires a process document to reflect actual practice, so the owner of a document must routinely audit its content -- does the document match what the employees are doing? If it doesn't, the owner has the responsibility to evaluate the process, to build consensus among the employees, to identify "best practices," and to communicate these improvements via a document update. Continuous improvement can also be an outgrowth of corrective action -- but only if the solutions to problems are communicated effectively. The goal should be to solve a problem once and only once, which means not only identifying the solution, but ensuring that the solution becomes part of the process. The Tutor system provides the method through which improvements and solutions are documented and communicated to all affected employees in a cost-effective, timely manner; it ensures that improvements are not lost or confined to a single employee. 10. Keep it Simple Process documents don't have to be complex and unfriendly. In fact, the simpler the format and organization, the more likely the documents will be used. And the simpler the method of maintenance, the more likely the documents will be kept up-to-date. Keep it simply by:• Minimizing skills and training required• Following the established Tutor document format and layout• Avoiding technology just for technology's sake No other rule has as major an impact on the success of your internal documentation as -- keep it simple. Learn More For more information about Tutor, visit Oracle.Com or the Tutor Blog. Post your questions at the Tutor Forum. Emily Chorba Principle Product Manager Oracle Tutor & BPM

Read the article
Using Apache FOP from .NET level

- by Lukasz Kurylo

In one of my previous posts I was talking about FO.NET which I was using to generate a pdf documents from XSL-FO. FO.NET is one of the .NET ports of Apache FOP. Unfortunatelly it is no longer maintained. I known it when I decidec to use it, because there is a lack of available (free) choices for .NET to render a pdf form XSL-FO. I hoped in this implementation I will find all I need to create a pdf file with my really simple requirements. FO.NET is a port from some old version of Apache FOP and I found really quickly that there is a lack of some features that I needed, like dotted borders, double borders or support for margins. So I started to looking for some alternatives. I didn’t try the NFOP, another port of Apache FOP, because I found something I think much more better, the IKVM.NET project. IKVM.NET it is not a pdf renderer. So what it is? From the project site: IKVM.NET is an implementation of Java for Mono and the Microsoft .NET Framework. It includes the following components: a Java Virtual Machine implemented in .NET a .NET implementation of the Java class libraries tools that enable Java and .NET interoperability In the simplest form IKVM.NET allows to use a Java code library in the C# code and vice versa. I tried to use an Apache FOP, the best I think open source pdf –> XSL-FO renderer written in Java from my project written in C# using an IKVM.NET and it work like a charm. In the rest of the post I want to show, how to prepare a .NET *.dll class library from Apache FOP *.jar’s with IKVM.NET and generate a simple Hello world pdf document. To start playing with IKVM.NET and Apache FOP we need to download their packages: IKVM.NET Apache FOP and then unpack them. From the FOP directory copy all the *.jar’s files from lib and build catalogs to some location, e.g. d:\fop. Second step is to build the *.dll library from these files. On the console execute the following comand: ikvmc –target:library –out:d:\fop\fop.dll –recurse:d:\fop The ikvmc is located in the bin subdirectory where you unpacked the IKVM.NET. You must execute this command from this catalog, add this path to the global variable PATH or specify the full path to the bin subdirectory. In no error occurred during this process, the fop.dll library should be created. Right now we can create a simple project to test if we can create a pdf file. So let’s create a simple console project application and add reference to the fop.dll and the IKVM dll’s: IKVM.OpenJDK.Core and IKVM.OpenJDK.XML.API. Full code to generate a pdf file from XSL-FO template: static void Main(string[] args) { //initialize the Apache FOP FopFactory fopFactory = FopFactory.newInstance(); //in this stream we will get the generated pdf file OutputStream o = new DotNetOutputMemoryStream(); try { Fop fop = fopFactory.newFop("application/pdf", o); TransformerFactory factory = TransformerFactory.newInstance(); Transformer transformer = factory.newTransformer(); //read the template from disc Source src = new StreamSource(new File("HelloWorld.fo")); Result res = new SAXResult(fop.getDefaultHandler()); transformer.transform(src, res); } finally { o.close(); } using (System.IO.FileStream fs = System.IO.File.Create("HelloWorld.pdf")) { //write from the .NET MemoryStream stream to disc the generated pdf file var data = ((DotNetOutputMemoryStream)o).Stream.GetBuffer(); fs.Write(data, 0, data.Length); } Process.Start("HelloWorld.pdf"); System.Console.ReadLine(); } Apache FOP be default using a Java’s Xalan to work with XML files. I didn’t find a way to replace this piece of code with equivalent from .NET standard library. If any error or warning will occure during generating the pdf file, on the console will ge shown, that’s why I inserted the last line in the sample above. The DotNetOutputMemoryStream this is my wrapper for the Java OutputStream. I have created it to have the possibility to exchange data between the .NET <-> Java objects. It’s implementation: class DotNetOutputMemoryStream : OutputStream { private System.IO.MemoryStream ms = new System.IO.MemoryStream(); public System.IO.MemoryStream Stream { get { return ms; } } public override void write(int i) { ms.WriteByte((byte)i); } public override void write(byte[] b, int off, int len) { ms.Write(b, off, len); } public override void write(byte[] b) { ms.Write(b, 0, b.Length); } public override void close() { ms.Close(); } public override void flush() { ms.Flush(); } } The last thing we need, this is the HelloWorld.fo template. <?xml version="1.0" encoding="utf-8"?> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <fo:layout-master-set> <fo:simple-page-master master-name="simple" page-height="29.7cm" page-width="21cm" margin-top="1.8cm" margin-bottom="0.8cm" margin-left="1.6cm" margin-right="1.2cm"> <fo:region-body margin-top="3cm"/> <fo:region-before extent="3cm"/> <fo:region-after extent="1.5cm"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="simple"> <fo:flow flow-name="xsl-region-body"> <fo:block font-size="18pt" color="black" text-align="center"> Hello, World! </fo:block> </fo:flow> </fo:page-sequence> </fo:root> I’m not going to explain how how this template is created, because this will be covered in the near future posts. Generated pdf file should look that:

Read the article
Testing Workflows – Test-After

- by Timothy Klenke

Originally posted on: http://geekswithblogs.net/TimothyK/archive/2014/05/30/testing-workflows-ndash-test-after.aspxIn this post I’m going to outline a few common methods that can be used to increase the coverage of of your test suite. This won’t be yet another post on why you should be doing testing; there are plenty of those types of posts already out there. Assuming you know you should be testing, then comes the problem of how do I actual fit that into my day job. When the opportunity to automate testing comes do you take it, or do you even recognize it? There are a lot of ways (workflows) to go about creating automated tests, just like there are many workflows to writing a program. When writing a program you can do it from a top-down approach where you write the main skeleton of the algorithm and call out to dummy stub functions, or a bottom-up approach where the low level functionality is fully implement before it is quickly wired together at the end. Both approaches are perfectly valid under certain contexts. Each approach you are skilled at applying is another tool in your tool belt. The more vectors of attack you have on a problem – the better. So here is a short, incomplete list of some of the workflows that can be applied to increasing the amount of automation in your testing and level of quality in general. Think of each workflow as an opportunity that is available for you to take. Test workflows basically fall into 2 categories: test first or test after. Test first is the best approach. However, this post isn’t about the one and only best approach. I want to focus more on the lesser known, less ideal approaches that still provide an opportunity for adding tests. In this post I’ll enumerate some test-after workflows. In my next post I’ll cover test-first. Bug Reporting When someone calls you up or forwards you a email with a vague description of a bug its usually standard procedure to create or verify a reproduction plan for the bug via manual testing and log that in a bug tracking system. This can be problematic. Often reproduction plans when written down might skip a step that seemed obvious to the tester at the time or they might be missing some crucial environment setting. Instead of data entry into a bug tracking system, try opening up the test project and adding a failing unit test to prove the bug. The test project guarantees that all aspects of the environment are setup properly and no steps are missing. The language in the test project is much more precise than the English that goes into a bug tracking system. This workflow can easily be extended for Enhancement Requests as well as Bug Reporting. Exploratory Testing Exploratory testing comes in when you aren’t sure how the system will behave in a new scenario. The scenario wasn’t planned for in the initial system requirements and there isn’t an existing test for it. By definition the system behaviour is “undefined”. So write a new unit test to define that behaviour. Add assertions to the tests to confirm your assumptions. The new test becomes part of the living system specification that is kept up to date with the test suite. Examples This workflow is especially good when developing APIs. When you are finally done your production API then comes the job of writing documentation on how to consume the API. Good documentation will also include code examples. Don’t let these code examples merely exist in some accompanying manual; implement them in a test suite. Example tests and documentation do not have to be created after the production API is complete. It is best to write the example code (tests) as you go just before the production code. Smoke Tests Every system has a typical use case. This represents the basic, core functionality of the system. If this fails after an upgrade the end users will be hosed and they will be scratching their heads as to how it could be possible that an update got released with this core functionality broken. The tests for this core functionality are referred to as “smoke tests”. It is a good idea to have them automated and run with each build in order to avoid extreme embarrassment and angry customers. Coverage Analysis Code coverage analysis is a tool that reports how much of the production code base is exercised by the test suite. In Visual Studio this can be found under the Test main menu item. The tool will report a total number for the code coverage, which can be anywhere between 0 and 100%. Coverage Analysis shouldn’t be used strictly for numbers reporting. Companies shouldn’t set minimum coverage targets that mandate that all projects must have at least 80% or 100% test coverage. These arbitrary requirements just invite gaming of the coverage analysis, which makes the numbers useless. The analysis tool will break down the coverage by the various classes and methods in projects. Instead of focusing on the total number, drill down into this view and see which classes have high or low coverage. It you are surprised by a low number on a class this is an opportunity to add tests. When drilling through the classes there will be generally two types of reaction to a surprising low test coverage number. The first reaction type is a recognition that there is low hanging fruit to be picked. There may be some classes or methods that aren’t being tested, which could easy be. The other reaction type is “OMG”. This were you find a critical piece of code that isn’t under test. In both cases, go and add the missing tests. Test Refactoring The general theme of this post up to this point has been how to add more and more tests to a test suite. I’ll step back from that a bit and remind that every line of code is a liability. Each line of code has to be read and maintained, which costs money. This is true regardless whether the code is production code or test code. Remember that the primary goal of the test suite is that it be easy to read so that people can easily determine the specifications of the system. Make sure that adding more and more tests doesn’t interfere with this primary goal. Perform code reviews on the test suite as often as on production code. Hold the test code up to the same high readability standards as the production code. If the tests are hard to read then change them. Look to remove duplication. Duplicate setup code between two or more test methods that can be moved to a shared function. Entire test methods can be removed if it is found that the scenario it tests is covered by other tests. Its OK to delete a test that isn’t pulling its own weight anymore. Remember to only start refactoring when all the test are green. Don’t refactor the tests and the production code at the same time. An automated test suite can be thought of as a double entry book keeping system. The unchanging, passing production code serves as the tests for the test suite while refactoring the tests. As with all refactoring, it is best to fit this into your regular work rather than asking for time later to get it done. Fit this into the standard red-green-refactor cycle. The refactor step no only applies to production code but also the tests, but not at the same time. Perhaps the cycle should be called red-green-refactor production-refactor tests (not quite as catchy). That about covers most of the test-after workflows I can think of. In my next post I’ll get into test-first workflows.

Read the article
SQL University: What and why of database refactoring

- by Mladen Prajdic

This is a post for a great idea called SQL University started by Jorge Segarra also famously known as SqlChicken on Twitter. It’s a collection of blog posts on different database related topics contributed by several smart people all over the world. So this week is mine and we’ll be talking about database testing and refactoring. In 3 posts we’ll cover: SQLU part 1 - What and why of database testing SQLU part 2 - What and why of database refactoring SQLU part 3 - Tools of the trade This is a second part of the series and in it we’ll take a look at what database refactoring is and why do it. Why refactor a database To know why refactor we first have to know what refactoring actually is. Code refactoring is a process where we change module internals in a way that does not change that module’s input/output behavior. For successful refactoring there is one crucial thing we absolutely must have: Tests. Automated unit tests are the only guarantee we have that we haven’t broken the input/output behavior before refactoring. If you haven’t go back ad read my post on the matter. Then start writing them. Next thing you need is a code module. Those are views, UDFs and stored procedures. By having direct table access we can kiss fast and sweet refactoring good bye. One more point to have a database abstraction layer. And no, ORM’s don’t fall into that category. But also know that refactoring is NOT adding new functionality to your code. Many have fallen into this trap. Don’t be one of them and resist the lure of the dark side. And it’s a strong lure. We developers in general love to add new stuff to our code, but hate fixing our own mistakes or changing existing code for no apparent reason. To be a good refactorer one needs discipline and focus. Now we know that refactoring is all about changing inner workings of existing code. This can be due to performance optimizations, changing internal code workflows or some other reason. This is a typical black box scenario to the outside world. If we upgrade the car engine it still has to drive on the road (preferably faster) and not fly (no matter how cool that would be). Also be aware that white box tests will break when we refactor. What to refactor in a database Refactoring databases doesn’t happen that often but when it does it can include a lot of stuff. Let us look at a few common cases. Adding or removing database schema objects Adding, removing or changing table columns in any way, adding constraints, keys, etc… All of these can be counted as internal changes not visible to the data consumer. But each of these carries a potential input/output behavior change. Dropping a column can result in views not working anymore or stored procedure logic crashing. Adding a unique constraint shows duplicated data that shouldn’t exist. Foreign keys break a truncate table command executed from an application that runs once a month. All these scenarios are very real and can happen. With the proper database abstraction layer fully covered with black box tests we can make sure something like that does not happen (hopefully at all). Changing physical structures Physical structures include heaps, indexes and partitions. We can pretty much add or remove those without changing the data returned by the database. But the performance can be affected. So here we use our performance tests. We do have them, right? Just by adding a single index we can achieve orders of magnitude performance improvement. Won’t that make users happy? But what if that index causes our write operations to crawl to a stop. again we have to test this. There are a lot of things to think about and have tests for. Without tests we can’t do successful refactoring! Fixing bad code We all have some bad code in our systems. We usually refer to that code as code smell as they violate good coding practices. Examples of such code smells are SQL injection, use of SELECT *, scalar UDFs or cursors, etc… Each of those is huge code smell and can result in major code changes. Take SELECT * from example. If we remove a column from a table the client using that SELECT * statement won’t have a clue about that until it runs. Then it will gracefully crash and burn. Not to mention the widely unknown SELECT * view refresh problem that Tomas LaRock (@SQLRockstar on Twitter) and Colin Stasiuk (@BenchmarkIT on Twitter) talk about in detail. Go read about it, it’s informative. Refactoring this includes replacing the * with column names and most likely change to application using the database. Breaking apart huge stored procedures Have you ever seen seen a stored procedure that was 2000 lines long? I have. It’s not pretty. It hurts the eyes and sucks the will to live the next 10 minutes. They are a maintenance nightmare and turn into things no one dares to touch. I’m willing to bet that 100% of time they don’t have a single test on them. Large stored procedures (and functions) are a clear sign that they contain business logic. General opinion on good database coding practices says that business logic has no business in the database. That’s the applications part. Refactoring such behemoths requires writing lots of edge case tests for the stored procedure input/output behavior and then start to refactor it. First we split the logic inside into smaller parts like new stored procedures and UDFs. Those then get called from the master stored procedure. Once we’ve successfully modularized the database code it’s best to transfer that logic into the applications consuming it. This only leaves the stored procedure with common data manipulation logic. Of course this isn’t always possible so having a plethora of performance and behavior unit tests is absolutely necessary to confirm we’ve actually improved the codebase in some way. Refactoring is not a popular chore amongst developers or managers. The former don’t like fixing old code, the latter can’t see the financial benefit. Remember how we talked about being lousy at estimating future costs in the previous post? But there comes a time when it must be done. Hopefully I’ve given you some ideas how to get started. In the last post of the series we’ll take a look at the tools to use and an example of testing and refactoring.

Read the article
Answers to Your Common Oracle Database Lifecycle Management Questions

- by Scott McNeil

We recently ran a live webcast on Strategies for Managing Oracle Database's Lifecycle. There were tons of questions from our audience that we simply could not get to during the hour long presentation. Below are some of those questions along with their answers. Enjoy! Question: In the webcast the presenter talked about “gold” configuration standards, for those who want to use this technique, could you recommend a best practice to consider or follow? How do I get started? Answer:Gold configuration standardization is a quick and easy way to improve availability through consistency. Start by choosing a reference database and saving the configuration to the Oracle Enterprise Manager repository using the Save Configuration feature. Next create a comparison template using the Oracle provided template as a starting point and modify the ignored properties to eliminate expected differences in your environment. Finally create a comparison specification using the comparison template you created plus your saved gold configuration and schedule it to run on a regular basis. Don’t forget to fill in the email addresses of those you want to notify upon drift detection. Watch the database configuration management demo to learn more. Question: Can Oracle Lifecycle Management Pack for Database help with patching an Oracle Real Application Cluster (RAC) environment? Answer: Yes, Oracle Enterprise Manager supports both parallel and rolling patch application of Oracle Real Application Clusters. The use of rolling patching is recommended as there is no downtime involved. For more details watch this demo. Question: What are some of the things administrators can do to control configuration drift? Why is it important? Answer:Configuration drift is one of the main causes of instability and downtime of applications. Oracle Enterprise Manager makes it easy to manage and control drift using scheduled configuration comparisons combined with comparison templates. Question: Does Oracle Enterprise Manager 12c Release 2 offer an incremental update feature for "gold" images? For instance, if the source binary has a higher PSU level, what is the best approach to update the existing "gold" image in the software library? Do you have to create a new image or can you just update the original one? Answer:Provisioning Profiles (Gold images) can contain the installation files and database configuration templates. Although it is possible to make some changes to the profile after creation (mainly to configuration), it is normally recommended to simply create a new profile after applying a patch to your reference database. Question: The webcast talked about enforcing in-house standards, does Oracle Enterprise Manager 12c offer verification of your databases and systems to those standards? For example, the initial "gold" image has been massively deployed over time, and there may be some changes to it. How can you do regular checks from Enterprise Manager to ensure the in-house standards are being enforced? Answer:There are really two methods to validate conformity to standards. The first method is to use gold standards which you compare other databases to report unwanted differences. This method uses a new comparison template technology which allows users to ignore known differences (i.e. SID, Start time, etc) which results in a report only showing important or non-conformant differences. This method is quick to setup and configure and recommended for those who want to get started validating compliance quickly. The second method leverages the new compliance framework which allows the creation of specific and robust validations. These compliance rules are grouped into standards which can be assigned to databases quickly and easily. Compliance rules allow for targeted and more sophisticated validation beyond the basic equals operation available in the comparison method. The compliance framework can be used to implement just about any internal or industry standard. The compliance results will track current and historic compliance scores at the overall and individual database targets. When the issue is resolved, the score is automatically affected. Compliance framework is the recommended long term solution for validating compliance using Oracle Enterprise Manager 12c. Check out this demo on database compliance to learn more. Question: If you are using the integration between Oracle Enterprise Manager and My Oracle Support in an "offline" mode, how do you know if you have the latest My Oracle Support metadata? Answer:In Oracle Enterprise Manager 12c Release 2, you now only need to download one zip file containing all of the metadata xmls files. There is no indication that the metadata has changed but you could run a checksum on the file and compare it to the previously downloaded version to see if it has changed. Question: What happens if a patch fails while administrators are applying it to a database or system? Answer:A large portion of Oracle Enterprise Manager's patch automation is the pre-requisite checks that happen to ensure the highest level of confidence the patch will successfully apply. It is recommended you test the patch in a non-production environment and save the patch plan as a template once successful so you can create new plans using the saved template. If you are using the recommended ‘out of place’ patching methodology, there is no urgency because the database is still running as the cloned Oracle home is being patched. Users can address the issue and restart the patch procedure at the point it left off. If you are using 'in place' method, you can address the issue and continue where the procedure left off. Question: Can Oracle Enterprise Manager 12c R2 compare configurations between more than one target at the same time? Answer:Oracle Enterprise Manager 12c can compare any number of target configurations at one time. This is the basis of many important use cases including Configuration Drift Management. These comparisons can also be scheduled on a regular basis and emails notification sent should any differences appear. To learn more about configuration search and compare watch this demo. Question: How is data comparison done since changes are taking place in a live production system? Answer:There are many things to keep in mind when using the data comparison feature (as part of the Change Management ability to compare table data). It was primarily intended to be used for maintaining consistency of important but relatively static data. For example, application seed data and application setup configuration. This data does not change often but is critical when testing an application to ensure results are consistent with production. It is not recommended to use data comparison on highly dynamic data like transactional tables or very large tables. Question: Which versions of Oracle Database can be monitored through Oracle Enterprise Manager 12c? Answer:Oracle Database versions: 9.2.0.8, 10.1.0.5, 10.2.0.4, 10.2.0.5, 11.1.0.7, 11.2.0.1, 11.2.0.2, 11.2.0.3. Watch the On-Demand Webcast Stay Connected: Twitter | Facebook | YouTube | Linkedin | NewsletterDownload the Oracle Enterprise Manager Cloud Control12c Mobile app

Read the article
ASP.NET MVC 3 Hosting :: Error Handling and CustomErrors in ASP.NET MVC 3 Framework

- by C. Miller

So, what else is new in MVC 3? MVC 3 now has a GlobalFilterCollection that is automatically populated with a HandleErrorAttribute. This default FilterAttribute brings with it a new way of handling errors in your web applications. In short, you can now handle errors inside of the MVC pipeline. What does that mean? This gives you direct programmatic control over handling your 500 errors in the same way that ASP.NET and CustomErrors give you configurable control of handling your HTTP error codes. How does that work out? Think of it as a routing table specifically for your Exceptions, it's pretty sweet! Global Filters The new Global.asax file now has a RegisterGlobalFilters method that is used to add filters to the new GlobalFilterCollection, statically located at System.Web.Mvc.GlobalFilter.Filters. By default this method adds one filter, the HandleErrorAttribute. public class MvcApplication : System.Web.HttpApplication { public static void RegisterGlobalFilters(GlobalFilterCollection filters) { filters.Add(new HandleErrorAttribute()); } HandleErrorAttributes The HandleErrorAttribute is pretty simple in concept: MVC has already adjusted us to using Filter attributes for our AcceptVerbs and RequiresAuthorization, now we are going to use them for (as the name implies) error handling, and we are going to do so on a (also as the name implies) global scale. The HandleErrorAttribute has properties for ExceptionType, View, and Master. The ExceptionType allows you to specify what exception that attribute should handle. The View allows you to specify which error view (page) you want it to redirect to. Last but not least, the Master allows you to control which master page (or as Razor refers to them, Layout) you want to render with, even if that means overriding the default layout specified in the view itself. public class MvcApplication : System.Web.HttpApplication { public static void RegisterGlobalFilters(GlobalFilterCollection filters) { filters.Add(new HandleErrorAttribute { ExceptionType = typeof(DbException), // DbError.cshtml is a view in the Shared folder. View = "DbError", Order = 2 }); filters.Add(new HandleErrorAttribute()); }Error Views All of your views still work like they did in the previous version of MVC (except of course that they can now use the Razor engine). However, a view that is used to render an error can not have a specified model! This is because they already have a model, and that is System.Web.Mvc.HandleErrorInfo @model System.Web.Mvc.HandleErrorInfo @{ ViewBag.Title = "DbError"; } <h2>A Database Error Has Occurred</h2> @if (Model != null) { <p>@Model.Exception.GetType().Name<br /> thrown in @Model.ControllerName @Model.ActionName</p> }Errors Outside of the MVC Pipeline The HandleErrorAttribute will only handle errors that happen inside of the MVC pipeline, better known as 500 errors. Errors outside of the MVC pipeline are still handled the way they have always been with ASP.NET. You turn on custom errors, specify error codes and paths to error pages, etc. It is important to remember that these will happen for anything and everything outside of what the HandleErrorAttribute handles. Also, these will happen whenever an error is not handled with the HandleErrorAttribute from inside of the pipeline. <system.web> <customErrors mode="On" defaultRedirect="~/error"> <error statusCode="404" redirect="~/error/notfound"></error> </customErrors>Sample Controllers public class ExampleController : Controller { public ActionResult Exception() { throw new ArgumentNullException(); } public ActionResult Db() { // Inherits from DbException throw new MyDbException(); } } public class ErrorController : Controller { public ActionResult Index() { return View(); } public ActionResult NotFound() { return View(); } } Putting It All Together If we have all the code above included in our MVC 3 project, here is how the following scenario's will play out: 1. A controller action throws an Exception. You will remain on the current page and the global HandleErrorAttributes will render the Error view. 2. A controller action throws any type of DbException. You will remain on the current page and the global HandleErrorAttributes will render the DbError view. 3. Go to a non-existent page. You will be redirect to the Error controller's NotFound action by the CustomErrors configuration for HTTP StatusCode 404. But don't take my word for it, download the sample project and try it yourself. Three Important Lessons Learned For the most part this is all pretty straight forward, but there are a few gotcha's that you should remember to watch out for: 1) Error views have models, but they must be of type HandleErrorInfo. It is confusing at first to think that you can't control the M in an MVC page, but it's for a good reason. Errors can come from any action in any controller, and no redirect is taking place, so the view engine is just going to render an error view with the only data it has: The HandleError Info model. Do not try to set the model on your error page or pass in a different object through a controller action, it will just blow up and cause a second exception after your first exception! 2) When the HandleErrorAttribute renders a page, it does not pass through a controller or an action. The standard web.config CustomErrors literally redirect a failed request to a new page. The HandleErrorAttribute is just rendering a view, so it is not going to pass through a controller action. But that's ok! Remember, a controller's job is to get the model for a view, but an error already has a model ready to give to the view, thus there is no need to pass through a controller. That being said, the normal ASP.NET custom errors still need to route through controllers. So if you want to share an error page between the HandleErrorAttribute and your web.config redirects, you will need to create a controller action and route for it. But then when you render that error view from your action, you can only use the HandlerErrorInfo model or ViewData dictionary to populate your page. 3) The HandleErrorAttribute obeys if CustomErrors are on or off, but does not use their redirects. If you turn CustomErrors off in your web.config, the HandleErrorAttributes will stop handling errors. However, that is the only configuration these two mechanisms share. The HandleErrorAttribute will not use your defaultRedirect property, or any other errors registered with customer errors. In Summary The HandleErrorAttribute is for displaying 500 errors that were caused by exceptions inside of the MVC pipeline. The custom errors are for redirecting from error pages caused by other HTTP codes.

Read the article
Persisting Session Between Different Browser Instances

- by imran_ku07

Introduction: By default inproc session's identifier cookie is saved in browser memory. This cookie is known as non persistent cookie identifier. This simply means that if the user closes his browser then the cookie is immediately removed. On the other hand cookies which stored on the user’s hard drive and can be reused for later visits are called persistent cookies. Persistent cookies are less used than nonpersistent cookies because of security. Simply because nonpersistent cookies makes session hijacking attacks more difficult and more limited. If you are using shared computer then there are lot of chances that your persistent session will be used by other shared members. However this is not always the case, lot of users desired that their session will remain persisted even they open two instances of same browser or when they close and open a new browser. So in this article i will provide a very simple way to persist your session even the browser is closed. Description: Let's create a simple ASP.NET Web Application. In this article i will use Web Form but it also works in MVC. Open Default.aspx.cs and add the following code in Page_Load. protected void Page_Load(object sender, EventArgs e) { if (Session["Message"] != null) Response.Write(Session["Message"].ToString()); Session["Message"] = "Hello, Imran"; } This page simply shows a message if a session exist previously and set the session. Now just run the application, you will just see an empty page on first try. After refreshing the page you will see the Message "Hello, Imran". Now just close the browser and reopen it or just open another browser instance, you will get the exactly same behavior when you run your application first time . Why the session is not persisted between browser instances. The simple reason is non persistent session cookie identifier. The session cookie identifier is not shared between browser instances. Now let's make it persistent. To make your application share session between different browser instances just add the following code in global.asax. protected void Application_PostMapRequestHandler(object sender, EventArgs e) { if (Request.Cookies["ASP.NET_SessionIdTemp"] != null) { if (Request.Cookies["ASP.NET_SessionId"] == null) Request.Cookies.Add(new HttpCookie("ASP.NET_SessionId", Request.Cookies["ASP.NET_SessionIdTemp"].Value)); else Request.Cookies["ASP.NET_SessionId"].Value = Request.Cookies["ASP.NET_SessionIdTemp"].Value; } } protected void Application_PostRequestHandlerExecute(object sender, EventArgs e) { HttpCookie cookie = new HttpCookie("ASP.NET_SessionIdTemp", Session.SessionID); cookie.Expires = DateTime.Now.AddMinutes(Session.Timeout); Response.Cookies.Add(cookie); } This code simply state that during Application_PostRequestHandlerExecute(which is executed after HttpHandler) just add a persistent cookie ASP.NET_SessionIdTemp which contains the value of current user SessionID and sets the timeout to current user session timeout. In Application_PostMapRequestHandler(which is executed just before th session is restored) we just check whether the Request cookie contains ASP.NET_SessionIdTemp. If yes then just add or update ASP.NET_SessionId cookie with ASP.NET_SessionIdTemp. So when a new browser instance is open, then a check will made that if ASP.NET_SessionIdTemp exist then simply add or update ASP.NET_SessionId cookie with ASP.NET_SessionIdTemp. So run your application again, you will get the last closed browser session(if it is not expired). Summary: Persistence session is great way to increase the user usability. But always beware the security before doing this. However there are some cases in which you might need persistence session. In this article i just go through how to do this simply. So hopefully you will again enjoy this simple article too.

Read the article
Day 3 - XNA: Hacking around with images

- by dapostolov

Yay! Today I'm going to get into some code! My mind has been on this all day! I find it amusing how I practice, daily, to be "in the moment" or "present" and the excitement and anticipation of this project seems to snatch it away from me frequently. WELL!!! (Shakes Excitedly) Let's do this =)! Let's code! For these next few days it is my intention to better understand image rendering using XNA; after said prototypes are complete I should (fingers crossed) be able to dive into my game code using the design document I hammered out the other night. On a personal note, I think the toughest thing right now is finding the time to do this project. Each night, after my little ones go to bed I can only really afford a couple hours of work on this project. However, I hope to utilise this time as best as I can because this is the first time in a while I've found a project that I've been passionate about. A friend recently asked me if I intend to go 3D or extend the game design. Yes. For now I'm keeping it simple. Lastly, just as a note, as I was doing some further research into image rendering this morning I came across some other XNA content and lessons learned. I believe this content could have probably been posted in the first couple of posts, however, I will share the new content as I learn it at the end of each day. Maybe I'll take some time later to fix the posts but for now Installation and Deployment - Lessons Learned I had installed the XNA studio (Day 1) and the site instructions were pretty easy to follow. However, I had a small difficulty with my development environment. You see, I run a virtual desktop development environment. Even though I was able to code and compile all the tutorials the game failed to run...because I lacked a 3D capable card; it was not detected on the virtual box... First Lesson: The XNA runtime needs to "see" the 3D card! No sweat, Il copied the files over to my parent box and executed the program. ERROR. Hmm... Second Lesson (which I should have probably known but I let the excitement get the better of me): you need the XNA runtime on the client PC to run the game, oh, and don't forget the .Net Runtime! Sprite, it ain't just a Soft Drink... With these prototypes I intend to understand and perform the following tasks. learn game development terminology how to place and position (rotate) a static image on the screen how to layer static images on the screen understand image scaling can we reuse images? understand how framerate is handled in XNA how to display text , basic shapes, and colors on the screen how to interact with an image (collision of user input?) how to animate an image and understand basic animation techniques how to detect colliding images or screen edges how to manipulate the image, lets say colors, stretching how to focus on a segment of an image...like only displaying a frame on a film reel what's the best way to manage images (compression, storage, location, prevent artwork theft, etc.) Well, let's start with this "prototype" task list for now...Today, let's get an image on the screen and maybe I can mark a few of the tasks as completed... C# Prototype1 New Visual Studio Project Select the XNA Game Studio 3.1 Project Type Select the Windows Game 3.1 Template Type Prototype1 in the Name textbox provided Press OK. At this point code has auto-magically been created. Feel free to press the F5 key to run your first XNA program. You should have a blue screen infront of you. Without getting into the nitty gritty right, the code that was generated basically creates some basic code to clear the window content with the lovely CornFlowerBlue color. Something to notice, when you move your mouse into the window...nothing. ooooo spoooky. Let's put an image on that screen! Step A - Get an Image into the solution Under "Content" in your Solution Explorer, right click and add a new folder and name it "Sprites". Copy a small image in there; I copied a "Royalty Free" wizard hat from a quick google search and named it wizards_hat.jpg (rightfully so!) Step B - Add the sprite and position fields Now, open/edit Game1.cs Locate the following line: SpriteBatch spriteBatch; Under this line type the following: SpriteBatch spriteBatch; // the line you are looking for... Texture2D sprite; Vector2 position; Step C - Load the image asset Locate the "Load Content" Method and duplicate the following: protected override void LoadContent() { spriteBatch = new SpriteBatch(GraphicsDevice); // your image name goes here... sprite = Content.Load<Texture2D>("Sprites\\wizards_hat"); position = new Vector2(200, 100); base.LoadContent(); } Step D - Draw the image Locate the "Draw" Method and duplicate the following: protected override void Draw(GameTime gameTime) { GraphicsDevice.Clear(Color.CornflowerBlue); spriteBatch.Begin(SpriteBlendMode.AlphaBlend); spriteBatch.Draw(sprite, position, Color.White); spriteBatch.End(); base.Draw(gameTime); } Step E - Compile and Run Engage! (F5) - Debug! Your image should now display on a cornflowerblue window about 200 pixels from the left and 100 pixels from the top. Awesome! =) Pretty cool how we only coded a few lines to display an image, but believe me, there is plenty going on behind the scenes. However, for now, I'm going to call it a night here. Blogging all this progress certainly takes time... However, tomorrow night I'm going to detail what we just did, plus start checking off points on that list! I'm wondering right now if I should add pictures / code to this post...let me know if you want them =) Best Regards, D.

Read the article

Search Results

Search found 6152 results on 247 pages for 'known'.

Page 227/247 | < Previous Page | 223 224 225 226 227 228 229 230 231 232 233 234 | Next Page >

- by Pinal Dave

- by Bertrand Le Roy

- by Rob Farley

- by Rob Farley

- by David V. Corbin

- by BuckWoody

- by Sysadmin Geek

- by Bertrand Le Roy

- by Simon Cooper

- by hajan

- by Madhu Nair

- by Simon Cooper

- by Dejan Sarka

- by Andreea Vaduva

- by Simon Cooper

- by Rob Farley

- by emily.chorba(at)oracle.com

- by Lukasz Kurylo

- by Timothy Klenke

- by Mladen Prajdic

- by Scott McNeil

- by C. Miller

- by imran_ku07

- by dapostolov

< Previous Page | 223 224 225 226 227 228 229 230 231 232 233 234 | Next Page >