Search Results

Search found 4176 results on 168 pages for 'graph algorithms'.

Page 156/168 | < Previous Page | 152 153 154 155 156 157 158 159 160 161 162 163 | Next Page >

Social Business Forum Milano: Day 2

- by me

@YourService. The business world has flipped and small business can capitalize by Frank Eliason (twitter: @FrankEliason ) Technology and social media tools have made it easier than ever for companies to communicate with consumers. They can listen and join in on conversations, solve problems, get instant feedback about their products and services, and more. So why, then, are most companies not doing this? Instead, it seems as if customer service is at an all time low, and that the few companies who are choosing to focus on their customers are experiencing a great competitive advantage. At Your Service explains the importance of refocusing your business on your customers and your employees, and just how to do it. Explains how to create a culture of empowered employees who understand the value of a great customer experience Advises on the need to communicate that experience to their customers and potential customers Frank Eliason, recognized by BusinessWeek as the 'most famous customer service manager in the US, possibly in the world,' has built a reputation for helping large businesses improve the way they connect with customers and enhance their relationships Quotes from the Audience: Bertrand Duperrin ?@bduperrin social service is not about shutting up the loudest cutsomers ! #sbf12 @frankeliason Paolo Pelloni ?@paolopelloniGautam Ghosh ?@GautamGhosh RT @cecildijoux: #sbf12 @frankeliason you need to change things and fix the approach it's not about social media it's about driving change Peter H. Reiser ?@peterreiser #sbf12 Company Experience = Product Experience + Customer Interactions + Employee Experience @yourservice Engage or lose! Socialize, mobilize, conversify: engage your employees to improve business performance Christian Finn (twitter: @cfinn) First Christian was presenting the flying monkey Then he outlined the four principals to fix the Intranet: 1. Socalize the Intranet 2. Get Thee to a Single Repository 3. Mobilize the Intranet 4. Conversationalize Your Processes Quotes from the Audience: Oscar Berg ?@oscarberg Engaged employees think their work bring out the best of their ideas @cfinn #sbf12 http://pic.twitter.com/68eddp48 John Stepper ?@johnstepper I like @cfinn's "conversify your processes" A nice related concept to "narrating your work", part of working out loud. http://johnstepper.com/2012/05/26/working-out-loud-your-personal-content-strategy/ Oscar Berg ?@oscarberg Organizations are talent markets - socializing your intranet makes this market function better @cfinn #sbf12 For profit, productivity, and personal benefit: creating a collaborative culture at Deutsche Bank John Stepper (twitter:@johnstepper) Driving adoption of collaboration + social media platforms at Deutsche Bank. John shared some great best practices on how to deploy an enterprise wide community model in a large company. He started with the most important question What is the commercial value of adding social ? Then he talked about the success of Community of Practices deployment and outlined some key use cases including the relevant measures to proof the ROI of the investment. Examples: Community of practice -> measure: systematic collection of value stories Self-service website -> measure: based on representative models Optimizing asset inventory - > measure: Actual counts This use case was particular interesting. It is a crowd sourced spending/saving of infrastructure model. User can cancel IT services they don't need (as example Software xx). 5% of the saving goes to social responsibility projects. The John outlined some best practices on how to address the WIIFM (What's In It For Me) question of the individual users: - change from hierarchy to graph - working out loud = observable work + narrating your work - add social skills to career objectives - example: building a purposeful social network course/training as part of the job development curriculum And last but not least John gave some important tips on how to get senior management buy-in by establishing management sponsored division level collaboration boards which defines clear uses cases and measures. This divisional use cases are then implemented using a common social platform. Thanks John - I learned a lot from your presentation! Quotes from the Audience: Ana Silva ?@AnaDataGirl #sbf12 what's in it for individuals at Deutsche Bank? Shapping their reputations in a big org says @johnstepper #e20Ana Silva ?@AnaDataGirl Any reason why not? MT @magatorlibero #sbf12 is Deutsche B. experience on applying social inside company applicable to Italian people? Oscar Berg ?@oscarberg Your career is not a ladder, it is a network that opens up opportunities - @johnstepper #sbf12 Oscar Berg ?@oscarberg @johnstepper: Institutionalizing collaboration is next - collaboration woven into the fabric of daily work #sbf12 Ana Silva ?@AnaDataGirl #sbf12 @johnstepper talking about how Deutsche Bank is using #socbiz to build purposeful CoP & save money

Read the article
Team Leaders & Authors - Manage and Report Workflow using "Print an Outline" in UPK

- by [email protected]

Did you know you can "print an outline?" You can print any outline or portion of an outline. Why might you want to "print an outline" in UPK... Have you ever wondered how many topics you have recorded, how many of your topics are ready for review, or even better, how many topics are complete! Do you need to report your project status to management? Maybe you just like to have a copy of your outline to refer to during development. Included in this output is the outline structure as well as the layout defined in the Details View of the Outline Editor. To print an outline, you must open either a module or section in the Outline Editor. A set of default data columns is automatically included in the output; however, you can configure which columns you want to appear in the report by switching to the Details view and customizing the columns. (To learn more about customizing your columns refer to the Add and Remove Columns section of the Content Development.pdf guide) To print an outline from the Outline Editor: 1. Open a module or section document in the Outline Editor. 2. Expand the documents to display the details that you want included in the report. 3. On the File menu, choose Print and use the toolbar icons to print, view, or save the report to a file. Personally, I opt to save my outline in Microsoft Excel. Using the delivered features of Microsoft Excel you can add columns of information, such as development notes, to your outline or you can graph and chart your Project status. As mentioned above you can configure what columns you want to appear in the outline. When utilizing the Print an Outline feature in conjunction with the Managing Workflow features of the UPK Multi-user instance you as a Team Lead or Author can better report project status. Read more about Managing Workflow below. Managing Workflow: The Properties toolpane contains special properties that allow authors to track document status or State as well as assign Document Ownership. Assign Content State The State property is an editable property for communicating the status of a document. This is particularly helpful when collaborating with other authors in a development team. Authors can assign a state to documents from the master list defined by the administrator. The default list of States includes (blank), Not Started, Draft, In Review, and Final. Administrators can customize the list by adding, deleting or renaming the values. To assign a State value to a document: 1. Make sure you are working online. 2. Display the Properties toolpane. 3. Select the document(s) to which you want to assign a state. Note: You can select multiple documents using the standard Windows selection keys (CTRL+click and SHIFT+click). 4. In the Workflow category, click in the State cell. 5. Select a value from the list. Assign Document Ownership In many enterprises, multiple authors often work together developing content in a team environment. Team leaders typically handle large projects by assigning specific development responsibilities to authors. The Owner property allows team leaders and authors to assign documents to themselves and other authors to track who is responsible for a specific document. You view and change document assignments for a document using the Owner property in the Properties toolpane. To assign a document owner: 1. Make sure you are working online. 2. On the View menu, choose Properties. 3. Select the document(s) to which you want to assign document responsibility. Note: You can select multiple documents using the standard Windows selection keys (CTRL+click and SHIFT+click). 4. In the Workflow category, click in the Owner cell. 5. Select a name from the list. Is anyone out there already using this feature? Share your ideas with the group. Those of you new to this feature, give it a test drive and let us know what you think. - Kathryn Lustenberger, Oracle UPK & Tutor Outbound Product Management

Read the article
SQL SERVER – SSMS: Top Object and Batch Execution Statistics Reports

- by Pinal Dave

The month of June till mid of July has been the fever of sports. First, it was Wimbledon Tennis and then the Soccer fever was all over. There is a huge number of fan followers and it is great to see the level at which people sometimes worship these sports. Being an Indian, I cannot forget to mention the India tour of England later part of July. Following these sports and as the events unfold to the finals, there are a number of ways the statisticians can slice and dice the numbers. Cue from soccer I can surely say there is a team performance against another team and then there is individual member fairs against a particular opponent. Such statistics give us a fair idea to how a team in the past or in the recent past has fared against each other, head-to-head stats during World cup and during other neutral venue games. All these statistics are just pointers. In reality, they don’t reflect the calibre of the current team because the individuals who performed in each of these games are totally different (Typical example being the Brazil Vs Germany semi-final match in FIFA 2014). So at times these numbers are misleading. It is worth investigating and get the next level information. Similar to these statistics, SQL Server Management studio is also equipped with a number of reports like a) Object Execution Statistics report and b) Batch Execution Statistics reports. As discussed in the example, the team scorecard is like the Batch Execution statistics and individual stats is like Object Level statistics. The analogy can be taken only this far, trust me there is no correlation between SQL Server functioning and playing sports – It is like I think about diet all the time except while I am eating. Performance – Batch Execution Statistics Let us view the first report which can be invoked from Server Node -> Reports -> Standard Reports -> Performance – Batch Execution Statistics. Most of the values that are displayed in this report come from the DMVs sys.dm_exec_query_stats and sys.dm_exec_sql_text(sql_handle). This report contains 3 distinctive sections as outline below. Section 1: This is a graphical bar graph representation of Average CPU Time, Average Logical reads and Average Logical Writes for individual batches. The Batch numbers are indicative and the details of individual batch is available in section 3 (detailed below). Section 2: This represents a Pie chart of all the batches by Total CPU Time (%) and Total Logical IO (%) by batches. This graphical representation tells us which batch consumed the highest CPU and IO since the server started, provided plan is available in the cache. Section 3: This is the section where we can find the SQL statements associated with each of the batch Numbers. This also gives us the details of Average CPU / Average Logical Reads and Average Logical Writes in the system for the given batch with object details. Expanding the rows, I will also get the # Executions and # Plans Generated for each of the queries. Performance – Object Execution Statistics The second report worth a look is Object Execution statistics. This is a similar report as the previous but turned on its head by SQL Server Objects. The report has 3 areas to look as above. Section 1 gives the Average CPU, Average IO bar charts for specific objects. The section 2 is a graphical representation of Total CPU by objects and Total Logical IO by objects. The final section details the various objects in detail with the Avg. CPU, IO and other details which are self-explanatory. At a high-level both the reports are based on queries on two DMVs (sys.dm_exec_query_stats and sys.dm_exec_sql_text) and it builds values based on calculations using columns in them: SELECT * FROM sys.dm_exec_query_stats s1 CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS s2 WHERE s2.objectid IS NOT NULL AND DB_NAME(s2.dbid) IS NOT NULL ORDER BY s1.sql_handle; This is one of the simplest form of reports and in future blogs we will look at more complex reports. I truly hope that these reports can give DBAs and developers a hint about what is the possible performance tuning area. As a closing point I must emphasize that all above reports pick up data from the plan cache. If a particular query has consumed a lot of resources earlier, but plan is not available in the cache, none of the above reports would show that bad query. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

Read the article
Selling Visual Studio ALM

- by Tarun Arora

Introduction As a consultant I have been selling Application Lifecycle Management services using Visual Studio and Team Foundation Server. I’ve been contacted various times by friends working in organization telling me that ALM processes in their company were benchmarked when dinosaurs walked the earth. Most of these individuals already know the great features Microsoft ALM tools offer and are keen to start a conversation with the CIO but don’t exactly know where to start. It is very important how you engage in your first conversation, if you start the conversation with ‘There is this great tooling from Microsoft which offers amazing features to boost developer productivity, … ‘ from experience I can tell you the reply from your CIO would be ‘I already know! Our existing landscape has a combination of bleeding edge open source and cutting edge licensed tools which already cover these features quite well, more over Microsoft products have a high licensing cost associated to them.’ You will always find it harder to sell by feature, the trick is to highlight the gap in the existing processes & tools and then highlight the impact of these gaps to the overall development processes, by now you would have captured enough attention to show off how the ALM tooling offered by Microsoft not only fills those gaps but offers great value adds to take their development practices to the next level. Rangers ALM Assessment Guide Image 1 – Welcome! First look at the Rangers ALM assessment guide Most organization already have some processes in place to cover aspects of ALM. How do you go about proving that there isn’t enough cover in place? This is where Visual Studio ALM Rangers ALM Assessment guide can help. The ALM assessment guide is really a tool that helps you gather information about Development practices and processes within a customer's environment. Several questionnaires are used to identify the current state of individual development lifecycle areas and decide on a desired state for those processes. It also presents guidance and roll-up summaries to help with recommendations moving forward. The ALM Rangers assessment guide can be downloaded from here. Image 2 – ALM Assessment guide divided into different functions of SDLC The assessment guide is divided into different functions of Software Development Lifecycle (listed below), this gives you the ability to access how mature the company is in different areas of SDLC. Architecture & Design Requirement Engineering & UX Development Software Configuration Management Governance Deployment & Operations Testing & Quality Assurance Project Planning & Management Each section has a set of questions, fill in the assessment by selecting “Never/Sometimes/Always” from the Answer column in the question sheets. Each answer has weightage to the overall score. Each question has a link next to it, clicking the link takes you to the Reference sheet which gives you more details about the question along with a reason for “why you need to ask this question?”, “other ways to phrase the question” and “what to expect as an answer from the customer”. The trick is to engage the customer in a discussion. You need to probe a lot, listen to the customer and have a discussion with several team members, preferably without management to ensure that you receive candid feedback. This reminds me of a funny incident when during an ALM review a customer told me that they have a sophisticated semi-automated application deployment process, further discussions revealed that deployment actually involved 72 manual configuration steps per production node. Such observations can be recorded in the Issue Brainstorming worksheet for further consideration later. It is also worth mentioning the different levels of ALM maturity to the customer. By default the desired state of ALM maturity is set to Standard, it is possible to set a desired state by area, you should strive for Advanced or Dynamic, it always helps by explaining the classification and advantages. Image 3 – ALM levels by description The ALM assessment guide helps you arrive at a quantitative measure of the company’s ALM maturity. The resultant graph plotted on a spider’s web shows you the company’s current state of ALM maturity and the desired state of ALM maturity. Further since the results are classified by area you can immediately spot the areas where the customer needs immediate help. Image 4 – The spiders web! The red cross icons are areas shouting out for immediate attention, the yellow exclamation icons are areas that need improvement. These icons are calculated on the difference between the Current State of ALM maturity VS the Desired state of ALM maturity. Image 5 – Results by area Conclusion To conclude the Rangers ALM assessment guide gives you the ability to, Measure the customer’s current ALM maturity level Understand the ALM maturity level the customer desires to achieve Capture a healthy list of issues the customer wants to brainstorm further Now What’s next…? Download and get started with the Rangers ALM Assessment Guide. If you have successfully captured the above listed three pieces of information you are in a great state to make recommendations on the identified areas highlighting the benefits that Visual Studio ALM tools would offer. In the next post I will be covering how to take the ALM assessment results as the base to actually convert your recommendation into a sell. Remember to subscribe to http://feeds.feedburner.com/TarunArora. I would love to hear your feedback! If you have any recommendations on things that I should consider or any questions or feedback, feel free to leave a comment. *** A special thanks goes out to fellow ranges Willy, Ethem and Philip for reviewing the blog post and providing valuable feedback. ***

Read the article
My Code Kata–A Solution Kata

- by Glav

There are many developers and coders out there who like to do code Kata’s to keep their coding ability up to scratch and to practice their skills. I think it is a good idea. While I like the concept, I find them dead boring and of minimal purpose. Yes, they serve to hone your skills but that’s about it. They are often quite abstract, in that they usually focus on a small problem set requiring specific solutions. It is fair enough as that is how they are designed but again, I find them quite boring. What I personally like to do is go for something a little larger and a little more fun. It takes a little more time and is not as easily executed as a kata though, but it services the same purposes from a practice perspective and allows me to continue to solve some problems that are not directly part of the initial goal. This means I can cover a broader learning range and have a bit more fun. If I am lucky, sometimes they even end up being useful tools. With that in mind, I thought I’d share my current ‘kata’. It is not really a code kata as it is too big. I prefer to think of it as a ‘solution kata’. The code is on bitbucket here. What I wanted to do was create a kind of simplistic virtual world where I can create a player, or a class, stuff it into the world, and see if it survives, and can navigate its way to the exit. Requirements were pretty simple: Must be able to define a map to describe the world using simple X,Y co-ordinates. Z co-ordinates as well if you feel like getting clever. Should have the concept of entrances, exists, solid blocks, and potentially other materials (again if you want to get clever). A coder should be able to easily write a class which will act as an inhabitant of the world. An inhabitant will receive stimulus from the world in the form of surrounding environment and be able to make a decision on action which it passes back to the ‘world’ for processing. At a minimum, an inhabitant will have sight and speed characteristics which determine how far they can ‘see’ in the world, and how fast they can move. Coders who write a really bad ‘inhabitant’ should not adversely affect the rest of world. Should allow multiple inhabitants in the world. So that was the solution I set out to act as a practice solution and a little bit of fun. It had some interesting problems to solve and I figured, if it turned out ok, I could potentially use this as a ‘developer test’ for interviews. Ask a potential coder to write a class for an inhabitant. Show the coder the map they will navigate, but also mention that we will use their code to navigate a map they have not yet seen and a little more complex. I have been playing with solution for a short time now and have it working in basic concepts. Below is a screen shot using a very basic console visualiser that shows the map, boundaries, blocks, entrance, exit and players/inhabitants. The yellow asterisks ‘*’ are the players, green ‘O’ the entrance, purple ‘^’ the exit, maroon/browny ‘#’ are solid blocks. The players can move around at different speeds, knock into each others, and make directional movement decisions based on what they see and who is around them. It has been quite fun to write and it is also quite fun to develop different players to inject into the world. The code below shows a really simple implementation of an inhabitant that can work out what to do based on stimulus from the world. It is pretty simple and just tries to move in some direction if there is nothing blocking the path. public class TestPlayer:LivingEntity { public TestPlayer() { Name = "Beta Boy"; LifeKey = Guid.NewGuid(); } public override ActionResult DecideActionToPerform(EcoDev.Core.Common.Actions.ActionContext actionContext) { try { var action = new MovementAction(); // move forward if we can if (actionContext.Position.ForwardFacingPositions.Length > 0) { if (CheckAccessibilityOfMapBlock(actionContext.Position.ForwardFacingPositions[0])) { action.DirectionToMove = MovementDirection.Forward; return action; } } if (actionContext.Position.LeftFacingPositions.Length > 0) { if (CheckAccessibilityOfMapBlock(actionContext.Position.LeftFacingPositions[0])) { action.DirectionToMove = MovementDirection.Left; return action; } } if (actionContext.Position.RearFacingPositions.Length > 0) { if (CheckAccessibilityOfMapBlock(actionContext.Position.RearFacingPositions[0])) { action.DirectionToMove = MovementDirection.Back; return action; } } if (actionContext.Position.RightFacingPositions.Length > 0) { if (CheckAccessibilityOfMapBlock(actionContext.Position.RightFacingPositions[0])) { action.DirectionToMove = MovementDirection.Right; return action; } } return action; } catch (Exception ex) { World.WriteDebugInformation("Player: "+ Name, string.Format("Player Generated exception: {0}",ex.Message)); throw ex; } } private bool CheckAccessibilityOfMapBlock(MapBlock block) { if (block == null || block.Accessibility == MapBlockAccessibility.AllowEntry || block.Accessibility == MapBlockAccessibility.AllowExit || block.Accessibility == MapBlockAccessibility.AllowPotentialEntry) { return true; } return false; } } It is simple and it seems to work well. The world implementation itself decides the stimulus context that is passed to he inhabitant to make an action decision. All movement is carried out on separate threads and timed appropriately to be as fair as possible and to cater for additional skills such as speed, and eventually maybe stamina, strength, with actions like fighting. It is pretty fun to make up random maps and see how your inhabitant does. You can download the code from here. Along the way I have played with parallel extensions to make the compute intensive stuff spread across all cores, had to heavily factor in visibility of methods and properties so design of classes was paramount, work out movement algorithms that play fairly in the world and properly favour the players with higher abilities, as well as a host of other issues. So that is my ‘solution kata’. If I keep going with it, I may develop a web interface for it where people can upload assemblies and watch their player within a web browser visualiser and maybe even a map designer. What do you do to keep the fires burning?

Read the article
Security in Software

The term security has many meanings based on the context and perspective in which it is used. Security from the perspective of software/system development is the continuous process of maintaining confidentiality, integrity, and availability of a system, sub-system, and system data. This definition at a very high level can be restated as the following: Computer security is a continuous process dealing with confidentiality, integrity, and availability on multiple layers of a system. Key Aspects of Software Security Integrity Confidentiality Availability Integrity within a system is the concept of ensuring only authorized users can only manipulate information through authorized methods and procedures. An example of this can be seen in a simple lead management application. If the business decided to allow each sales member to only update their own leads in the system and sales managers can update all leads in the system then an integrity violation would occur if a sales member attempted to update someone else’s leads. An integrity violation occurs when a team member attempts to update someone else’s lead because it was not entered by the sales member. This violates the business rule that leads can only be update by the originating sales member. Confidentiality within a system is the concept of preventing unauthorized access to specific information or tools. In a perfect world the knowledge of the existence of confidential information/tools would be unknown to all those who do not have access. When this this concept is applied within the context of an application only the authorized information/tools will be available. If we look at the sales lead management system again, leads can only be updated by originating sales members. If we look at this rule then we can say that all sales leads are confidential between the system and the sales person who entered the lead in to the system. The other sales team members would not need to know about the leads let alone need to access it. Availability within a system is the concept of authorized users being able to access the system. A real world example can be seen again from the lead management system. If that system was hosted on a web server then IP restriction can be put in place to limit access to the system based on the requesting IP address. If in this example all of the sales members where accessing the system from the 192.168.1.23 IP address then removing access from all other IPs would be need to ensure that improper access to the system is prevented while approved users can access the system from an authorized location. In essence if the requesting user is not coming from an authorized IP address then the system will appear unavailable to them. This is one way of controlling where a system is accessed. Through the years several design principles have been identified as being beneficial when integrating security aspects into a system. These principles in various combinations allow for a system to achieve the previously defined aspects of security based on generic architectural models. Security Design Principles Least Privilege Fail-Safe Defaults Economy of Mechanism Complete Mediation Open Design Separation Privilege Least Common Mechanism Psychological Acceptability Defense in Depth Least Privilege Design PrincipleThe Least Privilege design principle requires a minimalistic approach to granting user access rights to specific information and tools. Additionally, access rights should be time based as to limit resources access bound to the time needed to complete necessary tasks. The implications of granting access beyond this scope will allow for unnecessary access and the potential for data to be updated out of the approved context. The assigning of access rights will limit system damaging attacks from users whether they are intentional or not. This principle attempts to limit data changes and prevents potential damage from occurring by accident or error by reducing the amount of potential interactions with a resource. Fail-Safe Defaults Design PrincipleThe Fail-Safe Defaults design principle pertains to allowing access to resources based on granted access over access exclusion. This principle is a methodology for allowing resources to be accessed only if explicit access is granted to a user. By default users do not have access to any resources until access has been granted. This approach prevents unauthorized users from gaining access to resource until access is given. Economy of Mechanism Design PrincipleThe Economy of mechanism design principle requires that systems should be designed as simple and small as possible. Design and implementation errors result in unauthorized access to resources that would not be noticed during normal use. Complete Mediation Design PrincipleThe Complete Mediation design principle states that every access to every resource must be validated for authorization. Open Design Design PrincipleThe Open Design Design Principle is a concept that the security of a system and its algorithms should not be dependent on secrecy of its design or implementation Separation Privilege Design PrincipleThe separation privilege design principle requires that all resource approved resource access attempts be granted based on more than a single condition. For example a user should be validated for active status and has access to the specific resource. Least Common Mechanism Design PrincipleThe Least Common Mechanism design principle declares that mechanisms used to access resources should not be shared. Psychological Acceptability Design PrincipleThe Psychological Acceptability design principle refers to security mechanisms not make resources more difficult to access than if the security mechanisms were not present Defense in Depth Design PrincipleThe Defense in Depth design principle is a concept of layering resource access authorization verification in a system reduces the chance of a successful attack. This layered approach to resource authorization requires unauthorized users to circumvent each authorization attempt to gain access to a resource. When designing a system that requires meeting a security quality attribute architects need consider the scope of security needs and the minimum required security qualities. Not every system will need to use all of the basic security design principles but will use one or more in combination based on a company’s and architect’s threshold for system security because the existence of security in an application adds an additional layer to the overall system and can affect performance. That is why the definition of minimum security acceptably is need when a system is design because this quality attributes needs to be factored in with the other system quality attributes so that the system in question adheres to all qualities based on the priorities of the qualities. Resources: Barnum, Sean. Gegick, Michael. (2005). Least Privilege. Retrieved on August 28, 2011 from https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/principles/351-BSI.html Saltzer, Jerry. (2011). BASIC PRINCIPLES OF INFORMATION PROTECTION. Retrieved on August 28, 2011 from http://web.mit.edu/Saltzer/www/publications/protection/Basic.html Barnum, Sean. Gegick, Michael. (2005). Defense in Depth. Retrieved on August 28, 2011 from https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/principles/347-BSI.html Bertino, Elisa. (2005). Design Principles for Security. Retrieved on August 28, 2011 from http://homes.cerias.purdue.edu/~bhargav/cs526/security-9.pdf

Read the article
Talend Enterprise Data Integration overperforms on Oracle SPARC T4

- by Amir Javanshir

The SPARC T microprocessor, released in 2005 by Sun Microsystems, and now continued at Oracle, has a good track record in parallel execution and multi-threaded performance. However it was less suited for pure single-threaded workloads. The new SPARC T4 processor is now filling that gap by offering a 5x better single-thread performance over previous generations. Following our long-term relationship with Talend, a fast growing ISV positioned by Gartner in the “Visionaries” quadrant of the “Magic Quadrant for Data Integration Tools”, we decided to test some of their integration components with the T4 chip, more precisely on a T4-1 system, in order to verify first hand if this new processor stands up to its promises. Several tests were performed, mainly focused on: Single-thread performance of the new SPARC T4 processor compared to an older SPARC T2+ processor Overall throughput of the SPARC T4-1 server using multiple threads The tests consisted in reading large amounts of data --ten's of gigabytes--, processing and writing them back to a file or an Oracle 11gR2 database table. They are CPU, memory and IO bound tests. Given the main focus of this project --CPU performance--, bottlenecks were removed as much as possible on the memory and IO sub-systems. When possible, the data to process was put into the ZFS filesystem cache, for instance. Also, two external storage devices were directly attached to the servers under test, each one divided in two ZFS pools for read and write operations. Multi-thread: Testing throughput on the Oracle T4-1 The tests were performed with different number of simultaneous threads (1, 2, 4, 8, 12, 16, 32, 48 and 64) and using different storage devices: Flash, Fibre Channel storage, two stripped internal disks and one single internal disk. All storage devices used ZFS as filesystem and volume management. Each thread read a dedicated 1GB-large file containing 12.5M lines with the following structure: customerID;FirstName;LastName;StreetAddress;City;State;Zip;Cust_Status;Since_DT;Status_DT 1;Ronald;Reagan;South Highway;Santa Fe;Montana;98756;A;04-06-2006;09-08-2008 2;Theodore;Roosevelt;Timberlane Drive;Columbus;Louisiana;75677;A;10-05-2009;27-05-2008 3;Andrew;Madison;S Rustle St;Santa Fe;Arkansas;75677;A;29-04-2005;09-02-2008 4;Dwight;Adams;South Roosevelt Drive;Baton Rouge;Vermont;75677;A;15-02-2004;26-01-2007 […] The following graphs present the results of our tests: Unsurprisingly up to 16 threads, all files fit in the ZFS cache a.k.a L2ARC : once the cache is hot there is no performance difference depending on the underlying storage. From 16 threads upwards however, it is clear that IO becomes a bottleneck, having a good IO subsystem is thus key. Single-disk performance collapses whereas the Sun F5100 and ST6180 arrays allow the T4-1 to scale quite seamlessly. From 32 to 64 threads, the performance is almost constant with just a slow decline. For the database load tests, only the best IO configuration --using external storage devices-- were used, hosting the Oracle table spaces and redo log files. Using the Sun Storage F5100 array allows the T4-1 server to scale up to 48 parallel JVM processes before saturating the CPU. The final result is a staggering 646K lines per second insertion in an Oracle table using 48 parallel threads. Single-thread: Testing the single thread performance Seven different tests were performed on both servers. Given the fact that only one thread, thus one file was read, no IO bottleneck was involved, all data being served from the ZFS cache. Read File ? Filter ? Write File: Read file, filter data, write the filtered data in a new file. The filter is set on the “Status” column: only lines with status set to “A” are selected. This limits each output file to about 500 MB. Read File ? Load Database Table: Read file, insert into a single Oracle table. Average: Read file, compute the average of a numeric column, write the result in a new file. Division & Square Root: Read file, perform a division and square root on a numeric column, write the result data in a new file. Oracle DB Dump: Dump the content of an Oracle table (12.5M rows) into a CSV file. Transform: Read file, transform, write the result data in a new file. The transformations applied are: set the address column to upper case and add an extra column at the end, which is the concatenation of two columns. Sort: Read file, sort a numeric and alpha numeric column, write the result data in a new file. The following table and graph present the final results of the tests: Throughput unit is thousand lines per second processed (K lines/second). Improvement is the % of improvement between the T5140 and T4-1. Test T4-1 (Time s.) T5140 (Time s.) Improvement T4-1 (Throughput) T5140 (Throughput) Read/Filter/Write 125 806 645% 100 16 Read/Load Database 195 1111 570% 64 11 Average 96 557 580% 130 22 Division & Square Root 161 1054 655% 78 12 Oracle DB Dump 164 945 576% 76 13 Transform 159 1124 707% 79 11 Sort 251 1336 532% 50 9 The improvement of single-thread performance is quite dramatic: depending on the tests, the T4 is between 5.4 to 7 times faster than the T2+. It seems clear that the SPARC T4 processor has gone a long way filling the gap in single-thread performance, without sacrifying the multi-threaded capability as it still shows a very impressive scaling on heavy-duty multi-threaded jobs. Finally, as always at Oracle ISV Engineering, we are happy to help our ISV partners test their own applications on our platforms, so don't hesitate to contact us and let's see what the SPARC T4-based systems can do for your application! "As describe in this benchmark, Talend Enterprise Data Integration has overperformed on T4. I was generally happy to see that the T4 gave scaling opportunities for many scenarios like complex aggregations. Row by row insertion in Oracle DB is faster with more than 650,000 rows per seconds without using any bulk Oracle capabilities !" Cedric Carbone, Talend CTO.

Read the article
Big Data – Operational Databases Supporting Big Data – RDBMS and NoSQL – Day 12 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Cloud in the Big Data Story. In this article we will understand the role of Operational Databases Supporting Big Data Story. Even though we keep on talking about Big Data architecture, it is extremely crucial to understand that Big Data system can’t just exist in the isolation of itself. There are many needs of the business can only be fully filled with the help of the operational databases. Just having a system which can analysis big data may not solve every single data problem. Real World Example Think about this way, you are using Facebook and you have just updated your information about the current relationship status. In the next few seconds the same information is also reflected in the timeline of your partner as well as a few of the immediate friends. After a while you will notice that the same information is now also available to your remote friends. Later on when someone searches for all the relationship changes with their friends your change of the relationship will also show up in the same list. Now here is the question – do you think Big Data architecture is doing every single of these changes? Do you think that the immediate reflection of your relationship changes with your family member is also because of the technology used in Big Data. Actually the answer is Facebook uses MySQL to do various updates in the timeline as well as various events we do on their homepage. It is really difficult to part from the operational databases in any real world business. Now we will see a few of the examples of the operational databases. Relational Databases (This blog post) NoSQL Databases (This blog post) Key-Value Pair Databases (Tomorrow’s post) Document Databases (Tomorrow’s post) Columnar Databases (The Day After’s post) Graph Databases (The Day After’s post) Spatial Databases (The Day After’s post) Relational Databases We have earlier discussed about the RDBMS role in the Big Data’s story in detail so we will not cover it extensively over here. Relational Database is pretty much everywhere in most of the businesses which are here for many years. The importance and existence of the relational database are always going to be there as long as there are meaningful structured data around. There are many different kinds of relational databases for example Oracle, SQL Server, MySQL and many others. If you are looking for Open Source and widely accepted database, I suggest to try MySQL as that has been very popular in the last few years. I also suggest you to try out PostgreSQL as well. Besides many other essential qualities PostgreeSQL have very interesting licensing policies. PostgreSQL licenses allow modifications and distribution of the application in open or closed (source) form. One can make any modifications and can keep it private as well as well contribute to the community. I believe this one quality makes it much more interesting to use as well it will play very important role in future. Nonrelational Databases (NOSQL) We have also covered Nonrelational Dabases in earlier blog posts. NoSQL actually stands for Not Only SQL Databases. There are plenty of NoSQL databases out in the market and selecting the right one is always very challenging. Here are few of the properties which are very essential to consider when selecting the right NoSQL database for operational purpose. Data and Query Model Persistence of Data and Design Eventual Consistency Scalability Though above all of the properties are interesting to have in any NoSQL database but the one which most attracts to me is Eventual Consistency. Eventual Consistency RDBMS uses ACID (Atomicity, Consistency, Isolation, Durability) as a key mechanism for ensuring the data consistency, whereas NonRelational DBMS uses BASE for the same purpose. Base stands for Basically Available, Soft state and Eventual consistency. Eventual consistency is widely deployed in distributed systems. It is a consistency model used in distributed computing which expects unexpected often. In large distributed system, there are always various nodes joining and various nodes being removed as they are often using commodity servers. This happens either intentionally or accidentally. Even though one or more nodes are down, it is expected that entire system still functions normally. Applications should be able to do various updates as well as retrieval of the data successfully without any issue. Additionally, this also means that system is expected to return the same updated data anytime from all the functioning nodes. Irrespective of when any node is joining the system, if it is marked to hold some data it should contain the same updated data eventually. As per Wikipedia - Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. In other words - Informally, if no additional updates are made to a given data item, all reads to that item will eventually return the same value. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SortedDictionary and SortedList

- by Simon Cooper

Apart from Dictionary<TKey, TValue>, there's two other dictionaries in the BCL - SortedDictionary<TKey, TValue> and SortedList<TKey, TValue>. On the face of it, these two classes do the same thing - provide an IDictionary<TKey, TValue> interface where the iterator returns the items sorted by the key. So what's the difference between them, and when should you use one rather than the other? (as in my previous post, I'll assume you have some basic algorithm & datastructure knowledge) SortedDictionary We'll first cover SortedDictionary. This is implemented as a special sort of binary tree called a red-black tree. Essentially, it's a binary tree that uses various constraints on how the nodes of the tree can be arranged to ensure the tree is always roughly balanced (for more gory algorithmical details, see the wikipedia link above). What I'm concerned about in this post is how the .NET SortedDictionary is actually implemented. In .NET 4, behind the scenes, the actual implementation of the tree is delegated to a SortedSet<KeyValuePair<TKey, TValue>>. One example tree might look like this: Each node in the above tree is stored as a separate SortedSet<T>.Node object (remember, in a SortedDictionary, T is instantiated to KeyValuePair<TKey, TValue>): class Node { public bool IsRed; public T Item; public SortedSet<T>.Node Left; public SortedSet<T>.Node Right; } The SortedSet only stores a reference to the root node; all the data in the tree is accessed by traversing the Left and Right node references until you reach the node you're looking for. Each individual node can be physically stored anywhere in memory; what's important is the relationship between the nodes. This is also why there is no constructor to SortedDictionary or SortedSet that takes an integer representing the capacity; there are no internal arrays that need to be created and resized. This may seen trivial, but it's an important distinction between SortedDictionary and SortedList that I'll cover later on. And that's pretty much it; it's a standard red-black tree. Plenty of webpages and datastructure books cover the algorithms behind the tree itself far better than I could. What's interesting is the comparions between SortedDictionary and SortedList, which I'll cover at the end. As a side point, SortedDictionary has existed in the BCL ever since .NET 2. That means that, all through .NET 2, 3, and 3.5, there has been a bona-fide sorted set class in the BCL (called TreeSet). However, it was internal, so it couldn't be used outside System.dll. Only in .NET 4 was this class exposed as SortedSet. SortedList Whereas SortedDictionary didn't use any backing arrays, SortedList does. It is implemented just as the name suggests; two arrays, one containing the keys, and one the values (I've just used random letters for the values): The items in the keys array are always guarenteed to be stored in sorted order, and the value corresponding to each key is stored in the same index as the key in the values array. In this example, the value for key item 5 is 'z', and for key item 8 is 'm'. Whenever an item is inserted or removed from the SortedList, a binary search is run on the keys array to find the correct index, then all the items in the arrays are shifted to accomodate the new or removed item. For example, if the key 3 was removed, a binary search would be run to find the array index the item was at, then everything above that index would be moved down by one: and then if the key/value pair {7, 'f'} was added, a binary search would be run on the keys to find the index to insert the new item, and everything above that index would be moved up to accomodate the new item: If another item was then added, both arrays would be resized (to a length of 10) before the new item was added to the arrays. As you can see, any insertions or removals in the middle of the list require a proportion of the array contents to be moved; an O(n) operation. However, if the insertion or removal is at the end of the array (ie the largest key), then it's only O(log n); the cost of the binary search to determine it does actually need to be added to the end (excluding the occasional O(n) cost of resizing the arrays to fit more items). As a side effect of using backing arrays, SortedList offers IList Keys and Values views that simply use the backing keys or values arrays, as well as various methods utilising the array index of stored items, which SortedDictionary does not (and cannot) offer. The Comparison So, when should you use one and not the other? Well, here's the important differences: Memory usage SortedDictionary and SortedList have got very different memory profiles. SortedDictionary... has a memory overhead of one object instance, a bool, and two references per item. On 64-bit systems, this adds up to ~40 bytes, not including the stored item and the reference to it from the Node object. stores the items in separate objects that can be spread all over the heap. This helps to keep memory fragmentation low, as the individual node objects can be allocated wherever there's a spare 60 bytes. In contrast, SortedList... has no additional overhead per item (only the reference to it in the array entries), however the backing arrays can be significantly larger than you need; every time the arrays are resized they double in size. That means that if you add 513 items to a SortedList, the backing arrays will each have a length of 1024. To conteract this, the TrimExcess method resizes the arrays back down to the actual size needed, or you can simply assign list.Capacity = list.Count. stores its items in a continuous block in memory. If the list stores thousands of items, this can cause significant problems with Large Object Heap memory fragmentation as the array resizes, which SortedDictionary doesn't have. Performance Operations on a SortedDictionary always have O(log n) performance, regardless of where in the collection you're adding or removing items. In contrast, SortedList has O(n) performance when you're altering the middle of the collection. If you're adding or removing from the end (ie the largest item), then performance is O(log n), same as SortedDictionary (in practice, it will likely be slightly faster, due to the array items all being in the same area in memory, also called locality of reference). So, when should you use one and not the other? As always with these sort of things, there are no hard-and-fast rules. But generally, if you: need to access items using their index within the collection are populating the dictionary all at once from sorted data aren't adding or removing keys once it's populated then use a SortedList. But if you: don't know how many items are going to be in the dictionary are populating the dictionary from random, unsorted data are adding & removing items randomly then use a SortedDictionary. The default (again, there's no definite rules on these sort of things!) should be to use SortedDictionary, unless there's a good reason to use SortedList, due to the bad performance of SortedList when altering the middle of the collection.

Read the article
A Good Developer is So Hard to Find

- by James Michael Hare

Let me start out by saying I want to damn the writers of the Toughest Developer Puzzle Ever – 2. It is eating every last shred of my free time! But as I've been churning through each puzzle and marvelling at the brain teasers and trivia within, I began to think about interviewing developers and why it seems to be so hard to find good ones. The problem is, it seems like no matter how hard we try to find the perfect way to separate the chaff from the wheat, inevitably someone will get hired who falls far short of expectations or someone will get passed over for missing a piece of trivia or a tricky brain teaser that could have been an excellent team member. In shops that are primarily software-producing businesses or other heavily IT-oriented businesses (Microsoft, Amazon, etc) there often exists a much tighter bond between HR and the hiring development staff because development is their life-blood. Unfortunately, many of us work in places where IT is viewed as a cost or just a means to an end. In these shops, too often, HR and development staff may work against each other due to differences in opinion as to what a good developer is or what one is worth. It seems that if you ask two different people what makes a good developer, often you will get three different opinions. With the exception of those shops that are purely development-centric (you guys have it much easier!), most other shops have management who have very little knowledge about the development process. Their view can often be that development is simply a skill that one learns and then once aquired, that developer can produce widgets as good as the next like workers on an assembly-line floor. On the other side, you have many developers that feel that software development is an art unto itself and that the ability to create the most pure design or know the most obscure of keywords or write the shortest-possible obfuscated piece of code is a good coder. So is it a skill? An Art? Or something entirely in between? Saying that software is merely a skill and one just needs to learn the syntax and tools would be akin to saying anyone who knows English and can use Word can write a 300 page book that is accurate, meaningful, and stays true to the point. This just isn't so. It takes more than mere skill to take words and form a sentence, join those sentences into paragraphs, and those paragraphs into a document. I've interviewed candidates who could answer obscure syntax and keyword questions and once they were hired could not code effectively at all. So development must be more than a skill. But on the other end, we have art. Is development an art? Is our end result to produce art? I can marvel at a piece of code -- see it as concise and beautiful -- and yet that code most perform some stated function with accuracy and efficiency and maintainability. None of these three things have anything to do with art, per se. Art is beauty for its own sake and is a wonderful thing. But if you apply that same though to development it just doesn't hold. I've had developers tell me that all that matters is the end result and how you code it is entirely part of the art and I couldn't disagree more. Yes, the end result, the accuracy, is the prime criteria to be met. But if code is not maintainable and efficient, it would be just as useless as a beautiful car that breaks down once a week or that gets 2 miles to the gallon. Yes, it may work in that it moves you from point A to point B and is pretty as hell, but if it can't be maintained or is not efficient, it's not a good solution. So development must be something less than art. In the end, I think I feel like development is a matter of craftsmanship. We use our tools and we use our skills and set about to construct something that satisfies a purpose and yet is also elegant and efficient. There is skill involved, and there is an art, but really it boils down to being able to craft code. Crafting code is far more than writing code. Anyone can write code if they know the syntax, but so few people can actually craft code that solves a purpose and craft it well. So this is what I want to find, I want to find code craftsman! But how? I used to ask coding-trivia questions a long time ago and many people still fall back on this. The thought is that if you ask the candidate some piece of coding trivia and they know the answer it must follow that they can craft good code. For example: What C++ keyword can be applied to a class/struct field to allow it to be changed even from a const-instance of that class/struct? (answer: mutable) So what do we prove if a candidate can answer this? Only that they know what mutable means. One would hope that this would infer that they'd know how to use it, and more importantly when and if it should ever be used! But it rarely does! The problem with triva questions is that you will either: Approve a really good developer who knows what some obscure keyword is (good) Reject a really good developer who never needed to use that keyword or is too inexperienced to know how to use it (bad) Approve a really bad developer who googled "C++ Interview Questions" and studied like hell but can't craft (very bad) Many HR departments love these kind of tests because they are short and easy to defend if a legal issue arrises on hiring decisions. After all it's easy to say a person wasn't hired because they scored 30 out of 100 on some trivia test. But unfortunately, you've eliminated a large part of your potential developer pool and possibly hired a few duds. There are times I've hired candidates who knew every trivia question I could throw out them and couldn't craft. And then there are times I've interviewed candidates who failed all my trivia but who I took a chance on who were my best finds ever. So if not trivia, then what? Brain teasers? The thought is, these type of questions measure the thinking power of a candidate. The problem is, once again, you will either: Approve a good candidate who has never heard the problem and can solve it (good) Reject a good candidate who just happens not to see the "catch" because they're nervous or it may be really obscure (bad) Approve a candidate who has studied enough interview brain teasers (once again, you can google em) to recognize the "catch" or knows the answer already (bad). Once again, you're eliminating good candidates and possibly accepting bad candidates. In these cases, I think testing someone with brain teasers only tests their ability to answer brain teasers, not the ability to craft code. So how do we measure someone's ability to craft code? Here's a novel idea: have them code! Give them a computer and a compiler, or a whiteboard and a pen, or paper and pencil and have them construct a piece of code. It just makes sense that if we're going to hire someone to code we should actually watch them code. When they're done, we can judge them on several criteria: Correctness - does the candidate's solution accurately solve the problem proposed? Accuracy - is the candidate's solution reasonably syntactically correct? Efficiency - did the candidate write or use the more efficient data structures or algorithms for the job? Maintainability - was the candidate's code free of obfuscation and clever tricks that diminish readability? Persona - are they eager and willing or aloof and egotistical? Will they work well within your team? It may sound simple, or it may sound crazy, but when I'm looking to hire a developer, I want to see them actually develop well-crafted code.

Read the article
Fraud Detection with the SQL Server Suite Part 2

- by Dejan Sarka

This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

Read the article
Tuning Red Gate: #2 of Many

- by Grant Fritchey

In the last installment, I used the SQL Monitor tool to get a snapshot view of the current state of the servers at Red Gate that are giving us trouble. That snapshot suggested some areas where I should focus some time, primarily in which queries were being called most frequently or were running the longest. But, you don't want to just run off & start tuning queries. Remember, the foundation for query tuning is the server itself. So, I want to be sure I'm not looking at some major hardware or configuration issues that I need to address first. Rather than look at the current status of the server, I'm going to look at historical data. Clicking on the Analysis tab of SQL Monitor I get a whole list of counters that I can look at. More importantly, I can look at them over a period of time. Even more importantly, I can compare past periods with current periods to see if we're looking at a progressive issue or not. There are counters here that will give me an indication of load, and there are counters here that will tell me specifics about that load. First, I want to just look at the load to understand where the pain points might be. Trying to drill down before you have detailed information is just bad planning. First thing I'm going to check is the CPU, just to see what's up there. I have two servers I'm interested in, so I'll show you both: Looking at the last 30 days for both servers, well, let's just say that the first server is about what I would expect. It has an average baseline behavior with occasional, regular, peaks. This looks like a system with a fairly steady & predictable load that probably has a nightly batch process that spikes the processor. In short, normal stuff. The points there where the CPU drops radically. that might be worth investigating further because something changed the processing on this system a lot. But the first server. It's all over the place. There's no steady CPU behavior at all. It's spike high for long periods of time. It's up, it's down. I'm really going to have to spend time looking at CPU issues on this server to try to figure out what's up. It might be other processes being shared on the server, it might be something else. Either way, I'm going to have to spend time evaluating this CPU, especially those peeks about a week ago. Looking at the Pages/sec, again, just a measure of load, I see that there are some peaks on the rg-sql02 server, but over all, it looks like a fairly standard load. Plus, the peaks are only up to 550 pages/sec. Remember, this isn't a performance measure, but just a load measurement, but from this, I don't think we're looking at major memory issues, but I may want to correlate these counters with the CPU counters. Again, the other server looks like there's stuff going on. The load is not at all consistent. In fact there was a point earlier in the year that looks pretty severe. Plus the spikes here are twice the size of the other system. We've got a lot more load going on here and I will probably need to drill down on memory usage on this server. Taking a look at the disk transfers/sec the load on both systems seems to roughly correspond to the other load indicators. Notice that drop right in the middle of the graph for rg-sql02. I wonder if the office was closed over that period or a system was down for maintenance. If I saw spikes in memory or disk that corresponded to the drip in CPU, you can assume something was using those other resources and causing a drop, but when everything goes down, it just means that the system isn't gettting used. The disk on the rg-sql01 system isn't spiking exactly the same way as the memory & cpu, so there's a good chance (chance mind you) that any performance issues might not be disk related. However, notice that huge jump at the beginning of the month. Several disks were used more than they were for the rest of the month. That's the load on the server. What about the load on SQL Server itself? Next time.

Read the article
Movement prediction for non-shooters

- by ShadowChaser

I'm working on an isometric 2D game with moderate-scale multiplayer, approximately 20-30 players connected at once to a persistent server. I've had some difficulty getting a good movement prediction implementation in place. Physics/Movement The game doesn't have a true physics implementation, but uses the basic principles to implement movement. Rather than continually polling input, state changes (ie/ mouse down/up/move events) are used to change the state of the character entity the player is controlling. The player's direction (ie/ north-east) is combined with a constant speed and turned into a true 3D vector - the entity's velocity. In the main game loop, "Update" is called before "Draw". The update logic triggers a "physics update task" that tracks all entities with a non-zero velocity uses very basic integration to change the entities position. For example: entity.Position += entity.Velocity.Scale(ElapsedTime.Seconds) (where "Seconds" is a floating point value, but the same approach would work for millisecond integer values). The key point is that no interpolation is used for movement - the rudimentary physics engine has no concept of a "previous state" or "current state", only a position and velocity. State Change and Update Packets When the velocity of the character entity the player is controlling changes, a "move avatar" packet is sent to the server containing the entity's action type (stand, walk, run), direction (north-east), and current position. This is different from how 3D first person games work. In a 3D game the velocity (direction) can change frame to frame as the player moves around. Sending every state change would effectively transmit a packet per frame, which would be too expensive. Instead, 3D games seem to ignore state changes and send "state update" packets on a fixed interval - say, every 80-150ms. Since speed and direction updates occur much less frequently in my game, I can get away with sending every state change. Although all of the physics simulations occur at the same speed and are deterministic, latency is still an issue. For that reason, I send out routine position update packets (similar to a 3D game) but much less frequently - right now every 250ms, but I suspect with good prediction I can easily boost it towards 500ms. The biggest problem is that I've now deviated from the norm - all other documentation, guides, and samples online send routine updates and interpolate between the two states. It seems incompatible with my architecture, and I need to come up with a better movement prediction algorithm that is closer to a (very basic) "networked physics" architecture. The server then receives the packet and determines the players speed from it's movement type based on a script (Is the player able to run? Get the player's running speed). Once it has the speed, it combines it with the direction to get a vector - the entity's velocity. Some cheat detection and basic validation occurs, and the entity on the server side is updated with the current velocity, direction, and position. Basic throttling is also performed to prevent players from flooding the server with movement requests. After updating its own entity, the server broadcasts an "avatar position update" packet to all other players within range. The position update packet is used to update the client side physics simulations (world state) of the remote clients and perform prediction and lag compensation. Prediction and Lag Compensation As mentioned above, clients are authoritative for their own position. Except in cases of cheating or anomalies, the client's avatar will never be repositioned by the server. No extrapolation ("move now and correct later") is required for the client's avatar - what the player sees is correct. However, some sort of extrapolation or interpolation is required for all remote entities that are moving. Some sort of prediction and/or lag-compensation is clearly required within the client's local simulation / physics engine. Problems I've been struggling with various algorithms, and have a number of questions and problems: Should I be extrapolating, interpolating, or both? My "gut feeling" is that I should be using pure extrapolation based on velocity. State change is received by the client, client computes a "predicted" velocity that compensates for lag, and the regular physics system does the rest. However, it feels at odds to all other sample code and articles - they all seem to store a number of states and perform interpolation without a physics engine. When a packet arrives, I've tried interpolating the packet's position with the packet's velocity over a fixed time period (say, 200ms). I then take the difference between the interpolated position and the current "error" position to compute a new vector and place that on the entity instead of the velocity that was sent. However, the assumption is that another packet will arrive in that time interval, and it's incredibly difficult to "guess" when the next packet will arrive - especially since they don't all arrive on fixed intervals (ie/ state changes as well). Is the concept fundamentally flawed, or is it correct but needs some fixes / adjustments? What happens when a remote player stops? I can immediately stop the entity, but it will be positioned in the "wrong" spot until it moves again. If I estimate a vector or try to interpolate, I have an issue because I don't store the previous state - the physics engine has no way to say "you need to stop after you reach position X". It simply understands a velocity, nothing more complex. I'm reluctant to add the "packet movement state" information to the entities or physics engine, since it violates basic design principles and bleeds network code across the rest of the game engine. What should happen when entities collide? There are three scenarios - the controlling player collides locally, two entities collide on the server during a position update, or a remote entity update collides on the local client. In all cases I'm uncertain how to handle the collision - aside from cheating, both states are "correct" but at different time periods. In the case of a remote entity it doesn't make sense to draw it walking through a wall, so I perform collision detection on the local client and cause it to "stop". Based on point #2 above, I might compute a "corrected vector" that continually tries to move the entity "through the wall" which will never succeed - the remote avatar is stuck there until the error gets too high and it "snaps" into position. How do games work around this?

Read the article
The SPARC SuperCluster

- by Karoly Vegh

Oracle has been providing a lead in the Engineered Systems business for quite a while now, in accordance with the motto "Hardware and Software Engineered to Work Together." Indeed it is hard to find a better definition of these systems. Allow me to summarize the idea. It is: Build a compute platform optimized to run your technologies Develop application aware, intelligently caching storage components Take an impressively fast network technology interconnecting it with the compute nodes Tune the application to scale with the nodes to yet unseen performance Reduce the amount of data moving via compression Provide this all in a pre-integrated single product with a single-pane management interface All these ideas have been around in IT for quite some time now. The real Oracle advantage is adding the last one to put these all together. Oracle has built quite a portfolio of Engineered Systems, to run its technologies - and run those like they never ran before. In this post I'll focus on one of them that serves as a consolidation demigod, a multi-purpose engineered system. As you probably have guessed, I am talking about the SPARC SuperCluster. It has many great features inherited from its predecessors, and it adds several new ones. Allow me to pick out and elaborate about some of the most interesting ones from a technological point of view. I. It is the SPARC SuperCluster T4-4. That is, as compute nodes, it includes SPARC T4-4 servers that we learned to appreciate and respect for their features: The SPARC T4 CPUs: Each CPU has 8 cores, each core runs 8 threads. The SPARC T4-4 servers have 4 sockets. That is, a single compute node can in parallel, simultaneously execute 256 threads. Now, a full-rack SPARC SuperCluster has 4 of these servers on board. Remember the keyword demigod. While retaining the forerunner SPARC T3's exceptional throughput, the SPARC T4 CPUs raise the bar with single performance too - a humble 5x better one than their ancestors. actually, the SPARC T4 CPU cores run in both single-threaded and multi-threaded mode, and switch between these two on-the-fly, fulfilling not only single-threaded OR multi-threaded applications' needs, but even mixed requirements (like in database workloads!). Data security, anyone? Every SPARC T4 CPU core has a built-in encryption engine, that is, encryption algorithms cast into silicon. A PCI controller right on the chip for customers who need I/O performance. Built-in, no-cost Virtualization: Oracle VM for SPARC (the former LDoms or Logical Domains) is not a server-emulation virtualization technology but rather a serverpartitioning one, the hypervisor runs in the server firmware, and all the VMs' HW resources (I/O, CPU, memory) are accessed natively, without performance overhead. This enables customers to run a number of Solaris 10 and Solaris 11 VMs separated, independent of each other within a physical server II. For Database performance, it includes Exadata Storage Cells - one of the main reasons why the Exadata Database Machine performs at diabolic speed. What makes them important? They provide DB backend storage for your Oracle Databases to run on the SPARC SuperCluster, that is what they are built and tuned for DB performance. These storage cells are SQL-aware. That is, if a SPARC T4 database compute node executes a query, it doesn't simply request tons of raw datablocks from the storage, filters the received data, and throws away most of it where the statement doesn't apply, but provides the SQL query to the storage node too. The storage cell software speaks SQL, that is, it is able to prefilter and through that transfer only the relevant data. With this, the traffic between database nodes and storage cells is reduced immensely. Less I/O is a good thing - as they say, all the CPUs of the world do one thing just as fast as any other - and that is waiting for I/O. They don't only pre-filter, but also provide data preprocessing features - e.g. if a DB-node requests an aggregate of data, they can calculate it, and handover only the results, not the whole set. Again, less data to transfer. They support the magical HCC, (Hybrid Columnar Compression). That is, data can be stored in a precompressed form on the storage. Less data to transfer. Of course one can't simply rely on disks for performance, there is Flash Storage included there for caching. III. The low latency, high-speed backbone network: InfiniBand, that interconnects all the members with: Real High Speed: 40 Gbit/s. Full Duplex, of course. Oh, and a really low latency. RDMA. Remote Direct Memory Access. This technology allows the DB nodes to do exactly that. Remotely, directly placing SQL commands into the Memory of the storage cells. Dodging all the network-stack bottlenecks, avoiding overhead, placing requests directly into the process queue. You can also run IP over InfiniBand if you please - that's the way the compute nodes can communicate with each other. IV. Including a general-purpose storage too: the ZFSSA, which is a unified storage, providing NAS and SAN access too, with the following features: NFS over RDMA over InfiniBand. Nothing is faster network-filesystem-wise. All the ZFS features onboard, hybrid storage pools, compression, deduplication, snapshot, replication, NFS and CIFS shares Storageheads in a HA-Cluster configuration providing availability of the data DTrace Live Analytics in a web-based Administration UI Being a general purpose application data storage for your non-database applications running on the SPARC SuperCluster over whichever protocol they prefer, easily replicating, snapshotting, cloning data for them. There's a lot of great technology included in Oracle's SPARC SuperCluster, we have talked its interior through. As for external scalability: you can start with a half- of full- rack SPARC SuperCluster, and scale out to several racks - that is, stacking not separate full-rack SPARC SuperClusters, but extending always one large instance of the size of several full-racks. Yes, over InfiniBand network. Add racks as you grow. What technologies shall run on it? SPARC SuperCluster is a general purpose scaleout consolidation/cloud environment. You can run Oracle Databases with RAC scaling, or Oracle Weblogic (end enjoy the SPARC T4's advantages to run Java). Remember, Oracle technologies have been integrated with the Oracle Engineered Systems - this is the Oracle on Oracle advantage. But you can run other software environments such as SAP if you please too. Run any application that runs on Oracle Solaris 10 or Solaris 11. Separate them in Virtual Machines, or even Oracle Solaris Zones, monitor and manage those from a central UI. Here the key takeaways once again: The SPARC SuperCluster: Is a pre-integrated Engineered System Contains SPARC T4-4 servers with built-in virtualization, cryptography, dynamic threading Contains the Exadata storage cells that intelligently offload the burden of the DB-nodes Contains a highly available ZFS Storage Appliance, that provides SAN/NAS storage in a unified way Combines all these elements over a high-speed, low-latency backbone network implemented with InfiniBand Can grow from a single half-rack to several full-rack size Supports the consolidation of hundreds of applications To summarize: All these technologies are great by themselves, but the real value is like in every other Oracle Engineered System: Integration. All these technologies are tuned to perform together. Together they are way more than the sum of all - and a careful and actually very time consuming integration process is necessary to orchestrate all these for performance. The SPARC SuperCluster's goal is to enable infrastructure operations and offer a pre-integrated solution that can be architected and delivered in hours instead of months of evaluations and tests. The tedious and most importantly time and resource consuming part of the work - testing and evaluating - has been done. Now go, provide services. -- charlie

Read the article
Elegance, thy Name is jQuery

- by SGWellens

So, I'm browsing though some questions over on the Stack Overflow website and I found a good jQuery question just a few minutes old. Here is a link to it. It was a tough question; I knew that by answering it, I could learn new stuff and reinforce what I already knew: Reading is good, doing is better. Maybe I could help someone in the process too. I cut and pasted the HTML from the question into my Visual Studio IDE and went back to Stack Overflow to reread the question. Dang, someone had already answered it! And it was a great answer. I never even had a chance to start analyzing the issue. Now I know what a one-legged man feels like in an ass-kicking contest. Nevertheless, since the question and answer were so interesting, I decided to dissect them and learn as much as possible. The HTML consisted of some divs separated by h3 headings. Note the elements are laid out sequentially with no programmatic grouping: <h3 class="heading">Heading 1</h3> <div>Content</div> <div>More content</div> <div>Even more content</div><h3 class="heading">Heading 2</h3> <div>some content</div> <div>some more content</div><h3 class="heading">Heading 3</h3> <div>other content</div></form></body> The requirement was to wrap a div around each h3 heading and the subsequent divs grouping them into sections. Why? I don't know, I suppose if you screen-scrapped some HTML from another site, you might want to reformat it before displaying it on your own. Anyways… Here is the marvelously, succinct posted answer: $('.heading').each(function(){ $(this).nextUntil('.heading').andSelf().wrapAll('<div class="section">');}); I was familiar with all the parts except for nextUntil and andSelf. But, I'll analyze the whole answer for completeness. I'll do this by rewriting the posted answer in a different style and adding a boat-load of comments: function Test(){ // $Sections is a jQuery object and it will contain three elements var $Sections = $('.heading'); // use each to iterate over each of the three elements $Sections.each(function () { // $this is a jquery object containing the current element // being iterated var $this = $(this); // nextUntil gets the following sibling elements until it reaches // an element with the CSS class 'heading' // andSelf adds in the source element (this) to the collection $this = $this.nextUntil('.heading').andSelf(); // wrap the elements with a div $this.wrapAll('<div class="section" >'); });} The code here doesn't look nearly as concise and elegant as the original answer. However, unless you and your staff are jQuery masters, during development it really helps to work through algorithms step by step. You can step through this code in the debugger and examine the jQuery objects to make sure one step is working before proceeding on to the next. It's much easier to debug and troubleshoot when each logical coding step is a separate line of code. Note: You may think the original code runs much faster than this version. However, the time difference is trivial: Not enough to worry about: Less than 1 millisecond (tested in IE and FF). Note: You may want to jam everything into one line because it results in less traffic being sent to the client. That is true. However, most Internet servers now compress HTML and JavaScript by stripping out comments and white space (go to Bing or Google and view the source). This feature should be enabled on your server: Let the server compress your code, you don't need to do it. Free Career Advice: Creating maintainable code is Job One—Maximum Priority—The Prime Directive. If you find yourself suddenly transferred to customer support, it may be that the code you are writing is not as readable as it could be and not as readable as it should be. Moving on… I created a CSS class to enhance the results: .section{ background-color: yellow; border: 2px solid black; margin: 5px;} Here is the rendered output before: …and after the jQuery code runs. Pretty Cool! But, while playing with this code, the logic of nextUntil began to bother me: What happens in the last section? What stops elements from being collected since there are no more elements with the .heading class? The answer is nothing. In this case it stopped collecting elements because it was at the end of the page. But what if there were additional HTML elements? I added an anchor tag and another div to the HTML: <h3 class="heading">Heading 1</h3> <div>Content</div> <div>More content</div> <div>Even more content</div><h3 class="heading">Heading 2</h3> <div>some content</div> <div>some more content</div><h3 class="heading">Heading 3</h3> <div>other content</div><a>this is a link</a><div>unrelated div</div> </form></body> The code as-is will include both the anchor and the unrelated div. This isn't what we want. My first attempt to correct this used the filter parameter of the nextUntil function: nextUntil('.heading', 'div') This will only collect div elements. But it merely skipped the anchor tag and it still collected the unrelated div: The problem is we need a way to tell the nextUntil function when to stop. CSS selectors to the rescue! nextUntil('.heading, a') This tells nextUntil to stop collecting elements when it gets to an element with a .heading class OR when it gets to an anchor tag. In this case it solved the problem. FYI: The comma operator in a CSS selector allows multiple criteria. Bingo! One final note, we could have broken the code down even more: We could have replaced the andSelf function here: $this = $this.nextUntil('.heading, a').andSelf(); With this: // get all the following siblings and then add the current item$this = $this.nextUntil('.heading, a');$this.add(this); But in this case, the andSelf function reads real nice. In my opinion. Here's a link to a jsFiddle if you want to play with it. I hope someone finds this useful Steve Wellens CodeProject

Read the article
SQL SERVER – Weekly Series – Memory Lane – #034

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 UDF – User Defined Function to Strip HTML – Parse HTML – No Regular Expression The UDF used in the blog does fantastic task – it scans entire HTML text and removes all the HTML tags. It keeps only valid text data without HTML task. This is one of the quite commonly requested tasks many developers have to face everyday. De-fragmentation of Database at Operating System to Improve Performance Operating system skips MDF file while defragging the entire filesystem of the operating system. It is absolutely fine and there is no impact of the same on performance. Read the entire blog post for my conversation with our network engineers. Delay Function – WAITFOR clause – Delay Execution of Commands How do you delay execution of the commands in SQL Server – ofcourse by using WAITFOR keyword. In this blog post, I explain the same with the help of T-SQL script. Find Length of Text Field To measure the length of TEXT fields the function is DATALENGTH(textfield). Len will not work for text field. As of SQL Server 2005, developers should migrate all the text fields to VARCHAR(MAX) as that is the way forward. Retrieve Current Date Time in SQL Server CURRENT_TIMESTAMP, GETDATE(), {fn NOW()} There are three ways to retrieve the current datetime in SQL SERVER. CURRENT_TIMESTAMP, GETDATE(), {fn NOW()} Explanation and Comparison of NULLIF and ISNULL An interesting observation is NULLIF returns null if it comparison is successful, whereas ISNULL returns not null if its comparison is successful. In one way they are opposite to each other. Here is my question to you - How to create infinite loop using NULLIF and ISNULL? If this is even possible? 2008 Introduction to SERVERPROPERTY and example SERVERPROPERTY is a very interesting system function. It returns many of the system values. I use it very frequently to get different server values like Server Collation, Server Name etc. SQL Server Start Time We can use DMV to find out what is the start time of SQL Server in 2008 and later version. In this blog you can see how you can do the same. Find Current Identity of Table Many times we need to know what is the current identity of the column. I have found one of my developers using aggregated function MAX () to find the current identity. However, I prefer following DBCC command to figure out current identity. Create Check Constraint on Column Some time we just need to create a simple constraint over the table but I have noticed that developers do many different things to make table column follow rules than just creating constraint. I suggest constraint is a very useful concept and every SQL Developer should pay good attention to this subject. 2009 List Schema Name and Table Name for Database This is one of the blog post where I straight forward display script. One of the kind of blog posts, which I still love to read and write. Clustered Index on Separate Drive From Table Location A table devoid of primary key index is called heap, and here data is not arranged in a particular order, which gives rise to issues that adversely affect performance. Data must be stored in some kind of order. If we put clustered index on it then the order will be forced by that index and the data will be stored in that particular order. Understanding Table Hints with Examples Hints are options and strong suggestions specified for enforcement by the SQL Server query processor on DML statements. The hints override any execution plan the query optimizer might select for a query. 2010 Data Pages in Buffer Pool – Data Stored in Memory Cache One of my earlier year article, which I still read it many times and point developers to read it again. It is clear from the Resultset that when more than one index is used, datapages related to both or all of the indexes are stored in Memory Cache separately. TRANSACTION, DML and Schema Locks Can you create a situation where you can see Schema Lock? Well, this is a very simple question, however during the interview I notice over 50 candidates failed to come up with the scenario. In this blog post, I have demonstrated the situation where we can see the schema lock in database. 2011 Solution – Puzzle – Statistics are not updated but are Created Once In this example I have created following situation: Create Table Insert 1000 Records Check the Statistics Now insert 10 times more 10,000 indexes Check the Statistics – it will be NOT updated Auto Update Statistics and Auto Create Statistics for database is TRUE Now I have requested two things in the example 1) Why this is happening? 2) How to fix this issue? Selecting Domain from Email Address This is a straight to script blog post where I explain how to select only domain name from entire email address. Solution – Generating Zero Without using Any Numbers in T-SQL How to get zero digit without using any digit? This is indeed a very interesting question and the answer is even interesting. Try to come up with answer in next 10 minutes and if you can’t come up with the answer the blog post read this post for solution. 2012 Simple Explanation and Puzzle with SOUNDEX Function and DIFFERENCE Function In simple words - SOUNDEX converts an alphanumeric string to a four-character code to find similar-sounding words or names. DIFFERENCE function returns an integer value. The integer returned is the number of characters in the SOUNDEX values that are the same. Read Only Files and SQL Server Management Studio (SSMS) I have come across a very interesting feature in SSMS related to “Read Only” files. I believe it is a little unknown feature as well so decided to write a blog about the same. Identifying Column Data Type of uniqueidentifier without Querying System Tables How do I know if any table has a uniqueidentifier column and what is its value without using any DMV or System Catalogues? Only information you know is the table name and you are allowed to return any kind of error if the table does not have uniqueidentifier column. Read the blog post to find the answer. Solution – User Not Able to See Any User Created Object in Tables – Security and Permissions Issue Interesting question – “When I try to connect to SQL Server, it lets me connect just fine as well let me open and explore the database. I noticed that I do not see any user created instances but when my colleague attempts to connect to the server, he is able to explore the database as well see all the user created tables and other objects. Can you help me fix it?” Importing CSV File Into Database – SQL in Sixty Seconds #018 – Video Here is interesting small 60 second video on how to import CSV file into Database. ColumnStore Index – Batch Mode vs Row Mode Here is the logic behind when Columnstore Index uses Batch Mode and when it uses Row Mode. A batch typically represents about 1000 rows of data. Batch mode processing also uses algorithms that are optimized for the multicore CPUs and increased memory throughput. Follow up – Usage of $rowguid and $IDENTITY This is an excellent follow up blog post of my earlier blog post where I explain where to use $rowguid and $identity. If you do not know the difference between them, this is a blog with a script example. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
CodePlex Daily Summary for Tuesday, July 02, 2013

CodePlex Daily Summary for Tuesday, July 02, 2013Popular ReleasesMastersign.Expressions: Mastersign.Expressions v0.4.2: added support for if(<cond>, <true-part>, <false-part>) fixed multithreading issue with rand() improved demo applicationNB_Store - Free DotNetNuke Ecommerce Catalog Module: NB_Store v2.3.6 Rel0: v2.3.6 Is now DNN6 and DNN7 compatible Important : During update this install with overwrite the menu.xml setting, if you have changed this then make a backup before you upgrade and reapply your changes after the upgrade. Please view the following documentation if you are installing and configuring this module for the first time System Requirements Skill requirements Downloads and documents Step by step guide to a working store Please ask all questions in the Discussions tab. Document.Editor: 2013.26: What's new for Document.Editor 2013.26: New Insert Chart Improved User Interface Minor Bug Fix's, improvements and speed upsWsus Package Publisher: Release V1.2.1307.01: Fix an issue in the UI, approvals are not shown correctly in the 'Report' tabDirectX Tool Kit: July 2013: July 1, 2013 VS 2013 Preview projects added and updates for DirectXMath 3.05 vectorcall Added use of sRGB WIC metadata for JPEG, PNG, and TIFF SaveToWIC functions updated with new optional setCustomProps parameter and error check with optional targetFormatCore Server 2012 Powershell Script Hyper-v Manager: new_root.zip: Verison 1.0JSON Toolkit: JSON Toolkit 4.1.736: Improved strinfigy performance New serializing feature New anonymous type support in constructorsDotNetNuke® IFrame: IFrame 04.05.00: New DNN6/7 Manifest file and Azure Compatibility.VidCoder: 1.5.2 Beta: Fixed crash on presets with an invalid bitrate.Gardens Point LEX: Gardens Point LEX version 1.2.1: The main distribution is a zip file. This contains the binary executable, documentation, source code and the examples. ChangesVersion 1.2.1 has new facilities for defining and manipulating character classes. These changes make the construction of large Unicode character classes more convenient. The runtime code for performing automaton backup has been re-implemented, and is now faster for scanners that need backup. Source CodeThe distribution contains a complete VS2010 project for the appli...ZXMAK2: Version 2.7.5.7: - fix TZX emulation (Bruce Lee, Zynaps) - fix ATM 16 colors for border - add memory module PROFI 512K; add PROFI V03 rom image; fix PROFI 3.XX configTwitter image Downloader: Twitter Image Downloader 2 with Installer: Application file with Install shield and Dot Net 4.0 redistributableUltimate Music Tagger: Ultimate Music Tagger 1.0.0.0: First release of Ultimate Music TaggerBlackJumboDog: Ver5.9.2: 2013.06.28 Ver5.9.2 (1) ??????????(????SMTP?????)?????????? (2) HTTPS???????????Outlook 2013 Add-In: Configuration Form: This new version includes the following changes: - Refactored code a bit. - Removing configuration from main form to gain more space to display items. - Moved configuration to separate form. You can click the little "gear" icon to access the configuration form (still very simple). - Added option to show past day appointments from the selected day (previous in time, that is). - Added some tooltips. You will have to uninstall the previous version (add/remove programs) if you had installed it ...Terminals: Version 3.0 - Release: Changes since version 2.0:Choose 100% portable or installed version Removed connection warning when running RDP 8 (Windows 8) client Fixed Active directory search Extended Active directory search by LDAP filters Fixed single instance mode when running on Windows Terminal server Merged usage of Tags and Groups Added columns sorting option in tables No UAC prompts on Windows 7 Completely new file persistence data layer New MS SQL persistence layer (Store data in SQL database)...NuGet: NuGet 2.6: Released June 26, 2013. Release notes: http://docs.nuget.org/docs/release-notes/nuget-2.6Python Tools for Visual Studio: 2.0 Beta: We’re pleased to announce the release of Python Tools for Visual Studio 2.0 Beta. Python Tools for Visual Studio (PTVS) is an open-source plug-in for Visual Studio which supports programming with the Python language. PTVS supports a broad range of features including CPython/IronPython, Edit/Intellisense/Debug/Profile, Cloud, HPC, IPython, and cross platform debugging support. For a quick overview of the general IDE experience, please watch this video: http://www.youtube.com/watch?v=TuewiStN...Player Framework by Microsoft: Player Framework for Windows 8 and WP8 (v1.3 beta): Preview: New MPEG DASH adaptive streaming plugin for Windows Azure Media Services Preview: New Ultraviolet CFF plugin. Preview: New WP7 version with WP8 compatibility. (source code only) Source code is now available via CodePlex Git Misc bug fixes and improvements: WP8 only: Added optional fullscreen and mute buttons to default xaml JS only: protecting currentTime from returning infinity. Some videos would cause currentTime to be infinity which could cause errors in plugins expectin...AssaultCube Reloaded: 2.5.8: SERVER OWNERS: note that the default maprot has changed once again. Linux has Ubuntu 11.10 32-bit precompiled binaries and Ubuntu 10.10 64-bit precompiled binaries, but you can compile your own as it also contains the source. If you are using Mac or other operating systems, please wait while we continue to try to package for those OSes. Or better yet, try to compile it. If it fails, download a virtual machine. The server pack is ready for both Windows and Linux, but you might need to compi...New ProjectsALM Rangers DevOps Tooling and Guidance: Practical tooling and guidance that will enable teams to realize a faster deployment based on continuous feedback.Core Server 2012 Powershell Script Hyper-v Manager: Free core Server 2012 powershell scripts and batch files that replace the non-existent hyper-v manager, vmconnect and mstsc.Enhanced Deployment Service (EDS): EDS is a web service based utility designed to extend the deployment capabilities of administrators with the Microsoft Deployment Toolkit.ExtendedDialogBox: Libreria DialogBoxJazdy: This project is here only because we wanted to take advantage of a public git server.Mon Examen: This web interface is meant to make examinationsneet: summaryOrchard Multi-Choice Voting: A multiple choice voting Orchard module.Particle Swarm Optimization Solving Quadratic Assignment Problem: This project is submitted for the solving of QAP using PSO algorithms with addition of some modification Porjects: 23123123PPL Power Pack: PPL Power PackProperty Builder: Visual Studio tool for speeding up process of coding class properties getters and setters.RedRuler for Redline: I tried some on-screen rulers, none of them help me measure the UI element quickly based on the Redline. So I decided to created this handy RedRuler tool. Royale Living: Mahindra Royale Community PortalSearch and booking Hotel or Tours: Ð? án nghiên c?u c?a sinh viên tdt theo mô hình mvc 4SystemBuilder.Show: This tool is a helper after you create your project in visual studio to create the respective objects and interface. TalentDesk: new ptojectTcmplex: The Training Center teaches many different kind of course such as English, French, Computer hardware and computer softwareTFS Reporting Guide: Provides guidance and samples to enable TFS users to generate reports based on WIT data.Umbraco AdaptiveImages: Adaptive Images Package for UmbracoVirtualNet - A ILcode interpreter/emulator written in C++/Assembly: VirtualNet is a interpreter/emulator for running .net code in native without having to install the .Net FrameWorkVisual Blocks: Visual Blocks ????IDE ????? ??????? ????? ????/?? Visual Studio and Cloud Based Mobile Device Testing: Practical guidance enabling field to remove blockers to adoption and to use and extend the Perfecto Mobile Cloud Device testing within the context of VS.Windows 8 Time Picker for Windows Phone: A Windows Phone implementation of the Time Picker control found in the Windows 8.1 Alarms app.???? - SmallBasic?: ?????????

Read the article
Elegance, thy Name is jQuery

- by SGWellens

So, I'm browsing though some questions over on the Stack Overflow website and I found a good jQuery question just a few minutes old. Here is a link to it. It was a tough question; I knew that by answering it, I could learn new stuff and reinforce what I already knew: Reading is good, doing is better. Maybe I could help someone in the process too. I cut and pasted the HTML from the question into my Visual Studio IDE and went back to Stack Overflow to reread the question. Dang, someone had already answered it! And it was a great answer. I never even had a chance to start analyzing the issue. Now I know what a one-legged man feels like in an ass-kicking contest. Nevertheless, since the question and answer were so interesting, I decided to dissect them and learn as much as possible. The HTML consisted of some divs separated by h3 headings. Note the elements are laid out sequentially with no programmatic grouping: <h3 class="heading">Heading 1</h3> <div>Content</div> <div>More content</div> <div>Even more content</div><h3 class="heading">Heading 2</h3> <div>some content</div> <div>some more content</div><h3 class="heading">Heading 3</h3> <div>other content</div></form></body> The requirement was to wrap a div around each h3 heading and the subsequent divs grouping them into sections. Why? I don't know, I suppose if you screen-scrapped some HTML from another site, you might want to reformat it before displaying it on your own. Anyways… Here is the marvelously, succinct posted answer: $('.heading').each(function(){ $(this).nextUntil('.heading').andSelf().wrapAll('<div class="section">');}); I was familiar with all the parts except for nextUntil and andSelf. But, I'll analyze the whole answer for completeness. I'll do this by rewriting the posted answer in a different style and adding a boat-load of comments: function Test(){ // $Sections is a jQuery object and it will contain three elements var $Sections = $('.heading'); // use each to iterate over each of the three elements $Sections.each(function () { // $this is a jquery object containing the current element // being iterated var $this = $(this); // nextUntil gets the following sibling elements until it reaches // an element with the CSS class 'heading' // andSelf adds in the source element (this) to the collection $this = $this.nextUntil('.heading').andSelf(); // wrap the elements with a div $this.wrapAll('<div class="section" >'); });} The code here doesn't look nearly as concise and elegant as the original answer. However, unless you and your staff are jQuery masters, during development it really helps to work through algorithms step by step. You can step through this code in the debugger and examine the jQuery objects to make sure one step is working before proceeding on to the next. It's much easier to debug and troubleshoot when each logical coding step is a separate line. Note: You may think the original code runs much faster than this version. However, the time difference is trivial: Not enough to worry about: Less than 1 millisecond (tested in IE and FF). Note: You may want to jam everything into one line because it results in less traffic being sent to the client. That is true. However, most Internet servers now compress HTML and JavaScript by stripping out comments and white space (go to Bing or Google and view the source). This feature should be enabled on your server: Let the server compress your code, you don't need to do it. Free Career Advice: Creating maintainable code is Job One—Maximum Priority—The Prime Directive. If you find yourself suddenly transferred to customer support, it may be that the code you are writing is not as readable as it could be and not as readable as it should be. Moving on… I created a CSS class to see the results: .section{ background-color: yellow; border: 2px solid black; margin: 5px;} Here is the rendered output before: …and after the jQuery code runs. Pretty Cool! But, while playing with this code, the logic of nextUntil began to bother me: What happens in the last section? What stops elements from being collected since there are no more elements with the .heading class? The answer is nothing. In this case it stopped because it was at the end of the page. But what if there were additional HTML elements? I added an anchor tag and another div to the HTML: <h3 class="heading">Heading 1</h3> <div>Content</div> <div>More content</div> <div>Even more content</div><h3 class="heading">Heading 2</h3> <div>some content</div> <div>some more content</div><h3 class="heading">Heading 3</h3> <div>other content</div><a>this is a link</a><div>unrelated div</div> </form></body> The code as-is will include both the anchor and the unrelated div. This isn't what we want. My first attempt to correct this used the filter parameter of the nextUntil function: nextUntil('.heading', 'div') This will only collect div elements. But it merely skipped the anchor tag and it still collected the unrelated div: The problem is we need a way to tell the nextUntil function when to stop. CSS selectors to the rescue: nextUntil('.heading, a') This tells nextUntil to stop collecting sibling elements when it gets to an element with a .heading class OR when it gets to an anchor tag. In this case it solved the problem. FYI: The comma operator in a CSS selector allows multiple criteria. Bingo! One final note, we could have broken the code down even more: We could have replaced the andSelf function here: $this = $this.nextUntil('.heading, a').andSelf(); With this: // get all the following siblings and then add the current item$this = $this.nextUntil('.heading, a');$this.add(this); But in this case, the andSelf function reads real nice. In my opinion. Here's a link to a jsFiddle if you want to play with it. I hope someone finds this useful Steve Wellens CodeProject

Read the article
Death March

- by Nick Harrison

It is a horrible sight to watch a project fail. There are few things as bad. Watching a project fail regardless of the reason is almost like sitting in a room with a "Dementor" from Harry Potter. It will literally suck all of the life and joy out of the room. Nearly every project that I have seen fail has failed because of political challenges or management challenges. Sometimes there are technical challenges that bring a project to its knees, but usually projects fail for less technical reasons. Here a few observations about projects failing for political reasons. Both the client and the consultants have to be committed to seeing the project succeed. Put simply, you cannot solve a problem when the primary stake holders do not truly want it solved. This could come from a consultant being more interested in extended the engagement. It could come from a client being afraid of what will happen to them once the problem is solved. It could come from disenfranchised stake holders. Sometimes a project is beset on all sides. When you find yourself working on a project that has this kind of threat, do all that you can to constrain the disruptive influences of the bad apples. If their influence cannot be constrained, you truly have no choice but to move on to a new project. Tough choices have to be made to make a project successful. These choices will affect everyone involved in the project. These choices may involve users not getting a change request through that they want. Developers may not get to use the tools that they want. Everyone may have to put in more hours that they originally planned. Steps may be skipped. Compromises will be made, but if everyone stays committed to the end goal, you can still be successful. If individuals start feeling disgruntled or resentful of the compromises reached, the project can easily be derailed. When everyone is not working towards a common goal, it is like driving with one foot on the break and one foot on the accelerator. Not only will you not get to where you are planning, you will also damage the car and possibly the passengers as well. It is important to always keep the end result in mind. Regardless of the development methodology being followed, the end goal is not comprehensive documentation. In all cases, it is working software. Comprehensive documentation is nice but useless if the software doesn't work. You can never get so distracted by the next goal that you fail to meet the current goal. Most projects are ultimately marathons. This means that the pace must be sustainable. Regardless of the temptations, you cannot burn the team alive. Processes will fail. Technology will get outdated. Requirements will change, but your people will adapt and learn and grow. If everyone on the team from the most senior analyst to the most junior recruit trusts and respects each other, there is no challenge that they cannot overcome. When everyone involved faces challenges with the attitude "This is my project and I will not let it fail" "You are my teammate and I will not let you fail", you will in fact not fail. When you find a team that embraces this attitude, protect it at all cost. Edward Yourdon wrote a book called Death March. In it, he included a graph for categorizing Death March project types based on the Happiness of the Team and the Chances of Success. Chances are we have all worked on Death March projects. We will all most likely work on more Death March projects in the future. To a certain extent, they seem to be inevitable, but they should never be suicide or ugly. Ideally, they can all be "Mission Impossible" where everyone works hard, has fun, and knows that there is good chance that they will succeed. If you are ever lucky enough to work on such a project, you will know that sense of pride that comes from the eventual success. You will recognize a profound bond with the team that you worked with. Chances are it will change your life or at least your outlook on life. If you have not already read this book, get a copy and study it closely. It will help you survive and make the most out of your next Death March project.

Read the article
The Great Divorce

- by BlackRabbitCoder

I have a confession to make: I've been in an abusive relationship for more than 17 years now. Yes, I am not ashamed to admit it, but I'm finally doing something about it. I met her in college, she was new and sexy and amazingly fast -- and I'd never met anything like her before. Her style and her power captivated me and I couldn't wait to learn more about her. I took a chance on her, and though I learned a lot from her -- and will always be grateful for my time with her -- I think it's time to move on. Her name was C++, and she so outshone my previous love, C, that any thoughts of going back evaporated in the heat of this new romance. She promised me she'd be gentle and not hurt me the way C did. She promised me she'd clean-up after herself better than C did. She promised me she'd be less enigmatic and easier to keep happy than C was. But I was deceived. Oh sure, as far as truth goes, it wasn't a complete lie. To some extent she was more fun, more powerful, safer, and easier to maintain. But it just wasn't good enough -- or at least it's not good enough now. I loved C++, some part of me still does, it's my first-love of programming languages and I recognize its raw power, its blazing speed, and its improvements over its predecessor. But with today's hardware, at speeds we could only dream to conceive of twenty years ago, that need for speed -- at the cost of all else -- has died, and that has left my feelings for C++ moribund. If I ever need to write an operating system or a device driver, then I might need that speed. But 99% of the time I don't. I'm a business-type programmer and chances are 90% of you are too, and even the ones who need speed at all costs may be surprised by how much you sacrifice for that. That's not to say that I don't want my software to perform, and it's not to say that in the business world we don't care about speed or that our job is somehow less difficult or technical. There's many times we write programs to handle millions of real-time updates or handle thousands of financial transactions or tracking trading algorithms where every second counts. But if I choose to write my code in C++ purely for speed chances are I'll never notice the speed increase -- and equally true chances are it will be far more prone to crash and far less easy to maintain. Nearly without fail, it's the macro-optimizations you need, not the micro-optimizations. If I choose to write a O(n2) algorithm when I could have used a O(n) algorithm -- that can kill me. If I choose to go to the database to load a piece of unchanging data every time instead of caching it on first load -- that too can kill me. And if I cross the network multiple times for pieces of data instead of getting it all at once -- yes that can also kill me. But choosing an overly powerful and dangerous mid-level language to squeeze out every last drop of performance will realistically not make stock orders process any faster, and more likely than not open up the system to more risk of crashes and resource leaks. And that's when my love for C++ began to die. When I noticed that I didn't need that speed anymore. That that speed was really kind of a lie. Sure, I can be super efficient and pack bits in a byte instead of using separate boolean values. Sure, I can use an unsigned char instead of an int. But in the grand scheme of things it doesn't matter as much as you think it does. The key is maintainability, and that's where C++ failed me. I like to tell the other developers I work with that there's two levels of correctness in coding: Is it immediately correct? Will it stay correct? That is, you can hack together any piece of code and make it correct to satisfy a task at hand, but if a new developer can't come in tomorrow and make a fairly significant change to it without jeopardizing that correctness, it won't stay correct. Some people laugh at me when I say I now prefer maintainability over speed. But that is exactly the point. If you focus solely on speed you tend to produce code that is much harder to maintain over the long hall, and that's a load of technical debt most shops can't afford to carry and end up completely scrapping code before it's time. When good code is written well for maintainability, though, it can be correct both now and in the future. And you know the best part is? My new love is nearly as fast as C++, and in some cases even faster -- and better than that, I know C# will treat me right. Her creators have poured hundreds of thousands of hours of time into making her the sexy beast she is today. They made her easy to understand and not an enigmatic mess. They made her consistent and not moody and amorphous. And they made her perform as fast as I care to go by optimizing her both at compile time and a run-time. Her code is so elegant and easy on the eyes that I'm not worried where she will run to or what she'll pull behind my back. She is powerful enough to handle all my tasks, fast enough to execute them with blazing speed, maintainable enough so that I can rely on even fairly new peers to modify my work, and rich enough to allow me to satisfy any need. C# doesn't ask me to clean up her messes! She cleans up after herself and she tries to make my life easier for me by taking on most of those optimization tasks C++ asked me to take upon myself. Now, there are many of you who would say that I am the cause of my own grief, that it was my fault C++ didn't behave because I didn't pay enough attention to her. That I alone caused the pain she inflicted on me. And to some extent, you have a point. But she was so high maintenance, requiring me to know every twist and turn of her vast and unrestrained power that any wrong term or bout of forgetfulness was met with painful reminders that she wasn't going to watch my back when I made a mistake. But C#, she loves me when I'm good, and she loves me when I'm bad, and together we make beautiful code that is both fast and safe. So that's why I'm leaving C++ behind. She says she's changing for me, but I have no interest in what C++0x may bring. Oh, I'll still keep in touch, and maybe I'll see her now and again when she brings her problems to my door and asks for some attention -- for I always have a soft spot for her, you see. But she's out of my house now. I have three kids and a dog and a cat, and all require me to clean up after them, why should I have to clean up after my programming language as well?

Read the article
Some non-generic collections

- by Simon Cooper

Although the collections classes introduced in .NET 2, 3.5 and 4 cover most scenarios, there are still some .NET 1 collections that don't have generic counterparts. In this post, I'll be examining what they do, why you might use them, and some things you'll need to bear in mind when doing so. BitArray System.Collections.BitArray is conceptually the same as a List<bool>, but whereas List<bool> stores each boolean in a single byte (as that's what the backing bool[] does), BitArray uses a single bit to store each value, and uses various bitmasks to access each bit individually. This means that BitArray is eight times smaller than a List<bool>. Furthermore, BitArray has some useful functions for bitmasks, like And, Xor and Not, and it's not limited to 32 or 64 bits; a BitArray can hold as many bits as you need. However, it's not all roses and kittens. There are some fundamental limitations you have to bear in mind when using BitArray: It's a non-generic collection. The enumerator returns object (a boxed boolean), rather than an unboxed bool. This means that if you do this: foreach (bool b in bitArray) { ... } Every single boolean value will be boxed, then unboxed. And if you do this: foreach (var b in bitArray) { ... } you'll have to manually unbox b on every iteration, as it'll come out of the enumerator an object. Instead, you should manually iterate over the collection using a for loop: for (int i=0; i<bitArray.Length; i++) { bool b = bitArray[i]; ... } Following on from that, if you want to use BitArray in the context of an IEnumerable<bool>, ICollection<bool> or IList<bool>, you'll need to write a wrapper class, or use the Enumerable.Cast<bool> extension method (although Cast would box and unbox every value you get out of it). There is no Add or Remove method. You specify the number of bits you need in the constructor, and that's what you get. You can change the length yourself using the Length property setter though. It doesn't implement IList. Although not really important if you're writing a generic wrapper around it, it is something to bear in mind if you're using it with pre-generic code. However, if you use BitArray carefully, it can provide significant gains over a List<bool> for functionality and efficiency of space. OrderedDictionary System.Collections.Specialized.OrderedDictionary does exactly what you would expect - it's an IDictionary that maintains items in the order they are added. It does this by storing key/value pairs in a Hashtable (to get O(1) key lookup) and an ArrayList (to maintain the order). You can access values by key or index, and insert or remove items at a particular index. The enumerator returns items in index order. However, the Keys and Values properties return ICollection, not IList, as you might expect; CopyTo doesn't maintain the same ordering, as it copies from the backing Hashtable, not ArrayList; and any operations that insert or remove items from the middle of the collection are O(n), just like a normal list. In short; don't use this class. If you need some sort of ordered dictionary, it would be better to write your own generic dictionary combining a Dictionary<TKey, TValue> and List<KeyValuePair<TKey, TValue>> or List<TKey> for your specific situation. ListDictionary and HybridDictionary To look at why you might want to use ListDictionary or HybridDictionary, we need to examine the performance of these dictionaries compared to Hashtable and Dictionary<object, object>. For this test, I added n items to each collection, then randomly accessed n/2 items: So, what's going on here? Well, ListDictionary is implemented as a linked list of key/value pairs; all operations on the dictionary require an O(n) search through the list. However, for small n, the constant factor that big-o notation doesn't measure is much lower than the hashing overhead of Hashtable or Dictionary. HybridDictionary combines a Hashtable and ListDictionary; for small n, it uses a backing ListDictionary, but switches to a Hashtable when it gets to 9 items (you can see the point it switches from a ListDictionary to Hashtable in the graph). Apart from that, it's got very similar performance to Hashtable. So why would you want to use either of these? In short, you wouldn't. Any gain in performance by using ListDictionary over Dictionary<TKey, TValue> would be offset by the generic dictionary not having to cast or box the items you store, something the graphs above don't measure. Only if the performance of the dictionary is vital, the dictionary will hold less than 30 items, and you don't need type safety, would you use ListDictionary over the generic Dictionary. And even then, there's probably more useful performance gains you can make elsewhere.

Read the article
Design Pattern for Complex Data Modeling

- by Aaron Hayman

I'm developing a program that has a SQL database as a backing store. As a very broad description, the program itself allows a user to generate records in any number of user-defined tables and make connections between them. As for specs: Any record generated must be able to be connected to any other record in any other user table (excluding itself...the record, not the table). These "connections" are directional, and the list of connections a record has is user ordered. Moreover, a record must "know" of connections made from it to others as well as connections made to it from others. The connections are kind of the point of this program, so there is a strong possibility that the number of connections made is very high, especially if the user is using the software as intended. A record's field can also include aggregate information from it's connections (like obtaining average, sum, etc) that must be updated on change from another record it's connected to. To conserve memory, only relevant information must be loaded at any one time (can't load the entire database in memory at load and go from there). I cannot assume the backing store is local. Right now it is, but eventually this program will include syncing to a remote db. Neither the user tables, connections or records are known at design time as they are user generated. I've spent a lot of time trying to figure out how to design the backing store and the object model to best fit these specs. In my first design attempt on this, I had one object managing all a table's records and connections. I attempted this first because it kept the memory footprint smaller (records and connections were simple dicts), but maintaining aggregate and link information between tables became....onerous (ie...a huge spaghettified mess). Tracing dependencies using this method almost became impossible. Instead, I've settled on a distributed graph model where each record and connection is 'aware' of what's around it by managing it own data and connections to other records. Doing this increases my memory footprint but also let me create a faulting system so connections/records aren't loaded into memory until they're needed. It's also much easier to code: trace dependencies, eliminate cycling recursive updates, etc. My biggest problem is storing/loading the connections. I'm not happy with any of my current solutions/ideas so I wanted to ask and see if anybody else has any ideas of how this should be structured. Connections are fairly simple. They contain: fromRecordID, fromTableID, fromRecordOrder, toRecordID, toTableID, toRecordOrder. Here's what I've come up with so far: Store all the connections in one big table. If I do this, either I load all connections at once (one big db call) or make a call every time a user table is loaded. The big issue here: the size of the connections table has the potential to be huge, and I'm afraid it would slow things down. Store in separate tables all the outgoing connections for each user table. This is probably the worst idea I've had. Now my connections are 'spread out' over multiple tables (one for each user table), which means I have to make a separate DB called to each table (or make a huge join) just to find all the incoming connections for a particular user table. I've avoided making "one big ass table", but I'm not sure the cost is worth it. Store in separate tables all outgoing AND incoming connections for each user table (using a flag to distinguish between incoming vs outgoing). This is the idea I'm leaning towards, but it will essentially double the total DB storage for all the connections (as each connection will be stored in two tables). It also means I have to make sure connection information is kept in sync in both places. This is obviously not ideal but it does mean that when I load a user table, I only need to load one 'connection' table and have all the information I need. This also presents a separate problem, that of connection object creation. Since each user table has a list of all connections, there are two opportunities for a connection object to be made. However, connections objects (designed to facilitate communication between records) should only be created once. This means I'll have to devise a common caching/factory object to make sure only one connection object is made per connection. Does anybody have any ideas of a better way to do this? Once I've committed to a particular design pattern I'm pretty much stuck with it, so I want to make sure I've come up with the best one possible.

Read the article
ODI 12c - Aggregating Data

- by David Allan

This posting will look at the aggregation component that was introduced in ODI 12c. For many ETL tool users this shouldn't be a big surprise, its a little different than ODI 11g but for good reason. You can use this component for composing data with relational like operations such as sum, average and so forth. Also, Oracle SQL supports special functions called Analytic SQL functions, you can use a specially configured aggregation component or the expression component for these now in ODI 12c. In database systems an aggregate transformation is a transformation where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning - that's exactly the purpose of the aggregate component. In the image below you can see the aggregate component in action within a mapping, for how this and a few other examples are built look at the ODI 12c Aggregation Viewlet here - the viewlet illustrates a simple aggregation being built and then some Oracle analytic SQL such as AVG(EMP.SAL) OVER (PARTITION BY EMP.DEPTNO) built using both the aggregate component and the expression component. In 11g you used to just write the aggregate expression directly on the target, this made life easy for some cases, but it wan't a very obvious gesture plus had other drawbacks with ordering of transformations (agg before join/lookup. after set and so forth) and supporting analytic SQL for example - there are a lot of postings from creative folks working around this in 11g - anything from customizing KMs, to bypassing aggregation analysis in the ODI code generator. The aggregate component has a few interesting aspects. 1. Firstly and foremost it defines the attributes projected from it - ODI automatically will perform the grouping all you do is define the aggregation expressions for those columns aggregated. In 12c you can control this automatic grouping behavior so that you get the code you desire, so you can indicate that an attribute should not be included in the group by, that's what I did in the analytic SQL example using the aggregate component. 2. The component has a few other properties of interest; it has a HAVING clause and a manual group by clause. The HAVING clause includes a predicate used to filter rows resulting from the GROUP BY clause. Because it acts on the results of the GROUP BY clause, aggregation functions can be used in the HAVING clause predicate, in 11g the filter was overloaded and used for both having clause and filter clause, this is no longer the case. If a filter is after an aggregate, it is after the aggregate (not sometimes after, sometimes having). 3. The manual group by clause let's you use special database grouping grammar if you need to. For example Oracle has a wealth of highly specialized grouping capabilities for data warehousing such as the CUBE function. If you want to use specialized functions like that you can manually define the code here. The example below shows the use of a manual group from an example in the Oracle database data warehousing guide where the SUM aggregate function is used along with the CUBE function in the group by clause. The SQL I am trying to generate looks like the following from the data warehousing guide; SELECT channel_desc, calendar_month_desc, countries.country_iso_code, TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$ FROM sales, customers, times, channels, countries WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id AND sales.channel_id= channels.channel_id AND customers.country_id = countries.country_id AND channels.channel_desc IN ('Direct Sales', 'Internet') AND times.calendar_month_desc IN ('2000-09', '2000-10') AND countries.country_iso_code IN ('GB', 'US') GROUP BY CUBE(channel_desc, calendar_month_desc, countries.country_iso_code); I can capture the source datastores, the filters and joins using ODI's dataset (or as a traditional flow) which enables us to incrementally design the mapping and the aggregate component for the sum and group by as follows; In the above mapping you can see the joins and filters declared in ODI's dataset, allowing you to capture the relationships of the datastores required in an entity-relationship style just like ODI 11g. The mix of ODI's declarative design and the common flow design provides for a familiar design experience. The example below illustrates flow design (basic arbitrary ordering) - a table load where only the employees who have maximum commission are loaded into a target. The maximum commission is retrieved from the bonus datastore and there is a look using employees as the driving table and only those with maximum commission projected. Hopefully this has given you a taster for some of the new capabilities provided by the aggregate component in ODI 12c. In summary, the actions should be much more consistent in behavior and more easily discoverable for users, the use of the components in a flow graph also supports arbitrary designs and the tool (rather than the interface designer) takes care of the realization using ODI's knowledge modules. Interested to know if a deep dive into each component is interesting for folks. Any thoughts?

Read the article
MapRedux - PowerShell and Big Data

- by Dittenhafer Solutions

MapRedux – #PowerShell and #Big Data Have you been hearing about “big data”, “map reduce” and other large scale computing terms over the past couple of years and been curious to dig into more detail? Have you read some of the Apache Hadoop online documentation and unfortunately concluded that it wasn't feasible to setup a “test” hadoop environment on your machine? More recently, I have read about some of Microsoft’s work to enable Hadoop on the Azure cloud. Being a "Microsoft"-leaning technologist, I am more inclinded to be successful with experimentation when on the Windows platform. Of course, it is not that I am "religious" about one set of technologies other another, but rather more experienced. Anyway, within the past couple of weeks I have been thinking about PowerShell a bit more as the 2012 PowerShell Scripting Games approach and it occured to me that PowerShell's support for Windows Remote Management (WinRM), and some other inherent features of PowerShell might lend themselves particularly well to a simple implementation of the MapReduce framework. I fired up my PowerShell ISE and started writing just to see where it would take me. Quite simply, the ScriptBlock feature combined with the ability of Invoke-Command to create remote jobs on networked servers provides much of the plumbing of a distributed computing environment. There are some limiting factors of course. Microsoft provided some default settings which prevent PowerShell from taking over a network without administrative approval first. But even with just one adjustment, a given Windows-based machine can become a node in a MapReduce-style distributed computing environment. Ok, so enough introduction. Let's talk about the code. First, any machine that will participate as a remote "node" will need WinRM enabled for remote access, as shown below. This is not exactly practical for hundreds of intended nodes, but for one (or five) machines in a test environment it does just fine. C:> winrm quickconfig WinRM is not set up to receive requests on this machine. The following changes must be made: Set the WinRM service type to auto start. Start the WinRM service. Make these changes [y/n]? y Alternatively, you could take the approach described in the Remotely enable PSRemoting post from the TechNet forum and use PowerShell to create remote scheduled tasks that will call Enable-PSRemoting on each intended node. Invoke-MapRedux Moving on, now that you have one or more remote "nodes" enabled, you can consider the actual Map and Reduce algorithms. Consider the following snippet: $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose Invoke-MapRedux takes an instance of a MapReduceItem which references the Map and Reduce scriptblocks, an array of computer names which are the remote nodes, and the initial data set to be processed. As simple as that, you can start working with concepts of big data and the MapReduce paradigm. Now, how did we get there? I have published the initial version of my PsMapRedux PowerShell Module on GitHub. The PsMapRedux module provides the Invoke-MapRedux function described above. Feel free to browse the underlying code and even contribute to the project! In a later post, I plan to show some of the inner workings of the module, but for now let's move on to how the Map and Reduce functions are defined. Map Both the Map and Reduce functions need to follow a prescribed prototype. The prototype for a Map function in the MapRedux module is as follows. A simple scriptblock that takes one PsObject parameter and returns a hashtable. It is important to note that the PsObject $dataset parameter is a MapRedux custom object that has a "Data" property which offers an array of data to be processed by the Map function. $aMap = { Param ( [PsObject] $dataset ) # Indicate the job is running on the remote node. Write-Host ($env:computername + "::Map"); # The hashtable to return $list = @{}; # ... Perform the mapping work and prepare the $list hashtable result with your custom PSObject... # ... The $dataset has a single 'Data' property which contains an array of data rows # which is a subset of the originally submitted data set. # Return the hashtable (Key, PSObject) Write-Output $list; } Reduce Likewise, with the Reduce function a simple prototype must be followed which takes a $key and a result $dataset from the MapRedux's partitioning function (which joins the Map results by key). Again, the $dataset is a MapRedux custom object that has a "Data" property as described in the Map section. $aReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) # The hashtable to return $redux = @{}; # Return Write-Output $redux; } All Together Now When everything is put together in a short example script, you implement your Map and Reduce functions, query for some starting data, build the MapReduxItem via New-MapReduxItem and call Invoke-MapRedux to get the process started: # Import the MapRedux and SQL Server providers Import-Module "MapRedux" Import-Module “sqlps” -DisableNameChecking # Query the database for a dataset Set-Location SQLSERVER:\sql\dbserver1\default\databases\myDb $query = "SELECT MyKey, Date, Value1 FROM BigData ORDER BY MyKey"; Write-Host "Query: $query" $dataset = Invoke-SqlCmd -query $query # Build the Map function $MyMap = { Param ( [PsObject] $dataset ) Write-Host ($env:computername + "::Map"); $list = @{}; foreach($row in $dataset.Data) { # Write-Host ("Key: " + $row.MyKey.ToString()); if($list.ContainsKey($row.MyKey) -eq $true) { $s = $list.Item($row.MyKey); $s.Sum += $row.Value1; $s.Count++; } else { $s = New-Object PSObject; $s | Add-Member -Type NoteProperty -Name MyKey -Value $row.MyKey; $s | Add-Member -type NoteProperty -Name Sum -Value $row.Value1; $list.Add($row.MyKey, $s); } } Write-Output $list; } $MyReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) $redux = @{}; $count = 0; foreach($s in $dataset.Data) { $sum += $s.Sum; $count += 1; } # Reduce $redux.Add($s.MyKey, $sum / $count); # Return Write-Output $redux; } # Create the item data $Mr = New-MapReduxItem "My Test MapReduce Job" $MyMap $MyReduce # Array of processing nodes... $MyNodes = ("node1", "node2", "node3", "node4", "localhost") # Run the Map Reduce routine... $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose # Show the results Set-Location C:\ $MyMrResults | Out-GridView Conclusion I hope you have seen through this article that PowerShell has a significant infrastructure available for distributed computing. While it does take some code to expose a MapReduce-style framework, much of the work is already done and PowerShell could prove to be the the easiest platform to develop and run big data jobs in your corporate data center, potentially in the Azure cloud, or certainly as an academic excerise at home or school. Follow me on Twitter to stay up to date on the continuing progress of my Powershell MapRedux module, and thanks for reading! Daniel

Read the article
Augmenting your Social Efforts via Data as a Service (DaaS)

- by Mike Stiles

The following is the 3rd in a series of posts on the value of leveraging social data across your enterprise by Oracle VP Product Development Don Springer and Oracle Cloud Data and Insight Service Sr. Director Product Management Niraj Deo. In this post, we will discuss the approach and value of integrating additional “public” data via a cloud-based Data-as-as-Service platform (or DaaS) to augment your Socially Enabled Big Data Analytics and CX Management. Let’s assume you have a functional Social-CRM platform in place. You are now successfully and continuously listening and learning from your customers and key constituents in Social Media, you are identifying relevant posts and following up with direct engagement where warranted (both 1:1, 1:community, 1:all), and you are starting to integrate signals for communication into your appropriate Customer Experience (CX) Management systems as well as insights for analysis in your business intelligence application. What is the next step? Augmenting Social Data with other Public Data for More Advanced Analytics When we say advanced analytics, we are talking about understanding causality and correlation from a wide variety, volume and velocity of data to Key Performance Indicators (KPI) to achieve and optimize business value. And in some cases, to predict future performance to make appropriate course corrections and change the outcome to your advantage while you can. The data to acquire, process and analyze this is very nuanced: It can vary across structured, semi-structured, and unstructured data It can span across content, profile, and communities of profiles data It is increasingly public, curated and user generated The key is not just getting the data, but making it value-added data and using it to help discover the insights to connect to and improve your KPIs. As we spend time working with our larger customers on advanced analytics, we have seen a need arise for more business applications to have the ability to ingest and use “quality” curated, social, transactional reference data and corresponding insights. The challenge for the enterprise has been getting this data inline into an easily accessible system and providing the contextual integration of the underlying data enriched with insights to be exported into the enterprise’s business applications. The following diagram shows the requirements for this next generation data and insights service or (DaaS): Some quick points on these requirements: Public Data, which in this context is about Common Business Entities, such as - Customers, Suppliers, Partners, Competitors (all are organizations) Contacts, Consumers, Employees (all are people) Products, Brands This data can be broadly categorized incrementally as - Base Utility data (address, industry classification) Public Master Reference data (trade style, hierarchy) Social/Web data (News, Feeds, Graph) Transactional Data generated by enterprise process, workflows etc. This Data has traits of high-volume, variety, velocity etc., and the technology needed to efficiently integrate this data for your needs includes - Change management of Public Reference Data across all categories Applied Big Data to extract statics as well as real-time insights Knowledge Diagnostics and Data Mining As you consider how to deploy this solution, many of our customers will be using an online “cloud” service that provides quality data and insights uniformly to all their necessary applications. In addition, they are requesting a service that is: Agile and Easy to Use: Applications integrated with the service can obtain data on-demand, quickly and simply Cost-effective: Pre-integrated into applications so customers don’t have to Has High Data Quality: Single point access to reference data for data quality and linkages to transactional, curated and social data Supports Data Governance: Becomes more manageable and cost-effective since control of data privacy and compliance can be enforced in a centralized place Data-as-a-Service (DaaS) Just as the cloud has transformed and now offers a better path for how an enterprise manages its IT from their infrastructure, platform, and software (IaaS, PaaS, and SaaS), the next step is data (DaaS). Over the last 3 years, we have seen the market begin to offer a cloud-based data service and gain initial traction. On one side of the DaaS continuum, we see an “appliance” type of service that provides a single, reliable source of accurate business data plus social information about accounts, leads, contacts, etc. On the other side of the continuum we see more of an online market “exchange” approach where ISVs and Data Publishers can publish and sell premium datasets within the exchange, with the exchange providing a rich set of web interfaces to improve the ease of data integration. Why the difference? It depends on the provider’s philosophy on how fast the rate of commoditization of certain data types will occur. How do you decide the best approach? Our perspective, as shown in the diagram below, is that the enterprise should develop an elastic schema to support multi-domain applicability. This allows the enterprise to take the most flexible approach to harness the speed and breadth of public data to achieve value. The key tenet of the proposed approach is that an enterprise carefully federates common utility, master reference data end points, mobility considerations and content processing, so that they are pervasively available. One way you may already be familiar with this approach is in how you do Address Verification treatments for accounts, contacts etc. If you design and revise this service in such a way that it is also easily available to social analytic needs, you could extend this to launch geo-location based social use cases (marketing, sales etc.). Our fundamental belief is that value-added data achieved through enrichment with specialized algorithms, as well as applying business “know-how” to weight-factor KPIs based on innovative combinations across an ever-increasing variety, volume and velocity of data, will be where real value is achieved. Essentially, Data-as-a-Service becomes a single entry point for the ever-increasing richness and volume of public data, with enrichment and combined capabilities to extract and integrate the right data from the right sources with the right factoring at the right time for faster decision-making and action within your core business applications. As more data becomes available (and in many cases commoditized), this value-added data processing approach will provide you with ongoing competitive advantage. Let’s look at a quick example of creating a master reference relationship that could be used as an input for a variety of your already existing business applications. In phase 1, a simple master relationship is achieved between a company (e.g. General Motors) and a variety of car brands’ social insights. The reference data allows for easy sort, export and integration into a set of CRM use cases for analytics, sales and marketing CRM. In phase 2, as you create more data relationships (e.g. competitors, contacts, other brands) to have broader and deeper references (social profiles, social meta-data) for more use cases across CRM, HCM, SRM, etc. This is just the tip of the iceberg, as the amount of master reference relationships is constrained only by your imagination and the availability of quality curated data you have to work with. DaaS is just now emerging onto the marketplace as the next step in cloud transformation. For some of you, this may be the first you have heard about it. Let us know if you have questions, or perspectives. In the meantime, we will continue to share insights as we can.Photo: Erik Araujo, stock.xchng

Read the article

Search Results

Search found 4176 results on 168 pages for 'graph algorithms'.

Page 156/168 | < Previous Page | 152 153 154 155 156 157 158 159 160 161 162 163 | Next Page >

- by me

- by [email protected]

- by Pinal Dave

- by Tarun Arora

- by Glav

- by Amir Javanshir

- by Pinal Dave

- by Simon Cooper

- by James Michael Hare

- by Dejan Sarka

- by Grant Fritchey

- by ShadowChaser

- by Karoly Vegh

- by SGWellens

- by Pinal Dave

- by SGWellens

- by Nick Harrison

- by BlackRabbitCoder

- by Simon Cooper

- by Aaron Hayman

- by David Allan

- by Dittenhafer Solutions

- by Mike Stiles

< Previous Page | 152 153 154 155 156 157 158 159 160 161 162 163 | Next Page >