Search Results

Search found 9439 results on 378 pages for 'telepath needed'.

Page 119/378 | < Previous Page | 115 116 117 118 119 120 121 122 123 124 125 126 | Next Page >

How To - Guide to Importing Data from a MySQL Database to Excel using MySQL for Excel

- by Javier Treviño

Fetching data from a database to then get it into an Excel spreadsheet to do analysis, reporting, transforming, sharing, etc. is a very common task among users. There are several ways to extract data from a MySQL database to then import it to Excel; for example you can use the MySQL Connector/ODBC to configure an ODBC connection to a MySQL database, then in Excel use the Data Connection Wizard to select the database and table from which you want to extract data from, then specify what worksheet you want to put the data into. Another way is to somehow dump a comma delimited text file with the data from a MySQL table (using the MySQL Command Line Client, MySQL Workbench, etc.) to then in Excel open the file using the Text Import Wizard to attempt to correctly split the data in columns. These methods are fine, but involve some degree of technical knowledge to make the magic happen and involve repeating several steps each time data needs to be imported from a MySQL table to an Excel spreadsheet. So, can this be done in an easier and faster way? With MySQL for Excel you can. MySQL for Excel features an Import MySQL Data action where you can import data from a MySQL Table, View or Stored Procedure literally with a few clicks within Excel. Following is a quick guide describing how to import data using MySQL for Excel. This guide assumes you already have a working MySQL Server instance, Microsoft Office Excel 2007 or 2010 and MySQL for Excel installed. 1. Opening MySQL for Excel Being an Excel Add-In, MySQL for Excel is opened from within Excel, so to use it open Excel, go to the Data tab located in the Ribbon and click MySQL for Excel at the far right of the Ribbon. 2. Creating a MySQL Connection (may be optional) If you have MySQL Workbench installed you will automatically see the same connections that you can see in MySQL Workbench, so you can use any of those and there may be no need to create a new connection. If you want to create a new connection (which normally you will do only once), in the Welcome Panel click New Connection, which opens the Setup New Connection dialog. Here you only need to give your new connection a distinctive Connection Name, specify the Hostname (or IP address) where the MySQL Server instance is running on (if different than localhost), the Port to connect to and the Username for the login. If you wish to test if your setup is good to go, click Test Connection and an information dialog will pop-up stating if the connection is successful or errors were found. 3.Opening a connection to a MySQL Server To open a pre-configured connection to a MySQL Server you just need to double-click it, so the Connection Password dialog is displayed where you enter the password for the login. 4. Selecting a MySQL Schema After opening a connection to a MySQL Server, the Schema Selection Panel is shown, where you can select the Schema that contains the Tables, Views and Stored Procedures you want to work with. To do so, you just need to either double-click the desired Schema or select it and click Next >. 5. Importing data… All previous steps were really the basic minimum needed to drill-down to the DB Object Selection Panel where you can see the Database Objects (grouped by type: Tables, Views and Procedures in that order) that you want to perform actions against; in the case of this guide, the action of importing data from them. a. From a MySQL Table To import from a Table you just need to select it from the list of Database Objects’ Tables group, after selecting it you will note actions below the list become available; then click Import MySQL Data. The Import Data dialog is displayed; you can see some basic information here like the name of the Excel worksheet the data will be imported to (in the window title), the Table Name, the total Row Count and a 10 row preview of the data meant for the user to see the columns that the table contains and to provide a way to select which columns to import. The Import Data dialog is designed with defaults in place so all data is imported (all rows and all columns) by just clicking Import; this is important to minimize the number of clicks needed to get the job done. After the import is performed you will have the data in the Excel worksheet formatted automatically. If you need to override the defaults in the Import Data dialog to change the columns selected for import or to change the number of imported rows you can easily do so before clicking Import. In the screenshot below the defaults are overridden to import only the first 3 columns and rows 10 – 60 (Limit to 50 Rows and Start with Row 10). If the number of rows to be imported exceeds the maximum number of rows Excel can hold in its worksheet, a warning will be displayed in the dialog, meaning the imported number of rows will be limited by that maximum number (65,535 rows if the worksheet is in Compatibility Mode). In the screenshot below you can see the Table contains 80,559 rows, but only 65,534 rows will be imported since the first row is used for the column names if the Include Column Names as Headers checkbox is checked. b. From a MySQL View Similar to the way of importing from a Table, to import from a View you just need to select it from the list of Database Objects’ Views group, then click Import MySQL Data. The Import Data dialog is displayed; identically to the way everything looks when importing from a table, the dialog displays the View Name, the total Row Count and the data preview grid. Since Views are really a filtered way to display data from Tables, it is actually as if we are extracting data from a Table; so the Import Data dialog is actually identical for those 2 Database Objects. After the import is performed, the data in the Excel spreadsheet looks like the following screenshot. Note that you can override the defaults in the Import Data dialog in the same way described above for importing data from Tables. Also the Compatibility Mode warning will be displayed if data exceeds the maximum number of rows explained before. c. From a MySQL Procedure Too import from a Procedure you just need to select it from the list of Database Objects’ Procedures group (note you can see Procedures here but not Functions since these return a single value, so by design they are filtered out). After the selection is made, click Import MySQL Data. The Import Data dialog is displayed, but this time you can see it looks different to the one used for Tables and Views. Given the nature of Store Procedures, they require first that values are supplied for its Parameters and also Procedures can return multiple Result Sets; so the Import Data dialog shows the Procedure Name and the Procedure Parameters in a grid where their values are input. After you supply the Parameter Values click Call. After calling the Procedure, the Result Sets returned by it are displayed at the bottom of the dialog; output parameters and the return value of the Procedure are appended as the last Result Set of the group. You can see each Result Set is displayed as a tab so you can see a preview of the returned data. You can specify if you want to import the Selected Result Set (default), All Result Sets – Arranged Horizontally or All Result Sets – Arranged Vertically using the Import drop-down list; then click Import. After the import is performed, the data in the Excel spreadsheet looks like the following screenshot. Note in this example all Result Sets were imported and arranged vertically. As you can see using MySQL for Excel importing data from a MySQL database becomes an easy task that requires very little technical knowledge, so it can be done by any type of user. Hope you enjoyed this guide! Remember that your feedback is very important for us, so drop us a message: MySQL on Windows (this) Blog - https://blogs.oracle.com/MySqlOnWindows/ Forum - http://forums.mysql.com/list.php?172 Facebook - http://www.facebook.com/mysql Cheers!

Read the article
Fixing a SkyDrive Sync Disaster

- by Rick Strahl

For a few months I've been using SkyDrive to handle some basic synching tasks for a number of folders of mine. Specifically I've been dumping a few of my development folders into sky drive so I have a live running backup. It had been working just fine until about a week ago when something went awry. Badly! The idea is that the SkyDrive should sync files, but somewhere in its sync relationship it appears that SkyDrive got confused and assumed it needed to sync back older files to my local machine from the SkyDrive server. So rather than syncing my newer files to the server SkyDrive was pushing older files back to me. Because SkyDrive is so slow actually updating data it's not unusual for SkyDrive to be far behind in syncing and apparently some files were out of date by several months. Of course this is insidious because I didn't notice it for quite some time. I'd been happily working away on my files when a few days ago I noted a bunch of files with -RasXps (my machine name) popping up in various folders. At first I thought my Git repository was giving me a fit, but eventually realized that SkyDrive was actually pushing old files into my monitored folders. To be fair SkyDrive did make backups of the existing files, but by the time I caught it there were literally a few thousand files scattered on my machine that were now updated with old files from online. Here's what some of this looks like: If you look at the directory list you see a bunch of files with a -RasXps postfix appended to them. Those are the files that SkyDrive replaced and backed up on my machine. As you can see the backed up files are actually newer than the ones it pulled from the online SkyDrive. Unless I modified the files after they were updated they all were older than the existing local files. Not exactly how I imagined my synching would work. At first I started cleaning up this mess manually. In most cases the obvious solution was to simply delete the original file and replace with the -RasXps file, but not in all files. Some scrutiny was required and besides being a pain in the ass to rename files, quite frequently I had to dig out Beyond Compare to compare a few files where it wasn't quite clear what's wrong. I quickly realized that doing this by hand would be too hard for the large number of files that got hosed. Hacking together a small .NET Utility So, I figured the easiest way to tackle this is to write a small utility app that shows me all the mangled files that have backups, allows me to compare them and then quickly select and update them, removing the -RasXps file after choosing one of the two files. What I ended up with was a quick and dirty WinForms app that allows me to pick a root folder, and then shows all the -MachineName files: I start by picking a base folder and a template to search for - typically the -MachineName. Clicking Go brings up a list of all files in that folder and its subdirectories. The list also displays the dates for the saved (-MachineName) file and the current file on disk, along with highlighting for the newer of the two. I can right click on any file and get a context menu pop up to open the folder in Explorer, or open Beyond Compare and view the two files to compare differences which I found very helpful for a number of files where I had modified the files after SkyDrive had updated to an old one. Typically these would be the green files (of which there were thankfully few). To 'fix' files I can select any number of files in the list, then use one of the three buttons on the right to apply an operation. I can use the Saved files - that is the backup file that SkyDrive created with the -MachineName extension (-RasXps above). Or I can use the current file, which is the file with the right name on disk right now and delete the -MachineName file. Or on some occasions I can just opt to delete both of them. For some files like binaries it's often easier to just delete and them be rebuild than choosing. For the most part the process involves accepting the pink files, and checking the few green files and see if any modifications were made since the file was updated incorrectly by SkyDrive. For me luckily those are few in number. Anyways, I thought I share this utility in case anybody else runs into this issue. I've included the VS2012 solution and all the source code so you can see how it works and you can tweak it as needed. The .NET 4.5 binaries are also included if you can't compile. Be warned though! This rough code is provided as is and makes no guarantees or claims about file safety. All three of the action buttons on the form will delete data. It's a very rough utility and there are no safeguards that ask nicely before deleting files. I highly recommend you make a backup before you have at it. This tools is very narrow in focus, but it might also work with other sync issues from other vendors. I seem to remember that I had similar issues with SugarSync at some point and it too created the -MachineName style files on sync conflicts. Hope this helps somebody out so you can avoid wasting the better part of a full work day on this… Resources Download the Source Code and Binaries for SkyDrive Rescue© Rick Strahl, West Wind Technologies, 2005-2013Posted in Windows .NET Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
The APEX of Business Value...or...the Business Value of APEX? Oracle Cloud Takes Oracle APEX to New Heights!

- by Gene Eun

The attraction of Oracle Application Express (APEX) has increased tremendously with the recent launch of the Oracle Cloud. APEX already supported departmental development and deployment of business applications with minimal involvement from the IT department. Positioned as the ideal replacement for MS Access, APEX probably has managed better to capture the eye of developers and was used for enterprise application development at least as much as for the kind of tactical applications that Oracle strategically positioned it for. With APEX as PaaS from the Oracle Cloud, a leap is made to a much higher level of business value. Now the IT department is not even needed to make infrastructure available with a database running on it. All the business needs is a credit card. And the business application that is developed, managed and used from the cloud through a standard browser can now just as easily be accessed by users from around the world as by users from the business department itself. As a bonus – the development of the APEX application is also done in the cloud – with no special demands on the location or the enterprise access privileges of the developers. To sum it up: APEX from Oracle Cloud Database Service get the development environment up and running in minutes no involvement from the internal IT department required (not for infrastructure, platform, or development) superior availability and scalability is offered by Oracle users from anywhere in the world can be invited to access the application developers from anywhere in the world can participate in creating and maintaining the application In addition: because the Oracle Cloud platform is the same as the on-premise platform, you can still decide to move the APEX application between the cloud and the local environment – and back again. The REST-ful services that are available through APEX allow programmatic interaction with the database under the APEX application. That means that this database can be synchronized with on premise databases or data stores in (other) clouds. Through the Oracle Cloud Messaging Service, the APEX application can easily enter into asynchronous conversations with other APEX applications, Fusion Middleware applications (ADF, SOA, BPM) and any other type of REST-enabled application. In my opinion, now, for the first time perhaps, APEX offers the attraction to the business that has been suggested before: because of the cloud, all the business needs is a credit card (a budget of $175 per month), an internet-connection and a browser. Not like before, with a PC hidden under a desk or a database running somewhere in the data center. No matter how unattended: equipment is needed, power is consumed, the database needs to be kept running and if Oracle Database XE does not suffice, software licenses are required as well. And this set up always has a security challenge associated with it. The cloud fee for the Oracle Cloud Database Service includes infrastructure, power, licenses, availability, platform upgrades, a collection of reusable application components and the development and runtime environments containing the APEX platform. Of course this not only means that business departments can move quickly without having to convince their IT colleagues to move along – it also means that small organizations that do not even have IT colleagues can do the same. Getting tailored applications or applications up and running to get in touch with users and customers all over the world is now within easy reach for small outfits – without any investment. My misunderstanding For a long time, I was under the impression that the essence of APEX was that the business could create applications themselves – meaning that business ‘people’ would actually go into APEX to create the application. To me APEX was too much of a developers’ tool to see that happen – apart from the odd business analyst who missed his or her calling as an IT developer. Having looked at various other cloud based development offerings – including Force.com, Mendix, WaveMaker, WorkXpress, OrangeScape, Caspio and Cordys- I have come to realize my mistake. All these platforms are positioned for 'the business' but require a fair amount of coding and technical expertise. However, they make the business happy nevertheless, because they allow the business to completely circumvent the IT department. That is the essence. Not having to go through the red tape, not having to wait for IT staff who (justifiably) need weeks or months to provide an environment, not having to deal with administrators (again, justifiably) refusing to take on that 'strange environment'. Being able to think of an initiative and turn into action right away. The business does not have to build the application - it can easily hire some external developers or even that nerdy boy next door. They can get started, get an application up and running and invite users in – especially external users such as customers. They will worry later about upgrades and life cycle management and integration. To get applications up and running quickly and start turning ideas into action and results rightaway. That is the key selling point for all these cloud offerings, including APEX from the Cloud. And it is a compelling story. For APEX probably even more so than for the others. While I consider APEX a somewhat proprietary framework compared with ‘regular’ Java/JEE web development (or even .NET and PHP development), it is still far more open than most cloud environments. APEX is SQL and PL/SQL based – nothing special about those languages – and can run just as easily on site as in the cloud. It has been around since 2004 (that is not including several predecessors that fed straight into APEX) so it can be considered pretty mature. Oracle as a company seems pretty stable – so investments in its technology are bound to last for some time to come. By the way: neither APEX nor the other Cloud DevaaS offerings are targeted at creating applications with enormous life times. They fit into a trend of agile development and rapid life cycle management, with fairly light weight user interfaces that quickly adapt to taste, technology trends and functional requirements and that are easily replaced. APEX and ADF – a match made in heaven?! (or at least in the sky) Note that using APEX only for cloud based database with REST-ful Services is also a perfectly viable scenario: any UI – mobile or browser based – capable of consuming REST-ful services can be created against such a business tier. Creating an ADF Mobile application for example that runs aginst REST-ful services is a best practice for mobile development. Such REST-ful services can be consumed from any service provider – including the Cloud based APEX powered REST-ful services running against the Oracle Cloud Database Service! The ADF Mobile architecture overview can easily be morphed to fit the APEX services in – allowing for a cloud based mobile app: Want to learn more about Oracle Database Cloud Service or Oracle Cloud, just visit cloud.oracle.com or oracle.com/cloud. Repost of a blog entry by Rick Greenwald, Director of Product Management, Oracle Database Cloud Service.

Read the article
techniques for an AI for a highly cramped turn-based tactics game

- by Adam M.

I'm trying to write an AI for a tactics game in the vein of Final Fantasy Tactics or Vandal Hearts. I can't change the game rules in any way, only upgrade the AI. I have experience programming AI for classic board games (basically minimax and its variants), but I think the branching factor is too great for the approach to be reasonable here. I'll describe the game and some current AI flaws that I'd like to fix. I'd like to hear ideas for applicable techniques. I'm a decent enough programmer, so I only need the ideas, not an implementation (though that's always appreciated). I'd rather not expend effort chasing (too many) dead ends, so although speculation and brainstorming are good and probably helpful, I'd prefer to hear from somebody with actual experience solving this kind of problem. For those who know it, the game is the land battle mini-game in Sid Meier's Pirates! (2004) and you can skim/skip the next two paragraphs. For those who don't, here's briefly how it works. The battle is turn-based and takes place on a 16x16 grid. There are three terrain types: clear (no hindrance), forest (hinders movement, ranged attacks, and sight), and rock (impassible, but does not hinder attacks or sight). The map is randomly generated with roughly equal amounts of each type of terrain. Because there are many rock and forest tiles, movement is typically very cramped. This is tactically important. The terrain is not flat; higher terrain gives minor bonuses. The terrain is known to both sides. The player is always the attacker and the AI is always the defender, so it's perfectly valid for the AI to set up a defensive position and just wait. The player wins by killing all defenders or by getting a unit to the city gates (a tile on the other side of the map). There are very few units on each side, usually 4-8. Because of this, it's crucial not to take damage without gaining some advantage from it. Units can take multiple actions per turn. All units on one side move before any units on the other side. Order of execution is important, and interleaving of actions between units is often useful. Units have melee and ranged attacks. Melee attacks vary widely in strength; ranged attacks have the same strength but vary in range. The main challenges I face are these: Lots of useful move combinations start with a "useless" move that gains no immediate advantage, or even loses advantage, in order to set up a powerful flank attack in the future. And, since the player units are stronger and have longer range, the AI pretty much always has to take some losses before they can start to gain kills. The AI must be able to look ahead to distinguish between sacrificial actions that provide a future benefit and those that don't. Because the terrain is so cramped, most of the tactics come down to achieving good positioning with multiple units that work together to defend an area. For instance, two defenders can often dominate a narrow pass by positioning themselves so an enemy unit attempting to pass must expose itself to a flank attack. But one defender in the same pass would be useless, and three units can defend a slightly larger pass. Etc. The AI should be able to figure out where the player must go to reach the city gates and how to best position its few units to cover the approaches, shifting, splitting, or combining them appropriately as the player moves. Because flank attacks are extremely deadly (and engineering flank attacks is key to the player strategy), the AI should be competent at moving its units so that they cover each other's flanks unless the sacrifice of a unit would give a substantial benefit. They should also be able to force flank attacks on players, for instance by threatening a unit from two different directions such that responding to one threat exposes the flank to the other. The AI should attack if possible, but sometimes there are no good ways to approach the player's position. In that case, the AI should be able to recognize this and set up a defensive position of its own. But the AI shouldn't be vulnerable to a trivial exploit where the player repeatedly opens and closes a hole in his defense and shoots at the AI as it approaches and retreats. That is, the AI should ideally be able to recognize that the player is capable of establishing a solid defense of an area, even if the defense is not currently in place. (I suppose if a good unit allocation algorithm existed, as needed for the second bullet point, the AI could run it on the player units to see where they could defend.) Because it's important to choose a good order of action and interleave actions between units, it's not as simple as just finding the best move for each unit in turn. All of these can be accomplished with a minimax search in theory, but the search space is too large, so specialized techniques are needed. I thought about techniques such as influence mapping, but I don't see how to use the technique to great effect. I thought about assigning goals to the units. This can help them work together in some limited way, and the problem of "how do I accomplish this goal?" is easier to solve than "how do I win this battle?", but assigning good goals is a hard problem in itself, because it requires knowing whether the goal is achievable and whether it's a good use of resources. So, does anyone have specific ideas for techniques that can help cleverize this AI? Update: I found a related question on Stackoverflow: http://stackoverflow.com/questions/3133273/ai-for-a-final-fantasy-tactics-like-game The selected answer gives a decent approach to choosing between alternative actions, but it doesn't seem to have much ability to look into the future and discern beneficial sacrifices from wasteful ones. It also focuses on a single unit at a time and it's not clear how it could be extended to support cooperation between units in defending or attacking.

Read the article
How do I set up MVP for a Winforms solution?

- by JonWillis

Question moved from Stackoverflow - http://stackoverflow.com/questions/4971048/how-do-i-set-up-mvp-for-a-winforms-solution I have used MVP and MVC in the past, and I prefer MVP as it controls the flow of execution so much better in my opinion. I have created my infrastructure (datastore/repository classes) and use them without issue when hard coding sample data, so now I am moving onto the GUI and preparing my MVP. Section A I have seen MVP using the view as the entry point, that is in the views constructor method it creates the presenter, which in turn creates the model, wiring up events as needed. I have also seen the presenter as the entry point, where a view, model and presenter are created, this presenter is then given a view and model object in its constructor to wire up the events. As in 2, but the model is not passed to the presenter. Instead the model is a static class where methods are called and responses returned directly. Section B In terms of keeping the view and model in sync I have seen. Whenever a value in the view in changed, i.e. TextChanged event in .Net/C#. This fires a DataChangedEvent which is passed through into the model, to keep it in sync at all times. And where the model changes, i.e. a background event it listens to, then the view is updated via the same idea of raising a DataChangedEvent. When a user wants to commit changes a SaveEvent it fires, passing through into the model to make the save. In this case the model mimics the view's data and processes actions. Similar to #b1, however the view does not sync with the model all the time. Instead when the user wants to commit changes, SaveEvent is fired and the presenter grabs the latest details and passes them into the model. in this case the model does not know about the views data until it is required to act upon it, in which case it is passed all the needed details. Section C Displaying of business objects in the view, i.e. a object (MyClass) not primitive data (int, double) The view has property fields for all its data that it will display as domain/business objects. Such as view.Animals exposes a IEnumerable<IAnimal> property, even though the view processes these into Nodes in a TreeView. Then for the selected animal it would expose SelectedAnimal as IAnimal property. The view has no knowledge of domain objects, it exposes property for primitive/framework (.Net/Java) included objects types only. In this instance the presenter will pass an adapter object the domain object, the adapter will then translate a given business object into the controls visible on the view. In this instance the adapter must have access to the actual controls on the view, not just any view so becomes more tightly coupled. Section D Multiple views used to create a single control. i.e. You have a complex view with a simple model like saving objects of different types. You could have a menu system at the side with each click on an item the appropriate controls are shown. You create one huge view, that contains all of the individual controls which are exposed via the views interface. You have several views. You have one view for the menu and a blank panel. This view creates the other views required but does not display them (visible = false), this view also implements the interface for each view it contains (i.e. child views) so it can expose to one presenter. The blank panel is filled with other views (Controls.Add(myview)) and ((myview.visible = true). The events raised in these "child"-views are handled by the parent view which in turn pass the event to the presenter, and visa versa for supplying events back down to child elements. Each view, be it the main parent or smaller child views are each wired into there own presenter and model. You can literately just drop a view control into an existing form and it will have the functionality ready, just needs wiring into a presenter behind the scenes. Section E Should everything have an interface, now based on how the MVP is done in the above examples will affect this answer as they might not be cross-compatible. Everything has an interface, the View, Presenter and Model. Each of these then obviously has a concrete implementation. Even if you only have one concrete view, model and presenter. The View and Model have an interface. This allows the views and models to differ. The presenter creates/is given view and model objects and it just serves to pass messages between them. Only the View has an interface. The Model has static methods and is not created, thus no need for an interface. If you want a different model, the presenter calls a different set of static class methods. Being static the Model has no link to the presenter. Personal thoughts From all the different variations I have presented (most I have probably used in some form) of which I am sure there are more. I prefer A3 as keeping business logic reusable outside just MVP, B2 for less data duplication and less events being fired. C1 for not adding in another class, sure it puts a small amount of non unit testable logic into a view (how a domain object is visualised) but this could be code reviewed, or simply viewed in the application. If the logic was complex I would agree to an adapter class but not in all cases. For section D, i feel D1 creates a view that is too big atleast for a menu example. I have used D2 and D3 before. Problem with D2 is you end up having to write lots of code to route events to and from the presenter to the correct child view, and its not drag/drop compatible, each new control needs more wiring in to support the single presenter. D3 is my prefered choice but adds in yet more classes as presenters and models to deal with the view, even if the view happens to be very simple or has no need to be reused. i think a mixture of D2 and D3 is best based on circumstances. As to section E, I think everything having an interface could be overkill I already do it for domain/business objects and often see no advantage in the "design" by doing so, but it does help in mocking objects in tests. Personally I would see E2 as a classic solution, although have seen E3 used in 2 projects I have worked on previously. Question Am I implementing MVP correctly? Is there a right way of going about it? I've read Martin Fowler's work that has variations, and I remember when I first started doing MVC, I understood the concept, but could not originally work out where is the entry point, everything has its own function but what controls and creates the original set of MVC objects.

Read the article
Parent Objects

- by Ali Bahrami

Support for Parent Objects was added in Solaris 11 Update 1. The following material is adapted from the PSARC arc case, and the Solaris Linker and Libraries Manual. A "plugin" is a shared object, usually loaded via dlopen(), that is used by a program in order to allow the end user to add functionality to the program. Examples of plugins include those used by web browsers (flash, acrobat, etc), as well as mdb and elfedit modules. The object that loads the plugin at runtime is called the "parent object". Unlike most object dependencies, the parent is not identified by name, but by its status as the object doing the load. Historically, building a good plugin is has been more complicated than it should be: A parent and its plugin usually share a 2-way dependency: The plugin provides one or more routines for the parent to call, and the parent supplies support routines for use by the plugin for things like memory allocation and error reporting. It is a best practice to build all objects, including plugins, with the -z defs option, in order to ensure that the object specifies all of its dependencies, and is self contained. However: The parent is usually an executable, which cannot be linked to via the usual library mechanisms provided by the link editor. Even if the parent is a shared object, which could be a normal library dependency to the plugin, it may be desirable to build plugins that can be used by more than one parent, in which case embedding a dependency NEEDED entry for one of the parents is undesirable. The usual way to build a high quality plugin with -z defs uses a special mapfile provided by the parent. This mapfile defines the parent routines, specifying the PARENT attribute (see example below). This works, but is inconvenient, and error prone. The symbol table in the parent already describes what it makes available to plugins — ideally the plugin would obtain that information directly rather than from a separate mapfile. The new -z parent option to ld allows a plugin to link to the parent and access the parent symbol table. This differs from a typical dependency: No NEEDED record is created. The relationship is recorded as a logical connection to the parent, rather than as an explicit object name However, it operates in the same manner as any other dependency in terms of making symbols available to the plugin. When the -z parent option is used, the link-editor records the basename of the parent object in the dynamic section, using the new tag DT_SUNW_PARENT. This is an informational tag, which is not used by the runtime linker to locate the parent, but which is available for diagnostic purposes. The ld(1) manpage documentation for the -z parent option is: -z parent=object Specifies a "parent object", which can be an executable or shared object, against which to link the output object. This option is typically used when creating "plugin" shared objects intended to be loaded by an executable at runtime via the dlopen() function. The symbol table from the parent object is used to satisfy references from the plugin object. The use of the -z parent option makes symbols from the object calling dlopen() available to the plugin. Example For this example, we use a main program, and a plugin. The parent provides a function named parent_callback() for the plugin to call. The plugin provides a function named plugin_func() to the parent: % cat main.c #include <stdio.h> #include <dlfcn.h> #include <link.h> void parent_callback(void) { printf("plugin_func() has called parent_callback()\n"); } int main(int argc, char **argv) { typedef void plugin_func_t(void); void *hdl; plugin_func_t *plugin_func; if (argc != 2) { fprintf(stderr, "usage: main plugin\n"); return (1); } if ((hdl = dlopen(argv[1], RTLD_LAZY)) == NULL) { fprintf(stderr, "unable to load plugin: %s\n", dlerror()); return (1); } plugin_func = (plugin_func_t *) dlsym(hdl, "plugin_func"); if (plugin_func == NULL) { fprintf(stderr, "unable to find plugin_func: %s\n", dlerror()); return (1); } (*plugin_func)(); return (0); } % cat plugin.c #include <stdio.h> extern void parent_callback(void); void plugin_func(void) { printf("parent has called plugin_func() from plugin.so\n"); parent_callback(); } Building this in the traditional manner, without -zdefs: % cc -o main main.c % cc -G -o plugin.so plugin.c % ./main ./plugin.so parent has called plugin_func() from plugin.so plugin_func() has called parent_callback() As noted above, when building any shared object, the -z defs option is recommended, in order to ensure that the object is self contained and specifies all of its dependencies. However, the use of -z defs prevents the plugin object from linking due to the unsatisfied symbol from the parent object: % cc -zdefs -G -o plugin.so plugin.c Undefined first referenced symbol in file parent_callback plugin.o ld: fatal: symbol referencing errors. No output written to plugin.so A mapfile can be used to specify to ld that the parent_callback symbol is supplied by the parent object. % cat plugin.mapfile $mapfile_version 2 SYMBOL_SCOPE { global: parent_callback { FLAGS = PARENT }; }; % cc -zdefs -Mplugin.mapfile -G -o plugin.so plugin.c However, the -z parent option to ld is the most direct solution to this problem, allowing the plugin to actually link against the parent object, and obtain the available symbols from it. An added benefit of using -z parent instead of a mapfile, is that the name of the parent object is recorded in the dynamic section of the plugin, and can be displayed by the file utility: % cc -zdefs -zparent=main -G -o plugin.so plugin.c % elfdump -d plugin.so | grep PARENT [0] SUNW_PARENT 0xcc main % file plugin.so plugin.so: ELF 32-bit LSB dynamic lib 80386 Version 1, parent main, dynamically linked, not stripped % ./main ./plugin.so parent has called plugin_func() from plugin.so plugin_func() has called parent_callback() We can also observe this in elfedit plugins on Solaris systems running Solaris 11 Update 1 or newer: % file /usr/lib/elfedit/dyn.so /usr/lib/elfedit/dyn.so: ELF 32-bit LSB dynamic lib 80386 Version 1, parent elfedit, dynamically linked, not stripped, no debugging information available Related Other Work The GNU ld has an option named --just-symbols that can be used in a similar manner: --just-symbols=filename Read symbol names and their addresses from filename, but do not relocate it or include it in the output. This allows your output file to refer symbolically to absolute locations of memory defined in other programs. You may use this option more than once. -z parent is a higher level operation aimed specifically at simplifying the construction of high quality plugins. Although it employs the same operation, it differs from --just symbols in 2 significant ways: There can only be one parent. The parent is recorded in the created object, and can be displayed by 'file', or other similar tools.

Read the article
SSIS: Deploying OLAP cubes using C# script tasks and AMO

- by DrJohn

As part of the continuing series on Building dynamic OLAP data marts on-the-fly, this blog entry will focus on how to automate the deployment of OLAP cubes using SQL Server Integration Services (SSIS) and Analysis Services Management Objects (AMO). OLAP cube deployment is usually done using the Analysis Services Deployment Wizard. However, this option was dismissed for a variety of reasons. Firstly, invoking external processes from SSIS is fraught with problems as (a) it is not always possible to ensure SSIS waits for the external program to terminate; (b) we cannot log the outcome properly and (c) it is not always possible to control the server's configuration to ensure the executable works correctly. Another reason for rejecting the Deployment Wizard is that it requires the 'answers' to be written into four XML files. These XML files record the three things we need to change: the name of the server, the name of the OLAP database and the connection string to the data mart. Although it would be reasonably straight forward to change the content of the XML files programmatically, this adds another set of complication and level of obscurity to the overall process. When I first investigated the possibility of using C# to deploy a cube, I was surprised to find that there are no other blog entries about the topic. I can only assume everyone else is happy with the Deployment Wizard! SSIS "forgets" assembly references If you build your script task from scratch, you will have to remember how to overcome one of the major annoyances of working with SSIS script tasks: the forgetful nature of SSIS when it comes to assembly references. Basically, you can go through the process of adding an assembly reference using the Add Reference dialog, but when you close the script window, SSIS "forgets" the assembly reference so the script will not compile. After repeating the operation several times, you will find that SSIS only remembers the assembly reference when you specifically press the Save All icon in the script window. This problem is not unique to the AMO assembly and has certainly been a "feature" since SQL Server 2005, so I am not amazed it is still present in SQL Server 2008 R2! Sample Package So let's take a look at the sample SSIS package I have provided which can be downloaded from here: DeployOlapCubeExample.zip Below is a screenshot after a successful run. Connection Managers The package has three connection managers: AsDatabaseDefinitionFile is a file connection manager pointing to the .asdatabase file you wish to deploy. Note that this can be found in the bin directory of you OLAP database project once you have clicked the "Build" button in Visual Studio TargetOlapServerCS is an Analysis Services connection manager which identifies both the deployment server and the target database name. SourceDataMart is an OLEDB connection manager pointing to the data mart which is to act as the source of data for your cube. This will be used to replace the connection string found in your .asdatabase file Once you have configured the connection managers, the sample should run and deploy your OLAP database in a few seconds. Of course, in a production environment, these connection managers would be associated with package configurations or set at runtime. When you run the sample, you should see that the script logs its activity to the output screen (see screenshot above). If you configure logging for the package, then these messages will also appear in your SSIS logging. Sample Code Walkthrough Next let's walk through the code. The first step is to parse the connection string provided by the TargetOlapServerCS connection manager and obtain the name of both the target OLAP server and also the name of the OLAP database. Note that the target database does not have to exist to be referenced in an AS connection manager, so I am using this as a convenient way to define both properties. We now connect to the server and check for the existence of the OLAP database. If it exists, we drop the database so we can re-deploy. svr.Connect(olapServerName); if (svr.Connected) { // Drop the OLAP database if it already exists Database db = svr.Databases.FindByName(olapDatabaseName); if (db != null) { db.Drop(); } // rest of script } Next we start building the XMLA command that will actually perform the deployment. Basically this is a small chuck of XML which we need to wrap around the large .asdatabase file generated by the Visual Studio build process. // Start generating the main part of the XMLA command XmlDocument xmlaCommand = new XmlDocument(); xmlaCommand.LoadXml(string.Format("<Batch Transaction='false' xmlns='http://schemas.microsoft.com/analysisservices/2003/engine'><Alter AllowCreate='true' ObjectExpansion='ExpandFull'><Object><DatabaseID>{0}</DatabaseID></Object><ObjectDefinition/></Alter></Batch>", olapDatabaseName)); Next we need to merge two XML files which we can do by simply using setting the InnerXml property of the ObjectDefinition node as follows: // load OLAP Database definition from .asdatabase file identified by connection manager XmlDocument olapCubeDef = new XmlDocument(); olapCubeDef.Load(Dts.Connections["AsDatabaseDefinitionFile"].ConnectionString); // merge the two XML files by obtain a reference to the ObjectDefinition node oaRootNode.InnerXml = olapCubeDef.InnerXml; One hurdle I had to overcome was removing detritus from the .asdabase file left by the Visual Studio build. Through an iterative process, I found I needed to remove several nodes as they caused the deployment to fail. The XMLA error message read "Cannot set read-only node: CreatedTimestamp" or similar. In comparing the XMLA generated with by the Deployment Wizard with that generated by my code, these read-only nodes were missing, so clearly I just needed to strip them out. This was easily achieved using XPath to find the relevant XML nodes, of which I show one example below: foreach (XmlNode node in rootNode.SelectNodes("//ns1:CreatedTimestamp", nsManager)) { node.ParentNode.RemoveChild(node); } Now we need to change the database name in both the ID and Name nodes using code such as: XmlNode databaseID = xmlaCommand.SelectSingleNode("//ns1:Database/ns1:ID", nsManager); if (databaseID != null) databaseID.InnerText = olapDatabaseName; Finally we need to change the connection string to point at the relevant data mart. Again this is easily achieved using XPath to search for the relevant nodes and then replace the content of the node with the new name or connection string. XmlNode connectionStringNode = xmlaCommand.SelectSingleNode("//ns1:DataSources/ns1:DataSource/ns1:ConnectionString", nsManager); if (connectionStringNode != null) { connectionStringNode.InnerText = Dts.Connections["SourceDataMart"].ConnectionString; } Finally we need to perform the deployment using the Execute XMLA command and check the returned XmlaResultCollection for errors before setting the Dts.TaskResult. XmlaResultCollection oResults = svr.Execute(xmlaCommand.InnerXml); // check for errors during deployment foreach (Microsoft.AnalysisServices.XmlaResult oResult in oResults) { foreach (Microsoft.AnalysisServices.XmlaMessage oMessage in oResult.Messages) { if ((oMessage.GetType().Name == "XmlaError")) { FireError(oMessage.Description); HadError = true; } } } If you are not familiar with XML programming, all this may all seem a bit daunting, but perceiver as the sample code is pretty short. If you would like the script to process the OLAP database, simply uncomment the lines in the vicinity of Process method. Of course, you can extend the script to perform your own custom processing and to even synchronize the database to a front-end server. Personally, I like to keep the deployment and processing separate as the code can become overly complex for support staff.If you want to know more, come see my session at the forthcoming SQLBits conference.

Read the article
The Data Scientist

- by BuckWoody

A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

Read the article
Converting projects to use Automatic NuGet restore

- by terje

Originally posted on: http://geekswithblogs.net/terje/archive/2014/06/11/converting-projects-to-use-automatic-nuget-restore.aspxDownload tool In version 2.7 of NuGet automatic nuget restore was introduced, meaning you no longer need to distort your msbuild project files with nuget target information. Visual Studio and TFS 2013 build have this enabled by default. However, if your project was created before this was introduced, and/or if you have used the “Enable NuGet Package Restore” afterwards, you now have a series of unwanted things in your projects, and a series of project files that have been modified – and – you no longer neither want nor need this ! You might also get into some unwanted issues due to these modifications. This is a MSBuild modification that was needed only before NuGet 2.7 ! So: DON’T USE THIS FUNCTION !!! There is an issue https://nuget.codeplex.com/workitem/4019 on this on the NuGet project site to get this function removed, renamed or at least moved farther away from the top level (please help vote it up!). The response seems to be that it WILL BE removed, around version 3.0. This function does nothing you need after the introduction of NuGet 2.7. What is also unfortunate is the naming of it – it implies that it is needed, it is not, and what is worse, there is no corresponding function to remove what it does ! So to fix this use the tool named IFix, that will fix this issue for you - all free of course, and the code is open source. Also report issues there: https://github.com/OsirisTerje/IFix IFix information DOWNLOAD HERE This command line tool installs using an MSI, and add itself to the system path. If you work in a team, you will probably need to use the tool multiple times. Anyone in the team may at any time use the “Enable NuGet Package Restore” function and mess up your project again. The IFix program can be run either in a check modus, where it does not write anything back – it only checks if you have any issues, or in a Fix mode, where it will also perform the necessary fixes for you. The IFix program is used like this: IFix <command> [-c/--check] [-f/--fix] [-v/--verbose] The command in this case is “nugetrestore”. It will do a check from the location where it is being called, and run through all subfolders from that location. So “IFix nugetrestore --check” , will do the check , and “IFix nugetrestore --fix” will perform the changes, for all files and folders below the current working directory. (Note that --check can be replaced with only –c, and --fix with –f, and so on. ) BEWARE: When you run the fix option, all solutions to be affected must be closed in Visual Studio ! So, if you just want to DO it, then: IFix nugetrestore --check to see if you have issues then IFix nugetrestore --fix to fix them. How does it work IFix nugetrestore checks and optionally fixes four issues that the older enabling of nuget restore did. The issues are related to the MSBuild projess, and are: Deleting the nuget.targets file. Deleting the nuget.exe that is located under the .nuget folder Removing all references to nuget.targets in the solution file Removing all properties and target imports of nuget.targets inside the csproj files. IFix fixes these issues in the same sequence. The first step, removing the nuget.targets file is the most critical one, and all instances of the nuget.targets file within the scope of a solution has to be removed, and in addition it has to be done with the solution closed in Visual Studio. If Visual Studio finds a nuget.targets file, the csproj files will be automatically messed up again. This means the removal process above might need to be done multiple times, specially when you’re working with a team, and that solution context menu still has the “Enable NuGet Package Restore” function. Someone on the team might inadvertently do this at any time. It can be a good idea to add this check to a checkin policy – if you run TFS standard version control, but that will have no effect if you use TFS Git version control of course. So, better be prepared to run the IFix check from time to time. Or, even better, install IFix on your build servers, and add a call to IFix nugetrestore --check in the TFS Build script. How does it look As a first example I have run the IFix program from the top of a set of git repositories, so it spans multiple repositories with multiple solutions. The result from the check option is as follows: We see the four red lines, there is one for each of the four checks we talked about in the previous section. The fact that they are red, means we have that particular issue. The first section (above the first red text line) is the nuget targets section. Notice No.1, it says it has found no paths to copy. What IFix does here is to check if there are any defined paths to other nuget galleries. If there are, then those are copied over to the nuget.config file, where is where it should be in version 2.7 and above. No.2 says it has found the particular nuget.targets file, No.3 states it HAS found some other nuget galleries defines in the targets file, which then it would like to copy to the config.file. No.4 is the section for nuget.exe files, and list those it has found, and which it would like to delete. No 5 states it has found a reference to nuget.targets in the solution file. This reference comes from the fact that the .nuget folder is a solution folder, and the items within are described in the solution file. It then checks the csproj files, and as can be seen from the last red line, it ha found issues in 96 out of 198 csproj files. There are two possible issues in a csproj files. No.6 is the first one, and the most common and most important one, an “Import project” section. This is the section that calls the nuget.targets files. No.7 is another issue, which seems to sometimes be there, sometimes not, it is a RestorePackages property, which also should go away. Now, if we run the IFix nugetrestore –fix command, and then the check again after that, the result is: All green !

Read the article
T-SQL Tuesday #31 - Logging Tricks with CONTEXT_INFO

- by Most Valuable Yak (Rob Volk)

This month's T-SQL Tuesday is being hosted by Aaron Nelson [b | t], fellow Atlantan (the city in Georgia, not the famous sunken city, or the resort in the Bahamas) and covers the topic of logging (the recording of information, not the harvesting of trees) and maintains the fine T-SQL Tuesday tradition begun by Adam Machanic [b | t] (the SQL Server guru, not the guy who fixes cars, check the spelling again, there will be a quiz later). This is a trick I learned from Fernando Guerrero [b | t] waaaaaay back during the PASS Summit 2004 in sunny, hurricane-infested Orlando, during his session on Secret SQL Server (not sure if that's the correct title, and I haven't used parentheses in this paragraph yet). CONTEXT_INFO is a neat little feature that's existed since SQL Server 2000 and perhaps even earlier. It lets you assign data to the current session/connection, and maintains that data until you disconnect or change it. In addition to the CONTEXT_INFO() function, you can also query the context_info column in sys.dm_exec_sessions, or even sysprocesses if you're still running SQL Server 2000, if you need to see it for another session. While you're limited to 128 bytes, one big advantage that CONTEXT_INFO has is that it's independent of any transactions. If you've ever logged to a table in a transaction and then lost messages when it rolled back, you can understand how aggravating it can be. CONTEXT_INFO also survives across multiple SQL batches (GO separators) in the same connection, so for those of you who were going to suggest "just log to a table variable, they don't get rolled back": HA-HA, I GOT YOU! Since GO starts a new batch all variable declarations are lost. Here's a simple example I recently used at work. I had to test database mirroring configurations for disaster recovery scenarios and measure the network throughput. I also needed to log how long it took for the script to run and include the mirror settings for the database in question. I decided to use AdventureWorks as my database model, and Adam Machanic's Big Adventure script to provide a fairly large workload that's repeatable and easily scalable. My test would consist of several copies of AdventureWorks running the Big Adventure script while I mirrored the databases (or not). Since Adam's script contains several batches, I decided CONTEXT_INFO would have to be used. As it turns out, I only needed to grab the start time at the beginning, I could get the rest of the data at the end of the process. The code is pretty small: declare @time binary(128)=cast(getdate() as binary(8)) set context_info @time ... rest of Big Adventure code ... go use master; insert mirror_test(server,role,partner,db,state,safety,start,duration) select @@servername, mirroring_role_desc, mirroring_partner_instance, db_name(database_id), mirroring_state_desc, mirroring_safety_level_desc, cast(cast(context_info() as binary(8)) as datetime), datediff(s,cast(cast(context_info() as binary(8)) as datetime),getdate()) from sys.database_mirroring where db_name(database_id) like 'Adv%'; I declared @time as a binary(128) since CONTEXT_INFO is defined that way. I couldn't convert GETDATE() to binary(128) as it would pad the first 120 bytes as 0x00. To keep the CAST functions simple and avoid using SUBSTRING, I decided to CAST GETDATE() as binary(8) and let SQL Server do the implicit conversion. It's not the safest way perhaps, but it works on my machine. :) As I mentioned earlier, you can query system views for sessions and get their CONTEXT_INFO. With a little boilerplate code this can be used to monitor long-running procedures, in case you need to kill a process, or are just curious how long certain parts take. In this example, I added code to Adam's Big Adventure script to set CONTEXT_INFO messages at strategic places I want to monitor. (His code is in UPPERCASE as it was in the original, mine is all lowercase): declare @msg binary(128) set @msg=cast('Altering bigProduct.ProductID' as binary(128)) set context_info @msg go ALTER TABLE bigProduct ALTER COLUMN ProductID INT NOT NULL GO set context_info 0x0 go declare @msg1 binary(128) set @msg1=cast('Adding pk_bigProduct Constraint' as binary(128)) set context_info @msg1 go ALTER TABLE bigProduct ADD CONSTRAINT pk_bigProduct PRIMARY KEY (ProductID) GO set context_info 0x0 go declare @msg2 binary(128) set @msg2=cast('Altering bigTransactionHistory.TransactionID' as binary(128)) set context_info @msg2 go ALTER TABLE bigTransactionHistory ALTER COLUMN TransactionID INT NOT NULL GO set context_info 0x0 go declare @msg3 binary(128) set @msg3=cast('Adding pk_bigTransactionHistory Constraint' as binary(128)) set context_info @msg3 go ALTER TABLE bigTransactionHistory ADD CONSTRAINT pk_bigTransactionHistory PRIMARY KEY NONCLUSTERED(TransactionID) GO set context_info 0x0 go declare @msg4 binary(128) set @msg4=cast('Creating IX_ProductId_TransactionDate Index' as binary(128)) set context_info @msg4 go CREATE NONCLUSTERED INDEX IX_ProductId_TransactionDate ON bigTransactionHistory(ProductId,TransactionDate) INCLUDE(Quantity,ActualCost) GO set context_info 0x0 This doesn't include the entire script, only those portions that altered a table or created an index. One annoyance is that SET CONTEXT_INFO requires a literal or variable, you can't use an expression. And since GO starts a new batch I need to declare a variable in each one. And of course I have to use CAST because it won't implicitly convert varchar to binary. And even though context_info is a nullable column, you can't SET CONTEXT_INFO NULL, so I have to use SET CONTEXT_INFO 0x0 to clear the message after the statement completes. And if you're thinking of turning this into a UDF, you can't, although a stored procedure would work. So what does all this aggravation get you? As the code runs, if I want to see which stage the session is at, I can run the following (assuming SPID 51 is the one I want): select CAST(context_info as varchar(128)) from sys.dm_exec_sessions where session_id=51 Since SQL Server 2005 introduced the new system and dynamic management views (DMVs) there's not as much need for tagging a session with these kinds of messages. You can get the session start time and currently executing statement from them, and neatly presented if you use Adam's sp_whoisactive utility (and you absolutely should be using it). Of course you can always use xp_cmdshell, a CLR function, or some other tricks to log information outside of a SQL transaction. All the same, I've used this trick to monitor long-running reports at a previous job, and I still think CONTEXT_INFO is a great feature, especially if you're still using SQL Server 2000 or want to supplement your instrumentation. If you'd like an exercise, consider adding the system time to the messages in the last example, and an automated job to query and parse it from the system tables. That would let you track how long each statement ran without having to run Profiler. #TSQL2sDay

Read the article
How can I keep the cpu temp low?

- by Newton

I have an HP pavilion dv7, I'm using ubuntu 12.04 so the overheating problem with sandybridge cpu is a lot better. However my laptop is still becoming too hot to keep on my legs. The problem is that the fan wait too much before starting, so the medium temp is too hight. When I'm using windows 7 the laptop is room-temperature cold, I've absolutely no problem. On windows the fan is always spinning very low & very silently so the heat is continuously removed, without reaching an unconfortable temp. How can I force the computer to act like that also on ubuntu? PS The bios can't let me control this kind of thing, and this is my experience with lm-sensors and fancontrol al@notebook:~$ sudo sensors-detect [sudo] password for al: # sensors-detect revision 5984 (2011-07-10 21:22:53 +0200) # System: Hewlett-Packard HP Pavilion dv7 Notebook PC (laptop) # Board: Hewlett-Packard 1800 This program will help you determine which kernel modules you need to load to use lm_sensors most effectively. It is generally safe and recommended to accept the default answers to all questions, unless you know what you're doing. Some south bridges, CPUs or memory controllers contain embedded sensors. Do you want to scan for them? This is totally safe. (YES/no): y Module cpuid loaded successfully. Silicon Integrated Systems SIS5595... No VIA VT82C686 Integrated Sensors... No VIA VT8231 Integrated Sensors... No AMD K8 thermal sensors... No AMD Family 10h thermal sensors... No AMD Family 11h thermal sensors... No AMD Family 12h and 14h thermal sensors... No AMD Family 15h thermal sensors... No AMD Family 15h power sensors... No Intel digital thermal sensor... Success! (driver `coretemp') Intel AMB FB-DIMM thermal sensor... No VIA C7 thermal sensor... No VIA Nano thermal sensor... No Some Super I/O chips contain embedded sensors. We have to write to standard I/O ports to probe them. This is usually safe. Do you want to scan for Super I/O sensors? (YES/no): y Probing for Super-I/O at 0x2e/0x2f Trying family `National Semiconductor/ITE'... No Trying family `SMSC'... No Trying family `VIA/Winbond/Nuvoton/Fintek'... No Trying family `ITE'... No Probing for Super-I/O at 0x4e/0x4f Trying family `National Semiconductor/ITE'... Yes Found unknown chip with ID 0x8518 Some hardware monitoring chips are accessible through the ISA I/O ports. We have to write to arbitrary I/O ports to probe them. This is usually safe though. Yes, you do have ISA I/O ports even if you do not have any ISA slots! Do you want to scan the ISA I/O ports? (YES/no): y Probing for `National Semiconductor LM78' at 0x290... No Probing for `National Semiconductor LM79' at 0x290... No Probing for `Winbond W83781D' at 0x290... No Probing for `Winbond W83782D' at 0x290... No Lastly, we can probe the I2C/SMBus adapters for connected hardware monitoring devices. This is the most risky part, and while it works reasonably well on most systems, it has been reported to cause trouble on some systems. Do you want to probe the I2C/SMBus adapters now? (YES/no): y Using driver `i2c-i801' for device 0000:00:1f.3: Intel Cougar Point (PCH) Module i2c-i801 loaded successfully. Module i2c-dev loaded successfully. Next adapter: i915 gmbus disabled (i2c-0) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 gmbus ssc (i2c-1) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 GPIOB (i2c-2) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 gmbus vga (i2c-3) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 GPIOA (i2c-4) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 gmbus panel (i2c-5) Do you want to scan it? (YES/no/selectively): y Client found at address 0x50 Probing for `Analog Devices ADM1033'... No Probing for `Analog Devices ADM1034'... No Probing for `SPD EEPROM'... No Probing for `EDID EEPROM'... Yes (confidence 8, not a hardware monitoring chip) Next adapter: i915 GPIOC (i2c-6) Do you want to scan it? (YES/no/selectively): y Client found at address 0x50 Probing for `Analog Devices ADM1033'... No Probing for `Analog Devices ADM1034'... No Probing for `SPD EEPROM'... No Probing for `EDID EEPROM'... Yes (confidence 8, not a hardware monitoring chip) Next adapter: i915 gmbus dpc (i2c-7) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 GPIOD (i2c-8) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 gmbus dpb (i2c-9) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 GPIOE (i2c-10) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 gmbus reserved (i2c-11) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 gmbus dpd (i2c-12) Do you want to scan it? (YES/no/selectively): y Next adapter: i915 GPIOF (i2c-13) Do you want to scan it? (YES/no/selectively): y Next adapter: DPDDC-B (i2c-14) Do you want to scan it? (YES/no/selectively): y Now follows a summary of the probes I have just done. Just press ENTER to continue: Driver `coretemp': * Chip `Intel digital thermal sensor' (confidence: 9) To load everything that is needed, add this to /etc/modules: #----cut here---- # Chip drivers coretemp #----cut here---- If you have some drivers built into your kernel, the list above will contain too many modules. Skip the appropriate ones! Do you want to add these lines automatically to /etc/modules? (yes/NO)y Successful! Monitoring programs won't work until the needed modules are loaded. You may want to run 'service module-init-tools start' to load them. Unloading i2c-dev... OK Unloading i2c-i801... OK Unloading cpuid... OK al@notebook:~$ sudo /etc/init.d/module-init-tools restart Rather than invoking init scripts through /etc/init.d, use the service(8) utility, e.g. service module-init-tools restart Since the script you are attempting to invoke has been converted to an Upstart job, you may also use the stop(8) and then start(8) utilities, e.g. stop module-init-tools ; start module-init-tools. The restart(8) utility is also available. module-init-tools stop/waiting al@notebook:~$ sudo service module-init-tools restart stop: Unknown instance: module-init-tools stop/waiting al@notebook:~$ sudo service module-init-tools start module-init-tools stop/waiting al@notebook:~$ sudo pwmconfig # pwmconfig revision 5857 (2010-08-22) This program will search your sensors for pulse width modulation (pwm) controls, and test each one to see if it controls a fan on your motherboard. Note that many motherboards do not have pwm circuitry installed, even if your sensor chip supports pwm. We will attempt to briefly stop each fan using the pwm controls. The program will attempt to restore each fan to full speed after testing. However, it is ** very important ** that you physically verify that the fans have been to full speed after the program has completed. /usr/sbin/pwmconfig: There are no pwm-capable sensor modules installed Is my case too desperate?

Read the article
Anatomy of a .NET Assembly - PE Headers

- by Simon Cooper

Today, I'll be starting a look at what exactly is inside a .NET assembly - how the metadata and IL is stored, how Windows knows how to load it, and what all those bytes are actually doing. First of all, we need to understand the PE file format. PE files .NET assemblies are built on top of the PE (Portable Executable) file format that is used for all Windows executables and dlls, which itself is built on top of the MSDOS executable file format. The reason for this is that when .NET 1 was released, it wasn't a built-in part of the operating system like it is nowadays. Prior to Windows XP, .NET executables had to load like any other executable, had to execute native code to start the CLR to read & execute the rest of the file. However, starting with Windows XP, the operating system loader knows natively how to deal with .NET assemblies, rendering most of this legacy code & structure unnecessary. It still is part of the spec, and so is part of every .NET assembly. The result of this is that there are a lot of structure values in the assembly that simply aren't meaningful in a .NET assembly, as they refer to features that aren't needed. These are either set to zero or to certain pre-defined values, specified in the CLR spec. There are also several fields that specify the size of other datastructures in the file, which I will generally be glossing over in this initial post. Structure of a PE file Most of a PE file is split up into separate sections; each section stores different types of data. For instance, the .text section stores all the executable code; .rsrc stores unmanaged resources, .debug contains debugging information, and so on. Each section has a section header associated with it; this specifies whether the section is executable, read-only or read/write, whether it can be cached... When an exe or dll is loaded, each section can be mapped into a different location in memory as the OS loader sees fit. In order to reliably address a particular location within a file, most file offsets are specified using a Relative Virtual Address (RVA). This specifies the offset from the start of each section, rather than the offset within the executable file on disk, so the various sections can be moved around in memory without breaking anything. The mapping from RVA to file offset is done using the section headers, which specify the range of RVAs which are valid within that section. For example, if the .rsrc section header specifies that the base RVA is 0x4000, and the section starts at file offset 0xa00, then an RVA of 0x401d (offset 0x1d within the .rsrc section) corresponds to a file offset of 0xa1d. Because each section has its own base RVA, each valid RVA has a one-to-one mapping with a particular file offset. PE headers As I said above, most of the header information isn't relevant to .NET assemblies. To help show what's going on, I've created a diagram identifying all the various parts of the first 512 bytes of a .NET executable assembly. I've highlighted the relevant bytes that I will refer to in this post: Bear in mind that all numbers are stored in the assembly in little-endian format; the hex number 0x0123 will appear as 23 01 in the diagram. The first 64 bytes of every file is the DOS header. This starts with the magic number 'MZ' (0x4D, 0x5A in hex), identifying this file as an executable file of some sort (an .exe or .dll). Most of the rest of this header is zeroed out. The important part of this header is at offset 0x3C - this contains the file offset of the PE signature (0x80). Between the DOS header & PE signature is the DOS stub - this is a stub program that simply prints out 'This program cannot be run in DOS mode.\r\n' to the console. I will be having a closer look at this stub later on. The PE signature starts at offset 0x80, with the magic number 'PE\0\0' (0x50, 0x45, 0x00, 0x00), identifying this file as a PE executable, followed by the PE file header (also known as the COFF header). The relevant field in this header is in the last two bytes, and it specifies whether the file is an executable or a dll; bit 0x2000 is set for a dll. Next up is the PE standard fields, which start with a magic number of 0x010b for x86 and AnyCPU assemblies, and 0x20b for x64 assemblies. Most of the rest of the fields are to do with the CLR loader stub, which I will be covering in a later post. After the PE standard fields comes the NT-specific fields; again, most of these are not relevant for .NET assemblies. The one that is is the highlighted Subsystem field, and specifies if this is a GUI or console app - 0x20 for a GUI app, 0x30 for a console app. Data directories & section headers After the PE and COFF headers come the data directories; each directory specifies the RVA (first 4 bytes) and size (next 4 bytes) of various important parts of the executable. The only relevant ones are the 2nd (Import table), 13th (Import Address table), and 15th (CLI header). The Import and Import Address table are only used by the startup stub, so we will look at those later on. The 15th points to the CLI header, where the CLR-specific metadata begins. After the data directories comes the section headers; one for each section in the file. Each header starts with the section's ASCII name, null-padded to 8 bytes. Again, most of each header is irrelevant, but I've highlighted the base RVA and file offset in each header. In the diagram, you can see the following sections: .text: base RVA 0x2000, file offset 0x200 .rsrc: base RVA 0x4000, file offset 0xa00 .reloc: base RVA 0x6000, file offset 0x1000 The .text section contains all the CLR metadata and code, and so is by far the largest in .NET assemblies. The .rsrc section contains the data you see in the Details page in the right-click file properties page, but is otherwise unused. The .reloc section contains address relocations, which we will look at when we study the CLR startup stub. What about the CLR? As you can see, most of the first 512 bytes of an assembly are largely irrelevant to the CLR, and only a few bytes specify needed things like the bitness (AnyCPU/x86 or x64), whether this is an exe or dll, and the type of app this is. There are some bytes that I haven't covered that affect the layout of the file (eg. the file alignment, which determines where in a file each section can start). These values are pretty much constant in most .NET assemblies, and don't affect the CLR data directly. Conclusion To summarize, the important data in the first 512 bytes of a file is: DOS header. This contains a pointer to the PE signature. DOS stub, which we'll be looking at in a later post. PE signature PE file header (aka COFF header). This specifies whether the file is an exe or a dll. PE standard fields. This specifies whether the file is AnyCPU/32bit or 64bit. PE NT-specific fields. This specifies what type of app this is, if it is an app. Data directories. The 15th entry (at offset 0x168) contains the RVA and size of the CLI header inside the .text section. Section headers. These are used to map between RVA and file offset. The important one is .text, which is where all the CLR data is stored. In my next post, we'll start looking at the metadata used by the CLR directly, which is all inside the .text section.

Read the article
Dell Studio 1737 Overheating

- by Sean

I am using a Dell Studio 1737 laptop. I have been running Linux and have ran Windows recently for a very long time. I upgraded to the 10.10 distribution and since that distro, it seems that for some reason all Linuxes want to push my laptop to extremes. I have recently upgraded to Ubuntu 12.04 since I heart that it contains kernel fixes for overheating issues. 12.04 will actually eventually cool the system, but that is after the fans run to the point it sounds like a jet aircraft taking off and the laptop makes my hands sweat. In trying to combat the heat problems I have done the following: I installed the propriatery driver for my ATI Mobility HD 3600. I have tried both the one in the Additional Drivers and also tried ATI's latest greatest version. If I don't install this my laptop will overheat and shut off in minutes. Both seem to perform similarly, but the heat problem remains. I have tried limiting the CPU by installing the CPUFreq Indicator. This does help keep the machine from shutting off, but the heat is still uncomfortable to be around the machine. I usually run in power saver mode or run the cpu at 1.6 GHZ just to error on safety. I ran sensors-detect and here are the results: sean@sean-Studio-1737:~$ sudo sensors-detect # sensors-detect revision 5984 (2011-07-10 21:22:53 +0200) # System: Dell Inc. Studio 1737 (laptop) # Board: Dell Inc. 0F237N This program will help you determine which kernel modules you need to load to use lm_sensors most effectively. It is generally safe and recommended to accept the default answers to all questions, unless you know what you're doing. Some south bridges, CPUs or memory controllers contain embedded sensors. Do you want to scan for them? This is totally safe. (YES/no): y Module cpuid loaded successfully. Silicon Integrated Systems SIS5595... No VIA VT82C686 Integrated Sensors... No VIA VT8231 Integrated Sensors... No AMD K8 thermal sensors... No AMD Family 10h thermal sensors... No AMD Family 11h thermal sensors... No AMD Family 12h and 14h thermal sensors... No AMD Family 15h thermal sensors... No AMD Family 15h power sensors... No Intel digital thermal sensor... Success! (driver `coretemp') Intel AMB FB-DIMM thermal sensor... No VIA C7 thermal sensor... No VIA Nano thermal sensor... No Some Super I/O chips contain embedded sensors. We have to write to standard I/O ports to probe them. This is usually safe. Do you want to scan for Super I/O sensors? (YES/no): y Probing for Super-I/O at 0x2e/0x2f Trying family `National Semiconductor/ITE'... No Trying family `SMSC'... No Trying family `VIA/Winbond/Nuvoton/Fintek'... No Trying family `ITE'... No Probing for Super-I/O at 0x4e/0x4f Trying family `National Semiconductor/ITE'... Yes Found `ITE IT8512E/F/G Super IO' (but not activated) Some hardware monitoring chips are accessible through the ISA I/O ports. We have to write to arbitrary I/O ports to probe them. This is usually safe though. Yes, you do have ISA I/O ports even if you do not have any ISA slots! Do you want to scan the ISA I/O ports? (YES/no): y Probing for `National Semiconductor LM78' at 0x290... No Probing for `National Semiconductor LM79' at 0x290... No Probing for `Winbond W83781D' at 0x290... No Probing for `Winbond W83782D' at 0x290... No Lastly, we can probe the I2C/SMBus adapters for connected hardware monitoring devices. This is the most risky part, and while it works reasonably well on most systems, it has been reported to cause trouble on some systems. Do you want to probe the I2C/SMBus adapters now? (YES/no): y Using driver `i2c-i801' for device 0000:00:1f.3: Intel ICH9 Module i2c-i801 loaded successfully. Module i2c-dev loaded successfully. Now follows a summary of the probes I have just done. Just press ENTER to continue: Driver `coretemp': * Chip `Intel digital thermal sensor' (confidence: 9) To load everything that is needed, add this to /etc/modules: #----cut here---- # Chip drivers coretemp #----cut here---- If you have some drivers built into your kernel, the list above will contain too many modules. Skip the appropriate ones! Do you want to add these lines automatically to /etc/modules? (yes/NO)y Successful! Monitoring programs won't work until the needed modules are loaded. You may want to run 'service module-init-tools start' to load them. Unloading i2c-dev... OK Unloading i2c-i801... OK Unloading cpuid... OK sean@sean-Studio-1737:~$ sudo service module-init-tools start module-init-tools stop/waiting I also tried installing i8k but that didn't work since it didn't seem to be able to communicate with the hardware (probably for different kind of device). Also I ran acpi -V and here are the results: Battery 0: Full, 100% Battery 0: design capacity 613 mAh, last full capacity 260 mAh = 42% Adapter 0: on-line Thermal 0: ok, 49.0 degrees C Thermal 0: trip point 0 switches to mode critical at temperature 100.0 degrees C Thermal 1: ok, 48.0 degrees C Thermal 1: trip point 0 switches to mode critical at temperature 100.0 degrees C Thermal 2: ok, 51.0 degrees C Thermal 2: trip point 0 switches to mode critical at temperature 100.0 degrees C Cooling 0: LCD 0 of 15 Cooling 1: Processor 0 of 10 Cooling 2: Processor 0 of 10 I have hit a wall and don't know what to do now. Any advice is appreciated.

Read the article
T-SQL Tuesday #31 - Logging Tricks with CONTEXT_INFO

- by Most Valuable Yak (Rob Volk)

This month's T-SQL Tuesday is being hosted by Aaron Nelson [b | t], fellow Atlantan (the city in Georgia, not the famous sunken city, or the resort in the Bahamas) and covers the topic of logging (the recording of information, not the harvesting of trees) and maintains the fine T-SQL Tuesday tradition begun by Adam Machanic [b | t] (the SQL Server guru, not the guy who fixes cars, check the spelling again, there will be a quiz later). This is a trick I learned from Fernando Guerrero [b | t] waaaaaay back during the PASS Summit 2004 in sunny, hurricane-infested Orlando, during his session on Secret SQL Server (not sure if that's the correct title, and I haven't used parentheses in this paragraph yet). CONTEXT_INFO is a neat little feature that's existed since SQL Server 2000 and perhaps even earlier. It lets you assign data to the current session/connection, and maintains that data until you disconnect or change it. In addition to the CONTEXT_INFO() function, you can also query the context_info column in sys.dm_exec_sessions, or even sysprocesses if you're still running SQL Server 2000, if you need to see it for another session. While you're limited to 128 bytes, one big advantage that CONTEXT_INFO has is that it's independent of any transactions. If you've ever logged to a table in a transaction and then lost messages when it rolled back, you can understand how aggravating it can be. CONTEXT_INFO also survives across multiple SQL batches (GO separators) in the same connection, so for those of you who were going to suggest "just log to a table variable, they don't get rolled back": HA-HA, I GOT YOU! Since GO starts a new batch all variable declarations are lost. Here's a simple example I recently used at work. I had to test database mirroring configurations for disaster recovery scenarios and measure the network throughput. I also needed to log how long it took for the script to run and include the mirror settings for the database in question. I decided to use AdventureWorks as my database model, and Adam Machanic's Big Adventure script to provide a fairly large workload that's repeatable and easily scalable. My test would consist of several copies of AdventureWorks running the Big Adventure script while I mirrored the databases (or not). Since Adam's script contains several batches, I decided CONTEXT_INFO would have to be used. As it turns out, I only needed to grab the start time at the beginning, I could get the rest of the data at the end of the process. The code is pretty small: declare @time binary(128)=cast(getdate() as binary(8)) set context_info @time ... rest of Big Adventure code ... go use master; insert mirror_test(server,role,partner,db,state,safety,start,duration) select @@servername, mirroring_role_desc, mirroring_partner_instance, db_name(database_id), mirroring_state_desc, mirroring_safety_level_desc, cast(cast(context_info() as binary(8)) as datetime), datediff(s,cast(cast(context_info() as binary(8)) as datetime),getdate()) from sys.database_mirroring where db_name(database_id) like 'Adv%'; I declared @time as a binary(128) since CONTEXT_INFO is defined that way. I couldn't convert GETDATE() to binary(128) as it would pad the first 120 bytes as 0x00. To keep the CAST functions simple and avoid using SUBSTRING, I decided to CAST GETDATE() as binary(8) and let SQL Server do the implicit conversion. It's not the safest way perhaps, but it works on my machine. :) As I mentioned earlier, you can query system views for sessions and get their CONTEXT_INFO. With a little boilerplate code this can be used to monitor long-running procedures, in case you need to kill a process, or are just curious how long certain parts take. In this example, I added code to Adam's Big Adventure script to set CONTEXT_INFO messages at strategic places I want to monitor. (His code is in UPPERCASE as it was in the original, mine is all lowercase): declare @msg binary(128) set @msg=cast('Altering bigProduct.ProductID' as binary(128)) set context_info @msg go ALTER TABLE bigProduct ALTER COLUMN ProductID INT NOT NULL GO set context_info 0x0 go declare @msg1 binary(128) set @msg1=cast('Adding pk_bigProduct Constraint' as binary(128)) set context_info @msg1 go ALTER TABLE bigProduct ADD CONSTRAINT pk_bigProduct PRIMARY KEY (ProductID) GO set context_info 0x0 go declare @msg2 binary(128) set @msg2=cast('Altering bigTransactionHistory.TransactionID' as binary(128)) set context_info @msg2 go ALTER TABLE bigTransactionHistory ALTER COLUMN TransactionID INT NOT NULL GO set context_info 0x0 go declare @msg3 binary(128) set @msg3=cast('Adding pk_bigTransactionHistory Constraint' as binary(128)) set context_info @msg3 go ALTER TABLE bigTransactionHistory ADD CONSTRAINT pk_bigTransactionHistory PRIMARY KEY NONCLUSTERED(TransactionID) GO set context_info 0x0 go declare @msg4 binary(128) set @msg4=cast('Creating IX_ProductId_TransactionDate Index' as binary(128)) set context_info @msg4 go CREATE NONCLUSTERED INDEX IX_ProductId_TransactionDate ON bigTransactionHistory(ProductId,TransactionDate) INCLUDE(Quantity,ActualCost) GO set context_info 0x0 This doesn't include the entire script, only those portions that altered a table or created an index. One annoyance is that SET CONTEXT_INFO requires a literal or variable, you can't use an expression. And since GO starts a new batch I need to declare a variable in each one. And of course I have to use CAST because it won't implicitly convert varchar to binary. And even though context_info is a nullable column, you can't SET CONTEXT_INFO NULL, so I have to use SET CONTEXT_INFO 0x0 to clear the message after the statement completes. And if you're thinking of turning this into a UDF, you can't, although a stored procedure would work. So what does all this aggravation get you? As the code runs, if I want to see which stage the session is at, I can run the following (assuming SPID 51 is the one I want): select CAST(context_info as varchar(128)) from sys.dm_exec_sessions where session_id=51 Since SQL Server 2005 introduced the new system and dynamic management views (DMVs) there's not as much need for tagging a session with these kinds of messages. You can get the session start time and currently executing statement from them, and neatly presented if you use Adam's sp_whoisactive utility (and you absolutely should be using it). Of course you can always use xp_cmdshell, a CLR function, or some other tricks to log information outside of a SQL transaction. All the same, I've used this trick to monitor long-running reports at a previous job, and I still think CONTEXT_INFO is a great feature, especially if you're still using SQL Server 2000 or want to supplement your instrumentation. If you'd like an exercise, consider adding the system time to the messages in the last example, and an automated job to query and parse it from the system tables. That would let you track how long each statement ran without having to run Profiler. #TSQL2sDay

Read the article
ASP.NET WebAPI Security 3: Extensible Authentication Framework

- by Your DisplayName here!

In my last post, I described the identity architecture of ASP.NET Web API. The short version was, that Web API (beta 1) does not really have an authentication system on its own, but inherits the client security context from its host. This is fine in many situations (e.g. AJAX style callbacks with an already established logon session). But there are many cases where you don’t use the containing web application for authentication, but need to do it yourself. Examples of that would be token based authentication and clients that don’t run in the context of the web application (e.g. desktop clients / mobile). Since Web API provides a nice extensibility model, it is easy to implement whatever security framework you want on top of it. My design goals were: Easy to use. Extensible. Claims-based. ..and of course, this should always behave the same, regardless of the hosting environment. In the rest of the post I am outlining some of the bits and pieces, So you know what you are dealing with, in case you want to try the code. At the very heart… is a so called message handler. This is a Web API extensibility point that gets to see (and modify if needed) all incoming and outgoing requests. Handlers run after the conversion from host to Web API, which means that handler code deals with HttpRequestMessage and HttpResponseMessage. See Pedro’s post for more information on the processing pipeline. This handler requires a configuration object for initialization. Currently this is very simple, it contains: Settings for the various authentication and credential types Settings for claims transformation Ability to block identity inheritance from host The most important part here is the credential type support, but I will come back to that later. The logic of the message handler is simple: Look at the incoming request. If the request contains an authorization header, try to authenticate the client. If this is successful, create a claims principal and populate the usual places. If not, return a 401 status code and set the Www-Authenticate header. Look at outgoing response, if the status code is 401, set the Www-Authenticate header. Credential type support Under the covers I use the WIF security token handler infrastructure to validate credentials and to turn security tokens into claims. The idea is simple: an authorization header consists of two pieces: the schema and the actual “token”. My configuration object allows to associate a security token handler with a scheme. This way you only need to implement support for a specific credential type, and map that to the incoming scheme value. The current version supports HTTP Basic Authentication as well as SAML and SWT tokens. (I needed to do some surgery on the standard security token handlers, since WIF does not directly support string-ified tokens. The next version of .NET will fix that, and the code should become simpler then). You can e.g. use this code to hook up a username/password handler to the Basic scheme (the default scheme name for Basic Authentication). config.Handler.AddBasicAuthenticationHandler( (username, password) => username == password); You simply have to provide a password validation function which could of course point back to your existing password library or e.g. membership. The following code maps a token handler for Simple Web Tokens (SWT) to the Bearer scheme (the currently favoured scheme name for OAuth2). You simply have to specify the issuer name, realm and shared signature key: config.Handler.AddSimpleWebTokenHandler( "Bearer", http://identity.thinktecture.com/trust, Constants.Realm, "Dc9Mpi3jaaaUpBQpa/4R7XtUsa3D/ALSjTVvK8IUZbg="); For certain integration scenarios it is very useful if your Web API can consume SAML tokens. This is also easily accomplishable. The following code uses the standard WIF API to configure the usual SAMLisms like issuer, audience, service certificate and certificate validation. Both SAML 1.1 and 2.0 are supported. var registry = new ConfigurationBasedIssuerNameRegistry(); registry.AddTrustedIssuer( "d1 c5 b1 25 97 d0 36 94 65 1c e2 64 fe 48 06 01 35 f7 bd db", "ADFS"); var adfsConfig = new SecurityTokenHandlerConfiguration(); adfsConfig.AudienceRestriction.AllowedAudienceUris.Add( new Uri(Constants.Realm)); adfsConfig.IssuerNameRegistry = registry; adfsConfig.CertificateValidator = X509CertificateValidator.None; // token decryption (read from configuration section) adfsConfig.ServiceTokenResolver = FederatedAuthentication.ServiceConfiguration.CreateAggregateTokenResolver(); config.Handler.AddSaml11SecurityTokenHandler("SAML", adfsConfig); Claims Transformation After successful authentication, if configured, the standard WIF ClaimsAuthenticationManager is called to run claims transformation and validation logic. This stage is used to transform the “technical” claims from the security token into application claims. You can either have a separate transformation logic, or share on e.g. with the containing web application. That’s just a matter of configuration. Adding the authentication handler to a Web API application In the spirit of Web API this is done in code, e.g. global.asax for web hosting: protected void Application_Start() { AreaRegistration.RegisterAllAreas(); ConfigureApis(GlobalConfiguration.Configuration); RegisterGlobalFilters(GlobalFilters.Filters); RegisterRoutes(RouteTable.Routes); BundleTable.Bundles.RegisterTemplateBundles(); } private void ConfigureApis(HttpConfiguration configuration) { configuration.MessageHandlers.Add( new AuthenticationHandler(ConfigureAuthentication())); } private AuthenticationConfiguration ConfigureAuthentication() { var config = new AuthenticationConfiguration { // sample claims transformation for consultants sample, comment out to see raw claims ClaimsAuthenticationManager = new ApiClaimsTransformer(), // value of the www-authenticate header, // if not set, the first scheme added to the handler collection is used DefaultAuthenticationScheme = "Basic" }; // add token handlers - see above return config; } You can find the full source code and some samples here. In the next post I will describe some of the samples in the download, and then move on to authorization. HTH

Read the article
Waterfall Model (SDLC) vs. Prototyping Model

The characters in the fable of the Tortoise and the Hare can easily be used to demonstrate the similarities and differences between the Waterfall and Prototyping software development models. This children fable is about a race between a consistently slow moving but steadfast turtle and an extremely fast but unreliable rabbit. After closely comparing each character’s attributes in correlation with both software development models, a trend seems to appear in that the Waterfall closely resembles the Tortoise in that Waterfall Model is typically a slow moving process that is broken up in to multiple sequential steps that must be executed in a standard linear pattern. The Tortoise can be quoted several times in the story saying “Slow and steady wins the race.” This is the perfect mantra for the Waterfall Model in that this model is seen as a cumbersome and slow moving. Waterfall Model Phases Requirement Analysis & Definition This phase focuses on defining requirements for a project that is to be developed and determining if the project is even feasible. Requirements are collected by analyzing existing systems and functionality in correlation with the needs of the business and the desires of the end users. The desired output for this phase is a list of specific requirements from the business that are to be designed and implemented in the subsequent steps. In addition this phase is used to determine if any value will be gained by completing the project. System Design This phase focuses primarily on the actual architectural design of a system, and how it will interact within itself and with other existing applications. Projects at this level should be viewed at a high level so that actual implementation details are decided in the implementation phase. However major environmental decision like hardware and platform decision are typically decided in this phase. Furthermore the basic goal of this phase is to design an application at the system level in those classes, interfaces, and interactions are defined. Additionally decisions about scalability, distribution and reliability should also be considered for all decisions. The desired output for this phase is a functional design document that states all of the architectural decisions that have been made in regards to the project as well as a diagrams like a sequence and class diagrams. Software Design This phase focuses primarily on the refining of the decisions found in the functional design document. Classes and interfaces are further broken down in to logical modules based on the interfaces and interactions previously indicated. The output of this phase is a formal design document. Implementation / Coding This phase focuses primarily on implementing the previously defined modules in to units of code. These units are developed independently are intergraded as the system is put together as part of a whole system. Software Integration & Verification This phase primarily focuses on testing each of the units of code developed as well as testing the system as a whole. There are basic types of testing at this phase and they include: Unit Test and Integration Test. Unit Test are built to test the functionality of a code unit to ensure that it preforms its desired task. Integration testing test the system as a whole because it focuses on results of combining specific units of code and validating it against expected results. The output of this phase is a test plan that includes test with expected results and actual results. System Verification This phase primarily focuses on testing the system as a whole in regards to the list of project requirements and desired operating environment. Operation & Maintenance his phase primarily focuses on handing off the competed project over to the customer so that they can verify that all of their requirements have been met based on their original requirements. This phase will also validate the correctness of their requirements and if any changed need to be made. In addition, any problems not resolved in the previous phase will be handled in this section. The Waterfall Model’s linear and sequential methodology does offer a project certain advantages and disadvantages. Advantages of the Waterfall Model Simplistic to implement and execute for projects and/or company wide Limited demand on resources Large emphasis on documentation Disadvantages of the Waterfall Model Completed phases cannot be revisited regardless if issues arise within a project Accurate requirement are never gather prior to the completion of the requirement phase due to the lack of clarification in regards to client’s desires. Small changes or errors that arise in applications may cause additional problems The client cannot change any requirements once the requirements phase has been completed leaving them no options for changes as they see their requirements changes as the customers desires change. Excess documentation Phases are cumbersome and slow moving Learn more about the Major Process in the Sofware Development Life Cycle and Waterfall Model. Conversely, the Hare shares similar traits with the prototyping software development model in that ideas are rapidly converted to basic working examples and subsequent changes are made to quickly align the project with customers desires as they are formulated and as software strays from the customers vision. The basic concept of prototyping is to eliminate the use of well-defined project requirements. Projects are allowed to grow as the customer needs and request grow. Projects are initially designed according to basic requirements and are refined as requirement become more refined. This process allows customer to feel their way around the application to ensure that they are developing exactly what they want in the application This model also works well for determining the feasibility of certain approaches in regards to an application. Prototypes allow for quickly developing examples of implementing specific functionality based on certain techniques. Advantages of Prototyping Active participation from users and customers Allows customers to change their mind in specifying requirements Customers get a better understanding of the system as it is developed Earlier bug/error detection Promotes communication with customers Prototype could be used as final production Reduced time needed to develop applications compared to the Waterfall method Disadvantages of Prototyping Promotes constantly redefining project requirements that cause major system rewrites Potential for increased complexity of a system as scope of the system expands Customer could believe the prototype as the working version. Implementation compromises could increase the complexity when applying updates and or application fixes When companies trying to decide between the Waterfall model and Prototype model they need to evaluate the benefits and disadvantages for both models. Typically smaller companies or projects that have major time constraints typically head for more of a Prototype model approach because it can reduce the time needed to complete the project because there is more of a focus on building a project and less on defining requirements and scope prior to the start of a project. On the other hand, Companies with well-defined requirements and time allowed to generate proper documentation should steer towards more of a waterfall model because they are in a position to obtain clarified requirements and have to design and optimal solution prior to the start of coding on a project.

Read the article
Building dynamic OLAP data marts on-the-fly

- by DrJohn

At the forthcoming SQLBits conference, I will be presenting a session on how to dynamically build an OLAP data mart on-the-fly. This blog entry is intended to clarify exactly what I mean by an OLAP data mart, why you may need to build them on-the-fly and finally outline the steps needed to build them dynamically. In subsequent blog entries, I will present exactly how to implement some of the techniques involved. What is an OLAP data mart? In data warehousing parlance, a data mart is a subset of the overall corporate data provided to business users to meet specific business needs. Of course, the term does not specify the technology involved, so I coined the term "OLAP data mart" to identify a subset of data which is delivered in the form of an OLAP cube which may be accompanied by the relational database upon which it was built. To clarify, the relational database is specifically create and loaded with the subset of data and then the OLAP cube is built and processed to make the data available to the end-users via standard OLAP client tools. Why build OLAP data marts? Market research companies sell data to their clients to make money. To gain competitive advantage, market research providers like to "add value" to their data by providing systems that enhance analytics, thereby allowing clients to make best use of the data. As such, OLAP cubes have become a standard way of delivering added value to clients. They can be built on-the-fly to hold specific data sets and meet particular needs and then hosted on a secure intranet site for remote access, or shipped to clients' own infrastructure for hosting. Even better, they support a wide range of different tools for analytical purposes, including the ever popular Microsoft Excel. Extension Attributes: The Challenge One of the key challenges in building multiple OLAP data marts based on the same 'template' is handling extension attributes. These are attributes that meet the client's specific reporting needs, but do not form part of the standard template. Now clearly, these extension attributes have to come into the system via additional files and ultimately be added to relational tables so they can end up in the OLAP cube. However, processing these files and filling dynamically altered tables with SSIS is a challenge as SSIS packages tend to break as soon as the database schema changes. There are two approaches to this: (1) dynamically build an SSIS package in memory to match the new database schema using C#, or (2) have the extension attributes provided as name/value pairs so the file's schema does not change and can easily be loaded using SSIS. The problem with the first approach is the complexity of writing an awful lot of complex C# code. The problem of the second approach is that name/value pairs are useless to an OLAP cube; so they have to be pivoted back into a proper relational table somewhere in the data load process WITHOUT breaking SSIS. How this can be done will be part of future blog entry. What is involved in building an OLAP data mart? There are a great many steps involved in building OLAP data marts on-the-fly. The key point is that all the steps must be automated to allow for the production of multiple OLAP data marts per day (i.e. many thousands, each with its own specific data set and attributes). Now most of these steps have a great deal in common with standard data warehouse practices. The key difference is that the databases are all built to order. The only permanent database is the metadata database (shown in orange) which holds all the metadata needed to build everything else (i.e. client orders, configuration information, connection strings, client specific requirements and attributes etc.). The staging database (shown in red) has a short life: it is built, populated and then ripped down as soon as the OLAP Data Mart has been populated. In the diagram below, the OLAP data mart comprises the two blue components: the Data Mart which is a relational database and the OLAP Cube which is an OLAP database implemented using Microsoft Analysis Services (SSAS). The client may receive just the OLAP cube or both components together depending on their reporting requirements. So, in broad terms the steps required to fulfil a client order are as follows: Step 1: Prepare metadata Create a set of database names unique to the client's order Modify all package connection strings to be used by SSIS to point to new databases and file locations. Step 2: Create relational databases Create the staging and data mart relational databases using dynamic SQL and set the database recovery mode to SIMPLE as we do not need the overhead of logging anything Execute SQL scripts to build all database objects (tables, views, functions and stored procedures) in the two databases Step 3: Load staging database Use SSIS to load all data files into the staging database in a parallel operation Load extension files containing name/value pairs. These will provide client-specific attributes in the OLAP cube. Step 4: Load data mart relational database Load the data from staging into the data mart relational database, again in parallel where possible Allocate surrogate keys and use SSIS to perform surrogate key lookup during the load of fact tables Step 5: Load extension tables & attributes Pivot the extension attributes from their native name/value pairs into proper relational tables Add the extension attributes to the views used by OLAP cube Step 6: Deploy & Process OLAP cube Deploy the OLAP database directly to the server using a C# script task in SSIS Modify the connection string used by the OLAP cube to point to the data mart relational database Modify the cube structure to add the extension attributes to both the data source view and the relevant dimensions Remove any standard attributes that not required Process the OLAP cube Step 7: Backup and drop databases Drop staging database as it is no longer required Backup data mart relational and OLAP database and ship these to the client's infrastructure Drop data mart relational and OLAP database from the build server Mark order complete Start processing the next order, ad infinitum. So my future blog posts and my forthcoming session at the SQLBits conference will all focus on some of the more interesting aspects of building OLAP data marts on-the-fly such as handling the load of extension attributes and how to dynamically alter the structure of an OLAP cube using C#.

Read the article
The SSIS tuning tip that everyone misses

- by Rob Farley

I know that everyone misses this, because I’m yet to find someone who doesn’t have a bit of an epiphany when I describe this. When tuning Data Flows in SQL Server Integration Services, people see the Data Flow as moving from the Source to the Destination, passing through a number of transformations. What people don’t consider is the Source, getting the data out of a database. Remember, the source of data for your Data Flow is not your Source Component. It’s wherever the data is, within your database, probably on a disk somewhere. You need to tune your query to optimise it for SSIS, and this is what most people fail to do. I’m not suggesting that people don’t tune their queries – there’s plenty of information out there about making sure that your queries run as fast as possible. But for SSIS, it’s not about how fast your query runs. Let me say that again, but in bolder text: The speed of an SSIS Source is not about how fast your query runs. If your query is used in a Source component for SSIS, the thing that matters is how fast it starts returning data. In particular, those first 10,000 rows to populate that first buffer, ready to pass down the rest of the transformations on its way to the Destination. Let’s look at a very simple query as an example, using the AdventureWorks database: We’re picking the different Weight values out of the Product table, and it’s doing this by scanning the table and doing a Sort. It’s a Distinct Sort, which means that the duplicates are discarded. It'll be no surprise to see that the data produced is sorted. Obvious, I know, but I'm making a comparison to what I'll do later. Before I explain the problem here, let me jump back into the SSIS world... If you’ve investigated how to tune an SSIS flow, then you’ll know that some SSIS Data Flow Transformations are known to be Blocking, some are Partially Blocking, and some are simply Row transformations. Take the SSIS Sort transformation, for example. I’m using a larger data set for this, because my small list of Weights won’t demonstrate it well enough. Seven buffers of data came out of the source, but none of them could be pushed past the Sort operator, just in case the last buffer contained the data that would be sorted into the first buffer. This is a blocking operation. Back in the land of T-SQL, we consider our Distinct Sort operator. It’s also blocking. It won’t let data through until it’s seen all of it. If you weren’t okay with blocking operations in SSIS, why would you be happy with them in an execution plan? The source of your data is not your OLE DB Source. Remember this. The source of your data is the NCIX/CIX/Heap from which it’s being pulled. Picture it like this... the data flowing from the Clustered Index, through the Distinct Sort operator, into the SELECT operator, where a series of SSIS Buffers are populated, flowing (as they get full) down through the SSIS transformations. Alright, I know that I’m taking some liberties here, because the two queries aren’t the same, but consider the visual. The data is flowing from your disk and through your execution plan before it reaches SSIS, so you could easily find that a blocking operation in your plan is just as painful as a blocking operation in your SSIS Data Flow. Luckily, T-SQL gives us a brilliant query hint to help avoid this. OPTION (FAST 10000) This hint means that it will choose a query which will optimise for the first 10,000 rows – the default SSIS buffer size. And the effect can be quite significant. First let’s consider a simple example, then we’ll look at a larger one. Consider our weights. We don’t have 10,000, so I’m going to use OPTION (FAST 1) instead. You’ll notice that the query is more expensive, using a Flow Distinct operator instead of the Distinct Sort. This operator is consuming 84% of the query, instead of the 59% we saw from the Distinct Sort. But the first row could be returned quicker – a Flow Distinct operator is non-blocking. The data here isn’t sorted, of course. It’s in the same order that it came out of the index, just with duplicates removed. As soon as a Flow Distinct sees a value that it hasn’t come across before, it pushes it out to the operator on its left. It still has to maintain the list of what it’s seen so far, but by handling it one row at a time, it can push rows through quicker. Overall, it’s a lot more work than the Distinct Sort, but if the priority is the first few rows, then perhaps that’s exactly what we want. The Query Optimizer seems to do this by optimising the query as if there were only one row coming through: This 1 row estimation is caused by the Query Optimizer imagining the SELECT operation saying “Give me one row” first, and this message being passed all the way along. The request might not make it all the way back to the source, but in my simple example, it does. I hope this simple example has helped you understand the significance of the blocking operator. Now I’m going to show you an example on a much larger data set. This data was fetching about 780,000 rows, and these are the Estimated Plans. The data needed to be Sorted, to support further SSIS operations that needed that. First, without the hint. ...and now with OPTION (FAST 10000): A very different plan, I’m sure you’ll agree. In case you’re curious, those arrows in the top one are 780,000 rows in size. In the second, they’re estimated to be 10,000, although the Actual figures end up being 780,000. The top one definitely runs faster. It finished several times faster than the second one. With the amount of data being considered, these numbers were in minutes. Look at the second one – it’s doing Nested Loops, across 780,000 rows! That’s not generally recommended at all. That’s “Go and make yourself a coffee” time. In this case, it was about six or seven minutes. The faster one finished in about a minute. But in SSIS-land, things are different. The particular data flow that was consuming this data was significant. It was being pumped into a Script Component to process each row based on previous rows, creating about a dozen different flows. The data flow would take roughly ten minutes to run – ten minutes from when the data first appeared. The query that completes faster – chosen by the Query Optimizer with no hints, based on accurate statistics (rather than pretending the numbers are smaller) – would take a minute to start getting the data into SSIS, at which point the ten-minute flow would start, taking eleven minutes to complete. The query that took longer – chosen by the Query Optimizer pretending it only wanted the first 10,000 rows – would take only ten seconds to fill the first buffer. Despite the fact that it might have taken the database another six or seven minutes to get the data out, SSIS didn’t care. Every time it wanted the next buffer of data, it was already available, and the whole process finished in about ten minutes and ten seconds. When debugging SSIS, you run the package, and sit there waiting to see the Debug information start appearing. You look for the numbers on the data flow, and seeing operators going Yellow and Green. Without the hint, I’d sit there for a minute. With the hint, just ten seconds. You can imagine which one I preferred. By adding this hint, it felt like a magic wand had been waved across the query, to make it run several times faster. It wasn’t the case at all – but it felt like it to SSIS.

Read the article
Getting your bearings and defining the project objective

- by johndoucette

I wrote this two years ago and thought it was worth posting… Some may think this is a daunting task and some may even say “what a waste of time” and want to open MS Project and start typing out tasks because someone asked for an estimate and a task list. Hell, maybe you even use Excel and pump out a spreadsheet with some real scientific formula for guessing how long it will take to code a bunch of classes. However, this short exercise will provide the basis for the entire project, whether small or large and be a great friend when communicating to anyone on your team or even your client. I call this the Project Brief. If you find yourself going beyond a single page, then you must decompose the sections and summarize your findings so there is a complete and clear picture of the project you are working on in a relatively short statement. Here is a great quote from the PMBOK (Project Management Body of Knowledge) relative to what a project is; A project is a temporary endeavor undertaken to create a unique product, service or result. With this in mind, the project brief should encompass the entirety (objective) of the endeavor in its explanation and what it will take (goals) to create the product, service or result (deliverables). Normally the process of identifying the project objective is done during the first stage of a project called the Project Kickoff, but you can perform this very important step anytime to help you get a bearing. There are many more parts to helping a project stay on course, but this is usually the foundation where it can be grounded on. Through a series of 3 exercises, you should be able to come up with the objective, goals and deliverables on your project. Follow these steps, and in no time (about ½ hour), you will have the foundation of your project plan. (See examples below) Exercise 1 – Objectives Begin with the end in mind. Think about your project in business terms with a couple things to help you understand the objective; Reference the business benefit in terms of cost, speed and / or quality, Provide a higher level of what the outcome will look like (future sense) It should be non-measurable, that’s what the goals are all about The output should be a single paragraph with three sentences and take 10 minutes to write. *Typically, agreement must be reached on the objectives of the project before you would proceed to the next steps of the project. Exercise 2 – Goals A project goal is a statement that answers questions about who, what, why, where and when. A good project goal statement; Answers the five “W” questions for the project Is measurable in each of its parts Is published and agreed on by all the owners This helps the Project Manager receive confirmation on defining the project target. Using the established project objective done in the first exercise, think about the things it will take to get the job done. Think about tangible activities which are the top level tasks in a typical Work Breakdown Structure (WBS). The overall goal statement plus all the deliverables (next exercise) can be seen as the project team’s contract with the project owners. Write 3 - 5 goals in about 10 minutes. You should not write the words “Who, what, why, where and when, but merely be able to answer the questions when you read a goal. Exercise 3 – Deliverables Every project creates some type of output and these outputs are called deliverables. There are two classes of deliverables; Internal – produced for project team members to meet their goals External – produced for project owners to meet their expectations The list you enter here provides a checklist for the team’s delivery and/or is a statement of all the expectations of the project owners. Here are some typical project deliverables; Product and product documentation End product/system Requirements/feature documents Installation guides Demo/prototype System design documents User guides/help files Plans Project plan Training plan Conversion/installation/delivery plan Test plans Documentation plan Communication plan Reports and general documentation Progress reports System acceptance tests Outstanding bug list Procedures Risk and issue logs Project history Deliverables should go with each of the goals. Have 3-5 deliverables for each goal. When you are done, you will have established a great foundation for the clarity of your project. This exercise can take some time, but with practice, you should be able to whip this one out in 10 minutes as well, especially if you are intimate with an ongoing project. Samples Objective [Client] is implementing a series of MOSS sites to support external public (Internet), internal employee (Intranet) and an external secure (password protected Internet) applications. This project will focus on the public-facing web site and will provide [Client] with architectural recommendations based on the current design being done by their design partner [Partner] and the internal Content Team. In addition, it will provide [Client] with a development plan and confidence they need to deploy a world class public Internet website. Goals 1. [Consultant] will provide technical guidance and set project team expectations for the implementation of the MOSS Internet site based on provided features/functions within three weeks. 2. [Consultant] will understand phase 2 secure password-protected Internet site design and provide recommendations. Deliverables 1.1 Public Internet (unsecure) Architectural Recommendation Plan 1.2 Physical Site construction Work Breakdown Structure and plan (Time, cost and resources needed) 2.1 Two Factor authentication recommendation document Objective [Client] is currently using an application developed by [Consultant] many years ago called "XXX". This application, although functional, does not meet their new updated business requirements and contains a few defects which [Client] has developed work-around processes. [Client] would like to have a "new and improved" system to support their membership management needs by expanding membership and subscription capabilities, provide accounting integration with internal (GL) and external (VeriSign) systems, and implement hooks to the current CRM solution. This effort will take place through a series of phases, beginning with envisioning. Goals 1. Through discussions with users, [Consultant] will discover current issues/bugs which need to be resolved which must meet the current functionality requirements within three weeks. 2. [Consultant] will gather requirements from the users about what is "needed" vs. "what they have" for enhancements and provide a high level document supporting their needs. 3. [Consultant] will meet with the team members through a series of meetings and help define the overall project plan to deliver a new and improved solution. Deliverables 1.1 Prioritized list of Current application issues/bugs that need to be resolved 1.2 Provide a resolution plan on the issues/bugs identified in the current application 1.3 Risk Assessment Document 2.1 Deliver a Requirements Document showing high-level [Client] needs for the new XXX application. · New feature functionality not in the application today · Existing functionality that will remain in the new functionality 2.2 Reporting Requirements Document 3.1 A Project Plan showing the deliverables and cost for the next (second) phase of this project. 3.2 A Statement of Work for the next (second) phase of this project. 3.3 An Estimate of any work that would need to follow the second phase.

Read the article
Master Data

- by david.butler(at)oracle.com

Let's take a deeper look at what we mean when we talk about 'Master' data. In its most general sense, master data is data that exists in more than one operational application. These are the applications that automate business processes. These applications require significant amounts of data to function correctly. This includes data about the objects that are involved in transactions, as well as the transaction data itself. For example, when a customer buys a product, the transaction is managed by a sales application. The objects of the transaction are the Customer and the Product. The transactional data is the time, place, price, discount, payment methods, etc. used at the point of sale. Many thousands of transactional data attributes are needed within the application. These important data elements are local to the applications and have no bearing on other applications. Harmonization and synchronization across applications is not necessary. The Customer and Product objects of the transaction also have a large number of attributes. Customer for example, includes hierarchies, hierarchical and matrixed relationships, contacts, classifications, preferences, accounts, identifiers, profiles, and addresses galore for 'ship to', 'mail to'; 'service at'; etc. Dozens of attributes exist for individuals, hundreds for organizations, and thousands for products. This data has meaning beyond any particular application. It exists in many applications and drives the vital cross application enterprise business processes. These are the processes that define and differentiate the organization. At every decision point, information about the objects of the process determines the direction of the process flow. This is the nature of the data that exists in more than one application, and this is why we call it 'master data'. Let me elaborate. Parties Oracle has developed a party schema to model all participants in your daily business operations. It models people, organizations, groups, customers, contacts, employees, and suppliers. It models their accounts, locations, classifications, and preferences. And most importantly, it models the vast array of hierarchical and matrixed relationships that exist between all the participants in your real world operations. The model logically separates people and organizations from their relationships and accounts. This separation creates flexibility unmatched in the industry and accounts for the fact that the Oracle schema for Customers, Suppliers, and Accounts is a true superset of the wide variety of commercial and homegrown customer models in existence. Sites Sites are places where business is conducted. They can be addresses, clusters such as retail malls, locations within a cluster, floors within a building, places where meters are located, rooms on floors, etc. Fully understanding all attributes of a site is key to many business processes. Attributes such as 'noise abatement policy' at a point of delivery, or the size of an oven in a business kitchen drive day-to-day activities such as delivery schedules or food promotions. Typically this kind of data is siloed in departments and scattered across applications and spreadsheets. This leads to conflicting information and poor operational efficiencies. Oracle's Global Single Schema can hold all site attributes in one place and enables a single version of authoritative site information across the enterprise. Products and Services The Oracle Global Single Schema also includes a number of entities that define the products and services a company creates and offers for sale. Key entities include Items organized into Catalogs and Price Lists. The Catalog structures provide for the ability to capture different views of a product such as engineering, manufacturing, and service which are based on a unified product model. As a result, designers, manufacturing engineers, purchasers and partners can work simultaneously on a common product definition. The Catalog schema allows for unlimited attributes, combines them into meaningful groups, and maps them to catalog categories to track these different types of information. The model also maps an unlimited number of functional structures for each item. For example, multiple Bills of Material (BOMs) can be constructed representing requirements BOM, features BOM, and packaging BOM for an item. The Catalog model also supports hierarchical information about each item and all standard Global Data Synchronization attributes. Business Processes Utilizing Linked Data Entities Each business entity codified into a centralized master data environment significantly improves the efficiency of the automated business processes that use the consolidated data. When all the key business entities used by an organization's process are so consolidated, the advantages are multiplied. The primary reason for business process breakdowns (i.e. data errors across application boundaries) is eliminated. All processes are positively impacted and business process automation is itself automated. I like to use the "Call to Resolution" business process as an example to help illustrate this important point. It involves call center applications, service applications, RMA applications, transportation applications, inventory applications, etc. Customer, Site, Product and Supplier master data must all be correct and consistent across these applications. What's more, the data relationships between customer and product, and product and suppliers must be right. This is the minimum quality needed to insure the business process flows without error. But that is not the end of the story. Critical master data attributes such as customer loyalty, profitability, credit worthiness, and propensity to buy can optimize the call center point of contact component of the process. Critical product information such as alternative parts or equivalent products can optimize the resolution selected by the process. A comprehensive understanding of the 'service at' location can help insure multiple trips are avoided in the process. Full supplier information on reliability, delivery delays, and potential alternates can prevent supplier exceptions and play a significant role in optimizing the process. In other words, these master data attributes enable the optimization of the "Call to Resolution" enterprise business process. Master data supports and guides business process flows. Thus the phrase 'Master Data' is indeed appropriate. MDM is the software that houses, manages, and governs the master data that resides in all applications and controls the enterprise business processes. A complete master data solution takes a data model that holds fully attributed master data entities and their inter-relationships. Oracle has this model. Oracle, with its deep understanding of application data is the logical choice for managing all your master data within the enterprise whether or not your organization actually runs any Oracle Applications.

Read the article
Network communications mechanisms for SQL Server

- by Akshay Deep Lamba

Problem I am trying to understand how SQL Server communicates on the network, because I'm having to tell my networking team what ports to open up on the firewall for an edge web server to communicate back to the SQL Server on the inside. What do I need to know? Solution In order to understand what needs to be opened where, let's first talk briefly about the two main protocols that are in common use today: TCP - Transmission Control Protocol UDP - User Datagram Protocol Both are part of the TCP/IP suite of protocols. We'll start with TCP. TCP TCP is the main protocol by which clients communicate with SQL Server. Actually, it is more correct to say that clients and SQL Server use Tabular Data Stream (TDS), but TDS actually sits on top of TCP and when we're talking about Windows and firewalls and other networking devices, that's the protocol that rules and controls are built around. So we'll just speak in terms of TCP. TCP is a connection-oriented protocol. What that means is that the two systems negotiate the connection and both agree to it. Think of it like a phone call. While one person initiates the phone call, the other person has to agree to take it and both people can end the phone call at any time. TCP is the same way. Both systems have to agree to the communications, but either side can end it at any time. In addition, there is functionality built into TCP to ensure that all communications can be disassembled and reassembled as necessary so it can pass over various network devices and be put together again properly in the right order. It also has mechanisms to handle and retransmit lost communications. Because of this functionality, TCP is the protocol used by many different network applications. The way the applications all can share is through the use of ports. When a service, like SQL Server, comes up on a system, it must listen on a port. For a default SQL Server instance, the default port is 1433. Clients connect to the port via the TCP protocol, the connection is negotiated and agreed to, and then the two sides can transfer information as needed until either side decides to end the communication. In actuality, both sides will have a port to use for the communications, but since the client's port is typically determined semi-randomly, when we're talking about firewalls and the like, typically we're interested in the port the server or service is using. UDP UDP, unlike TCP, is not connection oriented. A "client" can send a UDP communications to anyone it wants. There's nothing in place to negotiate a communications connection, there's nothing in the protocol itself to coordinate order of communications or anything like that. If that's needed, it's got to be handled by the application or by a protocol built on top of UDP being used by the application. If you think of TCP as a phone call, think of UDP as a postcard. I can put a postcard in the mail to anyone I want, and so long as it is addressed properly and has a stamp on it, the postal service will pick it up. Now, what happens it afterwards is not guaranteed. There's no mechanism for retransmission of lost communications. It's great for short communications that doesn't necessarily need an acknowledgement. Because multiple network applications could be communicating via UDP, it uses ports, just like TCP. The SQL Browser or the SQL Server Listener Service uses UDP. Network Communications - Talking to SQL Server When an instance of SQL Server is set up, what TCP port it listens on depends. A default instance will be set up to listen on port 1433. A named instance will be set to a random port chosen during installation. In addition, a named instance will be configured to allow it to change that port dynamically. What this means is that when a named instance starts up, if it finds something already using the port it normally uses, it'll pick a new port. If you have a named instance, and you have connections coming across a firewall, you're going to want to use SQL Server Configuration Manager to set a static port. This will allow the networking and security folks to configure their devices for maximum protection. While you can change the network port for a default instance of SQL Server, most people don't. Network Communications - Finding a SQL Server When just the name is specified for a client to connect to SQL Server, for instance, MySQLServer, this is an attempt to connect to the default instance. In this case the client will automatically attempt to communicate to port 1433 on MySQLServer. If you've switched the port for the default instance, you'll need to tell the client the proper port, usually by specifying the following syntax in the connection string: <server>,<port>. For instance, if you moved SQL Server to listen on 14330, you'd use MySQLServer,14330 instead of just MySQLServer. However, because a named instance sets up its port dynamically by default, the client never knows at the outset what the port is it should talk to. That's what the SQL Browser or the SQL Server Listener Service (SQL Server 2000) is for. In this case, the client sends a communication via the UDP protocol to port 1434. It asks, "Where is the named instance?" So if I was running a named instance called SQL2008R2, it would be asking the SQL Browser, "Hey, how do I talk to MySQLServer\SQL2008R2?" The SQL Browser would then send back a communications from UDP port 1434 back to the client telling the client how to talk to the named instance. Of course, you can skip all of this of you set that named instance's port statically. Then you can use the <server>,<port> mechanism to connect and the client won't try to talk to the SQL Browser service. It'll simply try to make the connection. So, for instance, is the SQL2008R2 instance was listening on port 20080, specifying MySQLServer,20080 would attempt a connection to the named instance. Network Communications - Named Pipes Named pipes is an older network library communications mechanism and it's generally not used any longer. It shouldn't be used across a firewall. However, if for some reason you need to connect to SQL Server with it, this protocol also sits on top of TCP. Named Pipes is actually used by the operating system and it has its own mechanism within the protocol to determine where to route communications. As far as network communications is concerned, it listens on TCP port 445. This is true whether we're talking about a default or named instance of SQL Server. The Summary Table To put all this together, here is what you need to know: Type of Communication Protocol Used Default Port Finding a SQL Server or SQL Server Named Instance UDP 1434 Communicating with a default instance of SQL Server TCP 1433 Communicating with a named instance of SQL Server TCP * Determined dynamically at start up Communicating with SQL Server via Named Pipes TCP 445

Read the article
SQL Sentry First Impressions

- by AjarnMark

After struggling to defend my SQL Servers from a political attack recently, I realized that I needed better tools to back me up, and SQL Sentry is the leading candidate. A couple of weeks ago, seemingly from out of nowhere, complaints from the business users started coming in that one of the core internal applications was running dramatically slower than normal, and fingers were being pointed at the SQL Server. Unfortunately, we don’t have a production DBA whose entire job is to monitor and maintain our SQL Servers. The responsibility falls to me to do the best I can, investing only a small portion of my time, because there are so many other responsibilities to take care of, and our industry is still deep in recession. I inherited these SQL Servers and have made significant improvements in process and procedure, but I had not yet made the time to take real baseline measurements or keep a really close eye on the performance. Like many DBAs, I wrote several of my own tools and used the “built-in tools” like Profiler, PerfMon, and sp_who2 (did I mention most of our instances are SQL Server 2000?). These have all served me well for in-the-moment troubleshooting and maintenance, but they really fell down on the job when I was called upon to “prove” that SQL Server performance was acceptable and more importantly had not degraded recently (i.e. historical comparisons). I really didn’t have anything from a historical comparison perspective, but I was able to show that current performance was acceptable, and deflect attention back onto other components (which in fact turned out to be the real culprit). That experience dramatically illustrated the need for better monitoring tools. Coincidentally, I had been talking recently to my boss about the mini nightmare of monitoring several critical and interdependent overnight jobs that operate on separate instances of SQL Server. Among other tools, I had been using Idera’s SQL Job Manager which is a free tool and did a nice job of showing me job schedules and histories in a nice calendar view. This worked fairly well, and for the money (did I mention it was free?) it couldn’t be beat. But it is based on the stored job history in MSDB, and there were other performance problems that we ran into when we started changing the settings for how much job history to retain, in order to be able to look back a month or more in the calendar view. Another coincidence (if you believe in such things) was that when we had some of those performance challenges, I posted a couple of questions to the #sqlhelp hashtag on Twitter and Greg Gonzalez (@SQLSensei) suggested I check out SQL Sentry’s Event Manager. At the time, I just thought he worked there, but later found out that he founded the company. When I took a quick look at the features & benefits, the one that really jumped out at me is Chaining and Queueing which sounded like it would really help with our “interdependent jobs on different servers” issue. I know that is a lot of background story and coincidences, but hopefully you have stuck with me so far, and now we have arrived at the point where last week I downloaded and installed the 30-day trial of the SQL Sentry Power Suite, which is Event Manager plus Performance Advisor. And I must say that I really like what I see so far. Here are a few highlights: Great Support. I had two issues getting the trial setup and monitoring a handful of our servers. One of which was entirely my fault (missed a security setting in SQL 2008) and the other was mostly my fault (late change to some config settings that were apparently cached and did not get refreshed properly). In both cases, the support staff at SQL Sentry were very responsive and rather quickly figured out what the cause and fix was for each of them. This left me with a great impression of the company. Kudos to them! Chaining and Queueing. While I have not yet activated this feature, I am very excited about the possibilities. We have jobs on three different instances of SQL Server that have to be run in a certain order, and each has to finish before the next can successfully begin, and I believe this feature will ensure just that. It has been a real pain in the backside when one of those jobs runs just a little too long and does not finish before the job on another instance starts, thus triggering a chain reaction of either outright job failures, or worse, successful completion of completely invalid processing. Calendar View. I really, really like the Event Manager calendar view where I can see all jobs and events across all instances and identify potential resource contention as well as windows of opportunity for maintenance activity. Very well done, and based on Event Manager’s own database of accumulated historical information rather than querying the source instances every time. Performance Advisor Dashboard History View. This view let’s me quickly select a date and time range and it displays graphs of key SQL Server and Windows metrics. This is exactly the thing I needed to answer the “has performance changed recently” question at the beginning of this post. Reporting Services Subscription Jobs with Report Name. This was a big and VERY pleasant surprise. If you have ever looked at the list of SQL Server jobs that SQL Server Reporting Services creates when you make a Subscription, you will notice that they all have some sort of GUID as the name of the job. This is really ugly, and really annoying because when you are just looking at the SQL Agent and Job Activity Monitor, if you see that Job X failed, you really do not have any indication in the name or the properties of the Job itself, as to what Report that was for. But with SQL Sentry Event Manager you do. The Jobs list in the Navigator pane in SQL Sentry, amazingly, displays the name of the Report that the Subscription Job is for. And when you open it to see more details, it shows you the full Reporting Services path to that Report, so you can immediately track it down in the Report Manager in case you want to identify/notify the owner or edit the Subscription information. I did not expect this at all, but I sure do like it. HOORAY! That is just my first impressions from using the tools for a few days. And I haven’t even gotten into how it showed me where I was completely mistaken about one aspect of my SQL Server disk configurations. I’ll share that lesson in another blog entry. But I have to say it again, the combination of Event Manager and Performance Advisor working together have really made me a fan.

Read the article
How do I restrict concurrent statistics gathering to a small set of tables from a single schema?

- by Maria Colgan

I got an interesting question from one of my colleagues in the performance team last week about how to restrict a concurrent statistics gather to a small subset of tables from one schema, rather than the entire schema. I thought I would share the solution we came up with because it was rather elegant, and took advantage of concurrent statistics gathering, incremental statistics, and the not so well known “obj_filter_list” parameter in DBMS_STATS.GATHER_SCHEMA_STATS procedure. You should note that the solution outline below with “obj_filter_list” still applies, even when concurrent statistics gathering and/or incremental statistics gathering is disabled. The reason my colleague had asked the question in the first place was because he wanted to enable incremental statistics for 5 large partitioned tables in one schema. The first time you gather statistics after you enable incremental statistics on a table, you have to gather statistics for all of the existing partitions so that a synopsis may be created for them. If the partitioned table in question is large and contains a lot of partition, this could take a considerable amount of time. Since my colleague only had the Exadata environment at his disposal overnight, he wanted to re-gather statistics on 5 partition tables as quickly as possible to ensure that it all finished before morning. Prior to Oracle Database 11g Release 2, the only way to do this would have been to write a script with an individual DBMS_STATS.GATHER_TABLE_STATS command for each partition, in each of the 5 tables, as well as another one to gather global statistics on the table. Then, run each script in a separate session and manually manage how many of this session could run concurrently. Since each table has over one thousand partitions that would definitely be a daunting task and would most likely keep my colleague up all night! In Oracle Database 11g Release 2 we can take advantage of concurrent statistics gathering, which enables us to gather statistics on multiple tables in a schema (or database), and multiple (sub)partitions within a table concurrently. By using concurrent statistics gathering we no longer have to run individual statistics gathering commands for each partition. Oracle will automatically create a statistics gathering job for each partition, and one for the global statistics on each partitioned table. With the use of concurrent statistics, our script can now be simplified to just five DBMS_STATS.GATHER_TABLE_STATS commands, one for each table. This approach would work just fine but we really wanted to get this down to just one command. So how can we do that? You may be wondering why we didn’t just use the DBMS_STATS.GATHER_SCHEMA_STATS procedure with the OPTION parameter set to ‘GATHER STALE’. Unfortunately the statistics on the 5 partitioned tables were not stale and enabling incremental statistics does not mark the existing statistics stale. Plus how would we limit the schema statistics gather to just the 5 partitioned tables? So we went to ask one of the statistics developers if there was an alternative way. The developer told us the advantage of the “obj_filter_list” parameter in DBMS_STATS.GATHER_SCHEMA_STATS procedure. The “obj_filter_list” parameter allows you to specify a list of objects that you want to gather statistics on within a schema or database. The parameter takes a collection of type DBMS_STATS.OBJECTTAB. Each entry in the collection has 5 feilds; the schema name or the object owner, the object type (i.e., ‘TABLE’ or ‘INDEX’), object name, partition name, and subpartition name. You don't have to specify all five fields for each entry. Empty fields in an entry are treated as if it is a wildcard field (similar to ‘*’ character in LIKE predicates). Each entry corresponds to one set of filter conditions on the objects. If you have more than one entry, an object is qualified for statistics gathering as long as it satisfies the filter conditions in one entry. You first must create the collection of objects, and then gather statistics for the specified collection. It’s probably easier to explain this with an example. I’m using the SH sample schema but needed a couple of additional partitioned table tables to get recreate my colleagues scenario of 5 partitioned tables. So I created SALES2, SALES3, and COSTS2 as copies of the SALES and COSTS table respectively (setup.sql). I also deleted statistics on all of the tables in the SH schema beforehand to more easily demonstrate our approach. Step 0. Delete the statistics on the tables in the SH schema. Step 1. Enable concurrent statistics gathering. Remember, this has to be done at the global level. Step 2. Enable incremental statistics for the 5 partitioned tables. Step 3. Create the DBMS_STATS.OBJECTTAB and pass it to the DBMS_STATS.GATHER_SCHEMA_STATS command. Here, you will notice that we defined two variables of DBMS_STATS.OBJECTTAB type. The first, filter_lst, will be used to pass the list of tables we want to gather statistics on, and will be the value passed to the obj_filter_list parameter. The second, obj_lst, will be used to capture the list of tables that have had statistics gathered on them by this command, and will be the value passed to the objlist parameter. In Oracle Database 11g Release 2, you need to specify the objlist parameter in order to get the obj_filter_list parameter to work correctly due to bug 14539274. Will also needed to define the number of objects we would supply in the obj_filter_list. In our case we ere specifying 5 tables (filter_lst.extend(5)). Finally, we need to specify the owner name and object name for each of the objects in the list. Once the list definition is complete we can issue the DBMS_STATS.GATHER_SCHEMA_STATS command. Step 4. Confirm statistics were gathered on the 5 partitioned tables. Here are a couple of other things to keep in mind when specifying the entries for the obj_filter_list parameter. If a field in the entry is empty, i.e., null, it means there is no condition on this field. In the above example , suppose you remove the statement Obj_filter_lst(1).ownname := ‘SH’; You will get the same result since when you have specified gather_schema_stats so there is no need to further specify ownname in the obj_filter_lst. All of the names in the entry are normalized, i.e., uppercased if they are not double quoted. So in the above example, it is OK to use Obj_filter_lst(1).objname := ‘sales’;. However if you have a table called ‘MyTab’ instead of ‘MYTAB’, then you need to specify Obj_filter_lst(1).objname := ‘”MyTab”’; As I said before, although we have illustrated the usage of the obj_filter_list parameter for partitioned tables, with concurrent and incremental statistics gathering turned on, the obj_filter_list parameter is generally applicable to any gather_database_stats, gather_dictionary_stats and gather_schema_stats command. You can get a copy of the script I used to generate this post here. +Maria Colgan

Read the article
Five Key Strategies in Master Data Management

- by david.butler(at)oracle.com

Here is a very interesting Profit Magazine article on MDM: A recent customer survey reveals the deleterious effects of data fragmentation. by Trevor Naidoo, December 2010 Across industries and geographies, IT organizations have grown in complexity, whether due to mergers and acquisitions, or decentralized systems supporting functional or departmental requirements. With systems architected over time to support unique, one-off process needs, they are becoming costly to maintain, and the Internet has only further added to the complexity. Data fragmentation has become a key inhibitor in delivering flexible, user-friendly systems. The Oracle Insight team conducted a survey assessing customers' master data management (MDM) capabilities over the past two years to get a sense of where they are in terms of their capabilities. The responses, by 27 respondents from six different industries, reveal five key areas in which customers need to improve their data management in order to get better financial results. 1. Less than 15 percent of organizations surveyed understand the sources and quality of their master data, and have a roadmap to address missing data domains. Examples of the types of master data domains referred to are customer, supplier, product, financial and site. Many organizations have multiple sources of master data with varying degrees of data quality in each source -- customer data stored in the customer relationship management system is inconsistent with customer data stored in the order management system. Imagine not knowing how many places you stored your customer information, and whether a customer's address was the most up to date in each source. In fact, more than 55 percent of the respondents in the survey manage their data quality on an ad-hoc basis. It is important for organizations to document their inventory of data sources and then profile these data sources to ensure that there is a consistent definition of key data entities throughout the organization. Some questions to ask are: How do we define a customer? What is a product? How do we define a site? The goal is to strive for one common repository for master data that acts as a cross reference for all other sources and ensures consistent, high-quality master data throughout the organization. 2. Only 18 percent of respondents have an enterprise data management strategy to ensure that data is treated as an asset to the organization. Most respondents handle data at the department or functional level and do not have an enterprise view of their master data. The sales department may track all their interactions with customers as they move through the sales cycle, the service department is tracking their interactions with the same customers independently, and the finance department also has a different perspective on the same customer. The salesperson may not be aware that the customer she is trying to sell to is experiencing issues with existing products purchased, or that the customer is behind on previous invoices. The lack of a data strategy makes it difficult for business users to turn data into information via reports. Without the key building blocks in place, it is difficult to create key linkages between customer, product, site, supplier and financial data. These linkages make it possible to understand patterns. A well-defined data management strategy is aligned to the business strategy and helps create the governance needed to ensure that data stewardship is in place and data integrity is intact. 3. Almost 60 percent of respondents have no strategy to integrate data across operational applications. Many respondents have several disparate sources of data with no strategy to keep them in sync with each other. Even though there is no clear strategy to integrate the data (see #2 above), the data needs to be synced and cross-referenced to keep the business processes running. About 55 percent of respondents said they perform this integration on an ad hoc basis, and in many cases, it is done manually with the help of Microsoft Excel spreadsheets. For example, a salesperson needs a report on global sales for a specific product, but the product has different product numbers in different countries. Typically, an analyst will pull all the data into Excel, manually create a cross reference for that product, and then aggregate the sales. The exact same procedure has to be followed if the same report is needed the following month. A well-defined consolidation strategy will ensure that a central cross-reference is maintained with updates in any one application being propagated to all the other systems, so that data is synchronized and up to date. This can be done in real time or in batch mode using integration technology. 4. Approximately 50 percent of respondents spend manual efforts cleansing and normalizing data. Information stored in various systems usually follows different standards and formats, making it difficult to match the data. A customer's address can be stored in different ways using a variety of abbreviations -- for example, "av" or "ave" for avenue. Similarly, a product's attributes can be stored in a number of different ways; for example, a size attribute can be stored in inches and can also be entered as "'' ". These types of variations make it difficult to match up data from different sources. Today, most customers rely on manual, heroic efforts to match, cleanse, and de-duplicate data -- clearly not a scalable, sustainable model. To solve this challenge, organizations need the ability to standardize data for customers, products, sites, suppliers and financial accounts; however, less than 10 percent of respondents have technology in place to automatically resolve duplicates. It is no wonder, therefore, that we get communications about products we don't own, at addresses we don't reside, and using channels (like direct mail) we don't like. An all-too-common example of a potential challenge follows: Customers end up receiving duplicate communications, which not only impacts customer satisfaction, but also incurs additional mailing costs. Cleansing, normalizing, and standardizing data will help address most of these issues. 5. Only 10 percent of respondents have the ability to share data that was mastered in a master data hub. Close to 60 percent of respondents have efforts in place that profile, standardize and cleanse data manually, and the output of these efforts are stored in spreadsheets in various parts of the organization. This valuable information is not easily shared with the rest of the organization and, more importantly, this enriched information cannot be sent back to the source systems so that the data is fixed at the source. A key benefit of a master data management strategy is not only to clean the data, but to also share the data back to the source systems as well as other systems that need the information. Aside from the source systems, another key beneficiary of this data is the business intelligence system. Having clean master data as input to business intelligence systems provides more accurate and enhanced reporting. Characteristics of Stellar MDM When deciding on the right master data management technology, organizations should look for solutions that have four main characteristics: enterprise-grade MDM performance complete technology that can be rapidly deployed and addresses multiple business issues end-to-end MDM process management with data quality monitoring and assurance pre-built MDM business relevant applications with data stores and workflows These master data management capabilities will aid in moving closer to a best-practice maturity level, delivering tremendous efficiencies and savings as well as revenue growth opportunities as a result of better understanding your customers. Trevor Naidoo is a senior director in Industry Strategy and Insight at Oracle.

Read the article
DBA Best Practices - A Blog Series: Episode 1 - Backups

- by Argenis

This blog post is part of the DBA Best Practices series, on which various topics of concern for daily database operations are discussed. Your feedback and comments are very much welcome, so please drop by the comments section and be sure to leave your thoughts on the subject. Morning Coffee When I was a DBA, the first thing I did when I sat down at my desk at work was checking that all backups had completed successfully. It really was more of a ritual, since I had a dual system in place to check for backup completion: 1) the scheduled agent jobs to back up the databases were set to alert the NOC in failure, and 2) I had a script run from a central server every so often to check for any backup failures. Why the redundancy, you might ask. Well, for one I was once bitten by the fact that database mail doesn't work 100% of the time. Potential causes for failure include issues on the SMTP box that relays your server email, firewall problems, DNS issues, etc. And so to be sure that my backups completed fine, I needed to rely on a mechanism other than having the servers do the taking - I needed to interrogate the servers and ask each one if an issue had occurred. This is why I had a script run every so often. Some of you might have monitoring tools in place like Microsoft System Center Operations Manager (SCOM) or similar 3rd party products that would track all these things for you. But at that moment, we had no resort but to write our own Powershell scripts to do it. Now it goes without saying that if you don't have backups in place, you might as well find another career. Your most sacred job as a DBA is to protect the data from a disaster, and only properly safeguarded backups can offer you peace of mind here. "But, we have a cluster...we don't need backups" Sadly I've heard this line more than I would have liked to. You need to understand that a cluster is comprised of shared storage, and that is precisely your single point of failure. A cluster will protect you from an issue at the Operating System level, and also under an outage of any SQL-related service or dependent devices. But it will most definitely NOT protect you against corruption, nor will it protect you against somebody deleting data from a table - accidentally or otherwise. Backup, fine. How often do I take a backup? The answer to this is something you will hear frequently when working with databases: it depends. What does it depend on? For one, you need to understand how much data your business is willing to lose. This is what's called Recovery Point Objective, or RPO. If you don't know how much data your business is willing to lose, you need to have an honest and realistic conversation about data loss expectations with your customers, internal or external. From my experience, their first answer to the question "how much data loss can you withstand?" will be "zero". In that case, you will need to explain how zero data loss is very difficult and very costly to achieve, even in today's computing environments. Do you want to go ahead and take full backups of all your databases every hour, or even every day? Probably not, because of the impact that taking a full backup can have on a system. That's what differential and transaction log backups are for. Have I answered the question of how often to take a backup? No, and I did that on purpose. You need to think about how much time you have to recover from any event that requires you to restore your databases. This is what's called Recovery Time Objective. Again, if you go ask your customer how long of an outage they can withstand, at first you will get a completely unrealistic number - and that will be your starting point for discussing a solution that is cost effective. The point that I'm trying to get across is that you need to have a plan. This plan needs to be practiced, and tested. Like a football playbook, you need to rehearse the moves you'll perform when the time comes. How often is up to you, and the objective is that you feel better about yourself and the steps you need to follow when emergency strikes. A backup is nothing more than an untested restore Backups are files. Files are prone to corruption. Put those two together and realize how you feel about those backups sitting on that network drive. When was the last time you restored any of those? Restoring your backups on another box - that, by the way, doesn't have to match the specs of your production server - will give you two things: 1) peace of mind, because now you know that your backups are good and 2) a place to offload your consistency checks with DBCC CHECKDB or any of the other DBCC commands like CHECKTABLE or CHECKCATALOG. This is a great strategy for VLDBs that cannot withstand the additional load created by the consistency checks. If you choose to offload your consistency checks to another server though, be sure to run DBCC CHECKDB WITH PHYSICALONLY on the production server, and if you're using SQL Server 2008 R2 SP1 CU4 and above, be sure to enable traceflags 2562 and/or 2549, which will speed up the PHYSICALONLY checks further - you can read more about this enhancement here. Back to the "How Often" question for a second. If you have the disk, and the network latency, and the system resources to do so, why not backup the transaction log often? As in, every 5 minutes, or even less than that? There's not much downside to doing it, as you will have to clear the log with a backup sooner than later, lest you risk running out space on your tlog, or even your drive. The one drawback to this approach is that you will have more files to deal with at restore time, and processing each file will add a bit of extra time to the entire process. But it might be worth that time knowing that you minimized the amount of data lost. Again, test your plan to make sure that it matches your particular needs. Where to back up to? Network share? Locally? SAN volume? This is another topic where everybody has a favorite choice. So, I'll stick to mentioning what I like to do and what I consider to be the best practice in this regard. I like to backup to a SAN volume, i.e., a drive that actually lives in the SAN, and can be easily attached to another server in a pinch, saving you valuable time - you wouldn't need to restore files on the network (slow) or pull out drives out a dead server (been there, done that, it’s also slow!). The key is to have a copy of those backup files made quickly, and, if at all possible, to a remote target on a different datacenter - or even the cloud. There are plenty of solutions out there that can help you put such a solution together. That right there is the first step towards a practical Disaster Recovery plan. But there's much more to DR, and that's material for a different blog post in this series.

Read the article

Search Results

Search found 9439 results on 378 pages for 'telepath needed'.

Page 119/378 | < Previous Page | 115 116 117 118 119 120 121 122 123 124 125 126 | Next Page >

- by Javier Treviño

- by Rick Strahl

- by Gene Eun

- by Adam M.

- by JonWillis

- by Ali Bahrami

- by DrJohn

- by BuckWoody

- by terje

- by Most Valuable Yak (Rob Volk)

- by Newton

- by Simon Cooper

- by Sean

- by Most Valuable Yak (Rob Volk)

- by Your DisplayName here!

- by DrJohn

- by Rob Farley

- by johndoucette

- by david.butler(at)oracle.com

- by Akshay Deep Lamba

- by AjarnMark

- by Maria Colgan

- by david.butler(at)oracle.com

- by Argenis

< Previous Page | 115 116 117 118 119 120 121 122 123 124 125 126 | Next Page >