Search Results

Search found 10711 results on 429 pages for 'ghost blog'.

Page 71/429 | < Previous Page | 67 68 69 70 71 72 73 74 75 76 77 78  | Next Page >

  • IBM "per core" comparisons for SPECjEnterprise2010

    - by jhenning
    I recently stumbled upon a blog entry from Roman Kharkovski (an IBM employee) comparing some SPECjEnterprise2010 results for IBM vs. Oracle. Mr. Kharkovski's blog claims that SPARC delivers half the transactions per core vs. POWER7. Prior to any argument, I should say that my predisposition is to like Mr. Kharkovski, because he says that his blog is intended to be factual; that the intent is to try to avoid marketing hype and FUD tactic; and mostly because he features a picture of himself wearing a bike helmet (me too). Therefore, in a spirit of technical argument, rather than FUD fight, there are a few areas in his comparison that should be discussed. Scaling is not free For any benchmark, if a small system scores 13k using quantity R1 of some resource, and a big system scores 57k using quantity R2 of that resource, then, sure, it's tempting to divide: is  13k/R1 > 57k/R2 ? It is tempting, but not necessarily educational. The problem is that scaling is not free. Building big systems is harder than building small systems. Scoring  13k/R1  on a little system provides no guarantee whatsoever that one can sustain that ratio when attempting to handle more than 4 times as many users. Choosing the denominator radically changes the picture When ratios are used, one can vastly manipulate appearances by the choice of denominator. In this case, lots of choices are available for the resource to be compared (R1 and R2 above). IBM chooses to put cores in the denominator. Mr. Kharkovski provides some reasons for that choice in his blog entry. And yet, it should be noted that the very concept of a core is: arbitrary: not necessarily comparable across vendors; fluid: modern chips shift chip resources in response to load; and invisible: unless you have a microscope, you can't see it. By contrast, one can actually see processor chips with the naked eye, and they are a bit easier to count. If we put chips in the denominator instead of cores, we get: 13161.07 EjOPS / 4 chips = 3290 EjOPS per chip for IBM vs 57422.17 EjOPS / 16 chips = 3588 EjOPS per chip for Oracle The choice of denominator makes all the difference in the appearance. Speaking for myself, dividing by chips just seems to make more sense, because: I can see chips and count them; and I can accurately compare the number of chips in my system to the count in some other vendor's system; and Tthe probability of being able to continue to accurately count them over the next 10 years of microprocessor development seems higher than the probability of being able to accurately and comparably count "cores". SPEC Fair use requirements Speaking as an individual, not speaking for SPEC and not speaking for my employer, I wonder whether Mr. Kharkovski's blog article, taken as a whole, meets the requirements of the SPEC Fair Use rule www.spec.org/fairuse.html section I.D.2. For example, Mr. Kharkovski's footnote (1) begins Results from http://www.spec.org as of 04/04/2013 Oracle SUN SPARC T5-8 449 EjOPS/core SPECjEnterprise2010 (Oracle's WLS best SPECjEnterprise2010 EjOPS/core result on SPARC). IBM Power730 823 EjOPS/core (World Record SPECjEnterprise2010 EJOPS/core result) The questionable tactic, from a Fair Use point of view, is that there is no such metric at the designated location. At www.spec.org, You can find the SPEC metric 57422.17 SPECjEnterprise2010 EjOPS for Oracle and You can also find the SPEC metric 13161.07 SPECjEnterprise2010 EjOPS for IBM. Despite the implication of the footnote, you will not find any mention of 449 nor anything that says 823. SPEC says that you can, under its fair use rule, derive your own values; but it emphasizes: "The context must not give the appearance that SPEC has created or endorsed the derived value." Substantiation and transparency Although SPEC disclaims responsibility for non-SPEC information (section I.E), it says that non-SPEC data and methods should be accurate, should be explained, should be substantiated. Unfortunately, it is difficult or impossible for the reader to independently verify the pricing: Were like units compared to like (e.g. list price to list price)? Were all components (hw, sw, support) included? Were all fees included? Note that when tpc.org shows IBM pricing, there are often items such as "PROCESSOR ACTIVATION" and "MEMORY ACTIVATION". Without the transparency of a detailed breakdown, the pricing claims are questionable. T5 claim for "Fastest Processor" Mr. Kharkovski several times questions Oracle's claim for fastest processor, writing You see, when you publish industry benchmarks, people may actually compare your results to other vendor's results. Well, as we performance people always say, "it depends". If you believe in performance-per-core as the primary way of looking at the world, then yes, the POWER7+ is impressive, spending its chip resources to support up to 32 threads (8 cores x 4 threads). Or, it just might be useful to consider performance-per-chip. Each SPARC T5 chip allows 128 hardware threads to be simultaneously executing (16 cores x 8 threads). The Industry Standard Benchmark that focuses specifically on processor chip performance is SPEC CPU2006. For this very well known and popular benchmark, SPARC T5: provides better performance than both POWER7 and POWER7+, for 1 chip vs. 1 chip, for 8 chip vs. 8 chip, for integer (SPECint_rate2006) and floating point (SPECfp_rate2006), for Peak tuning and for Base tuning. For example, at the 8-chip level, integer throughput (SPECint_rate2006) is: 3750 for SPARC 2170 for POWER7+. You can find the details at the March 2013 BestPerf CPU2006 page SPEC is a trademark of the Standard Performance Evaluation Corporation, www.spec.org. The two specific results quoted for SPECjEnterprise2010 are posted at the URLs linked from the discussion. Results for SPEC CPU2006 were verified at spec.org 1 July 2013, and can be rechecked here.

    Read the article

  • A new SQL, a new Analysis Services, a new Workshop! #ssas #sql2012

    - by Marco Russo (SQLBI)
    One week ago Microsoft SQL Server 2012 finally debuted with a virtual launch event and you can find many intro sessions there (20 minutes each). There is a lot of new content available if you want to learn more about SQL 2012 and in this blog post I’d like to provide a few link to sessions, documents, bits and courses that are available now or very soon. First of all, the release of Analysis Services 2012 has finally released PowerPivot 2012 (many of us called it PowerPivot v2 before this official name) and also the new Data Mining Add-in for Microsoft Office 2010, now available also for Excel 64bit! And, of course, don’t miss the Microsoft SQL Server 2012 Feature Pack, there are a lot of upgrades for both DBAs and developers. I just discovered there is a new LocalDB version of SQL Express that can run in user mode without any setup. Is this the end of SQL CE? But now, back to Analysis Services: if you want some tutorial on Tabular, the Microsoft Virtual Academy has a whole track dedicated to Analysis Services 2012 but you will probably be interested also in the one about Reporting Services 2012. If you think that virtual is good but it’s not enough, there are plenty of conferences in the coming months – these are just those where I and Alberto will deliver some SSAS Tabular presentations: SQLBits X, London, March 29-31, 2012: if you are in London or want a good reason to go, this is the most important SQL Server event in Europe this year, no doubts about it. And not only because of the high number of attendees, but also because there is an impressive number of speakers (excluding me, of course) coming from all over the world. This is an event second only to PASS Summit in Seattle so there are no good reasons to not attend it. Microsoft SQL Server & Business Intelligence Conference 2012, Milan, March 28-29, 2012: this is an Italian conference so the language might be a barrier, but many of us also speak English and the food is good! Just a few seats still available. TechEd North America, Orlando, June 11-14, 2012: you know, this is a big event and it contains everything – if you want to spend a whole day learning the SSAS Tabular model with me and Alberto, don’t miss our pre-conference day “Using BISM Tabular in Microsoft SQL Server Analysis Services 2012” (be careful, it is on June 10, a nice study-Sunday!). TechEd Europe, Amsterdam, June 26-29, 2012: the European version of TechEd provides almost the same content and you don’t have to go overseas. We also run the same pre-conference day “Using BISM Tabular in Microsoft SQL Server Analysis Services 2012” (in this case, it is on June 25, that’s a regular Monday). I and Alberto will also speak at some user group meeting around Europe during… well, we’re going to travel a lot in the next months. In fact, if you want to get a complete training on SSAS Tabular, you should spend two days with us in one of our SSAS Tabular Workshop! We prepared a 2-day seminar, a very intense one, that start from the simple tabular modeling and cover architecture, DAX, query, advanced modeling, security, deployment, optimization, monitoring, relationships with PowerPivot and Multidimensional… Really, there are a lot of stuffs here! We announced the first dates in Europe and also an online edition optimized for America’s time zone: Apr 16-17, 2012 – Amsterdam, Netherlands Apr 26-27, 2012 – Copenhagen, Denmark May 7-8, 2012 – Online for America’s time zone May 14-15, 2012 – Brussels, Belgium May 21-22, 2012 – Oslo, Norway May 24-25, 2012 – Stockholm, Sweden May 28-29, 2012 – London, United Kingdom May 31-Jun 1, 2012 – Milan, Italy (Italian language) Also Chris Webb will join us in this workshop and in every date you can find who is the speaker on the web site. The course is based on our upcoming book, almost 600 pages (!) about SSAS Tabular, an incredible effort that will be available very soon in a preview (rough cuts from O’Reilly) and will be on the shelf in May. I will provide a link to order it as soon as we have one! And if you think that this is not enough… you’re right! Do you know what is the only thing you can do to optimize your Tabular model? Optimize your DAX code. Learning DAX is easy, mastering DAX requires some knowledge… and our DAX Advanced Workshop will provide exactly the required content. Public classes will be available later this year, by now we just deliver it on demand.

    Read the article

  • Stand-Up Desk 2012 Update

    - by BuckWoody
    One of the more popular topics here on my technical blog doesn't have to do with technology, per-se - it's about the choice I made to go to a stand-up desk work environment. If you're interested in the history of those, check here: Stand-Up Desk Part One Stand-Up Desk Part Two I have made some changes and I was asked to post those here.Yes, I'm still standing - I think the experiment has worked well, so I'm continuing to work this way. I've become so used to it that I notice when I sit for a long time. If I'm flying, or driving a long way, or have long meetings, I take breaks to stand up and move around. That being said, I don't stand as much as I did. I started out by standing the entire day - which did not end well. As you can read in my second post, I found that sitting down for a few minutes each hour worked out much better. And over time I would say that I now stand about 70-80% of the day, depending on the day. Some days I don't even notice I'm standing, so I don't sit as often. Other days I find that I really tire quickly - so I sit more often. But in both cases, I stand more than I sit. In the first post you can read about how I used a simple coffee-table from Ikea to elevate my desktop to the right height. I then adjusted the height where I stand by using a small plastic square and some carpet. Over time I found this did not work as well as I'd like. The primary reason is that the front of these are at the same depth - so my knees would hit the desk or table when I sat down. Also, the desk was at a certain height, and I had to adjust, rather than the other way around.  Also, I like a lot of surface area on top of a desk - almost more of a table. Routing cables and wiring was a pain, and of course moving it was out of the question.   So I've changed what I use. I found a perfect solution for what I was looking for - industrial wire shelving: I bought one, built only half of it (for the right height I wanted) and arranged the shelves the way I wanted. I then got a 5'x4' piece of wood from Lowes, and mounted it to where the top was balanced, but had an over-hang  I could get my knees under easily.My wife sewed a piece of fake-leather for the top. This arrangement provides the following benefits: Very strong Rolls easily, wheels can lock to prevent rolling Long, wide shelves Wire-frame allows me to route any kind of wiring and other things all over the desk I plugged in my UPS and ran it's longer power-cable to the wall outlet. I then ran the router's LAN connection along that wire, and covered both with a large insulation sleeve. I then plugged in everything to the UPS, and routed all the wiring. I can now roll the desk almost anywhere in the room so that I can record, look out the window, get closer to or farther away from the door and more. I put a few boxes on the shelves as "drawers" and tidied that part up. Even my printer fits on a shelf. Laser-dog not included - some assembly required In the second post you can read about the bar-stool I purchased from Target for the desk. I cheaped-out on this one, and it proved to be a bad choice. Because I had to raise it so high, and was constantly sitting on it and then standing up, the gas-cylinder in it just gave out. So it became a very short stool that I ended up getting rid of. In the end, this one from Ikea proved to be a better choice: And so this arrangement is working out perfectly. I'm finding myself VERY productive this way. I hope these posts help you if you decide to try working at a stand-up desk. Although I was skeptical at first, I've found it to be a very healthy, easy way to code, design and especially present over a web-cam. It's natural to stand to speak when you're presenting, and it feels more energetic than sitting down to talk to others.

    Read the article

  • Sort Data in Windows Phone using Collection View Source

    - by psheriff
    When you write a Windows Phone application you will most likely consume data from a web service somewhere. If that service returns data to you in a sort order that you do not want, you have an easy alternative to sort the data without writing any C# or VB code. You use the built-in CollectionViewSource object in XAML to perform the sorting for you. This assumes that you can get the data into a collection that implements the IEnumerable or IList interfaces.For this example, I will be using a simple Product class with two properties, and a list of Product objects using the Generic List class. Try this out by creating a Product class as shown in the following code:public class Product {  public Product(int id, string name)   {    ProductId = id;    ProductName = name;  }  public int ProductId { get; set; }  public string ProductName { get; set; }}Create a collection class that initializes a property called DataCollection with some sample data as shown in the code below:public class Products : List<Product>{  public Products()  {    InitCollection();  }  public List<Product> DataCollection { get; set; }  List<Product> InitCollection()  {    DataCollection = new List<Product>();    DataCollection.Add(new Product(3,        "PDSA .NET Productivity Framework"));    DataCollection.Add(new Product(1,        "Haystack Code Generator for .NET"));    DataCollection.Add(new Product(2,        "Fundamentals of .NET eBook"));    return DataCollection;  }}Notice that the data added to the collection is not in any particular order. Create a Windows Phone page and add two XML namespaces to the Page.xmlns:scm="clr-namespace:System.ComponentModel;assembly=System.Windows"xmlns:local="clr-namespace:WPSortData"The 'local' namespace is an alias to the name of the project that you created (in this case WPSortData). The 'scm' namespace references the System.Windows.dll and is needed for the SortDescription class that you will use for sorting the data. Create a phone:PhoneApplicationPage.Resources section in your Windows Phone page that looks like the following:<phone:PhoneApplicationPage.Resources>  <local:Products x:Key="products" />  <CollectionViewSource x:Key="prodCollection"      Source="{Binding Source={StaticResource products},                       Path=DataCollection}">    <CollectionViewSource.SortDescriptions>      <scm:SortDescription PropertyName="ProductName"                           Direction="Ascending" />    </CollectionViewSource.SortDescriptions>  </CollectionViewSource></phone:PhoneApplicationPage.Resources>The first line of code in the resources section creates an instance of your Products class. The constructor of the Products class calls the InitCollection method which creates three Product objects and adds them to the DataCollection property of the Products class. Once the Products object is instantiated you now add a CollectionViewSource object in XAML using the Products object as the source of the data to this collection. A CollectionViewSource has a SortDescriptions collection that allows you to specify a set of SortDescription objects. Each object can set a PropertyName and a Direction property. As you see in the above code you set the PropertyName equal to the ProductName property of the Product object and tell it to sort in an Ascending direction.All you have to do now is to create a ListBox control and set its ItemsSource property to the CollectionViewSource object. The ListBox displays the data in sorted order by ProductName and you did not have to write any LINQ queries or write other code to sort the data!<ListBox    ItemsSource="{Binding Source={StaticResource prodCollection}}"   DisplayMemberPath="ProductName" />SummaryIn this blog post you learned that you can sort any data without having to change the source code of where the data comes from. Simply feed the data into a CollectionViewSource in XAML and set some sort descriptions in XAML and the rest is done for you! This comes in very handy when you are consuming data from a source where the data is given to you and you do not have control over the sorting.NOTE: You can download this article and many samples like the one shown in this blog entry at my website. http://www.pdsa.com/downloads. Select “Tips and Tricks”, then “Sort Data in Windows Phone using Collection View Source” from the drop down list.Good Luck with your Coding,Paul Sheriff** SPECIAL OFFER FOR MY BLOG READERS **We frequently offer a FREE gift for readers of my blog. Visit http://www.pdsa.com/Event/Blog for your FREE gift!

    Read the article

  • SSIS Reporting Pack v0.4 – Execution Report updated

    - by jamiet
    SSIS Reporting Pack is a suite of reports that I maintain at http://ssisreportingpack.codeplex.com/ that provide visualisation over the SSIS Catalog in SQL Server 2012 and attempt to add value over the reports that ship in the box. Work on the reports has stalled (my last SSIS Reporting Pack blog post was on 4th September 2011) as I’ve had rather more important things going on my life of late however I have recently checked-in a fix that couldn’t really be delayed. I discovered a problem with the Execution report that was causing the report to effectively hang, it was caused by this bit of SQL hidden away in the report definition: [generated_executables] AS (   SELECT  [new_executable].[execution_path],[new_executable].[parent_execution_path]   FROM    (           SELECT  [execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_close_square] + 1)           ,       [parent_execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_open_square])           FROM    (                   SELECT  [execution_path]                   ,       [char_index_open_square] = CHARINDEX('[',REVERSE([execution_path]),1)                   ,       [char_index_close_square] = CHARINDEX(']',REVERSE([execution_path]),1)                   ,       [length_exec_path] = LEN([execution_path])                   FROM    [exec_stats] es                   WHERE   execution_path LIKE '%\[%]%'  ESCAPE '\'                   )AS [loop_iteration]           ) AS [new_executable]   GROUP   BY [new_executable].[execution_path],[new_executable].[parent_execution_path]) It was there because SSIS does not currently treat a loop iteration as an executable yet I figured there was still value in being able to view it as such – this SQL essentially “invents” new executables for those loop iterations; its what enabled the following visualisation: where each of the three iterations of a For Each Loop called “FEL Loop over top performing regions” appear in the report. Unfortunately, as I alluded, this could under certain circumstances (most likely when there were many loop iterations) cause the report to hang as it waited for the results to be constructed and returned. The change that I have made eradicates this generation of “fake” executables and thus produces this visualisation instead: Notice that the three “children” of the For Each Loop are no longer the three iterations but actually the task (“EPT Call Data Export Package”) contained within that For Each Loop. The problem here is of course that there is no longer a visual distinction between those three iterations; I have instead made the full execution path viewable via a tooltip:   If you preferred the “old” way of presenting this information and are happy to put up with the performance degradation then I have kept the old version of the report hanging around in the reporting pack as “execution loop with iterations” however none of the other reports link to it so you will have to browse to it manually if you want to use it. Please let me know if you ARE using it – I would be very interested to hear about your experiences.   The last change to make you aware of in the execution report is that by default I no longer show OnPreValidate or OnPostValidate messages as I consider them to be superfluous and only serve to clutter up the results. If you want to put them back, well, its open source so go right ahead!   The latest release of SSIS Reporting Pack that contains all of these changes is v0.4 and can be downloaded from http://ssisreportingpack.codeplex.com/releases/view/88178   Feedback on all of the above changes would be very much appreciated. @Jamiet

    Read the article

  • Building a plug-in for Windows Live Writer

    - by mbcrump
    This tutorial will show you how to build a plug-in for Windows Live Writer. Windows Live Writer is a blogging tool that Microsoft provides for free. It includes an open API for .NET developers to create custom plug-ins. In this tutorial, I will show you how easy it is to build one. Open VS2008 or VS2010 and create a new project. Set the target framework to 2.0, Application Type to Class Library and give it a name. In this tutorial, we are going to create a plug-in that generates a twitter message with your blog post name and a TinyUrl link to the blog post.  It will do all of this automatically after you publish your post. Once, we have a new projected created. We need to setup the references. Add a reference to the WindowsLive.Writer.Api.dll located in the C:\Program Files (x86)\Windows Live\Writer\ folder, if you are using X64 version of Windows. You will also need to add a reference to System.Windows.Forms System.Web from the .NET tab as well. Once that is complete, add your “using” statements so that it looks like whats shown below: Live Writer Plug-In "Using" using System; using System.Collections.Generic; using System.Text; using WindowsLive.Writer.Api; using System.Web; Now, we are going to setup some build events to make it easier to test our custom class. Go into the Properties of your project and select Build Events, click edit the Post-build and copy/paste the following line: XCOPY /D /Y /R "$(TargetPath)" "C:\Program Files (x86)\Windows Live\Writer\Plugins\" Your screen should look like the one pictured below: Next, we are going to launch an external program on debug. Click the debug tab and enter C:\Program Files (x86)\Windows Live\Writer\WindowsLiveWriter.exe Your screen should look like the one pictured below:   Now we have a blank project and we need to add some code. We start with adding the attributes for the Live Writer Plugin. Before we get started creating the Attributes, we need to create a GUID. This GUID will uniquely identity our plug-in. So, to create a GUID follow the steps in VS2008/2010. Click Tools from the VS Menu ->Create GUID It will generate a GUID like the one listed below: GUID <Guid("56ED8A2C-F216-420D-91A1-F7541495DBDA")> We only want what’s inside the quotes, so your final product should be: "56ED8A2C-F216-420D-91A1-F7541495DBDA". Go ahead and paste this snipped into your class just above the public class. Live Writer Plug-In Attributes [WriterPlugin("56ED8A2C-F216-420D-91A1-F7541495DBDA",    "Generate Twitter Message",    Description = "After your new post has been published, this plug-in will attempt to generate a Twitter status messsage with the Title and TinyUrl link.",    HasEditableOptions = false,    Name = "Generate Twitter Message",    PublisherUrl = "http://michaelcrump.net")] [InsertableContentSource("Generate Twitter Message")] So far, it should look like the following: Next, we need to implement the PublishNotifcationHook class and override the OnPostPublish. I’m not going to dive into what the code is doing as you should be able to follow pretty easily. The code below is the entire code used in the project. PublishNotificationHook public class Class1 :  PublishNotificationHook  {      public override void OnPostPublish(System.Windows.Forms.IWin32Window dialogOwner, IProperties properties, IPublishingContext publishingContext, bool publish)      {          if (!publish) return;          if (string.IsNullOrEmpty(publishingContext.PostInfo.Permalink))          {              PluginDiagnostics.LogError("Live Tweet didn't execute, due to blank permalink");          }          else          {                var strBlogName = HttpUtility.UrlEncode("#blogged : " + publishingContext.PostInfo.Title);  //Blog Post Title              var strUrlFinal = getTinyUrl(publishingContext.PostInfo.Permalink); //Blog Permalink URL Converted to TinyURL              System.Diagnostics.Process.Start("http://twitter.com/home?status=" + strBlogName + strUrlFinal);            }      } We are going to go ahead and create a method to create the short url (tinyurl). TinyURL Helper Method private static string getTinyUrl(string url) {     var cmpUrl = System.Globalization.CultureInfo.InvariantCulture.CompareInfo;     if (!cmpUrl.IsPrefix(url, "http://tinyurl.com"))     {         var address = "http://tinyurl.com/api-create.php?url=" + url;         var client = new System.Net.WebClient();         return (client.DownloadString(address));     }     return (url); } Go ahead and build your project, it should have copied the .DLL into the Windows Live Writer Plugin Directory. If it did not, then you will want to check your configuration. Once that is complete, open Windows Live Writer and select Tools-> Options-> Plug-ins and enable your plug-in that you just created. Your screen should look like the one pictured below: Go ahead and click OK and publish your blog post. You should get a pop-up with the following: Hit OK and It should open a Twitter and either ask for a login or fill in your status as shown below:   That should do it, you can do so many other things with the API. I suggest that if you want to build something really useful consult the MSDN pages. This plug-in that I created was perfect for what I needed and I hope someone finds it useful.

    Read the article

  • SSIS Reporting Pack v0.4 – Execution Report updated

    - by jamiet
    SSIS Reporting Pack is a suite of reports that I maintain at http://ssisreportingpack.codeplex.com/ that provide visualisation over the SSIS Catalog in SQL Server 2012 and attempt to add value over the reports that ship in the box. Work on the reports has stalled (my last SSIS Reporting Pack blog post was on 4th September 2011) as I’ve had rather more important things going on my life of late however I have recently checked-in a fix that couldn’t really be delayed. I discovered a problem with the Execution report that was causing the report to effectively hang, it was caused by this bit of SQL hidden away in the report definition: [generated_executables] AS (   SELECT  [new_executable].[execution_path],[new_executable].[parent_execution_path]   FROM    (           SELECT  [execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_close_square] + 1)           ,       [parent_execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_open_square])           FROM    (                   SELECT  [execution_path]                   ,       [char_index_open_square] = CHARINDEX('[',REVERSE([execution_path]),1)                   ,       [char_index_close_square] = CHARINDEX(']',REVERSE([execution_path]),1)                   ,       [length_exec_path] = LEN([execution_path])                   FROM    [exec_stats] es                   WHERE   execution_path LIKE '%\[%]%'  ESCAPE '\'                   )AS [loop_iteration]           ) AS [new_executable]   GROUP   BY [new_executable].[execution_path],[new_executable].[parent_execution_path]) It was there because SSIS does not currently treat a loop iteration as an executable yet I figured there was still value in being able to view it as such – this SQL essentially “invents” new executables for those loop iterations; its what enabled the following visualisation: where each of the three iterations of a For Each Loop called “FEL Loop over top performing regions” appear in the report. Unfortunately, as I alluded, this could under certain circumstances (most likely when there were many loop iterations) cause the report to hang as it waited for the results to be constructed and returned. The change that I have made eradicates this generation of “fake” executables and thus produces this visualisation instead: Notice that the three “children” of the For Each Loop are no longer the three iterations but actually the task (“EPT Call Data Export Package”) contained within that For Each Loop. The problem here is of course that there is no longer a visual distinction between those three iterations; I have instead made the full execution path viewable via a tooltip:   If you preferred the “old” way of presenting this information and are happy to put up with the performance degradation then I have kept the old version of the report hanging around in the reporting pack as “execution loop with iterations” however none of the other reports link to it so you will have to browse to it manually if you want to use it. Please let me know if you ARE using it – I would be very interested to hear about your experiences.   The last change to make you aware of in the execution report is that by default I no longer show OnPreValidate or OnPostValidate messages as I consider them to be superfluous and only serve to clutter up the results. If you want to put them back, well, its open source so go right ahead!   The latest release of SSIS Reporting Pack that contains all of these changes is v0.4 and can be downloaded from http://ssisreportingpack.codeplex.com/releases/view/88178   Feedback on all of the above changes would be very much appreciated. @Jamiet

    Read the article

  • Using Subjects to Deploy Queries Dynamically

    - by Roman Schindlauer
    In the previous blog posting, we showed how to construct and deploy query fragments to a StreamInsight server, and how to re-use them later. In today’s posting we’ll integrate this pattern into a method of dynamically composing a new query with an existing one. The construct that enables this scenario in StreamInsight V2.1 is a Subject. A Subject lets me create a junction element in an existing query that I can tap into while the query is running. To set this up as an end-to-end example, let’s first define a stream simulator as our data source: var generator = myApp.DefineObservable(     (TimeSpan t) => Observable.Interval(t).Select(_ => new SourcePayload())); This ‘generator’ produces a new instance of SourcePayload with a period of t (system time) as an IObservable. SourcePayload happens to have a property of type double as its payload data. Let’s also define a sink for our example—an IObserver of double values that writes to the console: var console = myApp.DefineObserver(     (string label) => Observer.Create<double>(e => Console.WriteLine("{0}: {1}", label, e)))     .Deploy("ConsoleSink"); The observer takes a string as parameter which is used as a label on the console, so that we can distinguish the output of different sink instances. Note that we also deploy this observer, so that we can retrieve it later from the server from a different process. Remember how we defined the aggregation as an IQStreamable function in the previous article? We will use that as well: var avg = myApp     .DefineStreamable((IQStreamable<SourcePayload> s, TimeSpan w) =>         from win in s.TumblingWindow(w)         select win.Avg(e => e.Value))     .Deploy("AverageQuery"); Then we define the Subject, which acts as an observable sequence as well as an observer. Thus, we can feed a single source into the Subject and have multiple consumers—that can come and go at runtime—on the other side: var subject = myApp.CreateSubject("Subject", () => new Subject<SourcePayload>()); Subject are always deployed automatically. Their name is used to retrieve them from a (potentially) different process (see below). Note that the Subject as we defined it here doesn’t know anything about temporal streams. It is merely a sequence of SourcePayloads, without any notion of StreamInsight point events or CTIs. So in order to compose a temporal query on top of the Subject, we need to 'promote' the sequence of SourcePayloads into an IQStreamable of point events, including CTIs: var stream = subject.ToPointStreamable(     e => PointEvent.CreateInsert<SourcePayload>(e.Timestamp, e),     AdvanceTimeSettings.StrictlyIncreasingStartTime); In a later posting we will show how to use Subjects that have more awareness of time and can be used as a junction between QStreamables instead of IQbservables. Having turned the Subject into a temporal stream, we can now define the aggregate on this stream. We will use the IQStreamable entity avg that we defined above: var longAverages = avg(stream, TimeSpan.FromSeconds(5)); In order to run the query, we need to bind it to a sink, and bind the subject to the source: var standardQuery = longAverages     .Bind(console("5sec average"))     .With(generator(TimeSpan.FromMilliseconds(300)).Bind(subject)); Lastly, we start the process: standardQuery.Run("StandardProcess"); Now we have a simple query running end-to-end, producing results. What follows next is the crucial part of tapping into the Subject and adding another query that runs in parallel, using the same query definition (the “AverageQuery”) but with a different window length. We are assuming that we connected to the same StreamInsight server from a different process or even client, and thus have to retrieve the previously deployed entities through their names: // simulate the addition of a 'fast' query from a separate server connection, // by retrieving the aggregation query fragment // (instead of simply using the 'avg' object) var averageQuery = myApp     .GetStreamable<IQStreamable<SourcePayload>, TimeSpan, double>("AverageQuery"); // retrieve the input sequence as a subject var inputSequence = myApp     .GetSubject<SourcePayload, SourcePayload>("Subject"); // retrieve the registered sink var sink = myApp.GetObserver<string, double>("ConsoleSink"); // turn the sequence into a temporal stream var stream2 = inputSequence.ToPointStreamable(     e => PointEvent.CreateInsert<SourcePayload>(e.Timestamp, e),     AdvanceTimeSettings.StrictlyIncreasingStartTime); // apply the query, now with a different window length var shortAverages = averageQuery(stream2, TimeSpan.FromSeconds(1)); // bind new sink to query and run it var fastQuery = shortAverages     .Bind(sink("1sec average"))     .Run("FastProcess"); The attached solution demonstrates the sample end-to-end. Regards, The StreamInsight Team

    Read the article

  • wrong return value with jquery ajax and codeigniter

    - by matthew
    I do not know if this is specifically a jquery problem, actually I think it has to mostly do with my logic in the php code. What Im trying to do is make a voting system that when the user clicks on the vote up or vote down link in the web page, it triggers an ajax call to a php function that first updates the database with with the required value, on success of the database updating the another function is called just to get the required updated html for the that particular post that the user has voted on. (hope I havnt lost you). The problem I think deals with specifically with this one line of code. When I make the call it only returns the value of $row-beer_down and completly ignores everything else in that line. Funny thing is the same php code to display the html view works perfectly before the ajax function updates it. echo "<p>Score " . $row->beer_up + $row->beer_down . "</p>"; so here is the code to hope you can help as I have absolutely no idea how to fix this. here is the view file where it generates the page. This part is the query ajax function. <script type="text/javascript"> $(function() { $(".vote").click(function(){ var id = $(this).attr("id"); var name = $(this).attr("name"); var dataString = 'id='+ id ; var parent = $(this); if(name=='up') { $.ajax({ type: "POST", url: "http://127.0.0.1/CodeIgniter/blog/add_vote/" + id, data: dataString, cache: false, success: function(html) { //parent.html(html); $("." + id).load('http://127.0.0.1/CodeIgniter/blog/get_post/' + id).fadeIn("slow"); } }); } else { $.ajax({ type: "POST", url: "http://127.0.0.1/CodeIgniter/blog/minus_vote/" + id, data: dataString, cache: false, success: function(html) { //parent.html(html); $("." + id).load('http://127.0.0.1/CodeIgniter/blog/get_post/' + id).fadeIn("slow"); } }); } return false; }); }); </script> Here is the html and php part of the page to display the post. div id="post_container"> <?php //echo $this->table->generate($records); ?> <?php foreach($records->result() as $row) { ?> <?php echo "<div id=\"post\" class=\"" . $row->id . "\">"; ?> <h2><?php echo $row->id ?></h2> <?php echo "<img src=\"" . base_url() . $dir . $row->account_id . "/" . $row->file_name . "\">" ?> <p><?php echo $row->content ?></p> <p><?php echo $row->user_name ?> On <?php echo $row->time ?></p> <p>Score <?php echo $row->beer_up + $row->beer_down ?></p> <?php echo anchor('blog/add_vote/' . $row->id, 'Up Vote', array('class' => 'vote', 'id' => $row->id, 'name' => 'up')); echo anchor('blog/minus_vote/' . $row->id, 'Down Vote', array('class' => 'vote', 'id' => $row->id, 'name' => 'down')); echo anchor('blog/comments/' . $row->id, 'View Comments'); ?> </div> <?php } ?> here is the function the ajax calls when it is successfull: function get_post() { $this->db->where('id', $this->uri->segment(3)); $records = $this->db->get('post'); $dir = "/uploads/user_uploads/"; foreach($records->result() as $row) { echo "<div id=\"post\" class=\"" . $row->id . "\">"; echo "<h2>" . $row->id . "</h2>"; echo "<img src=\"" . base_url() . $dir . $row->account_id . "/" . $row->file_name . "\">"; echo "<p>" . $row->content . "</p>"; echo "<p>" . $row->user_name . " On " . $row->time . "</p>"; echo "<p>Score " . $row->beer_up + $row->beer_down . "</p>"; echo "<p>Up score" . $row->beer_up . "beer down" . $row->beer_down . "</p>"; echo anchor('blog/comments/' . $row->id, 'View Comments'); echo "</div>"; }

    Read the article

  • BNF – how to read syntax?

    - by Piotr Rodak
    A few days ago I read post of Jen McCown (blog) about her idea of blogging about random articles from Books Online. I think this is a great idea, even if Jen says that it’s not exciting or sexy. I noticed that many of the questions that appear on forums and other media arise from pure fact that people asking questions didn’t bother to read and understand the manual – Books Online. Jen came up with a brilliant, concise acronym that describes very well the category of posts about Books Online – RTFM365. I take liberty of tagging this post with the same acronym. I often come across questions of type – ‘Hey, i am trying to create a table, but I am getting an error’. The error often says that the syntax is invalid. 1 CREATE TABLE dbo.Employees 2 (guid uniqueidentifier CONSTRAINT DEFAULT Guid_Default NEWSEQUENTIALID() ROWGUIDCOL, 3 Employee_Name varchar(60) 4 CONSTRAINT Guid_PK PRIMARY KEY (guid) ); 5 The answer is usually(1), ‘Ok, let me check it out.. Ah yes – you have to put name of the DEFAULT constraint before the type of constraint: 1 CREATE TABLE dbo.Employees 2 (guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL, 3 Employee_Name varchar(60) 4 CONSTRAINT Guid_PK PRIMARY KEY (guid) ); Why many people stumble on syntax errors? Is the syntax poorly documented? No, the issue is, that correct syntax of the CREATE TABLE statement is documented very well in Books Online and is.. intimidating. Many people can be taken aback by the rather complex block of code that describes all intricacies of the statement. However, I don’t know better way of defining syntax of the statement or command. The notation that is used to describe syntax in Books Online is a form of Backus-Naur notatiion, called BNF for short sometimes. This is a notation that was invented around 50 years ago, and some say that even earlier, around 400 BC – would you believe? Originally it was used to define syntax of, rather ancient now, ALGOL programming language (in 1950’s, not in ancient India). If you look closer at the definition of the BNF, it turns out that the principles of this syntax are pretty simple. Here are a few bullet points: italic_text is a placeholder for your identifier <italic_text_in_angle_brackets> is a definition which is described further. [everything in square brackets] is optional {everything in curly brackets} is obligatory everything | separated | by | operator is an alternative ::= “assigns” definition to an identifier Yes, it looks like these six simple points give you the key to understand even the most complicated syntax definitions in Books Online. Books Online contain an article about syntax conventions – have you ever read it? Let’s have a look at fragment of the CREATE TABLE statement: 1 CREATE TABLE 2 [ database_name . [ schema_name ] . | schema_name . ] table_name 3 ( { <column_definition> | <computed_column_definition> 4 | <column_set_definition> } 5 [ <table_constraint> ] [ ,...n ] ) 6 [ ON { partition_scheme_name ( partition_column_name ) | filegroup 7 | "default" } ] 8 [ { TEXTIMAGE_ON { filegroup | "default" } ] 9 [ FILESTREAM_ON { partition_scheme_name | filegroup 10 | "default" } ] 11 [ WITH ( <table_option> [ ,...n ] ) ] 12 [ ; ] Let’s look at line 2 of the above snippet: This line uses rules 3 and 5 from the list. So you know that you can create table which has specified one of the following. just name – table will be created in default user schema schema name and table name – table will be created in specified schema database name, schema name and table name – table will be created in specified database, in specified schema database name, .., table name – table will be created in specified database, in default schema of the user. Note that this single line of the notation describes each of the naming schemes in deterministic way. The ‘optionality’ of the schema_name element is nested within database_name.. section. You can use either database_name and optional schema name, or just schema name – this is specified by the pipe character ‘|’. The error that user gets with execution of the first script fragment in this post is as follows: Msg 156, Level 15, State 1, Line 2 Incorrect syntax near the keyword 'DEFAULT'. Ok, let’s have a look how to find out the correct syntax. Line number 3 of the BNF fragment above contains reference to <column_definition>. Since column_definition is in angle brackets, we know that this is a reference to notion described further in the code. And indeed, the very next fragment of BNF contains syntax of the column definition. 1 <column_definition> ::= 2 column_name <data_type> 3 [ FILESTREAM ] 4 [ COLLATE collation_name ] 5 [ NULL | NOT NULL ] 6 [ 7 [ CONSTRAINT constraint_name ] DEFAULT constant_expression ] 8 | [ IDENTITY [ ( seed ,increment ) ] [ NOT FOR REPLICATION ] 9 ] 10 [ ROWGUIDCOL ] [ <column_constraint> [ ...n ] ] 11 [ SPARSE ] Look at line 7 in the above fragment. It says, that the column can have a DEFAULT constraint which, if you want to name it, has to be prepended with [CONSTRAINT constraint_name] sequence. The name of the constraint is optional, but I strongly recommend you to make the effort of coming up with some meaningful name yourself. So the correct syntax of the CREATE TABLE statement from the beginning of the article is like this: 1 CREATE TABLE dbo.Employees 2 (guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL, 3 Employee_Name varchar(60) 4 CONSTRAINT Guid_PK PRIMARY KEY (guid) ); That is practically everything you should know about BNF. I encourage you to study the syntax definitions for various statements and commands in Books Online, you can find really interesting things hidden there. Technorati Tags: SQL Server,t-sql,BNF,syntax   (1) No, my answer usually is a question – ‘What error message? What does it say?’. You’d be surprised to know how many people think I can go through time and space and look at their screen at the moment they received the error.

    Read the article

  • SQL Server and Hyper-V Dynamic Memory Part 3

    - by SQLOS Team
    In parts 1 and 2 of this series we looked at the basics of Hyper-V Dynamic Memory and SQL Server memory management. In this part Serdar looks at configuration guidelines for SQL Server memory management. Part 3: Configuration Guidelines for Hyper-V Dynamic Memory and SQL Server Now that we understand SQL Server Memory Management and Hyper-V Dynamic Memory basics, let’s take a look at general configuration guidelines in order to utilize benefits of Hyper-V Dynamic Memory in your SQL Server VMs. Requirements Host Operating System Requirements Hyper-V Dynamic Memory feature is introduced with Windows Server 2008 R2 SP1. Therefore in order to use Dynamic Memory for your virtual machines, you need to have Windows Server 2008 R2 SP1 or Microsoft Hyper-V Server 2008 R2 SP1 in your Hyper-V host. Guest Operating System Requirements In addition to this Dynamic Memory is only supported in Standard, Web, Enterprise and Datacenter editions of windows running inside VMs. Make sure that your VM is running one of these editions. For additional requirements on each operating system see “Dynamic Memory Configuration Guidelines” here. SQL Server Requirements All versions of SQL Server support Hyper-V Dynamic Memory. However, only certain editions of SQL Server are aware of dynamically changing system memory. To have a truly dynamic environment for your SQL Server VMs make sure that you are running one of the SQL Server editions listed below: ·         SQL Server 2005 Enterprise ·         SQL Server 2008 Enterprise / Datacenter Editions ·         SQL Server 2008 R2 Enterprise / Datacenter Editions Configuration guidelines for other versions of SQL Server are covered below in the FAQ section. Guidelines for configuring Dynamic Memory Parameters Here is how to configure Dynamic Memory for your SQL VMs in a nutshell: Hyper-V Dynamic Memory Parameter Recommendation Startup RAM 1 GB + SQL Min Server Memory Maximum RAM > SQL Max Server Memory Memory Buffer % 5 Memory Weight Based on performance needs   Startup RAM In order to ensure that your SQL Server VMs can start correctly, ensure that Startup RAM is higher than configured SQL Min Server Memory for your VMs. Otherwise SQL Server service will need to do paging in order to start since it will not be able to see enough memory during startup. Also note that Startup Memory will always be reserved for your VMs. This will guarantee a certain level of performance for your SQL Servers, however setting this too high will limit the consolidation benefits you’ll get out of your virtualization environment. Maximum RAM This one is obvious. If you’ve configured SQL Max Server Memory for your SQL Server, make sure that Dynamic Memory Maximum RAM configuration is higher than this value. Otherwise your SQL Server will not grow to memory values higher than the value configured for Dynamic Memory. Memory Buffer % Memory buffer configuration is used to provision file cache to virtual machines in order to improve performance. Due to the fact that SQL Server is managing its own buffer pool, Memory Buffer setting should be configured to the lowest value possible, 5%. Configuring a higher memory buffer will prevent low resource notifications from Windows Memory Manager and it will prevent reclaiming memory from SQL Server VMs. Memory Weight Memory weight configuration defines the importance of memory to a VM. Configure higher values for the VMs that have higher performance requirements. VMs with higher memory weight will have more memory under high memory pressure conditions on your host. Questions and Answers Q1 – Which SQL Server memory model is best for Dynamic Memory? The best SQL Server model for Dynamic Memory is “Locked Page Memory Model”. This memory model ensures that SQL Server memory is never paged out and it’s also adaptive to dynamically changing memory in the system. This will be extremely useful when Dynamic Memory is attempting to remove memory from SQL Server VMs ensuring no SQL Server memory is paged out. You can find instructions on configuring “Locked Page Memory Model” for your SQL Servers here. Q2 – What about other SQL Server Editions, how should I configure Dynamic Memory for them? Other editions of SQL Server do not adapt to dynamically changing environments. They will determine how much memory they should allocate during startup and don’t change this value afterwards. Therefore make sure that you configure a higher startup memory for your VM because that will be all the memory that SQL Server utilize Tune Maximum Memory and Memory Buffer based on the other workloads running on the system. If there are no other workloads consider using Static Memory for these editions. Q3 – What if I have multiple SQL Server instances in a VM? Having multiple SQL Server instances in a VM is not a general recommendation for predictable performance, manageability and isolation. In order to achieve a predictable behavior make sure that you configure SQL Min Server Memory and SQL Max Server Memory for each instance in the VM. And make sure that: ·         Dynamic Memory Startup Memory is greater than the sum of SQL Min Server Memory values for the instances in the VM ·         Dynamic Memory Maximum Memory is greater than the sum of SQL Max Server Memory values for the instances in the VM Q4 – I’m using Large Page Memory Model for my SQL Server. Can I still use Dynamic Memory? The short answer is no. SQL Server does not dynamically change its memory size when configured with Large Page Memory Model. In virtualized environments Hyper-V provides large page support by default. Most of the time, Large Page Memory Model doesn’t bring any benefits to a SQL Server if it’s running in virtualized environments. Q5 – How do I monitor SQL performance when I’m trying Dynamic Memory on my VMs? Use the performance counters below to monitor memory performance for SQL Server: Process - Working Set: This counter is available in the VM via process performance counters. It represents the actual amount of physical memory being used by SQL Server process in the VM. SQL Server – Buffer Cache Hit Ratio: This counter is available in the VM via SQL Server counters. This represents the paging being done by SQL Server. A rate of 90% or higher is desirable. Conclusion These blog posts are a quick start to a story that will be developing more in the near future. We’re still continuing our testing and investigations to provide more detailed configuration guidelines with example performance numbers with a white paper in the upcoming months. Now it’s time to give SQL Server and Hyper-V Dynamic Memory a try. Use this guidelines to kick-start your environment. See what you think about it and let us know of your experiences. - Serdar Sutay Originally posted at http://blogs.msdn.com/b/sqlosteam/

    Read the article

  • Print SSRS Report / PDF automatically from SQL Server agent or Windows Service

    - by Jeremy Ramos
    Originally posted on: http://geekswithblogs.net/JeremyRamos/archive/2013/10/22/print-ssrs-report--pdf-from-sql-server-agent-or.aspxI have turned the Web upside-down to find a solution to this considering the least components and least maintenance as possible to achieve automated printing of an SSRS report. This is for the reason that we do not have a full software development team to maintain an app and we have to minimize the support overhead for the support team.Here is my setup:SQL Server 2008 R2 in Windows Server 2008 R2PDF format reports generated by SSRS Reports subscriptions to a Windows File ShareNetwork printerColoured reports with logo and brandingI have found and tested the following solutions to no avail:ProsConsCalling Adobe Acrobat Reader exe: "C:\Program Files (x86)\Adobe\Reader 11.0\Reader\acroRd32.exe" /n /s /o /h /t "C:\temp\print.pdf" \\printserver\printername"Very simple optionAdobe Acrobat reader requires to launch the GUI to send a job to a printer. Hence, this option cannot be used when printing from a service.Calling Adobe Acrobat Reader exe as a process from a .NET console appA bit harder than above, but still a simple solutionSame as cons abovePowershell script(Start-Process -FilePath "C:\temp\print.pdf" -Verb Print)Very simple optionUses default PDF client in quiet mode to Print, but also requires an active session.    Foxit ReaderVery simple optionRequires GUI same as Adobe Acrobat Reader Using the Reporting Services Web service to run and stream the report to an image object and then passed to the printerQuite complexThis is what we're trying to avoid  After pulling my hair out for two days, testing and evaluating the above solutions, I ended up learning more about printers (more than ever in my entire life) and how printer drivers work with PostScripts. I then bumped on to a PostScript interpreter called GhostScript (http://www.ghostscript.com/) and then the solution starts to get clearer and clearer.I managed to achieve a solution (maybe not be the simplest but efficient enough to achieve the least-maintenance-least-components goal) in 3-simple steps:Install GhostScript (http://www.ghostscript.com/download/) - this is an open-source PostScript and PDF interpreter. Printing directly using GhostScript only produces grayscale prints using the laserjet generic driver unless you save as BMP image and then interpret the colours using the imageInstall GSView (http://pages.cs.wisc.edu/~ghost/gsview/)- this is a GhostScript add-on to make it easier to directly print to a Windows printer. GSPrint automates the above  PDF -> BMP -> Printer Driver.Run the GSPrint command from SQL Server agent or Windows Service:"C:\Program Files\Ghostgum\gsview\gsprint.exe" -color -landscape -all -printer "printername" "C:\temp\print.pdf"Command line options are here: http://pages.cs.wisc.edu/~ghost/gsview/gsprint.htmAnother lesson learned is, since you are calling the script from the Service Account, it will not necessarily have the Printer mapped in its Windows profile (if it even has one). The workaround to this is by adding a local printer as you normally would and then map this printer to the network printer. Note that you may need to install the Printer Driver locally in the server.So, that's it! There are many ways to achieve a solution. The key thing is how you provide the smartest solution!

    Read the article

  • TechEd 2010 Day One – How I Travel

    - by BuckWoody
    Normally when I blog on the first day of a conference, well, there hasn’t been a first day yet. So I talk about the value of a conference or some other facet. And normally in my (non-conference) blogs, I show you how I have learned to be a data professional – things I’ve learned how to do over the years. But in all that time, I don’t think I’ve ever talked about a big part of my job – traveling. I’ve traveled a lot throughout the years, when I’ve taught, gone to conferences, consulted and in my current role assisting Microsoft customers with large-scale database system designs.  So I’ll share a few thoughts about what I do. Keep in mind that I travel for short durations, just a day or so, and sometimes I travel internationally. For those I prepare differently – what I’m talking about here is what I do for a multi-day, same-country trip. Hopefully you find it useful. I’ll tag a few other travelers I know to add their thoughts.  Preparing for Travel   When I’m notified of a trip, I begin researching the location. I find the flights, hotel and (if I have to) a car to use while I’m away. We have an in-house system we use to book the travel, but when I travel not-for-Microsoft I use Expedia and Kayak to find what I need.  Traveling on Sunday and Friday is the worst. I have to do it sometimes (like this week) and it’s always a bad idea. But you can blunt the impact by booking as early as you can stand it. That means I have to be up super-early, but the flights are normally on time. I stay flexible, and always have a backup plan in case the flights are delayed or canceled.  For the hotel, I tend to go on the cheaper side, and I look for older hotels that have been renovated, or quirky ones. For instance, in Boise, ID recently I stayed at a 60’s-themed (think Mad-Men) hotel that was very cool. Always I go on the less expensive side – I find the “luxury” hotels nail me for Internet, food, everything. The cheaper places include all kinds of things, and even have breakfasts, shuttles and all kinds of things that start to add up. I even call ahead to make sure there’s an iron and ironing board available, since I’ll need those when I get there.  I find any way I can not to get a car. I use mass-transit wherever possible, and try to make friends and pay their gas to take me places. In a pinch, I’ll use a taxi. It ends up being cheaper, faster, and less stressful all around.  Packing  Over the years I’ve learned never to check luggage whenever I can. To do that, I lay out everything I want to take with me on the bed, and then try and make sure I’m really going to use it. I wear a dark wool set of pants, which I can clean and wear in hot and cold climates. I bring undies and socks of course, and for most places I have to wear “dress up” shirts. I bring at least two print T-Shirts in case I want to dress down for something while I’m gone, but I only bring one set of shoes. All the  clothes are rolled as tightly as possible as I learned in the military. Then I use those to cushion the electronics I take.  For toiletries I bring a shaver, toothpaste and toothbrush, D/O and a small brush. Everything else the hotel will provide.  For entertainment, I take a small Zune, a full PC-Headset (so I can make IP calls on the road) and my laptop. I don’t take books or anything else – everything is electronic. I use E-books (downloaded from our Library), Audio-Books (on the Zune) and I also bring along a Kaossilator (more here) to play music in the hotel room or even on the plane without being heard.  If I can, I pack into one roll-on bag. There’s not a lot better than this one, but I also have a Bag I was given as a prize for something or other here at Microsoft. Either way, I like something with less pockets and more big, open compartments. Everything gets rolled up and packed in, with all of the wires and charges in small bags my wife made for me. The laptop (and anything I don’t want gate-checked) goes on top or in an outside pouch so I can grab it quickly if I have to gate-check the bag. As much as I can, I try to go in one bag. When I can’t (like this week) I use this bag since it can expand, roll up, crush and even be put away later. It’s super-heavy canvas and worth the price. This allows me to not check a bag.  Journey Logistics The day of the trip, I have everything ready since I’m getting up early. I pack a few small snacks inside a plastic large-mouth water bottle, which protects the snacks and lets me get water in the terminal. I bring along those little powdered drink mixes to add to the water.  At the airport, I make a beeline for the power-outlets. I charge up my laptop and phone, and download all my e-mails so I can work on them off-line in the air. I don’t travel as often as I used to – just every month or so now, so I don’t have a membership to an airline club. If I travel much more, I’ll invest in one again – they are WELL worth the money, for the wifi, food and quiet if for nothing else.  I print out my logistics on paper and put that in my pocket – flight numbers, hotel addresses and phones for everything. That way if I have to make a change, I don’t have to boot up anything or even have power to be able to roll with the punches if things change.  Working While Away  While I’m away I realize I’m going to be swamped with things at the conference or with my clients. So I turn on Out-Of-Office notifications to let people know I won’t be as responsive, and I keep my Outlook calendar up to date so my co-workers know what I’m up to. I even update it with hotel and phone info in case they really need to reach me. I share my calendar with my wife so my family knows what I’m doing as well.  I check my e-mail during breaks, but I only respond to them in the evening or early morning at the hotel. I tweet during conferences. The point is to be as present as possible during the event or when I’m at the clients. Both deserve it.  So those are my initial thoughts. I’ll tag Brent Ozar, Brad McGeHee and Paul Randal, and they can tag whomever they wish. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Replication Services as ETL extraction tool

    - by jorg
    In my last blog post I explained the principles of Replication Services and the possibilities it offers in a BI environment. One of the possibilities I described was the use of snapshot replication as an ETL extraction tool: “Snapshot Replication can also be useful in BI environments, if you don’t need a near real-time copy of the database, you can choose to use this form of replication. Next to an alternative for Transactional Replication it can be used to stage data so it can be transformed and moved into the data warehousing environment afterwards. In many solutions I have seen developers create multiple SSIS packages that simply copies data from one or more source systems to a staging database that figures as source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically so I think it offers enough features here. I don’t know how the performance will be and if it really works as good for this purpose as I expect, but I want to try this out soon!” Well I have tried it out and I must say it worked well. I was able to let replication services do work in a fraction of the time it would cost me to do the same in SSIS. What I did was the following: Configure snapshot replication for some Adventure Works tables, this was quite simple and straightforward. Create an SSIS package that executes the snapshot replication on demand and waits for its completion. This is something that you can’t do with out of the box functionality. While configuring the snapshot replication two SQL Agent Jobs are created, one for the creation of the snapshot and one for the distribution of the snapshot. Unfortunately these jobs are  asynchronous which means that if you execute them they immediately report back if the job started successfully or not, they do not wait for completion and report its result afterwards. So I had to create an SSIS package that executes the jobs and waits for their completion before the rest of the ETL process continues. Fortunately I was able to create the SSIS package with the desired functionality. I have made a step-by-step guide that will help you configure the snapshot replication and I have uploaded the SSIS package you need to execute it. Configure snapshot replication   The first step is to create a publication on the database you want to replicate. Connect to SQL Server Management Studio and right-click Replication, choose for New.. Publication…   The New Publication Wizard appears, click Next Choose your “source” database and click Next Choose Snapshot publication and click Next   You can now select tables and other objects that you want to publish Expand Tables and select the tables that are needed in your ETL process In the next screen you can add filters on the selected tables which can be very useful. Think about selecting only the last x days of data for example. Its possible to filter out rows and/or columns. In this example I did not apply any filters. Schedule the Snapshot Agent to run at a desired time, by doing this a SQL Agent Job is created which we need to execute from a SSIS package later on. Next you need to set the Security Settings for the Snapshot Agent. Click on the Security Settings button.   In this example I ran the Agent under the SQL Server Agent service account. This is not recommended as a security best practice. Fortunately there is an excellent article on TechNet which tells you exactly how to set up the security for replication services. Read it here and make sure you follow the guidelines!   On the next screen choose to create the publication at the end of the wizard Give the publication a name (SnapshotTest) and complete the wizard   The publication is created and the articles (tables in this case) are added Now the publication is created successfully its time to create a new subscription for this publication.   Expand the Replication folder in SSMS and right click Local Subscriptions, choose New Subscriptions   The New Subscription Wizard appears   Select the publisher on which you just created your publication and select the database and publication (SnapshotTest)   You can now choose where the Distribution Agent should run. If it runs at the distributor (push subscriptions) it causes extra processing overhead. If you use a separate server for your ETL process and databases choose to run each agent at its subscriber (pull subscriptions) to reduce the processing overhead at the distributor. Of course we need a database for the subscription and fortunately the Wizard can create it for you. Choose for New database   Give the database the desired name, set the desired options and click OK You can now add multiple SQL Server Subscribers which is not necessary in this case but can be very useful.   You now need to set the security settings for the Distribution Agent. Click on the …. button Again, in this example I ran the Agent under the SQL Server Agent service account. Read the security best practices here   Click Next   Make sure you create a synchronization job schedule again. This job is also necessary in the SSIS package later on. Initialize the subscription at first synchronization Select the first box to create the subscription when finishing this wizard Complete the wizard by clicking Finish The subscription will be created In SSMS you see a new database is created, the subscriber. There are no tables or other objects in the database available yet because the replication jobs did not ran yet. Now expand the SQL Server Agent, go to Jobs and search for the job that creates the snapshot:   Rename this job to “CreateSnapshot” Now search for the job that distributes the snapshot:   Rename this job to “DistributeSnapshot” Create an SSIS package that executes the snapshot replication We now need an SSIS package that will take care of the execution of both jobs. The CreateSnapshot job needs to execute and finish before the DistributeSnapshot job runs. After the DistributeSnapshot job has started the package needs to wait until its finished before the package execution finishes. The Execute SQL Server Agent Job Task is designed to execute SQL Agent Jobs from SSIS. Unfortunately this SSIS task only executes the job and reports back if the job started succesfully or not, it does not report if the job actually completed with success or failure. This is because these jobs are asynchronous. The SSIS package I’ve created does the following: It runs the CreateSnapshot job It checks every 5 seconds if the job is completed with a for loop When the CreateSnapshot job is completed it starts the DistributeSnapshot job And again it waits until the snapshot is delivered before the package will finish successfully Quite simple and the package is ready to use as standalone extract mechanism. After executing the package the replicated tables are added to the subscriber database and are filled with data:   Download the SSIS package here (SSIS 2008) Conclusion In this example I only replicated 5 tables, I could create a SSIS package that does the same in approximately the same amount of time. But if I replicated all the 70+ AdventureWorks tables I would save a lot of time and boring work! With replication services you also benefit from the feature that schema changes are applied automatically which means your entire extract phase wont break. Because a snapshot is created using the bcp utility (bulk copy) it’s also quite fast, so the performance will be quite good. Disadvantages of using snapshot replication as extraction tool is the limitation on source systems. You can only choose SQL Server or Oracle databases to act as a publisher. So if you plan to build an extract phase for your ETL process that will invoke a lot of tables think about replication services, it would save you a lot of time and thanks to the Extract SSIS package I’ve created you can perfectly fit it in your usual SSIS ETL process.

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

  • Fraud Detection with the SQL Server Suite Part 2

    - by Dejan Sarka
    This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

    Read the article

  • Developing a Cost Model for Cloud Applications

    - by BuckWoody
    Note - please pay attention to the date of this post. As much as I attempt to make the information below accurate, the nature of distributed computing means that components, units and pricing will change over time. The definitive costs for Microsoft Windows Azure and SQL Azure are located here, and are more accurate than anything you will see in this post: http://www.microsoft.com/windowsazure/offers/  When writing software that is run on a Platform-as-a-Service (PaaS) offering like Windows Azure / SQL Azure, one of the questions you must answer is how much the system will cost. I will not discuss the comparisons between on-premise costs (which are nigh impossible to calculate accurately) versus cloud costs, but instead focus on creating a general model for estimating costs for a given application. You should be aware that there are (at this writing) two billing mechanisms for Windows and SQL Azure: “Pay-as-you-go” or consumption, and “Subscription” or commitment. Conceptually, you can consider the former a pay-as-you-go cell phone plan, where you pay by the unit used (at a slightly higher rate) and the latter as a standard cell phone plan where you commit to a contract and thus pay lower rates. In this post I’ll stick with the pay-as-you-go mechanism for simplicity, which should be the maximum cost you would pay. From there you may be able to get a lower cost if you use the other mechanism. In any case, the model you create should hold. Developing a good cost model is essential. As a developer or architect, you’ll most certainly be asked how much something will cost, and you need to have a reliable way to estimate that. Businesses and Organizations have been used to paying for servers, software licenses, and other infrastructure as an up-front cost, and power, people to the systems and so on as an ongoing (and sometimes not factored) cost. When presented with a new paradigm like distributed computing, they may not understand the true cost/value proposition, and that’s where the architect and developer can guide the conversation to make a choice based on features of the application versus the true costs. The two big buckets of use-types for these applications are customer-based and steady-state. In the customer-based use type, each successful use of the program results in a sale or income for your organization. Perhaps you’ve written an application that provides the spot-price of foo, and your customer pays for the use of that application. In that case, once you’ve estimated your cost for a successful traversal of the application, you can build that into the price you charge the user. It’s a standard restaurant model, where the price of the meal is determined by the cost of making it, plus any profit you can make. In the second use-type, the application will be used by a more-or-less constant number of processes or users and no direct revenue is attached to the system. A typical example is a customer-tracking system used by the employees within your company. In this case, the cost model is often created “in reverse” - meaning that you pilot the application, monitor the use (and costs) and that cost is held steady. This is where the comparison with an on-premise system becomes necessary, even though it is more difficult to estimate those on-premise true costs. For instance, do you know exactly how much cost the air conditioning is because you have a team of system administrators? This may sound trivial, but that, along with the insurance for the building, the wiring, and every other part of the system is in fact a cost to the business. There are three primary methods that I’ve been successful with in estimating the cost. None are perfect, all are demand-driven. The general process is to lay out a matrix of: components units cost per unit and then multiply that times the usage of the system, based on which components you use in the program. That sounds a bit simplistic, but using those metrics in a calculation becomes more detailed. In all of the methods that follow, you need to know your application. The components for a PaaS include computing instances, storage, transactions, bandwidth and in the case of SQL Azure, database size. In most cases, architects start with the first model and progress through the other methods to gain accuracy. Simple Estimation The simplest way to calculate costs is to architect the application (even UML or on-paper, no coding involved) and then estimate which of the components you’ll use, and how much of each will be used. Microsoft provides two tools to do this - one is a simple slider-application located here: http://www.microsoft.com/windowsazure/pricing-calculator/  The other is a tool you download to create an “Return on Investment” (ROI) spreadsheet, which has the advantage of leading you through various questions to estimate what you plan to use, located here: https://roianalyst.alinean.com/msft/AutoLogin.do?d=176318219048082115  You can also just create a spreadsheet yourself with a structure like this: Program Element Azure Component Unit of Measure Cost Per Unit Estimated Use of Component Total Cost Per Component Cumulative Cost               Of course, the consideration with this model is that it is difficult to predict a system that is not running or hasn’t even been developed. Which brings us to the next model type. Measure and Project A more accurate model is to actually write the code for the application, using the Software Development Kit (SDK) which can run entirely disconnected from Azure. The code should be instrumented to estimate the use of the application components, logging to a local file on the development system. A series of unit and integration tests should be run, which will create load on the test system. You can use standard development concepts to track this usage, and even use Windows Performance Monitor counters. The best place to start with this method is to use the Windows Azure Diagnostics subsystem in your code, which you can read more about here: http://blogs.msdn.com/b/sumitm/archive/2009/11/18/introducing-windows-azure-diagnostics.aspx This set of API’s greatly simplifies tracking the application, and in fact you can use this information for more than just a cost model. After you have the tracking logs, you can plug the numbers into ay of the tools above, which should give a representative cost or in some cases a unit cost. The consideration with this model is that the SDK fabric is not a one-to-one comparison with performance on the actual Windows Azure fabric. Those differences are usually smaller, but they do need to be considered. Also, you may not be able to accurately predict the load on the system, which might lead to an architectural change, which changes the model. This leads us to the next, most accurate method for a cost model. Sample and Estimate Using standard statistical and other predictive math, once the application is deployed you will get a bill each month from Microsoft for your Azure usage. The bill is quite detailed, and you can export the data from it to do analysis, and using methods like regression and so on project out into the future what the costs will be. I normally advise that the architect also extrapolate a unit cost from those metrics as well. This is the information that should be reported back to the executives that pay the bills: the past cost, future projected costs, and unit cost “per click” or “per transaction”, as your case warrants. The challenge here is in the model itself - statistical methods are not foolproof, and the larger the sample (in this case I recommend the entire population, not a smaller sample) is key. References and Tools Articles: http://blogs.msdn.com/b/patrick_butler_monterde/archive/2010/02/10/windows-azure-billing-overview.aspx http://technet.microsoft.com/en-us/magazine/gg213848.aspx http://blog.codingoutloud.com/2011/06/05/azure-faq-how-much-will-it-cost-me-to-run-my-application-on-windows-azure/ http://blogs.msdn.com/b/johnalioto/archive/2010/08/25/10054193.aspx http://geekswithblogs.net/iupdateable/archive/2010/02/08/qampa-how-can-i-calculate-the-tco-and-roi-when.aspx   Other Tools: http://cloud-assessment.com/ http://communities.quest.com/community/cloud_tools

    Read the article

  • Merge replication stopping without errors in SQL 2008 R2

    - by Rob Farley
    A non-SQL MVP friend of mine, who also happens to be a client, asked me for some help again last week. I was planning on writing this up even before Rob Volk (@sql_r) listed his T-SQL Tuesday topic for this month. Earlier in the year, I (well, LobsterPot Solutions, although I’d been the person mostly involved) had helped out with a merge replication problem. The Merge Agent on the subscriber was just stopping every time, shortly after it started. With no errors anywhere – not in the Windows Event Log, the SQL Agent logs, not anywhere. We’d managed to get the system working again, but didn’t have a good reason about what had happened, and last week, the problem occurred again. I asked him about writing up the experience in a blog post, largely because of the red herrings that we encountered. It was an interesting experience for me, also because I didn’t end up touching my computer the whole time – just tapping on my phone via Twitter and Live Msgr. You see, the thing with replication is that a useful troubleshooting option is to reinitialise the thing. We’d done that last time, and it had started to work again – eventually. I say eventually, because the link being used between the sites is relatively slow, and it took a long while for the initialisation to finish. Meanwhile, we’d been doing some investigation into what the problem could be, and were suitably pleased when the problem disappeared. So I got a message saying that a replication problem had occurred again. Reinitialising wasn’t going to be an option this time either. In this scenario, the subscriber having the problem happened to be in a different domain to the publisher. The other subscribers (within the domain) were fine, just this one in a different domain had the problem. Part of the problem seemed to be a log file that wasn’t being backed up properly. They’d been trying to back up to a backup device that had a corruption, and the log file was growing. Turned out, this wasn’t related to the problem, but of course, any time you’re troubleshooting and you see something untoward, you wonder. Having got past that problem, my next thought was that perhaps there was a problem with the account being used. But the other subscribers were using the same account, without any problems. The client pointed out that that it was almost exactly six months since the last failure (later shown to be a complete red herring). It sounded like something might’ve expired. Checking through certificates and trusts showed no sign of anything, and besides, there wasn’t a problem running a command-prompt window using the account in question, from the subscriber box. ...except that when he ran the sqlcmd –E –S servername command I recommended, it failed with a Named Pipes error. I’ve seen problems with firewalls rejecting connections via Named Pipes but letting TCP/IP through, so I got him to look into SQL Configuration Manager to see what kind of connection was being preferred... Everything seemed fine. And strangely, he could connect via Management Studio. Turned out, he had a typo in the servername of the sqlcmd command. That particular red herring must’ve been reflected in his cheeks as he told me. During the time, I also pinged a friend of mine to find out who I should ask, and Ted Kruger (@onpnt) ‘s name came up. Ted (and thanks again, Ted – really) reconfirmed some of my thoughts around the idea of an account expiring, and also suggesting bumping up the logging to level 4 (2 is Verbose, 4 is undocumented ridiculousness). I’d just told the client to push the logging up to level 2, but the log file wasn’t appearing. Checking permissions showed that the user did have permission on the folder, but still no file was appearing. Then it was noticed that the user had been switched earlier as part of the troubleshooting, and switching it back to the real user caused the log file to appear. Still no errors. A lot more information being pushed out, but still no errors. Ted suggested making sure the FQDNs were okay from both ends, in case the servers were unable to talk to each other. DNS problems can lead to hassles which can stop replication from working. No luck there either – it was all working fine. Another server started to report a problem as well. These two boxes were both SQL 2008 R2 (SP1), while the others, still working, were SQL 2005. Around this time, the client tried an idea that I’d shown him a few years ago – using a Profiler trace to see what was being called on the servers. It turned out that the last call being made on the publisher was sp_MSenumschemachange. A quick interwebs search on that showed a problem that exists in SQL Server 2008 R2, when stored procedures have more than 4000 characters. Running that stored procedure (with the same parameters) manually on SQL 2005 listed three stored procedures, the first of which did indeed have more than 4000 characters. Still no error though, and the problem as listed at http://support.microsoft.com/kb/2539378 describes an error that should occur in the Event log. However, this problem is the type of thing that is fixed by a reinitialisation (because it doesn’t need to send the procedure change across as a transaction). And a look in the change history of the long stored procs (you all keep them, right?), showed that the problem from six months earlier could well have been down to this too. Applying SP2 (with sufficient paranoia about backups and how to get back out again if necessary) fixed the problem. The stored proc changes went through immediately after the service pack was applied, and it’s been running happily since. The funny thing is that I didn’t solve the problem. He had put the Profiler trace on the server, and had done the search that found a forum post pointing at this particular problem. I’d asked Ted too, and although he’d given some useful information, nothing that he’d come up with had actually been the solution either. Sometimes, asking for help is the most useful thing you can do. Often though, you don’t end up getting the help from the person you asked – the sounding board is actually what you need. @rob_farley

    Read the article

  • How I Work: Staying Productive Whilst Traveling

    - by BuckWoody
    I travel a lot. Not like some folks that are gone every week, mind you, although in the last month I’ve been to: Cambridge, UK; Anchorage, AK; San Jose, CA; Copenhagen, DK, Boston, MA; and I’m currently en-route to Anaheim, CA.  While this many places in a month is a bit unusual for me, I would say I travel frequently. I’ve travelled most of my 28+ years in IT, and at one time was a consultant traveling weekly.   With that much time away from my primary work location, I have to find ways to stay productive. Some might say “just rest – take a nap!” – but I’m not able to do that. For one thing, I’m a very light sleeper and I’ve never slept on a plane - even a 30+ hour trip to New Zealand in Business Class - so that just isn’t option. I also am not always in the plane, of course. There’s the hotel, the taxi/bus/train, the airport and then all that over again when I arrive. Since my regular jobs have many demands, I have to get work done.   Note: No, I’m not always focused on work. I need downtime just like everyone else. Sometimes I just think, watch a movie or listen to tunes – and I give myself permission to do that anytime – sometimes the whole trip. I have too fewheartbeats left in life to only focus on work – it’s just not that important, and neither am I. Some of these tasks are letters to friends and family, or other personal things. What I’m talking about here is a plan, not some task list I have to follow. When I get to the location I’m traveling to, I always build in as much time as I can to ensure I enjoy those sights and the people I’m with. I would find traveling to be a waste if not for that.   The Unrealistic Expectation As I would evaluate the trip I was taking – say a 6-8 hour flight – I would expect to get 10-12 hours of work done. After all, there’s the time at the airport, the taxi and so on, and then of course the time in the air with all of the room, power, internet and everything else I needed to get my work done. I would pile up tasks at home, pack my bags, and head happily to the magical land of the TSA.   Right. On return from the trip, I had accomplished little, had more e-mails and other work that had piled up, and I was tired, hungry, and unorganized. This had to change. So, I decided to do three things: Segment my work Set realistic expectations Plan accordingly  Segmenting By Available Resources The first task was to decide what kind of work I could do in each location – if any. I found that I was dependent on a few things to get work done, such as power, the Internet, and a place to sit down. Before I fly, I take some time at home to get all of the work I’d like to accomplish while away segmented into these areas, and print that out on paper, which goes in my suit-coat pocket along with a mechanical pencil. I print my tickets, and I’m all set for the adventure ahead. Then I simply do each kind of work whenever I’m in that situation. No power There are certain times when I don’t have power available. But not only that, I might not even be able to use most of my electronics. So I now schedule as many phone calls as I can for the taxi/bus/train ride and the airports as I can. I have a paper notebook (Moleskine, of course) and a pencil and I print out any notes or numbers I need prior to the trip. Once I’m airborne or at the airport, I work on my laptop. I check and respond to e-mails, create slides, write code, do architecture, whatever I can.  If I can’t use any electronics, or once the power runs out, I schedule time for reading. I can read at the airport or anywhere, actually, even in-flight or any other transport. I “read with a pencil”, meaning I take a lot of notes, which I liketo put in OneNote, but since in most cases I don’t have power, I use the Moleskine to do that. Speaking of which, sometimes as I’m thinking I come up with new topics, ideas, blog posts, or things to teach in my classes. Once again I take out the notebook and write it down. All of these notes get a check-mark when I get back to the office and transfer the writing to OneNote. I’ve tried those “smart pens” and so on to automate this, but it just never works out. Pencil and paper are just fine. As I mentioned, sometime I just need to think. I’ll do nothing, and let my mind wander, thinking of nothing in particular, or some math problem or science question I’m interested in. My only issue with this is that I communicate tothink, and I don’t want to drive people crazy by being that guy that won’t shut up, so I think in a different way. Power, but no Internet or Phone If I have power but no Internet or phone, I focus on the laptop and the tablet as before, and I also recharge my other gadgets. Power, Internet, Phone and a Place to Work At first I thought that when I arrived at the hotel or event I could get the same amount of work done that I do at the office. Not so. There’s simply too many distractions, things you need, or other issues that allow this. Of course, Ican work on any device, read, think, write or whatever, but I am simply not as productive as I am in my home office. So I plan for about 25-50% as much work getting done in this environment as I think I could really do. I’ve done some measurements, and this holds out to be true almost every time. The key is that I re-set my expectations (and my co-worker’s expectations as well) that this is the case. I use the Out-Of-Office notices to let people know that I’m just not going to be 100% at this time – it’s hard for everyone, but it’s more honest and realistic, and I’d rather they know that – and that I realize that – than to let them think I’m totally available. Because I’m not – I’m traveling. I don’t tend to put too much detail, because after all I don’t necessarily want to let people know when I’m not home :) but I do think it’s important to let people that depend on my know that I’ll get back with them later. I hope this helps you think through your own methodology of staying productive when you travel. Or perhaps you just go offline, and don’t worry about any of this – good for you! That’s completely valid as well.   (Oh, and yes, I wrote this at 35K feet, on Alaska Airlines on a trip. :)  Practice what you preach, Buck.)

    Read the article

  • apache performance improvements and maxclients

    - by updog
    I know this has been asked a few (thousand) times around the internet but I was hoping someone whose in the know might be able to comment on my particular setup. I have a web server hosting one site (php/codeigniter) with a wordpress blog in a sub directory. The server has 2GB RAM, 3GHz CPU and I have offloaded the static assets to CloudFlare which is has reduced bandwidth for the actual server by almost 75%. The problem I have is when an email campaign is sent out that links to the site or blog, it slows down. Below is my settings in apache2.conf. Average apache process size is 80M and there is 1.5GB available for apache. <IfModule mpm_prefork_module> StartServers 8 MinSpareServers 5 MaxSpareServers 20 MaxClients 20 MaxRequestsPerChild 2000 </IfModule> I have already setup and installed apc and built some caching into the site and used w3totalcache on the blog. The number of concurrent users is around 2-300 when there is a campaign, are there any further optimisations before

    Read the article

  • I Can't Get Ruby on Rails + Passenger + Apache to Work

    - by Luke Crowe
    I'm sorry if this is a stupid question, but I can't get Ruby on Rails to work on my Apache server. I'm using Phusion Passenger (mod_rails, mod_rack) for app deployment. Here is my RoR-specific configuration code in my website's Apache configuration file: Alias /rails /var/www/syyborg.com/ruby/blog/public <Directory /var/www/syyborg.com/ruby/blog/public Options FollowSymLinks AllowOverride None Order Allow,Deny Allow from All </Directory RailsBaseURI /rails Again, I really have very little knowledge of this kind of thing; I have never set up a server from scratch before. Anyways, my rails app, as you can see, is located at /var/www/syyborg.com/ruby/blog/. I am trying to access it from http://[my domain, syyborg.com]/rails. However, when I try to load the site, I get a "403 Forbidden" error. Any help would be greatly appreciated, and I can provide further details if they are required. Thanks in advance!

    Read the article

  • Redirecting to a folder

    - by RN
    Apache2 Plesk 9.x I have a website www.example.com and my blog is on www.example.com/blog I have no content on www.example.com as of now So I want all requests for example.com to be redirected to www.example.com/blog How should I do that ? Is this something I can do in Apache? I am using the GoDaddy DNS server. Not sure if it matters- but I have multiple domains hosted n the same server. And I am using Plesk to manage my virtual hosts.

    Read the article

  • Set WordPress permalinks directly in httpd.conf?

    - by songdogtech
    Is is possible to configure WordPress permalinks directly in Apache httpd.conf? I have a server situation (Apache 2.2.3 CentOS PHP5.1.6) where I can't use .htaccess for performance reasons, but can use httpd.conf. The admin says that mod_rewrite is enabled, but AllowOverride is not, and I can't change those settings. And I need to restrict the permalinks to just the "blog" directory. This is what would go in .htaccess but needs to go into httpd.conf: <IfModule mod_rewrite.c> RewriteEngine On RewriteBase /blog/ RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /blog/index.php [L] </IfModule> Thanks...

    Read the article

< Previous Page | 67 68 69 70 71 72 73 74 75 76 77 78  | Next Page >