Search Results

Search found 867 results on 35 pages for 'vary'.

Page 33/35 | < Previous Page | 29 30 31 32 33 34 35 | Next Page >

ASP.NET GZip Encoding Caveats

- by Rick Strahl

GZip encoding in ASP.NET is pretty easy to accomplish using the built-in GZipStream and DeflateStream classes and applying them to the Response.Filter property. While applying GZip and Deflate behavior is pretty easy there are a few caveats that you have watch out for as I found out today for myself with an application that was throwing up some garbage data. But before looking at caveats let’s review GZip implementation for ASP.NET. ASP.NET GZip/Deflate Basics Response filters basically are applied to the Response.OutputStream and transform it as data is written to it through the ASP.NET Response object. So a Response.Write eventually gets written into the output stream which if a filter is also written through the filter stream’s interface. To perform the actual GZip (and Deflate) encoding typically used by Web pages .NET includes the GZipStream and DeflateStream stream classes which can be readily assigned to the Repsonse.OutputStream. With these two stream classes in place it’s almost trivially easy to create a couple of reusable methods that allow you to compress your HTTP output. In my standard WebUtils utility class (from the West Wind West Wind Web Toolkit) created two static utility methods – IsGZipSupported and GZipEncodePage – that check whether the client supports GZip encoding and then actually encodes the current output (note that although the method includes ‘Page’ in its name this code will work with any ASP.NET output). /// <summary> /// Determines if GZip is supported /// </summary> /// <returns></returns> public static bool IsGZipSupported() { string AcceptEncoding = HttpContext.Current.Request.Headers["Accept-Encoding"]; if (!string.IsNullOrEmpty(AcceptEncoding) && (AcceptEncoding.Contains("gzip") || AcceptEncoding.Contains("deflate"))) return true; return false; } /// <summary> /// Sets up the current page or handler to use GZip through a Response.Filter /// IMPORTANT: /// You have to call this method before any output is generated! /// </summary> public static void GZipEncodePage() { HttpResponse Response = HttpContext.Current.Response; if (IsGZipSupported()) { string AcceptEncoding = HttpContext.Current.Request.Headers["Accept-Encoding"]; if (AcceptEncoding.Contains("deflate")) { Response.Filter = new System.IO.Compression.DeflateStream(Response.Filter, System.IO.Compression.CompressionMode.Compress); Response.Headers.Remove("Content-Encoding"); Response.AppendHeader("Content-Encoding", "deflate"); } else { Response.Filter = new System.IO.Compression.GZipStream(Response.Filter, System.IO.Compression.CompressionMode.Compress); Response.Headers.Remove("Content-Encoding"); Response.AppendHeader("Content-Encoding", "gzip"); } } } As you can see the actual assignment of the Filter is as simple as: Response.Filter = new DeflateStream(Response.Filter, System.IO.Compression.CompressionMode.Compress); which applies the filter to the OutputStream. You also need to ensure that your response reflects the new GZip or Deflate encoding and ensure that any pages that are cached in Proxy servers can differentiate between pages that were encoded with the various different encodings (or no encoding). To use this utility function now is trivially easy: In any ASP.NET code that wants to compress its Response output you simply use: protected void Page_Load(object sender, EventArgs e) { WebUtils.GZipEncodePage(); Entry = WebLogFactory.GetEntry(); var entries = Entry.GetLastEntries(App.Configuration.ShowEntryCount, "pk,Title,SafeTitle,Body,Entered,Feedback,Location,ShowTopAd", "TEntries"); if (entries == null) throw new ApplicationException("Couldn't load WebLog Entries: " + Entry.ErrorMessage); this.repEntries.DataSource = entries; this.repEntries.DataBind(); } Here I use an ASP.NET page, but the above WebUtils.GZipEncode() method call will work in any ASP.NET application type including HTTP Handlers. The only requirement is that the filter needs to be applied before any other output is sent to the OutputStream. For example, in my CallbackHandler service implementation by default output over a certain size is GZip encoded. The output that is generated is JSON or XML and if the output is over 5k in size I apply WebUtils.GZipEncode(): if (sbOutput.Length > GZIP_ENCODE_TRESHOLD) WebUtils.GZipEncodePage(); Response.ContentType = ControlResources.STR_JsonContentType; HttpContext.Current.Response.Write(sbOutput.ToString()); Ok, so you probably get the idea: Encoding GZip/Deflate content is pretty easy. Hold on there Hoss –Watch your Caching Or is it? There are a few caveats that you need to watch out for when dealing with GZip content. The fist issue is that you need to deal with the fact that some clients don’t support GZip or Deflate content. Most modern browsers support it, but if you have a programmatic Http client accessing your content GZip/Deflate support is by no means guaranteed. For example, WinInet Http clients don’t support GZip out of the box – it has to be explicitly implemented. Other low level HTTP clients on other platforms too don’t support GZip out of the box. The problem is that your application, your Web Server and Proxy Servers on the Internet might be caching your generated content. If you return content with GZip once and then again without, either caching is not applied or worse the wrong type of content is returned back to the client from a cache or proxy. The result is an unreadable response for *some clients* which is also very hard to debug and fix once in production. You already saw the issue of Proxy servers addressed in the GZipEncodePage() function: // Allow proxy servers to cache encoded and unencoded versions separately Response.AppendHeader("Vary", "Content-Encoding"); This ensures that any Proxy servers also check for the Content-Encoding HTTP Header to cache their content – not just the URL. The same thing applies if you do OutputCaching in your own ASP.NET code. If you generate output for GZip on an OutputCached page the GZipped content will be cached (either by ASP.NET’s cache or in some cases by the IIS Kernel Cache). But what if the next client doesn’t support GZip? She’ll get served a cached GZip page that won’t decode and she’ll get a page full of garbage. Wholly undesirable. To fix this you need to add some custom OutputCache rules by way of the GetVaryByCustom() HttpApplication method in your global_ASAX file: public override string GetVaryByCustomString(HttpContext context, string custom) { // Override Caching for compression if (custom == "GZIP") { string acceptEncoding = HttpContext.Current.Response.Headers["Content-Encoding"]; if (string.IsNullOrEmpty(acceptEncoding)) return ""; else if (acceptEncoding.Contains("gzip")) return "GZIP"; else if (acceptEncoding.Contains("deflate")) return "DEFLATE"; return ""; } return base.GetVaryByCustomString(context, custom); } In a page that use Output caching you then specify: <%@ OutputCache Duration="180" VaryByParam="none" VaryByCustom="GZIP" %> To use that custom rule. It’s all Fun and Games until ASP.NET throws an Error Ok, so you’re up and running with GZip, you have your caching squared away and your pages that you are applying it to are jamming along. Then BOOM, something strange happens and you get a lovely garbled page that look like this: Lovely isn’t it? What’s happened here is that I have WebUtils.GZipEncode() applied to my page, but there’s an error in the page. The error falls back to the ASP.NET error handler and the error handler removes all existing output (good) and removes all the custom HTTP headers I’ve set manually (usually good, but very bad here). Since I applied the Response.Filter (via GZipEncode) the output is now GZip encoded, but ASP.NET has removed my Content-Encoding header, so the browser receives the GZip encoded content without a notification that it is encoded as GZip. The result is binary output. Here’s what Fiddler says about the raw HTTP header output when an error occurs when GZip encoding was applied: HTTP/1.1 500 Internal Server Error Cache-Control: private Content-Type: text/html; charset=utf-8 Date: Sat, 30 Apr 2011 22:21:08 GMT Content-Length: 2138 Connection: close ?`I?%&/m?{J?J??t??` … binary output striped here Notice: no Content-Encoding header and that’s why we’re seeing this garbage. ASP.NET has stripped the Content-Encoding header but left our filter intact. So how do we fix this? In my applications I typically have a global Application_Error handler set up and in this case I’ve been using that. One thing that you can do in the Application_Error handler is explicitly clear out the Response.Filter and set it to null at the top: protected void Application_Error(object sender, EventArgs e) { // Remove any special filtering especially GZip filtering Response.Filter = null; … } And voila I get my Yellow Screen of Death or my custom generated error output back via uncompressed content. BTW, the same is true for Page level errors handled in Page_Error or ASP.NET MVC Error handling methods in a controller. Another and possibly even better solution is to check whether a filter is attached just before the headers are sent to the client as pointed out by Adam Schroeder in the comments: protected void Application_PreSendRequestHeaders() { // ensure that if GZip/Deflate Encoding is applied that headers are set // also works when error occurs if filters are still active HttpResponse response = HttpContext.Current.Response; if (response.Filter is GZipStream && response.Headers["Content-encoding"] != "gzip") response.AppendHeader("Content-encoding", "gzip"); else if (response.Filter is DeflateStream && response.Headers["Content-encoding"] != "deflate") response.AppendHeader("Content-encoding", "deflate"); } This uses the Application_PreSendRequestHeaders() pipeline event to check for compression encoding in a filter and adjusts the content accordingly. This is actually a better solution since this is generic – it’ll work regardless of how the content is cleaned up. For example, an error Response.Redirect() or short error display might get changed and the filter not cleared and this code actually handles that. Sweet, thanks Adam. It’s unfortunate that ASP.NET doesn’t natively clear out Response.Filters when an error occurs just as it clears the Response and Headers. I can’t see where leaving a Filter in place in an error situation would make any sense, but hey - this is what it is and it’s easy enough to fix as long as you know where to look. Riiiight! IIS and GZip I should also mention that IIS 7 includes good support for compression natively. If you can defer encoding to let IIS perform it for you rather than doing it in your code by all means you should do it! Especially any static or semi-dynamic content that can be made static should be using IIS built-in compression. Dynamic caching is also supported but is a bit more tricky to judge in terms of performance and footprint. John Forsyth has a great article on the benefits and drawbacks of IIS 7 compression which gives some detailed performance comparisons and impact reviews. I’ll post another entry next with some more info on IIS compression since information on it seems to be a bit hard to come by. Related Content Built-in GZip/Deflate Compression in IIS 7.x HttpWebRequest and GZip Responses © Rick Strahl, West Wind Technologies, 2005-2011Posted in ASP.NET IIS7

Read the article
First round playing with Memcached

- by Shaun

To be honest I have not been very interested in the caching before I’m going to a project which would be using the multi-site deployment and high connection and concurrency and very sensitive to the user experience. That means we must cache the output data for better performance. After looked for the Internet I finally focused on the Memcached. What’s the Memcached? I think the description on its main site gives us a very good and simple explanation. Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages. The original Memcached was built on *nix system are is being widely used in the PHP world. Although it’s not a problem to use the Memcached installed on *nix system there are some windows version available fortunately. Since we are WISC (Windows – IIS – SQL Server – C#, which on the opposite of LAMP) it would be much easier for us to use the Memcached on Windows rather than *nix. I’m using the Memcached Win X64 version provided by NorthScale. There are also the x86 version and other operation system version. Install Memcached Unpack the Memcached file to a folder on the machine you want it to be installed, we can see that there are only 3 files and the main file should be the “memcached.exe”. Memcached would be run on the server as a service. To install the service just open a command windows and navigate to the folder which contains the “memcached.exe”, let’s say “C:\Memcached\”, and then type “memcached.exe -d install”. If you are using Windows Vista and Windows 7 system please be execute the command through the administrator role. Right-click the command item in the start menu and use “Run as Administrator”, otherwise the Memcached would not be able to be installed successfully. Once installed successful we can type “memcached.exe -d start” to launch the service. Now it’s ready to be used. The default port of Memcached is 11211 but you can change it through the command argument. You can find the help by typing “memcached -h”. Using Memcached Memcahed has many good and ready-to-use providers for vary program language. After compared and reviewed I chose the Memcached Providers. It’s built based on another 3rd party Memcached client named enyim.com Memcached Client. The Memcached Providers is very simple to set/get the cached objects through the Memcached servers and easy to be configured through the application configuration file (aka web.config and app.config). Let’s create a console application for the demonstration and add the 3 DLL files from the package of the Memcached Providers to the project reference. Then we need to add the configuration for the Memcached server. Create an App.config file and firstly add the section on top of it. Here we need three sections: the section for Memcached Providers, for enyim.com Memcached client and the log4net. 1: <configSections> 2: <section name="cacheProvider" 3: type="MemcachedProviders.Cache.CacheProviderSection, MemcachedProviders" 4: allowDefinition="MachineToApplication" 5: restartOnExternalChanges="true"/> 6: <sectionGroup name="enyim.com"> 7: <section name="memcached" 8: type="Enyim.Caching.Configuration.MemcachedClientSection, Enyim.Caching"/> 9: </sectionGroup> 10: <section name="log4net" 11: type="log4net.Config.Log4NetConfigurationSectionHandler,log4net"/> 12: </configSections> Then we will add the configuration for 3 of them in the App.config file. The Memcached server information would be defined under the enyim.com section since it will be responsible for connect to the Memcached server. Assuming I installed the Memcached on two servers with the default port, the configuration would be like this. 1: <enyim.com> 2: <memcached> 3: <servers> 4:  5: <add address="192.168.0.149" port="11211"/> 6: <add address="10.10.20.67" port="11211"/> 7: </servers> 8: <socketPool minPoolSize="10" maxPoolSize="100" connectionTimeout="00:00:10" deadTimeout="00:02:00"/> 9: </memcached> 10: </enyim.com> Memcached supports the multi-deployment which means you can install the Memcached on the servers as many as you need. The protocol of the Memcached responsible for routing the cached objects into the proper server. So it’s very easy to scale-out your system by Memcached. And then define the Memcached Providers configuration. The defaultExpireTime indicates how long the objected cached in the Memcached would be expired, the default value is 2000 ms. 1: <cacheProvider defaultProvider="MemcachedCacheProvider"> 2: <providers> 3: <add name="MemcachedCacheProvider" 4: type="MemcachedProviders.Cache.MemcachedCacheProvider, MemcachedProviders" 5: keySuffix="_MySuffix_" 6: defaultExpireTime="2000"/> 7: </providers> 8: </cacheProvider> The last configuration would be the log4net. 1: <log4net> 2:  3: <appender name="ConsoleAppender" type="log4net.Appender.ConsoleAppender"> 4: <layout type="log4net.Layout.PatternLayout"> 5: <conversionPattern value="%date [%thread] %-5level %logger [%property{NDC}] - %message%newline"/> 6: </layout> 7: </appender> 8:  9:  10: <root> 11: <priority value="WARN"/> 12: <appender-ref ref="ConsoleAppender"> 13: <filter type="log4net.Filter.LevelRangeFilter"> 14: <levelMin value="WARN"/> 15: <levelMax value="FATAL"/> 16: </filter> 17: </appender-ref> 18: </root> 19: </log4net> Get, Set and Remove the Cached Objects Once we finished the configuration it would be very simple to consume the Memcached servers. The Memcached Providers gives us a static class named DistCache that can be used to operate the Memcached servers. Get<T>: Retrieve the cached object from the Memcached servers. If failed it will return null or the default value. Add: Add an object with a unique key into the Memcached servers. Assuming that we have an operation that retrieve the email from the name which is time consuming. This is the operation that should be cached. The method would be like this. I utilized Thread.Sleep to simulate the long-time operation. 1: static string GetEmailByNameSlowly(string name) 2: { 3: Thread.Sleep(2000); 4: return name + "@ethos.com.cn"; 5: } Then in the real retrieving method we will firstly check whether the name, email information had been searched previously and cached. If yes we will just return them from the Memcached, otherwise we will invoke the slowly method to retrieve it and then cached. 1: static string GetEmailByName(string name) 2: { 3: var email = DistCache.Get<string>(name); 4: if (string.IsNullOrEmpty(email)) 5: { 6: Console.WriteLine("==> The name/email not be in memcached so need slow loading. (name = {0})==>", name); 7: email = GetEmailByNameSlowly(name); 8: DistCache.Add(name, email); 9: } 10: else 11: { 12: Console.WriteLine("==> The name/email had been in memcached. (name = {0})==>", name); 13: } 14: return email; 15: } Finally let’s finished the calling method and execute. 1: static void Main(string[] args) 2: { 3: var name = string.Empty; 4: while (name != "q") 5: { 6: Console.Write("==> Please enter the name to find the email: "); 7: name = Console.ReadLine(); 8: 9: var email = GetEmailByName(name); 10: Console.WriteLine("==> The email of {0} is {1}.", name, email); 11: } 12: } The first time I entered “ziyanxu” it takes about 2 seconds to get the email since there’s nothing cached. But the next time I entered “ziyanxu” it returned very quickly from the Memcached. Summary In this post I explained a bit on why we need cache, what’s Memcached and how to use it through the C# application. The example is fairly simple but hopefully demonstrated on how to use it. Memcached is very easy and simple to be used since it gives you the full opportunity to consider what, when and how to cache the objects. And when using Memcached you don’t need to consider the cache servers. The Memcached would be like a huge object pool in front of you. The next step I’m thinking now are: What kind of data should be cached? And how to determined the key? How to implement the cache as a layer on top of the business layer so that the application will not notice that the cache is there. How to implement the cache by AOP so that the business logic no need to consider the cache. I will investigate on them in the future and will share my thoughts and results. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article
SQL SERVER – Guest Posts – Feodor Georgiev – The Context of Our Database Environment – Going Beyond the Internal SQL Server Waits – Wait Type – Day 21 of 28

- by pinaldave

This guest post is submitted by Feodor. Feodor Georgiev is a SQL Server database specialist with extensive experience of thinking both within and outside the box. He has wide experience of different systems and solutions in the fields of architecture, scalability, performance, etc. Feodor has experience with SQL Server 2000 and later versions, and is certified in SQL Server 2008. In this article Feodor explains the server-client-server process, and concentrated on the mutual waits between client and SQL Server. This is essential in grasping the concept of waits in a ‘global’ application plan. Recently I was asked to write a blog post about the wait statistics in SQL Server and since I had been thinking about writing it for quite some time now, here it is. It is a wide-spread idea that the wait statistics in SQL Server will tell you everything about your performance. Well, almost. Or should I say – barely. The reason for this is that SQL Server is always a part of a bigger system – there are always other players in the game: whether it is a client application, web service, any other kind of data import/export process and so on. In short, the SQL Server surroundings look like this: This means that SQL Server, aside from its internal waits, also depends on external waits and settings. As we can see in the picture above, SQL Server needs to have an interface in order to communicate with the surrounding clients over the network. For this communication, SQL Server uses protocol interfaces. I will not go into detail about which protocols are best, but you can read this article. Also, review the information about the TDS (Tabular data stream). As we all know, our system is only as fast as its slowest component. This means that when we look at our environment as a whole, the SQL Server might be a victim of external pressure, no matter how well we have tuned our database server performance. Let’s dive into an example: let’s say that we have a web server, hosting a web application which is using data from our SQL Server, hosted on another server. The network card of the web server for some reason is malfunctioning (think of a hardware failure, driver failure, or just improper setup) and does not send/receive data faster than 10Mbs. On the other end, our SQL Server will not be able to send/receive data at a faster rate either. This means that the application users will notify the support team and will say: “My data is coming very slow.” Now, let’s move on to a bit more exciting example: imagine that there is a similar setup as the example above – one web server and one database server, and the application is not using any stored procedure calls, but instead for every user request the application is sending 80kb query over the network to the SQL Server. (I really thought this does not happen in real life until I saw it one day.) So, what happens in this case? To make things worse, let’s say that the 80kb query text is submitted from the application to the SQL Server at least 100 times per minute, and as often as 300 times per minute in peak times. Here is what happens: in order for this query to reach the SQL Server, it will have to be broken into a of number network packets (according to the packet size settings) – and will travel over the network. On the other side, our SQL Server network card will receive the packets, will pass them to our network layer, the packets will get assembled, and eventually SQL Server will start processing the query – parsing, allegorizing, generating the query execution plan and so on. So far, we have already had a serious network overhead by waiting for the packets to reach our Database Engine. There will certainly be some processing overhead – until the database engine deals with the 80kb query and its 20 subqueries. The waits you see in the DMVs are actually collected from the point the query reaches the SQL Server and the packets are assembled. Let’s say that our query is processed and it finally returns 15000 rows. These rows have a certain size as well, depending on the data types returned. This means that the data will have converted to packages (depending on the network size package settings) and will have to reach the application server. There will also be waits, however, this time you will be able to see a wait type in the DMVs called ASYNC_NETWORK_IO. What this wait type indicates is that the client is not consuming the data fast enough and the network buffers are filling up. Recently Pinal Dave posted a blog on Client Statistics. What Client Statistics does is captures the physical flow characteristics of the query between the client(Management Studio, in this case) and the server and back to the client. As you see in the image, there are three categories: Query Profile Statistics, Network Statistics and Time Statistics. Number of server roundtrips–a roundtrip consists of a request sent to the server and a reply from the server to the client. For example, if your query has three select statements, and they are separated by ‘GO’ command, then there will be three different roundtrips. TDS Packets sent from the client – TDS (tabular data stream) is the language which SQL Server speaks, and in order for applications to communicate with SQL Server, they need to pack the requests in TDS packets. TDS Packets sent from the client is the number of packets sent from the client; in case the request is large, then it may need more buffers, and eventually might even need more server roundtrips. TDS packets received from server –is the TDS packets sent by the server to the client during the query execution. Bytes sent from client – is the volume of the data set to our SQL Server, measured in bytes; i.e. how big of a query we have sent to the SQL Server. This is why it is best to use stored procedures, since the reusable code (which already exists as an object in the SQL Server) will only be called as a name of procedure + parameters, and this will minimize the network pressure. Bytes received from server – is the amount of data the SQL Server has sent to the client, measured in bytes. Depending on the number of rows and the datatypes involved, this number will vary. But still, think about the network load when you request data from SQL Server. Client processing time – is the amount of time spent in milliseconds between the first received response packet and the last received response packet by the client. Wait time on server replies – is the time in milliseconds between the last request packet which left the client and the first response packet which came back from the server to the client. Total execution time – is the sum of client processing time and wait time on server replies (the SQL Server internal processing time) Here is an illustration of the Client-server communication model which should help you understand the mutual waits in a client-server environment. Keep in mind that a query with a large ‘wait time on server replies’ means the server took a long time to produce the very first row. This is usual on queries that have operators that need the entire sub-query to evaluate before they proceed (for example, sort and top operators). However, a query with a very short ‘wait time on server replies’ means that the query was able to return the first row fast. However a long ‘client processing time’ does not necessarily imply the client spent a lot of time processing and the server was blocked waiting on the client. It can simply mean that the server continued to return rows from the result and this is how long it took until the very last row was returned. The bottom line is that developers and DBAs should work together and think carefully of the resource utilization in the client-server environment. From experience I can say that so far I have seen only cases when the application developers and the Database developers are on their own and do not ask questions about the other party’s world. I would recommend using the Client Statistics tool during new development to track the performance of the queries, and also to find a synchronous way of utilizing resources between the client – server – client. Here is another example: think about similar setup as above, but add another server to the game. Let’s say that we keep our media on a separate server, and together with the data from our SQL Server we need to display some images on the webpage requested by our user. No matter how simple or complicated the logic to get the images is, if the images are 500kb each our users will get the page slowly and they will still think that there is something wrong with our data. Anyway, I don’t mean to get carried away too far from SQL Server. Instead, what I would like to say is that DBAs should also be aware of ‘the big picture’. I wrote a blog post a while back on this topic, and if you are interested, you can read it here about the big picture. And finally, here are some guidelines for monitoring the network performance and improving it: Run a trace and outline all queries that return more than 1000 rows (in Profiler you can actually filter and sort the captured trace by number of returned rows). This is not a set number; it is more of a guideline. The general thought is that no application user can consume that many rows at once. Ask yourself and your fellow-developers: ‘why?’. Monitor your network counters in Perfmon: Network Interface:Output queue length, Redirector:Network errors/sec, TCPv4: Segments retransmitted/sec and so on. Make sure to establish a good friendship with your network administrator (buy them coffee, for example J ) and get into a conversation about the network settings. Have them explain to you how the network cards are setup – are they standalone, are they ‘teamed’, what are the settings – full duplex and so on. Find some time to read a bit about networking. In this short blog post I hope I have turned your attention to ‘the big picture’ and the fact that there are other factors affecting our SQL Server, aside from its internal workings. As a further reading I would still highly recommend the Wait Stats series on this blog, also I would recommend you have the coffee break conversation with your network admin as soon as possible. This guest post is written by Feodor Georgiev. Read all the post in the Wait Types and Queue series. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL

Read the article
Back to Basics: When does a .NET Assembly Dependency get loaded

- by Rick Strahl

When we work on typical day to day applications, it's easy to forget some of the core features of the .NET framework. For me personally it's been a long time since I've learned about some of the underlying CLR system level services even though I rely on them on a daily basis. I often think only about high level application constructs and/or high level framework functionality, but the low level stuff is often just taken for granted. Over the last week at DevConnections I had all sorts of low level discussions with other developers about the inner workings of this or that technology (especially in light of my Low Level ASP.NET Architecture talk and the Razor Hosting talk). One topic that came up a couple of times and ended up a point of confusion even amongst some seasoned developers (including some folks from Microsoft <snicker>) is when assemblies actually load into a .NET process. There are a number of different ways that assemblies are loaded in .NET. When you create a typical project assemblies usually come from: The Assembly reference list of the top level 'executable' project The Assembly references of referenced projects Dynamically loaded at runtime via AppDomain/Reflection loading In addition .NET automatically loads mscorlib (most of the System namespace) the boot process that hosts the .NET runtime in EXE apps, or some other kind of runtime hosting environment (runtime hosting in servers like IIS, SQL Server or COM Interop). In hosting environments the runtime host may also pre-load a bunch of assemblies on its own (for example the ASP.NET host requires all sorts of assemblies just to run itself, before ever routing into your user specific code). Assembly Loading The most obvious source of loaded assemblies is the top level application's assembly reference list. You can add assembly references to a top level application and those assembly references are then available to the application. In a nutshell, referenced assemblies are not immediately loaded - they are loaded on the fly as needed. So regardless of whether you have an assembly reference in a top level project, or a dependent assembly assemblies typically load on an as needed basis, unless explicitly loaded by user code. The same is true of dependent assemblies. To check this out I ran a simple test: I have a utility assembly Westwind.Utilities which is a general purpose library that can work in any type of project. Due to a couple of small requirements for encoding and a logging piece that allows logging Web content (dependency on HttpContext.Current) this utility library has a dependency on System.Web. Now System.Web is a pretty large assembly and generally you'd want to avoid adding it to a non-Web project if it can be helped. So I created a Console Application that loads my utility library: You can see that the top level Console app a reference to Westwind.Utilities and System.Data (beyond the core .NET libs). The Westwind.Utilities project on the other hand has quite a few dependencies including System.Web. I then add a main program that accesses only a simple utillity method in the Westwind.Utilities library that doesn't require any of the classes that access System.Web: static void Main(string[] args) { Console.WriteLine(StringUtils.NewStringId()); Console.ReadLine(); } StringUtils.NewStringId() calls into Westwind.Utilities, but it doesn't rely on System.Web. Any guesses what the assembly list looks like when I stop the code on the ReadLine() command? I'll wait here while you think about it… … … So, when I stop on ReadLine() and then fire up Process Explorer and check the assembly list I get: We can see here that .NET has not actually loaded any of the dependencies of the Westwind.Utilities assembly. Also not loaded is the top level System.Data reference even though it's in the dependent assembly list of the top level project. Since this particular function I called only uses core System functionality (contained in mscorlib) there's in fact nothing else loaded beyond the main application and my Westwind.Utilities assembly that contains the method accessed. None of the dependencies of Westwind.Utilities loaded. If you were to open the assembly in a disassembler like Reflector or ILSpy, you would however see all the compiled in dependencies. The referenced assemblies are in the dependency list and they are loadable, but they are not immediately loaded by the application. In other words the C# compiler and .NET linker are smart enough to figure out the dependencies based on the code that actually is referenced from your application and any dependencies cascading down into the dependencies from your top level application into the referenced assemblies. In the example above the usage requirement is pretty obvious since I'm only calling a single static method and then exiting the app, but in more complex applications these dependency relationships become very complicated - however it's all taken care of by the compiler and linker figuring out what types and members are actually referenced and including only those assemblies that are in fact referenced in your code or required by any of your dependencies. The good news here is: That if you are referencing an assembly that has a dependency on something like System.Web in a few places that are not actually accessed by any of your code or any dependent assembly code that you are calling, that assembly is never loaded into memory! Some Hosting Environments pre-load Assemblies The load behavior can vary however. In Console and desktop applications we have full control over assembly loading so we see the core CLR behavior. However other environments like ASP.NET for example will preload referenced assemblies explicitly as part of the startup process - primarily to minimize load conflicts. Specifically ASP.NET pre-loads all assemblies referenced in the assembly list and the /bin folder. So in Web applications it definitely pays to minimize your top level assemblies if they are not used. Understanding when Assemblies Load To clarify and see it actually happen what I described in the first example , let's look at a couple of other scenarios. To see assemblies loading at runtime in real time lets create a utility function to print out loaded assemblies to the console: public static void PrintAssemblies() { var assemblies = AppDomain.CurrentDomain.GetAssemblies(); foreach (var assembly in assemblies) { Console.WriteLine(assembly.GetName()); } } Now let's look at the first scenario where I have class method that references internally uses System.Web. In the first scenario lets add a method to my main program like this: static void Main(string[] args) { Console.WriteLine(StringUtils.NewStringId()); Console.ReadLine(); PrintAssemblies(); } public static void WebLogEntry() { var entry = new WebLogEntry(); entry.UpdateFromRequest(); Console.WriteLine(entry.QueryString); } UpdateFromWebRequest() internally accesses HttpContext.Current to read some information of the ASP.NET Request object so it clearly needs a reference System.Web to work. In this first example, the method that holds the calling code is never called, but exists as a static method that can potentially be called externally at some point. What do you think will happen here with the assembly loading? Will System.Web load in this example? No - it doesn't. Because the WebLogEntry() method is never called by the mainline application (or anywhere else) System.Web is not loaded. .NET dynamically loads assemblies as code that needs it is called. No code references the WebLogEntry() method and so System.Web is never loaded. Next, let's add the call to this method, which should trigger System.Web to be loaded because a dependency exists. Let's change the code to: static void Main(string[] args) { Console.WriteLine(StringUtils.NewStringId()); Console.WriteLine("--- Before:"); PrintAssemblies(); WebLogEntry(); Console.WriteLine("--- After:"); PrintAssemblies(); Console.ReadLine(); } public static void WebLogEntry() { var entry = new WebLogEntry(); entry.UpdateFromRequest(); Console.WriteLine(entry.QueryString); } Looking at the code now, when do you think System.Web will be loaded? Will the before list include it? Yup System.Web gets loaded, but only after it's actually referenced. In fact, just until before the call to UpdateFromRequest() System.Web is not loaded - it only loads when the method is actually called and requires the reference in the executing code. Moral of the Story So what have we learned - or maybe remembered again? Dependent Assembly References are not pre-loaded when an application starts (by default) Dependent Assemblies that are not referenced by executing code are never loaded Dependent Assemblies are just in time loaded when first referenced in code All of this is nothing new - .NET has always worked like this. But it's good to have a refresher now and then and go through the exercise of seeing it work in action. It's not one of those things we think about everyday, and as I found out last week, I couldn't remember exactly how it worked since it's been so long since I've learned about this. And apparently I'm not the only one as several other people I had discussions with in relation to loaded assemblies also didn't recall exactly what should happen or assumed incorrectly that just having a reference automatically loads the assembly. The moral of the story for me is: Trying at all costs to eliminate an assembly reference from a component is not quite as important as it's often made out to be. For example, the Westwind.Utilities module described above has a logging component, including a Web specific logging entry that supports pulling information from the active HTTP Context. Adding that feature requires a reference to System.Web. Should I worry about this in the scope of this library? Probably not, because if I don't use that one class of nearly a hundred, System.Web never gets pulled into the parent process. IOW, System.Web only loads when I use that specific feature and if I am, well I clearly have to be running in a Web environment anyway to use it realistically. The alternative would be considerably uglier: Pulling out the WebLogEntry class and sticking it into another assembly and breaking up the logging code. In this case - definitely not worth it. So, .NET definitely goes through some pretty nifty optimizations to ensure that it loads only what it needs and in most cases you can just rely on .NET to do the right thing. Sometimes though assembly loading can go wrong (especially when signed and versioned local assemblies are involved), but that's subject for a whole other post…© Rick Strahl, West Wind Technologies, 2005-2012Posted in .NET CSharp Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Time Warp

- by Jesse

It’s no secret that daylight savings time can wreak havoc on systems that rely heavily on dates. The system I work on is centered around recording dates and times, so naturally my co-workers and I have seen our fair share of date-related bugs. From time to time, however, we come across something that we haven’t seen before. A few weeks ago the following error message started showing up in our logs: “The supplied DateTime represents an invalid time. For example, when the clock is adjusted forward, any time in the period that is skipped is invalid.” This seemed very cryptic, especially since it was coming from areas of our application that are typically only concerned with capturing date-only (no explicit time component) from the user, like reports that take a “start date” and “end date” parameter. For these types of parameters we just leave off the time component when capturing the date values, so midnight is used as a “placeholder” time. How is midnight an “invalid time”? Globalization Is Hard Over the last couple of years our software has been rolled out to users in several countries outside of the United States, including Brazil. Brazil begins and ends daylight savings time at midnight on pre-determined days of the year. On October 16, 2011 at midnight many areas in Brazil began observing daylight savings time at which time their clocks were set forward one hour. This means that at the instant it became midnight on October 16, it actually became 1:00 AM, so any time between 12:00 AM and 12:59:59 AM never actually happened. Because we store all date values in the database in UTC, always adjust any “local” dates provided by a user to UTC before using them as filters in a query. The error we saw was thrown by .NET when trying to convert the Brazilian local time of 2011-10-16 12:00 AM to UTC since that local time never actually existed. We hadn’t experienced this same issue with any of our US customers because the daylight savings time changes in the US occur at 2:00 AM which doesn’t conflict with our “placeholder” time of midnight. Detecting Invalid Times In .NET you might use code similar to the following for converting a local time to UTC: var localDate = new DateTime(2011, 10, 16); //2011-10-16 @ midnight const string timeZoneId = "E. South America Standard Time"; //Windows system timezone Id for "Brasilia" timezone. var localTimeZone = TimeZoneInfo.FindSystemTimeZoneById(timeZoneId); var convertedDate = TimeZoneInfo.ConvertTimeToUtc(localDate, localTimeZone); The code above throws the “invalid time” exception referenced above. We could try to detect whether or not the local time is invalid with something like this: var localDate = new DateTime(2011, 10, 16); //2011-10-16 @ midnight const string timeZoneId = "E. South America Standard Time"; //Windows system timezone Id for "Brasilia" timezone. var localTimeZone = TimeZoneInfo.FindSystemTimeZoneById(timeZoneId); if (localTimeZone.IsInvalidTime(localDate)) localDate = localDate.AddHours(1); var convertedDate = TimeZoneInfo.ConvertTimeToUtc(localDate, localTimeZone); This code works in this particular scenario, but it hardly seems robust. It also does nothing to address the issue that can arise when dealing with the ambiguous times that fall around the end of daylight savings. When we roll the clocks back an hour they record the same hour on the same day twice in a row. To continue on with our Brazil example, on February 19, 2012 at 12:00 AM, it will immediately become February 18, 2012 at 11:00 PM all over again. In this scenario, how should we interpret February 18, 2011 11:30 PM? Enter Noda Time I heard about Noda Time, the .NET port of the Java library Joda Time, a little while back and filed it away in the back of my mind under the “sounds-like-it-might-be-useful-someday” category. Let’s see how we might deal with the issue of invalid and ambiguous local times using Noda Time (note that as of this writing the samples below will only work using the latest code available from the Noda Time repo on Google Code. The NuGet package version 0.1.0 published 2011-08-19 will incorrectly report unambiguous times as being ambiguous) : var localDateTime = new LocalDateTime(2011, 10, 16, 0, 0); const string timeZoneId = "Brazil/East"; var timezone = DateTimeZone.ForId(timeZoneId); var localDateTimeMaping = timezone.MapLocalDateTime(localDateTime); ZonedDateTime unambiguousLocalDateTime; switch (localDateTimeMaping.Type) { case ZoneLocalMapping.ResultType.Unambiguous: unambiguousLocalDateTime = localDateTimeMaping.UnambiguousMapping; break; case ZoneLocalMapping.ResultType.Ambiguous: unambiguousLocalDateTime = localDateTimeMaping.EarlierMapping; break; case ZoneLocalMapping.ResultType.Skipped: unambiguousLocalDateTime = new ZonedDateTime( localDateTimeMaping.ZoneIntervalAfterTransition.Start, timezone); break; default: throw new InvalidOperationException(string.Format("Unexpected mapping result type: {0}", localDateTimeMaping.Type)); } var convertedDateTime = unambiguousLocalDateTime.ToInstant().ToDateTimeUtc(); Let’s break this sample down: I’m using the Noda Time ‘LocalDateTime’ object to represent the local date and time. I’ve provided the year, month, day, hour, and minute (zeros for the hour and minute here represent midnight). You can think of a ‘LocalDateTime’ as an “invalidated” date and time; there is no information available about the time zone that this date and time belong to, so Noda Time can’t make any guarantees about its ambiguity. The ‘timeZoneId’ in this sample is different than the ones above. In order to use the .NET TimeZoneInfo class we need to provide Windows time zone ids. Noda Time expects an Olson (tz / zoneinfo) time zone identifier and does not currently offer any means of mapping the Windows time zones to their Olson counterparts, though project owner Jon Skeet has said that some sort of mapping will be publicly accessible at some point in the future. I’m making use of the Noda Time ‘DateTimeZone.MapLocalDateTime’ method to disambiguate the original local date time value. This method returns an instance of the Noda Time object ‘ZoneLocalMapping’ containing information about the provided local date time maps to the provided time zone. The disambiguated local date and time value will be stored in the ‘unambiguousLocalDateTime’ variable as an instance of the Noda Time ‘ZonedDateTime’ object. An instance of this object represents a completely unambiguous point in time and is comprised of a local date and time, a time zone, and an offset from UTC. Instances of ZonedDateTime can only be created from within the Noda Time assembly (the constructor is ‘internal’) to ensure to callers that each instance represents an unambiguous point in time. The value of the ‘unambiguousLocalDateTime’ might vary depending upon the ‘ResultType’ returned by the ‘MapLocalDateTime’ method. There are three possible outcomes: If the provided local date time is unambiguous in the provided time zone I can immediately set the ‘unambiguousLocalDateTime’ variable from the ‘Unambiguous Mapping’ property of the mapping returned by the ‘MapLocalDateTime’ method. If the provided local date time is ambiguous in the provided time zone (i.e. it falls in an hour that was repeated when moving clocks backward from Daylight Savings to Standard Time), I can use the ‘EarlierMapping’ property to get the earlier of the two possible local dates to define the unambiguous local date and time that I need. I could have also opted to use the ‘LaterMapping’ property in this case, or even returned an error and asked the user to specify the proper choice. The important thing to note here is that as the programmer I’ve been forced to deal with what appears to be an ambiguous date and time. If the provided local date time represents a skipped time (i.e. it falls in an hour that was skipped when moving clocks forward from Standard Time to Daylight Savings Time), I have access to the time intervals that fell immediately before and immediately after the point in time that caused my date to be skipped. In this case I have opted to disambiguate my local date and time by moving it forward to the beginning of the interval immediately following the skipped period. Again, I could opt to use the end of the interval immediately preceding the skipped period, or raise an error depending on the needs of the application. The point of this code is to convert a local date and time to a UTC date and time for use in a SQL Server database, so the final ‘convertedDate’ variable (typed as a plain old .NET DateTime) has its value set from a Noda Time ‘Instant’. An 'Instant’ represents a number of ticks since 1970-01-01 at midnight (Unix epoch) and can easily be converted to a .NET DateTime in the UTC time zone using the ‘ToDateTimeUtc()’ method. This sample is admittedly contrived and could certainly use some refactoring, but I think it captures the general approach needed to take a local date and time and convert it to UTC with Noda Time. At first glance it might seem that Noda Time makes this “simple” code more complicated and verbose because it forces you to explicitly deal with the local date disambiguation, but I feel that the length and complexity of the Noda Time sample is proportionate to the complexity of the problem. Using TimeZoneInfo leaves you susceptible to overlooking ambiguous and skipped times that could result in run-time errors or (even worse) run-time data corruption in the form of a local date and time being adjusted to UTC incorrectly. I should point out that this research is my first look at Noda Time and I know that I’ve only scratched the surface of its full capabilities. I also think it’s safe to say that it’s still beta software for the time being so I’m not rushing out to use it production systems just yet, but I will definitely be tinkering with it more and keeping an eye on it as it progresses.

Read the article
Building the Ultimate SharePoint 2010 Development Environment

- by Manesh Karunakaran

It’s been more than a month since SharePoint 2010 RTMed. And a lot of people have downloaded and set up their very own SharePoint 2010 development rigs. And quite a few people have written blogs about setting up good development environments, there is even an MSDN article on it. Two of the blogs worth noting are from MVPs Sahil Malik and Wictor Wilén. Make sure that you check these out as well. Part of the bad side-effects of being a geek is the need to do the technical stuff the best way possible (pragmatic or otherwise), but the problem with this is that what is considered “best” is relative. Precisely the reason why you are reading this post now. Most of the posts that I read are out dated/need updations or are using the wrong OS’es or virtualization solutions (again, opinions vary) or using them the wrong way. Here’s a developer’s view of Building the Ultimate SharePoint 2010 Development Rig. If you are a sales guy, it’s time to close this window. Confusion 1: Which Host Operating System and Virtualization Solution to use? This point has been beaten to death in numerous blog posts in the past, if you have time to invest, read this excellent post by our very own SharePoint Joel on this subject. But if you are planning to build the Ultimate Development Rig, then Windows Server 2008 R2 with Hyper-V is the option that you should be looking at. I have been using this as my primary OS for about 6-7 months now, and I haven’t had any Driver issue or Application compatibility issue. In my experience all the Windows 7 drivers work fine with WIN2008 R2 also. You can enable Aero for eye candy (and the Windows 7 look and feel) and except for a few things like the Hibernation support (which a can be enabled if you really want it), Windows Server 2008 R2, is the best Workstation OS that I have used till date. But frankly the answer to this question of which OS to use depends primarily on one question - Are you willing to change your primary OS? If the answer to that is ‘Yes’, then Windows 2008 R2 with Hyper-V is the best option, if not look at vmWare or VirtualBox, both are equally good. Those who are familiar with a Virtual PC background might prefer Sun VirtualBox. Besides, these provide support for running 64 bit guest machines on 32 bit hosts if the underlying hardware is truly 64 bit. See my earlier post on this. Since we are going to make the ultimate rig, we will use Windows Server 2008 R2 with Hyper-V, for reasons mentioned above. Confusion 2: Should I use a multi-(virtual) server set up? A lot of people use multiple servers for their development environments - like Wictor Wilén is suggesting - one server hosting the Active directory, one hosting SharePoint Server and another one for SQL Server. True, this mimics the production environment the best possible way, but as somebody who has fallen for this set up earlier, I can tell you that you don’t really get anything by doing this. Microsoft has done well to ensure that if you can do it on one machine, you can do it in a farm environment as well. Besides, when you run multiple Server class machine instances in parallel, there are a lot of unwanted processor cycles wasted for no good use. In my personal experience, as somebody who needs to switch between MOSS 2007/SharePoint 2010 environments from time to time, the best possible solution is to Make the host Windows Server 2008 R2 machine your Domain Controller (AD Server) Make all your Virtual Guest OS’es join this domain. Have each Individual Guest OS Image have it’s own local SQL Server instance. The advantages are that you can reuse the users and groups in each of the Guest operating systems, you can manage the users in one place, AD is light weight and doesn't take too much resources on your host machine and also having separate SQL instances for each of the Development images gives you maximum flexibility in terms of configuration, for example your SharePoint rigs can have simpler DB configurations, compared to your MS BI blast pits. Confusion 3: Which Operating System should I use to run SharePoint 2010 Now that’s a no brainer. Use Windows 2008 R2 as your Guest OS. When you are building the ultimate rig, why compromise? If you are planning to run Windows Server 2008 as your Guest OS, there are a few patches that you need to install at different times during the installation, for that follow the steps mentioned here Okay now that we have made our choices, let’s get to the interesting part of building the rig, Step 1: Prepare the host machine – Install Windows Server 2008 R2 Install Windows Server 2008 R2 on your best Desktop/Laptop. If you have read this far, I am quite sure that you are somebody who can install an OS on your own, so go ahead and do that. Make sure that you run the compatibility wizard before you go ahead and nuke your current OS. There are plenty of blogs telling you how to make a good Windows 2008 R2 Workstation that feels and behaves like a Windows 7 machine, follow one and once you are done, head to Step 2. Step 2: Configure the host machine as a Domain Controller Before we begin this, let me tell you, this step is completely optional, you don’t really need to do this, you can simply use the local users on the Guest machines instead, but if this is a much cleaner approach to manage users and groups if you run multiple guest operating systems. This post neatly explains how to configure your Windows Server 2008 R2 host machine as a Domain Controller. Follow those simple steps and you are good to go. If you are not able to get it to work, try this. Step 3: Prepare the guest machine – Install Windows Server 2008 R2 Open Hyper-V Manager Choose to Create a new Guest Operating system Allocate at least 2 GB of Memory to the Guest OS Choose the Windows 2008 R2 Installation Media Start the Virtual Machine to commence installation. Once the Installation is done, Activate the OS. Step 4: Make the Guest operating systems Join the Domain This step is quite simple, just follow these steps below, Fire up Hyper-V Manager, open your Guest OS Click on Start, and Right click on ‘Computer’ and choose ‘Properties’ On the window that pops-up, click on ‘Change Settings’ On the ‘System Properties’ Window that comes up, Click on the ‘Change’ button Now a window named ‘Computer Name/Domain Changes’ opens up, In the text box titled Domain, type in the Domain name from Step 2. Click Ok and windows will show you the welcome to domain message and ask you to restart the machine, click OK to restart. If the addition to domain fails, that means that you have not set up networking in Hyper-V for the Guest OS to communicate with the Host. To enable it, follow the steps I had mentioned in this post earlier. Step 5: Install SQL Server 2008 R2 on the Guest Machine SQL Server 2008 R2 gets installed with out hassle on Windows Server 2008 R2. SQL Server 2008 needs SP2 to work properly on WIN2008 R2. Also SQL Server 2008 R2 allows you to directly add PowerPivot support to SharePoint. Choose to install in SharePoint Integrated Mode in Reporting Server Configuration. Step 6: Install KB971831 and SharePoint 2010 Pre-requisites Now install the WCF Hotfix for Microsoft Windows (KB971831) from this location, and SharePoint 2010 Pre-requisites from the SP2010 Installation media. Step 7: Install and Configure SharePoint 2010 Install SharePoint 2010 from the installation media, after the installation is complete, you are prompted to start the SharePoint Products and Technologies Configuration Wizard. If you are using a local instance of Microsoft SQL Server 2008, install the Microsoft SQL Server 2008 KB 970315 x64 before starting the wizard. If your development environment uses a remote instance of Microsoft SQL Server 2008 or if it has a pre-existing installation of Microsoft SQL Server 2008 on which KB 970315 x64 has already been applied, this step is not necessary. With the wizard open, do the following: Install SQL Server 2008 KB 970315 x64. After the Microsoft SQL Server 2008 KB 970315 x64 installation is finished, complete the wizard. Alternatively, you can choose not to run the wizard by clearing the SharePoint Products and Technologies Configuration Wizard check box and closing the completed installation dialog box. Install SQL Server 2008 KB 970315 x64, and then manually start the SharePoint Products and Technologies Configuration Wizard by opening a Command Prompt window and executing the following command: C:\Program Files\Common Files\Microsoft Shared Debug\Web Server Extensions\14\BIN\psconfigui.exe The SharePoint Products and Technologies Configuration Wizard may fail if you are using a computer that is joined to a domain but that is not connected to a domain controller. Step 8: Install Visual Studio 2010 and SharePoint 2010 SDK Install Visual Studio 2010 Download and Install the Microsoft SharePoint 2010 SDK Step 9: Install PowerPivot for SharePoint and Configure Reporting Services Pop-In the SQLServer 2008 R2 installation media once again and install PowerPivot for SharePoint. This will get added as another instance named POWERPIVOT. Configure Reporting Services by following the steps mentioned here, if you need to get down to the details on how the integration between SharePoint 2010 and SQL Server 2008 R2 works, see Working Together: SQL Server 2008 R2 Reporting Services Integration in SharePoint 2010 an excellent article by Alan Le Marquand Step 10: Download and Install Sample Databases for Microsoft SQL Server 2008R2 SharePoint 2010 comes with a lot of cool stuff like PerformancePoint Services and BCS, if you need to try these out, you need to have data in your databases. So if you want to save yourself the trouble of creating sample data for your PerformancePoint and BCS experiments, download and install Sample Databases for Microsoft SQL Server 2008R2 from CodePlex. And you are done! Fire up your Visual Studio 2010 and Start Coding away!!

Read the article
Performance Optimization – It Is Faster When You Can Measure It

- by Alois Kraus

Performance optimization in bigger systems is hard because the measured numbers can vary greatly depending on the measurement method of your choice. To measure execution timing of specific methods in your application you usually use Time Measurement Method Potential Pitfalls Stopwatch Most accurate method on recent processors. Internally it uses the RDTSC instruction. Since the counter is processor specific you can get greatly different values when your thread is scheduled to another core or the core goes into a power saving mode. But things do change luckily: Intel's Designer's vol3b, section 16.11.1 "16.11.1 Invariant TSC The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource." DateTime.Now Good but it has only a resolution of 16ms which can be not enough if you want more accuracy. Reporting Method Potential Pitfalls Console.WriteLine Ok if not called too often. Debug.Print Are you really measuring performance with Debug Builds? Shame on you. Trace.WriteLine Better but you need to plug in some good output listener like a trace file. But be aware that the first time you call this method it will read your app.config and deserialize your system.diagnostics section which does also take time. In general it is a good idea to use some tracing library which does measure the timing for you and you only need to decorate some methods with tracing so you can later verify if something has changed for the better or worse. In my previous article I did compare measuring performance with quantum mechanics. This analogy does work surprising well. When you measure a quantum system there is a lower limit how accurately you can measure something. The Heisenberg uncertainty relation does tell us that you cannot measure of a quantum system the impulse and location of a particle at the same time with infinite accuracy. For programmers the two variables are execution time and memory allocations. If you try to measure the timings of all methods in your application you will need to store them somewhere. The fastest storage space besides the CPU cache is the memory. But if your timing values do consume all available memory there is no memory left for the actual application to run. On the other hand if you try to record all memory allocations of your application you will also need to store the data somewhere. This will cost you memory and execution time. These constraints are always there and regardless how good the marketing of tool vendors for performance and memory profilers are: Any measurement will disturb the system in a non predictable way. Commercial tool vendors will tell you they do calculate this overhead and subtract it from the measured values to give you the most accurate values but in reality it is not entirely true. After falling into the trap to trust the profiler timings several times I have got into the habit to Measure with a profiler to get an idea where potential bottlenecks are. Measure again with tracing only the specific methods to check if this method is really worth optimizing. Optimize it Measure again. Be surprised that your optimization has made things worse. Think harder Implement something that really works. Measure again Finished! - Or look for the next bottleneck. Recently I have looked into issues with serialization performance. For serialization DataContractSerializer was used and I was not sure if XML is really the most optimal wire format. After looking around I have found protobuf-net which uses Googles Protocol Buffer format which is a compact binary serialization format. What is good for Google should be good for us. A small sample app to check out performance was a matter of minutes: using ProtoBuf; using System; using System.Diagnostics; using System.IO; using System.Reflection; using System.Runtime.Serialization; [DataContract, Serializable] class Data { [DataMember(Order=1)] public int IntValue { get; set; } [DataMember(Order = 2)] public string StringValue { get; set; } [DataMember(Order = 3)] public bool IsActivated { get; set; } [DataMember(Order = 4)] public BindingFlags Flags { get; set; } } class Program { static MemoryStream _Stream = new MemoryStream(); static MemoryStream Stream { get { _Stream.Position = 0; _Stream.SetLength(0); return _Stream; } } static void Main(string[] args) { DataContractSerializer ser = new DataContractSerializer(typeof(Data)); Data data = new Data { IntValue = 100, IsActivated = true, StringValue = "Hi this is a small string value to check if serialization does work as expected" }; var sw = Stopwatch.StartNew(); int Runs = 1000 * 1000; for (int i = 0; i < Runs; i++) { //ser.WriteObject(Stream, data); Serializer.Serialize<Data>(Stream, data); } sw.Stop(); Console.WriteLine("Did take {0:N0}ms for {1:N0} objects", sw.Elapsed.TotalMilliseconds, Runs); Console.ReadLine(); } } The results are indeed promising: Serializer Time in ms N objects protobuf-net 807 1000000 DataContract 4402 1000000 Nearly a factor 5 faster and a much more compact wire format. Lets use it! After switching over to protbuf-net the transfered wire data has dropped by a factor two (good) and the performance has worsened by nearly a factor two. How is that possible? We have measured it? Protobuf-net is much faster! As it turns out protobuf-net is faster but it has a cost: For the first time a type is de/serialized it does use some very smart code-gen which does not come for free. Lets try to measure this one by setting of our performance test app the Runs value not to one million but to 1. Serializer Time in ms N objects protobuf-net 85 1 DataContract 24 1 The code-gen overhead is significant and can take up to 200ms for more complex types. The break even point where the code-gen cost is amortized by its faster serialization performance is (assuming small objects) somewhere between 20.000-40.000 serialized objects. As it turned out my specific scenario involved about 100 types and 1000 serializations in total. That explains why the good old DataContractSerializer is not so easy to take out of business. The final approach I ended up was to reduce the number of types and to serialize primitive types via BinaryWriter directly which turned out to be a pretty good alternative. It sounded good until I measured again and found that my optimizations so far do not help much. After looking more deeper at the profiling data I did found that one of the 1000 calls did take 50% of the time. So how do I find out which call it was? Normal profilers do fail short at this discipline. A (totally undeserved) relatively unknown profiler is SpeedTrace which does unlike normal profilers create traces of your applications by instrumenting your IL code at runtime. This way you can look at the full call stack of the one slow serializer call to find out if this stack was something special. Unfortunately the call stack showed nothing special. But luckily I have my own tracing as well and I could see that the slow serializer call did happen during the serialization of a bool value. When you encounter after much analysis something unreasonable you cannot explain it then the chances are good that your thread was suspended by the garbage collector. If there is a problem with excessive GCs remains to be investigated but so far the serialization performance seems to be mostly ok. When you do profile a complex system with many interconnected processes you can never be sure that the timings you just did measure are accurate at all. Some process might be hitting the disc slowing things down for all other processes for some seconds as well. There is a big difference between warm and cold startup. If you restart all processes you can basically forget the first run because of the OS disc cache, JIT and GCs make the measured timings very flexible. When you are in need of a random number generator you should measure cold startup times of a sufficiently complex system. After the first run you can try again getting different and much lower numbers. Now try again at least two times to get some feeling how stable the numbers are. Oh and try to do the same thing the next day. It might be that the bottleneck you found yesterday is gone today. Thanks to GC and other random stuff it can become pretty hard to find stuff worth optimizing if no big bottlenecks except bloatloads of code are left anymore. When I have found a spot worth optimizing I do make the code changes and do measure again to check if something has changed. If it has got slower and I am certain that my change should have made it faster I can blame the GC again. The thing is that if you optimize stuff and you allocate less objects the GC times will shift to some other location. If you are unlucky it will make your faster working code slower because you see now GCs at times where none were before. This is where the stuff does get really tricky. A safe escape hatch is to create a repro of the slow code in an isolated application so you can change things fast in a reliable manner. Then the normal profilers do also start working again. As Vance Morrison does point out it is much more complex to profile a system against the wall clock compared to optimize for CPU time. The reason is that for wall clock time analysis you need to understand how your system does work and which threads (if you have not one but perhaps 20) are causing a visible delay to the end user and which threads can wait a long time without affecting the user experience at all. Next time: Commercial profiler shootout.

Read the article
Using Table-Valued Parameters With SQL Server Reporting Services

- by Jesse

In my last post I talked about using table-valued parameters to pass a list of integer values to a stored procedure without resorting to using comma-delimited strings and parsing out each value into a TABLE variable. In this post I’ll extend the “Customer Transaction Summary” report example to see how we might leverage this same stored procedure from within an SQL Server Reporting Services (SSRS) report. I’ve worked with SSRS off and on for the past several years and have generally found it to be a very useful tool for building nice-looking reports for end users quickly and easily. That said, I’ve been frustrated by SSRS from time to time when seemingly simple things are difficult to accomplish or simply not supported at all. I thought that using table-valued parameters from within a SSRS report would be simple, but unfortunately I was wrong. Customer Transaction Summary Example Let’s take the “Customer Transaction Summary” report example from the last post and try to plug that same stored procedure into an SSRS report. Our report will have three parameters: Start Date – beginning of the date range for which the report will summarize customer transactions End Date – end of the date range for which the report will summarize customer transactions Customer Ids – One or more customer Ids representing the customers that will be included in the report The simplest way to get started with this report will be to create a new dataset and point it at our Customer Transaction Summary report stored procedure (note that I’m using SSRS 2012 in the screenshots below, but there should be little to no difference with SSRS 2008): When you initially create this dataset the SSRS designer will try to invoke the stored procedure to determine what the parameters and output fields are for you automatically. As part of this process the following dialog pops-up: Obviously I can’t use this dialog to specify a value for the ‘@customerIds’ parameter since it is of the IntegerListTableType user-defined type that we created in the last post. Unfortunately this really throws the SSRS designer for a loop, and regardless of what combination of Data Type, Pass Null Value, or Parameter Value I used here, I kept getting this error dialog with the message, "Operand type clash: nvarchar is incompatible with IntegerListTableType". This error message makes some sense considering that the nvarchar type is indeed incompatible with the IntegerListTableType, but there’s little clue given as to how to remedy the situation. I don’t know for sure, but I think that behind-the-scenes the SSRS designer is trying to give the @customerIds parameter an nvarchar-typed SqlParameter which is causing the issue. When I first saw this error I figured that this might just be a limitation of the dataset designer and that I’d be able to work around the issue by manually defining the parameters. I know that there are some special steps that need to be taken when invoking a stored procedure with a table-valued parameter from ADO .NET, so I figured that I might be able to use some custom code embedded in the report to create a SqlParameter instance with the needed properties and value to make this work, but the “Operand type clash" error message persisted. The Text Query Approach Just because we’re using a stored procedure to create the dataset for this report doesn’t mean that we can’t use the ‘Text’ Query Type option and construct an EXEC statement that will invoke the stored procedure. In order for this to work properly the EXEC statement will also need to declare and populate an IntegerListTableType variable to pass into the stored procedure. Before I go any further I want to make one point clear: this is a really ugly hack and it makes me cringe to do it. Simply put, I strongly feel that it should not be this difficult to use a table-valued parameter with SSRS. With that said, let’s take a look at what we’ll have to do to make this work. Manually Define Parameters First, we’ll need to manually define the parameters for report by right-clicking on the ‘Parameters’ folder in the ‘Report Data’ window. We’ll need to define the ‘@startDate’ and ‘@endDate’ as simple date parameters. We’ll also create a parameter called ‘@customerIds’ that will be a mutli-valued Integer parameter: In the ‘Available Values’ tab we’ll point this parameter at a simple dataset that just returns the CustomerId and CustomerName of each row in the Customers table of the database or manually define a handful of Customer Id values to make available when the report runs. Once we have these parameters properly defined we can take another crack at creating the dataset that will invoke the ‘rpt_CustomerTransactionSummary’ stored procedure. This time we’ll choose the ‘Text’ query type option and put the following into the ‘Query’ text area: 1: exec('declare @customerIdList IntegerListTableType ' + @customerIdInserts + 2: ' EXEC rpt_CustomerTransactionSummary 3: @startDate=''' + @startDate + ''', 4: @endDate='''+ @endDate + ''', 5: @customerIds=@customerIdList') By using the ‘Text’ query type we can enter any arbitrary SQL that we we want to and then use parameters and string concatenation to inject pieces of that query at run time. It can be a bit tricky to parse this out at first glance, but from the SSRS designer’s point of view this query defines three parameters: @customerIdInserts – This will be a Text parameter that we use to define INSERT statements that will populate the @customerIdList variable that is being declared in the SQL. This parameter won’t actually ever get passed into the stored procedure. I’ll go into how this will work in a bit. @startDate – This is a simple date parameter that will get passed through directly into the @startDate parameter of the stored procedure on line 3. @endDate – This is another simple data parameter that will get passed through into the @endDate parameter of the stored procedure on line 4. At this point the dataset designer will be able to correctly parse the query and should even be able to detect the fields that the stored procedure will return without needing to specify any values for query when prompted to. Once the dataset has been correctly defined we’ll have a @customerIdInserts parameter listed in the ‘Parameters’ tab of the dataset designer. We need to define an expression for this parameter that will take the values selected by the user for the ‘@customerIds’ parameter that we defined earlier and convert them into INSERT statements that will populate the @customerIdList variable that we defined in our Text query. In order to do this we’ll need to add some custom code to our report using the ‘Report Properties’ dialog: Any custom code defined in the Report Properties dialog gets embedded into the .rdl of the report itself and (unfortunately) must be written in VB .NET. Note that you can also add references to custom .NET assemblies (which could be written in any language), but that’s outside the scope of this post so we’ll stick with the “quick and dirty” VB .NET approach for now. Here’s the VB .NET code (note that any embedded code that you add here must be defined in a static/shared function, though you can define as many functions as you want): 1: Public Shared Function BuildIntegerListInserts(ByVal variableName As String, ByVal paramValues As Object()) As String 2: Dim insertStatements As New System.Text.StringBuilder() 3: For Each paramValue As Object In paramValues 4: insertStatements.AppendLine(String.Format("INSERT {0} VALUES ({1})", variableName, paramValue)) 5: Next 6: Return insertStatements.ToString() 7: End Function This method takes a variable name and an array of objects. We use an array of objects here because that is how SSRS will pass us the values that were selected by the user at run-time. The method uses a StringBuilder to construct INSERT statements that will insert each value from the object array into the provided variable name. Once this method has been defined in the custom code for the report we can go back into the dataset designer’s Parameters tab and update the expression for the ‘@customerIdInserts’ parameter by clicking on the button with the “function” symbol that appears to the right of the parameter value. We’ll set the expression to: 1: =Code.BuildIntegerListInserts("@customerIdList ", Parameters!customerIds.Value) In order to invoke our custom code method we simply need to invoke “Code.<method name>” and pass in any needed parameters. The first parameter needs to match the name of the IntegerListTableType variable that we used in the EXEC statement of our query. The second parameter will come from the Value property of the ‘@customerIds’ parameter (this evaluates to an object array at run time). Finally, we’ll need to edit the properties of the ‘@customerIdInserts’ parameter on the report to mark it as a nullable internal parameter so that users aren’t prompted to provide a value for it when running the report. Limitations And Final Thoughts When I first started looking into the text query approach described above I wondered if there might be an upper limit to the size of the string that can be used to run a report. Obviously, the size of the actual query could increase pretty dramatically if you have a parameter that has a lot of potential values or you need to support several different table-valued parameters in the same query. I tested the example Customer Transaction Summary report with 1000 selected customers without any issue, but your mileage may vary depending on how much data you might need to pass into your query. If you think that the text query hack is a lot of work just to use a table-valued parameter, I agree! I think that it should be a lot easier than this to use a table-valued parameter from within SSRS, but so far I haven’t found a better way. It might be possible to create some custom .NET code that could build the EXEC statement for a given set of parameters automatically, but exploring that will have to wait for another post. For now, unless there’s a really compelling reason or requirement to use table-valued parameters from SSRS reports I would probably stick with the tried and true “join-multi-valued-parameter-to-CSV-and-split-in-the-query” approach for using mutli-valued parameters in a stored procedure.

Read the article
Windows Azure Evolution – TFS Integration (WAWS Part 2)

- by Shaun

So this is the fourth blog post about the new features of Windows Azure and the second part of Windows Azure Web Sites. But this is not just focus on the WAWS since the function I’m going to introduce is available in both Windows Azure Web Sites and Windows Azure Cloud Service (a.k.a. hosted service). In the previous post I talked about the Windows Azure Web Sites and how to use its gallery to build a WordPress personal blog without coding. Besides the gallery we can create an empty web site and upload our website from vary approaches. And one of the highlighted feature here is that, we can make our web site integrated with a source control service, such as TFS and Git, so that it will be deployed automatically once a new commit or build available. Create New Empty Web Site In the developer portal when creating a new web site, we can select QUICK CREATE item. This will create an empty web site with only one shared instance without any database associated. Let’s specify the URL, region and subscription and click OK. After a few seconds our website will be ready. And now we can click the BROWSE button to open this empty website. As you can see there is a welcome page available in my website even thought I didn’t upload or deploy anything. This means even though the website will be charged even before anything was deployed, similar as the cloud service (hosted service). It is because once we created a website, Windows Azure platform had arranged a hosting process (w3wp.exe) in the group of virtual machines. Create Project in TFS Preview Service and Setup Link Currently the Windows Azure Web Sites can integrate with TFS and Git as its deployment source, and it only support the Microsoft TFS Preview Service for now. I will not deep into how to use the TFS preview service in this post but once we click into the website we had just created and then clicked the “Set up TFS publishing”, there will be a dialog helping us to connect to this service. If you don’t have an account you can click the link shown below to request one. Assuming we have already had an account of TFS service then we need to create a new project firstly. Go to your TFS service website and create a new project, giving the project name, description and the process template. Then, back to the developer portal and clicked the “Set up TFS publishing” link. In the popping up window I will provide my TFS service URL and click the “Authorize now” link. Click “Accept” button to allow my windows azure to connect to my TFS service. Then it will be back to the developer portal and list all projects in my account. Just select the one I had just created and click OK. Then our website is linking to the TFS project I specified and finally it will show similar like this below. This means the web site had been linked to the TFS successfully. Work with TFS Preview Service in VS2010 In the figure above there are some links to guide us how to connect to the TFS server through Visual Studio 2010 and 2012 RC. If you are using Visual Studio 2012 RC, you don’t need any extension. But if you are using Visual Studio 2010 you must have SP1 and KB2581206 installed. To connect to my TFS service just open the Visual Studio and in the Team Explorer, we can add a new TFS server and paste the URL of my TFS service from the developer portal. And select the project I had just created, then it will be listed in my Team Explorer. Now let’s start to build our website. Since the website we are going to build will be deployed to WAWS, it’s NOT a cloud service, NOT a web role. So in this case we need to create a normal ASP.NET web application. For example, an ASP.NET MVC 3 web application. Next, right click on the solution and select “Add Solution to Source Control”, select the project I had just created. Then check my code in. Once the check-in finished we can see that there is a build running in the TFS server. And if we back to the developer portal, we will see in our web site deployment page there’s a deployment running. In fact, once we linked our web site to our TFS then it will create a new build definition in our TFS project. It will be triggered by each check-in and deploy to the web site we linked automatically. So that when our code had been compiled it will be published to our web site from our TFS server. Once the build and deployment finished we can see it’s now active on our developer portal. Now we can see the web site that created from my Visual Studio and deployed by my TFS. Continue Deployment through VS and TFS A big benefit when using TFS publishing is the continue deployment. Now if I changed some code in my Visual Studio, for example update some text on the home page and check in my changes, then it will trigger an new build and deploy to my WAWS automatically. And even more, if we wanted to rollback to a previous version we can just select an existing deployment listed in the portal and click REDEPLOY at the bottom. Q&A: Can Web Site use Storage work with a Worker Role? Stacy asked a question in my previous post, which was “can a web site use Windows Azure Storage and furthermore working with a worker role”. Since the web site is deployed on the windows azure virtual machines in data center, it must be able to use all windows azure features such as the storage, SQL databases, CDN, etc.. But since when using web site we normally have a standard ASP.NET web application, PHP website or NodeJS, the windows azure SDK was not referenced by default. But we can add them by ourselves. In our sample project let’s right click on my MVC project and clicked the “Manage NuGet packages”. And in the dialog I will search windows azure packages and select the “Windows Azure Storage” to install. Then we will have the assemblies to access windows azure storage such as tables, queues and blobs. Since I have a storage account already, let’s have a quick demo, just to list all blobs in a container. The code would be like this. 1: using System; 2: using System.Collections.Generic; 3: using System.Linq; 4: using System.Web; 5: using System.Web.Mvc; 6: using Microsoft.WindowsAzure; 7: using Microsoft.WindowsAzure.StorageClient; 8: 9: namespace WAASTFSDemo.Controllers 10: { 11: public class HomeController : Controller 12: { 13: public ActionResult Index() 14: { 15: ViewBag.Message = "Welcome to Windows Azure!"; 16: 17: var credentials = new StorageCredentialsAccountAndKey("[STORAGE_ACCOUNT]", "[STORAGE_KEY]"); 18: var account = new CloudStorageAccount(credentials, false); 19: var client = account.CreateCloudBlobClient(); 20: var container = client.GetContainerReference("shared"); 21: ViewBag.Blobs = container.ListBlobs().Select(b => b.Uri.AbsoluteUri); 22: 23: return View(); 24: } 25: 26: public ActionResult About() 27: { 28: return View(); 29: } 30: } 31: } 1: @{ 2: ViewBag.Title = "Home Page"; 3: } 4: 5: <h2>@ViewBag.Message</h2> 6: <p> 7: To learn more about ASP.NET MVC visit <a href="http://asp.net/mvc" title="ASP.NET MVC Website">http://asp.net/mvc</a>. 8: </p> 9: <div> 10: <ul> 11: @foreach (var blob in ViewBag.Blobs) 12: { 13: <li>@blob</li> 14: } 15: </ul> 16: </div> And then just check in the code, it will be deployed to my web site. Finally we can see the blobs in my storage. This is just an example but it proves that web sites can connect to storage, table, blob and queue as well. So the answer to Stacy should be “yes”. The web site can use queue storage to work with worker role. Summary In this post I demonstrated how to integrate with TFS from Windows Azure Web Sites. You can see our website can be built, uploaded and deployed automatically by TFS service. All we need to do is to provide the TFS name and select the project. Not only the Windows Azure Web Site, in this upgrade the Windows Azure Cloud Services (hosted service) can be published through TFS as well. Very similar as what we have shown below. But currently, only Microsoft TFS Service Preview can be integrated with Windows Azure. But I think in the future we can link the TFS in our enterprise and some 3rd party TFS such as CodePlex to Windows Azure. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article
Investigation: Can different combinations of components effect Dataflow performance?

- by jamiet

Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation! The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test: One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category CSPL Split by Month of Birth and Income Category Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category CSPL Split by Month of Birth 1& 2 Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

Read the article
Python Mechanize unable to avoid redirect when Post

- by Enric Geijo

I am trying to crawl a site using mechanize. The site provides search results in different pages. When posting to get the next set of results, something is wrong and the server redirects me to the first page, asking mechanize to update the SearchSession Cookie. I have been debugging the requests using Firefox and they look quite the same, and I am unable to find the problem. Any suggestion? Below the requests: ----------- FIRST THE RIGHT SEQUENCE, USING TAMPER IN FIREFOX ------------------------- POST XXX/JobSearch/Results.aspx?Keywords=Python&LTxt=London%2c+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London Load Flags[LOAD_DOCUMENT_URI LOAD_INITIAL_DOCUMENT_URI ] Content Size[-1] Mime Type[text/html] Request Headers: Host[www.cwjobs.co.uk] User-Agent[Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100401 Ubuntu/9.10 (karmic) Firefox/3.5.9] Accept[text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8] Accept-Language[en-us,en;q=0.5] Accept-Encoding[gzip,deflate] Accept-Charset[ISO-8859-1,utf-8;q=0.7,*;q=0.7] Keep-Alive[300] Connection[keep-alive] Referer[XXX/JobSearch/Results.aspx?Keywords=Python&LTxt=London%2c+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London] Cookie[ecos=774803468-0; AnonymousUser=MemberId=acc079dd-66b6-4081-9b07-60d6955ee8bf&IsAnonymous=True; PJBIPPOPUP=; WT_FPC=id=86.181.183.106-2262469600.30073025:lv=1272812851736:ss=1272812789362; SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801&LogSource=NAT] Post Data: __EVENTTARGET[srpPager%24btnForward] __EVENTARGUMENT[] hdnSearchResults[BV%2CA%2CC0P5x%2COou-%2CB4S-%2CBuC-%2CDzx-%2CHwn-%2CKPP-%2CIVA-%2CC9D-%2CH6X-%2CH7x-%2CJ0x-%2CCvX-%2CCra-%2COHa-%2CHhP-%2CCoj-%2CBlM-%2CE9W-%2CIm8-%2CBqG-%2CPFy-%2CN%2Fm-%2Ceaa%2CCvj-%2CCtJ-%2CCr7-%2CBpu-%2Cmh%2CMb6-%2CJ%2Fk-%2CHY8-%2COJ7-%2CNtF-%2CEya-%2CErT-%2CEo4-%2CEKU-%2CDnL-%2CC5M-%2CCyB-%2CBsD-%2CBrc-%2CBpU-%2Col%2C30%2CC1%2Cd4N%2COo8-%2COi0-%2CLz%2F-%2CLxP-%2CFyp-%2CFVR-%2CEHL-%2CPrP-%2CLmE-%2CK3H-%2CKXJ-%2CFyn%2CIcq-%2CIco-%2CIK4-%2CIIg-%2CH2k-%2CH0N-%2CHwp-%2CHvF-%2CFij-%2CFhl-%2CCwj-%2CCb5-%2CCQj-%2CCQh-%2CB%2B2-%2CBc6-%2ChFo%2CNLq-%2CNI%2F-%2CFzM-%2Cdu-%2CHg2-%2CBug-%2CBse-%2CB9Q-] __VIEWSTATE[%2FwEPDwUKLTkyMzI2ODA4Ng9kFgYCBA8WBB4EaHJlZgWJAWh0dHA6Ly93d3cuY3dqb2JzLmNvLnVrL0pvYlNlYXJjaC9SU1MuYXNweD9LZXl3b3Jkcz1QeXRob24mTFR4dD1Mb25kb24lMmMrU291dGgrRWFzdCZSYWRpdXM9MCZMSWRzMj1aViZjbGlkPTE2MjEmY2x0eXBlaWQ9MiZjbE5hbWU9TG9uZG9uHgV0aXRsZQUkTGF0ZXN0IFB5dGhvbiBqb2JzIGZyb20gQ1dKb2JzLmNvLnVrZAIGDxYCHgRUZXh0BV48bGluayByZWw9ImNhbm9uaWNhbCIgaHJlZj0iaHR0cDovL3d3dy5jd2pvYnMuY28udWsvSm9iU2Vla2luZy9QeXRob25fTG9uZG9uX2wxNjIxX3QyLmh0bWwiIC8%2BZAIIEGRkFg4CBw8WAh8CBV9Zb3VyIHNlYXJjaCBvbiA8Yj5LZXl3b3JkczogUHl0aG9uOyBMb2NhdGlvbjogTG9uZG9uLCBTb3V0aCBFYXN0OyA8L2I%2BIHJldHVybmVkIDxiPjg1PC9iPiBqb2JzLmQCCQ8WAh4HVmlzaWJsZWhkAgsPFgIfAgUoVGhlIG1vc3QgcmVsZXZhbnQgam9icyBhcmUgbGlzdGVkIGZpcnN0LmQCEw8PFgIeC05hdmlnYXRlVXJsBQF%2BZGQCFQ9kFgYCBQ8PFgYfAgUGUHl0aG9uHgtEZWZhdWx0VGV4dAUMZS5nLiBhbmFseXN0HhNEZWZhdWx0VGV4dENzc0NsYXNzZWRkAgsPDxYGHwIFEkxvbmRvbiwgU291dGggRWFzdB8FBQllLmcuIEJhdGgfBmVkZAIRDxAPFgYeDURhdGFUZXh0RmllbGQFClJhZGl1c05hbWUeDkRhdGFWYWx1ZUZpZWxkBQZSYWRpdXMeC18hRGF0YUJvdW5kZ2QQFREHMCBtaWxlcwcyIG1pbGVzBzUgbWlsZXMIMTAgbWlsZXMIMTUgbWlsZXMIMjAgbWlsZXMIMjUgbWlsZXMIMzAgbWlsZXMIMzUgbWlsZXMINDAgbWlsZXMINDUgbWlsZXMINTAgbWlsZXMINjAgbWlsZXMINzAgbWlsZXMIODAgbWlsZXMIOTAgbWlsZXMJMTAwIG1pbGVzFREBMAEyATUCMTACMTUCMjACMjUCMzACMzUCNDACNDUCNTACNjACNzACODACOTADMTAwFCsDEWdnZ2dnZ2dnZ2dnZ2dnZ2dnZGQCFw9kFgQCAQ9kFgQCBA8QZA8WA2YCAQICFgMQBQhBbGwgam9icwUBMGcQBRlEaXJlY3QgZW1wbG95ZXIgam9icyBvbmx5BQEyZxAFEEFnZW5jeSBqb2JzIG9ubHkFATFnZGQCBg8QZA8WA2YCAQICFgMQBQlSZWxldmFuY2UFATFnEAUERGF0ZQUBMmcQBQZTYWxhcnkFATNnZGQCBQ8PFgYeClBhZ2VOdW1iZXICAh4PTnVtYmVyT2ZSZXN1bHRzAlUeDlJlc3VsdHNQZXJQYWdlAhRkZAIZDxYCHwNoZGQ%3D] Refinesearch%24txtKeywords[Python] Refinesearch%24txtLocation[London%2C+South+East] Refinesearch%24ddlRadius[0] ddlCompanyType[0] ddlSort[1] Response Headers: Cache-Control[private] Date[Sun, 02 May 2010 16:09:27 GMT] Content-Type[text/html; charset=utf-8] Expires[Sat, 02 May 2009 16:09:27 GMT] Server[Microsoft-IIS/6.0] X-SiteConHost[P310] X-Powered-By[ASP.NET] X-AspNet-Version[2.0.50727] Set-Cookie[SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801&LogSource=NAT; path=/] Content-Encoding[gzip] Vary[Accept-Encoding] Transfer-Encoding[chunked] -------- NOW WHAT I'AM SENDING USING MECHANIZE, SOME HEADERS ADDED, ETC ----------- POST /JobSearch/Results.aspx?Keywords=Python&LTxt=London%2c+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London HTTP/1.1\r\nContent-Length: 2424\r\n Accept-Language: en-us,en;q=0.5\r\n Accept-Encoding: gzip\r\n Host: www.cwjobs.co.uk\r\n Accept: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8\r\n Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n Connection: keep-alive\r\n Cookie: AnonymousUser=MemberId=8fa5ddd7-17ed-425e-b189-82693bfbaa0c&IsAnonymous=True; SearchSession=SessionGuid=33e4e439-c2d6-423f-900f-574099310d5a&LogSource=NAT\r\n Referer: XXX/JobSearch/Results.aspx?Keywords=Python&LTxt=London%2c+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London\r\n Content-Type: application/x-www-form-urlencoded\r\n\r\n' '__EVENTTARGET=srpPager%24btnForward& __EVENTARGUMENT=& hdnSearchResults=BV%2CA%2CC0eif%2CMwc%2CM6s%2COou%2CK09%2CG4H%2CEZf%2CGTu%2CLrr%2CGuX%2CGs9%2CEz9%2CL5X%2CL9U%2ChU%2CHHf%2CMAL%2CNDi%2CJrY%2CGBy%2CM%2Bo%2CdE-%2CpI%2CtDI%2CL5L%2CL7l%2CL8z%2CM%2FA%2CPPP%2CCM0%2CEpK%2CHPy%2Cez%2C7p%2CJ2U%2CJ9b%2CJ%2F2%2CKea%2CLBj%2CLvi%2CL2t%2CM8r%2CM9S%2CM%2Fa%2CPRT%2CPgi%2Csg7%2CF6%2CI2F%2CJTd%2CO-%2CC0v%2CC3f%2CDCq%2CDxn%2CERl%2CUbV%2CGME%2CGMG%2CGd2%2CGgO%2CGyK%2CG0h%2CG4F%2CG5p%2CJGL%2CJHJ%2CKhj%2CL4L%2CMM1%2CMYL%2CMYN%2CMp4%2CNL0%2COrj%2CvuW%2CBdE%2CBfv%2CI1i%2CBCh-%2COLA%2CHH4%2CM6O%2CM8Q%2CMre& __VIEWSTATE=%2FwEPDwUKLTkyMzI2ODA4Ng9kFgYCBA8WBB4EaHJlZgWJAWh0dHA6Ly93d3cuY3dqb2JzLmNvLnVrL0pvYlNlYXJjaC9SU1MuYXNweD9LZXl3b3Jkcz1QeXRob24mTFR4dD1Mb25kb24lMmMrU291dGgrRWFzdCZSYWRpdXM9MCZMSWRzMj1aViZjbGlkPTE2MjEmY2x0eXBlaWQ9MiZjbE5hbWU9TG9uZG9uHgV0aXRsZQUkTGF0ZXN0IFB5dGhvbiBqb2JzIGZyb20gQ1dKb2JzLmNvLnVrZAIGDxYCHgRUZXh0BV48bGluayByZWw9ImNhbm9uaWNhbCIgaHJlZj0iaHR0cDovL3d3dy5jd2pvYnMuY28udWsvSm9iU2Vla2luZy9QeXRob25fTG9uZG9uX2wxNjIxX3QyLmh0bWwiIC8%2BZAIIEGRkFg4CBw8WAh8CBV9Zb3VyIHNlYXJjaCBvbiA8Yj5LZXl3b3JkczogUHl0aG9uOyBMb2NhdGlvbjogTG9uZG9uLCBTb3V0aCBFYXN0OyA8L2I%2BIHJldHVybmVkIDxiPjg1PC9iPiBqb2JzLmQCCQ8WAh4HVmlzaWJsZWhkAgsPFgIfAgUoVGhlIG1vc3QgcmVsZXZhbnQgam9icyBhcmUgbGlzdGVkIGZpcnN0LmQCEw8PFgIeC05hdmlnYXRlVXJsBQF%2BZGQCFQ9kFgYCBQ8PFgYfAgUGUHl0aG9uHgtEZWZhdWx0VGV4dAUMZS5nLiBhbmFseXN0HhNEZWZhdWx0VGV4dENzc0NsYXNzZWRkAgsPDxYGHwIFEkxvbmRvbiwgU291dGggRWFzdB8FBQllLmcuIEJhdGgfBmVkZAIRDxAPFgYeDURhdGFUZXh0RmllbGQFClJhZGl1c05hbWUeDkRhdGFWYWx1ZUZpZWxkBQZSYWRpdXMeC18hRGF0YUJvdW5kZ2QQFREHMCBtaWxlcwcyIG1pbGVzBzUgbWlsZXMIMTAgbWlsZXMIMTUgbWlsZXMIMjAgbWlsZXMIMjUgbWlsZXMIMzAgbWlsZXMIMzUgbWlsZXMINDAgbWlsZXMINDUgbWlsZXMINTAgbWlsZXMINjAgbWlsZXMINzAgbWlsZXMIODAgbWlsZXMIOTAgbWlsZXMJMTAwIG1pbGVzFREBMAEyATUCMTACMTUCMjACMjUCMzACMzUCNDACNDUCNTACNjACNzACODACOTADMTAwFCsDEWdnZ2dnZ2dnZ2dnZ2dnZ2dnZGQCFw9kFgQCAQ9kFgQCBA8QZA8WA2YCAQICFgMQBQhBbGwgam9icwUBMGcQBRlEaXJlY3QgZW1wbG95ZXIgam9icyBvbmx5BQEyZxAFEEFnZW5jeSBqb2JzIG9ubHkFATFnZGQCBg8QZA8WA2YCAQICFgMQBQlSZWxldmFuY2UFATFnEAUERGF0ZQUBMmcQBQZTYWxhcnkFATNnZGQCBQ8PFgYeClBhZ2VOdW1iZXICAR4PTnVtYmVyT2ZSZXN1bHRzAlUeDlJlc3VsdHNQZXJQYWdlAhRkZAIZDxYCHwNoZGQ%3D& Refinesearch%24txtKeywords=Python& Refinesearch%24txtLocation=London%2CSouth+East& Refinesearch%24ddlRadius=0& Refinesearch%24btnSearch=Search& ddlCompanyType=0& ddlSort=1'

Read the article
Most simple way to do holiday calculation?

- by brainfrog

I want to make a little free calendar program to help me and others calculate how much time we have got left in a project. I mean real working time, not just time. Time in a raw form is not saying much. Typically when my boss tells me that I have time until 05-05-2011 it doesn't tell me really how much time I have to do my job. You know...so many things stop me from work: A) beeing at home, not at work (so called "free time" or "spare time"). That is in my case I work exactly 8 hours a day and then the cleaning ladies throw me out of the office with their incredible loud industrial vacuum cleaners every evening (my boss accepts that as an excuse to go home in time, regularly). B) weekends, or more precisely saturdays and sundays C) official holiday rescuing me from having to go to work. what I want to do is make a little utility which tells me how many working hours I really have in a given time period. The first two things A and B are pretty easy to implement. But the last thing C scares my pants off. Holidays. OOOHHH man. You know what that means. Chaos. Pure chaos. The huge question is: HOW TO CALCULATE HOLIDAYS?! Since I want my program to be useful for anyone anywhere in the world, I can't just hardcode all holidays for my little town. So which options do I have? I) I could hand-craft downloadable lists of holidays. Users search them within the application and download them from an webserver. Or I ship all of them in the package. But I would get very, very old if I tried that by myself for every country, state and town. II) I make an initial data sheet with holidays for my town, and don't care about the rest. However, I make that sheet with an how-to public, so that everyone who feels like beeing very nice can provide holiday data for his country / region / whatever. Those are made public on a webserver and everyone can get the data packages he/she needs for the app. III) ? I care a lot about usability. I don't want to make an ugly linux hack style hard to use app that only computer freaks can use. So you need to tell me more about holiday science. I was never really clever at this. I assume every single country in the world has it's own set of holidays. In every country there may be several states. For example the US has some, and Germany has also some states. Holidays vary from state to state. But I know from an good programmer he told me never assume anything. So the questions about holiday science are: Which categories do I need to make holiday-data-packs searchable? A guy from India should find quickly his holiday data pack, and a guy from Sillicon Valley should find his pack as equally fast. It makes most sense to me to filter for COUNTRY STATE WHATEVER. Like a drill-down-search. Did I miss something? What would be the best data format to hold holiday information? A holiday has a start and end date and a name. That should be enough. Would I put all this stuff in thousands of XML files? How would you go about this? Any hint / help is highly welcome! Thanks to everyone!

Read the article
CURL - HTTPS Wierd error

- by Vincent

All, I am having trouble requesting info from HTTPS site using CURL and PHP. I am using Solaris 10. It so happens that sometimes it works and sometimes it doesn't. I am not sure what is the cause. If it doesn't work, this is the entry recorded in the verbose log: * About to connect() to 10.10.101.12 port 443 (#0) * Trying 10.10.101.12... * connected * Connected to 10.10.101.12 (10.10.101.12) port 443 (#0) * error setting certificate verify locations, continuing anyway: * CAfile: /etc/opt/webstack/curl/curlCA CApath: none * error:80089077:lib(128):func(137):reason(119) * Closing connection #0 If it works, this is the entry recorded in the verbose log: * About to connect() to 10.10.101.12 port 443 (#0) * Trying 10.10.101.12... * connected * Connected to 10.10.101.12 (10.10.101.12) port 443 (#0) * error setting certificate verify locations, continuing anyway: * CAfile: /etc/opt/webstack/curl/curlCA CApath: none * SSL connection using DHE-RSA-AES256-SHA * Server certificate: * subject: C=CA, ST=British Columnbia, L=Vancouver, O=google, OU=FDN, CN=g.googlenet.com, [email protected] * start date: 2007-07-24 23:06:32 GMT * expire date: 2027-09-07 23:06:32 GMT * issuer: C=US, ST=California, L=Sunnyvale, O=Google, OU=Certificate Authority, CN=support, [email protected] * SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway. > POST /gportal/gpmgr HTTP/1.1^M Host: 10.10.101.12^M Accept: */*^M Accept-Encoding: gzip,deflate^M Content-Length: 1623^M Content-Type: application/x-www-form-urlencoded^M Expect: 100-continue^M ^M < HTTP/1.1 100 Continue^M < HTTP/1.1 200 OK^M < Date: Wed, 28 Apr 2010 21:56:15 GMT^M < Server: Apache^M < Cache-Control: no-cache^M < Pragma: no-cache^M < Vary: Accept-Encoding^M < Content-Encoding: gzip^M < Content-Length: 1453^M < Content-Type: application/json^M < ^M * Connection #0 to host 10.10.101.12 left intact * Closing connection #0 My CURL options are as under: $ch = curl_init(); $devnull = fopen('/tmp/curlcookie.txt', 'w'); $fp_err = fopen('/tmp/verbose_file.txt', 'ab+'); fwrite($fp_err, date('Y-m-d H:i:s')."\n\n"); curl_setopt($ch, CURLOPT_STDERR, $devnull); curl_setopt($ch, CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_URL, $desturl); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false); curl_setopt($ch, CURLOPT_HEADER, false); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); curl_setopt($ch, CURLOPT_CONNECTTIMEOUT,120); curl_setopt($ch, CURLOPT_AUTOREFERER, true); curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate'); curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata); curl_setopt($ch, CURLOPT_VERBOSE,1); curl_setopt($ch, CURLOPT_FAILONERROR, true); curl_setopt($ch, CURLOPT_STDERR, $fp_err); $ret = curl_exec($ch); Anybody has any idea, why it works sometimes but fails mostly? Thanks

Read the article
Top 5 Developer Enabling Nuggets in MySQL 5.6

- by Rob Young

MySQL 5.6 is truly a better MySQL and reflects Oracle's commitment to the evolution of the most popular and widelyused open source database on the planet. The feature-complete 5.6 release candidate was announced at MySQL Connect in late September and the production-ready, generally available ("GA") product should be available in early 2013. While the message around 5.6 has been focused mainly on mass appeal, advanced topics like performance/scale, high availability, and self-healing replication clusters, MySQL 5.6 also provides many developer-friendly nuggets that are designed to enable those who are building the next generation of web-based and embedded applications and services. Boiling down the 5.6 feature set into a smaller set, of simple, easy to use goodies designed with developer agility in mind, these things deserve a quick look:Subquery Optimizations Using semi-JOINs and late materialization, the MySQL 5.6 Optimizer delivers greatly improved subquery performance. Specifically, the optimizer is now more efficient in handling subqueries in the FROM clause; materialization of subqueries in the FROM clause is now postponed until their contents are needed during execution. Additionally, the optimizer may add an index to derived tables during execution to speed up row retrieval. Internal tests run using the DBT-3 benchmark Query #13, shown below, demonstrate an order of magnitude improvement in execution times (from days to seconds) over previous versions. select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)from customer, orders, lineitemwhere o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 313 ) and c_custkey = o_custkey and o_orderkey = l_orderkeygroup by c_name, c_custkey, o_orderkey, o_orderdate, o_totalpriceorder by o_totalprice desc, o_orderdateLIMIT 100;What does this mean for developers? For starters, simplified subqueries can now be coded instead of complex joins for cross table lookups: SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor GROUP BY film_id HAVING count(*) > 12); And even more importantly subqueries embedded in packaged applications no longer need to be re-written into joins. This is good news for both ISVs and their customers who have access to the underlying queries and who have spent development cycles writing, testing and maintaining their own versions of re-written queries across updated versions of a packaged app.The details are in the MySQL 5.6 docs. Online DDL OperationsToday's web-based applications are designed to rapidly evolve and adapt to meet business and revenue-generationrequirements. As a result, development SLAs are now most often measured in minutes vs days or weeks. For example, when an application must quickly support new product lines or new products within existing product lines, the backend database schema must adapt in kind, and most commonly while the application remains available for normal business operations. MySQL 5.6 supports this level of online schema flexibility and agility by providing the following new ALTER TABLE online DDL syntax additions: CREATE INDEX DROP INDEX Change AUTO_INCREMENT value for a column ADD/DROP FOREIGN KEY Rename COLUMN Change ROW FORMAT, KEY_BLOCK_SIZE for a table Change COLUMN NULL, NOT_NULL Add, drop, reorder COLUMN Again, the details are in the MySQL 5.6 docs. Key-value access to InnoDB via Memcached APIMany of the next generation of web, cloud, social and mobile applications require fast operations against simple Key/Value pairs. At the same time, they must retain the ability to run complex queries against the same data, as well as ensure the data is protected with ACID guarantees. With the new NoSQL API for InnoDB, developers have allthe benefits of a transactional RDBMS, coupled with the performance capabilities of Key/Value store.MySQL 5.6 provides simple, key-value interaction with InnoDB data via the familiar Memcached API. Implemented via a new Memcached daemon plug-in to mysqld, the new Memcached protocol is mapped directly to the native InnoDB API and enables developers to use existing Memcached clients to bypass the expense of query parsing and go directly to InnoDB data for lookups and transactional compliant updates. The API makes it possible to re-use standard Memcached libraries and clients, while extending Memcached functionality by integrating a persistent, crash-safe, transactional database back-end. The implementation is shown here:So does this option provide a performance benefit over SQL? Internal performance benchmarks using a customized Java application and test harness show some very promising results with a 9X improvement in overall throughput for SET/INSERT operations:You can follow the InnoDB team blog for the methodology, implementation and internal test cases that generated these results here. How to get started with Memcached API to InnoDB is here. New Instrumentation in Performance SchemaThe MySQL Performance Schema was introduced in MySQL 5.5 and is designed to provide point in time metrics for key performance indicators. MySQL 5.6 improves the Performance Schema in answer to the most common DBA and Developer problems. New instrumentations include: Statements/Stages What are my most resource intensive queries? Where do they spend time? Table/Index I/O, Table Locks Which application tables/indexes cause the most load or contention? Users/Hosts/Accounts Which application users, hosts, accounts are consuming the most resources? Network I/O What is the network load like? How long do sessions idle? Summaries Aggregated statistics grouped by statement, thread, user, host, account or object. The MySQL 5.6 Performance Schema is now enabled by default in the my.cnf file with optimized and auto-tune settings that minimize overhead (< 5%, but mileage will vary), so using the Performance Schema ona production server to monitor the most common application use cases is less of an issue. In addition, new atomic levels of instrumentation enable the capture of granular levels of resource consumption by users, hosts, accounts, applications, etc. for billing and chargeback purposes in cloud computing environments.The MySQL docs are an excellent resource for all that is available and that can be done with the 5.6 Performance Schema. Better Condition Handling - GET DIAGNOSTICSMySQL 5.6 enables developers to easily check for error conditions and code for exceptions by introducing the new MySQL Diagnostics Area and corresponding GET DIAGNOSTICS interface command. The Diagnostic Area can be populated via multiple options and provides 2 kinds of information:Statement - which provides affected row count and number of conditions that occurredCondition - which provides error codes and messages for all conditions that were returned by a previous operation The addressable items for each are: The new GET DIAGNOSTICS command provides a standard interface into the Diagnostics Area and can be used via the CLI or from within application code to easily retrieve and handle the results of the most recent statement execution. An example of how it is used might be:mysql> DROP TABLE test.no_such_table; ERROR 1051 (42S02): Unknown table 'test.no_such_table' mysql> GET DIAGNOSTICS CONDITION 1 -> @p1 = RETURNED_SQLSTATE, @p2 = MESSAGE_TEXT; mysql> SELECT @p1, @p2; +-------+------------------------------------+| @p1 | @p2 | +-------+------------------------------------+| 42S02 | Unknown table 'test.no_such_table' | +-------+------------------------------------+ Options for leveraging the MySQL Diagnotics Area and GET DIAGNOSTICS are detailed in the MySQL Docs.While the above is a summary of some of the key developer enabling 5.6 features, it is by no means exhaustive. You can dig deeper into what MySQL 5.6 has to offer by reading this developer zone article or checking out "What's New in MySQL 5.6" in the MySQL docs.BONUS ALERT! If you are developing on Windows or are considering MySQL as an alternative to SQL Server for your next project, application or shipping product, you should check out the MySQL Installer for Windows. The installer includes the MySQL 5.6 RC database, all drivers, Visual Studio and Excel plugins, tray monitor and development tools all a single download and GUI installer. So what are your next steps? Register for Dec. 13 "MySQL 5.6: Building the Next Generation of Web-Based Applications and Services" live web event. Hurry! Seats are limited. Download the MySQL 5.6 Release Candidate (look under the Development Releases tab) Provide Feedback <link to http://bugs.mysql.com/> Join the Developer discussion on the MySQL Forums Explore all MySQL Products and Developer Tools As always, thanks for your continued support of MySQL!

Read the article
Making a Statement: How to retrieve the T-SQL statement that caused an event

- by extended_events

If you’ve done any troubleshooting of T-SQL, you know that sooner or later, probably sooner, you’re going to want to take a look at the actual statements you’re dealing with. In extended events we offer an action (See the BOL topic that covers Extended Events Objects for a description of actions) named sql_text that seems like it is just the ticket. Well…not always – sounds like a good reason for a blog post. When is a statement not THE statement? The sql_text action returns the same information that is returned from DBCC INPUTBUFFER, which may or may not be what you want. For example, if you execute a stored procedure, the sql_text action will return something along the lines of “EXEC sp_notwhatiwanted” assuming that is the statement you sent from the client. Often times folks would like something more specific, like the actual statements that are being run from within the stored procedure or batch. Enter the stack Extended events offers another action, this one with the descriptive name of tsql_stack, that includes the sql_handle and offset information about the statements being run when an event occurs. With the sql_handle and offset values you can retrieve the specific statement you seek using the DMV dm_exec_sql_statement. The BOL topic for dm_exec_sql_statement provides an example for how to extract this information, so I’ll cover the gymnastics required to get the sql_handle and offset values out of the tsql_stack data collected by the action. I’m the first to admit that this isn’t pretty, but this is what we have in SQL Server 2008 and 2008 R2. We will be making it easier to get statement level information in the next major release of SQL Server. The sample code For this example I have a stored procedure that includes multiple statements and I have a need to differentiate between those two statements in my tracing. I’m going to track two events: module_end tracks the completion of the stored procedure execution and sp_statement_completed tracks the execution of each statement within a stored procedure. I’m adding the tsql_stack events (since that’s the topic of this post) and the sql_text action for comparison sake. (If you have questions about creating event sessions, check out Pedro’s post Introduction to Extended Events.) USE AdventureWorks2008GO -- Test SPCREATE PROCEDURE sp_multiple_statementsASSELECT 'This is the first statement'SELECT 'this is the second statement'GO -- Create a session to look at the spCREATE EVENT SESSION track_sprocs ON SERVERADD EVENT sqlserver.module_end (ACTION (sqlserver.tsql_stack, sqlserver.sql_text)),ADD EVENT sqlserver.sp_statement_completed (ACTION (sqlserver.tsql_stack, sqlserver.sql_text))ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS)GO -- Start the sessionALTER EVENT SESSION track_sprocs ON SERVERSTATE = STARTGO -- Run the test procedureEXEC sp_multiple_statementsGO -- Stop collection of events but maintain ring bufferALTER EVENT SESSION track_sprocs ON SERVERDROP EVENT sqlserver.module_end,DROP EVENT sqlserver.sp_statement_completedGO Aside: Altering the session to drop the events is a neat little trick that allows me to stop collection of events while keeping in-memory targets such as the ring buffer available for use. If you stop the session the in-memory target data is lost. Now that we’ve collected some events related to running the stored procedure, we need to do some processing of the data. I’m going to do this in multiple steps using temporary tables so you can see what’s going on; kind of like having to “show your work” on a math test. The first step is to just cast the target data into XML so I can work with it. After that you can pull out the interesting columns, for our purposes I’m going to limit the output to just the event name, object name, stack and sql text. You can see that I’ve don a second CAST, this time of the tsql_stack column, so that I can further process this data. -- Store the XML data to a temp tableSELECT CAST( t.target_data AS XML) xml_dataINTO #xml_event_dataFROM sys.dm_xe_sessions s INNER JOIN sys.dm_xe_session_targets t ON s.address = t.event_session_addressWHERE s.name = 'track_sprocs' SELECT * FROM #xml_event_data -- Parse the column data out of the XML blockSELECT event_xml.value('(./@name)', 'varchar(100)') as [event_name], event_xml.value('(./data[@name="object_name"]/value)[1]', 'varchar(255)') as [object_name], CAST(event_xml.value('(./action[@name="tsql_stack"]/value)[1]','varchar(MAX)') as XML) as [stack_xml], event_xml.value('(./action[@name="sql_text"]/value)[1]', 'varchar(max)') as [sql_text]INTO #event_dataFROM #xml_event_data CROSS APPLY xml_data.nodes('//event') n (event_xml) SELECT * FROM #event_data event_name object_name stack_xml sql_text sp_statement_completed NULL <frame level="1" handle="0x03000500D0057C1403B79600669D00000100000000000000" line="4" offsetStart="94" offsetEnd="172" /><frame level="2" handle="0x01000500CF3F0331B05EC084000000000000000000000000" line="1" offsetStart="0" offsetEnd="-1" /> EXEC sp_multiple_statements sp_statement_completed NULL <frame level="1" handle="0x03000500D0057C1403B79600669D00000100000000000000" line="6" offsetStart="174" offsetEnd="-1" /><frame level="2" handle="0x01000500CF3F0331B05EC084000000000000000000000000" line="1" offsetStart="0" offsetEnd="-1" /> EXEC sp_multiple_statements module_end sp_multiple_statements <frame level="1" handle="0x03000500D0057C1403B79600669D00000100000000000000" line="0" offsetStart="0" offsetEnd="0" /><frame level="2" handle="0x01000500CF3F0331B05EC084000000000000000000000000" line="1" offsetStart="0" offsetEnd="-1" /> EXEC sp_multiple_statements After parsing the columns it’s easier to see what is recorded. You can see that I got back two sp_statement_completed events, which makes sense given the test procedure I’m running, and I got back a single module_end for the entire statement. As described, the sql_text isn’t telling me what I really want to know for the first two events so a little extra effort is required. -- Parse the tsql stack information into columnsSELECT event_name, object_name, frame_xml.value('(./@level)', 'int') as [frame_level], frame_xml.value('(./@handle)', 'varchar(MAX)') as [sql_handle], frame_xml.value('(./@offsetStart)', 'int') as [offset_start], frame_xml.value('(./@offsetEnd)', 'int') as [offset_end]INTO #stack_data FROM #event_data CROSS APPLY stack_xml.nodes('//frame') n (frame_xml) SELECT * from #stack_data event_name object_name frame_level sql_handle offset_start offset_end sp_statement_completed NULL 1 0x03000500D0057C1403B79600669D00000100000000000000 94 172 sp_statement_completed NULL 2 0x01000500CF3F0331B05EC084000000000000000000000000 0 -1 sp_statement_completed NULL 1 0x03000500D0057C1403B79600669D00000100000000000000 174 -1 sp_statement_completed NULL 2 0x01000500CF3F0331B05EC084000000000000000000000000 0 -1 module_end sp_multiple_statements 1 0x03000500D0057C1403B79600669D00000100000000000000 0 0 module_end sp_multiple_statements 2 0x01000500CF3F0331B05EC084000000000000000000000000 0 -1 Parsing out the stack information doubles the fun and I get two rows for each event. If you examine the stack from the previous table, you can see that each stack has two frames and my query is parsing each event into frames, so this is expected. There is nothing magic about the two frames, that’s just how many I get for this example, it could be fewer or more depending on your statements. The key point here is that I now have a sql_handle and the offset values for those handles, so I can use dm_exec_sql_statement to get the actual statement. Just a reminder, this DMV can only return what is in the cache – if you have old data it’s possible your statements have been ejected from the cache. “Old” is a relative term when talking about caches and can be impacted by server load and how often your statement is actually used. As with most things in life, your mileage may vary. SELECT qs.*, SUBSTRING(st.text, (qs.offset_start/2)+1, ((CASE qs.offset_end WHEN -1 THEN DATALENGTH(st.text) ELSE qs.offset_end END - qs.offset_start)/2) + 1) AS statement_textFROM #stack_data AS qsCROSS APPLY sys.dm_exec_sql_text(CONVERT(varbinary(max),sql_handle,1)) AS st event_name object_name frame_level sql_handle offset_start offset_end statement_text sp_statement_completed NULL 1 0x03000500D0057C1403B79600669D00000100000000000000 94 172 SELECT 'This is the first statement' sp_statement_completed NULL 1 0x03000500D0057C1403B79600669D00000100000000000000 174 -1 SELECT 'this is the second statement' module_end sp_multiple_statements 1 0x03000500D0057C1403B79600669D00000100000000000000 0 0 C Now that looks more like what we were after, the statement_text field is showing the actual statement being run when the sp_statement_completed event occurs. You’ll notice that it’s back down to one row per event, what happened to frame 2? The short answer is, “I don’t know.” In SQL Server 2008 nothing is returned from dm_exec_sql_statement for the second frame and I believe this to be a bug; this behavior has changed in the next major release and I see the actual statement run from the client in frame 2. (In other words I see the same statement that is returned by the sql_text action or DBCC INPUTBUFFER) There is also something odd going on with frame 1 returned from the module_end event; you can see that the offset values are both 0 and only the first letter of the statement is returned. It seems like the offset_end should actually be –1 in this case and I’m not sure why it’s not returning this correctly. This behavior is being investigated and will hopefully be corrected in the next major version. You can workaround this final oddity by ignoring the offsets and just returning the entire cached statement. SELECT event_name, sql_handle, ts.textFROM #stack_data CROSS APPLY sys.dm_exec_sql_text(CONVERT(varbinary(max),sql_handle,1)) as ts event_name sql_handle text sp_statement_completed 0x0300070025999F11776BAF006F9D00000100000000000000 CREATE PROCEDURE sp_multiple_statements AS SELECT 'This is the first statement' SELECT 'this is the second statement' sp_statement_completed 0x0300070025999F11776BAF006F9D00000100000000000000 CREATE PROCEDURE sp_multiple_statements AS SELECT 'This is the first statement' SELECT 'this is the second statement' module_end 0x0300070025999F11776BAF006F9D00000100000000000000 CREATE PROCEDURE sp_multiple_statements AS SELECT 'This is the first statement' SELECT 'this is the second statement' Obviously this gives more than you want for the sp_statement_completed events, but it’s the right information for module_end. I leave it to you to determine when this information is needed and use the workaround when appropriate. Aside: You might think it’s odd that I’m showing apparent bugs with my samples, but you’re going to see this behavior if you use this method, so you need to know about it.I’m all about transparency. Happy Eventing- Mike Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Juju Openstack bundle: Can't launch an instance

- by user281985

Deployed bundle:~makyo/openstack/2/openstack, on top of 7 physical boxes and 3 virtual ones. After changing vip_iface strings to point to right devices, e.g., br0 instead of eth0, and defining "/mnt/loopback|30G", in Cinder's block-device string, am able to navigate through openstack dashboard, error free. Following http://docs.openstack.org/grizzly/openstack-compute/install/apt/content/running-an-instance.html instructions, attempted to launch cirros 0.3.1 image; however, novalist shows the instance in error state. ubuntu@node7:~$ nova --debug boot --flavor 1 --image 28bed1bc-bc1c-4533-beee-8e0428ad40dd --key_name key2 --security_group default cirros REQ: curl -i http://keyStone.IP:5000/v2.0/tokens -X POST -H "Content-Type: application/json" -H "Accept: application/json" -H "User-Agent: python-novaclient" -d '{"auth": {"tenantName": "admin", "passwordCredentials": {"username": "admin", "password": "openstack"}}}' INFO (connectionpool:191) Starting new HTTP connection (1): keyStone.IP DEBUG (connectionpool:283) "POST /v2.0/tokens HTTP/1.1" 200 None RESP: [200] {'date': 'Tue, 10 Jun 2014 00:01:02 GMT', 'transfer-encoding': 'chunked', 'vary': 'X-Auth-Token', 'content-type': 'application/json'} RESP BODY: {"access": {"token": {"expires": "2014-06-11T00:01:02Z", "id": "3eefa1837d984426a633fe09259a1534", "tenant": {"description": "Created by Juju", "enabled": true, "id": "08cff06d13b74492b780d9ceed699239", "name": "admin"}}, "serviceCatalog": [{"endpoints": [{"adminURL": "http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239", "region": "RegionOne", "internalURL": "http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239", "publicURL": "http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239"}], "endpoints_links": [], "type": "compute", "name": "nova"}, {"endpoints": [{"adminURL": "http://nova.cloud.controller:9696", "region": "RegionOne", "internalURL": "http://nova.cloud.controller:9696", "publicURL": "http://nova.cloud.controller:9696"}], "endpoints_links": [], "type": "network", "name": "quantum"}, {"endpoints": [{"adminURL": "http://nova.cloud.controller:3333", "region": "RegionOne", "internalURL": "http://nova.cloud.controller:3333", "publicURL": "http://nova.cloud.controller:3333"}], "endpoints_links": [], "type": "s3", "name": "s3"}, {"endpoints": [{"adminURL": "http://i.p.s.36:9292", "region": "RegionOne", "internalURL": "http://i.p.s.36:9292", "publicURL": "http://i.p.s.36:9292"}], "endpoints_links": [], "type": "image", "name": "glance"}, {"endpoints": [{"adminURL": "http://i.p.s.39:8776/v1/08cff06d13b74492b780d9ceed699239", "region": "RegionOne", "internalURL": "http://i.p.s.39:8776/v1/08cff06d13b74492b780d9ceed699239", "publicURL": "http://i.p.s.39:8776/v1/08cff06d13b74492b780d9ceed699239"}], "endpoints_links": [], "type": "volume", "name": "cinder"}, {"endpoints": [{"adminURL": "http://nova.cloud.controller:8773/services/Cloud", "region": "RegionOne", "internalURL": "http://nova.cloud.controller:8773/services/Cloud", "publicURL": "http://nova.cloud.controller:8773/services/Cloud"}], "endpoints_links": [], "type": "ec2", "name": "ec2"}, {"endpoints": [{"adminURL": "http://keyStone.IP:35357/v2.0", "region": "RegionOne", "internalURL": "http://keyStone.IP:5000/v2.0", "publicURL": "http://i.p.s.44:5000/v2.0"}], "endpoints_links": [], "type": "identity", "name": "keystone"}], "user": {"username": "admin", "roles_links": [], "id": "b3730a52a32e40f0a9500440d1ef1c7d", "roles": [{"id": "e020001eb9a049f4a16540238ab158aa", "name": "Admin"}, {"id": "b84fbff4d5554d53bbbffdaad66b56cb", "name": "KeystoneServiceAdmin"}, {"id": "129c8b49d42b4f0796109aaef2069aa9", "name": "KeystoneAdmin"}], "name": "admin"}}} REQ: curl -i http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: 3eefa1837d984426a633fe09259a1534" INFO (connectionpool:191) Starting new HTTP connection (1): nova.cloud.controller DEBUG (connectionpool:283) "GET /v1.1/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd HTTP/1.1" 200 719 RESP: [200] {'date': 'Tue, 10 Jun 2014 00:01:03 GMT', 'x-compute-request-id': 'req-7f3459f8-d3d5-47f1-97a3-8407a4419a69', 'content-type': 'application/json', 'content-length': '719'} RESP BODY: {"image": {"status": "ACTIVE", "updated": "2014-06-09T22:17:54Z", "links": [{"href": "http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd", "rel": "self"}, {"href": "http://nova.cloud.controller:8774/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd", "rel": "bookmark"}, {"href": "http://External.Public.Port:9292/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd", "type": "application/vnd.openstack.image", "rel": "alternate"}], "id": "28bed1bc-bc1c-4533-beee-8e0428ad40dd", "OS-EXT-IMG-SIZE:size": 13147648, "name": "Cirros 0.3.1", "created": "2014-06-09T22:17:54Z", "minDisk": 0, "progress": 100, "minRam": 0, "metadata": {}}} REQ: curl -i http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/flavors/1 -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: 3eefa1837d984426a633fe09259a1534" INFO (connectionpool:191) Starting new HTTP connection (1): nova.cloud.controller DEBUG (connectionpool:283) "GET /v1.1/08cff06d13b74492b780d9ceed699239/flavors/1 HTTP/1.1" 200 418 RESP: [200] {'date': 'Tue, 10 Jun 2014 00:01:04 GMT', 'x-compute-request-id': 'req-2c153110-6969-4f3a-b51c-8f1a6ce75bee', 'content-type': 'application/json', 'content-length': '418'} RESP BODY: {"flavor": {"name": "m1.tiny", "links": [{"href": "http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/flavors/1", "rel": "self"}, {"href": "http://nova.cloud.controller:8774/08cff06d13b74492b780d9ceed699239/flavors/1", "rel": "bookmark"}], "ram": 512, "OS-FLV-DISABLED:disabled": false, "vcpus": 1, "swap": "", "os-flavor-access:is_public": true, "rxtx_factor": 1.0, "OS-FLV-EXT-DATA:ephemeral": 0, "disk": 0, "id": "1"}} REQ: curl -i http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/servers -X POST -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: 3eefa1837d984426a633fe09259a1534" -d '{"server": {"name": "cirros", "imageRef": "28bed1bc-bc1c-4533-beee-8e0428ad40dd", "key_name": "key2", "flavorRef": "1", "max_count": 1, "min_count": 1, "security_groups": [{"name": "default"}]}}' INFO (connectionpool:191) Starting new HTTP connection (1): nova.cloud.controller DEBUG (connectionpool:283) "POST /v1.1/08cff06d13b74492b780d9ceed699239/servers HTTP/1.1" 202 436 RESP: [202] {'date': 'Tue, 10 Jun 2014 00:01:05 GMT', 'x-compute-request-id': 'req-41e53086-6454-4efb-bb35-a30dc2c780be', 'content-type': 'application/json', 'location': 'http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/servers/2eb5e3ad-3044-41c1-bbb7-10f398f83e43', 'content-length': '436'} RESP BODY: {"server": {"security_groups": [{"name": "default"}], "OS-DCF:diskConfig": "MANUAL", "id": "2eb5e3ad-3044-41c1-bbb7-10f398f83e43", "links": [{"href": "http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/servers/2eb5e3ad-3044-41c1-bbb7-10f398f83e43", "rel": "self"}, {"href": "http://nova.cloud.controller:8774/08cff06d13b74492b780d9ceed699239/servers/2eb5e3ad-3044-41c1-bbb7-10f398f83e43", "rel": "bookmark"}], "adminPass": "oFRbvRqif2C8"}} REQ: curl -i http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/servers/2eb5e3ad-3044-41c1-bbb7-10f398f83e43 -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: 3eefa1837d984426a633fe09259a1534" INFO (connectionpool:191) Starting new HTTP connection (1): nova.cloud.controller DEBUG (connectionpool:283) "GET /v1.1/08cff06d13b74492b780d9ceed699239/servers/2eb5e3ad-3044-41c1-bbb7-10f398f83e43 HTTP/1.1" 200 1349 RESP: [200] {'date': 'Tue, 10 Jun 2014 00:01:05 GMT', 'x-compute-request-id': 'req-d91d0858-7030-469d-8e55-40e05e4d00fd', 'content-type': 'application/json', 'content-length': '1349'} RESP BODY: {"server": {"status": "BUILD", "updated": "2014-06-10T00:01:05Z", "hostId": "", "OS-EXT-SRV-ATTR:host": null, "addresses": {}, "links": [{"href": "http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/servers/2eb5e3ad-3044-41c1-bbb7-10f398f83e43", "rel": "self"}, {"href": "http://nova.cloud.controller:8774/08cff06d13b74492b780d9ceed699239/servers/2eb5e3ad-3044-41c1-bbb7-10f398f83e43", "rel": "bookmark"}], "key_name": "key2", "image": {"id": "28bed1bc-bc1c-4533-beee-8e0428ad40dd", "links": [{"href": "http://nova.cloud.controller:8774/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd", "rel": "bookmark"}]}, "OS-EXT-STS:task_state": "scheduling", "OS-EXT-STS:vm_state": "building", "OS-EXT-SRV-ATTR:instance_name": "instance-00000004", "OS-EXT-SRV-ATTR:hypervisor_hostname": null, "flavor": {"id": "1", "links": [{"href": "http://nova.cloud.controller:8774/08cff06d13b74492b780d9ceed699239/flavors/1", "rel": "bookmark"}]}, "id": "2eb5e3ad-3044-41c1-bbb7-10f398f83e43", "security_groups": [{"name": "default"}], "OS-EXT-AZ:availability_zone": "nova", "user_id": "b3730a52a32e40f0a9500440d1ef1c7d", "name": "cirros", "created": "2014-06-10T00:01:04Z", "tenant_id": "08cff06d13b74492b780d9ceed699239", "OS-DCF:diskConfig": "MANUAL", "accessIPv4": "", "accessIPv6": "", "progress": 0, "OS-EXT-STS:power_state": 0, "config_drive": "", "metadata": {}}} REQ: curl -i http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/flavors/1 -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: 3eefa1837d984426a633fe09259a1534" INFO (connectionpool:191) Starting new HTTP connection (1): nova.cloud.controller DEBUG (connectionpool:283) "GET /v1.1/08cff06d13b74492b780d9ceed699239/flavors/1 HTTP/1.1" 200 418 RESP: [200] {'date': 'Tue, 10 Jun 2014 00:01:05 GMT', 'x-compute-request-id': 'req-896c0120-1102-4408-9e09-cd628f2dd699', 'content-type': 'application/json', 'content-length': '418'} RESP BODY: {"flavor": {"name": "m1.tiny", "links": [{"href": "http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/flavors/1", "rel": "self"}, {"href": "http://nova.cloud.controller:8774/08cff06d13b74492b780d9ceed699239/flavors/1", "rel": "bookmark"}], "ram": 512, "OS-FLV-DISABLED:disabled": false, "vcpus": 1, "swap": "", "os-flavor-access:is_public": true, "rxtx_factor": 1.0, "OS-FLV-EXT-DATA:ephemeral": 0, "disk": 0, "id": "1"}} REQ: curl -i http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: 3eefa1837d984426a633fe09259a1534" INFO (connectionpool:191) Starting new HTTP connection (1): nova.cloud.controller DEBUG (connectionpool:283) "GET /v1.1/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd HTTP/1.1" 200 719 RESP: [200] {'date': 'Tue, 10 Jun 2014 00:01:05 GMT', 'x-compute-request-id': 'req-454e9651-c247-4d31-8049-6b254de050ae', 'content-type': 'application/json', 'content-length': '719'} RESP BODY: {"image": {"status": "ACTIVE", "updated": "2014-06-09T22:17:54Z", "links": [{"href": "http://nova.cloud.controller:8774/v1.1/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd", "rel": "self"}, {"href": "http://nova.cloud.controller:8774/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd", "rel": "bookmark"}, {"href": "http://External.Public.Port:9292/08cff06d13b74492b780d9ceed699239/images/28bed1bc-bc1c-4533-beee-8e0428ad40dd", "type": "application/vnd.openstack.image", "rel": "alternate"}], "id": "28bed1bc-bc1c-4533-beee-8e0428ad40dd", "OS-EXT-IMG-SIZE:size": 13147648, "name": "Cirros 0.3.1", "created": "2014-06-09T22:17:54Z", "minDisk": 0, "progress": 100, "minRam": 0, "metadata": {}}} +-------------------------------------+--------------------------------------+ | Property | Value | +-------------------------------------+--------------------------------------+ | OS-EXT-STS:task_state | scheduling | | image | Cirros 0.3.1 | | OS-EXT-STS:vm_state | building | | OS-EXT-SRV-ATTR:instance_name | instance-00000004 | | flavor | m1.tiny | | id | 2eb5e3ad-3044-41c1-bbb7-10f398f83e43 | | security_groups | [{u'name': u'default'}] | | user_id | b3730a52a32e40f0a9500440d1ef1c7d | | OS-DCF:diskConfig | MANUAL | | accessIPv4 | | | accessIPv6 | | | progress | 0 | | OS-EXT-STS:power_state | 0 | | OS-EXT-AZ:availability_zone | nova | | config_drive | | | status | BUILD | | updated | 2014-06-10T00:01:05Z | | hostId | | | OS-EXT-SRV-ATTR:host | None | | key_name | key2 | | OS-EXT-SRV-ATTR:hypervisor_hostname | None | | name | cirros | | adminPass | oFRbvRqif2C8 | | tenant_id | 08cff06d13b74492b780d9ceed699239 | | created | 2014-06-10T00:01:04Z | | metadata | {} | +-------------------------------------+--------------------------------------+ ubuntu@node7:~$ ubuntu@node7:~$ nova list +--------------------------------------+--------+--------+----------+ | ID | Name | Status | Networks | +--------------------------------------+--------+--------+----------+ | 2eb5e3ad-3044-41c1-bbb7-10f398f83e43 | cirros | ERROR | | +--------------------------------------+--------+--------+----------+ ubuntu@node7:~$ var/log/nova/nova-compute.log shows the following error: ... 2014-06-10 00:01:06.048 AUDIT nova.compute.claims [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Attempting claim: memory 512 MB, disk 0 GB, VCPUs 1 2014-06-10 00:01:06.049 AUDIT nova.compute.claims [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Total Memory: 3885 MB, used: 512 MB 2014-06-10 00:01:06.049 AUDIT nova.compute.claims [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Memory limit: 5827 MB, free: 5315 MB 2014-06-10 00:01:06.049 AUDIT nova.compute.claims [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Total Disk: 146 GB, used: 0 GB 2014-06-10 00:01:06.050 AUDIT nova.compute.claims [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Disk limit not specified, defaulting to unlimited 2014-06-10 00:01:06.050 AUDIT nova.compute.claims [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Total CPU: 2 VCPUs, used: 0 VCPUs 2014-06-10 00:01:06.050 AUDIT nova.compute.claims [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] CPU limit not specified, defaulting to unlimited 2014-06-10 00:01:06.051 AUDIT nova.compute.claims [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Claim successful 2014-06-10 00:01:06.963 WARNING nova.network.quantumv2.api [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] No network configured! 2014-06-10 00:01:08.347 ERROR nova.compute.manager [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Instance failed to spawn 2014-06-10 00:01:08.347 32223 TRACE nova.compute.manager [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Traceback (most recent call last): 2014-06-10 00:01:08.347 32223 TRACE nova.compute.manager [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1118, in _spawn 2014-06-10 00:01:08.347 32223 TRACE nova.compute.manager [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] self._legacy_nw_info(network_info), 2014-06-10 00:01:08.347 32223 TRACE nova.compute.manager [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 703, in _legacy_nw_info 2014-06-10 00:01:08.347 32223 TRACE nova.compute.manager [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] network_info = network_info.legacy() 2014-06-10 00:01:08.347 32223 TRACE nova.compute.manager [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] AttributeError: 'list' object has no attribute 'legacy' 2014-06-10 00:01:08.347 32223 TRACE nova.compute.manager [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] 2014-06-10 00:01:08.919 AUDIT nova.compute.manager [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Terminating instance 2014-06-10 00:01:09.712 32223 ERROR nova.virt.libvirt.driver [-] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] During wait destroy, instance disappeared. 2014-06-10 00:01:09.718 INFO nova.virt.libvirt.firewall [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Attempted to unfilter instance which is not filtered 2014-06-10 00:01:09.719 INFO nova.virt.libvirt.driver [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Deleting instance files /var/lib/nova/instances/2eb5e3ad-3044-41c1-bbb7-10f398f83e43 2014-06-10 00:01:10.044 ERROR nova.compute.manager [req-41e53086-6454-4efb-bb35-a30dc2c780be b3730a52a32e40f0a9500440d1ef1c7d 08cff06d13b74492b780d9ceed699239] [instance: 2eb5e3ad-3044-41c1-bbb7-10f398f83e43] Error: ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 864, in _run_instance\n set_access_ip=set_access_ip)\n', ' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1123, in _spawn\n LOG.exception(_(\'Instance failed to spawn\'), instance=instance)\n', ' File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__\n self.gen.next()\n', ' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1118, in _spawn\n self._legacy_nw_info(network_info),\n', ' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 703, in _legacy_nw_info\n network_info = network_info.legacy()\n', "AttributeError: 'list' object has no attribute 'legacy'\n"] 2014-06-10 00:01:40.951 32223 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources 2014-06-10 00:01:41.072 32223 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 2861 2014-06-10 00:01:41.072 32223 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 146 2014-06-10 00:01:41.073 32223 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 1 2014-06-10 00:01:41.262 32223 INFO nova.compute.resource_tracker [-] Compute_service record updated for node5:node5.maas ... Can't seem to find any entries in quantum.conf related to "legacy". Any help would be appreciated. Cheers,

Read the article
Solaris: What comes next?

- by alanc

As you probably know by now, a few months ago, we released Solaris 11 after years of development. That of course means we now need to figure out what comes next - if Solaris 11 is “The First Cloud OS”, then what do we need to make future releases of Solaris be, to be modern and competitive when they're released? So we've been having planning and brainstorming meetings, and I've captured some notes here from just one of those we held a couple weeks ago with a number of the Silicon Valley based engineers. Now before someone sees an idea here and calls their product rep wanting to know what's up, please be warned what follows are rough ideas, and as I'll discuss later, none of them have any committment, schedule, working code, or even plan for integration in any possible future product at this time. (Please don't make me force you to read the full Oracle future product disclaimer here, you should know it by heart already from the front of every Oracle product slide deck.) To start with, we did some background research, looking at ideas from other Oracle groups, and competitive OS'es. We examined what was hot in the technology arena and where the interesting startups were heading. We then looked at Solaris to see where we could apply those ideas. Making Network Admins into Socially Networking Admins We all know an admin who has grumbled about being the only one stuck late at work to fix a problem on the server, or having to work the weekend alone to do scheduled maintenance. But admins are humans (at least most are), and crave companionship and community with their fellow humans. And even when they're alone in the server room, they're never far from a network connection, allowing access to the wide world of wonders on the Internet. Our solution here is not building a new social network - there's enough of those already, and Oracle even has its own Oracle Mix social network already. What we proposed is integrating Solaris features to help engage our system admins with these social networks, building community and bringing them recognition in the workplace, using achievement recognition systems as found in many popular gaming platforms. For instance, if you had a Facebook account, and a group of admin friends there, you could register it with our Social Network Utility For Facebook, and then your friends might see: Alan earned the achievement Critically Patched (April 2012) for patching all his servers. Matt is only at 50% - encourage him to complete this achievement today! To avoid any undue risk of advertising who has unpatched servers that are easier targets for hackers to break into, this information would be tightly protected via Facebook's world-renowned privacy settings to avoid it falling into the wrong hands. A related form of gamification we considered was replacing simple certfications with role-playing-game-style Experience Levels. Instead of just knowing an admin passed a test establishing a given level of competency, these would provide recruiters with a more detailed level of how much real-world experience an admin has. Achievements such as the one above would feed into it, but larger numbers of experience points would be gained by tougher or more critical tasks - such as recovering a down system, or migrating a service to a new platform. (As long as it was an Oracle platform of course - migrating to an HP or IBM platform would cause the admin to lose points with us.) Unfortunately, we couldn't figure out a good way to prevent (if you will) “gaming” the system. For instance, a disgruntled admin might decide to start ignoring warnings from FMA that a part is beginning to fail or skip preventative maintenance, in the hopes that they'd cause a catastrophic failure to earn more points for bolstering their resume as they look for a job elsewhere, and not worrying about the effect on your business of a mission critical server going down. More Z's for ZFS Our suggested new feature for ZFS was inspired by the worlds most successful Z-startup of all time: Zynga. Using the Social Network Utility For Facebook described above, we'd tie it in with ZFS monitoring to help you out when you find yourself in a jam needing more disk space than you have, and can't wait a month to get a purchase order through channels to buy more. Instead with the click of a button you could post to your group: Alan can't find any space in his server farm! Can you help? Friends could loan you some space on their connected servers for a few weeks, knowing that you'd return the favor when needed. ZFS would create a new filesystem for your use on their system, and securely share it with your system using Kerberized NFS. If none of your friends have space, then you could buy temporary use space in small increments at affordable rates right there in Facebook, using your Facebook credits, and then file an expense report later, after the urgent need has passed. Universal Single Sign On One thing all the engineers agreed on was that we still had far too many "Single" sign ons to deal with in our daily work. On the web, every web site used to have its own password database, forcing us to hope we could remember what login name was still available on each site when we signed up, and which unique password we came up with to avoid having to disclose our other passwords to a new site. In recent years, the web services world has finally been reducing the number of logins we have to manage, with many services allowing you to login using your identity from Google, Twitter or Facebook. So we proposed following their lead, introducing PAM modules for web services - no more would you have to type in whatever login name IT assigned and try to remember the password you chose the last time password aging forced you to change it - you'd simply choose which web service you wanted to authenticate against, and would login to your Solaris account upon reciept of a cookie from their identity service. Pinning notes to the cloud We also all noted that we all have our own pile of notes we keep in our daily work - in text files in our home directory, in notebooks we carry around, on white boards in offices and common areas, on sticky notes on our monitors, or on scraps of paper pinned to our bulletin boards. The contents of the notes vary, some are things just for us, some are useful for our groups, some we would share with the world. For instance, when our group moved to a new building a couple years ago, we had a white board in the hallway listing all the NIS & DNS servers, subnets, and other network configuration information we needed to set up our Solaris machines after the move. Similarly, as Solaris 11 was finishing and we were all learning the new network configuration commands, we shared notes in wikis and e-mails with our fellow engineers. Users may also remember one of the popular features of Sun's old BigAdmin site was a section for sharing scripts and tips such as these. Meanwhile, the online "pin board" at Pinterest is taking the web by storm. So we thought, why not mash those up to solve this problem? We proposed a new BigAddPin site where users could “pin” notes, command snippets, configuration information, and so on. For instance, once they had worked out the ideal Automated Installation manifest for their app server, they could pin it up to share with the rest of their group, or choose to make it public as an example for the world. Localized data, such as our group's notes on the servers for our subnet, could be shared only to users connecting from that subnet. And notes that they didn't want others to see at all could be marked private, such as the list of phone numbers to call for late night pizza delivery to the machine room, the birthdays and anniversaries they can never remember but would be sleeping on the couch if they forgot, or the list of automatically generated completely random, impossible to remember root passwords to all their servers. For greater integration with Solaris, we'd put support right into the command shells — redirect output to a pinned note, set your path to include pinned notes as scripts you can run, or bring up your recent shell history and pin a set of commands to save for the next time you need to remember how to do that operation. Location service for Solaris servers A longer term plan would involve convincing the hardware design groups to put GPS locators with wireless transmitters in future server designs. This would help both admins and service personnel trying to find servers in todays massive data centers, and could feed into location presence apps to help show potential customers that while they may not see many Solaris machines on the desktop any more, they are all around. For instance, while walking down Wall Street it might show “There are over 2000 Solaris computers in this block.” [Note: this proposal was made before the recent media coverage of a location service aggregrator app with less noble intentions, and in hindsight, we failed to consider what happens when such data similarly falls into the wrong hands. We certainly wouldn't want our app to be misinterpreted as “There are over $20 million dollars of SPARC servers in this building, waiting for you to steal them.” so it's probably best it was rejected.] Harnessing the power of the GPU for Security Most modern OS'es make use of the widespread availability of high powered GPU hardware in today's computers, with desktop environments requiring 3-D graphics acceleration, whether in Ubuntu Unity, GNOME Shell on Fedora, or Aero Glass on Windows, but we haven't yet made Solaris fully take advantage of this, beyond our basic offering of Compiz on the desktop. Meanwhile, more businesses are interested in increasing security by using biometric authentication, but must also comply with laws in many countries preventing discrimination against employees with physical limations such as missing eyes or fingers, not to mention the lost productivity when employees can't login due to tinted contacts throwing off a retina scan or a paper cut changing their fingerprint appearance until it heals. Fortunately, the two groups considering these problems put their heads together and found a common solution, using 3D technology to enable authentication using the one body part all users are guaranteed to have - pam_phrenology.so, a new PAM module that uses an array USB attached web cams (or just one if the user is willing to spin their chair during login) to take pictures of the users head from all angles, create a 3D model and compare it to the one in the authentication database. While Mythbusters has shown how easy it can be to fool common fingerprint scanners, we have not yet seen any evidence that people can impersonate the shape of another user's cranium, no matter how long they spend beating their head against the wall to reshape it. This could possibly be extended to group users, using modern versions of some of the older phrenological studies, such as giving all users with long grey beards access to the System Architect role, or automatically placing users with pointy spikes in their hair into an easy use mode. Unfortunately, there are still some unsolved technical challenges we haven't figured out how to overcome. Currently, a visit to the hair salon causes your existing authentication to expire, and some users have found that shaving their heads is the only way to avoid bad hair days becoming bad login days. Reaction to these ideas After gathering all our notes on these ideas from the engineering brainstorming meeting, we took them in to present to our management. Unfortunately, most of their reaction cannot be printed here, and they chose not to accept any of these ideas as they were, but they did have some feedback for us to consider as they sent us back to the drawing board. They strongly suggested our ideas would be better presented if we weren't trying to decipher ink blotches that had been smeared by the condensation when we put our pint glasses on the napkins we were taking notes on, and to that end let us know they would not be approving any more engineering offsites in Irish themed pubs on the Friday of a Saint Patrick's Day weekend. (Hopefully they mean that situation specifically and aren't going to deny the funding for travel to this year's X.Org Developer's Conference just because it happens to be in Bavaria and ending on the Friday of the weekend Oktoberfest starts.) They recommended our research techniques could be improved over just sitting around reading blogs and checking our Facebook, Twitter, and Pinterest accounts, such as considering input from alternate viewpoints on topics such as gamification. They also mentioned that Oracle hadn't fully adopted some of Sun's common practices and we might have to try harder to get those to be accepted now that we are one unified company. So as I said at the beginning, don't pester your sales rep just yet for any of these, since they didn't get approved, but if you have better ideas, pass them on and maybe they'll get into our next batch of planning.

Read the article
Error data: line 2 column 1 when using pycurl with gzip stream

- by Sagar Hatekar

Thanks for reading. Background: I am trying to read a streaming API feed that returns data in JSON format, and then storing this data to a pymongo collection. The streaming API requires a "Accept-Encoding" : "Gzip" header. What's happening: Code fails on json.loads and outputs - Extra data: line 2 column 1 - line 4 column 1 (char 1891 - 5597) (Refer Error Log below) This does NOT happen while parsing every JSON object - it happens at random. My guess is I am encountering some weird JSON object after every "x" proper JSON objects. I did reference how to use pycurl if requested data is sometimes gzipped, sometimes not? and Encoding error while deserializing a json object from Google but so far have been unsuccessful at resolving this error. Could someone please help me out here? Error Log: Note: The raw dump of the JSON object below is basically using the repr() method. '{"id":"tag:search.twitter.com,2005:207958320747782146","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:493653150","link":"http://www.twitter.com/Deathnews_7_24","displayName":"Death News 7/24","postedTime":"2012-02-16T01:30:12.000Z","image":"http://a0.twimg.com/profile_images/1834408513/deathnewstwittersquare_normal.jpg","summary":"Crashes, Murders, Suicides, Accidents, Crime and Naturals Death News From All Around World","links":[{"href":"http://www.facebook.com/DeathNews724","rel":"me"}],"friendsCount":56,"followersCount":14,"listedCount":1,"statusesCount":1029,"twitterTimeZone":null,"utcOffset":null,"preferredUsername":"Deathnews_7_24","languages":["tr"]},"verb":"post","postedTime":"2012-05-30T22:15:02.000Z","generator":{"displayName":"web","link":"http://twitter.com"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/Deathnews_7_24/statuses/207958320747782146","body":"Kathi Kamen Goldmark, Writers\xe2\x80\x99 Catalyst, Dies at 63 http://t.co/WBsNlNtA","object":{"objectType":"note","id":"object:search.twitter.com,2005:207958320747782146","summary":"Kathi Kamen Goldmark, Writers\xe2\x80\x99 Catalyst, Dies at 63 http://t.co/WBsNlNtA","link":"http://twitter.com/Deathnews_7_24/statuses/207958320747782146","postedTime":"2012-05-30T22:15:02.000Z"},"twitter_entities":{"urls":[{"display_url":"nytimes.com/2012/05/30/boo\xe2\x80\xa6","indices":[52,72],"expanded_url":"http://www.nytimes.com/2012/05/30/books/kathi-kamen-goldmark-writers-catalyst-dies-at-63.html","url":"http://t.co/WBsNlNtA"}],"hashtags":[],"user_mentions":[]},"gnip":{"language":{"value":"en"},"matching_rules":[{"value":"url_contains: nytimes.com","tag":null}],"klout_score":11,"urls":[{"url":"http://t.co/WBsNlNtA","expanded_url":"http://www.nytimes.com/2012/05/30/books/kathi-kamen-goldmark-writers-catalyst-dies-at-63.html?_r=1"}]}}\r\n{"id":"tag:search.twitter.com,2005:207958321003638785","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:178760897","link":"http://www.twitter.com/Mobanu","displayName":"Donald Ochs","postedTime":"2010-08-15T16:33:56.000Z","image":"http://a0.twimg.com/profile_images/1493224811/small_mobany_Logo_normal.jpg","summary":"","links":[{"href":"http://www.mobanuweightloss.com","rel":"me"}],"friendsCount":10272,"followersCount":9698,"listedCount":30,"statusesCount":725,"twitterTimeZone":"Mountain Time (US & Canada)","utcOffset":"-25200","preferredUsername":"Mobanu","languages":["en"],"location":{"objectType":"place","displayName":"Crested Butte, Colorado"}},"verb":"post","postedTime":"2012-05-30T22:15:02.000Z","generator":{"displayName":"twitterfeed","link":"http://twitterfeed.com"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/Mobanu/statuses/207958321003638785","body":"Mobanu: Can Exercise Be Bad for You?: Researchers have found evidence that some people who exercise do worse on ... http://t.co/mTsQlNQO","object":{"objectType":"note","id":"object:search.twitter.com,2005:207958321003638785","summary":"Mobanu: Can Exercise Be Bad for You?: Researchers have found evidence that some people who exercise do worse on ... http://t.co/mTsQlNQO","link":"http://twitter.com/Mobanu/statuses/207958321003638785","postedTime":"2012-05-30T22:15:02.000Z"},"twitter_entities":{"urls":[{"display_url":"nyti.ms/KUmmMa","indices":[116,136],"expanded_url":"http://nyti.ms/KUmmMa","url":"http://t.co/mTsQlNQO"}],"hashtags":[],"user_mentions":[]},"gnip":{"language":{"value":"en"},"matching_rules":[{"value":"url_contains: nytimes.com","tag":null}],"klout_score":12,"urls":[{"url":"http://t.co/mTsQlNQO","expanded_url":"http://well.blogs.nytimes.com/2012/05/30/can-exercise-be-bad-for-you/?utm_medium=twitter&utm_source=twitterfeed"}]}}\r\n' json exception: Extra data: line 2 column 1 - line 4 column 1 (char 1891 - 5597) Header Output: HTTP/1.1 200 OK Content-Type: application/json; charset=UTF-8 Vary: Accept-Encoding Date: Wed, 30 May 2012 22:14:48 UTC Connection: close Transfer-Encoding: chunked Content-Encoding: gzip get_stream.py: #!/usr/bin/env python import sys import pycurl import json import pymongo STREAM_URL = "https://stream.test.com:443/accounts/publishers/twitter/streams/track/Dev.json" AUTH = "userid:passwd" DB_HOST = "127.0.0.1" DB_NAME = "stream_test" class StreamReader: def __init__(self): try: self.count = 0 self.buff = "" self.mongo = pymongo.Connection(DB_HOST) self.db = self.mongo[DB_NAME] self.raw_tweets = self.db["raw_tweets_gnip"] self.conn = pycurl.Curl() self.conn.setopt(pycurl.ENCODING, 'gzip') self.conn.setopt(pycurl.URL, STREAM_URL) self.conn.setopt(pycurl.USERPWD, AUTH) self.conn.setopt(pycurl.WRITEFUNCTION, self.on_receive) self.conn.setopt(pycurl.HEADERFUNCTION, self.header_rcvd) while True: self.conn.perform() except Exception as ex: print "error ocurred : %s" % str(ex) def header_rcvd(self, header_data): print header_data def on_receive(self, data): temp_data = data self.buff += data if data.endswith("\r\n") and self.buff.strip(): try: tweet = json.loads(self.buff, encoding = 'UTF-8') self.buff = "" if tweet: try: self.raw_tweets.insert(tweet) except Exception as insert_ex: print "Error inserting tweet: %s" % str(insert_ex) self.count += 1 if self.count % 10 == 0: print "inserted "+str(self.count)+" tweets" except Exception as json_ex: print "json exception: %s" % str(json_ex) print repr(temp_data) stream = StreamReader()

Read the article
Big GRC: Turning Data into Actionable GRC Intelligence

- by Jenna Danko

While it’s no longer headline news that Governments have carried out large scale data-mining programmes aimed at terrorism detection and identifying other patterns of interest across a wide range of digital data sources, the debate over the ethics and justification over this action, will clearly continue for some time to come. What is becoming clear is that these programmes are a framework for the collation and aggregation of massive amounts of unstructured data and from this, the creation of actionable intelligence from analyses that allowed the analysts to explore and extract a variety of patterns and then direct resources. This data included audio and video chats, phone calls, photographs, e-mails, documents, internet searches, social media posts and mobile phone logs and connections. Although Governance, Risk and Compliance (GRC) professionals are not looking at the implementation of such programmes, there are many similar GRC “Big data” challenges to be faced and potential lessons to be learned from these high profile government programmes that can be applied a lot closer to home. For example, how can GRC professionals collect, manage and analyze an enormous and disparate volume of data to create and manage their own actionable intelligence covering hidden signs and patterns of criminal activity, the early or retrospective, violation of regulations/laws/corporate policies and procedures, emerging risks and weakening controls etc. Not exactly the stuff of James Bond to be sure, but it is certainly more applicable to most GRC professional’s day to day challenges. So what is Big Data and how can it benefit the GRC process? Although it often varies, the definition of Big Data largely refers to the following types of data: Traditional Enterprise Data – includes customer information from CRM systems, transactional ERP data, web store transactions, and general ledger data. Machine-Generated /Sensor Data – includes Call Detail Records (“CDR”), weblogs and trading systems data. Social Data – includes customer feedback streams, micro-blogging sites like Twitter, and social media platforms like Facebook. The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020. But while it’s often the most visible parameter, volume of data is not the only characteristic that matters. In fact, according to sources such as Forrester there are four key characteristics that define big data: Volume. Machine-generated data is produced in much larger quantities than non-traditional data. This is all the data generated by IT systems that power the enterprise. This includes live data from packaged and custom applications – for example, app servers, Web servers, databases, networks, virtual machines, telecom equipment, and much more. Velocity. Social media data streams – while not as massive as machine-generated data – produce a large influx of opinions and relationships valuable to customer relationship management as well as offering early insight into potential reputational risk issues. Even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes (over 8 TB per day) need to be managed. Variety. Traditional data formats tend to be relatively well defined by a data schema and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of change. Without question, all GRC professionals work in a dynamic environment and as new services, new products, new business lines are added or new marketing campaigns executed for example, new data types are needed to capture the resultant information. Value. The economic value of data varies significantly. Typically, there is good information hidden amongst a larger body of non-traditional data that GRC professionals can use to add real value to the organisation; the greater challenge is identifying what is valuable and then transforming and extracting that data for analysis and action. For example, customer service calls and emails have millions of useful data points and have long been a source of information to GRC professionals. Those calls and emails are critical in helping GRC professionals better identify hidden patterns and implement new policies that can reduce the amount of customer complaints. Now on a scale and depth far beyond those in place today, all that unstructured call and email data can be captured, stored and analyzed to reveal the reasons for the contact, perhaps with the aggregated customer results cross referenced against what is being said about the organization or a similar peer organization on social media. The organization can then take positive actions, communicating to the market in advance of issues reaching the press, strengthening controls, adjusting risk profiles, changing policy and procedures and completely minimizing, if not eliminating, complaints and compensation for that specific reason in the future. In this one example of many similar ones, the GRC team(s) has demonstrated real and tangible business value. Big Challenges - Big Opportunities As pointed out by recent Forrester research, high performing companies (those that are growing 15% or more year-on-year compared to their peers) are taking a selective approach to investing in Big Data. "Tomorrow's winners understand this, and they are making selective investments aimed at specific opportunities with tangible benefits where big data offers a more economical solution to meet a need." (Forrsights Strategy Spotlight: Business Intelligence and Big Data, Q4 2012) As pointed out earlier, with the ever increasing volume of regulatory demands and fines for getting it wrong, limited resource availability and out of date or inadequate GRC systems all contributing to a higher cost of compliance and/or higher risk profile than desired – a big data investment in GRC clearly falls into this category. However, to make the most of big data organizations must evolve both their business and IT procedures, processes, people and infrastructures to handle these new high-volume, high-velocity, high-variety sources of data and be able integrate them with the pre-existing company data to be analyzed. GRC big data clearly allows the organization access to and management over a huge amount of often very sensitive information that although can help create a more risk intelligent organization, also presents numerous data governance challenges, including regulatory compliance and information security. In addition to client and regulatory demands over better information security and data protection the sheer amount of information organizations deal with the need to quickly access, classify, protect and manage that information can quickly become a key issue from a legal, as well as technical or operational standpoint. However, by making information governance processes a bigger part of everyday operations, organizations can make sure data remains readily available and protected. The Right GRC & Big Data Partnership Becomes Key The "getting it right first time" mantra used in so many companies remains essential for any GRC team that is sponsoring, helping kick start, or even overseeing a big data project. To make a big data GRC initiative work and get the desired value, partnerships with companies, who have a long history of success in delivering successful GRC solutions as well as being at the very forefront of technology innovation, becomes key. Clearly solutions can be built in-house more cheaply than through vendor, but as has been proven time and time again, when it comes to self built solutions covering AML and Fraud for example, few have able to scale or adapt appropriately to meet the changing regulations or challenges that the GRC teams face on a daily basis. This has led to the creation of GRC silo’s that are causing so many headaches today. The solutions that stand out and should be explored are the ones that can seamlessly merge the traditional world of well-known data, analytics and visualization with the new world of seemingly innumerable data sources, utilizing Big Data technologies to generate new GRC insights right across the enterprise.Ultimately, Big Data is here to stay, and organizations that embrace its potential and outline a viable strategy, as well as understand and build a solid analytical foundation, will be the ones that are well positioned to make the most of it. A Blueprint and Roadmap Service for Big Data Big data adoption is first and foremost a business decision. As such it is essential that your partner can align your strategies, goals, and objectives with an architecture vision and roadmap to accelerate adoption of big data for your environment, as well as establish practical, effective governance that will maintain a well managed environment going forward. Key Activities: While your initiatives will clearly vary, there are some generic starting points the team and organization will need to complete: Clearly define your drivers, strategies, goals, objectives and requirements as it relates to big data Conduct a big data readiness and Information Architecture maturity assessment Develop future state big data architecture, including views across all relevant architecture domains; business, applications, information, and technology Provide initial guidance on big data candidate selection for migrations or implementation Develop a strategic roadmap and implementation plan that reflects a prioritization of initiatives based on business impact and technology dependency, and an incremental integration approach for evolving your current state to the target future state in a manner that represents the least amount of risk and impact of change on the business Provide recommendations for practical, effective Data Governance, Data Quality Management, and Information Lifecycle Management to maintain a well-managed environment Conduct an executive workshop with recommendations and next steps There is little debate that managing risk and data are the two biggest obstacles encountered by financial institutions. Big data is here to stay and risk management certainly is not going anywhere, and ultimately financial services industry organizations that embrace its potential and outline a viable strategy, as well as understand and build a solid analytical foundation, will be best positioned to make the most of it. Matthew Long is a Financial Crime Specialist for Oracle Financial Services. He can be reached at matthew.long AT oracle.com.

Read the article
SQL Server 2005 Blocking Problem (ASYNC_NETWORK_IO)

- by ivankolo

I am responsible for a third-party application (no access to source) running on IIS and SQL Server 2005 (500 concurrent users, 1TB data, 8 IIS servers). We have recently started to see significant blocking on the database (after months of running this application in production with no problems). This occurs at random intervals during the day, approximately every 30 minutes, and affects between 20 and 100 sessions each time. All of the sessions eventually hit the application time out and the sessions abort. The problem disappears and then gradually re-emerges. The SPID responsible for the blocking always has the following features: WAIT TYPE = ASYNC_NETWORK_IO The SQL being run is “(@claimid varchar(15))SELECT claimid, enrollid, status, orgclaimid, resubclaimid, primaryclaimid FROM claim WHERE primaryclaimid = @claimid AND primaryclaimid < claimid)”. This is relatively innocuous SQL that should only return one or two records, not a large dataset. NO OTHER SQL statements have been implicated in the blocking, only this SQL statement. This is parameterized SQL for which an execution plan is cached in sys.dm_exec_cached_plans. This SPID has an object-level S lock on the claim table, so all UPDATEs/INSERTs to the claim table are also blocked. HOST ID varies. Different web servers are responsible for the blocking sessions. E.g., sometimes we trace back to web server 1, sometimes web server 2. When we trace back to the web server implicated in the blocking, we see the following: There is always some sort of application related error in the Event Log on the web server, linked to the Host ID and Host Process ID from the SQL Session. The error messages vary, usually some sort of SystemOutofMemory. (These error messages seem to be similar to error messages that we have seen in the past without such dramatic consequences. We think was happening before, but didn’t lead to blocking. Why now?) No known problems with the network adapters on either the web servers or the SQL server. (In any event the record set returned by the offending query would be small.) Things ruled out: Indexes are regularly defragmented. Statistics regularly updated. Increased sample size of statistics on claim.primaryclaimid. Forced recompilation of the cached execution plan. Created a compound index with primaryclaimid, claimid. No networking problems. No known issues on the web server. No changes to application software on web servers. We hypothesize that the chain of events goes something like this: Web server process submits SQL above. SQL server executes the SQL, during which it acquires a lock on the claim table. Web server process gets an error and dies. SQL server session is hung waiting for the web server process to read the data set. SQL Server sessions that need to get X locks on parts of the claim table (anyone processing claims) are blocked by the lock on the claim table and remain blocked until they all hit the application time out. Any suggestions for troubleshooting while waiting for the vendor's assistance would be most welcome. Is there a way to force SQL Server to lock at the row/page level for this particular SQL statement only? Is there a way to set a threshold on ASYNC_NETWORK_IO waits only?

Read the article
I finished coding my program. What's next? what are the steps I should take that would enable me to

- by Luay

Hi, I finished developing a program and would like to sell it online. However, I am not really sure of what to do next. Here is my current plan (in-order): 1- Add a 'deployment' project (i'm using visual studio) to my project so I can create a setup file for my program. 2- Use visual studio 'testing' add ons to test the program. I have no idea how to do it, but will teach myself. 3- When all is done, install the program on my wife's, parents and in-laws computer to further test it under different environments. 3- Setup a small Ltd. company. ( I might start this earlier as it might take a few weeks to complete). 4- Build / finish building a website (actually I started on this step a few weeks ago but haven't devoted enough time for it to complete yet). 5- Find a software / service to protect my program from piracy. I know it is impossible to protect the program from piracy. I understand that fully. But I still want to implement some sort of solution that wouldn't harm or deter honest customers. 6- Set up a business paypal account. 7- Find a software / service to handle payments on my website 8- Find a software / service to handle issuing license codes 9- Set up all of the above and go 'live'. However, I have zero experience in this sort of thing and would like your guidance on some points. 1- Is it necessary to setup a company? Can I sell as an individual? Will selling as an individual deter persons or companies from buying my software? 2- What is a decent choice for as a software / service to protect my program from my piracy. I have done a quick search and found something called Quick License Manager by Interactive Studios. Is this the sort of thing I am looking for? Is it any good? At which stage do I use or implement their service (or any similar service you might point me too? I guess I am really asking: how does this work? Would they give me a file I just add to my project, rebuild it and that's it? 3- What should I implement to handle payments online and how complicated is it? could you point me to any such services, please? 4- What should I implement to handle issuing license codes and how will that software / service coordinate with the software / service that handles the anti-piracy stuff? could you point me to such services, please? 5- In a different Stack Overflow question (by another user) someone suggested a cms system called Software Droid (http://www.softwaredroid.com). Is this what I am after. Will it help? 6- If the steps i'm following are incorrect in order or logic could you please advise me on what I should actually do? I guess these are question for those of you who have 'been there and done that'. So any guidance will be vary much appreciated. Many thanks in advance for your help.

Read the article
Detecting Acceleration in a car (iPhone Accelerometer)

- by TheGazzardian

Hello, I am working on an iPhone app where we are trying to calculate the acceleration of a moving car. Similar apps have accomplished this (Dynolicious), but the difference is that this app is designed to be used during general city driving, not on a drag strip. This leads us to one big concern that Dynolicious was luckily able to avoid: hills. Yes, hills. There are two important stages to this: calibration, and actual driving. Our initial run was simple and suffered the consequences. During the calibration stage, I took the average force on the phone, and during running, I just subtracted the average force from the current force to get the current acceleration this frame. The problem with this is that the typical car receives much more force than just the forward force - everything from turning to potholes was causing the values to go out of sync with what was really happening. The next run was to add the condition that the iPhone must be oriented in such a way that the screen was facing toward the back of the car. Using this method, I attempted to follow only force on the z-axis, but this obviously lead to problems unless the iPhone was oriented directly upright, because of gravity. Some trigonometry later, and I had managed to work gravity out of the equation, so that the car was actually being read very, very well by the iPhone. Until I hit a slope. As soon as the angle of the car changed, suddenly I was receiving accelerations and decelerations that didn't make sense, and we were once again going out of sync. Talking with someone a lot smarter than me at math lead to a solution that I have been trying to implement for longer than I would like to admit. It's steps are as follows: 1) During calibration, measure gravity as a vector instead of a size. Store that vector. 2) When the car initially moves forward, take the vector of motion and subtract gravity. Use this as the forward momentum. (Ignore, for now, the user cases where this will be difficult and let's concentrate on the math :) 3) From the forward vector and the gravity vector, construct a plane. 4) Whenever a force is received, project it onto said plane to get rid of sideways force/etc. 5) Then, use that force, the known magnitude of gravity, and the known direction of forward motion to essentially solve a triangle to get the forward vector. The problem that is causing the most difficulty in this new system is not step 5, which I have gotten to the point where all the numbers look as they should. The difficult part is actually the detection of the forward vector. I am selecting vectors whose magnitude exceeds gravity, and from there, averaging them and subtracting gravity. (I am doing some error checking to make sure that I am not using a force just because the iPhone accelerometer was off by a bit, which happens more frequently than I would like). But if I plot these vectors that I am using, they actually vary by an angle of about 20-30 degrees, which can lead to some strong inaccuracies. The end result is that the app is even more inaccurate now than before. So basically - all you math and iPhone brains out there - any glaring errors? Any potentially better solutions? Any experience that could be useful at all? Award: offering a bounty of $250 to the first answer that leads to a solution.

Read the article
CodePlex Daily Summary for Friday, August 31, 2012

CodePlex Daily Summary for Friday, August 31, 2012Popular ReleasesStartComp: Beta Release 1.0.0: Beta Release 1 Featured Content Bing-Search has been removed Window anchor implemented The listview can now be configured to be shown in details view or tile view through the context menu The listview now allows sorting through the context menu The view, sort order and sort column are now saved for each repository The listview now shows the background image in the lower right The listview now shows a background image for the user defined repositories Added a "Tell-A-Friend" bu...SharePoint Column & View Permission: SharePoint Column and View Permission v1.2: Version 1.2 of this project. If you will find any bugs please let me know at enti@zoznam.sk or post your findings in Issue TrackerDotNetNuke® Form and List: 06.00.04: DotNetNuke Form and List 06.00.04 Don't forget to backup your installation before upgrade. Changes in 06.00.04 Fix: Sql Scripts for 6.003 missed object qualifiers within stored procedures Fix: added missing resource "cmdCancel.Text" in form.ascx.resx Changes in 06.00.03 Fix: MakeThumbnail was broken if the application pool was configured to .Net 4 Change: Data is now stored in nvarchar(max) instead of ntext Changes in 06.00.02 The scripts are now compatible with SQL Azure, tested in a ne...DotNetNuke Translator: 01.00.00 Beta: First release of the project.Audio Pitch & Shift: Audio Pitch And Shift 5.1.0.2: fixed several issues with streaming modeUrlPager: UrlPager 1.2: Fixed bug in which url parameters will lost after paging; ????????url???bug；EntLib.com????????: EntLib.com???????? v3.0: EntLib eCommerce Solution ???Microsoft .Net Framework?????????????????????。Coevery - Free CRM: Coevery 1.0.0.24: Add a sample database, and installation instructions.NicAudio: NicAudio 2.0.6: ac3,dts Solved some initialization issues with no-linear decode.ExpressProfiler: Initial release of ExpressProfiler v1.2: This is initial release of ExpressProfilerNabu Library: 2012-08-29, 14: .Net Framework 4.0, .Net Framework 4.5 Debug and Release builds.Math.NET Numerics: Math.NET Numerics v2.2.1: Major linear algebra rework since v2.1, now available on Codeplex as well (previous versions were only available via NuGet). Since v2.2.0: Student-T density more robust for very large degrees of freedom Sparse Kronecker product much more efficient (now leverages sparsity) Direct access to raw matrix storage implementations for advanced extensibility Now also separate package for signed core library with a strong name (we dropped strong names in v2.2.0) Also available as NuGet packages...Microsoft SQL Server Product Samples: Database: AdventureWorks Databases – 2012, 2008R2 and 2008: About this release This release consolidates AdventureWorks databases for SQL Server 2012, 2008R2 and 2008 versions to one page. Each zip file contains an mdf database file and ldf log file. This should make it easier to find and download AdventureWorks databases since all OLTP versions are on one page. There are no database schema changes. For each release of the product, there is a light-weight and full version of the AdventureWorks sample database. The light-weight version is denoted by ...DotNetNuke® Blog: 05.00.00: Version 5.0.0 - Final This version of the module requires DotNetNuke Core 6.2 or greater. FYI: Developers should be aware that the module uses Visual Studio 2010 only. Release Highlights: Corrected blog comment sorting problem. 20228 - Integrated with the core Journal API. 20789, 21988 - wired in fix submitted by J Sheely around blank author names. 20210 - Updated manifest to 5.0 format (from 3.0). Automated packaging and made project structure more inline with other DotNetNuke m...Christoc's DotNetNuke Module Development Template: DotNetNuke Project Templates V1.1 for VS2012: This release is specifically for Visual Studio 2012 Support, distributed through the Visual Studio Extensions gallery at http://visualstudiogallery.msdn.microsoft.com/ After you build in Release mode the installable packages (source/install) can be found in the INSTALL folder now, within your module's folder, not the packages folder anymore Check out the blog post for all of the details about this release. http://www.dotnetnuke.com/Resources/Blogs/EntryId/3471/New-Visual-Studio-2012-Projec...Home Access Plus+: v8.0: v8.0828.1800 RELEASE CHANGED TO BETA Any issues, please log them on http://www.edugeek.net/forums/home-access-plus/ This is full release, NO upgrade ZIP will be provided as most files require replacing. To upgrade from a previous version, delete everything but your AppData folder, extract all but the AppData folder and run your HAP+ install Documentation is supplied in the Web Zip The Quota Services require executing a script to register the service, this can be found in there install di...Phalanger - The PHP Language Compiler for the .NET Framework: 3.0.0.3406 (September 2012): New features: Extended ReflectionClass libxml error handling, constants DateTime::modify(), DateTime::getOffset() TreatWarningsAsErrors MSBuild option OnlyPrecompiledCode configuration option; allows to use only compiled code Fixes: ArgsAware exception fix accessing .NET properties bug fix ASP.NET session handler fix for OutOfProc mode DateTime methods (WordPress posting fix) Phalanger Tools for Visual Studio: Visual Studio 2010 & 2012 New debugger engine, PHP-like debugging ...MabiCommerce: MabiCommerce 1.0.1: What's NewSetup now creates shortcuts Fix spelling errors Minor enhancement to the Map window.ScintillaNET: ScintillaNET 2.5.2: This release has been built from the 2.5 branch. Version 2.5.2 is functionally identical to the 2.5.1 release but also includes the XML documentation comments file generated by Visual Studio. It is not 100% comprehensive but it will give you Visual Studio IntelliSense for a large part of the API. Just make sure the ScintillaNET.xml file is in the same folder as the ScintillaNET.dll reference you're using in your projects. (The XML file does not need to be distributed with your application)....BlackJumboDog: Ver5.7.1: 2012.08.25 Ver5.7.1 (1)?????·?????LING?????????????? (2)SMTP???（????）????、?????\?????????????????????New ProjectsAbcLibrary: A Library of methods and class types used for ABCAprendendo Windows 8: Não foi feito nada ainda...Auto fill template generator (word): This program was designed to help the automate generation of files using keywords.ClarkTestCodePlex2: clark test Code Razor: This tools translates Razor files to code. This allows the Razor views to be compiled and shared across projects.Contrib.Mod.ChangePassword: It is an evil module that abuses users rights and lets you change anyone's password.CurrentConsumption: CurrentConsumptionDbSettings - An API to store settings in a database: This stores settings in an OleDb/Sql database using an API similar to ApplicationSettingsBase. Settings vary by app, version, user.JCI prototipos: summaryMemberAdminService: This is a test projectMeteor Rendering Engine: The Meteor rendering engine is developed in C# with XNA 4.0, and provides various rendering outputs for 3D scenes.Mod.Colorbox: Orchard module for Mod.ColorboxMogulTestProject1: papaMogulTestTRY: papaosmm: this is a sample test projectServer Survey: Server Survey ScriptShops' Cloud: This project is a Cloud Platform for Mini Shops' Daily Management.Simple Grocery 5: This is a very simple application to help me (or you) out setting up a grocery list and use it on the food market using ALL smart phones or tablets.Tikun Korim: Community site to help people learn to read in Sefer Torah. This project is going to use ASP.NET MVC 4 and as much as open source project as we can. TreeCreeper: TreeCreeper programs (Spatial and NonSpatial) support the taxonomic analysis of species assemblagesVisual Studio Icon Patcher: Visual Studio Icon Patcher allows you to update Visual Studio 2012 with the Solution Explorer icons from Visual Studio 2010.WPT Generator: WPT Generator HTML5 , Google API 3.0, Javascript and CSS 3.0 Web Application for generating a WPT file (Ozi Explorer Format).

Read the article
When someone deletes a shared data source in SSRS

- by Rob Farley

SQL Server Reporting Services plays nicely. You can have things in the catalogue that get shared. You can have Reports that have Links, Datasets that can be used across different reports, and Data Sources that can be used in a variety of ways too. So if you find that someone has deleted a shared data source, you potentially have a bit of a horror story going on. And this works for this month’s T-SQL Tuesday theme, hosted by Nick Haslam, who wants to hear about horror stories. I don’t write about LobsterPot client horror stories, so I’m writing about a situation that a fellow MVP friend asked me about recently instead. The best thing to do is to grab a recent backup of the ReportServer database, restore it somewhere, and figure out what’s changed. But of course, this isn’t always possible. And it’s much nicer to help someone with this kind of thing, rather than to be trying to fix it yourself when you’ve just deleted the wrong data source. Unfortunately, it lets you delete data sources, without trying to scream that the data source is shared across over 400 reports in over 100 folders, as was the case for my friend’s colleague. So, suddenly there’s a big problem – lots of reports are failing, and the time to turn it around is small. You probably know which data source has been deleted, but getting the shared data source back isn’t the hard part (that’s just a connection string really). The nasty bit is all the re-mapping, to get those 400 reports working again. I know from exploring this kind of stuff in the past that the ReportServer database (using its default name) has a table called dbo.Catalog to represent the catalogue, and that Reports are stored here. However, the information about what data sources these deployed reports are configured to use is stored in a different table, dbo.DataSource. You could be forgiven for thinking that shared data sources would live in this table, but they don’t – they’re catalogue items just like the reports. Let’s have a look at the structure of these two tables (although if you’re reading this because you have a disaster, feel free to skim past). Frustratingly, there doesn’t seem to be a Books Online page for this information, sorry about that. I’m also not going to look at all the columns, just ones that I find interesting enough to mention, and that are related to the problem at hand. These fields are consistent all the way through to SQL Server 2012 – there doesn’t seem to have been any changes here for quite a while. dbo.Catalog The Primary Key is ItemID. It’s a uniqueidentifier. I’m not going to comment any more on that. A minor nice point about using GUIDs in unfamiliar databases is that you can more easily figure out what’s what. But foreign keys are for that too… Path, Name and ParentID tell you where in the folder structure the item lives. Path isn’t actually required – you could’ve done recursive queries to get there. But as that would be quite painful, I’m more than happy for the Path column to be there. Path contains the Name as well, incidentally. Type tells you what kind of item it is. Some examples are 1 for a folder and 2 a report. 4 is linked reports, 5 is a data source, 6 is a report model. I forget the others for now (but feel free to put a comment giving the full list if you know it). Content is an image field, remembering that image doesn’t necessarily store images – these days we’d rather use varbinary(max), but even in SQL Server 2012, this field is still image. It stores the actual item definition in binary form, whether it’s actually an image, a report, whatever. LinkSourceID is used for Linked Reports, and has a self-referencing foreign key (allowing NULL, of course) back to ItemID. Parameter is an ntext field containing XML for the parameters of the report. Not sure why this couldn’t be a separate table, but I guess that’s just the way it goes. This field gets changed when the default parameters get changed in Report Manager. There is nothing in dbo.Catalog that describes the actual data sources that the report uses. The default data sources would be part of the Content field, as they are defined in the RDL, but when you deploy reports, you typically choose to NOT replace the data sources. Anyway, they’re not in this table. Maybe it was already considered a bit wide to throw in another ntext field, I’m not sure. They’re in dbo.DataSource instead. dbo.DataSource The Primary key is DSID. Yes it’s a uniqueidentifier... ItemID is a foreign key reference back to dbo.Catalog Fields such as ConnectionString, Prompt, UserName and Password do what they say on the tin, storing information about how to connect to the particular source in question. Link is a uniqueidentifier, which refers back to dbo.Catalog. This is used when a data source within a report refers back to a shared data source, rather than embedding the connection information itself. You’d think this should be enforced by foreign key, but it’s not. It does allow NULLs though. Flags this is an int, and I’ll come back to this. When a Data Source gets deleted out of dbo.Catalog, you might assume that it would be disallowed if there are references to it from dbo.DataSource. Well, you’d be wrong. And not because of the lack of a foreign key either. Deleting anything from the catalogue is done by calling a stored procedure called dbo.DeleteObject. You can look at the definition in there – it feels very much like the kind of Delete stored procedures that many people write, the kind of thing that means they don’t need to worry about allowing cascading deletes with foreign keys – because the stored procedure does the lot. Except that it doesn’t quite do that. If it deleted everything on a cascading delete, we’d’ve lost all the data sources as configured in dbo.DataSource, and that would be bad. This is fine if the ItemID from dbo.DataSource hooks in – if the report is being deleted. But if a shared data source is being deleted, you don’t want to lose the existence of the data source from the report. So it sets it to NULL, and it marks it as invalid. We see this code in that stored procedure. UPDATE [DataSource] SET [Flags] = [Flags] & 0x7FFFFFFD, -- broken link [Link] = NULL FROM [Catalog] AS C INNER JOIN [DataSource] AS DS ON C.[ItemID] = DS.[Link] WHERE (C.Path = @Path OR C.Path LIKE @Prefix ESCAPE '*') Unfortunately there’s no semi-colon on the end (but I’d rather they fix the ntext and image types first), and don’t get me started about using the table name in the UPDATE clause (it should use the alias DS). But there is a nice comment about what’s going on with the Flags field. What I’d LIKE it to do would be to set the connection information to a report-embedded copy of the connection information that’s in the shared data source, the one that’s about to be deleted. I understand that this would cause someone to lose the benefit of having the data sources configured in a central point, but I’d say that’s probably still slightly better than LOSING THE INFORMATION COMPLETELY. Sorry, rant over. I should log a Connect item – I’ll put that on my todo list. So it sets the Link field to NULL, and marks the Flags to tell you they’re broken. So this is your clue to fixing it. A bitwise AND with 0x7FFFFFFD is basically stripping out the ‘2’ bit from a number. So numbers like 2, 3, 6, 7, 10, 11, etc, whose binary representation ends in either 11 or 10 get turned into 0, 1, 4, 5, 8, 9, etc. We can test for it using a WHERE clause that matches the SET clause we’ve just used. I’d also recommend checking for Link being NULL and also having no ConnectionString. And join back to dbo.Catalog to get the path (including the name) of broken reports are – in case you get a surprise from a different data source being broken in the past. SELECT c.Path, ds.Name FROM dbo.[DataSource] AS ds JOIN dbo.[Catalog] AS c ON c.ItemID = ds.ItemID WHERE ds.[Flags] = ds.[Flags] & 0x7FFFFFFD AND ds.[Link] IS NULL AND ds.[ConnectionString] IS NULL; When I just ran this on my own machine, having deleted a data source to check my code, I noticed a Report Model in the list as well – so if you had thought it was just going to be reports that were broken, you’d be forgetting something. So to fix those reports, get your new data source created in the catalogue, and then find its ItemID by querying Catalog, using Path and Name to find it. And then use this value to fix them up. To fix the Flags field, just add 2. I prefer to use bitwise OR which should do the same. Use the OUTPUT clause to get a copy of the DSIDs of the ones you’re changing, just in case you need to revert something later after testing (doing it all in a transaction won’t help, because you’ll just lock out the table, stopping you from testing anything). UPDATE ds SET [Flags] = [Flags] | 2, [Link] = '3AE31CBA-BDB4-4FD1-94F4-580B7FAB939D' /*Insert your own GUID*/ OUTPUT deleted.Name, deleted.DSID, deleted.ItemID, deleted.Flags FROM dbo.[DataSource] AS ds JOIN dbo.[Catalog] AS c ON c.ItemID = ds.ItemID WHERE ds.[Flags] = ds.[Flags] & 0x7FFFFFFD AND ds.[Link] IS NULL AND ds.[ConnectionString] IS NULL; But please be careful. Your mileage may vary. And there’s no reason why 400-odd broken reports needs to be quite the nightmare that it could be. Really, it should be less than five minutes. @rob_farley

Read the article
algorithm q: Fuzzy matching of structured data

- by user86432

I have a fairly small corpus of structured records sitting in a database. Given a tiny fraction of the information contained in a single record, submitted via a web form (so structured in the same way as the table schema), (let us call it the test record) I need to quickly draw up a list of the records that are the most likely matches for the test record, as well as provide a confidence estimate of how closely the search terms match a record. The primary purpose of this search is to discover whether someone is attempting to input a record that is duplicate to one in the corpus. There is a reasonable chance that the test record will be a dupe, and a reasonable chance the test record will not be a dupe. The records are about 12000 bytes wide and the total count of records is about 150,000. There are 110 columns in the table schema and 95% of searches will be on the top 5% most commonly searched columns. The data is stuff like names, addresses, telephone numbers, and other industry specific numbers. In both the corpus and the test record it is entered by hand and is semistructured within an individual field. You might at first blush say "weight the columns by hand and match word tokens within them", but it's not so easy. I thought so too: if I get a telephone number I thought that would indicate a perfect match. The problem is that there isn't a single field in the form whose token frequency does not vary by orders of magnitude. A telephone number might appear 100 times in the corpus or 1 time in the corpus. The same goes for any other field. This makes weighting at the field level impractical. I need a more fine-grained approach to get decent matching. My initial plan was to create a hash of hashes, top level being the fieldname. Then I would select all of the information from the corpus for a given field, attempt to clean up the data contained in it, and tokenize the sanitized data, hashing the tokens at the second level, with the tokens as keys and frequency as value. I would use the frequency count as a weight: the higher the frequency of a token in the reference corpus, the less weight I attach to that token if it is found in the test record. My first question is for the statisticians in the room: how would I use the frequency as a weight? Is there a precise mathematical relationship between n, the number of records, f(t), the frequency with which a token t appeared in the corpus, the probability o that a record is an original and not a duplicate, and the probability p that the test record is really a record x given the test and x contain the same t in the same field? How about the relationship for multiple token matches across multiple fields? Since I sincerely doubt that there is, is there anything that gets me close but is better than a completely arbitrary hack full of magic factors? Barring that, has anyone got a way to do this? I'm especially keen on other suggestions that do not involve maintaining another table in the database, such as a token frequency lookup table :). This is my first post on StackOverflow, thanks in advance for any replies you may see fit to give.

Read the article

Search Results

Search found 867 results on 35 pages for 'vary'.

Page 33/35 | < Previous Page | 29 30 31 32 33 34 35 | Next Page >

- by Rick Strahl

- by Shaun

- by pinaldave

- by Rick Strahl

- by Jesse

- by Manesh Karunakaran

- by Alois Kraus

- by Jesse

- by Shaun

- by jamiet

- by Enric Geijo

- by brainfrog

- by Vincent

- by Rob Young

- by extended_events

- by user281985

- by alanc

- by Sagar Hatekar

- by Jenna Danko

- by ivankolo

- by Luay

- by TheGazzardian

- by Rob Farley

- by user86432

< Previous Page | 29 30 31 32 33 34 35 | Next Page >