usage statistics - Page 239

Integrating external computer into a domain - some recommendations please

- by TomTom

Given: * A multi loation company. Every office has local routers that connect to a central VPN capable rouer in a data center. All fine so far. We now need to move a computer off site into a hosting center across the globe, to get it closer to some supplier computers we work for. it will run limited logic but latency is important, and our latency so far is too large. This computer will be in a data center and does no require incoming connections except for adminsitrative purposes, although it needs outgoing connetions. I have no real chance to put one of my VPN routers there, sadly - otherwise I would have no problem. Usage of RRAs is not recommended (we had various probblems there over time). I could deal with it. The computer MUSt integrate into the corporate structure via VPN and join the domain and be fully "tracked" (controlled for performance). What is the best suggestion? So far it looks like my best bets woudl be to log in via RRAS and deal with whatever issues arise there plus uise the local firewall the limit incoming connections to this computer to what is needed (which runs down to an emergency RDP connection allowance). Anyone a better idea?

Read the article

Computers on network crashing

- by Phil Cross

We have recently upgraded our network to Windows 7 clients with Windows server 2008 servers. The upgrade was completed by the end of September and until now has been fine (apart from the minor bugs). Recently (within the last 2 weeks) we've notice all computers on the network (around 1000) start to slow down to the point their unusable. It starts at about 08:45 and finishes at 09:15. Because of this, we think something may be broadcasting across the network. This happens every day, between these times. I cant use my computer at all at the slowdown peak, and looking at task managers performance graphs, Physical memory is hovering around 35% and CPU usage is at 0-10% (idle) yet still crashing. I've looked on DHCPs server log and cant see anything which stands out. The only change we made prior to the slowdown was installing adobe CS6 on some computers, however the slowdown affects computers without CS6. We have 2 physical machines, each with around 5-7 virtual machines running on them with ample memory. Does anyone have any suggestions as to what we can do to narrow down whats causing the crashes? Any help, suggestions or advice would be appreciated.

Read the article

SQLAuthority News – TechEd India – April 12-14, 2010 Bangalore – An Unforgettable Experience – An Op

- by pinaldave

TechEd India was one of the largest Technology events in India led by Microsoft. This event was attended by more than 3,000 technology enthusiasts, making it one of the most well-organized events of the year. Though I attempted to attend almost all the technology events here, I have not seen any bigger or better event in Indian subcontinents other than this. There are 21 Technical Tracks at Tech·Ed India 2010 that span more than 745 learning opportunities. I was fortunate enough to be a part of this whole event as a speaker and a delegate, as well. TechEd India Speaker Badge and A Token of Lifetime Hotel Selection I presented three different sessions at TechEd India and was also a part of panel discussion. (The details of the sessions are given at the end of this blog post.) Due to extensive traveling, I stay away from my family occasionally. For this reason, I took my wife – Nupur and daughter Shaivi (8 months old) to the event along with me. We stayed at the same hotel where the event was organized so as to maximize my time bonding with my family and to have more time in networking with technology community, at the same time. The hotel Lalit Ashok is the largest and most luxurious venue one can find in Bangalore, located in the middle of the city. The cost of the hotel was a bit pricey, but looking at all the advantages, I had decided to ask for a booking there. Hotel Lalit Ashok Nupur Dave and Shaivi Dave Arrival Day – DAY 0 – April 11, 2010 I reached the event a day earlier, and that was one wise decision for I was able to relax a bit and go over my presentation for the next day’s course. I am a kind of person who likes to get everything ready ahead of time. I was also able to enjoy a pleasant evening with several Microsoft employees and my family friends. I even checked out the location where I would be doing presentations the next day. I was fortunate enough to meet Bijoy Singhal from Microsoft who helped me out with a few of the logistics issues that occured the day before. I was not aware of the fact that the very next day he was going to be “The Man” of the TechEd 2010 event. Vinod Kumar from Microsoft was really very kind as he talked to me regarding my subsequent session. He gave me some suggestions which were really helpful that I was able to incorporate them during my presentation. Finally, I was able to meet Abhishek Kant from Microsoft; his valuable suggestions and unlimited passion have inspired many people like me to work with the Community. Pradipta from Microsoft was also around, being extremely busy with logistics; however, in those busy times, he did find some good spare time to have a chat with me and the other Community leaders. I also met Harish Ranganathan and Sachin Rathi, both from Microsoft. It was so interesting to listen to both of them talking about SharePoint. I just have no words to express my overwhelmed spirit because of all these passionate young guys - Pradipta,Vinod, Bijoy, Harish, Sachin and Ahishek (of course!). Map of TechEd India 2010 Event Day 1 – April 12, 2010 From morning until night time, today was truly a very busy day for me. I had two presentations and one panel discussion for the day. Needless to say, I had a few meetings to attend as well. The day started with a keynote from S. Somaseger where he announced the launch of Visual Studio 2010. The keynote area was really eye-catching because of the very large, bigger-than- life uniform screen. This was truly one to show. The title music of the keynote was very interesting and it featured Bijoy Singhal as the model. It was interesting to talk to him afterwards, when we laughed at jokes together about his modeling assignment. TechEd India Keynote Opening Featuring Bijoy TechEd India 2010 Keynote – S. Somasegar Time: 11:15pm – 11:45pm Session 1: True Lies of SQL Server – SQL Myth Buster Following the excellent keynote, I had my very first session on the subject of SQL Server Myth Buster. At first, I was a bit nervous as right after the keynote, for this was my very first session and during my presentation I saw lots of Microsoft Product Team members. Well, it really went well and I had a really good discussion with attendees of the session. I felt that a well begin was half-done and my confidence was regained. Right after the session, I met a few of my Community friends and had meaningful discussions with them on many subjects. The abstract of the session is as follows: In this 30-minute demo session, I am going to briefly demonstrate few SQL Server Myths and their resolutions as I back them up with some demo. This demo presentation is a must-attend for all developers and administrators who would come to the event. This is going to be a very quick yet fun session. Pinal Presenting session at TechEd India 2010 Time: 1:00 PM – 2:00 PM Lunch with Somasegar After the session I went to see my daughter, and then I headed right away to the lunch with S. Somasegar – the keynote speaker and senior vice president of the Developer Division at Microsoft. I really thank to Abhishek who made it possible for us. Because of his efforts, all the MVPs had the opportunity to meet such a legendary person and had to talk with them on Microsoft Technology. Though Somasegar is currently holding such a high position in Microsoft, he is very polite and a real gentleman, and how I wish that everybody in industry is like him. Believe me, if you spread love and kindness, then that is what you will receive back. As soon as lunch time was over, I ran to the session hall as my second presentation was about to start. Time: 2:30pm – 3:30pm Session 2: Master Data Services in Microsoft SQL Server 2008 R2 Business Intelligence is a subject which was widely talked about at TechEd. Everybody was interested in this subject, and I did not excuse myself from this great concept as well. I consider myself fortunate as I was presenting on the subject of Master Data Services at TechEd. When I had initially learned this subject, I had a bit of confusion about the usage of this tool. Later on, I decided that I would tackle about how we all developers and DBAs are not able to understand something so simple such as this, and even worst, creating confusion about the technology. During system designing, it is very important to have a reference material or master lookup tables. Well, I talked about the same subject and presented the session keeping that as my center talk. The session went very well and I received lots of interesting questions. I got many compliments for talking about this subject on the real-life scenario. I really thank Rushabh Mehta (CEO, Solid Quality Mentors India) for his supportive suggestions that helped me prepare the slide deck, as well as the subject. Pinal Presenting session at TechEd India 2010 The abstract of the session is as follows: SQL Server Master Data Services will ship with SQL Server 2008 R2 and will improve Microsoft’s platform appeal. This session provides an in-depth demonstration of MDS features and highlights important usage scenarios. Master Data Services enables consistent decision-making process by allowing you to create, manage and propagate changes from a single master view of your business entities. Also, MDS – Master Data-hub which is a vital component, helps ensure the consistency of reporting across systems and deliver faster and more accurate results across the enterprise. We will talk about establishing the basis for a centralized approach to defining, deploying, and managing master data in the enterprise. Pinal Presenting session at TechEd India 2010 The day was still not over for me. I had ran into several friends but we were not able keep our enthusiasm under control about all the rumors saying that SQL Server 2008 R2 was about to be launched tomorrow in the keynote. I then ran to my third and final technical event for the day- a panel discussion with the top technologies of India. Time: 5:00pm – 6:00pm Panel Discussion: Harness the power of Web – SEO and Technical Blogging As I have delivered two technical sessions by this time, I was a bit tired but not less enthusiastic when I had to talk about Blog and Technology. We discussed many different topics there. I told them that the most important aspect for any blog is its content. We discussed in depth the issues with plagiarism and how to avoid it. Another topic of discussion was how we technology bloggers can create awareness in the Community about what the right kind of blogging is and what morally and technically wrong acts are. A couple of questions were raised about what type of liberty a person can have in terms of writing blogs. Well, it was generically agreed that a blog is mainly a representation of our ideas and thoughts; it should not be governed by external entities. As long as one is writing what they really want to say, but not providing incorrect information or not practicing plagiarism, a blogger should be allowed to express himself. This panel discussion was supposed to be over in an hour, but the interest of the participants was remarkable and so it was extended for 30 minutes more. Finally, we decided to bring to a close the discussion and agreed that we will continue the topic next year. TechEd India Panel Discussion on Web, Technology and SEO Surprisingly, the day was just beginning after doing all of these. By this time, I have almost met all the MVP who arrived at the event, as well as many Microsoft employees. There were lots of Community folks present, too. I decided that I would go to meet several friends from the Community and continue to communicate with me on SQLAuthority.com. I also met Abhishek Baxi and had a good talk with him regarding Win Mobile and Twitter. He also took a very quick video of me wherein I spoke in my mother’s tongue, Gujarati. It was funny that I talked in Gujarati almost all the day, but when I was talking in the interview I could not find the right Gujarati words to speak. I think we all think in English when we think about Technology, so as to address universality. After meeting them, I headed towards the Speakers’ Dinner. Time: 8:00 PM – onwards Speakers Dinner The Speakers’ dinner was indeed a wonderful opportunity for all the speakers to get together and relax. We talked so many different things, from XBOX to Hindi Movies, and from SQL to Samosas. I just could not express how much fun I had. After a long evening, when I returned tmy room and met Shaivi, I just felt instantly relaxed. Kids are really gifts from God. Today was a really long but exciting day. So many things happened in just one day: Visual Studio Lanch, lunch with Somasegar, 2 technical sessions, 1 panel discussion, community leaders meeting, speakers dinner and, last but not leas,t playing with my child! A perfect day! Day 2 – April 13, 2010 Today started with a bang with the excellent keynote by Kamal Hathi who launched SQL Server 2008 R2 in India and demonstrated the power of PowerPivot to all of us. 101 Million Rows in Excel brought lots of applause from the audience. Kamal Hathi Presenting Keynote at TechEd India 2010 The day was a bit easier one for me. I had no sessions today and no events planned. I had a few meetings planned for the second day of the event. I sat in the speaker’s lounge for half a day and met many people there. I attended nearly 9 different meetings today. The subjects of the meetings were very different. Here is a list of the topics of the Community-related meetings: SQL PASS and its involvement in India and subcontinents How to start community blogging Forums and developing aptitude towards technology Ahmedabad/Gandhinagar User Groups and their developments SharePoint and SQL Business Meeting – a client meeting Business Meeting – a potential performance tuning project Business Meeting – Solid Quality Mentors (SolidQ) And family friends Pinal Dave at TechEd India The day passed by so quickly during this meeting. In the evening, I headed to Partners Expo with friends and checked out few of the booths. I really wanted to talk about some of the products, but due to the freebies there was so much crowd that I finally decided to just take the contact details of the partner. I will now start sending them with my queries and, hopefully, I will have my questions answered. Nupur and Shaivi had also one meeting to attend; it was with our family friend Vijay Raj. Vijay is also a person who loves Technology and loves it more than anybody. I see him growing and learning every day, but still remaining as a ‘human’. I believe that if someone acquires as much knowledge as him, that person will become either a computer or cyborg. Here, Vijay is still a kind gentleman and is able to stay as our close family friend. Shaivi was really happy to play with Uncle Vijay. Pinal Dave and Vijay Raj Renuka Prasad, a Microsoft MVP, impressed me with his passion and knowledge of SQL. Every time he gives me credit for his success, I believe that he is very humble. He has way more certifications than me and has worked many more years with SQL compared to me. He is an excellent photographer as well. Most of the photos in this blog post have been taken by him. I told him if ever he wants to do a part time job, he can do the photography very well. Pinal Dave and Renuka Prasad I also met L Srividya from Microsoft, whom I was looking forward to meet. She is a bundle of knowledge that everyone would surely learn a lot from her. I was able to get a few minutes from her and well, I felt confident. She enlightened me with SQL Server BI concepts, domain management and SQL Server security and few other interesting details. I also had a wonderful time talking about SharePoint with fellow Solid Quality Mentor Joy Rathnayake. He is very passionate about SharePoint but when you talk .NET and SQL with him, he is still overwhelmingly knowledgeable. In fact, while talking to him, I figured out that the recent training he delivered was on SQL Server 2008 R2. I told him a joke that it hurts my ego as he is more popular now in SQL training and consulting than me. I am sure all of you agree that working with good people is a gift from God. I am fortunate enough to work with the best of the best Industry experts. It was a great pleasure to hang out with my Community friends – Ahswin Kini, HimaBindu Vejella, Vasudev G, Suprotim Agrawal, Dhananjay, Vikram Pendse, Mahesh Dhola, Mahesh Mitkari, Manu Zacharia, Shobhan, Hardik Shah, Ashish Mohta, Manan, Subodh Sohani and Sanjay Shetty (of course!) . (Please let me know if I have met you at the event and forgot your name to list here). Time: 8:00 PM – onwards Community Leaders Dinner After lots of meetings, I headed towards the Community Leaders dinner meeting and met almost all the folks I met in morning. The discussion was almost the same but the real good thing was that we were enjoying it. The food was really good. Nupur was invited in the event, but Shaivi could not come. When Nupur tried to enter the event, she was stopped as Shaivi did not have the pass to enter the dinner. Nupur expressed that Shaivi is only 8 months old and does not eat outside food as well and could not stay by herself at this age, but the door keeper did not agree and asked that without the entry details Shaivi could not go in, but Nupur could. Nupur called me on phone and asked me to help her out. By the time, I was outside; the organizer of the event reached to the door and happily approved Shaivi to join the party. Once in the party, Shaivi had lots of fun meeting so many people. Shaivi Dave and Abhishek Kant Dean Guida (Infragistics President and CEO) and Pinal Dave (SQLAuthority.com) Day 3 – April 14, 2010 Though, it was last day, I was very much excited today as I was about to present my very favorite session. Query Optimization and Performance Tuning is my domain expertise and I make my leaving by consulting and training the same. Today’s session was on the same subject and as an additional twist, another subject about Spatial Database was presented. I was always intrigued with Spatial Database and I have enjoyed learning about it; however, I have never thought about Spatial Indexing before it was decided that I will do this session. I really thank Solid Quality Mentor Dr. Greg Low for his assistance in helping me prepare the slide deck and also review the content. Furthermore, today was really what I call my ‘learning day’ . So far I had not attended any session in TechEd and I felt a bit down for that. Everybody spends their valuable time & money to learn something new and exciting in TechEd and I had not attended a single session at the moment thinking that it was already last day of the event. I did have a plan for the day and I attended two technical sessions before my session of spatial database. I attended 2 sessions of Vinod Kumar. Vinod is a natural storyteller and there was no doubt that his sessions would be jam-packed. People attended his sessions simply because Vinod is syhe speaker. He did not have a single time disappointed audience; he is truly a good speaker. He knows his stuff very well. I personally do not think that in India he can be compared to anyone for SQL. Time: 12:30pm-1:30pm SQL Server Query Optimization, Execution and Debugging Query Performance I really had a fun time attending this session. Vinod made this session very interactive. The entire audience really got into the presentation and started participating in the event. Vinod was presenting a small problem with Query Tuning, which any developer would have encountered and solved with their help in such a fashion that a developer feels he or she have already resolved it. In one question, I was the only one who was ready to answer and Vinod told me in a light tone that I am now allowed to answer it! The audience really found it very amusing. There was a huge crowd around Vinod after the session. Vinod – A master storyteller! Time: 3:45pm-4:45pm Data Recovery / consistency with CheckDB This session was much heavier than the earlier one, and I must say this is my most favorite session I EVER attended in India. In this TechEd I have only attended two sessions, but in my career, I have attended numerous technical sessions not only in India, but all over the world. This session had taken my breath away. One by one, Vinod took the different databases, and started to corrupt them in different ways. Each database has some unique ways to get corrupted. Once that was done, Vinod started to show the DBCC CEHCKDB and demonstrated how it can solve your problem. He finally fixed all the databases with this single tool. I do have a good knowledge of this subject, but let me honestly admit that I have learned a lot from this session. I enjoyed and cheered during this session along with other attendees. I had total satisfaction that, just like everyone, I took advantage of the event and learned something. I am now TECHnically EDucated. Pinal Dave and Vinod Kumar After two very interactive and informative SQL Sessions from Vinod Kumar, the next turn me presenting on Spatial Database and Indexing. I got once again nervous but Vinod told me to stay natural and do my presentation. Well, once I got a huge stage with a total of four projectors and a large crowd, I felt better. Time: 5:00pm-6:00pm Session 3: Developing with SQL Server Spatial and Deep Dive into Spatial Indexing Pinal Presenting session at TechEd India 2010 Pinal Presenting session at TechEd India 2010 I kicked off this session with Michael J Swart‘s beautiful spatial image. This session was the last one for the day but, to my surprise, I had more than 200+ attendees. Slowly, the rain was starting outside and I was worried that the hall would not be full; despite this, there was not a single seat available in the first five minutes of the session. Thanks to all of you for attending my presentation. I had demonstrated the map of world (and India) and quickly explained what Geographic and Geometry data types in Spatial Database are. This session had interesting story of Indexing and Comparison, as well as how different traditional indexes are from spatial indexing. Pinal Presenting session at TechEd India 2010 Due to the heavy rain during this event, the power went off for about 22 minutes (just an accident – nobodies fault). During these minutes, there were no audio, no video and no light. I continued to address the mass of 200+ people without any audio device and PowerPoint. I must thank the audience because not a single person left from the session. They all stayed in their place, some moved closure to listen to me properly. I noticed that the curiosity and eagerness to learn new things was at the peak even though it was the very last session of the TechEd. Everybody wanted get the maximum knowledge out of this whole event. I was touched by the support from audience. They listened and participated in my session even without any kinds of technology (no ppt, no mike, no AC, nothing). During these 22 minutes, I had completed my theory verbally. Pinal Presenting session at TechEd India 2010 After a while, we got the projector back online and we continued with some exciting demos. Many thanks to Microsoft people who worked energetically in background to get the backup power for project up. I had a very interesting demo wherein I overlaid Bangalore and Hyderabad on the India Map and find their aerial distance between them. After finding the aerial distance, we browsed online and found that SQL Server estimates the exact aerial distance between these two cities, as compared to the factual distance. There was a huge applause from the crowd on the subject that SQL Server takes into the count of the curvature of the earth and finds the precise distances based on details. During the process of finding the distance, I demonstrated a few examples of the indexes where I expressed how one can use those indexes to find these distances and how they can improve the performance of similar query. I also demonstrated few examples wherein we were able to see in which data type the Index is most useful. We finished the demos with a few more internal stuff. Pinal Presenting session at TechEd India 2010 Despite all issues, I was mostly satisfied with my presentation. I think it was the best session I have ever presented at any conference. There was no help from Technology for a while, but I still got lots of appreciation at the end. When we ended the session, the applause from the audience was so loud that for a moment, the rain was not audible. I was truly moved by the dedication of the Technology enthusiasts. Pinal Dave After Presenting session at TechEd India 2010 The abstract of the session is as follows: The Microsoft SQL Server 2008 delivers new spatial data types that enable you to consume, use, and extend location-based data through spatial-enabled applications. Attend this session to learn how to use spatial functionality in next version of SQL Server to build and optimize spatial queries. This session outlines the new geography data type to store geodetic spatial data and perform operations on it, use the new geometry data type to store planar spatial data and perform operations on it, take advantage of new spatial indexes for high performance queries, use the new spatial results tab to quickly and easily view spatial query results directly from within Management Studio, extend spatial data capabilities by building or integrating location-enabled applications through support for spatial standards and specifications and much more. Time: 8:00 PM – onwards Dinner by Sponsors After the lively session during the day, there was another dinner party courtesy of one of the sponsors of TechEd. All the MVPs and several Community leaders were present at the dinner. I would like to express my gratitude to Abhishek Kant for organizing this wonderful event for us. It was a blast and really relaxing in all angles. We all stayed there for a long time and talked about our sweet and unforgettable memories of the event. Pinal Dave and Bijoy Singhal It was really one wonderful event. After writing this much, I say that I have no words to express about how much I enjoyed TechEd. However, it is true that I shared with you only 1% of the total activities I have done at the event. There were so many people I have met, yet were not mentioned here although I wanted to write their names here, too . Anyway, I have learned so many things and up until now, I am not able to get over all the fun I had in this event. Pinal Dave at TechEd India 2010 The Next Days – April 15, 2010 – till today I am still not able to get my mind out of the whole experience I had at TechEd India 2010. It was like a whole Microsoft Family working together to celebrate a happy occasion. TechEd India – Truly An Unforgettable Experience! Reference : Pinal Dave (http://blog.SQLAuthority.com) Filed under: About Me, MVP, Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Author Visit, SQLAuthority News, SQLServer, T SQL, Technology Tagged: TechEd, TechEdIn

Read the article

Capturing and Transforming ASP.NET Output with Response.Filter

- by Rick Strahl

During one of my Handlers and Modules session at DevConnections this week one of the attendees asked a question that I didn’t have an immediate answer for. Basically he wanted to capture response output completely and then apply some filtering to the output – effectively injecting some additional content into the page AFTER the page had completely rendered. Specifically the output should be captured from anywhere – not just a page and have this code injected into the page. Some time ago I posted some code that allows you to capture ASP.NET Page output by overriding the Render() method, capturing the HtmlTextWriter() and reading its content, modifying the rendered data as text then writing it back out. I’ve actually used this approach on a few occasions and it works fine for ASP.NET pages. But this obviously won’t work outside of the Page class environment and it’s not really generic – you have to create a custom page class in order to handle the output capture. [updated 11/16/2009 – updated ResponseFilterStream implementation and a few additional notes based on comments] Enter Response.Filter However, ASP.NET includes a Response.Filter which can be used – well to filter output. Basically Response.Filter is a stream through which the OutputStream is piped back to the Web Server (indirectly). As content is written into the Response object, the filter stream receives the appropriate Stream commands like Write, Flush and Close as well as read operations although for a Response.Filter that’s uncommon to be hit. The Response.Filter can be programmatically replaced at runtime which allows you to effectively intercept all output generation that runs through ASP.NET. A common Example: Dynamic GZip Encoding A rather common use of Response.Filter hooking up code based, dynamic GZip compression for requests which is dead simple by applying a GZipStream (or DeflateStream) to Response.Filter. The following generic routines can be used very easily to detect GZip capability of the client and compress response output with a single line of code and a couple of library helper routines: WebUtils.GZipEncodePage(); which is handled with a few lines of reusable code and a couple of static helper methods: /// <summary> ///Sets up the current page or handler to use GZip through a Response.Filter ///IMPORTANT: ///You have to call this method before any output is generated! /// </summary> public static void GZipEncodePage() { HttpResponse Response = HttpContext.Current.Response; if(IsGZipSupported()) { stringAcceptEncoding = HttpContext.Current.Request.Headers["Accept-Encoding"]; if(AcceptEncoding.Contains("deflate")) { Response.Filter = newSystem.IO.Compression.DeflateStream(Response.Filter, System.IO.Compression.CompressionMode.Compress); Response.AppendHeader("Content-Encoding", "deflate"); } else { Response.Filter = newSystem.IO.Compression.GZipStream(Response.Filter, System.IO.Compression.CompressionMode.Compress); Response.AppendHeader("Content-Encoding", "gzip"); } } // Allow proxy servers to cache encoded and unencoded versions separately Response.AppendHeader("Vary", "Content-Encoding"); } /// <summary> /// Determines if GZip is supported /// </summary> /// <returns></returns> public static bool IsGZipSupported() { string AcceptEncoding = HttpContext.Current.Request.Headers["Accept-Encoding"]; if (!string.IsNullOrEmpty(AcceptEncoding) && (AcceptEncoding.Contains("gzip") || AcceptEncoding.Contains("deflate"))) return true; return false; } GZipStream and DeflateStream are streams that are assigned to Response.Filter and by doing so apply the appropriate compression on the active Response. Response.Filter content is chunked So to implement a Response.Filter effectively requires only that you implement a custom stream and handle the Write() method to capture Response output as it’s written. At first blush this seems very simple – you capture the output in Write, transform it and write out the transformed content in one pass. And that indeed works for small amounts of content. But you see, the problem is that output is written in small buffer chunks (a little less than 16k it appears) rather than just a single Write() statement into the stream, which makes perfect sense for ASP.NET to stream data back to IIS in smaller chunks to minimize memory usage en route. Unfortunately this also makes it a more difficult to implement any filtering routines since you don’t directly get access to all of the response content which is problematic especially if those filtering routines require you to look at the ENTIRE response in order to transform or capture the output as is needed for the solution the gentleman in my session asked for. So in order to address this a slightly different approach is required that basically captures all the Write() buffers passed into a cached stream and then making the stream available only when it’s complete and ready to be flushed. As I was thinking about the implementation I also started thinking about the few instances when I’ve used Response.Filter implementations. Each time I had to create a new Stream subclass and create my custom functionality but in the end each implementation did the same thing – capturing output and transforming it. I thought there should be an easier way to do this by creating a re-usable Stream class that can handle stream transformations that are common to Response.Filter implementations. Creating a semi-generic Response Filter Stream Class What I ended up with is a ResponseFilterStream class that provides a handful of Events that allow you to capture and/or transform Response content. The class implements a subclass of Stream and then overrides Write() and Flush() to handle capturing and transformation operations. By exposing events it’s easy to hook up capture or transformation operations via single focused methods. ResponseFilterStream exposes the following events: CaptureStream, CaptureString Captures the output only and provides either a MemoryStream or String with the final page output. Capture is hooked to the Flush() operation of the stream. TransformStream, TransformString Allows you to transform the complete response output with events that receive a MemoryStream or String respectively and can you modify the output then return it back as a return value. The transformed output is then written back out in a single chunk to the response output stream. These events capture all output internally first then write the entire buffer into the response. TransformWrite, TransformWriteString Allows you to transform the Response data as it is written in its original chunk size in the Stream’s Write() method. Unlike TransformStream/TransformString which operate on the complete output, these events only see the current chunk of data written. This is more efficient as there’s no caching involved, but can cause problems due to searched content splitting over multiple chunks. Using this implementation, creating a custom Response.Filter transformation becomes as simple as the following code. To hook up the Response.Filter using the MemoryStream version event: ResponseFilterStream filter = new ResponseFilterStream(Response.Filter); filter.TransformStream += filter_TransformStream; Response.Filter = filter; and the event handler to do the transformation: MemoryStream filter_TransformStream(MemoryStream ms) { Encoding encoding = HttpContext.Current.Response.ContentEncoding; string output = encoding.GetString(ms.ToArray()); output = FixPaths(output); ms = new MemoryStream(output.Length); byte[] buffer = encoding.GetBytes(output); ms.Write(buffer,0,buffer.Length); return ms; } private string FixPaths(string output) { string path = HttpContext.Current.Request.ApplicationPath; // override root path wonkiness if (path == "/") path = ""; output = output.Replace("\"~/", "\"" + path + "/").Replace("'~/", "'" + path + "/"); return output; } The idea of the event handler is that you can do whatever you want to the stream and return back a stream – either the same one that’s been modified or a brand new one – which is then sent back to as the final response. The above code can be simplified even more by using the string version events which handle the stream to string conversions for you: ResponseFilterStream filter = new ResponseFilterStream(Response.Filter); filter.TransformString += filter_TransformString; Response.Filter = filter; and the event handler to do the transformation calling the same FixPaths method shown above: string filter_TransformString(string output) { return FixPaths(output); } The events for capturing output and capturing and transforming chunks work in a very similar way. By using events to handle the transformations ResponseFilterStream becomes a reusable component and we don’t have to create a new stream class or subclass an existing Stream based classed. By the way, the example used here is kind of a cool trick which transforms “~/” expressions inside of the final generated HTML output – even in plain HTML controls not HTML controls – and transforms them into the appropriate application relative path in the same way that ResolveUrl would do. So you can write plain old HTML like this: <a href=”~/default.aspx”>Home</a> and have it turned into: <a href=”/myVirtual/default.aspx”>Home</a> without having to use an ASP.NET control like Hyperlink or Image or having to constantly use: <img src=”<%= ResolveUrl(“~/images/home.gif”) %>” /> in MVC applications (which frankly is one of the most annoying things about MVC especially given the path hell that extension-less and endpoint-less URLs impose). I can’t take credit for this idea. While discussing the Response.Filter issues on Twitter a hint from Dylan Beattie who pointed me at one of his examples which does something similar. I thought the idea was cool enough to use an example for future demos of Response.Filter functionality in ASP.NET next I time I do the Modules and Handlers talk (which was great fun BTW). How practical this is is debatable however since there’s definitely some overhead to using a Response.Filter in general and especially on one that caches the output and the re-writes it later. Make sure to test for performance anytime you use Response.Filter hookup and make sure it' doesn’t end up killing perf on you. You’ve been warned :-}. How does ResponseFilterStream work? The big win of this implementation IMHO is that it’s a reusable component – so for implementation there’s no new class, no subclassing – you simply attach to an event to implement an event handler method with a straight forward signature to retrieve the stream or string you’re interested in. The implementation is based on a subclass of Stream as is required in order to handle the Response.Filter requirements. What’s different than other implementations I’ve seen in various places is that it supports capturing output as a whole to allow retrieving the full response output for capture or modification. The exception are the TransformWrite and TransformWrite events which operate only active chunk of data written by the Response. For captured output, the Write() method captures output into an internal MemoryStream that is cached until writing is complete. So Write() is called when ASP.NET writes to the Response stream, but the filter doesn’t pass on the Write immediately to the filter’s internal stream. The data is cached and only when the Flush() method is called to finalize the Stream’s output do we actually send the cached stream off for transformation (if the events are hooked up) and THEN finally write out the returned content in one big chunk. Here’s the implementation of ResponseFilterStream: /// <summary> /// A semi-generic Stream implementation for Response.Filter with /// an event interface for handling Content transformations via /// Stream or String. /// <remarks> /// Use with care for large output as this implementation copies /// the output into a memory stream and so increases memory usage. /// </remarks> /// </summary> public class ResponseFilterStream : Stream { /// <summary> /// The original stream /// </summary> Stream _stream; /// <summary> /// Current position in the original stream /// </summary> long _position; /// <summary> /// Stream that original content is read into /// and then passed to TransformStream function /// </summary> MemoryStream _cacheStream = new MemoryStream(5000); /// <summary> /// Internal pointer that that keeps track of the size /// of the cacheStream /// </summary> int _cachePointer = 0; /// <summary> /// /// </summary> /// <param name="responseStream"></param> public ResponseFilterStream(Stream responseStream) { _stream = responseStream; } /// <summary> /// Determines whether the stream is captured /// </summary> private bool IsCaptured { get { if (CaptureStream != null || CaptureString != null || TransformStream != null || TransformString != null) return true; return false; } } /// <summary> /// Determines whether the Write method is outputting data immediately /// or delaying output until Flush() is fired. /// </summary> private bool IsOutputDelayed { get { if (TransformStream != null || TransformString != null) return true; return false; } } /// <summary> /// Event that captures Response output and makes it available /// as a MemoryStream instance. Output is captured but won't /// affect Response output. /// </summary> public event Action<MemoryStream> CaptureStream; /// <summary> /// Event that captures Response output and makes it available /// as a string. Output is captured but won't affect Response output. /// </summary> public event Action<string> CaptureString; /// <summary> /// Event that allows you transform the stream as each chunk of /// the output is written in the Write() operation of the stream. /// This means that that it's possible/likely that the input /// buffer will not contain the full response output but only /// one of potentially many chunks. /// /// This event is called as part of the filter stream's Write() /// operation. /// </summary> public event Func<byte[], byte[]> TransformWrite; /// <summary> /// Event that allows you to transform the response stream as /// each chunk of bytep[] output is written during the stream's write /// operation. This means it's possibly/likely that the string /// passed to the handler only contains a portion of the full /// output. Typical buffer chunks are around 16k a piece. /// /// This event is called as part of the stream's Write operation. /// </summary> public event Func<string, string> TransformWriteString; /// <summary> /// This event allows capturing and transformation of the entire /// output stream by caching all write operations and delaying final /// response output until Flush() is called on the stream. /// </summary> public event Func<MemoryStream, MemoryStream> TransformStream; /// <summary> /// Event that can be hooked up to handle Response.Filter /// Transformation. Passed a string that you can modify and /// return back as a return value. The modified content /// will become the final output. /// </summary> public event Func<string, string> TransformString; protected virtual void OnCaptureStream(MemoryStream ms) { if (CaptureStream != null) CaptureStream(ms); } private void OnCaptureStringInternal(MemoryStream ms) { if (CaptureString != null) { string content = HttpContext.Current.Response.ContentEncoding.GetString(ms.ToArray()); OnCaptureString(content); } } protected virtual void OnCaptureString(string output) { if (CaptureString != null) CaptureString(output); } protected virtual byte[] OnTransformWrite(byte[] buffer) { if (TransformWrite != null) return TransformWrite(buffer); return buffer; } private byte[] OnTransformWriteStringInternal(byte[] buffer) { Encoding encoding = HttpContext.Current.Response.ContentEncoding; string output = OnTransformWriteString(encoding.GetString(buffer)); return encoding.GetBytes(output); } private string OnTransformWriteString(string value) { if (TransformWriteString != null) return TransformWriteString(value); return value; } protected virtual MemoryStream OnTransformCompleteStream(MemoryStream ms) { if (TransformStream != null) return TransformStream(ms); return ms; } /// <summary> /// Allows transforming of strings /// /// Note this handler is internal and not meant to be overridden /// as the TransformString Event has to be hooked up in order /// for this handler to even fire to avoid the overhead of string /// conversion on every pass through. /// </summary> /// <param name="responseText"></param> /// <returns></returns> private string OnTransformCompleteString(string responseText) { if (TransformString != null) TransformString(responseText); return responseText; } /// <summary> /// Wrapper method form OnTransformString that handles /// stream to string and vice versa conversions /// </summary> /// <param name="ms"></param> /// <returns></returns> internal MemoryStream OnTransformCompleteStringInternal(MemoryStream ms) { if (TransformString == null) return ms; //string content = ms.GetAsString(); string content = HttpContext.Current.Response.ContentEncoding.GetString(ms.ToArray()); content = TransformString(content); byte[] buffer = HttpContext.Current.Response.ContentEncoding.GetBytes(content); ms = new MemoryStream(); ms.Write(buffer, 0, buffer.Length); //ms.WriteString(content); return ms; } /// <summary> /// /// </summary> public override bool CanRead { get { return true; } } public override bool CanSeek { get { return true; } } /// <summary> /// /// </summary> public override bool CanWrite { get { return true; } } /// <summary> /// /// </summary> public override long Length { get { return 0; } } /// <summary> /// /// </summary> public override long Position { get { return _position; } set { _position = value; } } /// <summary> /// /// </summary> /// <param name="offset"></param> /// <param name="direction"></param> /// <returns></returns> public override long Seek(long offset, System.IO.SeekOrigin direction) { return _stream.Seek(offset, direction); } /// <summary> /// /// </summary> /// <param name="length"></param> public override void SetLength(long length) { _stream.SetLength(length); } /// <summary> /// /// </summary> public override void Close() { _stream.Close(); } /// <summary> /// Override flush by writing out the cached stream data /// </summary> public override void Flush() { if (IsCaptured && _cacheStream.Length > 0) { // Check for transform implementations _cacheStream = OnTransformCompleteStream(_cacheStream); _cacheStream = OnTransformCompleteStringInternal(_cacheStream); OnCaptureStream(_cacheStream); OnCaptureStringInternal(_cacheStream); // write the stream back out if output was delayed if (IsOutputDelayed) _stream.Write(_cacheStream.ToArray(), 0, (int)_cacheStream.Length); // Clear the cache once we've written it out _cacheStream.SetLength(0); } // default flush behavior _stream.Flush(); } /// <summary> /// /// </summary> /// <param name="buffer"></param> /// <param name="offset"></param> /// <param name="count"></param> /// <returns></returns> public override int Read(byte[] buffer, int offset, int count) { return _stream.Read(buffer, offset, count); } /// <summary> /// Overriden to capture output written by ASP.NET and captured /// into a cached stream that is written out later when Flush() /// is called. /// </summary> /// <param name="buffer"></param> /// <param name="offset"></param> /// <param name="count"></param> public override void Write(byte[] buffer, int offset, int count) { if ( IsCaptured ) { // copy to holding buffer only - we'll write out later _cacheStream.Write(buffer, 0, count); _cachePointer += count; } // just transform this buffer if (TransformWrite != null) buffer = OnTransformWrite(buffer); if (TransformWriteString != null) buffer = OnTransformWriteStringInternal(buffer); if (!IsOutputDelayed) _stream.Write(buffer, offset, buffer.Length); } } The key features are the events and corresponding OnXXX methods that handle the event hookups, and the Write() and Flush() methods of the stream implementation. All the rest of the members tend to be plain jane passthrough stream implementation code without much consequence. I do love the way Action<t> and Func<T> make it so easy to create the event signatures for the various events – sweet. A few Things to consider Performance Response.Filter is not great for performance in general as it adds another layer of indirection to the ASP.NET output pipeline, and this implementation in particular adds a memory hit as it basically duplicates the response output into the cached memory stream which is necessary since you may have to look at the entire response. If you have large pages in particular this can cause potentially serious memory pressure in your server application. So be careful of wholesale adoption of this (or other) Response.Filters. Make sure to do some performance testing to ensure it’s not killing your app’s performance. Response.Filter works everywhere A few questions came up in comments and discussion as to capturing ALL output hitting the site and – yes you can definitely do that by assigning a Response.Filter inside of a module. If you do this however you’ll want to be very careful and decide which content you actually want to capture especially in IIS 7 which passes ALL content – including static images/CSS etc. through the ASP.NET pipeline. So it is important to filter only on what you’re looking for – like the page extension or maybe more effectively the Response.ContentType. Response.Filter Chaining Originally I thought that filter chaining doesn’t work at all due to a bug in the stream implementation code. But it’s quite possible to assign multiple filters to the Response.Filter property. So the following actually works to both compress the output and apply the transformed content: WebUtils.GZipEncodePage(); ResponseFilterStream filter = new ResponseFilterStream(Response.Filter); filter.TransformString += filter_TransformString; Response.Filter = filter; However the following does not work resulting in invalid content encoding errors: ResponseFilterStream filter = new ResponseFilterStream(Response.Filter); filter.TransformString += filter_TransformString; Response.Filter = filter; WebUtils.GZipEncodePage(); In other words multiple Response filters can work together but it depends entirely on the implementation whether they can be chained or in which order they can be chained. In this case running the GZip/Deflate stream filters apparently relies on the original content length of the output and chokes when the content is modified. But if attaching the compression first it works fine as unintuitive as that may seem. Resources Download example code Capture Output from ASP.NET Pages © Rick Strahl, West Wind Technologies, 2005-2010Posted in ASP.NET

Read the article

SQLAuthority News – TechEd India – April 12-14, 2010 Bangalore – An Unforgettable Experience – An Op

- by pinaldave

TechEd India was one of the largest Technology events in India led by Microsoft. This event was attended by more than 3,000 technology enthusiasts, making it one of the most well-organized events of the year. Though I attempted to attend almost all the technology events here, I have not seen any bigger or better event in Indian subcontinents other than this. There are 21 Technical Tracks at Tech·Ed India 2010 that span more than 745 learning opportunities. I was fortunate enough to be a part of this whole event as a speaker and a delegate, as well. TechEd India Speaker Badge and A Token of Lifetime Hotel Selection I presented three different sessions at TechEd India and was also a part of panel discussion. (The details of the sessions are given at the end of this blog post.) Due to extensive traveling, I stay away from my family occasionally. For this reason, I took my wife – Nupur and daughter Shaivi (8 months old) to the event along with me. We stayed at the same hotel where the event was organized so as to maximize my time bonding with my family and to have more time in networking with technology community, at the same time. The hotel Lalit Ashok is the largest and most luxurious venue one can find in Bangalore, located in the middle of the city. The cost of the hotel was a bit pricey, but looking at all the advantages, I had decided to ask for a booking there. Hotel Lalit Ashok Nupur Dave and Shaivi Dave Arrival Day – DAY 0 – April 11, 2010 I reached the event a day earlier, and that was one wise decision for I was able to relax a bit and go over my presentation for the next day’s course. I am a kind of person who likes to get everything ready ahead of time. I was also able to enjoy a pleasant evening with several Microsoft employees and my family friends. I even checked out the location where I would be doing presentations the next day. I was fortunate enough to meet Bijoy Singhal from Microsoft who helped me out with a few of the logistics issues that occured the day before. I was not aware of the fact that the very next day he was going to be “The Man” of the TechEd 2010 event. Vinod Kumar from Microsoft was really very kind as he talked to me regarding my subsequent session. He gave me some suggestions which were really helpful that I was able to incorporate them during my presentation. Finally, I was able to meet Abhishek Kant from Microsoft; his valuable suggestions and unlimited passion have inspired many people like me to work with the Community. Pradipta from Microsoft was also around, being extremely busy with logistics; however, in those busy times, he did find some good spare time to have a chat with me and the other Community leaders. I also met Harish Ranganathan and Sachin Rathi, both from Microsoft. It was so interesting to listen to both of them talking about SharePoint. I just have no words to express my overwhelmed spirit because of all these passionate young guys - Pradipta,Vinod, Bijoy, Harish, Sachin and Ahishek (of course!). Map of TechEd India 2010 Event Day 1 – April 12, 2010 From morning until night time, today was truly a very busy day for me. I had two presentations and one panel discussion for the day. Needless to say, I had a few meetings to attend as well. The day started with a keynote from S. Somaseger where he announced the launch of Visual Studio 2010. The keynote area was really eye-catching because of the very large, bigger-than- life uniform screen. This was truly one to show. The title music of the keynote was very interesting and it featured Bijoy Singhal as the model. It was interesting to talk to him afterwards, when we laughed at jokes together about his modeling assignment. TechEd India Keynote Opening Featuring Bijoy TechEd India 2010 Keynote – S. Somasegar Time: 11:15pm – 11:45pm Session 1: True Lies of SQL Server – SQL Myth Buster Following the excellent keynote, I had my very first session on the subject of SQL Server Myth Buster. At first, I was a bit nervous as right after the keynote, for this was my very first session and during my presentation I saw lots of Microsoft Product Team members. Well, it really went well and I had a really good discussion with attendees of the session. I felt that a well begin was half-done and my confidence was regained. Right after the session, I met a few of my Community friends and had meaningful discussions with them on many subjects. The abstract of the session is as follows: In this 30-minute demo session, I am going to briefly demonstrate few SQL Server Myths and their resolutions as I back them up with some demo. This demo presentation is a must-attend for all developers and administrators who would come to the event. This is going to be a very quick yet fun session. Pinal Presenting session at TechEd India 2010 Time: 1:00 PM – 2:00 PM Lunch with Somasegar After the session I went to see my daughter, and then I headed right away to the lunch with S. Somasegar – the keynote speaker and senior vice president of the Developer Division at Microsoft. I really thank to Abhishek who made it possible for us. Because of his efforts, all the MVPs had the opportunity to meet such a legendary person and had to talk with them on Microsoft Technology. Though Somasegar is currently holding such a high position in Microsoft, he is very polite and a real gentleman, and how I wish that everybody in industry is like him. Believe me, if you spread love and kindness, then that is what you will receive back. As soon as lunch time was over, I ran to the session hall as my second presentation was about to start. Time: 2:30pm – 3:30pm Session 2: Master Data Services in Microsoft SQL Server 2008 R2 Business Intelligence is a subject which was widely talked about at TechEd. Everybody was interested in this subject, and I did not excuse myself from this great concept as well. I consider myself fortunate as I was presenting on the subject of Master Data Services at TechEd. When I had initially learned this subject, I had a bit of confusion about the usage of this tool. Later on, I decided that I would tackle about how we all developers and DBAs are not able to understand something so simple such as this, and even worst, creating confusion about the technology. During system designing, it is very important to have a reference material or master lookup tables. Well, I talked about the same subject and presented the session keeping that as my center talk. The session went very well and I received lots of interesting questions. I got many compliments for talking about this subject on the real-life scenario. I really thank Rushabh Mehta (CEO, Solid Quality Mentors India) for his supportive suggestions that helped me prepare the slide deck, as well as the subject. Pinal Presenting session at TechEd India 2010 The abstract of the session is as follows: SQL Server Master Data Services will ship with SQL Server 2008 R2 and will improve Microsoft’s platform appeal. This session provides an in-depth demonstration of MDS features and highlights important usage scenarios. Master Data Services enables consistent decision-making process by allowing you to create, manage and propagate changes from a single master view of your business entities. Also, MDS – Master Data-hub which is a vital component, helps ensure the consistency of reporting across systems and deliver faster and more accurate results across the enterprise. We will talk about establishing the basis for a centralized approach to defining, deploying, and managing master data in the enterprise. Pinal Presenting session at TechEd India 2010 The day was still not over for me. I had ran into several friends but we were not able keep our enthusiasm under control about all the rumors saying that SQL Server 2008 R2 was about to be launched tomorrow in the keynote. I then ran to my third and final technical event for the day- a panel discussion with the top technologies of India. Time: 5:00pm – 6:00pm Panel Discussion: Harness the power of Web – SEO and Technical Blogging As I have delivered two technical sessions by this time, I was a bit tired but not less enthusiastic when I had to talk about Blog and Technology. We discussed many different topics there. I told them that the most important aspect for any blog is its content. We discussed in depth the issues with plagiarism and how to avoid it. Another topic of discussion was how we technology bloggers can create awareness in the Community about what the right kind of blogging is and what morally and technically wrong acts are. A couple of questions were raised about what type of liberty a person can have in terms of writing blogs. Well, it was generically agreed that a blog is mainly a representation of our ideas and thoughts; it should not be governed by external entities. As long as one is writing what they really want to say, but not providing incorrect information or not practicing plagiarism, a blogger should be allowed to express himself. This panel discussion was supposed to be over in an hour, but the interest of the participants was remarkable and so it was extended for 30 minutes more. Finally, we decided to bring to a close the discussion and agreed that we will continue the topic next year. TechEd India Panel Discussion on Web, Technology and SEO Surprisingly, the day was just beginning after doing all of these. By this time, I have almost met all the MVP who arrived at the event, as well as many Microsoft employees. There were lots of Community folks present, too. I decided that I would go to meet several friends from the Community and continue to communicate with me on SQLAuthority.com. I also met Abhishek Baxi and had a good talk with him regarding Win Mobile and Twitter. He also took a very quick video of me wherein I spoke in my mother’s tongue, Gujarati. It was funny that I talked in Gujarati almost all the day, but when I was talking in the interview I could not find the right Gujarati words to speak. I think we all think in English when we think about Technology, so as to address universality. After meeting them, I headed towards the Speakers’ Dinner. Time: 8:00 PM – onwards Speakers Dinner The Speakers’ dinner was indeed a wonderful opportunity for all the speakers to get together and relax. We talked so many different things, from XBOX to Hindi Movies, and from SQL to Samosas. I just could not express how much fun I had. After a long evening, when I returned tmy room and met Shaivi, I just felt instantly relaxed. Kids are really gifts from God. Today was a really long but exciting day. So many things happened in just one day: Visual Studio Lanch, lunch with Somasegar, 2 technical sessions, 1 panel discussion, community leaders meeting, speakers dinner and, last but not leas,t playing with my child! A perfect day! Day 2 – April 13, 2010 Today started with a bang with the excellent keynote by Kamal Hathi who launched SQL Server 2008 R2 in India and demonstrated the power of PowerPivot to all of us. 101 Million Rows in Excel brought lots of applause from the audience. Kamal Hathi Presenting Keynote at TechEd India 2010 The day was a bit easier one for me. I had no sessions today and no events planned. I had a few meetings planned for the second day of the event. I sat in the speaker’s lounge for half a day and met many people there. I attended nearly 9 different meetings today. The subjects of the meetings were very different. Here is a list of the topics of the Community-related meetings: SQL PASS and its involvement in India and subcontinents How to start community blogging Forums and developing aptitude towards technology Ahmedabad/Gandhinagar User Groups and their developments SharePoint and SQL Business Meeting – a client meeting Business Meeting – a potential performance tuning project Business Meeting – Solid Quality Mentors (SolidQ) And family friends Pinal Dave at TechEd India The day passed by so quickly during this meeting. In the evening, I headed to Partners Expo with friends and checked out few of the booths. I really wanted to talk about some of the products, but due to the freebies there was so much crowd that I finally decided to just take the contact details of the partner. I will now start sending them with my queries and, hopefully, I will have my questions answered. Nupur and Shaivi had also one meeting to attend; it was with our family friend Vijay Raj. Vijay is also a person who loves Technology and loves it more than anybody. I see him growing and learning every day, but still remaining as a ‘human’. I believe that if someone acquires as much knowledge as him, that person will become either a computer or cyborg. Here, Vijay is still a kind gentleman and is able to stay as our close family friend. Shaivi was really happy to play with Uncle Vijay. Pinal Dave and Vijay Raj Renuka Prasad, a Microsoft MVP, impressed me with his passion and knowledge of SQL. Every time he gives me credit for his success, I believe that he is very humble. He has way more certifications than me and has worked many more years with SQL compared to me. He is an excellent photographer as well. Most of the photos in this blog post have been taken by him. I told him if ever he wants to do a part time job, he can do the photography very well. Pinal Dave and Renuka Prasad I also met L Srividya from Microsoft, whom I was looking forward to meet. She is a bundle of knowledge that everyone would surely learn a lot from her. I was able to get a few minutes from her and well, I felt confident. She enlightened me with SQL Server BI concepts, domain management and SQL Server security and few other interesting details. I also had a wonderful time talking about SharePoint with fellow Solid Quality Mentor Joy Rathnayake. He is very passionate about SharePoint but when you talk .NET and SQL with him, he is still overwhelmingly knowledgeable. In fact, while talking to him, I figured out that the recent training he delivered was on SQL Server 2008 R2. I told him a joke that it hurts my ego as he is more popular now in SQL training and consulting than me. I am sure all of you agree that working with good people is a gift from God. I am fortunate enough to work with the best of the best Industry experts. It was a great pleasure to hang out with my Community friends – Ahswin Kini, HimaBindu Vejella, Vasudev G, Suprotim Agrawal, Dhananjay, Vikram Pendse, Mahesh Dhola, Mahesh Mitkari, Manu Zacharia, Shobhan, Hardik Shah, Ashish Mohta, Manan, Subodh Sohani and Sanjay Shetty (of course!) . (Please let me know if I have met you at the event and forgot your name to list here). Time: 8:00 PM – onwards Community Leaders Dinner After lots of meetings, I headed towards the Community Leaders dinner meeting and met almost all the folks I met in morning. The discussion was almost the same but the real good thing was that we were enjoying it. The food was really good. Nupur was invited in the event, but Shaivi could not come. When Nupur tried to enter the event, she was stopped as Shaivi did not have the pass to enter the dinner. Nupur expressed that Shaivi is only 8 months old and does not eat outside food as well and could not stay by herself at this age, but the door keeper did not agree and asked that without the entry details Shaivi could not go in, but Nupur could. Nupur called me on phone and asked me to help her out. By the time, I was outside; the organizer of the event reached to the door and happily approved Shaivi to join the party. Once in the party, Shaivi had lots of fun meeting so many people. Shaivi Dave and Abhishek Kant Dean Guida (Infragistics President and CEO) and Pinal Dave (SQLAuthority.com) Day 3 – April 14, 2010 Though, it was last day, I was very much excited today as I was about to present my very favorite session. Query Optimization and Performance Tuning is my domain expertise and I make my leaving by consulting and training the same. Today’s session was on the same subject and as an additional twist, another subject about Spatial Database was presented. I was always intrigued with Spatial Database and I have enjoyed learning about it; however, I have never thought about Spatial Indexing before it was decided that I will do this session. I really thank Solid Quality Mentor Dr. Greg Low for his assistance in helping me prepare the slide deck and also review the content. Furthermore, today was really what I call my ‘learning day’ . So far I had not attended any session in TechEd and I felt a bit down for that. Everybody spends their valuable time & money to learn something new and exciting in TechEd and I had not attended a single session at the moment thinking that it was already last day of the event. I did have a plan for the day and I attended two technical sessions before my session of spatial database. I attended 2 sessions of Vinod Kumar. Vinod is a natural storyteller and there was no doubt that his sessions would be jam-packed. People attended his sessions simply because Vinod is syhe speaker. He did not have a single time disappointed audience; he is truly a good speaker. He knows his stuff very well. I personally do not think that in India he can be compared to anyone for SQL. Time: 12:30pm-1:30pm SQL Server Query Optimization, Execution and Debugging Query Performance I really had a fun time attending this session. Vinod made this session very interactive. The entire audience really got into the presentation and started participating in the event. Vinod was presenting a small problem with Query Tuning, which any developer would have encountered and solved with their help in such a fashion that a developer feels he or she have already resolved it. In one question, I was the only one who was ready to answer and Vinod told me in a light tone that I am now allowed to answer it! The audience really found it very amusing. There was a huge crowd around Vinod after the session. Vinod – A master storyteller! Time: 3:45pm-4:45pm Data Recovery / consistency with CheckDB This session was much heavier than the earlier one, and I must say this is my most favorite session I EVER attended in India. In this TechEd I have only attended two sessions, but in my career, I have attended numerous technical sessions not only in India, but all over the world. This session had taken my breath away. One by one, Vinod took the different databases, and started to corrupt them in different ways. Each database has some unique ways to get corrupted. Once that was done, Vinod started to show the DBCC CEHCKDB and demonstrated how it can solve your problem. He finally fixed all the databases with this single tool. I do have a good knowledge of this subject, but let me honestly admit that I have learned a lot from this session. I enjoyed and cheered during this session along with other attendees. I had total satisfaction that, just like everyone, I took advantage of the event and learned something. I am now TECHnically EDucated. Pinal Dave and Vinod Kumar After two very interactive and informative SQL Sessions from Vinod Kumar, the next turn me presenting on Spatial Database and Indexing. I got once again nervous but Vinod told me to stay natural and do my presentation. Well, once I got a huge stage with a total of four projectors and a large crowd, I felt better. Time: 5:00pm-6:00pm Session 3: Developing with SQL Server Spatial and Deep Dive into Spatial Indexing Pinal Presenting session at TechEd India 2010 Pinal Presenting session at TechEd India 2010 I kicked off this session with Michael J Swart‘s beautiful spatial image. This session was the last one for the day but, to my surprise, I had more than 200+ attendees. Slowly, the rain was starting outside and I was worried that the hall would not be full; despite this, there was not a single seat available in the first five minutes of the session. Thanks to all of you for attending my presentation. I had demonstrated the map of world (and India) and quickly explained what Geographic and Geometry data types in Spatial Database are. This session had interesting story of Indexing and Comparison, as well as how different traditional indexes are from spatial indexing. Pinal Presenting session at TechEd India 2010 Due to the heavy rain during this event, the power went off for about 22 minutes (just an accident – nobodies fault). During these minutes, there were no audio, no video and no light. I continued to address the mass of 200+ people without any audio device and PowerPoint. I must thank the audience because not a single person left from the session. They all stayed in their place, some moved closure to listen to me properly. I noticed that the curiosity and eagerness to learn new things was at the peak even though it was the very last session of the TechEd. Everybody wanted get the maximum knowledge out of this whole event. I was touched by the support from audience. They listened and participated in my session even without any kinds of technology (no ppt, no mike, no AC, nothing). During these 22 minutes, I had completed my theory verbally. Pinal Presenting session at TechEd India 2010 After a while, we got the projector back online and we continued with some exciting demos. Many thanks to Microsoft people who worked energetically in background to get the backup power for project up. I had a very interesting demo wherein I overlaid Bangalore and Hyderabad on the India Map and find their aerial distance between them. After finding the aerial distance, we browsed online and found that SQL Server estimates the exact aerial distance between these two cities, as compared to the factual distance. There was a huge applause from the crowd on the subject that SQL Server takes into the count of the curvature of the earth and finds the precise distances based on details. During the process of finding the distance, I demonstrated a few examples of the indexes where I expressed how one can use those indexes to find these distances and how they can improve the performance of similar query. I also demonstrated few examples wherein we were able to see in which data type the Index is most useful. We finished the demos with a few more internal stuff. Pinal Presenting session at TechEd India 2010 Despite all issues, I was mostly satisfied with my presentation. I think it was the best session I have ever presented at any conference. There was no help from Technology for a while, but I still got lots of appreciation at the end. When we ended the session, the applause from the audience was so loud that for a moment, the rain was not audible. I was truly moved by the dedication of the Technology enthusiasts. Pinal Dave After Presenting session at TechEd India 2010 The abstract of the session is as follows: The Microsoft SQL Server 2008 delivers new spatial data types that enable you to consume, use, and extend location-based data through spatial-enabled applications. Attend this session to learn how to use spatial functionality in next version of SQL Server to build and optimize spatial queries. This session outlines the new geography data type to store geodetic spatial data and perform operations on it, use the new geometry data type to store planar spatial data and perform operations on it, take advantage of new spatial indexes for high performance queries, use the new spatial results tab to quickly and easily view spatial query results directly from within Management Studio, extend spatial data capabilities by building or integrating location-enabled applications through support for spatial standards and specifications and much more. Time: 8:00 PM – onwards Dinner by Sponsors After the lively session during the day, there was another dinner party courtesy of one of the sponsors of TechEd. All the MVPs and several Community leaders were present at the dinner. I would like to express my gratitude to Abhishek Kant for organizing this wonderful event for us. It was a blast and really relaxing in all angles. We all stayed there for a long time and talked about our sweet and unforgettable memories of the event. Pinal Dave and Bijoy Singhal It was really one wonderful event. After writing this much, I say that I have no words to express about how much I enjoyed TechEd. However, it is true that I shared with you only 1% of the total activities I have done at the event. There were so many people I have met, yet were not mentioned here although I wanted to write their names here, too . Anyway, I have learned so many things and up until now, I am not able to get over all the fun I had in this event. Pinal Dave at TechEd India 2010 The Next Days – April 15, 2010 – till today I am still not able to get my mind out of the whole experience I had at TechEd India 2010. It was like a whole Microsoft Family working together to celebrate a happy occasion. TechEd India – Truly An Unforgettable Experience! Reference : Pinal Dave (http://blog.SQLAuthority.com) Filed under: About Me, MVP, Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Author Visit, SQLAuthority News, SQLServer, T SQL, Technology Tagged: TechEd, TechEdIn

Read the article

Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5

- by Shaun

Originally posted on: http://geekswithblogs.net/shaunxu/archive/2013/07/01/upload-file-to-windows-azure-blob-in-chunks-through-asp.net.aspxMany people are using Windows Azure Blob Storage to store their data in the cloud. Blob storage provides 99.9% availability with easy-to-use API through .NET SDK and HTTP REST. For example, we can store JavaScript files, images, documents in blob storage when we are building an ASP.NET web application on a Web Role in Windows Azure. Or we can store our VHD files in blob and mount it as a hard drive in our cloud service. If you are familiar with Windows Azure, you should know that there are two kinds of blob: page blob and block blob. The page blob is optimized for random read and write, which is very useful when you need to store VHD files. The block blob is optimized for sequential/chunk read and write, which has more common usage. Since we can upload block blob in blocks through BlockBlob.PutBlock, and them commit them as a whole blob with invoking the BlockBlob.PutBlockList, it is very powerful to upload large files, as we can upload blocks in parallel, and provide pause-resume feature. There are many documents, articles and blog posts described on how to upload a block blob. Most of them are focus on the server side, which means when you had received a big file, stream or binaries, how to upload them into blob storage in blocks through .NET SDK. But the problem is, how can we upload these large files from client side, for example, a browser. This questioned to me when I was working with a Chinese customer to help them build a network disk production on top of azure. The end users upload their files from the web portal, and then the files will be stored in blob storage from the Web Role. My goal is to find the best way to transform the file from client (end user’s machine) to the server (Web Role) through browser. In this post I will demonstrate and describe what I had done, to upload large file in chunks with high speed, and save them as blocks into Windows Azure Blob Storage. Traditional Upload, Works with Limitation The simplest way to implement this requirement is to create a web page with a form that contains a file input element and a submit button. 1: @using (Html.BeginForm("About", "Index", FormMethod.Post, new { enctype = "multipart/form-data" })) 2: { 3: <input type="file" name="file" /> 4: <input type="submit" value="upload" /> 5: } And then in the backend controller, we retrieve the whole content of this file and upload it in to the blob storage through .NET SDK. We can split the file in blocks and upload them in parallel and commit. The code had been well blogged in the community. 1: [HttpPost] 2: public ActionResult About(HttpPostedFileBase file) 3: { 4: var container = _client.GetContainerReference("test"); 5: container.CreateIfNotExists(); 6: var blob = container.GetBlockBlobReference(file.FileName); 7: var blockDataList = new Dictionary<string, byte[]>(); 8: using (var stream = file.InputStream) 9: { 10: var blockSizeInKB = 1024; 11: var offset = 0; 12: var index = 0; 13: while (offset < stream.Length) 14: { 15: var readLength = Math.Min(1024 * blockSizeInKB, (int)stream.Length - offset); 16: var blockData = new byte[readLength]; 17: offset += stream.Read(blockData, 0, readLength); 18: blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData); 19: 20: index++; 21: } 22: } 23: 24: Parallel.ForEach(blockDataList, (bi) => 25: { 26: blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null); 27: }); 28: blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray()); 29: 30: return RedirectToAction("About"); 31: } This works perfect if we selected an image, a music or a small video to upload. But if I selected a large file, let’s say a 6GB HD-movie, after upload for about few minutes the page will be shown as below and the upload will be terminated. In ASP.NET there is a limitation of request length and the maximized request length is defined in the web.config file. It’s a number which less than about 4GB. So if we want to upload a really big file, we cannot simply implement in this way. Also, in Windows Azure, a cloud service network load balancer will terminate the connection if exceed the timeout period. From my test the timeout looks like 2 - 3 minutes. Hence, when we need to upload a large file we cannot just use the basic HTML elements. Besides the limitation mentioned above, the simple HTML file upload cannot provide rich upload experience such as chunk upload, pause and pause-resume. So we need to find a better way to upload large file from the client to the server. Upload in Chunks through HTML5 and JavaScript In order to break those limitation mentioned above we will try to upload the large file in chunks. This takes some benefit to us such as - No request size limitation: Since we upload in chunks, we can define the request size for each chunks regardless how big the entire file is. - No timeout problem: The size of chunks are controlled by us, which means we should be able to make sure request for each chunk upload will not exceed the timeout period of both ASP.NET and Windows Azure load balancer. It was a big challenge to upload big file in chunks until we have HTML5. There are some new features and improvements introduced in HTML5 and we will use them to implement our solution. In HTML5, the File interface had been improved with a new method called “slice”. It can be used to read part of the file by specifying the start byte index and the end byte index. For example if the entire file was 1024 bytes, file.slice(512, 768) will read the part of this file from the 512nd byte to 768th byte, and return a new object of interface called "Blob”, which you can treat as an array of bytes. In fact, a Blob object represents a file-like object of immutable, raw data. The File interface is based on Blob, inheriting blob functionality and expanding it to support files on the user's system. For more information about the Blob please refer here. File and Blob is very useful to implement the chunk upload. We will use File interface to represent the file the user selected from the browser and then use File.slice to read the file in chunks in the size we wanted. For example, if we wanted to upload a 10MB file with 512KB chunks, then we can read it in 512KB blobs by using File.slice in a loop. Assuming we have a web page as below. User can select a file, an input box to specify the block size in KB and a button to start upload. 1: <div> 2: <input type="file" id="upload_files" name="files[]" /><br /> 3: Block Size: <input type="number" id="block_size" value="512" name="block_size" />KB<br /> 4: <input type="button" id="upload_button_blob" name="upload" value="upload (blob)" /> 5: </div> Then we can have the JavaScript function to upload the file in chunks when user clicked the button. 1: <script type="text/javascript"> 1: 2: $(function () { 3: $("#upload_button_blob").click(function () { 4: }); 5: });</script> Firstly we need to ensure the client browser supports the interfaces we are going to use. Just try to invoke the File, Blob and FormData from the “window” object. If any of them is “undefined” the condition result will be “false” which means your browser doesn’t support these premium feature and it’s time for you to get your browser updated. FormData is another new feature we are going to use in the future. It could generate a temporary form for us. We will use this interface to create a form with chunk and associated metadata when invoked the service through ajax. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: if (window.File && window.Blob && window.FormData) { 4: alert("Your brwoser is awesome, let's rock!"); 5: } 6: else { 7: alert("Oh man plz update to a modern browser before try is cool stuff out."); 8: return; 9: } 10: }); Each browser supports these interfaces by their own implementation and currently the Blob, File and File.slice are supported by Chrome 21, FireFox 13, IE 10, Opera 12 and Safari 5.1 or higher. After that we worked on the files the user selected one by one since in HTML5, user can select multiple files in one file input box. 1: var files = $("#upload_files")[0].files; 2: for (var i = 0; i < files.length; i++) { 3: var file = files[i]; 4: var fileSize = file.size; 5: var fileName = file.name; 6: } Next, we calculated the start index and end index for each chunks based on the size the user specified from the browser. We put them into an array with the file name and the index, which will be used when we upload chunks into Windows Azure Blob Storage as blocks since we need to specify the target blob name and the block index. At the same time we will store the list of all indexes into another variant which will be used to commit blocks into blob in Azure Storage once all chunks had been uploaded successfully. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: var blockSizeInKB = $("#block_size").val(); 14: var blockSize = blockSizeInKB * 1024; 15: var blocks = []; 16: var offset = 0; 17: var index = 0; 18: var list = ""; 19: while (offset < fileSize) { 20: var start = offset; 21: var end = Math.min(offset + blockSize, fileSize); 22: 23: blocks.push({ 24: name: fileName, 25: index: index, 26: start: start, 27: end: end 28: }); 29: list += index + ","; 30: 31: offset = end; 32: index++; 33: } 34: } 35: }); Now we have all chunks’ information ready. The next step should be upload them one by one to the server side, and at the server side when received a chunk it will upload as a block into Blob Storage, and finally commit them with the index list through BlockBlobClient.PutBlockList. But since all these invokes are ajax calling, which means not synchronized call. So we need to introduce a new JavaScript library to help us coordinate the asynchronize operation, which named “async.js”. You can download this JavaScript library here, and you can find the document here. I will not explain this library too much in this post. We will put all procedures we want to execute as a function array, and pass into the proper function defined in async.js to let it help us to control the execution sequence, in series or in parallel. Hence we will define an array and put the function for chunk upload into this array. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: 5: // start to upload each files in chunks 6: var files = $("#upload_files")[0].files; 7: for (var i = 0; i < files.length; i++) { 8: var file = files[i]; 9: var fileSize = file.size; 10: var fileName = file.name; 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: ... ... 14: 15: // define the function array and push all chunk upload operation into this array 16: blocks.forEach(function (block) { 17: putBlocks.push(function (callback) { 18: }); 19: }); 20: } 21: }); 22: }); As you can see, I used File.slice method to read each chunks based on the start and end byte index we calculated previously, and constructed a temporary HTML form with the file name, chunk index and chunk data through another new feature in HTML5 named FormData. Then post this form to the backend server through jQuery.ajax. This is the key part of our solution. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: blocks.forEach(function (block) { 15: putBlocks.push(function (callback) { 16: // load blob based on the start and end index for each chunks 17: var blob = file.slice(block.start, block.end); 18: // put the file name, index and blob into a temporary from 19: var fd = new FormData(); 20: fd.append("name", block.name); 21: fd.append("index", block.index); 22: fd.append("file", blob); 23: // post the form to backend service (asp.net mvc controller action) 24: $.ajax({ 25: url: "/Home/UploadInFormData", 26: data: fd, 27: processData: false, 28: contentType: "multipart/form-data", 29: type: "POST", 30: success: function (result) { 31: if (!result.success) { 32: alert(result.error); 33: } 34: callback(null, block.index); 35: } 36: }); 37: }); 38: }); 39: } 40: }); Then we will invoke these functions one by one by using the async.js. And once all functions had been executed successfully I invoked another ajax call to the backend service to commit all these chunks (blocks) as the blob in Windows Azure Storage. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.series(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); That’s all in the client side. The outline of our logic would be - Calculate the start and end byte index for each chunks based on the block size. - Defined the functions of reading the chunk form file and upload the content to the backend service through ajax. - Execute the functions defined in previous step with “async.js”. - Commit the chunks by invoking the backend service in Windows Azure Storage finally. Save Chunks as Blocks into Blob Storage In above we finished the client size JavaScript code. It uploaded the file in chunks to the backend service which we are going to implement in this step. We will use ASP.NET MVC as our backend service, and it will receive the chunks, upload into Windows Azure Bob Storage in blocks, then finally commit as one blob. As in the client side we uploaded chunks by invoking the ajax call to the URL "/Home/UploadInFormData", I created a new action under the Index controller and it only accepts HTTP POST request. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: } 8: catch (Exception e) 9: { 10: error = e.ToString(); 11: } 12: 13: return new JsonResult() 14: { 15: Data = new 16: { 17: success = string.IsNullOrWhiteSpace(error), 18: error = error 19: } 20: }; 21: } Then I retrieved the file name, index and the chunk content from the Request.Form object, which was passed from our client side. And then, used the Windows Azure SDK to create a blob container (in this case we will use the container named “test”.) and create a blob reference with the blob name (same as the file name). Then uploaded the chunk as a block of this blob with the index, since in Blob Storage each block must have an index (ID) associated with so that finally we can put all blocks as one blob by specifying their block ID list. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var index = int.Parse(Request.Form["index"]); 9: var file = Request.Files[0]; 10: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 11: 12: var container = _client.GetContainerReference("test"); 13: container.CreateIfNotExists(); 14: var blob = container.GetBlockBlobReference(name); 15: blob.PutBlock(id, file.InputStream, null); 16: } 17: catch (Exception e) 18: { 19: error = e.ToString(); 20: } 21: 22: return new JsonResult() 23: { 24: Data = new 25: { 26: success = string.IsNullOrWhiteSpace(error), 27: error = error 28: } 29: }; 30: } Next, I created another action to commit the blocks into blob once all chunks had been uploaded. Similarly, I retrieved the blob name from the Request.Form. I also retrieved the chunks ID list, which is the block ID list from the Request.Form in a string format, split them as a list, then invoked the BlockBlob.PutBlockList method. After that our blob will be shown in the container and ready to be download. 1: [HttpPost] 2: public JsonResult Commit() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var list = Request.Form["list"]; 9: var ids = list 10: .Split(',') 11: .Where(id => !string.IsNullOrWhiteSpace(id)) 12: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 13: .ToArray(); 14: 15: var container = _client.GetContainerReference("test"); 16: container.CreateIfNotExists(); 17: var blob = container.GetBlockBlobReference(name); 18: blob.PutBlockList(ids); 19: } 20: catch (Exception e) 21: { 22: error = e.ToString(); 23: } 24: 25: return new JsonResult() 26: { 27: Data = new 28: { 29: success = string.IsNullOrWhiteSpace(error), 30: error = error 31: } 32: }; 33: } Now we finished all code we need. The whole process of uploading would be like this below. Below is the full client side JavaScript code. 1: <script type="text/javascript" src="~/Scripts/async.js"></script> 2: <script type="text/javascript"> 3: $(function () { 4: $("#upload_button_blob").click(function () { 5: // assert the browser support html5 6: if (window.File && window.Blob && window.FormData) { 7: alert("Your brwoser is awesome, let's rock!"); 8: } 9: else { 10: alert("Oh man plz update to a modern browser before try is cool stuff out."); 11: return; 12: } 13: 14: // start to upload each files in chunks 15: var files = $("#upload_files")[0].files; 16: for (var i = 0; i < files.length; i++) { 17: var file = files[i]; 18: var fileSize = file.size; 19: var fileName = file.name; 20: 21: // calculate the start and end byte index for each blocks(chunks) 22: // with the index, file name and index list for future using 23: var blockSizeInKB = $("#block_size").val(); 24: var blockSize = blockSizeInKB * 1024; 25: var blocks = []; 26: var offset = 0; 27: var index = 0; 28: var list = ""; 29: while (offset < fileSize) { 30: var start = offset; 31: var end = Math.min(offset + blockSize, fileSize); 32: 33: blocks.push({ 34: name: fileName, 35: index: index, 36: start: start, 37: end: end 38: }); 39: list += index + ","; 40: 41: offset = end; 42: index++; 43: } 44: 45: // define the function array and push all chunk upload operation into this array 46: var putBlocks = []; 47: blocks.forEach(function (block) { 48: putBlocks.push(function (callback) { 49: // load blob based on the start and end index for each chunks 50: var blob = file.slice(block.start, block.end); 51: // put the file name, index and blob into a temporary from 52: var fd = new FormData(); 53: fd.append("name", block.name); 54: fd.append("index", block.index); 55: fd.append("file", blob); 56: // post the form to backend service (asp.net mvc controller action) 57: $.ajax({ 58: url: "/Home/UploadInFormData", 59: data: fd, 60: processData: false, 61: contentType: "multipart/form-data", 62: type: "POST", 63: success: function (result) { 64: if (!result.success) { 65: alert(result.error); 66: } 67: callback(null, block.index); 68: } 69: }); 70: }); 71: }); 72: 73: // invoke the functions one by one 74: // then invoke the commit ajax call to put blocks into blob in azure storage 75: async.series(putBlocks, function (error, result) { 76: var data = { 77: name: fileName, 78: list: list 79: }; 80: $.post("/Home/Commit", data, function (result) { 81: if (!result.success) { 82: alert(result.error); 83: } 84: else { 85: alert("done!"); 86: } 87: }); 88: }); 89: } 90: }); 91: }); 92: </script> And below is the full ASP.NET MVC controller code. 1: public class HomeController : Controller 2: { 3: private CloudStorageAccount _account; 4: private CloudBlobClient _client; 5: 6: public HomeController() 7: : base() 8: { 9: _account = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString")); 10: _client = _account.CreateCloudBlobClient(); 11: } 12: 13: public ActionResult Index() 14: { 15: ViewBag.Message = "Modify this template to jump-start your ASP.NET MVC application."; 16: 17: return View(); 18: } 19: 20: [HttpPost] 21: public JsonResult UploadInFormData() 22: { 23: var error = string.Empty; 24: try 25: { 26: var name = Request.Form["name"]; 27: var index = int.Parse(Request.Form["index"]); 28: var file = Request.Files[0]; 29: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 30: 31: var container = _client.GetContainerReference("test"); 32: container.CreateIfNotExists(); 33: var blob = container.GetBlockBlobReference(name); 34: blob.PutBlock(id, file.InputStream, null); 35: } 36: catch (Exception e) 37: { 38: error = e.ToString(); 39: } 40: 41: return new JsonResult() 42: { 43: Data = new 44: { 45: success = string.IsNullOrWhiteSpace(error), 46: error = error 47: } 48: }; 49: } 50: 51: [HttpPost] 52: public JsonResult Commit() 53: { 54: var error = string.Empty; 55: try 56: { 57: var name = Request.Form["name"]; 58: var list = Request.Form["list"]; 59: var ids = list 60: .Split(',') 61: .Where(id => !string.IsNullOrWhiteSpace(id)) 62: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 63: .ToArray(); 64: 65: var container = _client.GetContainerReference("test"); 66: container.CreateIfNotExists(); 67: var blob = container.GetBlockBlobReference(name); 68: blob.PutBlockList(ids); 69: } 70: catch (Exception e) 71: { 72: error = e.ToString(); 73: } 74: 75: return new JsonResult() 76: { 77: Data = new 78: { 79: success = string.IsNullOrWhiteSpace(error), 80: error = error 81: } 82: }; 83: } 84: } And if we selected a file from the browser we will see our application will upload chunks in the size we specified to the server through ajax call in background, and then commit all chunks in one blob. Then we can find the blob in our Windows Azure Blob Storage. Optimized by Parallel Upload In previous example we just uploaded our file in chunks. This solved the problem that ASP.NET MVC request content size limitation as well as the Windows Azure load balancer timeout. But it might introduce the performance problem since we uploaded chunks in sequence. In order to improve the upload performance we could modify our client side code a bit to make the upload operation invoked in parallel. The good news is that, “async.js” library provides the parallel execution function. If you remembered the code we invoke the service to upload chunks, it utilized “async.series” which means all functions will be executed in sequence. Now we will change this code to “async.parallel”. This will invoke all functions in parallel. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallel(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); In this way all chunks will be uploaded to the server side at the same time to maximize the bandwidth usage. This should work if the file was not very large and the chunk size was not very small. But for large file this might introduce another problem that too many ajax calls are sent to the server at the same time. So the best solution should be, upload the chunks in parallel with maximum concurrency limitation. The code below specified the concurrency limitation to 4, which means at the most only 4 ajax calls could be invoked at the same time. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallelLimit(putBlocks, 4, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); Summary In this post we discussed how to upload files in chunks to the backend service and then upload them into Windows Azure Blob Storage in blocks. We focused on the frontend side and leverage three new feature introduced in HTML 5 which are - File.slice: Read part of the file by specifying the start and end byte index. - Blob: File-like interface which contains the part of the file content. - FormData: Temporary form element that we can pass the chunk alone with some metadata to the backend service. Then we discussed the performance consideration of chunk uploading. Sequence upload cannot provide maximized upload speed, but the unlimited parallel upload might crash the browser and server if too many chunks. So we finally came up with the solution to upload chunks in parallel with the concurrency limitation. We also demonstrated how to utilize “async.js” JavaScript library to help us control the asynchronize call and the parallel limitation. Regarding the chunk size and the parallel limitation value there is no “best” value. You need to test vary composition and find out the best one for your particular scenario. It depends on the local bandwidth, client machine cores and the server side (Windows Azure Cloud Service Virtual Machine) cores, memory and bandwidth. Below is one of my performance test result. The client machine was Windows 8 IE 10 with 4 cores. I was using Microsoft Cooperation Network. The web site was hosted on Windows Azure China North data center (in Beijing) with one small web role (1.7GB 1 core CPU, 1.75GB memory with 100Mbps bandwidth). The test cases were - Chunk size: 512KB, 1MB, 2MB, 4MB. - Upload Mode: Sequence, parallel (unlimited), parallel with limit (4 threads, 8 threads). - Chunk Format: base64 string, binaries. - Target file: 100MB. - Each case was tested 3 times. Below is the test result chart. Some thoughts, but not guidance or best practice: - Parallel gets better performance than series. - No significant performance improvement between parallel 4 threads and 8 threads. - Transform with binaries provides better performance than base64. - In all cases, chunk size in 1MB - 2MB gets better performance. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article

New features of C# 4.0

This article covers New features of C# 4.0. Article has been divided into below sections. Introduction. Dynamic Lookup. Named and Optional Arguments. Features for COM interop. Variance. Relationship with Visual Basic. Resources. Other interested readings… 22 New Features of Visual Studio 2008 for .NET Professionals 50 New Features of SQL Server 2008 IIS 7.0 New features Introduction It is now close to a year since Microsoft Visual C# 3.0 shipped as part of Visual Studio 2008. In the VS Managed Languages team we are hard at work on creating the next version of the language (with the unsurprising working title of C# 4.0), and this document is a first public description of the planned language features as we currently see them. Please be advised that all this is in early stages of production and is subject to change. Part of the reason for sharing our plans in public so early is precisely to get the kind of feedback that will cause us to improve the final product before it rolls out. Simultaneously with the publication of this whitepaper, a first public CTP (community technology preview) of Visual Studio 2010 is going out as a Virtual PC image for everyone to try. Please use it to play and experiment with the features, and let us know of any thoughts you have. We ask for your understanding and patience working with very early bits, where especially new or newly implemented features do not have the quality or stability of a final product. The aim of the CTP is not to give you a productive work environment but to give you the best possible impression of what we are working on for the next release. The CTP contains a number of walkthroughs, some of which highlight the new language features of C# 4.0. Those are excellent for getting a hands-on guided tour through the details of some common scenarios for the features. You may consider this whitepaper a companion document to these walkthroughs, complementing them with a focus on the overall language features and how they work, as opposed to the specifics of the concrete scenarios. C# 4.0 The major theme for C# 4.0 is dynamic programming. Increasingly, objects are “dynamic” in the sense that their structure and behavior is not captured by a static type, or at least not one that the compiler knows about when compiling your program. Some examples include a. objects from dynamic programming languages, such as Python or Ruby b. COM objects accessed through IDispatch c. ordinary .NET types accessed through reflection d. objects with changing structure, such as HTML DOM objects While C# remains a statically typed language, we aim to vastly improve the interaction with such objects. A secondary theme is co-evolution with Visual Basic. Going forward we will aim to maintain the individual character of each language, but at the same time important new features should be introduced in both languages at the same time. They should be differentiated more by style and feel than by feature set. The new features in C# 4.0 fall into four groups: Dynamic lookup Dynamic lookup allows you to write method, operator and indexer calls, property and field accesses, and even object invocations which bypass the C# static type checking and instead gets resolved at runtime. Named and optional parameters Parameters in C# can now be specified as optional by providing a default value for them in a member declaration. When the member is invoked, optional arguments can be omitted. Furthermore, any argument can be passed by parameter name instead of position. COM specific interop features Dynamic lookup as well as named and optional parameters both help making programming against COM less painful than today. On top of that, however, we are adding a number of other small features that further improve the interop experience. Variance It used to be that an IEnumerable<string> wasn’t an IEnumerable<object>. Now it is – C# embraces type safe “co-and contravariance” and common BCL types are updated to take advantage of that. Dynamic Lookup Dynamic lookup allows you a unified approach to invoking things dynamically. With dynamic lookup, when you have an object in your hand you do not need to worry about whether it comes from COM, IronPython, the HTML DOM or reflection; you just apply operations to it and leave it to the runtime to figure out what exactly those operations mean for that particular object. This affords you enormous flexibility, and can greatly simplify your code, but it does come with a significant drawback: Static typing is not maintained for these operations. A dynamic object is assumed at compile time to support any operation, and only at runtime will you get an error if it wasn’t so. Oftentimes this will be no loss, because the object wouldn’t have a static type anyway, in other cases it is a tradeoff between brevity and safety. In order to facilitate this tradeoff, it is a design goal of C# to allow you to opt in or opt out of dynamic behavior on every single call. The dynamic type C# 4.0 introduces a new static type called dynamic. When you have an object of type dynamic you can “do things to it” that are resolved only at runtime: dynamic d = GetDynamicObject(…); d.M(7); The C# compiler allows you to call a method with any name and any arguments on d because it is of type dynamic. At runtime the actual object that d refers to will be examined to determine what it means to “call M with an int” on it. The type dynamic can be thought of as a special version of the type object, which signals that the object can be used dynamically. It is easy to opt in or out of dynamic behavior: any object can be implicitly converted to dynamic, “suspending belief” until runtime. Conversely, there is an “assignment conversion” from dynamic to any other type, which allows implicit conversion in assignment-like constructs: dynamic d = 7; // implicit conversion int i = d; // assignment conversion Dynamic operations Not only method calls, but also field and property accesses, indexer and operator calls and even delegate invocations can be dispatched dynamically: dynamic d = GetDynamicObject(…); d.M(7); // calling methods d.f = d.P; // getting and settings fields and properties d[“one”] = d[“two”]; // getting and setting thorugh indexers int i = d + 3; // calling operators string s = d(5,7); // invoking as a delegate The role of the C# compiler here is simply to package up the necessary information about “what is being done to d”, so that the runtime can pick it up and determine what the exact meaning of it is given an actual object d. Think of it as deferring part of the compiler’s job to runtime. The result of any dynamic operation is itself of type dynamic. Runtime lookup At runtime a dynamic operation is dispatched according to the nature of its target object d: COM objects If d is a COM object, the operation is dispatched dynamically through COM IDispatch. This allows calling to COM types that don’t have a Primary Interop Assembly (PIA), and relying on COM features that don’t have a counterpart in C#, such as indexed properties and default properties. Dynamic objects If d implements the interface IDynamicObject d itself is asked to perform the operation. Thus by implementing IDynamicObject a type can completely redefine the meaning of dynamic operations. This is used intensively by dynamic languages such as IronPython and IronRuby to implement their own dynamic object models. It will also be used by APIs, e.g. by the HTML DOM to allow direct access to the object’s properties using property syntax. Plain objects Otherwise d is a standard .NET object, and the operation will be dispatched using reflection on its type and a C# “runtime binder” which implements C#’s lookup and overload resolution semantics at runtime. This is essentially a part of the C# compiler running as a runtime component to “finish the work” on dynamic operations that was deferred by the static compiler. Example Assume the following code: dynamic d1 = new Foo(); dynamic d2 = new Bar(); string s; d1.M(s, d2, 3, null); Because the receiver of the call to M is dynamic, the C# compiler does not try to resolve the meaning of the call. Instead it stashes away information for the runtime about the call. This information (often referred to as the “payload”) is essentially equivalent to: “Perform an instance method call of M with the following arguments: 1. a string 2. a dynamic 3. a literal int 3 4. a literal object null” At runtime, assume that the actual type Foo of d1 is not a COM type and does not implement IDynamicObject. In this case the C# runtime binder picks up to finish the overload resolution job based on runtime type information, proceeding as follows: 1. Reflection is used to obtain the actual runtime types of the two objects, d1 and d2, that did not have a static type (or rather had the static type dynamic). The result is Foo for d1 and Bar for d2. 2. Method lookup and overload resolution is performed on the type Foo with the call M(string,Bar,3,null) using ordinary C# semantics. 3. If the method is found it is invoked; otherwise a runtime exception is thrown. Overload resolution with dynamic arguments Even if the receiver of a method call is of a static type, overload resolution can still happen at runtime. This can happen if one or more of the arguments have the type dynamic: Foo foo = new Foo(); dynamic d = new Bar(); var result = foo.M(d); The C# runtime binder will choose between the statically known overloads of M on Foo, based on the runtime type of d, namely Bar. The result is again of type dynamic. The Dynamic Language Runtime An important component in the underlying implementation of dynamic lookup is the Dynamic Language Runtime (DLR), which is a new API in .NET 4.0. The DLR provides most of the infrastructure behind not only C# dynamic lookup but also the implementation of several dynamic programming languages on .NET, such as IronPython and IronRuby. Through this common infrastructure a high degree of interoperability is ensured, but just as importantly the DLR provides excellent caching mechanisms which serve to greatly enhance the efficiency of runtime dispatch. To the user of dynamic lookup in C#, the DLR is invisible except for the improved efficiency. However, if you want to implement your own dynamically dispatched objects, the IDynamicObject interface allows you to interoperate with the DLR and plug in your own behavior. This is a rather advanced task, which requires you to understand a good deal more about the inner workings of the DLR. For API writers, however, it can definitely be worth the trouble in order to vastly improve the usability of e.g. a library representing an inherently dynamic domain. Open issues There are a few limitations and things that might work differently than you would expect. · The DLR allows objects to be created from objects that represent classes. However, the current implementation of C# doesn’t have syntax to support this. · Dynamic lookup will not be able to find extension methods. Whether extension methods apply or not depends on the static context of the call (i.e. which using clauses occur), and this context information is not currently kept as part of the payload. · Anonymous functions (i.e. lambda expressions) cannot appear as arguments to a dynamic method call. The compiler cannot bind (i.e. “understand”) an anonymous function without knowing what type it is converted to. One consequence of these limitations is that you cannot easily use LINQ queries over dynamic objects: dynamic collection = …; var result = collection.Select(e => e + 5); If the Select method is an extension method, dynamic lookup will not find it. Even if it is an instance method, the above does not compile, because a lambda expression cannot be passed as an argument to a dynamic operation. There are no plans to address these limitations in C# 4.0. Named and Optional Arguments Named and optional parameters are really two distinct features, but are often useful together. Optional parameters allow you to omit arguments to member invocations, whereas named arguments is a way to provide an argument using the name of the corresponding parameter instead of relying on its position in the parameter list. Some APIs, most notably COM interfaces such as the Office automation APIs, are written specifically with named and optional parameters in mind. Up until now it has been very painful to call into these APIs from C#, with sometimes as many as thirty arguments having to be explicitly passed, most of which have reasonable default values and could be omitted. Even in APIs for .NET however you sometimes find yourself compelled to write many overloads of a method with different combinations of parameters, in order to provide maximum usability to the callers. Optional parameters are a useful alternative for these situations. Optional parameters A parameter is declared optional simply by providing a default value for it: public void M(int x, int y = 5, int z = 7); Here y and z are optional parameters and can be omitted in calls: M(1, 2, 3); // ordinary call of M M(1, 2); // omitting z – equivalent to M(1, 2, 7) M(1); // omitting both y and z – equivalent to M(1, 5, 7) Named and optional arguments C# 4.0 does not permit you to omit arguments between commas as in M(1,,3). This could lead to highly unreadable comma-counting code. Instead any argument can be passed by name. Thus if you want to omit only y from a call of M you can write: M(1, z: 3); // passing z by name or M(x: 1, z: 3); // passing both x and z by name or even M(z: 3, x: 1); // reversing the order of arguments All forms are equivalent, except that arguments are always evaluated in the order they appear, so in the last example the 3 is evaluated before the 1. Optional and named arguments can be used not only with methods but also with indexers and constructors. Overload resolution Named and optional arguments affect overload resolution, but the changes are relatively simple: A signature is applicable if all its parameters are either optional or have exactly one corresponding argument (by name or position) in the call which is convertible to the parameter type. Betterness rules on conversions are only applied for arguments that are explicitly given – omitted optional arguments are ignored for betterness purposes. If two signatures are equally good, one that does not omit optional parameters is preferred. M(string s, int i = 1); M(object o); M(int i, string s = “Hello”); M(int i); M(5); Given these overloads, we can see the working of the rules above. M(string,int) is not applicable because 5 doesn’t convert to string. M(int,string) is applicable because its second parameter is optional, and so, obviously are M(object) and M(int). M(int,string) and M(int) are both better than M(object) because the conversion from 5 to int is better than the conversion from 5 to object. Finally M(int) is better than M(int,string) because no optional arguments are omitted. Thus the method that gets called is M(int). Features for COM interop Dynamic lookup as well as named and optional parameters greatly improve the experience of interoperating with COM APIs such as the Office Automation APIs. In order to remove even more of the speed bumps, a couple of small COM-specific features are also added to C# 4.0. Dynamic import Many COM methods accept and return variant types, which are represented in the PIAs as object. In the vast majority of cases, a programmer calling these methods already knows the static type of a returned object from context, but explicitly has to perform a cast on the returned value to make use of that knowledge. These casts are so common that they constitute a major nuisance. In order to facilitate a smoother experience, you can now choose to import these COM APIs in such a way that variants are instead represented using the type dynamic. In other words, from your point of view, COM signatures now have occurrences of dynamic instead of object in them. This means that you can easily access members directly off a returned object, or you can assign it to a strongly typed local variable without having to cast. To illustrate, you can now say excel.Cells[1, 1].Value = "Hello"; instead of ((Excel.Range)excel.Cells[1, 1]).Value2 = "Hello"; and Excel.Range range = excel.Cells[1, 1]; instead of Excel.Range range = (Excel.Range)excel.Cells[1, 1]; Compiling without PIAs Primary Interop Assemblies are large .NET assemblies generated from COM interfaces to facilitate strongly typed interoperability. They provide great support at design time, where your experience of the interop is as good as if the types where really defined in .NET. However, at runtime these large assemblies can easily bloat your program, and also cause versioning issues because they are distributed independently of your application. The no-PIA feature allows you to continue to use PIAs at design time without having them around at runtime. Instead, the C# compiler will bake the small part of the PIA that a program actually uses directly into its assembly. At runtime the PIA does not have to be loaded. Omitting ref Because of a different programming model, many COM APIs contain a lot of reference parameters. Contrary to refs in C#, these are typically not meant to mutate a passed-in argument for the subsequent benefit of the caller, but are simply another way of passing value parameters. It therefore seems unreasonable that a C# programmer should have to create temporary variables for all such ref parameters and pass these by reference. Instead, specifically for COM methods, the C# compiler will allow you to pass arguments by value to such a method, and will automatically generate temporary variables to hold the passed-in values, subsequently discarding these when the call returns. In this way the caller sees value semantics, and will not experience any side effects, but the called method still gets a reference. Open issues A few COM interface features still are not surfaced in C#. Most notably these include indexed properties and default properties. As mentioned above these will be respected if you access COM dynamically, but statically typed C# code will still not recognize them. There are currently no plans to address these remaining speed bumps in C# 4.0. Variance An aspect of generics that often comes across as surprising is that the following is illegal: IList<string> strings = new List<string>(); IList<object> objects = strings; The second assignment is disallowed because strings does not have the same element type as objects. There is a perfectly good reason for this. If it were allowed you could write: objects[0] = 5; string s = strings[0]; Allowing an int to be inserted into a list of strings and subsequently extracted as a string. This would be a breach of type safety. However, there are certain interfaces where the above cannot occur, notably where there is no way to insert an object into the collection. Such an interface is IEnumerable<T>. If instead you say: IEnumerable<object> objects = strings; There is no way we can put the wrong kind of thing into strings through objects, because objects doesn’t have a method that takes an element in. Variance is about allowing assignments such as this in cases where it is safe. The result is that a lot of situations that were previously surprising now just work. Covariance In .NET 4.0 the IEnumerable<T> interface will be declared in the following way: public interface IEnumerable<out T> : IEnumerable { IEnumerator<T> GetEnumerator(); } public interface IEnumerator<out T> : IEnumerator { bool MoveNext(); T Current { get; } } The “out” in these declarations signifies that the T can only occur in output position in the interface – the compiler will complain otherwise. In return for this restriction, the interface becomes “covariant” in T, which means that an IEnumerable<A> is considered an IEnumerable<B> if A has a reference conversion to B. As a result, any sequence of strings is also e.g. a sequence of objects. This is useful e.g. in many LINQ methods. Using the declarations above: var result = strings.Union(objects); // succeeds with an IEnumerable<object> This would previously have been disallowed, and you would have had to to some cumbersome wrapping to get the two sequences to have the same element type. Contravariance Type parameters can also have an “in” modifier, restricting them to occur only in input positions. An example is IComparer<T>: public interface IComparer<in T> { public int Compare(T left, T right); } The somewhat baffling result is that an IComparer<object> can in fact be considered an IComparer<string>! It makes sense when you think about it: If a comparer can compare any two objects, it can certainly also compare two strings. This property is referred to as contravariance. A generic type can have both in and out modifiers on its type parameters, as is the case with the Func<…> delegate types: public delegate TResult Func<in TArg, out TResult>(TArg arg); Obviously the argument only ever comes in, and the result only ever comes out. Therefore a Func<object,string> can in fact be used as a Func<string,object>. Limitations Variant type parameters can only be declared on interfaces and delegate types, due to a restriction in the CLR. Variance only applies when there is a reference conversion between the type arguments. For instance, an IEnumerable<int> is not an IEnumerable<object> because the conversion from int to object is a boxing conversion, not a reference conversion. Also please note that the CTP does not contain the new versions of the .NET types mentioned above. In order to experiment with variance you have to declare your own variant interfaces and delegate types. COM Example Here is a larger Office automation example that shows many of the new C# features in action. using System; using System.Diagnostics; using System.Linq; using Excel = Microsoft.Office.Interop.Excel; using Word = Microsoft.Office.Interop.Word; class Program { static void Main(string[] args) { var excel = new Excel.Application(); excel.Visible = true; excel.Workbooks.Add(); // optional arguments omitted excel.Cells[1, 1].Value = "Process Name"; // no casts; Value dynamically excel.Cells[1, 2].Value = "Memory Usage"; // accessed var processes = Process.GetProcesses() .OrderByDescending(p => p.WorkingSet) .Take(10); int i = 2; foreach (var p in processes) { excel.Cells[i, 1].Value = p.ProcessName; // no casts excel.Cells[i, 2].Value = p.WorkingSet; // no casts i++; } Excel.Range range = excel.Cells[1, 1]; // no casts Excel.Chart chart = excel.ActiveWorkbook.Charts. Add(After: excel.ActiveSheet); // named and optional arguments chart.ChartWizard( Source: range.CurrentRegion, Title: "Memory Usage in " + Environment.MachineName); //named+optional chart.ChartStyle = 45; chart.CopyPicture(Excel.XlPictureAppearance.xlScreen, Excel.XlCopyPictureFormat.xlBitmap, Excel.XlPictureAppearance.xlScreen); var word = new Word.Application(); word.Visible = true; word.Documents.Add(); // optional arguments word.Selection.Paste(); } } The code is much more terse and readable than the C# 3.0 counterpart. Note especially how the Value property is accessed dynamically. This is actually an indexed property, i.e. a property that takes an argument; something which C# does not understand. However the argument is optional. Since the access is dynamic, it goes through the runtime COM binder which knows to substitute the default value and call the indexed property. Thus, dynamic COM allows you to avoid accesses to the puzzling Value2 property of Excel ranges. Relationship with Visual Basic A number of the features introduced to C# 4.0 already exist or will be introduced in some form or other in Visual Basic: · Late binding in VB is similar in many ways to dynamic lookup in C#, and can be expected to make more use of the DLR in the future, leading to further parity with C#. · Named and optional arguments have been part of Visual Basic for a long time, and the C# version of the feature is explicitly engineered with maximal VB interoperability in mind. · NoPIA and variance are both being introduced to VB and C# at the same time. VB in turn is adding a number of features that have hitherto been a mainstay of C#. As a result future versions of C# and VB will have much better feature parity, for the benefit of everyone. Resources All available resources concerning C# 4.0 can be accessed through the C# Dev Center. Specifically, this white paper and other resources can be found at the Code Gallery site. Enjoy! span.fullpost {display:none;}

Read the article

A way of doing real-world test-driven development (and some thoughts about it)

- by Thomas Weller

Lately, I exchanged some arguments with Derick Bailey about some details of the red-green-refactor cycle of the Test-driven development process. In short, the issue revolved around the fact that it’s not enough to have a test red or green, but it’s also important to have it red or green for the right reasons. While for me, it’s sufficient to initially have a NotImplementedException in place, Derick argues that this is not totally correct (see these two posts: Red/Green/Refactor, For The Right Reasons and Red For The Right Reason: Fail By Assertion, Not By Anything Else). And he’s right. But on the other hand, I had no idea how his insights could have any practical consequence for my own individual interpretation of the red-green-refactor cycle (which is not really red-green-refactor, at least not in its pure sense, see the rest of this article). This made me think deeply for some days now. In the end I found out that the ‘right reason’ changes in my understanding depending on what development phase I’m in. To make this clear (at least I hope it becomes clear…) I started to describe my way of working in some detail, and then something strange happened: The scope of the article slightly shifted from focusing ‘only’ on the ‘right reason’ issue to something more general, which you might describe as something like 'Doing real-world TDD in .NET , with massive use of third-party add-ins’. This is because I feel that there is a more general statement about Test-driven development to make: It’s high time to speak about the ‘How’ of TDD, not always only the ‘Why’. Much has been said about this, and me myself also contributed to that (see here: TDD is not about testing, it's about how we develop software). But always justifying what you do is very unsatisfying in the long run, it is inherently defensive, and it costs time and effort that could be used for better and more important things. And frankly: I’m somewhat sick and tired of repeating time and again that the test-driven way of software development is highly preferable for many reasons - I don’t want to spent my time exclusively on stating the obvious… So, again, let’s say it clearly: TDD is programming, and programming is TDD. Other ways of programming (code-first, sometimes called cowboy-coding) are exceptional and need justification. – I know that there are many people out there who will disagree with this radical statement, and I also know that it’s not a description of the real world but more of a mission statement or something. But nevertheless I’m absolutely sure that in some years this statement will be nothing but a platitude. Side note: Some parts of this post read as if I were paid by Jetbrains (the manufacturer of the ReSharper add-in – R#), but I swear I’m not. Rather I think that Visual Studio is just not production-complete without it, and I wouldn’t even consider to do professional work without having this add-in installed... The three parts of a software component Before I go into some details, I first should describe my understanding of what belongs to a software component (assembly, type, or method) during the production process (i.e. the coding phase). Roughly, I come up with the three parts shown below: First, we need to have some initial sort of requirement. This can be a multi-page formal document, a vague idea in some programmer’s brain of what might be needed, or anything in between. In either way, there has to be some sort of requirement, be it explicit or not. – At the C# micro-level, the best way that I found to formulate that is to define interfaces for just about everything, even for internal classes, and to provide them with exhaustive xml comments. The next step then is to re-formulate these requirements in an executable form. This is specific to the respective programming language. - For C#/.NET, the Gallio framework (which includes MbUnit) in conjunction with the ReSharper add-in for Visual Studio is my toolset of choice. The third part then finally is the production code itself. It’s development is entirely driven by the requirements and their executable formulation. This is the delivery, the two other parts are ‘only’ there to make its production possible, to give it a decent quality and reliability, and to significantly reduce related costs down the maintenance timeline. So while the first two parts are not really relevant for the customer, they are very important for the developer. The customer (or in Scrum terms: the Product Owner) is not interested at all in how the product is developed, he is only interested in the fact that it is developed as cost-effective as possible, and that it meets his functional and non-functional requirements. The rest is solely a matter of the developer’s craftsmanship, and this is what I want to talk about during the remainder of this article… An example To demonstrate my way of doing real-world TDD, I decided to show the development of a (very) simple Calculator component. The example is deliberately trivial and silly, as examples always are. I am totally aware of the fact that real life is never that simple, but I only want to show some development principles here… The requirement As already said above, I start with writing down some words on the initial requirement, and I normally use interfaces for that, even for internal classes - the typical question “intf or not” doesn’t even come to mind. I need them for my usual workflow and using them automatically produces high componentized and testable code anyway. To think about their usage in every single situation would slow down the production process unnecessarily. So this is what I begin with: namespace Calculator { /// <summary> /// Defines a very simple calculator component for demo purposes. /// </summary> public interface ICalculator { /// <summary> /// Gets the result of the last successful operation. /// </summary> /// <value>The last result.</value> /// <remarks> /// Will be <see langword="null" /> before the first successful operation. /// </remarks> double? LastResult { get; } } // interface ICalculator } // namespace Calculator So, I’m not beginning with a test, but with a sort of code declaration - and still I insist on being 100% test-driven. There are three important things here: Starting this way gives me a method signature, which allows to use IntelliSense and AutoCompletion and thus eliminates the danger of typos - one of the most regular, annoying, time-consuming, and therefore expensive sources of error in the development process. In my understanding, the interface definition as a whole is more of a readable requirement document and technical documentation than anything else. So this is at least as much about documentation than about coding. The documentation must completely describe the behavior of the documented element. I normally use an IoC container or some sort of self-written provider-like model in my architecture. In either case, I need my components defined via service interfaces anyway. - I will use the LinFu IoC framework here, for no other reason as that is is very simple to use. The ‘Red’ (pt. 1) First I create a folder for the project’s third-party libraries and put the LinFu.Core dll there. Then I set up a test project (via a Gallio project template), and add references to the Calculator project and the LinFu dll. Finally I’m ready to write the first test, which will look like the following: namespace Calculator.Test { [TestFixture] public class CalculatorTest { private readonly ServiceContainer container = new ServiceContainer(); [Test] public void CalculatorLastResultIsInitiallyNull() { ICalculator calculator = container.GetService<ICalculator>(); Assert.IsNull(calculator.LastResult); } } // class CalculatorTest } // namespace Calculator.Test This is basically the executable formulation of what the interface definition states (part of). Side note: There’s one principle of TDD that is just plain wrong in my eyes: I’m talking about the Red is 'does not compile' thing. How could a compiler error ever be interpreted as a valid test outcome? I never understood that, it just makes no sense to me. (Or, in Derick’s terms: this reason is as wrong as a reason ever could be…) A compiler error tells me: Your code is incorrect, but nothing more. Instead, the ‘Red’ part of the red-green-refactor cycle has a clearly defined meaning to me: It means that the test works as intended and fails only if its assumptions are not met for some reason. Back to our Calculator. When I execute the above test with R#, the Gallio plugin will give me this output: So this tells me that the test is red for the wrong reason: There’s no implementation that the IoC-container could load, of course. So let’s fix that. With R#, this is very easy: First, create an ICalculator - derived type: Next, implement the interface members: And finally, move the new class to its own file: So far my ‘work’ was six mouse clicks long, the only thing that’s left to do manually here, is to add the Ioc-specific wiring-declaration and also to make the respective class non-public, which I regularly do to force my components to communicate exclusively via interfaces: This is what my Calculator class looks like as of now: using System; using LinFu.IoC.Configuration; namespace Calculator { [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { public double? LastResult { get { throw new NotImplementedException(); } } } } Back to the test fixture, we have to put our IoC container to work: [TestFixture] public class CalculatorTest { #region Fields private readonly ServiceContainer container = new ServiceContainer(); #endregion // Fields #region Setup/TearDown [FixtureSetUp] public void FixtureSetUp() { container.LoadFrom(AppDomain.CurrentDomain.BaseDirectory, "Calculator.dll"); } ... Because I have a R# live template defined for the setup/teardown method skeleton as well, the only manual coding here again is the IoC-specific stuff: two lines, not more… The ‘Red’ (pt. 2) Now, the execution of the above test gives the following result: This time, the test outcome tells me that the method under test is called. And this is the point, where Derick and I seem to have somewhat different views on the subject: Of course, the test still is worthless regarding the red/green outcome (or: it’s still red for the wrong reasons, in that it gives a false negative). But as far as I am concerned, I’m not really interested in the test outcome at this point of the red-green-refactor cycle. Rather, I only want to assert that my test actually calls the right method. If that’s the case, I will happily go on to the ‘Green’ part… The ‘Green’ Making the test green is quite trivial. Just make LastResult an automatic property: [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { public double? LastResult { get; private set; } } One more round… Now on to something slightly more demanding (cough…). Let’s state that our Calculator exposes an Add() method: ... /// <summary> /// Adds the specified operands. /// </summary> /// <param name="operand1">The operand1.</param> /// <param name="operand2">The operand2.</param> /// <returns>The result of the additon.</returns> /// <exception cref="ArgumentException"> /// Argument <paramref name="operand1"/> is < 0.<br/> /// -- or --<br/> /// Argument <paramref name="operand2"/> is < 0. /// </exception> double Add(double operand1, double operand2); } // interface ICalculator A remark: I sometimes hear the complaint that xml comment stuff like the above is hard to read. That’s certainly true, but irrelevant to me, because I read xml code comments with the CR_Documentor tool window. And using that, it looks like this: Apart from that, I’m heavily using xml code comments (see e.g. here for a detailed guide) because there is the possibility of automating help generation with nightly CI builds (using MS Sandcastle and the Sandcastle Help File Builder), and then publishing the results to some intranet location. This way, a team always has first class, up-to-date technical documentation at hand about the current codebase. (And, also very important for speeding up things and avoiding typos: You have IntelliSense/AutoCompletion and R# support, and the comments are subject to compiler checking…). Back to our Calculator again: Two more R# – clicks implement the Add() skeleton: ... public double Add(double operand1, double operand2) { throw new NotImplementedException(); } } // class Calculator As we have stated in the interface definition (which actually serves as our requirement document!), the operands are not allowed to be negative. So let’s start implementing that. Here’s the test: [Test] [Row(-0.5, 2)] public void AddThrowsOnNegativeOperands(double operand1, double operand2) { ICalculator calculator = container.GetService<ICalculator>(); Assert.Throws<ArgumentException>(() => calculator.Add(operand1, operand2)); } As you can see, I’m using a data-driven unit test method here, mainly for these two reasons: Because I know that I will have to do the same test for the second operand in a few seconds, I save myself from implementing another test method for this purpose. Rather, I only will have to add another Row attribute to the existing one. From the test report below, you can see that the argument values are explicitly printed out. This can be a valuable documentation feature even when everything is green: One can quickly review what values were tested exactly - the complete Gallio HTML-report (as it will be produced by the Continuous Integration runs) shows these values in a quite clear format (see below for an example). Back to our Calculator development again, this is what the test result tells us at the moment: So we’re red again, because there is not yet an implementation… Next we go on and implement the necessary parameter verification to become green again, and then we do the same thing for the second operand. To make a long story short, here’s the test and the method implementation at the end of the second cycle: // in CalculatorTest: [Test] [Row(-0.5, 2)] [Row(295, -123)] public void AddThrowsOnNegativeOperands(double operand1, double operand2) { ICalculator calculator = container.GetService<ICalculator>(); Assert.Throws<ArgumentException>(() => calculator.Add(operand1, operand2)); } // in Calculator: public double Add(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } throw new NotImplementedException(); } So far, we have sheltered our method from unwanted input, and now we can safely operate on the parameters without further caring about their validity (this is my interpretation of the Fail Fast principle, which is regarded here in more detail). Now we can think about the method’s successful outcomes. First let’s write another test for that: [Test] [Row(1, 1, 2)] public void TestAdd(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Add(operand1, operand2); Assert.AreEqual(expectedResult, result); } Again, I’m regularly using row based test methods for these kinds of unit tests. The above shown pattern proved to be extremely helpful for my development work, I call it the Defined-Input/Expected-Output test idiom: You define your input arguments together with the expected method result. There are two major benefits from that way of testing: In the course of refining a method, it’s very likely to come up with additional test cases. In our case, we might add tests for some edge cases like ‘one of the operands is zero’ or ‘the sum of the two operands causes an overflow’, or maybe there’s an external test protocol that has to be fulfilled (e.g. an ISO norm for medical software), and this results in the need of testing against additional values. In all these scenarios we only have to add another Row attribute to the test. Remember that the argument values are written to the test report, so as a side-effect this produces valuable documentation. (This can become especially important if the fulfillment of some sort of external requirements has to be proven). So your test method might look something like that in the end: [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 2)] [Row(0, 999999999, 999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, double.MaxValue)] [Row(4, double.MaxValue - 2.5, double.MaxValue)] public void TestAdd(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Add(operand1, operand2); Assert.AreEqual(expectedResult, result); } And this will produce the following HTML report (with Gallio): Not bad for the amount of work we invested in it, huh? - There might be scenarios where reports like that can be useful for demonstration purposes during a Scrum sprint review… The last requirement to fulfill is that the LastResult property is expected to store the result of the last operation. I don’t show this here, it’s trivial enough and brings nothing new… And finally: Refactor (for the right reasons) To demonstrate my way of going through the refactoring portion of the red-green-refactor cycle, I added another method to our Calculator component, namely Subtract(). Here’s the code (tests and production): // CalculatorTest.cs: [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 0)] [Row(0, 999999999, -999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, -double.MaxValue)] [Row(4, double.MaxValue - 2.5, -double.MaxValue)] public void TestSubtract(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Subtract(operand1, operand2); Assert.AreEqual(expectedResult, result); } [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 0)] [Row(0, 999999999, -999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, -double.MaxValue)] [Row(4, double.MaxValue - 2.5, -double.MaxValue)] public void TestSubtractGivesExpectedLastResult(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); calculator.Subtract(operand1, operand2); Assert.AreEqual(expectedResult, calculator.LastResult); } ... // ICalculator.cs: /// <summary> /// Subtracts the specified operands. /// </summary> /// <param name="operand1">The operand1.</param> /// <param name="operand2">The operand2.</param> /// <returns>The result of the subtraction.</returns> /// <exception cref="ArgumentException"> /// Argument <paramref name="operand1"/> is < 0.<br/> /// -- or --<br/> /// Argument <paramref name="operand2"/> is < 0. /// </exception> double Subtract(double operand1, double operand2); ... // Calculator.cs: public double Subtract(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } return (this.LastResult = operand1 - operand2).Value; } Obviously, the argument validation stuff that was produced during the red-green part of our cycle duplicates the code from the previous Add() method. So, to avoid code duplication and minimize the number of code lines of the production code, we do an Extract Method refactoring. One more time, this is only a matter of a few mouse clicks (and giving the new method a name) with R#: Having done that, our production code finally looks like that: using System; using LinFu.IoC.Configuration; namespace Calculator { [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { #region ICalculator public double? LastResult { get; private set; } public double Add(double operand1, double operand2) { ThrowIfOneOperandIsInvalid(operand1, operand2); return (this.LastResult = operand1 + operand2).Value; } public double Subtract(double operand1, double operand2) { ThrowIfOneOperandIsInvalid(operand1, operand2); return (this.LastResult = operand1 - operand2).Value; } #endregion // ICalculator #region Implementation (Helper) private static void ThrowIfOneOperandIsInvalid(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } } #endregion // Implementation (Helper) } // class Calculator } // namespace Calculator But is the above worth the effort at all? It’s obviously trivial and not very impressive. All our tests were green (for the right reasons), and refactoring the code did not change anything. It’s not immediately clear how this refactoring work adds value to the project. Derick puts it like this: STOP! Hold on a second… before you go any further and before you even think about refactoring what you just wrote to make your test pass, you need to understand something: if your done with your requirements after making the test green, you are not required to refactor the code. I know… I’m speaking heresy, here. Toss me to the wolves, I’ve gone over to the dark side! Seriously, though… if your test is passing for the right reasons, and you do not need to write any test or any more code for you class at this point, what value does refactoring add? Derick immediately answers his own question: So why should you follow the refactor portion of red/green/refactor? When you have added code that makes the system less readable, less understandable, less expressive of the domain or concern’s intentions, less architecturally sound, less DRY, etc, then you should refactor it. I couldn’t state it more precise. From my personal perspective, I’d add the following: You have to keep in mind that real-world software systems are usually quite large and there are dozens or even hundreds of occasions where micro-refactorings like the above can be applied. It’s the sum of them all that counts. And to have a good overall quality of the system (e.g. in terms of the Code Duplication Percentage metric) you have to be pedantic on the individual, seemingly trivial cases. My job regularly requires the reading and understanding of ‘foreign’ code. So code quality/readability really makes a HUGE difference for me – sometimes it can be even the difference between project success and failure… Conclusions The above described development process emerged over the years, and there were mainly two things that guided its evolution (you might call it eternal principles, personal beliefs, or anything in between): Test-driven development is the normal, natural way of writing software, code-first is exceptional. So ‘doing TDD or not’ is not a question. And good, stable code can only reliably be produced by doing TDD (yes, I know: many will strongly disagree here again, but I’ve never seen high-quality code – and high-quality code is code that stood the test of time and causes low maintenance costs – that was produced code-first…) It’s the production code that pays our bills in the end. (Though I have seen customers these days who demand an acceptance test battery as part of the final delivery. Things seem to go into the right direction…). The test code serves ‘only’ to make the production code work. But it’s the number of delivered features which solely counts at the end of the day - no matter how much test code you wrote or how good it is. With these two things in mind, I tried to optimize my coding process for coding speed – or, in business terms: productivity - without sacrificing the principles of TDD (more than I’d do either way…). As a result, I consider a ratio of about 3-5/1 for test code vs. production code as normal and desirable. In other words: roughly 60-80% of my code is test code (This might sound heavy, but that is mainly due to the fact that software development standards only begin to evolve. The entire software development profession is very young, historically seen; only at the very beginning, and there are no viable standards yet. If you think about software development as a kind of casting process, where the test code is the mold and the resulting production code is the final product, then the above ratio sounds no longer extraordinary…) Although the above might look like very much unnecessary work at first sight, it’s not. With the aid of the mentioned add-ins, doing all the above is a matter of minutes, sometimes seconds (while writing this post took hours and days…). The most important thing is to have the right tools at hand. Slow developer machines or the lack of a tool or something like that - for ‘saving’ a few 100 bucks - is just not acceptable and a very bad decision in business terms (though I quite some times have seen and heard that…). Production of high-quality products needs the usage of high-quality tools. This is a platitude that every craftsman knows… The here described round-trip will take me about five to ten minutes in my real-world development practice. I guess it’s about 30% more time compared to developing the ‘traditional’ (code-first) way. But the so manufactured ‘product’ is of much higher quality and massively reduces maintenance costs, which is by far the single biggest cost factor, as I showed in this previous post: It's the maintenance, stupid! (or: Something is rotten in developerland.). In the end, this is a highly cost-effective way of software development… But on the other hand, there clearly is a trade-off here: coding speed vs. code quality/later maintenance costs. The here described development method might be a perfect fit for the overwhelming majority of software projects, but there certainly are some scenarios where it’s not - e.g. if time-to-market is crucial for a software project. So this is a business decision in the end. It’s just that you have to know what you’re doing and what consequences this might have… Some last words First, I’d like to thank Derick Bailey again. His two aforementioned posts (which I strongly recommend for reading) inspired me to think deeply about my own personal way of doing TDD and to clarify my thoughts about it. I wouldn’t have done that without this inspiration. I really enjoy that kind of discussions… I agree with him in all respects. But I don’t know (yet?) how to bring his insights into the described production process without slowing things down. The above described method proved to be very “good enough” in my practical experience. But of course, I’m open to suggestions here… My rationale for now is: If the test is initially red during the red-green-refactor cycle, the ‘right reason’ is: it actually calls the right method, but this method is not yet operational. Later on, when the cycle is finished and the tests become part of the regular, automated Continuous Integration process, ‘red’ certainly must occur for the ‘right reason’: in this phase, ‘red’ MUST mean nothing but an unfulfilled assertion - Fail By Assertion, Not By Anything Else!

Read the article

reading the file name from user input in MIPS assembly

- by Hassan Al-Jeshi

I'm writing a MIPS assembly code that will ask the user for the file name and it will produce some statistics about the content of the file. However, when I hard code the file name into a variable from the beginning it works just fine, but when I ask the user to input the file name it does not work. after some debugging, I have discovered that the program adds 0x00 char and 0x0a char (check asciitable.com) at the end of user input in the memory and that's why it does not open the file based on the user input. anyone has any idea about how to get rid of those extra chars, or how to open the file after getting its name from the user?? here is my complete code (it is working fine except for the file name from user thing, and anybody is free to use it for any purpose he/she wants to): .data fin: .ascii "" # filename for input msg0: .asciiz "aaaa" msg1: .asciiz "Please enter the input file name:" msg2: .asciiz "Number of Uppercase Char: " msg3: .asciiz "Number of Lowercase Char: " msg4: .asciiz "Number of Decimal Char: " msg5: .asciiz "Number of Words: " nline: .asciiz "\n" buffer: .asciiz "" .text #----------------------- li $v0, 4 la $a0, msg1 syscall li $v0, 8 la $a0, fin li $a1, 21 syscall jal fileRead #read from file move $s1, $v0 #$t0 = total number of bytes li $t0, 0 # Loop counter li $t1, 0 # Uppercase counter li $t2, 0 # Lowercase counter li $t3, 0 # Decimal counter li $t4, 0 # Words counter loop: bge $t0, $s1, end #if end of file reached OR if there is an error in the file lb $t5, buffer($t0) #load next byte from file jal checkUpper #check for upper case jal checkLower #check for lower case jal checkDecimal #check for decimal jal checkWord #check for words addi $t0, $t0, 1 #increment loop counter j loop end: jal output jal fileClose li $v0, 10 syscall fileRead: # Open file for reading li $v0, 13 # system call for open file la $a0, fin # input file name li $a1, 0 # flag for reading li $a2, 0 # mode is ignored syscall # open a file move $s0, $v0 # save the file descriptor # reading from file just opened li $v0, 14 # system call for reading from file move $a0, $s0 # file descriptor la $a1, buffer # address of buffer from which to read li $a2, 100000 # hardcoded buffer length syscall # read from file jr $ra output: li $v0, 4 la $a0, msg2 syscall li $v0, 1 move $a0, $t1 syscall li $v0, 4 la $a0, nline syscall li $v0, 4 la $a0, msg3 syscall li $v0, 1 move $a0, $t2 syscall li $v0, 4 la $a0, nline syscall li $v0, 4 la $a0, msg4 syscall li $v0, 1 move $a0, $t3 syscall li $v0, 4 la $a0, nline syscall li $v0, 4 la $a0, msg5 syscall addi $t4, $t4, 1 li $v0, 1 move $a0, $t4 syscall jr $ra checkUpper: blt $t5, 0x41, L1 #branch if less than 'A' bgt $t5, 0x5a, L1 #branch if greater than 'Z' addi $t1, $t1, 1 #increment Uppercase counter L1: jr $ra checkLower: blt $t5, 0x61, L2 #branch if less than 'a' bgt $t5, 0x7a, L2 #branch if greater than 'z' addi $t2, $t2, 1 #increment Lowercase counter L2: jr $ra checkDecimal: blt $t5, 0x30, L3 #branch if less than '0' bgt $t5, 0x39, L3 #branch if greater than '9' addi $t3, $t3, 1 #increment Decimal counter L3: jr $ra checkWord: bne $t5, 0x20, L4 #branch if 'space' addi $t4, $t4, 1 #increment words counter L4: jr $ra fileClose: # Close the file li $v0, 16 # system call for close file move $a0, $s0 # file descriptor to close syscall # close file jr $ra Note: I'm using MARS Simulator, if that makes any different

Read the article

Problem with Google Analytics for Android : "Dispatcher thinks it finished, but there were 543 faile

- by PHP_Jedi

Anyone know how to solve this problem? 03-23 13:03:20.585: WARN/googleanalytics(3430): Problem with socket or streams. 03-23 13:03:20.585: WARN/googleanalytics(3430): java.net.SocketException: Broken pipe 03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.harmony.luni.platform.OSNetworkSystem.sendStreamImpl(Native Method) 03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.harmony.luni.platform.OSNetworkSystem.sendStream(OSNetworkSystem.java:498) 03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.harmony.luni.net.PlainSocketImpl.write(PlainSocketImpl.java:585) 03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.harmony.luni.net.SocketOutputStream.write(SocketOutputStream.java:59) 03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:87) 03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.http.impl.io.AbstractSessionOutputBuffer.flush(AbstractSessionOutputBuffer.java:94) 03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.http.impl.AbstractHttpClientConnection.doFlush(AbstractHttpClientConnection.java:168) 03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.http.impl.AbstractHttpClientConnection.flush(AbstractHttpClientConnection.java:173) 03-23 13:03:20.585: WARN/googleanalytics(3430): at com.google.android.apps.analytics.PipelinedRequester.sendRequests(Unknown Source) 03-23 13:03:20.585: WARN/googleanalytics(3430): at com.google.android.apps.analytics.NetworkDispatcher$DispatcherThread$AsyncDispatchTask.dispatchSomePendingEvents(Unknown Source) 03-23 13:03:20.585: WARN/googleanalytics(3430): at com.google.android.apps.analytics.NetworkDispatcher$DispatcherThread$AsyncDispatchTask.run(Unknown Source) 03-23 13:03:20.585: WARN/googleanalytics(3430): at android.os.Handler.handleCallback(Handler.java:587) 03-23 13:03:20.585: WARN/googleanalytics(3430): at android.os.Handler.dispatchMessage(Handler.java:92) 03-23 13:03:20.585: WARN/googleanalytics(3430): at android.os.Looper.loop(Looper.java:123) 03-23 13:03:20.585: WARN/googleanalytics(3430): at android.os.HandlerThread.run(HandlerThread.java:60) 03-23 13:03:21.088: WARN/googleanalytics(3430): Dispatcher thinks it finished, but there were 543 failed events Specially the last line explain why there is lost so much data, as the dispatcher thinks it is done, but have 543 events not dispatched... The application have a good internet connection and there is no problem reaching the app server-side api. I see in analytics that lots of startups and click-events the past few days are lost, even I know the traffic is normal since i can see statistics from the the server api. In the analytics reports I see a day by day under-reporting. So the problems seems to be spreading/growing to all the devices using this application. Im wondering why google does not answer this in their mail-groups - several people have complained about this...well, well... I found this thread relevant: http://stackoverflow.com/questions/682560/java-net-socketexception-broken-pipe But, I'm still not sure if there is anything I can do to fix it or not. If there is nothing I can do to fix it, I guess its not my fault that it got broken. But i got a feeling it is, since the problem got dramatically worse on the last deploy to Android market. Anyone else with experience on Google Analytics for android ?

Read the article

Handling TclErrors in Python

- by anteater7171

In the following code I'll get the following error if I right click the window that pops up. Then go down to the very bottom entry widget then delete it's contents. It seems to be giving me a TclError. How do I go about handeling such an error? The Error Exception in Tkinter callback Traceback (most recent call last): File "C:\Python26\Lib\lib-tk\Tkinter.py", line 1410, in __call__ return self.func(*args) File "C:\Python26\CPUDEMO.py", line 503, in I TL.sclS.set(S1) File "C:\Python26\Lib\lib-tk\Tkinter.py", line 2765, in set self.tk.call(self._w, 'set', value) TclError: expected floating-point number but got "" The Code #F #PIthon.py # Import/Setup import Tkinter import psutil,time import re from PIL import Image, ImageTk from time import sleep class simpleapp_tk(Tkinter.Tk): def __init__(self,parent): Tkinter.Tk.__init__(self,parent) self.parent = parent self.initialize() def initialize(self): Widgets self.menu = Tkinter.Menu(self, tearoff = 0 ) M = [ "Options...", "Exit"] self.selectedM = Tkinter.StringVar() self.menu.add_radiobutton( label = 'Hide', variable = self.selectedM, command = self.E ) self.menu.add_radiobutton( label = 'Bump', variable = self.selectedM, command = self.E ) self.menu.add_separator() self.menu.add_radiobutton( label = 'Options...', variable = self.selectedM, command = self.E ) self.menu.add_separator() self.menu.add_radiobutton( label = 'Exit', variable = self.selectedM, command = self.E ) self.frame1 = Tkinter.Frame(self,bg='grey15',relief='ridge',borderwidth=4,width=185, height=39) self.frame1.grid() self.frame1.grid_propagate(0) self.frame1.bind( "<Button-3><ButtonRelease-3>", self.D ) self.frame1.bind( "<Button-2><ButtonRelease-2>", self.C ) self.frame1.bind( "<Double-Button-1>", self.C ) self.labelVariable = Tkinter.StringVar() self.label = Tkinter.Label(self.frame1,textvariable=self.labelVariable,fg="lightgreen",bg="grey15",borderwidth=1,font=('arial', 10, 'bold')) self.label.grid(column=1,row=0,columnspan=1,sticky='nsew') self.label.bind( "<Button-3><ButtonRelease-3>", self.D ) self.label.bind( "<Button-2><ButtonRelease-2>", self.C ) self.label.bind( "<Double-Button-1>", self.C ) self.F() self.overrideredirect(1) self.wm_attributes("-topmost", 1) global TL1 TL1 = Tkinter.Toplevel(self) TL1.wm_geometry("+0+5000") TL1.overrideredirect(1) TL1.button = Tkinter.Button(TL1,text="? CPU",fg="lightgreen",bg="grey15",activeforeground="lightgreen", activebackground='grey15',borderwidth=4,font=('Arial', 8, 'bold'),command=self.J) TL1.button.pack(ipadx=1) Events def Reset(self): self.label.configure(font=('arial', 10, 'bold'),fg='Lightgreen',bg='grey15',borderwidth=0) self.labela.configure(font=('arial', 8, 'bold'),fg='Lightgreen',bg='grey15',borderwidth=0) self.frame1.configure(bg='grey15',relief='ridge',borderwidth=4,width=224, height=50) self.label.pack(ipadx=38) def helpmenu(self): t2 = Tkinter.Toplevel(self) Tkinter.Label(t2, text='This is a help menu', anchor="w",justify="left",fg="darkgreen",bg="grey90",relief="ridge",borderwidth=5,font=('Arial', 10)).pack(fill='both', expand=1) t2.resizable(False,False) t2.title('Help') menu = Tkinter.Menu(self) t2.config(menu=menu) filemenu = Tkinter.Menu(menu) menu.add_cascade(label="| Exit |", menu=filemenu) filemenu.add_command(label="Exit", command=t2.destroy) def aboutmenu(self): t1 = Tkinter.Toplevel(self) Tkinter.Label(t1, text=' About:\n\n CPU Usage v1.0\n\n Publisher: Drew French\n Date: 05/09/10\n Email: [email protected] \n\n\n\n\n\n\n Written in Python 2.6.4', anchor="w",justify="left",fg="darkgreen",bg="grey90",relief="sunken",borderwidth=5,font=('Arial', 10)).pack(fill='both', expand=1) t1.resizable(False,False) t1.title('About') menu = Tkinter.Menu(self) t1.config(menu=menu) filemenu = Tkinter.Menu(menu) menu.add_cascade(label="| Exit |", menu=filemenu) filemenu.add_command(label="Exit", command=t1.destroy) def A (self,event): TL.entryVariable1.set(TL.sclY.get()) TL.entryVariable2.set(TL.sclX.get()) Y = TL.sclY.get() X = TL.sclX.get() self.wm_geometry("+" + str(X) + "+" + str(Y)) def B(self,event): Y1 = TL.entryVariable1.get() X1 = TL.entryVariable2.get() self.wm_geometry("+" + str(X1) + "+" + str(Y1)) TL.sclY.set(Y1) TL.sclX.set(X1) def C(self,event): s = self.wm_geometry() geomPatt = re.compile(r"(\d+)?x?(\d+)?([+-])(\d+)([+-])(\d+)") m = geomPatt.search(s) X3 = m.group(4) Y3 = m.group(6) M = int(Y3) - 150 P = M + 150 while Y3 > M: sleep(0.0009) Y3 = int(Y3) - 1 self.update_idletasks() self.wm_geometry("+" + str(X3) + "+" + str(Y3)) sleep(2.00) while Y3 < P: sleep(0.0009) Y3 = int(Y3) + 1 self.update_idletasks() self.wm_geometry("+" + str(X3) + "+" + str(Y3)) def D(self, event=None): self.menu.post( event.x_root, event.y_root ) def E(self): if self.selectedM.get() =='Options...': Setup global TL TL = Tkinter.Toplevel(self) menu = Tkinter.Menu(TL) TL.config(menu=menu) filemenu = Tkinter.Menu(menu) menu.add_cascade(label="| Menu |", menu=filemenu) filemenu.add_command(label="Instruction Manual...", command=self.helpmenu) filemenu.add_command(label="About...", command=self.aboutmenu) filemenu.add_separator() filemenu.add_command(label="Exit Options", command=TL.destroy) filemenu.add_command(label="Exit", command=self.destroy) helpmenu = Tkinter.Menu(menu) menu.add_cascade(label="| Help |", menu=helpmenu) helpmenu.add_command(label="Instruction Manual...", command=self.helpmenu) helpmenu.add_separator() helpmenu.add_command(label="Quick Help...", command=self.helpmenu) Title TL.label5 = Tkinter.Label(TL,text="CPU Usage: Options",anchor="center",fg="black",bg="lightgreen",relief="ridge",borderwidth=5,font=('Arial', 18, 'bold')) TL.label5.pack(padx=15,ipadx=5) X Y scale TL.separator = Tkinter.Frame(TL,height=7, bd=1, relief='ridge', bg='grey95') TL.separator.pack(pady=5,padx=5) # TL.sclX = Tkinter.Scale(TL.separator, from_=0, to=1500, orient='horizontal', resolution=1, command=self.A) TL.sclX.grid(column=1,row=0,ipadx=27, sticky='w') TL.label1 = Tkinter.Label(TL.separator,text="X",anchor="s",fg="black",bg="grey95",font=('Arial', 8 ,'bold')) TL.label1.grid(column=0,row=0, pady=1, sticky='S') TL.sclY = Tkinter.Scale(TL.separator, from_=0, to=1500, resolution=1, command=self.A) TL.sclY.grid(column=2,row=1,rowspan=2,sticky='e', padx=4) TL.label3 = Tkinter.Label(TL.separator,text="Y",fg="black",bg="grey95",font=('Arial', 8 ,'bold')) TL.label3.grid(column=2,row=0, padx=10, sticky='e') TL.entryVariable2 = Tkinter.StringVar() TL.entry2 = Tkinter.Entry(TL.separator,textvariable=TL.entryVariable2, fg="grey15",bg="grey90",relief="sunken",insertbackground="black",borderwidth=5,font=('Arial', 10)) TL.entry2.grid(column=1,row=1,ipadx=20, pady=10,sticky='EW') TL.entry2.bind("<Return>", self.B) TL.label2 = Tkinter.Label(TL.separator,text="X:",fg="black",bg="grey95",font=('Arial', 8 ,'bold')) TL.label2.grid(column=0,row=1, ipadx=4, sticky='W') TL.entryVariable1 = Tkinter.StringVar() TL.entry1 = Tkinter.Entry(TL.separator,textvariable=TL.entryVariable1, fg="grey15",bg="grey90",relief="sunken",insertbackground="black",borderwidth=5,font=('Arial', 10)) TL.entry1.grid(column=1,row=2,sticky='EW') TL.entry1.bind("<Return>", self.B) TL.label4 = Tkinter.Label(TL.separator,text="Y:", anchor="center",fg="black",bg="grey95",font=('Arial', 8 ,'bold')) TL.label4.grid(column=0,row=2, ipadx=4, sticky='W') TL.label7 = Tkinter.Label(TL.separator,text="Text Colour:",fg="black",bg="grey95",font=('Arial', 8 ,'bold')) TL.label7.grid(column=1,row=3,stick="W",ipady=10) TL.selectedP = Tkinter.StringVar() TL.opt1 = Tkinter.OptionMenu(TL.separator, TL.selectedP,'Normal', 'White','Black', 'Blue', 'Steel Blue','Green','Light Green','Yellow','Orange' ,'Red',command=self.G) TL.opt1.config(fg="black",bg="grey90",activebackground="grey90",activeforeground="black", anchor="center",relief="raised",direction='right',font=('Arial', 10)) TL.opt1.grid(column=1,row=4,sticky='EW',padx=20,ipadx=20) TL.selectedP.set('Normal') TL.label7 = Tkinter.Label(TL.separator,text="Refresh Rate:",fg="black",bg="grey95",font=('Arial', 8 ,'bold')) TL.label7.grid(column=1,row=5,stick="W",ipady=10) TL.sclS = Tkinter.Scale(TL.separator, from_=10, to=2000, orient='horizontal', resolution=10, command=self.H) TL.sclS.grid(column=1,row=6,ipadx=27, sticky='w') TL.sclS.set(650) TL.entryVariableS = Tkinter.StringVar() TL.entryS = Tkinter.Entry(TL.separator,textvariable=TL.entryVariableS, fg="grey15",bg="grey90",relief="sunken",insertbackground="black",borderwidth=5,font=('Arial', 10)) TL.entryS.grid(column=1,row=7,ipadx=20, pady=10,sticky='EW') TL.entryS.bind("<Return>", self.I) TL.entryVariableS.set(650) # TL.resizable(False,False) TL.title('Options') geomPatt = re.compile(r"(\d+)?x?(\d+)?([+-])(\d+)([+-])(\d+)") s = self.wm_geometry() m = geomPatt.search(s) X = m.group(4) Y = m.group(6) TL.sclY.set(Y) TL.sclX.set(X) if self.selectedM.get() == 'Exit': self.destroy() if self.selectedM.get() == 'Bump': s = self.wm_geometry() geomPatt = re.compile(r"(\d+)?x?(\d+)?([+-])(\d+)([+-])(\d+)") m = geomPatt.search(s) X3 = m.group(4) Y3 = m.group(6) M = int(Y3) - 150 P = M + 150 while Y3 > M: sleep(0.0009) Y3 = int(Y3) - 1 self.update_idletasks() self.wm_geometry("+" + str(X3) + "+" + str(Y3)) sleep(2.00) while Y3 < P: sleep(0.0009) Y3 = int(Y3) + 1 self.update_idletasks() self.wm_geometry("+" + str(X3) + "+" + str(Y3)) if self.selectedM.get() == 'Hide': s = self.wm_geometry() geomPatt = re.compile(r"(\d+)?x?(\d+)?([+-])(\d+)([+-])(\d+)") m = geomPatt.search(s) X3 = m.group(4) Y3 = m.group(6) M = int(Y3) + 5000 self.update_idletasks() self.wm_geometry("+" + str(X3) + "+" + str(M)) TL1.wm_geometry("+0+190") def F (self): G = round(psutil.cpu_percent(), 1) G1 = str(G) + '%' self.labelVariable.set(G1) try: S2 = TL.entryVariableS.get() except ValueError, e: S2 = 650 except NameError: S2 = 650 self.after(int(S2), self.F) def G (self,event): if TL.selectedP.get() =='Normal': self.label.config( fg = 'lightgreen' ) TL1.button.config( fg = 'lightgreen',activeforeground='lightgreen') if TL.selectedP.get() =='Red': self.label.config( fg = 'red' ) TL1.button.config( fg = 'red',activeforeground='red') if TL.selectedP.get() =='Orange': self.label.config( fg = 'orange') TL1.button.config( fg = 'orange',activeforeground='orange') if TL.selectedP.get() =='Yellow': self.label.config( fg = 'yellow') TL1.button.config( fg = 'yellow',activeforeground='yellow') if TL.selectedP.get() =='Light Green': self.label.config( fg = 'lightgreen' ) TL1.button.config( fg = 'lightgreen',activeforeground='lightgreen') if TL.selectedP.get() =='Normal': self.label.config( fg = 'lightgreen' ) TL1.button.config( fg = 'lightgreen',activeforeground='lightgreen') if TL.selectedP.get() =='Steel Blue': self.label.config( fg = 'steelblue1' ) TL1.button.config( fg = 'steelblue1',activeforeground='steelblue1') if TL.selectedP.get() =='Blue': self.label.config( fg = 'blue') TL1.button.config( fg = 'blue',activeforeground='blue') if TL.selectedP.get() =='Green': self.label.config( fg = 'darkgreen' ) TL1.button.config( fg = 'darkgreen',activeforeground='darkgreen') if TL.selectedP.get() =='White': self.label.config( fg = 'white' ) TL1.button.config( fg = 'white',activeforeground='white') if TL.selectedP.get() =='Black': self.label.config( fg = 'black') TL1.button.config( fg = 'black',activeforeground='black') def H (self,event): TL.entryVariableS.set(TL.sclS.get()) S = TL.sclS.get() def I (self,event): S1 = TL.entryVariableS.get() TL.sclS.set(S1) TL.sclS.set(TL.sclS.get()) S1 = TL.entryVariableS.get() TL.sclS.set(S1) def J (self): s = self.wm_geometry() geomPatt = re.compile(r"(\d+)?x?(\d+)?([+-])(\d+)([+-])(\d+)") m = geomPatt.search(s) X3 = m.group(4) Y3 = m.group(6) M = int(Y3) - 5000 self.update_idletasks() self.wm_geometry("+" + str(X3) + "+" + str(M)) TL1.wm_geometry("+0+5000") Loop if name == "main": app = simpleapp_tk(None) app.mainloop()

Read the article

Optimizing a lot of Scanner.findWithinHorizon(pattern, 0) calls

- by darvids0n

I'm building a process which extracts data from 6 csv-style files and two poorly laid out .txt reports and builds output CSVs, and I'm fully aware that there's going to be some overhead searching through all that whitespace thousands of times, but I never anticipated converting about about 50,000 records would take 12 hours. Excerpt of my manual matching code (I know it's horrible that I use lists of tokens like that, but it was the best thing I could think of): public static String lookup(List<String> tokensBefore, List<String> tokensAfter) { String result = null; while(_match(tokensBefore)) { // block until all input is read if(id.hasNext()) { result = id.next(); // capture the next token that matches if(_matchImmediate(tokensAfter)) // try to match tokensAfter to this result return result; } else return null; // end of file; no match } return null; // no matches } private static boolean _match(List<String> tokens) { return _match(tokens, true); } private static boolean _match(List<String> tokens, boolean block) { if(tokens != null && !tokens.isEmpty()) { if(id.findWithinHorizon(tokens.get(0), 0) == null) return false; for(int i = 1; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(id.hasNext() && !id.next().matches(tokens.get(i))) { break; // break to blocking behaviour } } } else { return true; // empty list always matches } if(block) return _match(tokens); // loop until we find something or nothing else return false; // return after just one attempted match } private static boolean _matchImmediate(List<String> tokens) { if(tokens != null) { for(int i = 0; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(!id.hasNext() || !id.next().matches(tokens.get(i))) { return false; // doesn't match, or end of file } } return false; // we have some serious problems if this ever gets called } else { return true; // empty list always matches } } Basically wondering how I would work in an efficient string search (Boyer-Moore or similar). My Scanner id is scanning a java.util.String, figured buffering it to memory would reduce I/O since the search here is being performed thousands of times on a relatively small file. The performance increase compared to scanning a BufferedReader(FileReader(File)) was probably less than 1%, the process still looks to be taking a LONG time. I've also traced execution and the slowness of my overall conversion process is definitely between the first and last like of the lookup method. In fact, so much so that I ran a shortcut process to count the number of occurrences of various identifiers in the .csv-style files (I use 2 lookup methods, this is just one of them) and the process completed indexing approx 4 different identifiers for 50,000 records in less than a minute. Compared to 12 hours, that's instant. Some notes (updated): I don't necessarily need the pattern-matching behaviour, I only get the first field of a line of text so I need to match line breaks or use Scanner.nextLine(). All ID numbers I need start at position 0 of a line and run through til the first block of whitespace, after which is the name of the corresponding object. I would ideally want to return a String, not an int locating the line number or start position of the result, but if it's faster then it will still work just fine. If an int is being returned, however, then I would now have to seek to that line again just to get the ID; storing the ID of every line that is searched sounds like a way around that. Anything to help me out, even if it saves 1ms per search, will help, so all input is appreciated. Thankyou! Usage scenario 1: I have a list of objects in file A, who in the old-style system have an id number which is not in file A. It is, however, POSSIBLY in another csv-style file (file B) or possibly still in a .txt report (file C) which each also contain a bunch of other information which is not useful here, and so file B needs to be searched through for the object's full name (1 token since it would reside within the second column of any given line), and then the first column should be the ID number. If that doesn't work, we then have to split the search token by whitespace into separate tokens before doing a search of file C for those tokens as well. Generalised code: String field; for (/* each record in file A */) { /* construct the rest of this object from file A info */ // now to find the ID, if we can List<String> objectName = new ArrayList<String>(1); objectName.add(Pattern.quote(thisObject.fullName)); field = lookup(objectSearchToken, objectName); // search file B if(field == null) // not found in file B { lookupReset(false); // initialise scanner to check file C objectName.clear(); // not using the full name String[] tokens = thisObject.fullName.split(id.delimiter().pattern()); for(String s : tokens) objectName.add(Pattern.quote(s)); field = lookup(objectSearchToken, objectName); // search file C lookupReset(true); // back to file B } else { /* found it, file B specific processing here */ } if(field != null) // found it in B or C thisObject.ID = field; } The objectName tokens are all uppercase words with possible hyphens or apostrophes in them, separated by spaces. Much like a person's name. As per a comment, I will pre-compile the regex for my objectSearchToken, which is just [\r\n]+. What's ending up happening in file C is, every single line is being checked, even the 95% of lines which don't contain an ID number and object name at the start. Would it be quicker to use ^[\r\n]+.*(objectname) instead of two separate regexes? It may reduce the number of _match executions. The more general case of that would be, concatenate all tokensBefore with all tokensAfter, and put a .* in the middle. It would need to be matching backwards through the file though, otherwise it would match the correct line but with a huge .* block in the middle with lots of lines. The above situation could be resolved if I could get java.util.Scanner to return the token previous to the current one after a call to findWithinHorizon. I have another usage scenario. Will put it up asap.

Read the article

Web Safe Area (optimal resolution) for web app design

- by M.A.X

I'm in the process of designing a new web app and I'm wondering for what 'web safe area' should I optimize the app layout and design. I did some investigation and thinking on my own but wanted to share this to see what the general opinion is. Here is what I found: Optimal Display Resolution: w3schools web stats seems to be the most referenced source (however they state that these are results from their site and is biased towards tech savvy users) http://www.w3counter.com/globalstats.php (aggregate data from something like 15,000 different sites that use their tracking services) StatCounter Global Stats Display Resolution (Stats are based on aggregate data collected by StatCounter on a sample exceeding 15 billion pageviews per month collected from across the StatCounter network of more than 3 million websites) NetMarketShare Screen Resolutions (marketshare.hitslink.com) (a web analytics consulting firm, they get data from browsers of site visitors to their on-demand network of live stats customers. The data is compiled from approximately 160 million visitors per month) Display Resolution Summary: There is a bit of variation between the above sources but in general as of Jan 2011 looks like 1024x768 is about 20%, while ~85% have a higher resolution of at least 1280x768 (1280x800 is the most common of these with 15-20% of total web, depending on the source; 1280x1024 and 1366x768 follow behind with 9-14% of the share). My guess would be that the higher resolution values will be even more common if we filter on North America, and even higher if we filter on N.American corporate users (unfortunately I couldn't find any free geographically filtered statistics). Another point to note is that the 1024x768 desktop user population is likely lower than the aforementioned 20%, seeing as the iPad (1024x768 native display) is likely propping up those number. My recommendation would be to optimize around the 1280x768 constraint (*note: 1280x768 is actually a relatively rare resolution, but I think it's a valid constraint range considering that 1366x768 is relatively common and 1280 is the most common horizontal resolution). Browser + OS Constraints: To further add to the constraints we have to subtract the space taken up by the browser (assuming IE, which is the most space consuming) and the OS (assuming WinXP-Win7): Win7 has the biggest taskbar footprint at a height of 40px (XP's and Vista's is 30px) The default IE8 view uses up 25px at the bottom of the screen with the status bar and a further 120px at the top of the screen with the windows title bar and the browser UI (assuming the default 'favorites' toolbar is present, it would instead be 91px without the favorites toolbar). Assuming no scrollbar, we also loose a total of 4px horizontally for the window outline. This means that we are left with 583px of vertical space and 1276px of horizontal. In other words, a Web Safe Area of 1276 x 583 Is this a correct line of thinking? I tried to Google some design best practices but most still talk about designing around 1024x768 which seems to be quickly disappearing. Any help on this would be greatly appreciated! Thanks.

Read the article

Problem with signal handlers being called too many times [closed]

- by Hristo

how can something print 3 times when it only goes the printing code twice? I'm coding in C and the code is in a SIGCHLD signal handler I created. void chld_signalHandler() { int pidadf = (int) getpid(); printf("pidafdfaddf: %d\n", pidadf); while (1) { int termChildPID = waitpid(-1, NULL, WNOHANG); if (termChildPID == 0 || termChildPID == -1) { break; } dll_node_t *temp = head; while (temp != NULL) { printf("stuff\n"); if (temp->pid == termChildPID && temp->type == WORK) { printf("inside if\n"); // read memory mapped file b/w WORKER and MAIN // get statistics and write results to pipe char resultString[256]; // printing TIME int i; for (i = 0; i < 24; i++) { sprintf(resultString, "TIME; %d ; %d ; %d ; %s\n",i,1,2,temp->stats->mboxFileName); fwrite(resultString, strlen(resultString), 1, pipeFD); } remove_node(temp); break; } temp = temp->next; } printf("done printing from sigchld \n"); } return; } the output for my MAIN process is this: MAIN PROCESS 16214 created WORKER PROCESS 16220 for file class.sp10.cs241.mbox pidafdfaddf: 16214 stuff stuff inside if done printing from sigchld MAIN PROCESS 16214 created WORKER PROCESS 16221 for file class.sp10.cs225.mbox pidafdfaddf: 16214 stuff stuff inside if done printing from sigchld and the output for the MONITOR process is this: MONITOR: pipe is open for reading MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs241.mbox MONITOR: end of readpipe ( I've taken out repeating lines so I don't take up so much space ) Thanks, Hristo

Read the article

Problem with signal handlers

- by Hristo

how can something print 3 times when it only goes the printing code twice? I'm coding in C and the code is in a SIGCHLD signal handler I created. void chld_signalHandler() { int pidadf = (int) getpid(); printf("pidafdfaddf: %d\n", pidadf); while (1) { int termChildPID = waitpid(-1, NULL, WNOHANG); if (termChildPID == 0 || termChildPID == -1) { break; } dll_node_t *temp = head; while (temp != NULL) { printf("stuff\n"); if (temp->pid == termChildPID && temp->type == WORK) { printf("inside if\n"); // read memory mapped file b/w WORKER and MAIN // get statistics and write results to pipe char resultString[256]; // printing TIME int i; for (i = 0; i < 24; i++) { sprintf(resultString, "TIME; %d ; %d ; %d ; %s\n",i,1,2,temp->stats->mboxFileName); fwrite(resultString, strlen(resultString), 1, pipeFD); } remove_node(temp); break; } temp = temp->next; } printf("done printing from sigchld \n"); } return; } the output for my MAIN process is this: MAIN PROCESS 16214 created WORKER PROCESS 16220 for file class.sp10.cs241.mbox pidafdfaddf: 16214 stuff stuff inside if done printing from sigchld MAIN PROCESS 16214 created WORKER PROCESS 16221 for file class.sp10.cs225.mbox pidafdfaddf: 16214 stuff stuff inside if done printing from sigchld and the output for the MONITOR process is this: MONITOR: pipe is open for reading MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs241.mbox MONITOR: end of readpipe ( I've taken out repeating lines so I don't take up so much space ) Thanks, Hristo

Read the article

INSERT OR IGNORE in a trigger

- by dan04

I have a database (for tracking email statistics) that has grown to hundreds of megabytes, and I've been looking for ways to reduce it. It seems that the main reason for the large file size is that the same strings tend to be repeated in thousands of rows. To avoid this problem, I plan to create another table for a string pool, like so: CREATE TABLE AddressLookup ( ID INTEGER PRIMARY KEY AUTOINCREMENT, Address TEXT UNIQUE ); CREATE TABLE EmailInfo ( MessageID INTEGER PRIMARY KEY AUTOINCREMENT, ToAddrRef INTEGER REFERENCES AddressLookup(ID), FromAddrRef INTEGER REFERENCES AddressLookup(ID) /* Additional columns omitted for brevity. */ ); And for convenience, a view to join these tables: CREATE VIEW EmailView AS SELECT MessageID, A1.Address AS ToAddr, A2.Address AS FromAddr FROM EmailInfo LEFT JOIN AddressLookup A1 ON (ToAddrRef = A1.ID) LEFT JOIN AddressLookup A2 ON (FromAddrRef = A2.ID); In order to be able to use this view as if it were a regular table, I've made some triggers: CREATE TRIGGER trg_id_EmailView INSTEAD OF DELETE ON EmailView BEGIN DELETE FROM EmailInfo WHERE MessageID = OLD.MessageID; END; CREATE TRIGGER trg_ii_EmailView INSTEAD OF INSERT ON EmailView BEGIN INSERT OR IGNORE INTO AddressLookup(Address) VALUES (NEW.ToAddr); INSERT OR IGNORE INTO AddressLookup(Address) VALUES (NEW.FromAddr); INSERT INTO EmailInfo SELECT NEW.MessageID, A1.ID, A2.ID FROM AddressLookup A1, AddressLookup A2 WHERE A1.Address = NEW.ToAddr AND A2.Address = NEW.FromAddr; END; CREATE TRIGGER trg_iu_EmailView INSTEAD OF UPDATE ON EmailView BEGIN UPDATE EmailInfo SET MessageID = NEW.MessageID WHERE MessageID = OLD.MessageID; REPLACE INTO EmailView SELECT NEW.MessageID, NEW.ToAddr, NEW.FromAddr; END; The problem After: INSERT OR REPLACE INTO EmailView VALUES (1, '[email protected]', '[email protected]'); INSERT OR REPLACE INTO EmailView VALUES (2, '[email protected]', '[email protected]'); The updated rows contain: MessageID ToAddr FromAddr --------- ------ -------- 1 NULL [email protected] 2 [email protected] [email protected] There's a NULL that shouldn't be there. The corresponding cell in the EmailInfo table contains an orphaned ToAddrRef value. If you do the INSERTs one at a time, you'll see that Alice's ID in the AddressLookup table changes! It appears that this behavior is documented: An ON CONFLICT clause may be specified as part of an UPDATE or INSERT action within the body of the trigger. However if an ON CONFLICT clause is specified as part of the statement causing the trigger to fire, then conflict handling policy of the outer statement is used instead. So the "REPLACE" in the top-level "INSERT OR REPLACE" statement is overriding the critical "INSERT OR IGNORE" in the trigger program. Is there a way I can make it work the way that I wanted?

Read the article

i don't understand how...

- by Hristo

how can something print 3 times when it only goes the printing code twice? I'm coding in C and the code is in a SIGCHLD signal handler I created. void chld_signalHandler() { int pidadf = (int) getpid(); printf("pidafdfaddf: %d\n", pidadf); while (1) { int termChildPID = waitpid(-1, NULL, WNOHANG); if (termChildPID == 0 || termChildPID == -1) { break; } dll_node_t *temp = head; while (temp != NULL) { printf("stuff\n"); if (temp-pid == termChildPID && temp-type == WORK) { printf("inside if\n"); // read memory mapped file b/w WORKER and MAIN // get statistics and write results to pipe char resultString[256]; // printing TIME int i; for (i = 0; i < 24; i++) { sprintf(resultString, "TIME; %d ; %d ; %d ; %s\n",i,1,2,temp->stats->mboxFileName); fwrite(resultString, strlen(resultString), 1, pipeFD); } remove_node(temp); break; } temp = temp-next; } printf("done printing from sigchld \n"); } return; } the output for my MAIN process is this: MAIN PROCESS 16214 created WORKER PROCESS 16220 for file class.sp10.cs241.mbox pidafdfaddf: 16214 stuff stuff inside if done printing from sigchld MAIN PROCESS 16214 created WORKER PROCESS 16221 for file class.sp10.cs225.mbox pidafdfaddf: 16214 stuff stuff inside if done printing from sigchld and the output for the MONITOR process is this: MONITOR: pipe is open for reading MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs241.mbox MONITOR: end of readpipe ( I've taken out repeating lines so I don't take up so much space ) Thanks, Hristo

Read the article

Traditional IO vs memory-mapped

- by Senne

I'm trying to illustrate the difference in performance between traditional IO and memory mapped files in java to students. I found an example somewhere on internet but not everything is clear to me, I don't even think all steps are nececery. I read a lot about it here and there but I'm not convinced about a correct implementation of neither of them. The code I try to understand is: public class FileCopy{ public static void main(String args[]){ if (args.length < 1){ System.out.println(" Wrong usage!"); System.out.println(" Correct usage is : java FileCopy <large file with full path>"); System.exit(0); } String inFileName = args[0]; File inFile = new File(inFileName); if (inFile.exists() != true){ System.out.println(inFileName + " does not exist!"); System.exit(0); } try{ new FileCopy().memoryMappedCopy(inFileName, inFileName+".new" ); new FileCopy().customBufferedCopy(inFileName, inFileName+".new1"); }catch(FileNotFoundException fne){ fne.printStackTrace(); }catch(IOException ioe){ ioe.printStackTrace(); }catch (Exception e){ e.printStackTrace(); } } public void memoryMappedCopy(String fromFile, String toFile ) throws Exception{ long timeIn = new Date().getTime(); // read input file RandomAccessFile rafIn = new RandomAccessFile(fromFile, "rw"); FileChannel fcIn = rafIn.getChannel(); ByteBuffer byteBuffIn = fcIn.map(FileChannel.MapMode.READ_WRITE, 0,(int) fcIn.size()); fcIn.read(byteBuffIn); byteBuffIn.flip(); RandomAccessFile rafOut = new RandomAccessFile(toFile, "rw"); FileChannel fcOut = rafOut.getChannel(); ByteBuffer writeMap = fcOut.map(FileChannel.MapMode.READ_WRITE,0,(int) fcIn.size()); writeMap.put(byteBuffIn); long timeOut = new Date().getTime(); System.out.println("Memory mapped copy Time for a file of size :" + (int) fcIn.size() +" is "+(timeOut-timeIn)); fcOut.close(); fcIn.close(); } static final int CHUNK_SIZE = 100000; static final char[] inChars = new char[CHUNK_SIZE]; public static void customBufferedCopy(String fromFile, String toFile) throws IOException{ long timeIn = new Date().getTime(); Reader in = new FileReader(fromFile); Writer out = new FileWriter(toFile); while (true) { synchronized (inChars) { int amountRead = in.read(inChars); if (amountRead == -1) { break; } out.write(inChars, 0, amountRead); } } long timeOut = new Date().getTime(); System.out.println("Custom buffered copy Time for a file of size :" + (int) new File(fromFile).length() +" is "+(timeOut-timeIn)); in.close(); out.close(); } } When exactly is it nececary to use RandomAccessFile? Here it is used to read and write in the memoryMappedCopy, is it actually nececary just to copy a file at all? Or is it a part of memorry mapping? In customBufferedCopy, why is synchronized used here? I also found a different example that -should- test the performance between the 2: public class MappedIO { private static int numOfInts = 4000000; private static int numOfUbuffInts = 200000; private abstract static class Tester { private String name; public Tester(String name) { this.name = name; } public long runTest() { System.out.print(name + ": "); try { long startTime = System.currentTimeMillis(); test(); long endTime = System.currentTimeMillis(); return (endTime - startTime); } catch (IOException e) { throw new RuntimeException(e); } } public abstract void test() throws IOException; } private static Tester[] tests = { new Tester("Stream Write") { public void test() throws IOException { DataOutputStream dos = new DataOutputStream( new BufferedOutputStream( new FileOutputStream(new File("temp.tmp")))); for(int i = 0; i < numOfInts; i++) dos.writeInt(i); dos.close(); } }, new Tester("Mapped Write") { public void test() throws IOException { FileChannel fc = new RandomAccessFile("temp.tmp", "rw") .getChannel(); IntBuffer ib = fc.map( FileChannel.MapMode.READ_WRITE, 0, fc.size()) .asIntBuffer(); for(int i = 0; i < numOfInts; i++) ib.put(i); fc.close(); } }, new Tester("Stream Read") { public void test() throws IOException { DataInputStream dis = new DataInputStream( new BufferedInputStream( new FileInputStream("temp.tmp"))); for(int i = 0; i < numOfInts; i++) dis.readInt(); dis.close(); } }, new Tester("Mapped Read") { public void test() throws IOException { FileChannel fc = new FileInputStream( new File("temp.tmp")).getChannel(); IntBuffer ib = fc.map( FileChannel.MapMode.READ_ONLY, 0, fc.size()) .asIntBuffer(); while(ib.hasRemaining()) ib.get(); fc.close(); } }, new Tester("Stream Read/Write") { public void test() throws IOException { RandomAccessFile raf = new RandomAccessFile( new File("temp.tmp"), "rw"); raf.writeInt(1); for(int i = 0; i < numOfUbuffInts; i++) { raf.seek(raf.length() - 4); raf.writeInt(raf.readInt()); } raf.close(); } }, new Tester("Mapped Read/Write") { public void test() throws IOException { FileChannel fc = new RandomAccessFile( new File("temp.tmp"), "rw").getChannel(); IntBuffer ib = fc.map( FileChannel.MapMode.READ_WRITE, 0, fc.size()) .asIntBuffer(); ib.put(0); for(int i = 1; i < numOfUbuffInts; i++) ib.put(ib.get(i - 1)); fc.close(); } } }; public static void main(String[] args) { for(int i = 0; i < tests.length; i++) System.out.println(tests[i].runTest()); } } I more or less see whats going on, my output looks like this: Stream Write: 653 Mapped Write: 51 Stream Read: 651 Mapped Read: 40 Stream Read/Write: 14481 Mapped Read/Write: 6 What is makeing the Stream Read/Write so unbelievably long? And as a read/write test, to me it looks a bit pointless to read the same integer over and over (if I understand well what's going on in the Stream Read/Write) Wouldn't it be better to read int's from the previously written file and just read and write ints on the same place? Is there a better way to illustrate it? I've been breaking my head about a lot of these things for a while and I just can't get the whole picture..

Read the article

Authoritative sources about Database vs. Flatfile decision

- by FastAl

<tldr>looking for a reference to a book or other undeniably authoritative source that gives reasons when you should choose a database vs. when you should choose other storage methods. I have provided an un-authoritative list of reasons about 2/3 of the way down this post.</tldr> I have a situation at my company where a database is being used where it would be better to use another solution (in this case, an auto-generated piece of source code that contains a static lookup table, searched by binary sort). Normally, a database would be an OK solution even though the problem does not require a database, e.g, none of the elements of ACID are needed, as it is read-only data, updated about every 3-5 years (also requiring other sourcecode changes), and fits in memory, and can be keyed into via binary search (a tad faster than db, but speed is not an issue). The problem is that this code runs on our enterprise server, but is shared with several PC platforms (some disconnected, some use a central DB, etc.), and parts of it are managed by multiple programming units, parts by the DBAs, parts even by mathematicians in another department, etc. These hit their own platform’s version of their databases (containing their own copy of the static data). What happens is that every implementation, every little change, something different goes wrong. There are many other issues as well. I can’t even use a flatfile, because one mode of running on our enterprise server does not have permission to read files (only databases, and of course, its own literal storage, e.g., in-source table). Of course, other parts of the system use databases in proper, less obscure manners; there is no problem with those parts. So why don’t we just change it? I don’t have administrative ability to force a change. But I’m affected because sometimes I have to help fix the problems, but mostly because it causes outages and tons of extra IT time by other programmers and d*mmit that makes me mad! The reason neither management, nor the designers of the system, can see the problem is that they propose a solution that won’t work: increase communication; implement more safeguards and standards; etc. But every time, in a different part of the already-pared-down but still multi-step processes, a few different diligent, hard-working, top performing IT personnel make a unique subtle error that causes it to fail, sometimes after the last round of testing! And in general these are not single-person failures, but understandable miscommunications. And communication at our company is actually better than most. People just don't think that's the case because they haven't dug into the matter. However, I have it on very good word from somebody with extensive formal study of sociology and psychology that the relatively small amount of less-than-proper database usage in this gigantic cross-platform multi-source, multi-language project is bureaucratically un-maintainable. Impossible. No chance. At least with Human Beings in the loop, and it can’t be automated. In addition, the management and developers who could change this, though intelligent and capable, don’t understand the rigidity of this ‘how humans are’ issue, and are not convincible on the matter. The reason putting the static data in sourcecode will solve the problem is, although the solution is less sexy than a database, it would function with no technical drawbacks; and since the sharing of sourcecode already works very well, you basically erase any database-related effort from this section of the project, along with all the drawbacks of it that are causing problems. OK, that’s the background, for the curious. I won’t be able to convince management that this is an unfixable sociological problem, and that the real solution is coding around these limits of human nature, just as you would code around a bug in a 3rd party component that you can’t change. So what I have to do is exploit the unsuitableness of the database solution, and not do it using logic, but rather authority. I am aware of many reasons, and posts on this site giving reasons for one over the other; I’m not looking for lists of reasons like these (although you can add a comment if I've miss a doozy): WHY USE A DATABASE? instead of flatfile/other DB vs. file: if you need... Random Read / Transparent search optimization Advanced / varied / customizable Searching and sorting capabilities Transaction/rollback Locks, semaphores Concurrency control / Shared users Security 1-many/m-m is easier Easy modification Scalability Load Balancing Random updates / inserts / deletes Advanced query Administrative control of design, etc. SQL / learning curve Debugging / Logging Centralized / Live Backup capabilities Cached queries / dvlp & cache execution plans Interleaved update/read Referential integrity, avoid redundant/missing/corrupt/out-of-sync data Reporting (from on olap or oltp db) / turnkey generation tools [Disadvantages:] Important to get right the first time - professional design - but only b/c it's meant to last s/w & h/w cost Usu. over a network, speed issue (best vs. best design vs. local=even then a separate process req's marshalling/netwk layers/inter-p comm) indicies and query processing can stand in the way of simple processing (vs. flatfile) WHY USE FLATFILE: If you only need... Sequential Row processing only Limited usage append only (no reading, no master key/update) Only Update the record you're reading (fixed length recs only) Too big to fit into memory If Local disk / read-ahead network connection Portability / small system Email / cut & Paste / store as document by novice - simple format Low design learning curve but high cost later WHY USE IN-MEMORY/TABLE (tables, arrays, etc.): if you need... Processing a single db/ff record that was imported Known size of data Static data if hardcoding the table Narrow, unchanging use (e.g., one program or proc) -includes a class that will be shared, but encapsulates its data manipulation Extreme speed needed / high transaction frequency Random access - but search is dependent on implementation Following are some other posts about the topic: http://stackoverflow.com/questions/1499239/database-vs-flat-text-file-what-are-some-technical-reasons-for-choosing-one-over http://stackoverflow.com/questions/332825/are-flat-file-databases-any-good http://stackoverflow.com/questions/2356851/database-vs-flat-files http://stackoverflow.com/questions/514455/databases-vs-plain-text/514530 What I’d like to know is if anybody could recommend a hard, authoritative source containing these reasons. I’m looking for a paper book I can buy, or a reputable website with whitepapers about the issue (e.g., Microsoft, IBM), not counting the user-generated content on those sites. This will have a greater change to elicit a change that I’m looking for: less wasted programmer time, and more reliable programs. Thanks very much for your help. You win a prize for reading such a large post!

Read the article

bin_at in dlmalloc

- by chunhui

In glibc malloc.c or dlmalloc It said "repositioning tricks"As in blew, and use this trick in bin_at. bins is a array,the space is allocated when av(struct malloc_state) is allocated.doesn't it? the sizeof(bin[i]) is less then sizeof(struct malloc_chunk*)? Who can describe this trick for me? I can't understand the bin_at macro.why they get the bins address use this method?how it works? Very thanks,and sorry for my poor English. /* To simplify use in double-linked lists, each bin header acts as a malloc_chunk. This avoids special-casing for headers. But to conserve space and improve locality, we allocate only the fd/bk pointers of bins, and then use repositioning tricks to treat these as the fields of a malloc_chunk*. */ typedef struct malloc_chunk* mbinptr; /* addressing -- note that bin_at(0) does not exist */ #define bin_at(m, i) \ (mbinptr) (((char *) &((m)->bins[((i) - 1) * 2])) \ - offsetof (struct malloc_chunk, fd)) The malloc_chunk struct like this: struct malloc_chunk { INTERNAL_SIZE_T prev_size; /* Size of previous chunk (if free). */ INTERNAL_SIZE_T size; /* Size in bytes, including overhead. */ struct malloc_chunk* fd; /* double links -- used only if free. */ struct malloc_chunk* bk; /* Only used for large blocks: pointer to next larger size. */ struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */ struct malloc_chunk* bk_nextsize; }; And the bin type like this: typedef struct malloc_chunk* mbinptr; struct malloc_state { /* Serialize access. */ mutex_t mutex; /* Flags (formerly in max_fast). */ int flags; #if THREAD_STATS /* Statistics for locking. Only used if THREAD_STATS is defined. */ long stat_lock_direct, stat_lock_loop, stat_lock_wait; #endif /* Fastbins */ mfastbinptr fastbinsY[NFASTBINS]; /* Base of the topmost chunk -- not otherwise kept in a bin */ mchunkptr top; /* The remainder from the most recent split of a small request */ mchunkptr last_remainder; /* Normal bins packed as described above */ mchunkptr bins[NBINS * 2 - 2]; /* Bitmap of bins */ unsigned int binmap[BINMAPSIZE]; /* Linked list */ struct malloc_state *next; #ifdef PER_THREAD /* Linked list for free arenas. */ struct malloc_state *next_free; #endif /* Memory allocated from the system in this arena. */ INTERNAL_SIZE_T system_mem; INTERNAL_SIZE_T max_system_mem; };

Read the article

Given a trace of packets, how would you group them into flows?

- by zxcvbnm

I've tried it these ways so far: 1) Make a hash with the source IP/port and destination IP/port as keys. Each position in the hash is a list of packets. The hash is then saved in a file, with each flow separated by some special characters/line. Problem: Not enough memory for large traces. 2) Make a hash with the same key as above, but only keep in memory the file handles. Each packet is then put into the hash[key] that points to the right file. Problems: Too many flows/files (~200k) and it might run out of memory as well. 3) Hash the source IP/port and destination IP/port, then put the info inside a file. The difference between 2 and 3 is that here the files are opened and closed for each operation, so I don't have to worry about running out of memory because I opened too many at the same time. Problems: WAY too slow, same number of files as 2 so also impractical. 4) Make a hash of the source IP/port pairs and then iterate over the whole trace for each flow. Take the packets that are part of that flow and place them into the output file. Problem: Suppose I have a 60 MB trace that has 200k flows. This way, I would process, say, a 60 MB file 200k times. Maybe removing the packets as I iterate would make it not so painful, but so far I'm not sure this would be a good solution. 5) Split them by IP source/destination and then create a single file for each one, separating the flows by special characters. Still too many files (+50k). Right now I'm using Ruby to do it, which might've been a bad idea, I guess. Currently I've filtered the traces with tshark so that they only have relevant info, so I can't really make them any smaller. I thought about loading everything in memory as described in 1) using C#/Java/C++, but I was wondering if there wouldn't be a better approach here, especially since I might also run out of memory later on even with a more efficient language if I have to use larger traces. In summary, the problem I'm facing is that I either have too many files or that I run out of memory. I've also tried searching for some tool to filter the info, but I don't think there is one. The ones I've found only return some statistics and wouldn't scan for every flow as I need.

Read the article

Calculating a Sample Covariance Matrix for Groups with plyr

- by John A. Ramey

I'm going to use the sample code from http://gettinggeneticsdone.blogspot.com/2009/11/split-apply-and-combine-in-r-using-plyr.html for this example. So, first, let's copy their example data: mydata=data.frame(X1=rnorm(30), X2=rnorm(30,5,2), SNP1=c(rep("AA",10), rep("Aa",10), rep("aa",10)), SNP2=c(rep("BB",10), rep("Bb",10), rep("bb",10))) I am going to ignore SNP2 in this example and just pretend the values in SNP1 denote group membership. So then, I may want some summary statistics about each group in SNP1: "AA", "Aa", "aa". Then if I want to calculate the means for each variable, it makes sense (modifying their code slightly) to use: > ddply(mydata, c("SNP1"), function(df) data.frame(meanX1=mean(df$X1), meanX2=mean(df$X2))) SNP1 meanX1 meanX2 1 aa 0.05178028 4.812302 2 Aa 0.30586206 4.820739 3 AA -0.26862500 4.856006 But what if I want the sample covariance matrix for each group? Ideally, I would like a 3D array, where the I have the covariance matrix for each group, and the third dimension denotes the corresponding group. I tried a modified version of the previous code and got the following results that have convinced me that I'm doing something wrong. > daply(mydata, c("SNP1"), function(df) cov(cbind(df$X1, df$X2))) , , = 1 SNP1 1 2 aa 1.4961210 -0.9496134 Aa 0.8833190 -0.1640711 AA 0.9942357 -0.9955837 , , = 2 SNP1 1 2 aa -0.9496134 2.881515 Aa -0.1640711 2.466105 AA -0.9955837 4.938320 I was thinking that the dim() of the 3rd dimension would be 3, but instead, it is 2. Really this is a sliced up version of the covariance matrix for each group. If we manually compute the sample covariance matrix for aa, we get: [,1] [,2] [1,] 1.4961210 -0.9496134 [2,] -0.9496134 2.8815146 Using plyr, the following gives me what I want in list() form: > dlply(mydata, c("SNP1"), function(df) cov(cbind(df$X1, df$X2))) $aa [,1] [,2] [1,] 1.4961210 -0.9496134 [2,] -0.9496134 2.8815146 $Aa [,1] [,2] [1,] 0.8833190 -0.1640711 [2,] -0.1640711 2.4661046 $AA [,1] [,2] [1,] 0.9942357 -0.9955837 [2,] -0.9955837 4.9383196 attr(,"split_type") [1] "data.frame" attr(,"split_labels") SNP1 1 aa 2 Aa 3 AA But like I said earlier, I would really like this in a 3D array. Any thoughts on where I went wrong with daply() or suggestions? Of course, I could typecast the list from dlply() to a 3D array, but I'd rather not do this because I will be repeating this process many times in a simulation. As a side note, I found one method (http://www.mail-archive.com/[email protected]/msg86328.html) that provides the sample covariance matrix for each group, but the outputted object is bloated. Thanks in advance.

Read the article

How do I prove I should put a table of values in source code instead of a database table?

- by FastAl

<tldr>looking for a reference to a book or other undeniably authoritative source that gives reasons when you should choose a database vs. when you should choose other storage methods. I have provided an un-authoritative list of reasons about 2/3 of the way down this post.</tldr> I have a situation at my company where a database is being used where it would be better to use another solution (in this case, an auto-generated piece of source code that contains a static lookup table, searched by binary sort). Normally, a database would be an OK solution even though the problem does not require a database, e.g, none of the elements of ACID are needed, as it is read-only data, updated about every 3-5 years (also requiring other sourcecode changes), and fits in memory, and can be keyed into via binary search (a tad faster than db, but speed is not an issue). The problem is that this code runs on our enterprise server, but is shared with several PC platforms (some disconnected, some use a central DB, etc.), and parts of it are managed by multiple programming units, parts by the DBAs, parts even by mathematicians in another department, etc. These hit their own platform’s version of their databases (containing their own copy of the static data). What happens is that every implementation, every little change, something different goes wrong. There are many other issues as well. I can’t even use a flatfile, because one mode of running on our enterprise server does not have permission to read files (only databases, and of course, its own literal storage, e.g., in-source table). Of course, other parts of the system use databases in proper, less obscure manners; there is no problem with those parts. So why don’t we just change it? I don’t have administrative ability to force a change. But I’m affected because sometimes I have to help fix the problems, but mostly because it causes outages and tons of extra IT time by other programmers and d*mmit that makes me mad! The reason neither management, nor the designers of the system, can see the problem is that they propose a solution that won’t work: increase communication; implement more safeguards and standards; etc. But every time, in a different part of the already-pared-down but still multi-step processes, a few different diligent, hard-working, top performing IT personnel make a unique subtle error that causes it to fail, sometimes after the last round of testing! And in general these are not single-person failures, but understandable miscommunications. And communication at our company is actually better than most. People just don't think that's the case because they haven't dug into the matter. However, I have it on very good word from somebody with extensive formal study of sociology and psychology that the relatively small amount of less-than-proper database usage in this gigantic cross-platform multi-source, multi-language project is bureaucratically un-maintainable. Impossible. No chance. At least with Human Beings in the loop, and it can’t be automated. In addition, the management and developers who could change this, though intelligent and capable, don’t understand the rigidity of this ‘how humans are’ issue, and are not convincible on the matter. The reason putting the static data in sourcecode will solve the problem is, although the solution is less sexy than a database, it would function with no technical drawbacks; and since the sharing of sourcecode already works very well, you basically erase any database-related effort from this section of the project, along with all the drawbacks of it that are causing problems. OK, that’s the background, for the curious. I won’t be able to convince management that this is an unfixable sociological problem, and that the real solution is coding around these limits of human nature, just as you would code around a bug in a 3rd party component that you can’t change. So what I have to do is exploit the unsuitableness of the database solution, and not do it using logic, but rather authority. I am aware of many reasons, and posts on this site giving reasons for one over the other; I’m not looking for lists of reasons like these (although you can add a comment if I've miss a doozy): WHY USE A DATABASE? instead of flatfile/other DB vs. file: if you need... Random Read / Transparent search optimization Advanced / varied / customizable Searching and sorting capabilities Transaction/rollback Locks, semaphores Concurrency control / Shared users Security 1-many/m-m is easier Easy modification Scalability Load Balancing Random updates / inserts / deletes Advanced query Administrative control of design, etc. SQL / learning curve Debugging / Logging Centralized / Live Backup capabilities Cached queries / dvlp & cache execution plans Interleaved update/read Referential integrity, avoid redundant/missing/corrupt/out-of-sync data Reporting (from on olap or oltp db) / turnkey generation tools [Disadvantages:] Important to get right the first time - professional design - but only b/c it's meant to last s/w & h/w cost Usu. over a network, speed issue (best vs. best design vs. local=even then a separate process req's marshalling/netwk layers/inter-p comm) indicies and query processing can stand in the way of simple processing (vs. flatfile) WHY USE FLATFILE: If you only need... Sequential Row processing only Limited usage append only (no reading, no master key/update) Only Update the record you're reading (fixed length recs only) Too big to fit into memory If Local disk / read-ahead network connection Portability / small system Email / cut & Paste / store as document by novice - simple format Low design learning curve but high cost later WHY USE IN-MEMORY/TABLE (tables, arrays, etc.): if you need... Processing a single db/ff record that was imported Known size of data Static data if hardcoding the table Narrow, unchanging use (e.g., one program or proc) -includes a class that will be shared, but encapsulates its data manipulation Extreme speed needed / high transaction frequency Random access - but search is dependent on implementation Following are some other posts about the topic: http://stackoverflow.com/questions/1499239/database-vs-flat-text-file-what-are-some-technical-reasons-for-choosing-one-over http://stackoverflow.com/questions/332825/are-flat-file-databases-any-good http://stackoverflow.com/questions/2356851/database-vs-flat-files http://stackoverflow.com/questions/514455/databases-vs-plain-text/514530 What I’d like to know is if anybody could recommend a hard, authoritative source containing these reasons. I’m looking for a paper book I can buy, or a reputable website with whitepapers about the issue (e.g., Microsoft, IBM), not counting the user-generated content on those sites. This will have a greater change to elicit a change that I’m looking for: less wasted programmer time, and more reliable programs. Thanks very much for your help. You win a prize for reading such a large post!

Read the article

Square Peg Web: Gets you the traffic to where it matters most: Your Website!

- by demetriusalwyn

Have you decided to start your business online or is your business not reaching the targeted audience? Come to Square Peg Web; where you will find what you want to make your business reach new heights. The team at Square Peg Web is professionals who understand what you want and make sure you get it right. Our confidence stems from the fact of thousands of satisfied clients who keep referring friends and business associates to us and we do not let our clients down. Many companies promise the sky but how far is does their work live up to the promises? We do not know about the others however, we are sure that we strive to put together all our ideas and thoughts to make your website rank among the top. Web hosting is something that needs to have a personal touch; Square Peg Web customizes everything to suit your requirements so that you do not have to look further. With Square Peg Web you have a host of features to make your Business go viral. Some of the product details that are offered with Square Peg Web are unlimited product options/ variants/ properties giving you an option on price modifiers. You get unlimited customized input fields for your products and you can also Customer-define the prices. Square Peg Web provides you an option of using multiple product images with zoom features and one can also list a particular product in several categories. There are other aspects which make Square Peg Web the best choice for your website needs; every sale of yours’ is important to you and to us. We make sure that each sale is tracked by the product and also the list of bestsellers that appeal to the audience. Other comprehensive statistics of Square Peg Web includes searchable order data, an interface for shipments and order fulfillments, export sales & customer data for usage in a spreadsheet and the ability to export orders to QuickBooks format. With Square Peg Web; Admin Panel is a lot simpler. Administrative access is completely password protected and any changes done are all in real-time. You can have absolute control on the cart from anywhere around the world using your web browser and the topping on the cake is the unlimited amount of admin accounts that can be created for you. Square Peg Web offers you a world of experience with the options of choosing from marketing websites to e-commerce and from customized applications to community oriented sites. Some of the projects which appear in the portfolio of Square Peg Web are Online Marketing Web Sites, E-Commerce Web Sites, customized web applications, Blog designing and programming, video sharing and the option of downloading web sites, online advertisements, flash animation, customer and product support web sites, web site re-designing and planning and complete information architecture.

Read the article

JPA Entity Manager resource handling

- by chiragshahkapadia

Every time I call JPA method its creating entity and binding query. My persistence properties are: <property name="hibernate.dialect" value="org.hibernate.dialect.Oracle10gDialect"/> <property name="hibernate.cache.provider_class" value="net.sf.ehcache.hibernate.SingletonEhCacheProvider"/> <property name="hibernate.cache.use_second_level_cache" value="true"/> <property name="hibernate.cache.use_query_cache" value="true"/> And I am creating entity manager the way shown below: emf = Persistence.createEntityManagerFactory("pu"); em = emf.createEntityManager(); em = Persistence.createEntityManagerFactory("pu").createEntityManager(); Is there any nice way to manage entity manager resource instead create new every time or any property can set in persistence. Remember it's JPA. See below binding log every time : 15:35:15,527 INFO [AnnotationBinder] Binding entity from annotated class: * 15:35:15,527 INFO [QueryBinder] Binding Named query: * = * 15:35:15,527 INFO [QueryBinder] Binding Named query: * = * 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [EntityBinder] Bind entity com.* on table * 15:35:15,542 INFO [HibernateSearchEventListenerRegister] Unable to find org.hibernate.search.event.FullTextIndexEventListener on the classpath. Hibernate Search is not enabled. 15:35:15,542 INFO [NamingHelper] JNDI InitialContext properties:{} 15:35:15,542 INFO [DatasourceConnectionProvider] Using datasource: 15:35:15,542 INFO [SettingsFactory] RDBMS: and Real Application Testing options 15:35:15,542 INFO [SettingsFactory] JDBC driver: Oracle JDBC driver, version: 9.2.0.1.0 15:35:15,542 INFO [Dialect] Using dialect: org.hibernate.dialect.Oracle10gDialect 15:35:15,542 INFO [TransactionFactoryFactory] Transaction strategy: org.hibernate.transaction.JDBCTransactionFactory 15:35:15,542 INFO [TransactionManagerLookupFactory] No TransactionManagerLookup configured (in JTA environment, use of read-write or transactional second-level cache is not recomm ended) 15:35:15,542 INFO [SettingsFactory] Automatic flush during beforeCompletion(): disabled 15:35:15,542 INFO [SettingsFactory] Automatic session close at end of transaction: disabled 15:35:15,542 INFO [SettingsFactory] JDBC batch size: 15 15:35:15,542 INFO [SettingsFactory] JDBC batch updates for versioned data: disabled 15:35:15,542 INFO [SettingsFactory] Scrollable result sets: enabled 15:35:15,542 INFO [SettingsFactory] JDBC3 getGeneratedKeys(): disabled 15:35:15,542 INFO [SettingsFactory] Connection release mode: auto 15:35:15,542 INFO [SettingsFactory] Default batch fetch size: 1 15:35:15,542 INFO [SettingsFactory] Generate SQL with comments: disabled 15:35:15,542 INFO [SettingsFactory] Order SQL updates by primary key: disabled 15:35:15,542 INFO [SettingsFactory] Order SQL inserts for batching: disabled 15:35:15,542 INFO [SettingsFactory] Query translator: org.hibernate.hql.ast.ASTQueryTranslatorFactory 15:35:15,542 INFO [ASTQueryTranslatorFactory] Using ASTQueryTranslatorFactory 15:35:15,542 INFO [SettingsFactory] Query language substitutions: {} 15:35:15,542 INFO [SettingsFactory] JPA-QL strict compliance: enabled 15:35:15,542 INFO [SettingsFactory] Second-level cache: enabled 15:35:15,542 INFO [SettingsFactory] Query cache: enabled 15:35:15,542 INFO [SettingsFactory] Cache region factory : org.hibernate.cache.impl.bridge.RegionFactoryCacheProviderBridge 15:35:15,542 INFO [RegionFactoryCacheProviderBridge] Cache provider: net.sf.ehcache.hibernate.SingletonEhCacheProvider 15:35:15,542 INFO [SettingsFactory] Optimize cache for minimal puts: disabled 15:35:15,542 INFO [SettingsFactory] Structured second-level cache entries: disabled 15:35:15,542 INFO [SettingsFactory] Query cache factory: org.hibernate.cache.StandardQueryCacheFactory 15:35:15,542 INFO [SettingsFactory] Statistics: disabled 15:35:15,542 INFO [SettingsFactory] Deleted entity synthetic identifier rollback: disabled 15:35:15,542 INFO [SettingsFactory] Default entity-mode: pojo 15:35:15,542 INFO [SettingsFactory] Named query checking : enabled 15:35:15,542 INFO [SessionFactoryImpl] building session factory 15:35:15,542 INFO [SessionFactoryObjectFactory] Not binding factory to JNDI, no JNDI name configured 15:35:15,542 INFO [UpdateTimestampsCache] starting update timestamps cache at region: org.hibernate.cache.UpdateTimestampsCache 15:35:15,542 INFO [StandardQueryCache] starting query cache at region: org.hibernate.cache.StandardQueryCache

Search Results

Search found 7473 results on 299 pages for 'usage statistics'.

Page 239/299 | < Previous Page | 235 236 237 238 239 240 241 242 243 244 245 246 | Next Page >

- by TomTom

- by Phil Cross

- by pinaldave

- by Rick Strahl

- by pinaldave

- by Shaun

- by Thomas Weller

- by Hassan Al-Jeshi

- by PHP_Jedi

- by anteater7171

- by darvids0n

- by M.A.X

- by Hristo

- by Hristo

- by dan04

- by Hristo

- by Senne

- by FastAl

- by chunhui

- by zxcvbnm

- by John A. Ramey

- by FastAl

- by demetriusalwyn

- by chiragshahkapadia

< Previous Page | 235 236 237 238 239 240 241 242 243 244 245 246 | Next Page >